? ? Pending

User tests: Successful: Unsuccessful:

avatar Hackwar
Hackwar
17 Apr 2018

Finder has an artificial sharding, which splits up the links between terms and content items onto 16 seperate tables. This PR removes that sharding. It makes the code extremely complex and bloats it unnecessarily and it also slows down the whole finder system. Indexing the original sample data of Joomla 4.0 with the current finder indexer takes about 2:40 minutes on my system. With these changes, we are already down to 1:30 minute.

This PR also fixes a few other bugs that prevented com_finder from working in Joomla 4.0 right now and I standardised all the language fields to be 7 characters long. While I plan to provide more PRs with further improvements to make com_finder more usable, this PR is in itself complete and can be merged like this.

How to test

Install 4.0-dev and go to the backend of Smart Search. Enable the Smart Search Plugin and start the indexing process. Notice that it takes a long time (preferably measure the time with a stopwatch). Go to the frontend and open the Smart Search menu item. See that it throws errors and doesn't work right now.
Apply this PR and clear the index in the backend. Start the re-indexing and notice that it finishes quicker. Go to the frontend and see that that menu item no longer throws errors and that all features work like expected.

When typing in a phrase and then pressing enter, sending the request to the server, a message "error" is displayed before the page is reloaded both before and after applying the patch. I'm still looking for the source of that issue.

I want to thank cloudaccess.net for supporting these changes. Without their generous help, this wouldn't be possible.

avatar Hackwar Hackwar - open - 17 Apr 2018
avatar Hackwar Hackwar - change - 17 Apr 2018
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 17 Apr 2018
Category SQL Administration com_admin com_finder Front End Installation Postgresql
avatar brianteeman
brianteeman - comment - 17 Apr 2018

As there is administrator/components/com_admin/sql/updates/mysql/4.0.0-2018-04-14.sql
I assume that there should also be an equivalent postgres file?

avatar Hackwar Hackwar - change - 17 Apr 2018
Labels Added: ?
avatar Hackwar
Hackwar - comment - 17 Apr 2018

I would need help with the postgres stuff. I don't have postgres on my system and would need someone to provide such a file and test that part.

avatar Hackwar
Hackwar - comment - 17 Apr 2018

@alikon Could you help me with adding the right migration script for postgres?

avatar alikon
alikon - comment - 18 Apr 2018

yes i'll do it

avatar Hackwar
Hackwar - comment - 18 Apr 2018

Thank you!

avatar infograf768
infograf768 - comment - 18 Apr 2018

Will test. Please let me know, apart from the indexing speed (and obvious code corrections), what I can expect, as a multilingual user, from this PR?

avatar Hackwar
Hackwar - comment - 18 Apr 2018

At this moment, there is no change in terms of multilingual users. In order to keep these changes testable, I'm restricting my changes to one thing at a time. So this only removes the sharding and gets it all working again in 4.0. The language changes are coming in the next few steps.

avatar alikon
alikon - comment - 21 Apr 2018

made a pr Hackwar#22
unfortunately without this one #19707 merged
unable to test/work

avatar Hackwar
Hackwar - comment - 21 Apr 2018

Merged

avatar brianteeman
brianteeman - comment - 9 May 2018

I was unable to replicate the error that you mentioned before applying this PR

After applying the PR the indexer did not work at all

avatar brianteeman
brianteeman - comment - 9 May 2018

I have tested this item ? unsuccessfully on 7716cd6


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20185.

avatar brianteeman brianteeman - test_item - 9 May 2018 - Tested unsuccessfully
avatar Hackwar
Hackwar - comment - 11 May 2018

The current 4.0-dev branch fails in displaying the frontend when opening the smart search component and having debug mode enabled with the following error message:
1055 'j4search2.t.parent_id' isn't in GROUP BY
This has been fixed in this PR.

I have updated the branch with the latest changes from 4.0-dev and checked it again. It does work. Could it be that you did not apply the database changes, that this introduces? I would be happy if you could test this again.

avatar Hackwar
Hackwar - comment - 20 May 2018

@brianteeman Did you see my last message? Could you check your test again, especially with the database changes?

avatar brianteeman
brianteeman - comment - 20 May 2018

Sorry with JAB and JDFR it slipped my radar - thanks for the reminder I will put it on the todo for tomorrow

avatar Hackwar
Hackwar - comment - 21 May 2018

Just to list all current PRs for Finder/Smart Search:

  • #20185 [4.0] Removing artificial sharding from com_finder
  • #20384 [4.0] Removing terms tuples from com_finder
  • #20391 [4.0] Using language specific tokeniser and stemmer for com_finder
avatar brianteeman
brianteeman - comment - 23 May 2018

@Hackwar no change from my previous report

avatar Hackwar
Hackwar - comment - 23 May 2018

Can you see what the ajax call from the indexer returns as error?

avatar brianteeman
brianteeman - comment - 23 May 2018
core.min.js?bcb349ab2b37648abc4c1018be364433:1 GET http://127.0.0.1/cms4/administrator/index.php?option=com_finder&tmpl=component&format=json&task=indexer.batch&ecfbe9f00cc7263cb121fc56d4885d68=1 500 (Internal Server Error)
e.request @ core.min.js?bcb349ab2b37648abc4c1018be364433:1
i @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
l @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
onSuccess @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
n.onreadystatechange @ core.min.js?bcb349ab2b37648abc4c1018be364433:1
XMLHttpRequest.send (async)
e.request @ core.min.js?bcb349ab2b37648abc4c1018be364433:1
i @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
e.finderIndexer @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
(anonymous) @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
indexer.min.js?bcb349ab2b37648abc4c1018be364433:1 Uncaught ReferenceError: json is not defined
    at m (indexer.min.js?bcb349ab2b37648abc4c1018be364433:1)
    at onError (indexer.min.js?bcb349ab2b37648abc4c1018be364433:1)
    at XMLHttpRequest.n.onreadystatechange (core.min.js?bcb349ab2b37648abc4c1018be364433:1)
avatar Hackwar
Hackwar - comment - 23 May 2018

Sorry to bother you, but can you also see what the specific response is from the ajax call? So far we only know that it throws a 500 error... In best case, enable debugging first.

avatar brianteeman
brianteeman - comment - 23 May 2018
core.js?35800ffe54a73a00bb236e36e4b1839f:890 GET http://127.0.0.1/cms4/administrator/index.php?option=com_finder&tmpl=component&format=json&task=indexer.batch&ecfbe9f00cc7263cb121fc56d4885d68=1 500 (Internal Server Error)
Joomla.request @ core.js?35800ffe54a73a00bb236e36e4b1839f:890
getRequest @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:29
handleResponse @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:68
onSuccess @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:36
xhr.onreadystatechange @ core.js?35800ffe54a73a00bb236e36e4b1839f:876
XMLHttpRequest.send (async)
Joomla.request @ core.js?35800ffe54a73a00bb236e36e4b1839f:890
getRequest @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:29
initialize @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:25
Joomla.finderIndexer @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:164
(anonymous) @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:169
indexer.js?35800ffe54a73a00bb236e36e4b1839f:107 Uncaught ReferenceError: json is not defined
    at handleFailure (indexer.js?35800ffe54a73a00bb236e36e4b1839f:107)
    at onError (indexer.js?35800ffe54a73a00bb236e36e4b1839f:39)
    at XMLHttpRequest.xhr.onreadystatechange (core.js?35800ffe54a73a00bb236e36e4b1839f:879)
handleFailure @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:107
onError @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:39
xhr.onreadystatechange @ core.js?35800ffe54a73a00bb236e36e4b1839f:879
XMLHttpRequest.send (async)
Joomla.request @ core.js?35800ffe54a73a00bb236e36e4b1839f:890
getRequest @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:29
handleResponse @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:68
onSuccess @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:36
xhr.onreadystatechange @ core.js?35800ffe54a73a00bb236e36e4b1839f:876
XMLHttpRequest.send (async)
Joomla.request @ core.js?35800ffe54a73a00bb236e36e4b1839f:890
getRequest @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:29
initialize @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:25
Joomla.finderIndexer @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:164
(anonymous) @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:169

avatar Hackwar
Hackwar - comment - 23 May 2018

That is still the JS trace, not the HTML error that Joomla is returning.

avatar infograf768
infograf768 - comment - 24 May 2018

Tested here on a clean patched 4.0-dev on Firefox
I could index default sample data without any error.

Search via the module in frontend does work fine although I have a small js error
TypeError: t is null[Learn More] finder.min.js:1:985

This error is only present when searching from the module, i.e. once the the results are displayed and one searches from the search field displayed on top of the results, the error is not thrown.

avatar infograf768
infograf768 - comment - 24 May 2018

After enabling debug, I get a different js error
TypeError: advanced is null[Learn More] finder.js:62:1

avatar Hackwar
Hackwar - comment - 24 May 2018

I did not change anything in the output or in the JS. Those errors have been present before and are stuff for yet another PR.

avatar carlitorweb
carlitorweb - comment - 24 May 2018

I have tested this item ? unsuccessfully on aec1df6

Start the indexing process. Notice that it takes a long time. Go to the frontend and open the Smart Search menu item. See that it throws errors and doesn't work right now.

I confirm this

Apply this PR and clear the index in the backend. Start the re-indexing and notice that it finishes quicker. Go to the frontend and see that that menu item no longer throws errors and that all features work like expected.

After applying the PR the indexer did not work at all


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20185.

avatar carlitorweb carlitorweb - test_item - 24 May 2018 - Tested unsuccessfully
avatar Hackwar
Hackwar - comment - 24 May 2018

Please make sure that the db changes are applied.Am 24.05.2018 22:22 schrieb Carlos Rodriguez notifications@github.com:I have tested this item ? unsuccessfully on aec1df6> start the indexing process. Notice that it takes a long time. Go to the frontend and open the Smart Search menu item. See that it throws errors and doesn't work right now.
I confirm this

Apply this PR and clear the index in the backend. Start the re-indexing and notice that it finishes quicker. Go to the frontend and see that that menu item no longer throws errors and that all features work like expected.

After applying the PR the indexer did not work at allThis comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20185.

—You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub, or mute the thread.

avatar carlitorweb
carlitorweb - comment - 24 May 2018

I have tested this item ✅ successfully on aec1df6

Please make sure that the db changes are applied

Sorry @Hackwar I not saw this

Apply this PR and clear the index in the backend. Start the re-indexing and notice that it finishes quicker. Go to the frontend and see that that menu item no longer throws errors and that all features work like expected.

Now work, as described


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20185.

avatar carlitorweb carlitorweb - test_item - 24 May 2018 - Tested successfully
avatar carlitorweb
carlitorweb - comment - 24 May 2018

Will be good if we can have this merged, for can test others com_finder PR you did.

avatar Hackwar
Hackwar - comment - 25 May 2018

@brianteeman can you again make sure that the DB changes have been applied and test this again?

@infograf768 is your test a successfull test or do you consider that as an error?

If you could mark/update your tests, we could merge this. ?

avatar infograf768
infograf768 - comment - 25 May 2018

Before setting this to OK, I need the advice of @chrisdavenport as this does not only solves a bug but also changes the database structure of com_finder.

avatar brianteeman
brianteeman - comment - 25 May 2018

I have tested this item ✅ successfully on aec1df6


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20185.

avatar brianteeman brianteeman - test_item - 25 May 2018 - Tested successfully
avatar franz-wohlkoenig franz-wohlkoenig - change - 25 May 2018
Status Pending Ready to Commit
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 25 May 2018

Ready to Commit after two successful tests.

avatar carlitorweb
carlitorweb - comment - 25 May 2018

@wilsonge @laoneo can we have this merged? I will like test others com_finder PR, Which depend on this

avatar chrisdavenport
chrisdavenport - comment - 25 May 2018

@infograf768 I haven't tested this PR, but I agree in principle with the intention. I ran some tests a while back that convinced me that there should be a performance improvement in most cases.

Watch out for older versions of the database having columns with default values that are no longer valid, as this could cause the ALTER TABLE statements to fail.

avatar Hackwar
Hackwar - comment - 26 May 2018

I don't think that the old default values are an issue. I've proposed quite a lot of changes, which in the end requires rebuilding the index anyway. When we are done with all changes, I would consolidate all SQL changes into one file and prepend that with a few TRUNCATEs to clear the index.

avatar laoneo
laoneo - comment - 28 May 2018

I couldn't really figure out what would be the consequences of this pr. What will the then not work as before when this pr get merged?

avatar Hackwar
Hackwar - comment - 28 May 2018

Everything works as before. We only use 15 db tables less. It speeds up indexing and makes the code a lot easier to read.

avatar Hackwar
Hackwar - comment - 3 Jun 2018

Should I now revert the router changes again or can this be merged the way it is?

avatar laoneo
laoneo - comment - 3 Jun 2018

I just can't believe that removing 16 tables has not a single bit of consequences. Something must not work as before. As long as we do not know that, I don't suggest that we merge this pr.

avatar brianteeman
brianteeman - comment - 3 Jun 2018

@laoneo all that is happening is that an artificial separation of the data into 16 tables no longer happens and the data is just in a single table

avatar laoneo
laoneo - comment - 3 Jun 2018

Why was that done in that way and now we can move it to one table. Don't get me wrong I'm not against it, just want to make sure we do not miss anything.

avatar Hackwar
Hackwar - comment - 3 Jun 2018

According to a discussion that I had with Louis Landry half a millenia ago, that splitting was meant to improve performance. According to him, inserting all those tens of thousands of rows into a single table would slow down the indexing process a lot, since the DB at some point has to catch up with all the inserts. Splitting this up into these 16 tables (16 because it allowed for easy calculations with hex values) was supposed to prevent that, because now the insertions would count for each table separately, thus increasing the throughput by the factor of 16.

I don't want to judge if that actually was true at the time that finder was developed, but according to my tests and a few people that I talked to in the SQL IRC channel, that is not the case. Having these split tables actually is a big performance hit when indexing and requires up to 16 queries to get all search results from our system. This is a useless feature that no one needs, that makes our system slower, the code unmaintainable and which has to go as soon as possible.

If you have a website that is so large that you really have issues here, then you can have a look at the built in sharding feature of MySQL. But I would honestly instead expect you to use a standalone search system like Elastic or Solr or similar.

avatar laoneo
laoneo - comment - 3 Jun 2018

Thanks for explanation. That was the information I was looking for.

avatar Hackwar Hackwar - change - 3 Jun 2018
Labels Added: ?
avatar chrisdavenport
chrisdavenport - comment - 3 Jun 2018

For the record, I had similar conversations with Louis and Rob many years ago and I seem to recall them saying that the decision to shard was based on real-world experience at the time. Consequently, when Hannes originally suggested that sharding be removed I ran some tests of my own. I modified the code to allow a variable number of shards (actually only a power of 2). In all cases, averaging over many test runs, there was a performance improvement on indexing. In theory, I would expect a performance improvement on retrieval too, but I didn't test that.

The performance improvements (relative to 16 shards) were are follows:
8 shards: 26.1%
4 shards: 35.4%
2 shards: 43.7%
1 shard (ie. no sharding): 44.4%
Tests were carried out on a real-world data set consisting of 1445 articles containing 454046 terms resulting in 1215490 links-terms map entries.

A maintainer should decide whether the router changes, which have nothing to do with sharding, should be in a separate PR as they need a different kind of testing.

avatar laoneo
laoneo - comment - 4 Jun 2018

Yes please one pr per task.

avatar laoneo
laoneo - comment - 4 Jun 2018

Thanks Chris for the explanation.

avatar Hackwar
Hackwar - comment - 5 Jun 2018

I've reverted the changes of the router and cleaned up the commits.

avatar laoneo
laoneo - comment - 5 Jun 2018

Would be good if we can get another round of testing and then I think it is good to merge.

avatar Hackwar
Hackwar - comment - 5 Jun 2018

We have 2 successfull tests. Why do we need another round of tests now?

avatar carlitorweb
carlitorweb - comment - 5 Jun 2018

I have tested this item ✅ successfully on c80b975

Tested again. Only the route change was removed from this PR, so I think will be same. Anyway, I tested again, and work as before.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20185.

avatar carlitorweb carlitorweb - test_item - 5 Jun 2018 - Tested successfully
avatar laoneo
laoneo - comment - 7 Jun 2018

@Hackwar can you please add the suggested changes to make the DB fixing work and then we are good to go here.

avatar brianteeman
brianteeman - comment - 7 Jun 2018

there is no reason to make database>fix work
That is a misuse of the functionality

avatar laoneo
laoneo - comment - 7 Jun 2018

Ok, misinterpreted that comment. What about the suggestion from @Quy?

avatar Hackwar
Hackwar - comment - 7 Jun 2018

I'm hesitant to change the SQL again, mainly because I don't see how what is supposed to work how and more importantly because there are quite a few additional changes on the way for the finder tables. At one point we will have to make one PR that combines all these changes into one SQL file. Can we tackle that issue then? (There are 12 open PRs for finder from me, 4 of which have DB changes)

avatar laoneo
laoneo - comment - 7 Jun 2018

Does the upgrade script work without the recommendations from @Quy?

avatar Quy
Quy - comment - 7 Jun 2018

With fresh install, it works.
With upgrade/patch tester, it stalls at Starting Indexer.

avatar mbabker
mbabker - comment - 7 Jun 2018

This has database schema changes. Patch tester can't deal with database schema changes (you can apply the patch and run Database > Fix but you can't reverse the schema changes if you do that). The component, as useful as it can be, is not a definitive way of identifying issues with patches.

avatar Hackwar
Hackwar - comment - 7 Jun 2018

Just to clarify @chrisdavenport s comment here: With the sharding removed, he had a performance improvement of 44%, not a performance reduction.

avatar wilsonge wilsonge - change - 11 Jun 2018
Status Ready to Commit Fixed in Code Base
Closed_Date 0000-00-00 00:00:00 2018-06-11 21:51:00
Closed_By wilsonge
avatar wilsonge wilsonge - close - 11 Jun 2018
avatar wilsonge wilsonge - merge - 11 Jun 2018
avatar Hackwar
Hackwar - comment - 12 Jun 2018

Thank you!

avatar wilsonge
wilsonge - comment - 12 Jun 2018

Thankyou for being so patient with this! Sorry it took so long to deal with!

Add a Comment

Login with GitHub to post a comment