User tests: Successful: Unsuccessful:
Finder has an artificial sharding, which splits up the links between terms and content items onto 16 seperate tables. This PR removes that sharding. It makes the code extremely complex and bloats it unnecessarily and it also slows down the whole finder system. Indexing the original sample data of Joomla 4.0 with the current finder indexer takes about 2:40 minutes on my system. With these changes, we are already down to 1:30 minute.
This PR also fixes a few other bugs that prevented com_finder from working in Joomla 4.0 right now and I standardised all the language fields to be 7 characters long. While I plan to provide more PRs with further improvements to make com_finder more usable, this PR is in itself complete and can be merged like this.
Install 4.0-dev and go to the backend of Smart Search. Enable the Smart Search Plugin and start the indexing process. Notice that it takes a long time (preferably measure the time with a stopwatch). Go to the frontend and open the Smart Search menu item. See that it throws errors and doesn't work right now.
Apply this PR and clear the index in the backend. Start the re-indexing and notice that it finishes quicker. Go to the frontend and see that that menu item no longer throws errors and that all features work like expected.
When typing in a phrase and then pressing enter, sending the request to the server, a message "error" is displayed before the page is reloaded both before and after applying the patch. I'm still looking for the source of that issue.
I want to thank cloudaccess.net for supporting these changes. Without their generous help, this wouldn't be possible.
Status | New | ⇒ | Pending |
Category | ⇒ | SQL Administration com_admin com_finder Front End Installation Postgresql |
Labels |
Added:
?
|
I would need help with the postgres stuff. I don't have postgres on my system and would need someone to provide such a file and test that part.
yes i'll do it
Thank you!
Will test. Please let me know, apart from the indexing speed (and obvious code corrections), what I can expect, as a multilingual user, from this PR?
At this moment, there is no change in terms of multilingual users. In order to keep these changes testable, I'm restricting my changes to one thing at a time. So this only removes the sharding and gets it all working again in 4.0. The language changes are coming in the next few steps.
made a pr Hackwar#22
unfortunately without this one #19707 merged
unable to test/work
Merged
I was unable to replicate the error that you mentioned before applying this PR
After applying the PR the indexer did not work at all
I have tested this item
The current 4.0-dev branch fails in displaying the frontend when opening the smart search component and having debug mode enabled with the following error message:
1055 'j4search2.t.parent_id' isn't in GROUP BY
This has been fixed in this PR.
I have updated the branch with the latest changes from 4.0-dev and checked it again. It does work. Could it be that you did not apply the database changes, that this introduces? I would be happy if you could test this again.
@brianteeman Did you see my last message? Could you check your test again, especially with the database changes?
Sorry with JAB and JDFR it slipped my radar - thanks for the reminder I will put it on the todo for tomorrow
Can you see what the ajax call from the indexer returns as error?
core.min.js?bcb349ab2b37648abc4c1018be364433:1 GET http://127.0.0.1/cms4/administrator/index.php?option=com_finder&tmpl=component&format=json&task=indexer.batch&ecfbe9f00cc7263cb121fc56d4885d68=1 500 (Internal Server Error)
e.request @ core.min.js?bcb349ab2b37648abc4c1018be364433:1
i @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
l @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
onSuccess @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
n.onreadystatechange @ core.min.js?bcb349ab2b37648abc4c1018be364433:1
XMLHttpRequest.send (async)
e.request @ core.min.js?bcb349ab2b37648abc4c1018be364433:1
i @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
e.finderIndexer @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
(anonymous) @ indexer.min.js?bcb349ab2b37648abc4c1018be364433:1
indexer.min.js?bcb349ab2b37648abc4c1018be364433:1 Uncaught ReferenceError: json is not defined
at m (indexer.min.js?bcb349ab2b37648abc4c1018be364433:1)
at onError (indexer.min.js?bcb349ab2b37648abc4c1018be364433:1)
at XMLHttpRequest.n.onreadystatechange (core.min.js?bcb349ab2b37648abc4c1018be364433:1)
Sorry to bother you, but can you also see what the specific response is from the ajax call? So far we only know that it throws a 500 error... In best case, enable debugging first.
core.js?35800ffe54a73a00bb236e36e4b1839f:890 GET http://127.0.0.1/cms4/administrator/index.php?option=com_finder&tmpl=component&format=json&task=indexer.batch&ecfbe9f00cc7263cb121fc56d4885d68=1 500 (Internal Server Error)
Joomla.request @ core.js?35800ffe54a73a00bb236e36e4b1839f:890
getRequest @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:29
handleResponse @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:68
onSuccess @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:36
xhr.onreadystatechange @ core.js?35800ffe54a73a00bb236e36e4b1839f:876
XMLHttpRequest.send (async)
Joomla.request @ core.js?35800ffe54a73a00bb236e36e4b1839f:890
getRequest @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:29
initialize @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:25
Joomla.finderIndexer @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:164
(anonymous) @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:169
indexer.js?35800ffe54a73a00bb236e36e4b1839f:107 Uncaught ReferenceError: json is not defined
at handleFailure (indexer.js?35800ffe54a73a00bb236e36e4b1839f:107)
at onError (indexer.js?35800ffe54a73a00bb236e36e4b1839f:39)
at XMLHttpRequest.xhr.onreadystatechange (core.js?35800ffe54a73a00bb236e36e4b1839f:879)
handleFailure @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:107
onError @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:39
xhr.onreadystatechange @ core.js?35800ffe54a73a00bb236e36e4b1839f:879
XMLHttpRequest.send (async)
Joomla.request @ core.js?35800ffe54a73a00bb236e36e4b1839f:890
getRequest @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:29
handleResponse @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:68
onSuccess @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:36
xhr.onreadystatechange @ core.js?35800ffe54a73a00bb236e36e4b1839f:876
XMLHttpRequest.send (async)
Joomla.request @ core.js?35800ffe54a73a00bb236e36e4b1839f:890
getRequest @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:29
initialize @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:25
Joomla.finderIndexer @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:164
(anonymous) @ indexer.js?35800ffe54a73a00bb236e36e4b1839f:169
That is still the JS trace, not the HTML error that Joomla is returning.
Tested here on a clean patched 4.0-dev on Firefox
I could index default sample data without any error.
Search via the module in frontend does work fine although I have a small js error
TypeError: t is null[Learn More] finder.min.js:1:985
This error is only present when searching from the module, i.e. once the the results are displayed and one searches from the search field displayed on top of the results, the error is not thrown.
After enabling debug, I get a different js error
TypeError: advanced is null[Learn More] finder.js:62:1
I did not change anything in the output or in the JS. Those errors have been present before and are stuff for yet another PR.
I have tested this item
Start the indexing process. Notice that it takes a long time. Go to the frontend and open the Smart Search menu item. See that it throws errors and doesn't work right now.
I confirm this
Apply this PR and clear the index in the backend. Start the re-indexing and notice that it finishes quicker. Go to the frontend and see that that menu item no longer throws errors and that all features work like expected.
After applying the PR the indexer did not work at all
Please make sure that the db changes are applied.Am 24.05.2018 22:22 schrieb Carlos Rodriguez notifications@github.com:I have tested this item
I confirm this
Apply this PR and clear the index in the backend. Start the re-indexing and notice that it finishes quicker. Go to the frontend and see that that menu item no longer throws errors and that all features work like expected.
After applying the PR the indexer did not work at allThis comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20185.
—You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub, or mute the thread.
I have tested this item
Please make sure that the db changes are applied
Sorry @Hackwar I not saw this
Apply this PR and clear the index in the backend. Start the re-indexing and notice that it finishes quicker. Go to the frontend and see that that menu item no longer throws errors and that all features work like expected.
Now work, as described
Will be good if we can have this merged, for can test others com_finder PR you did.
@brianteeman can you again make sure that the DB changes have been applied and test this again?
@infograf768 is your test a successfull test or do you consider that as an error?
If you could mark/update your tests, we could merge this.
Before setting this to OK, I need the advice of @chrisdavenport as this does not only solves a bug but also changes the database structure of com_finder.
I have tested this item
Status | Pending | ⇒ | Ready to Commit |
Ready to Commit after two successful tests.
@infograf768 I haven't tested this PR, but I agree in principle with the intention. I ran some tests a while back that convinced me that there should be a performance improvement in most cases.
Watch out for older versions of the database having columns with default values that are no longer valid, as this could cause the ALTER TABLE statements to fail.
I don't think that the old default values are an issue. I've proposed quite a lot of changes, which in the end requires rebuilding the index anyway. When we are done with all changes, I would consolidate all SQL changes into one file and prepend that with a few TRUNCATEs to clear the index.
I couldn't really figure out what would be the consequences of this pr. What will the then not work as before when this pr get merged?
Everything works as before. We only use 15 db tables less. It speeds up indexing and makes the code a lot easier to read.
Should I now revert the router changes again or can this be merged the way it is?
I just can't believe that removing 16 tables has not a single bit of consequences. Something must not work as before. As long as we do not know that, I don't suggest that we merge this pr.
Why was that done in that way and now we can move it to one table. Don't get me wrong I'm not against it, just want to make sure we do not miss anything.
According to a discussion that I had with Louis Landry half a millenia ago, that splitting was meant to improve performance. According to him, inserting all those tens of thousands of rows into a single table would slow down the indexing process a lot, since the DB at some point has to catch up with all the inserts. Splitting this up into these 16 tables (16 because it allowed for easy calculations with hex values) was supposed to prevent that, because now the insertions would count for each table separately, thus increasing the throughput by the factor of 16.
I don't want to judge if that actually was true at the time that finder was developed, but according to my tests and a few people that I talked to in the SQL IRC channel, that is not the case. Having these split tables actually is a big performance hit when indexing and requires up to 16 queries to get all search results from our system. This is a useless feature that no one needs, that makes our system slower, the code unmaintainable and which has to go as soon as possible.
If you have a website that is so large that you really have issues here, then you can have a look at the built in sharding feature of MySQL. But I would honestly instead expect you to use a standalone search system like Elastic or Solr or similar.
Thanks for explanation. That was the information I was looking for.
Labels |
Added:
?
|
For the record, I had similar conversations with Louis and Rob many years ago and I seem to recall them saying that the decision to shard was based on real-world experience at the time. Consequently, when Hannes originally suggested that sharding be removed I ran some tests of my own. I modified the code to allow a variable number of shards (actually only a power of 2). In all cases, averaging over many test runs, there was a performance improvement on indexing. In theory, I would expect a performance improvement on retrieval too, but I didn't test that.
The performance improvements (relative to 16 shards) were are follows:
8 shards: 26.1%
4 shards: 35.4%
2 shards: 43.7%
1 shard (ie. no sharding): 44.4%
Tests were carried out on a real-world data set consisting of 1445 articles containing 454046 terms resulting in 1215490 links-terms map entries.
A maintainer should decide whether the router changes, which have nothing to do with sharding, should be in a separate PR as they need a different kind of testing.
Yes please one pr per task.
Thanks Chris for the explanation.
I've reverted the changes of the router and cleaned up the commits.
Would be good if we can get another round of testing and then I think it is good to merge.
We have 2 successfull tests. Why do we need another round of tests now?
I have tested this item
Tested again. Only the route change was removed from this PR, so I think will be same. Anyway, I tested again, and work as before.
there is no reason to make database>fix work
That is a misuse of the functionality
I'm hesitant to change the SQL again, mainly because I don't see how what is supposed to work how and more importantly because there are quite a few additional changes on the way for the finder tables. At one point we will have to make one PR that combines all these changes into one SQL file. Can we tackle that issue then? (There are 12 open PRs for finder from me, 4 of which have DB changes)
With fresh install, it works.
With upgrade/patch tester, it stalls at Starting Indexer
.
This has database schema changes. Patch tester can't deal with database schema changes (you can apply the patch and run Database > Fix
but you can't reverse the schema changes if you do that). The component, as useful as it can be, is not a definitive way of identifying issues with patches.
Just to clarify @chrisdavenport s comment here: With the sharding removed, he had a performance improvement of 44%, not a performance reduction.
Status | Ready to Commit | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2018-06-11 21:51:00 |
Closed_By | ⇒ | wilsonge |
Thank you!
Thankyou for being so patient with this! Sorry it took so long to deal with!
As there is administrator/components/com_admin/sql/updates/mysql/4.0.0-2018-04-14.sql
I assume that there should also be an equivalent postgres file?