User tests: Successful: Unsuccessful:
Smart Search has issues with larger content and, to be honest, they trigger rather early: indexing fails for content larger than 200 KB.
The first issue is that the description column in the #__finder_links table is not large enough. The solution here is to truncate the description to a reasonable size. That reasonable size is 32000 characters, a number I pulled out of thin air. I hope it is low enough that a string with multi-byte characters can still be saved.
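A minimal sketch of the truncation idea (in Python, for illustration only; the actual change lives in Joomla's PHP indexer, and the function name here is made up):

```python
# Illustrative sketch, not the actual Joomla code. The 32000-character
# budget mirrors the limit chosen in this PR.
MAX_DESCRIPTION_CHARS = 32000

def truncate_description(text: str) -> str:
    """Clip the description to at most MAX_DESCRIPTION_CHARS characters.

    Slicing a Python str counts code points, not bytes, so a multi-byte
    character is never cut in half. The byte length of the result can
    still exceed 32000, which is why the character limit has to stay
    well below the column's byte capacity.
    """
    return text[:MAX_DESCRIPTION_CHARS]
```

Truncating by characters rather than bytes is what keeps multi-byte text safe: a byte-level cut could land in the middle of a UTF-8 sequence and corrupt the stored string.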
The second issue is that we are trying to store all tokens in the database at once. That means that for a text with 50k words, we build a query that tries to insert 50k rows in one statement, which fails spectacularly. So now we do this in chunks of 128 rows at a time.
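The chunking can be sketched like this (Python for illustration; the cursor, table, and column names are hypothetical, not the real #__finder_tokens schema):

```python
from itertools import islice
from typing import Iterable, Iterator, List

CHUNK_SIZE = 128  # rows per INSERT statement, as chosen in this PR

def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def store_tokens(cursor, tokens: Iterable[str]) -> None:
    """Insert tokens in small batches instead of one giant multi-row INSERT."""
    for batch in chunked(tokens, CHUNK_SIZE):
        placeholders = ", ".join(["(%s)"] * len(batch))
        cursor.execute(
            "INSERT INTO finder_tokens (term) VALUES " + placeholders,
            batch,
        )
```

With 50k tokens this produces roughly 391 small INSERT statements instead of one statement with 50k value tuples, which keeps each query comfortably under the server's packet and row limits.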
The third issue is that the tokens table fills up rather easily. The number of tokens that can be stored in the tokens table depends on max_heap_table_size, which on my system is set to 16 MB and in that case fails at just 22k entries. To get around this issue we have the toggleTables() method, but so far it has only been run between the different properties to index. If your text is 2 MB in size, the indexer tries to insert all of those 2 MB into the database and fails right away. Thus we now also check the limit when storing, in the additional code from issue 2. I also reduced the number of entries to 10k instead of 30k.
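As a rough model of that overflow check (a Python toy with made-up names; the real code switches the MEMORY table for an on-disk one via toggleTables()):

```python
MEMORY_TABLE_LIMIT = 10_000  # reduced from 30k in this PR

class TokenStore:
    """Toy model: count rows in the in-memory table and toggle to the
    on-disk table before max_heap_table_size is exceeded."""

    def __init__(self, limit: int = MEMORY_TABLE_LIMIT):
        self.limit = limit
        self.memory_rows: list = []
        self.toggles = 0  # stand-in for calls to toggleTables()

    def add(self, row) -> None:
        self.memory_rows.append(row)
        # Check the limit on every store, not only between indexed
        # properties, so a single huge text can no longer overflow
        # the memory table.
        if len(self.memory_rows) >= self.limit:
            self.toggle_tables()

    def toggle_tables(self) -> None:
        """Pretend to move the rows to the on-disk table and reset."""
        self.memory_rows.clear()
        self.toggles += 1
```

The key change modelled here is where the check happens: per store operation rather than only at property boundaries.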
This fixes #27807.
Find a large text, for example the Bible, and create an article with it as content. Save, and see that it previously takes a very long time and fails at some point. Apply this patch and try again. See that it no longer takes as long and now completes successfully.
Status | New | ⇒ | Pending |
Category | ⇒ | Administration com_finder |
Do you have more than that one article in the installation? I'm asking because indexing several articles requires more memory than a single one. You could test whether it works when you lower the number of items to index in one pass.
It's a git development site with no content at all, installed for the purpose of testing this one PR before being destroyed again.
Labels | Added: ? |
I've reviewed the code, and the cache actually doesn't really make sense. We are storing the tokenised result of the whole string, which either doesn't help much, since the string is very short (category names, etc.), or never gets a cache hit, since we would have to index exactly the same text twice. The new cache caches the individual tokens and looks those up instead. This actually reduces the required memory by several megabytes; I've had results going as far as a reduction from 30+ MB to just 14 MB.
I also tested the cache size, and 1024 is better than 2048. The latter meant a significant increase in memory consumption but no real reduction in execution time.
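A hedged sketch of the per-token idea (Python; the Joomla implementation differs in detail, but the principle of a small bounded cache keyed by individual word is the same):

```python
from collections import OrderedDict

class TokenCache:
    """Bounded cache keyed by individual word, evicting the least
    recently used entry when full. The default of 1024 reflects the
    size that tested best in this PR."""

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self._cache: OrderedDict = OrderedDict()
        self.hits = 0

    def get(self, word: str, tokenize):
        if word in self._cache:
            self._cache.move_to_end(word)  # mark as recently used
            self.hits += 1
            return self._cache[word]
        result = tokenize(word)  # the expensive stemming/tokenising work
        self._cache[word] = result
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the oldest entry
        return result
```

Caching per word pays off because natural-language text repeats the same words constantly, whereas a whole-string cache only hits when the identical text is indexed twice.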
I would assume that this would also solve the issue around #16621...
@PhilETaylor can you test again if this change fixes your memory issue?
Tested again this morning using a brand new installation of e5ae33c
Installed Joomla 4
Login to admin as super admin
Navigate Content -> Articles -> New
Title: Bible
Pasted in this whole text into Article Text
Click Save
Get generic error page
Enable Debug mode in global config, try the above again and get this:
I should note that I'm using PHP 7.4.7, MySQL 8.0.20, and the default PHP memory limit of 128M.
PHP 7.3.5 and MariaDB 10.1.39 here. I've now gone and used the "worst" settings for indexing possible: enabled tuple search, enabled stemming, had TinyMCE as editor, and copied the whole Bee Movie script in there, and it worked fine. My memory had a peak usage of 18 MB. Can you debug where the huge memory usage is coming from? Allocating 33 MB at once at that point seems strange.
copied the whole bee movie script in there and it worked fine
Like I said before, the Bee Movie DOES SAVE, but when I manually clear the index and try to manually index again using the admin console I get an error and the popup stalls; inspecting the response gives a memory-exhausted error.
TODAY: I can save the bee movie perfectly. I can then manually clear the index and reindex PERFECTLY (for the bee movie).
Then I made another clean install ...
TODAY: When I try to save the whole bible from this link I get a different issue.
I paste in the text and click save and close
It takes a long time... and then reloads the edit screen refilled with what I pasted in (and my typed title)
The article is saved to the jos_content table, HOWEVER it doesn't show up in the admin console of Joomla!
Strange...
Regarding the bee movie: I did try the re-indexing as well and that worked fine, too.
Regarding the Bible: we have a general issue with very large texts. It does not fail gracefully here and I don't know why (yet). I had similar issues where it loaded an empty form after saving for an eternity...
This PR is hopefully a step towards handling larger documents better, but we should do larger, more systematic tests with large documents, too...
On that basis let me mark it as a successful test, because it's not made things worse, and if you feel it's a good step towards handling larger documents better, then LGTM.
I have tested this item
Status | Pending | ⇒ | Ready to Commit |
RTC
Status | Ready to Commit | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2020-07-19 20:39:13 |
Closed_By | ⇒ | wilsonge | |
Labels | Added: ? |
Thanks!
I have tested this item unsuccessfully on e06528d
With the Bee Movie - the article saved quickly - but when I manually clear the index and try to manually index again I get an error in the popup, and inspecting the Ajax call the error is:
Then, when I try to save the whole Bible from this link, I get a different issue: it just doesn't save.
This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/30008.