Hackwar
5 Jul 2020

Summary of Changes

Smart Search has problems with larger content, and to be honest, they start rather early: indexing fails for content larger than 200 KB.

The first issue is that the description column in the #__finder_links table is not large enough. The solution is to truncate the description to a reasonable size. That size is 32000 characters, a number I pulled out of thin air. I hope it is low enough that a string containing multi-byte characters can still be saved.
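A minimal sketch of the truncation, in Python rather than Joomla's PHP. It cuts the text to 32000 characters as described and then, to address the multi-byte concern, trims whole characters until the UTF-8 encoding fits a byte budget. The 65535-byte default is an assumption (it is the limit of a MySQL TEXT column); the PR does not say which column type or byte limit applies, and `truncate_description` is a hypothetical name.

```python
def truncate_description(text: str, max_chars: int = 32000, max_bytes: int = 65535) -> str:
    """Cut text to max_chars characters, then drop trailing characters until
    its UTF-8 encoding fits in max_bytes (an assumed column limit)."""
    truncated = text[:max_chars]
    encoded = truncated.encode("utf-8")
    if len(encoded) <= max_bytes:
        return truncated
    # Slicing the bytes may split a multi-byte character at the end;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# 32000 three-byte characters would need 96000 bytes, so the byte budget wins.
print(len(truncate_description("€" * 40000)))  # 21845
```

Under that assumed 65535-byte limit, 32000 characters only fit when they average two bytes or fewer each; text dominated by three- or four-byte characters needs the extra byte check.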

The second issue is that we try to store all tokens in the DB at once. With a text of 50k words, we build a single query that tries to insert 50k rows. That fails spectacularly, so we now insert them in chunks of 128 rows at a time.
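The chunking can be sketched as follows, again in Python and against an in-memory SQLite database instead of MySQL. The `tokens` table, its two columns and the `insert_tokens` helper are hypothetical stand-ins; only the idea of one multi-row INSERT per 128 rows comes from the description above.

```python
import sqlite3
from typing import Iterator, Sequence, Tuple

Row = Tuple[str, float]  # (term, weight); hypothetical columns


def chunks(rows: Sequence[Row], size: int) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_tokens(conn: sqlite3.Connection, rows: Sequence[Row], chunk_size: int = 128) -> int:
    """Insert rows with one multi-row INSERT per chunk; return the statement count."""
    statements = 0
    for chunk in chunks(rows, chunk_size):
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        params = [value for row in chunk for value in row]
        conn.execute(f"INSERT INTO tokens (term, weight) VALUES {placeholders}", params)
        statements += 1
    conn.commit()
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (term TEXT, weight REAL)")
rows = [(f"word{i}", 1.0) for i in range(300)]
print(insert_tokens(conn, rows))  # 3 statements: 128 + 128 + 44 rows
```

In this sketch the fixed chunk size also keeps the number of `?` placeholders per statement (256 here) well below SQLite's limit on bound parameters.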

The third issue is that the tokens table fills up rather easily. How many tokens it can hold depends on max_heap_table_size, which is set to 16MB on my system; with that setting it fails at just 22k entries. To get around this, we have the toggleTables() method, but so far it has only been run between the different properties being indexed. If your text is 2MB in size, all 2MB are sent to the DB in one go and it fails right away. So we now also check the limit while storing the chunks introduced for the second issue. I also lowered the limit from 30k entries to 10k.
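The toy model below contrasts checking the limit only once per property with checking it before every chunk. `TokenTable` is hypothetical, and `toggle_to_disk` merely flips a flag that lifts the cap; how toggleTables() actually achieves that is not described in the PR text. The limit and chunk size follow the description (10k rows, 128 rows per chunk).

```python
from typing import List, Sequence


class TokenTable:
    """Toy model of a size-capped in-memory token table (hypothetical)."""

    def __init__(self, limit: int = 10_000, chunk_size: int = 128,
                 check_per_chunk: bool = True) -> None:
        self.limit = limit
        self.chunk_size = chunk_size
        self.check_per_chunk = check_per_chunk
        self.rows: List[str] = []
        self.in_memory = True
        self.peak_in_memory = 0

    def toggle_to_disk(self) -> None:
        # Stand-in for toggleTables(): from here on the table is no longer capped.
        self.in_memory = False

    def _over_limit(self, incoming: int) -> bool:
        return self.in_memory and len(self.rows) + incoming > self.limit

    def store(self, tokens: Sequence[str]) -> None:
        """Store one property's tokens in chunks of chunk_size rows."""
        # Old behaviour: check only once per property, before inserting anything.
        if not self.check_per_chunk and self._over_limit(0):
            self.toggle_to_disk()
        for start in range(0, len(tokens), self.chunk_size):
            chunk = tokens[start:start + self.chunk_size]
            # New behaviour: check before every chunk.
            if self.check_per_chunk and self._over_limit(len(chunk)):
                self.toggle_to_disk()
            if self._over_limit(len(chunk)):
                raise OverflowError("memory table is full")
            self.rows.extend(chunk)
            if self.in_memory:
                self.peak_in_memory = max(self.peak_in_memory, len(self.rows))


t = TokenTable()
t.store([f"w{i}" for i in range(25_000)])
print(t.in_memory, t.peak_in_memory)  # False 9984
```

With a single 25,000-token property, the per-chunk variant switches over when the next chunk would pass 10,000 rows (the table peaks at 9,984), while the once-per-property variant raises `OverflowError`, mirroring the failure described above.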

This fixes #27807.

Testing Instructions

Find a large text, for example the Bible, and create an article with it as the content. Save it: without the patch, saving takes a very long time and eventually fails. Apply this patch and try again: saving is now much faster and actually succeeds.

Hackwar - open - 5 Jul 2020
Hackwar - change - 5 Jul 2020
Status: New → Pending
joomla-cms-bot - change - 5 Jul 2020
Category: Administration com_finder
PhilETaylor - test_item - 5 Jul 2020 - Tested unsuccessfully
PhilETaylor - comment - 5 Jul 2020

I have tested this item unsuccessfully on e06528d

With the Bee Movie script, the article saved quickly, but when I manually clear the index and try to index again manually, I get an error in the popup. Inspecting the Ajax call shows this error:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 33554440 bytes) in /application/administrator/components/com_finder/src/Indexer/Helper.php on line 118

Then, when I try to save the whole Bible from this link, I get a different issue: it just doesn't save.

Screenshot 2020-07-05 at 22 13 57


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/30008.
Hackwar - comment - 6 Jul 2020

Do you have more than that one article in the installation? I'm asking because indexing several articles requires more memory than indexing a single one. You could test whether it works when you lower the number of items indexed in one pass.

PhilETaylor - comment - 6 Jul 2020

It's a Git development site with no content at all, installed only to test this one PR before being destroyed again.

chrisdavenport - comment - 6 Jul 2020

This looks to be the problem that #16621 was intended to fix.

Hackwar - change - 6 Jul 2020
Labels Added: ?
Hackwar - comment - 6 Jul 2020

I've reviewed the code, and the existing cache doesn't really make sense. It stores the tokenised result of the whole string, which either doesn't help much because the string is very short (category names, etc.), or never gets a cache hit because we would have to index exactly the same text twice. The new cache stores the individual tokens and looks those up instead. This reduces the required memory by several megabytes; in some of my tests, usage dropped from 30+MB to just 14MB.

I also tested the cache size, and 1024 is better than 2048. The latter meant a significant increase in memory consumption but no real reduction in execution time.
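A sketch of the difference between the two caches: instead of caching the result for a whole string, it caches the processed form of each word, so repeated words hit the cache even when the surrounding texts differ. The LRU eviction, the names `TokenCache` and `tokenize`, and using `str.lower` as the per-word work are assumptions for illustration; the comment above only states that tokens are cached and that a size of 1024 worked better than 2048.

```python
from collections import OrderedDict
from typing import Callable, List


class TokenCache:
    """Bounded LRU cache mapping a single word to its processed token."""

    def __init__(self, build: Callable[[str], str], max_size: int = 1024) -> None:
        self.build = build  # hypothetical per-word work, e.g. normalising or stemming
        self.max_size = max_size
        self.entries: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, word: str) -> str:
        if word in self.entries:
            self.entries.move_to_end(word)  # mark as most recently used
            self.hits += 1
            return self.entries[word]
        self.misses += 1
        token = self.build(word)
        self.entries[word] = token
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict the least recently used word
        return token


def tokenize(text: str, cache: TokenCache) -> List[str]:
    """Split on whitespace and resolve each word through the cache."""
    return [cache.get(word) for word in text.split()]


cache = TokenCache(str.lower)
print(tokenize("The cat and the Cat and THE dog", cache))
print(cache.hits, cache.misses)  # 1 7
```

Because the cache is keyed by the raw word, only the second "and" is a hit in the sample sentence; the memory cost is bounded by `max_size` entries regardless of how long the indexed text is.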

I would assume that this also solves the issue behind #16621...

@PhilETaylor can you test again if this change fixes your memory issue?

PhilETaylor - comment - 7 Jul 2020

Tested again this morning using a brand new installation of e5ae33c

1. Installed Joomla 4
2. Logged in to the administrator as super admin
3. Navigated to Content -> Articles -> New
4. Title: Bible
5. Pasted this whole text into Article Text
6. Clicked Save

I get a generic error page.

Screenshot 2020-07-07 at 09 58 44

With Debug mode enabled in Global Configuration, I tried the above again and got this:

Screenshot 2020-07-07 at 09 57 08

PhilETaylor - comment - 7 Jul 2020

I should note that I'm using PHP 7.4.7, MySQL 8.0.20, and the default PHP memory limit of 128M.

Hackwar - comment - 7 Jul 2020

PHP 7.3.5 and MariaDB 10.1.39 here. I've now used the "worst" indexing settings possible: tuple search enabled, stemming enabled, and TinyMCE as the editor. I copied the whole Bee Movie script in there and it worked fine, with a peak memory usage of 18MB. Can you debug where the huge memory usage is coming from? Allocating 33MB at once at that point seems strange.

PhilETaylor - comment - 9 Jul 2020

"copied the whole bee movie script in there and it worked fine"

Like I said before, the Bee Movie DOES SAVE, but when I manually clear the index and try to index again manually from the admin console, I get an error: the popup stalls, and inspecting the response shows a memory exhausted error.

TODAY: I can save the Bee Movie perfectly. I can then manually clear the index and reindex PERFECTLY (for the Bee Movie).


Then I made another clean install ...


TODAY: When I try to save the whole Bible from this link, I get a different issue.

I paste in the text and click Save and Close.

It takes a long time, and then reloads the edit screen, refilled with what I pasted in (and my typed title).

The article is saved to the jos_content table, HOWEVER it doesn't show up in the Joomla! admin console.

Strange...

Hackwar - comment - 10 Jul 2020

Regarding the Bee Movie: I tried the re-indexing as well, and that worked fine, too.

Regarding the Bible: we have a general issue with very large texts. It does not fail gracefully here, and I don't know why (yet). I have had similar issues where, after saving for an eternity, it loaded an empty form...

This PR is hopefully a step towards handling larger documents better, but we should do larger, more systematic tests with large documents, too...

PhilETaylor - comment - 10 Jul 2020

On that basis, let me mark it as a successful test: it hasn't made things worse, and if you feel it's a good step towards handling larger documents better, then LGTM.

PhilETaylor - test_item - 10 Jul 2020 - Tested successfully
PhilETaylor - comment - 10 Jul 2020

I have tested this item successfully on e5ae33c

because it hasn't made things worse, and if you feel it's a good step towards handling larger documents better, then LGTM.


Quy - test_item - 10 Jul 2020 - Tested successfully
Quy - comment - 10 Jul 2020

I have tested this item successfully on e5ae33c



Quy - change - 10 Jul 2020
Status: Pending → Ready to Commit
Quy - comment - 10 Jul 2020

RTC



wilsonge - close - 19 Jul 2020
wilsonge - merge - 19 Jul 2020
wilsonge - change - 19 Jul 2020
Status: Ready to Commit → Fixed in Code Base
Closed_Date: 0000-00-00 00:00:00 → 2020-07-19 20:39:13
Closed_By: wilsonge
Labels Added: ?
wilsonge - comment - 19 Jul 2020

Thanks!
