? ? Pending

User tests: Successful: Unsuccessful:

avatar frankmayer
frankmayer
13 Oct 2018

Pull Request for Issue #22098 .

Summary of Changes

This PR re-adds and optimizes chunking (for the VALUES clause) to the smart search indexer, allowing for larget articles to be saved correctly, and their terms to be correctly added to the index, again.
It also optimizes a condition check in the helper.

With a standard configuration of a PHP memoy limit at 128MB and the default Mysql HEAP limits for MEMORY tables these instructions should provide solid results.

Testing Instructions

Introduction:

There are three scenarios. Each one involves saving a predefined article and getting an expected result.
The first scenario uses a text of ~32000 words. This one should be saved successfully.
In the second scenario, we use a text of ~45000 words. This one will fail to save because of surpassing the finder_token table limits.
Finally, in the third scenario we use a text of ~65000 words. This should cause a php fatal error, stating that the memory has been exhausted.

Preparation:

  • Error reporting should be set to development mode
  • Enable the "Content - Smart Search" plugin (and the other Smart Search related)
  • Open a tab for a new article
  • Open a tab for components/Smart Search

Scenarios:

  1. Article with near maximum possible text
    a) On the "Smart Search" tab, click "Clear Index"
    b) Copy and paste the complete text (select all) from this gist (213,4KiB unformatted text) into the article in the other tab
    c) Give it a title and "save" the article
    d) Expected result: Message Article saved.
    e) On the "Smart Search" tab, click "Statistics" and check that it says "The indexed content on this site includes 11,995 terms across 1 links..."

  2. Article which exceeds the maximum entries for the finder_token table
    a) On the Smart Search tab, click "Clear Index"
    b) Copy and paste the complete text (select all) from this gist (298,8KiB unformatted text) into the article in the other tab
    c) Save the article
    d) Expected result: Error Save failed with the following error: The table '#__finder_tokens' is full
    e) On the "Smart Search" tab, click "Statistics" and check that it says "The indexed content on this site includes 0 terms across 1 links..."
    Remarks
    If you get a message that the acrticle was saved, instead of the error, you probably have higher heap settings for the MEMORY tables in mysql. Still this result would count as expected.
    If you get a php error showing that the memory is exhausted, you are probably running php with a limit of less than 128MB. If that happens, please copy/paste the error message into a comment of this issue.

  3. Article which results in a fatal php error due to memory exhaustion
    a) On the Smart Search tab, click "Clear Index"
    b) Copy and paste the complete text (select all) from this gist into the article in the other tab
    c) Save the article
    d) Expected result: a php error showing that the memory is exhausted
    e) On the "Smart Search" tab, click "Statistics" and check that it says "The indexed content on this site includes 0 terms across 1 links..."
    Remarks
    If you get a message that the acrticle was saved, instead of the error, you probably have higher heap settings for the MEMORY tables in mysql. Still this result would count as expected.
    If you get the error: The table '#__finder_tokens' is full, you might have a higher memory limit than 128MB on PHP.

Documentation Changes Required

none

avatar frankmayer frankmayer - open - 13 Oct 2018
avatar frankmayer frankmayer - change - 13 Oct 2018
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 13 Oct 2018
Category Administration com_finder
avatar frankmayer frankmayer - change - 13 Oct 2018
The description was changed
avatar frankmayer frankmayer - edited - 13 Oct 2018
avatar ggppdk
ggppdk - comment - 14 Oct 2018

I have tested this item successfully on 8d9a9af

This PR
addresses 3 things introduced recently

  • re-adds chunking to avoid the SQL error due to large query length
  • restores the peak memory usage
  • restores a small slow-down when saving large articles

with PHP 7.2 + my old workstation (averages of repetitive runs)
memory_get_peak_usage(false)
memory_get_peak_usage(true)
microtime(true) - $_SERVER["REQUEST_TIME_FLOAT"]

Old code before PR #12253

  • 68.9 MBs / 10.0 MBs / 11.07 seconds

Current code

  • 106.5 MBs / 44.0 MBs / 11.28 seconds

This PR

  • 68.9 MBs / 10.0 MBs / 11.06 seconds

This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/22599.
avatar ggppdk ggppdk - test_item - 14 Oct 2018 - Tested successfully
avatar ggppdk
ggppdk - comment - 14 Oct 2018

About the testing article text (32K words sample) given here
with old and new code i am getting
Save failed with the following error: The table '#__finder_tokens' is full

I had to increase
max_heap_table_size = 64M
to
max_heap_table_size = 128M (or more)

and restart MySql
to be able to save the article

avatar ggppdk
ggppdk - comment - 14 Oct 2018

aaah ,
my last comment was wrong

article will save with max_heap_table_size = 64M,
i thought i was testing with 32K words sample but i had pasted in the editor a bigger sample !

avatar frankmayer
frankmayer - comment - 14 Oct 2018

Old code before PR #12253

  • 68.9 MBs / 10.0 MBs / 11.07 seconds

Current code

  • 106.5 MBs / 44.0 MBs / 11.28 seconds

This PR

  • 68.9 MBs / 10.0 MBs / 11.06 seconds

Thanks for testing this, George. Though I don't know what caused the memory consumtion to rise.. (I believe the "not chunking", without having tested it, though). I ran some tests back then, but cannot seem to find any notes on that. Should have kept them.

avatar frankmayer frankmayer - change - 14 Oct 2018
Labels Added: ?
avatar frankmayer
frankmayer - comment - 14 Oct 2018

@ggppdk I updated this PR according to @infograf768 's remark above. I used strstr() though, for readability and performance.
Sidenote, strstr() performs mostly better than substr() for such cases. So, this functional addition had no negative impact on performance. Of course, since the result is cached anyways, it's only called one time (any impact would not be measurable anyways), but I used this variant for the sake of cleaner coding.

avatar frankmayer frankmayer - change - 14 Oct 2018
The description was changed
avatar frankmayer frankmayer - edited - 14 Oct 2018
avatar ggppdk
ggppdk - comment - 14 Oct 2018

I have tested this item successfully on 2952ae4

Also the default language code calculation looks good,
it splits 2 letter and 3 letters prefixes correctly


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/22599.
avatar ggppdk
ggppdk - comment - 14 Oct 2018

I have tested this item successfully on 2952ae4

Also the default language code calculation looks good,
it splits 2 letter and 3 letters prefixes correctly


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/22599.
avatar ggppdk ggppdk - test_item - 14 Oct 2018 - Tested successfully
avatar frankmayer frankmayer - change - 14 Oct 2018
Status Pending Closed
Closed_Date 0000-00-00 00:00:00 2018-10-14 13:56:42
Closed_By frankmayer
avatar frankmayer frankmayer - close - 14 Oct 2018
avatar frankmayer frankmayer - change - 14 Oct 2018
Status Closed New
Closed_Date 2018-10-14 13:56:42
Closed_By frankmayer
avatar frankmayer frankmayer - change - 14 Oct 2018
Status New Pending
avatar frankmayer frankmayer - reopen - 14 Oct 2018
avatar frankmayer
frankmayer - comment - 14 Oct 2018

@ggppdk @Quy remarked an extra blank line, so I had to commit and push again. Could you pls confirm a successful test? (Nothing changed, so just mark it)

avatar ggppdk
ggppdk - comment - 14 Oct 2018

I have tested this item successfully on be81f0e


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/22599.

avatar ggppdk ggppdk - test_item - 14 Oct 2018 - Tested successfully
avatar dagalumin
dagalumin - comment - 15 Oct 2018

I have tested this item successfully on be81f0e

This new PR works great, bringing back the ability to save large articles as on 3.8.11 version of the indexer, plus adding much better performance:
on the not-so-shiny server of a client with Joomla 3.8.11 version of the indexer we had a limit of about 25k words per article, this PR extended it to 40k with the same 500mb memory limit.
Thank you so much guys for the great work!


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/22599.

avatar dagalumin dagalumin - test_item - 15 Oct 2018 - Tested successfully
avatar Quy Quy - change - 15 Oct 2018
Status Pending Ready to Commit
avatar Quy
Quy - comment - 15 Oct 2018

RTC


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/22599.

avatar mbabker mbabker - change - 16 Oct 2018
Status Ready to Commit Fixed in Code Base
Closed_Date 0000-00-00 00:00:00 2018-10-16 00:04:58
Closed_By mbabker
Labels Added: ?
avatar mbabker mbabker - close - 16 Oct 2018
avatar mbabker mbabker - merge - 16 Oct 2018

Add a Comment

Login with GitHub to post a comment