User tests: Successful: Unsuccessful:
Pull Request for Issue #22098.
This PR re-adds and optimizes chunking (for the VALUES clause) in the Smart Search indexer, allowing larger articles to be saved correctly and their terms to be added to the index again.
It also optimizes a condition check in the helper.
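For illustration only, a minimal sketch of the chunking idea, assuming Joomla's JDatabaseDriver API; the batch size, the `$rows` variable, and the column list are assumptions for this sketch, not the PR's actual code:

```php
// A minimal sketch of chunking the VALUES clause. $rows is a hypothetical
// array of already-quoted "value, value, ..." strings; the column list is
// illustrative, not necessarily the exact #__finder_tokens schema.
$db = JFactory::getDbo();

// Insert in batches so one huge multi-row INSERT cannot exhaust the
// PHP memory limit while the query string is being built.
foreach (array_chunk($rows, 1000) as $chunk)
{
	$query = $db->getQuery(true)
		->insert($db->quoteName('#__finder_tokens'))
		->columns($db->quoteName(array('term', 'stem', 'common', 'phrase', 'weight', 'context', 'language')))
		->values($chunk);

	$db->setQuery($query);
	$db->execute();
}
```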
With a standard configuration of a PHP memory limit of 128MB and the default MySQL heap limits for MEMORY tables, these instructions should provide solid results.
There are three scenarios. Each one involves saving a predefined article and getting an expected result.
The first scenario uses a text of ~32000 words. This one should be saved successfully.
In the second scenario, we use a text of ~45000 words. This one will fail to save because it exceeds the #__finder_tokens table limits.
Finally, in the third scenario we use a text of ~65000 words. This should cause a PHP fatal error stating that the memory has been exhausted.
Article with near maximum possible text
a) On the "Smart Search" tab, click "Clear Index"
b) Copy and paste the complete text (select all) from this gist (213.4 KiB of unformatted text) into the article in the other tab
c) Give it a title and "save" the article
d) Expected result: Message "Article saved."
e) On the "Smart Search" tab, click "Statistics" and check that it says "The indexed content on this site includes 11,995 terms across 1 links..."
Article which exceeds the maximum entries for the #__finder_tokens table
a) On the Smart Search tab, click "Clear Index"
b) Copy and paste the complete text (select all) from this gist (298.8 KiB of unformatted text) into the article in the other tab
c) Save the article
d) Expected result: Error message "Save failed with the following error: The table '#__finder_tokens' is full"
e) On the "Smart Search" tab, click "Statistics" and check that it says "The indexed content on this site includes 0 terms across 1 links..."
Remarks
If you get a message that the article was saved instead of the error, you probably have higher heap settings for the MEMORY tables in MySQL. This result would still count as expected.
If you get a PHP error showing that the memory is exhausted, you are probably running PHP with a limit of less than 128MB. If that happens, please copy/paste the error message into a comment on this issue.
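To check which limit you are hitting, both settings can be inspected quickly; a short sketch using the standard PHP/MySQL setting names:

```php
// Check the PHP memory limit this test plan assumes (128M).
echo ini_get('memory_limit'), PHP_EOL;

// The MySQL heap limit for MEMORY tables can be checked from any
// SQL client with:  SHOW VARIABLES LIKE 'max_heap_table_size';
```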
Article which results in a fatal PHP error due to memory exhaustion
a) On the Smart Search tab, click "Clear Index"
b) Copy and paste the complete text (select all) from this gist into the article in the other tab
c) Save the article
d) Expected result: a PHP error showing that the memory is exhausted
e) On the "Smart Search" tab, click "Statistics" and check that it says "The indexed content on this site includes 0 terms across 1 links..."
Remarks
If you get a message that the article was saved instead of the error, you probably have higher heap settings for the MEMORY tables in MySQL. This result would still count as expected.
If you get the error "The table '#__finder_tokens' is full", you might have a PHP memory limit higher than 128MB.
Status | New | ⇒ | Pending |
Category | ⇒ | Administration com_finder |
About the test article text (32K-word sample) given here:
With the old and the new code I am getting:
Save failed with the following error: The table '#__finder_tokens' is full
I had to increase max_heap_table_size = 64M to max_heap_table_size = 128M (or more) and restart MySQL to be able to save the article.
Aaah, my last comment was wrong. The article will save with max_heap_table_size = 64M. I thought I was testing with the 32K-word sample, but I had pasted a bigger sample into the editor!
Old code before PR #12253
- 68.9 MB / 10.0 MB / 11.07 seconds
Current code
- 106.5 MB / 44.0 MB / 11.28 seconds
This PR
- 68.9 MB / 10.0 MB / 11.06 seconds
Thanks for testing this, George. Though I don't know what caused the memory consumption to rise (I suspect it was the lack of chunking, though I haven't tested that). I ran some tests back then, but cannot seem to find any notes on them. I should have kept them.
Labels | Added: ? |
@ggppdk I updated this PR according to @infograf768's remark above. I used strstr(), though, for readability and performance.
Side note: strstr() mostly performs better than substr() for such cases, so this functional addition had no negative impact on performance. Of course, since the result is cached anyway, it is only called once (any impact would not be measurable), but I used this variant for the sake of cleaner code.
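For illustration, a minimal sketch of the two variants for splitting the language-code prefix discussed here; the tag value and variable names are hypothetical, not the PR's actual code:

```php
// Extract the 2- or 3-letter prefix from a language tag such as
// 'en-GB' or 'nld-NL' (hypothetical example values).
$lang = 'nld-NL';

// substr() variant: locate the dash first, then cut the prefix out.
$pos    = strpos($lang, '-');
$prefix = ($pos !== false) ? substr($lang, 0, $pos) : $lang;

// strstr() variant: a single call returns everything before the dash;
// the ?: fallback covers tags without a region part.
$prefix = strstr($lang, '-', true) ?: $lang;
```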
I have tested this item
Also, the default language code calculation looks good; it splits 2-letter and 3-letter prefixes correctly.
Status | Pending | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2018-10-14 13:56:42 |
Closed_By | ⇒ | frankmayer |
Status | Closed | ⇒ | New |
Closed_Date | 2018-10-14 13:56:42 | ⇒ | |
Closed_By | frankmayer | ⇒ |
Status | New | ⇒ | Pending |
I have tested this item
I have tested this item
This new PR works great, bringing back the ability to save large articles as in the 3.8.11 version of the indexer, plus adding much better performance:
on a client's not-so-shiny server with the Joomla 3.8.11 version of the indexer we had a limit of about 25k words per article; this PR extended it to 40k with the same 500MB memory limit.
Thank you so much, guys, for the great work!
Status | Pending | ⇒ | Ready to Commit |
RTC
Status | Ready to Commit | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2018-10-16 00:04:58 |
Closed_By | ⇒ | mbabker |
Labels | Added: ? |
I have tested this item ✅ successfully on 8d9a9af
This PR addresses 3 things introduced recently.
Measured with PHP 7.2 + my old workstation (averages of repetitive runs):

| | memory_get_peak_usage(false) | memory_get_peak_usage(true) | microtime(true) - $_SERVER["REQUEST_TIME_FLOAT"] |
| --- | --- | --- | --- |
| Old code before PR #12253 | 68.9 MB | 10.0 MB | 11.07 seconds |
| Current code | 106.5 MB | 44.0 MB | 11.28 seconds |
| This PR | 68.9 MB | 10.0 MB | 11.06 seconds |
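As a hedged illustration of how figures like these can be collected (the commenter's exact harness is not shown in the thread), using the three standard PHP calls named in the table headers:

```php
// Report peak memory and wall-clock time at the end of a request.
$elapsed = microtime(true) - $_SERVER['REQUEST_TIME_FLOAT'];

printf(
	"%.1f MB / %.1f MB / %.2f seconds\n",
	memory_get_peak_usage(false) / 1048576, // peak memory used by the script
	memory_get_peak_usage(true) / 1048576,  // peak memory allocated from the system
	$elapsed
);
```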
This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/22599.