User tests: Successful: Unsuccessful:
Running the Smart Search indexer on a medium-sized site often results in "out of memory" failures which can only be addressed by running the CLI indexer with more memory allocated, like this:
php -d memory_limit=256M finder_indexer.php
However, this option is not usually available for web-based indexing.
It turns out that the problem is largely due to the tokeniser attempting to reduce execution time by caching all previous tokenisations under 128 bytes long. This PR addresses that issue by limiting the number of cached tokenisations to 100. Note: the number could be made configurable, but I am doubtful of the benefit to be gained so that is not part of this PR.
Hat tip to @mahagr for pointing out the problem with the tokeniser.
Since this might conceivably be of use elsewhere, the fix has been implemented by adding a new storage class to the JCache library. The new JCacheStorageMemory class uses a regular PHP array to store cached items. A maxBuffers option allows you to define the maximum number of items to cache (default 100). If an attempt is made to store a new item when the cache is already full, the least recently used item will be deleted to make room.
Find or construct a site with enough content that the CLI indexer fails with an out of memory error. As a rough guide you'll probably need over 1,000 articles to trigger the error. The size of the articles doesn't matter much because tokenisations of strings more than 128 bytes long are not cached and most articles are likely to be longer than that.
Apply the PR and run the indexer again. Hopefully you won't get the out of memory error!
I am aware that unit tests need to be added for JCacheStorageMemory and I'd appreciate help in writing them.
None.
Status | New | ⇒ | Pending |
Labels |
Added:
?
|
Category | ⇒ | Administration Components Libraries |
Find or construct a site with enough content that the CLI indexer fails with an out of memory error.
So is this PR for testing/fixing the CLI or the web indexer
@mahagr I had to dig deep to find it, but here it is: https://developer.joomla.org/joomlacode-archive/issue-28669.html
Both, since they share the same code. But if you only use the web indexer you're not going to be certain that the failure was caused by an out of memory error. I don't think the web indexer provides any feedback when it fails.
ok - I just tried 1500 articles on the web index with 128m ram and it worked before the patch
@chrisdavenport Oh wow, that's like 4 years ago! Now that you did show that to me, I honestly thought it was already fixed a while ago.
I think we were testing smart search support for Kunena back then and had like 250 000 messages to play with. I think the code may still be around somewhere (maybe old revision/version), but it was never used as we couldn't get it to work with indexer.
@brianteeman Thanks for testing. I think that's on the cusp of failing. I have a site with 1491 articles (451,506 terms) where it fails. On another site I have 2509 articles (751,746 terms) where I have to restart the indexer 3 times to get it to index all the content. Both sites have 128Mb RAM. With this PR both sites are indexed without running out of memory.
Anyway I thought of another way of testing this PR. Instead of trying to get it to actually fail, just look at the amount of memory being used.
I have added a couple of PRs to output peak memory usage information:
#12679 Adds it to the CLI indexer output.
#12680 Adds it to the web indexer log.
You should see a dramatic reduction in the memory usage.
OK I will test that approach
I have tested this item
Category | Administration Components Libraries | ⇒ | Administration com_finder Libraries Components |
I have tested this item
I've tested this on Joomla! 3.7.2 with a smaller number of articles, but saw a good reduction in execution time. .716 seconds to .541 seconds.
Status | Pending | ⇒ | Ready to Commit |
Labels |
Added:
?
|
RTC. Needs to be synced with staging before the merge.
Labels |
branch updated ;)
@nikosdion because it is an addition, it looks good for me. But you offered help to have a look at library code, so here we are :-)
Status | Ready to Commit | ⇒ | Needs Review |
Labels |
set to needs review
Labels |
Removed:
?
|
Category | Administration Components Libraries com_finder | ⇒ | Administration com_admin com_associations com_banners com_cache com_categories com_config com_contact com_content com_contenthistory com_fields Components |
something isnt right @chrisdavenport with 950 changed files
@chrisdavenport you need to rebase your branch and not merge staging into your branch
Grrr. I have no idea how to fix that. Any clues?
Create a new local branch out of your repo branch. Hard reset to your last commit before the merge. Then you should be able to rebase it to staging.
Thanks @nikosdion I'll try that tonight after work.
Closing in favour of #16621 as I wasn't able to fix the git problem despite @nikosdion 's generous assistance.
Status | Needs Review | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2017-06-11 07:45:33 |
Closed_By | ⇒ | chrisdavenport |
I think you meant to thank you someone else than me on this as I wasn't the one who pointed out the issue. :)