? Pending

User tests: Successful: Unsuccessful:

avatar chrisdavenport
chrisdavenport
31 Oct 2016

Summary of Changes

Running the Smart Search indexer on a medium-sized site often results in "out of memory" failures which can only be addressed by running the CLI indexer with more memory allocated, like this:

php -d memory_limit=256M finder_indexer.php

However, this option is not usually available for web-based indexing.

It turns out that the problem is largely due to the tokeniser attempting to reduce execution time by caching all previous tokenisations under 128 bytes long. This PR addresses that issue by limiting the number of cached tokenisations to 100. Note: the number could be made configurable, but I am doubtful of the benefit to be gained so that is not part of this PR.

Hat tip to @mahagr for pointing out the problem with the tokeniser.

Since this might conceivably be of use elsewhere, the fix has been implemented by adding a new storage class to the JCache library. The new JCacheStorageMemory class uses a regular PHP array to store cached items. A maxBuffers option allows you to define the maximum number of items to cache (default 100). If an attempt is made to store a new item when the cache is already full, the least recently used item will be deleted to make room.

Testing Instructions

Find or construct a site with enough content that the CLI indexer fails with an out of memory error. As a rough guide you'll probably need over 1,000 articles to trigger the error. The size of the articles doesn't matter much because tokenisations of strings more than 128 bytes long are not cached and most articles are likely to be longer than that.

Apply the PR and run the indexer again. Hopefully you won't get the out of memory error!

I am aware that unit tests need to be added for JCacheStorageMemory and I'd appreciate help in writing them.

Documentation Changes Required

None.

avatar chrisdavenport chrisdavenport - open - 31 Oct 2016
avatar chrisdavenport chrisdavenport - change - 31 Oct 2016
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 31 Oct 2016
Labels Added: ?
avatar joomla-cms-bot joomla-cms-bot - change - 31 Oct 2016
Category Administration Components Libraries
avatar mahagr
mahagr - comment - 31 Oct 2016

I think you meant to thank you someone else than me on this as I wasn't the one who pointed out the issue. :)

avatar brianteeman
brianteeman - comment - 31 Oct 2016

Find or construct a site with enough content that the CLI indexer fails with an out of memory error.

So is this PR for testing/fixing the CLI or the web indexer

avatar chrisdavenport
chrisdavenport - comment - 31 Oct 2016

@mahagr I had to dig deep to find it, but here it is: https://developer.joomla.org/joomlacode-archive/issue-28669.html

avatar chrisdavenport
chrisdavenport - comment - 31 Oct 2016

Both, since they share the same code. But if you only use the web indexer you're not going to be certain that the failure was caused by an out of memory error. I don't think the web indexer provides any feedback when it fails.

avatar brianteeman
brianteeman - comment - 31 Oct 2016

ok - I just tried 1500 articles on the web index with 128m ram and it worked before the patch

avatar mahagr
mahagr - comment - 1 Nov 2016

@chrisdavenport Oh wow, that's like 4 years ago! Now that you did show that to me, I honestly thought it was already fixed a while ago.

I think we were testing smart search support for Kunena back then and had like 250 000 messages to play with. I think the code may still be around somewhere (maybe old revision/version), but it was never used as we couldn't get it to work with indexer.

avatar chrisdavenport
chrisdavenport - comment - 1 Nov 2016

@brianteeman Thanks for testing. I think that's on the cusp of failing. I have a site with 1491 articles (451,506 terms) where it fails. On another site I have 2509 articles (751,746 terms) where I have to restart the indexer 3 times to get it to index all the content. Both sites have 128Mb RAM. With this PR both sites are indexed without running out of memory.

Anyway I thought of another way of testing this PR. Instead of trying to get it to actually fail, just look at the amount of memory being used.

I have added a couple of PRs to output peak memory usage information:
#12679 Adds it to the CLI indexer output.
#12680 Adds it to the web indexer log.

You should see a dramatic reduction in the memory usage.

avatar brianteeman
brianteeman - comment - 1 Nov 2016

OK I will test that approach

avatar alikon
alikon - comment - 1 Nov 2016

i've tested in the other way via CLI
cli
first run without PR second with PR
something like 10% less memory

avatar alikon alikon - test_item - 6 Nov 2016 - Tested successfully
avatar alikon
alikon - comment - 6 Nov 2016

I have tested this item successfully on b6e14c4


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/12649.

avatar joomla-cms-bot joomla-cms-bot - change - 27 May 2017
Category Administration Components Libraries Administration com_finder Libraries Components
avatar tonypartridge
tonypartridge - comment - 29 May 2017

I have tested this item successfully on c0482b2

I've tested this on Joomla! 3.7.2 with a smaller number of articles, but saw a good reduction in execution time. .716 seconds to .541 seconds.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/12649.

avatar tonypartridge tonypartridge - test_item - 29 May 2017 - Tested successfully
avatar zero-24 zero-24 - alter_testresult - 29 May 2017 - alikon : Tested successfully
avatar zero-24 zero-24 - edited - 29 May 2017
avatar zero-24 zero-24 - change - 29 May 2017
The description was changed
Status Pending Ready to Commit
Labels Added: ?
avatar zero-24
zero-24 - comment - 29 May 2017

RTC. Needs to be synced with staging before the merge.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/12649.

avatar brianteeman brianteeman - change - 29 May 2017
Labels
avatar brianteeman
brianteeman - comment - 29 May 2017

branch updated ;)

avatar rdeutz
rdeutz - comment - 6 Jun 2017

@nikosdion because it is an addition, it looks good for me. But you offered help to have a look at library code, so here we are :-)

avatar rdeutz rdeutz - change - 6 Jun 2017
Status Ready to Commit Needs Review
Labels
avatar rdeutz
rdeutz - comment - 6 Jun 2017

set to needs review


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/12649.

avatar chrisdavenport chrisdavenport - change - 7 Jun 2017
Labels Removed: ?
avatar joomla-cms-bot joomla-cms-bot - change - 7 Jun 2017
Category Administration Components Libraries com_finder Administration com_admin com_associations com_banners com_cache com_categories com_config com_contact com_content com_contenthistory com_fields Components
avatar brianteeman
brianteeman - comment - 7 Jun 2017

something isnt right @chrisdavenport with 950 changed files

avatar rdeutz
rdeutz - comment - 7 Jun 2017

@chrisdavenport you need to rebase your branch and not merge staging into your branch

avatar chrisdavenport
chrisdavenport - comment - 8 Jun 2017

Grrr. I have no idea how to fix that. Any clues?

avatar nikosdion
nikosdion - comment - 8 Jun 2017

Create a new local branch out of your repo branch. Hard reset to your last commit before the merge. Then you should be able to rebase it to staging.

avatar chrisdavenport
chrisdavenport - comment - 8 Jun 2017

Thanks @nikosdion I'll try that tonight after work.

avatar chrisdavenport
chrisdavenport - comment - 11 Jun 2017

Closing in favour of #16621 as I wasn't able to fix the git problem despite @nikosdion 's generous assistance.

avatar chrisdavenport chrisdavenport - change - 11 Jun 2017
Status Needs Review Closed
Closed_Date 0000-00-00 00:00:00 2017-06-11 07:45:33
Closed_By chrisdavenport
avatar chrisdavenport chrisdavenport - close - 11 Jun 2017

Add a Comment

Login with GitHub to post a comment