
Hackwar
9 May 2021

Summary of Changes

This PR improves the indexing performance of Smart Search in Joomla 3 by inserting terms into the database in smaller chunks and by using a slightly different recipe for handling the insertion in general. In my small tests here, it reduced the time to index the content of the testing data from around 16s to 13s.
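
To illustrate the general idea of inserting in smaller chunks, here is a minimal Python sketch. It is not the code of this PR (which is PHP inside com_finder); the `tokens` table, its columns, and the chunk size of 100 are assumptions made up for the example, and SQLite stands in for MySQL so the sketch runs anywhere.

```python
import sqlite3
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_terms(conn, terms, chunk_size=100):
    """Insert (term, weight) rows chunk by chunk; return the number of chunks sent."""
    batches = 0
    for chunk in chunked(terms, chunk_size):
        # One statement execution per chunk instead of one huge insert.
        conn.executemany("INSERT INTO tokens (term, weight) VALUES (?, ?)", chunk)
        batches += 1
    conn.commit()
    return batches


# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (term TEXT, weight REAL)")
terms = [(f"term{i}", 1.0) for i in range(250)]
batches = insert_terms(conn, terms, chunk_size=100)
row_count = conn.execute("SELECT COUNT(*) FROM tokens").fetchone()[0]
print(batches, row_count)
```

The best chunk size depends on the database and its settings, so it would have to be found by measurement.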

Testing Instructions

Use a website with content and Smart Search set up. The more content, the better. In the backend, go to the Smart Search component and clear the index. On the command line, run php cli/finder_indexer.php to start the indexing process and note the time it takes. Repeat these steps several times to get a reliable time for the indexing process.
Now apply the patch and repeat these steps several times.
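
The repeated timing runs can be scripted. The following Python sketch is only a convenience helper and not part of Joomla: `time_command` and `summarize` are hypothetical names, and the sketch does not clear the index between runs, which still has to be done in the backend as described above. A stand-in command is used so that the sketch runs anywhere; on a real site it would be replaced by the indexer call.

```python
import statistics
import subprocess
import sys
import time


def time_command(command, runs=3):
    """Run `command` `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, population standard deviation) of the durations."""
    return statistics.mean(durations), statistics.pstdev(durations)


# Stand-in command; on a Joomla 3 site this would be
# ["php", "cli/finder_indexer.php"], run from the site root.
command = [sys.executable, "-c", "pass"]
durations = time_command(command, runs=3)
average, spread = summarize(durations)
print(f"mean {average:.3f}s, spread {spread:.3f}s over {len(durations)} runs")
```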

Actual result BEFORE applying this Pull Request

Content is indexed, but it takes a long time.

Expected result AFTER applying this Pull Request

Content is still indexed, but it is about 20% quicker.

Hackwar - open - 9 May 2021
Hackwar - change - 9 May 2021
Status changed from New to Pending
joomla-cms-bot - change - 9 May 2021
Category: Administration, com_finder
richard67 - comment - 10 May 2021

@rigin Could you test this PR here? Thanks in advance.

rigin - comment - 10 May 2021

I will deploy another local copy of the site and experiment on it for some time.
After that, I will publish the report here.

richard67 - comment - 10 May 2021

> I will deploy another local copy of the site and experiment on it for some time.
> After that, I will publish the report here.

@rigin Thanks in advance. I don't expect this to solve everything you have mentioned, but there should be a performance gain as described in the description above.

jsubri - test_item - 13 May 2021 - Tested successfully
jsubri - comment - 13 May 2021

I have tested this item successfully on fa47381

Before:
Total Processing Time: 65.279 seconds
Peak memory usage: 33,554,432 bytes

After:
Total Processing Time: 47.213 seconds
Peak memory usage: 33,554,432 bytes

Batches 1 & 2 are slightly slower, then batches 3 - 8 are up to 40% faster.
Thank you


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/33720.

jsubri - comment - 13 May 2021

After a second run, batches 1 & 2 are not slower; all good :-)

rigin - comment - 13 May 2021

I am currently making a series of 3 measurements.
Tomorrow or the day after tomorrow it will be ready.

alikon - test_item - 15 May 2021 - Tested successfully
alikon - comment - 15 May 2021

I have tested this item successfully on fa47381



alikon - comment - 15 May 2021

I'll wait for the test results from @rigin before setting this to RTC.

rigin - comment - 17 May 2021

Good afternoon.

Measurement results.

The original version.


Total Processing Time: 48880.113 seconds.
Peak memory usage: 25,165,824 bytes

Total Processing Time: 48767.392 seconds.
Peak memory usage: 25,165,824 bytes

Total Processing Time: 45462.329 seconds.
Peak memory usage: 23,068,672 bytes

Total Processing Time: 44495.304 seconds.
Peak memory usage: 23,068,672 bytes


After the changes.


Total Processing Time: 38031.2 seconds.
Peak memory usage: 23,068,672 bytes

Total Processing Time: 37066.64 seconds.
Peak memory usage: 23,068,672 bytes

Total Processing Time: 37427.362 seconds.
Peak memory usage: 23,068,672 bytes
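
As a quick cross-check of the figures above, the following Python sketch averages the runs and computes the relative reduction in processing time. The numbers are copied from the measurements above; the function name `reduction_percent` is made up for the example. The result comes out at roughly 20%.

```python
from statistics import mean

# Total processing times in seconds, copied from the measurements above.
before = [48880.113, 48767.392, 45462.329, 44495.304]
after = [38031.2, 37066.64, 37427.362]


def reduction_percent(before_runs, after_runs):
    """Return the relative reduction of the mean run time, in percent."""
    baseline = mean(before_runs)
    return (baseline - mean(after_runs)) / baseline * 100


reduction = reduction_percent(before, after)
print(f"mean before: {mean(before):.1f}s, mean after: {mean(after):.1f}s")
print(f"reduction: {reduction:.1f}%")
```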


Sorry for the delay. During the measurements, my MariaDB settings were coincidentally reset to their defaults, so I had to repeat the tests. The results did not change much, though.

I also checked my database with the mysqltuner utility. It issues the following recommendation:

join_buffer_size (> 100.0M, or always use indexes with joins)

Hackwar - comment - 17 May 2021

Thank you for your tests. I'd say that supports this PR. @alikon can you set this to RTC then?

rigin - comment - 17 May 2021

I have a question.

Is it possible to modify the CLI script so that it performs indexing when the content plugin is disabled?

That way, the user could save content items without indexing them and run the indexing process separately using the CLI script.

Also, is it possible to change the algorithm that selects content for indexing so that it skips items whose modification time is older than the time the index was created? Then the CLI script would index only the changed items.

richard67 - change - 17 May 2021
Status changed from Pending to Ready to Commit
richard67 - comment - 17 May 2021

RTC



Hackwar - comment - 17 May 2021

I've not tested this in Joomla 3, but in Joomla 4 you don't need to have the content plugin enabled for the CLI script to work.

The indexer will not reindex content items which have not changed since last indexing. This is done through calculating the content to index and then comparing an MD5 hash of the current object and the one previously indexed. Processing the items to the point where the indexer can index them will take some time, but it should be at least 10 times faster than actually indexing it completely. Unfortunately the last-changed timestamp is not reliable and is not present in every component. You can do optimisations like this in custom finder plugins for your specific site if you want.
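
The hash comparison described above can be sketched roughly as follows. This is a simplified Python illustration, not the actual com_finder code: the JSON serialization, the function names, and the in-memory dictionaries standing in for the index tables are all assumptions made for the example.

```python
import hashlib
import json


def content_hash(item):
    """Return an MD5 hex digest of a canonical serialization of the item."""
    canonical = json.dumps(item, sort_keys=True).encode("utf-8")
    return hashlib.md5(canonical).hexdigest()


def items_to_reindex(items, stored_hashes):
    """Return the ids of items whose current hash differs from the stored one.

    Items without a stored hash count as changed, so new content is indexed too.
    """
    return [
        item_id
        for item_id, item in items.items()
        if stored_hashes.get(item_id) != content_hash(item)
    ]


# Hypothetical content items, for illustration only.
items = {
    1: {"title": "Hello", "body": "First article"},
    2: {"title": "News", "body": "Second article"},
}
stored = {item_id: content_hash(item) for item_id, item in items.items()}

items[2]["body"] = "Second article, edited"
changed = items_to_reindex(items, stored)
print(changed)
```

Note that every item still has to be prepared and hashed before the comparison, which is the processing cost mentioned above.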

Fedik - comment - 17 May 2021

I tested it: the script just scrolls through the content without indexing it.

You need to disable the "Content - Smart Search" plugin, which is responsible for indexing "on save".
But the "Smart Search - Content" plugin must stay enabled; it is responsible for content indexing in general.

rigin - comment - 17 May 2021

I figured out the reason for the slowdown.
I had the "Table Memory Limit" parameter set too low in the Smart Search settings.
As a result, the #__finder_tokens and #__finder_tokens_aggregate tables were converted to MyISAM.

Here I described the situation and ways to solve it (in Russian):

https://rigin.net/joomla/obshchie-voprosy/oshibka-indeksirovaniya-the-table-finder-tokens-is-full.html

HLeithner - close - 17 May 2021
HLeithner - merge - 17 May 2021
HLeithner - change - 17 May 2021
Status changed from Ready to Commit to Fixed in Code Base
Closed_Date changed from 0000-00-00 00:00:00 to 2021-05-17 12:37:09
Closed_By HLeithner
HLeithner - comment - 17 May 2021

thanks

rigin - comment - 18 May 2021

When indexing for Smart Search, an overflow error sometimes occurs in the temporary tables #__finder_tokens and #__finder_tokens_aggregate.

These tables are designed for caching temporary data and have the MEMORY type, and the cause is simply that there is not enough of this memory.

MySQL simply ran out of memory for storing data in the heap. This can easily be changed in the MySQL settings (on Linux, the default file is /etc/my.cnf). The max_heap_table_size parameter is responsible for this: it sets the maximum allowed size of a table stored in memory (of the MEMORY type). Its default value is 16 MB. Change it to, for example, max_heap_table_size=256M.

However, there is an additional, purely administrative complication: this parameter is usually the responsibility of the hosting administrator )))

But if you send them a well-worded request, it will be changed.

Using the "Table Memory Limit" parameter.

In the Smart Search settings, on the Indexing tab, there is a parameter called "Table Memory Limit". Its purpose is to prevent this error, although at the expense of a sharp decrease in indexing speed.

While processing each content item, the indexing algorithm checks the number of records in these temporary tables; if that number exceeds this parameter, the tables are converted to the MyISAM type, which prevents the error but dramatically reduces indexing speed.

According to my estimates, each record in these tables requires approximately 1.2-1.5 KB, so this parameter can be set based on a rough estimate: the value of max_heap_table_size in kilobytes divided by 1.5. If indexing works without errors, the parameter can be increased; if errors occur, it should be reduced.

In short, it should be the maximum value at which indexing errors do not occur.
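
The rule of thumb above can be written out as a small calculation. This Python sketch only restates the estimate from this comment; the figure of 1.5 KB (1536 bytes) per record is that rough estimate, not a measured constant, and the function name is made up for the example.

```python
MIB = 1024 * 1024


def estimate_table_memory_limit(max_heap_table_size_bytes, bytes_per_record=1536):
    """Estimate how many records fit into a MEMORY table of the given size.

    The default of 1536 bytes (1.5 KB) per record is the upper end of the
    rough 1.2-1.5 KB estimate from the comment above.
    """
    return max_heap_table_size_bytes // bytes_per_record


default_limit = estimate_table_memory_limit(16 * MIB)   # default max_heap_table_size
raised_limit = estimate_table_memory_limit(256 * MIB)   # max_heap_table_size=256M
print(default_limit, raised_limit)
```

The result is only a starting value for "Table Memory Limit"; as described above, it should then be raised or lowered depending on whether indexing errors occur.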
