This PR improves the indexing performance of Smart Search in Joomla 3 by inserting terms into the database in smaller chunks and generally using a slightly different recipe for handling the insertion of terms. In my small tests here, it reduced the time to index the testing data from around 16s to 13s.
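The core idea is to split the token inserts into fixed-size batches rather than building one huge multi-row INSERT. A minimal sketch of that approach, assuming the Joomla 3 database API; the variable names and the chunk size of 128 are illustrative, not the PR's actual values:

```php
<?php
// Sketch only: batch token inserts instead of one giant statement.
$db = JFactory::getDbo();

foreach (array_chunk($terms, 128) as $chunk)
{
    $query = $db->getQuery(true)
        ->insert($db->quoteName('#__finder_tokens'))
        ->columns($db->quoteName(array('term', 'stem', 'common', 'phrase', 'weight', 'context', 'language')));

    foreach ($chunk as $term)
    {
        $query->values(implode(',', array(
            $db->quote($term->term),
            $db->quote($term->stem),
            (int) $term->common,
            (int) $term->phrase,
            (float) $term->weight,
            (int) $term->context,
            $db->quote($term->language),
        )));
    }

    // One INSERT per chunk keeps each statement small and lets the
    // server commit work incrementally instead of in one big burst.
    $db->setQuery($query)->execute();
}
```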
Use a website with content and Smart Search set up; the more content, the better. In the backend, go to the Smart Search component and clear the index. Then go to the command line and call php cli/finder_indexer.php to run the indexing process. Note the time it takes, and run these steps several times to get a reliable time for the indexing process.
Now apply the patch and run the same steps several times again.
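For example, each run can be timed with the standard Unix time utility from the site root:

```
time php cli/finder_indexer.php
```

Averaging several runs smooths out caching effects.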
Result before the patch: content is indexed, but it takes a long time.
Result after the patch: content is still indexed, but about 20% quicker.
Status | New | ⇒ | Pending |
Category | ⇒ | Administration com_finder |
I will deploy another local copy of the site and experiment on it for some time.
After that, I will publish the report here.
@rigin Thanks in advance. I don't expect everything you mentioned to be solved, but there should be a performance gain as described above in the description.
I have tested this item
Before:
Total Processing Time: 65.279 seconds
Peak memory usage: 33,554,432 bytes
After:
Total Processing Time: 47.213 seconds
Peak memory usage: 33,554,432 bytes
Batches 1 & 2 are slightly slower, then batches 3-8 are up to 40% faster.
Thank you
After a second run, batches 1 & 2 are not slower; all good :-)
I am currently running a series of three measurements.
It will be ready tomorrow or the day after tomorrow.
I have tested this item
Good afternoon.
Here are the measurement results.
Original version:
Total Processing Time: 48880.113 seconds.
Peak memory usage: 25,165,824 bytes
Total Processing Time: 48767.392 seconds.
Peak memory usage: 25,165,824 bytes
Total Processing Time: 45462.329 seconds.
Peak memory usage: 23,068,672 bytes
Total Processing Time: 44495.304 seconds.
Peak memory usage: 23,068,672 bytes
After the changes:
Total Processing Time: 38031.2 seconds.
Peak memory usage: 23,068,672 bytes
Total Processing Time: 37066.64 seconds.
Peak memory usage: 23,068,672 bytes
Total Processing Time: 37427.362 seconds.
Peak memory usage: 23,068,672 bytes
Sorry for the delay. During the measurement process, my MariaDB settings were reset to their default state (just a coincidence), so I had to repeat the tests. The results have not changed much, though.
Also, I tested my database using the mysqltuner utility.
It issues this recommendation:
join_buffer_size (> 100.0M, or always use indexes with joins)
I have a question.
Is it possible to modify the CLI script so that it performs indexing while the content plugin is disabled? That way a user could save articles without triggering indexing and run the indexing process with the CLI script instead.
Also, would it be possible to change the algorithm for selecting content to index so that it skips items whose modification time is older than the index creation time? The CLI script would then index only changed items.
Status | Pending | ⇒ | Ready to Commit |
Labels | ⇒ | Added: RTC |
I've not tested this in Joomla 3, but in Joomla 4 you don't need to have the content plugin enabled for the CLI script to work.
The indexer will not reindex content items which have not changed since the last indexing. This is done by calculating the content to index and then comparing an MD5 hash of the current object with the one stored from the previous indexing run. Processing the items to the point where the indexer can hash them still takes some time, but it should be at least 10 times faster than actually indexing them completely. Unfortunately, the last-changed timestamp is not reliable and is not present in every component. You can implement optimisations like this in custom finder plugins for your specific site if you want.
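As a rough illustration of that skip check, here is a simplified sketch assuming the Joomla 3 API; this is not the actual implementation, and the url lookup against #__finder_links is an assumption:

```php
<?php
// Sketch only: skip items whose prepared content has not changed.
$db = JFactory::getDbo();

// Hash the fully prepared item; any change in the content changes the hash.
$currentHash = md5(serialize($item));

// Hash stored from the previous indexing run.
$query = $db->getQuery(true)
    ->select($db->quoteName('md5sum'))
    ->from($db->quoteName('#__finder_links'))
    ->where($db->quoteName('url') . ' = ' . $db->quote($item->url));
$storedHash = $db->setQuery($query)->loadResult();

if ($storedHash === $currentHash)
{
    // Unchanged since the last run: skip the expensive tokenising
    // and insertion work entirely.
    return;
}
```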
I tested it; the script just scrolls through the content without indexing.
You need to disable the "Content - Smart Search" plugin; it is responsible for indexing on save.
But the "Smart Search - Content" plugin must stay enabled; it is responsible for content indexing in general.
I figured out the reason for the slowdown.
The "Table Memory Limit" parameter in the Smart Search settings was set too low.
Accordingly, the #__finder_tokens and #__finder_tokens_aggregate tables were switched to MyISAM mode.
I have described the situation and ways to solve it here (in Russian).
Status | Ready to Commit | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2021-05-17 12:37:09 |
Closed_By | ⇒ | HLeithner |
Labels | ⇒ | Added: ? |
thanks
When indexing for Smart Search, an overflow error sometimes occurs in the temporary tables #__finder_tokens and #__finder_tokens_aggregate.
These tables are designed for caching temporary data and use the MEMORY storage engine; quite simply, that memory can run out.
MySQL has just run out of memory for placing data on the heap. This can easily be changed in the MySQL settings (on Linux, the default location is /etc/my.cnf). The max_heap_table_size parameter is responsible for this: the maximum allowed size of a table stored in memory (of the MEMORY type). Its default value is 16 MB. Change it, for example, to max_heap_table_size=256M.
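For example, a minimal sketch of the change in the server configuration; the section name and restart note are standard MySQL/MariaDB conventions, not from the original text:

```
[mysqld]
# Maximum size of MEMORY-engine tables; the default is 16M.
max_heap_table_size = 256M
```

The new value takes effect after the server is restarted.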
However, there is an additional, purely administrative complication: this parameter is usually the responsibility of the hosting administrator )))
But if you write them the right letter, it will be changed.
Using the "Table Memory Limit" parameter.
In the Smart search settings, on the Indexing tab, there is a parameter called "Table memory limit". Its essence is to prevent this error, although at the expense of a sharp decrease in the speed of the algorithm.
While scanning each item, the indexing algorithm takes into account the number of records in this temporary table, and if that number exceeds this parameter, the table is converted to the MyISAM engine. This prevents the error but dramatically reduces the speed of the algorithm.
According to my estimates, each record in these tables requires approximately 1.2-1.5 KB, so this parameter can be set from a rough estimate: the value of max_heap_table_size divided by 1.5 KB. For example, with max_heap_table_size = 256 MB, that gives roughly 262,144 KB / 1.5 KB ≈ 175,000 records. If indexing works without errors, the parameter can be increased; if errors occur, reduce it.
In short, it should be the maximum value at which indexing errors do not occur.
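A simplified sketch of that safeguard, assuming the Joomla 3 database API; the variable $memoryTableLimit stands in for the component's "Table Memory Limit" option, and the surrounding structure is hypothetical:

```php
<?php
// Sketch only: fall back to an on-disk engine when the MEMORY table
// grows past the configured limit.
$db = JFactory::getDbo();

// Count the rows currently buffered in the MEMORY table.
$query = $db->getQuery(true)
    ->select('COUNT(*)')
    ->from($db->quoteName('#__finder_tokens'));
$rowCount = (int) $db->setQuery($query)->loadResult();

if ($rowCount > $memoryTableLimit)
{
    // MyISAM writes to disk, so inserts cannot overflow the heap.
    // This avoids the "table is full" error but is much slower.
    $db->setQuery('ALTER TABLE #__finder_tokens ENGINE = MYISAM')->execute();
}
```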
@rigin Could you test this PR here? Thanks in advance.