? Pending

User tests: Successful: Unsuccessful:

avatar Hackwar
Hackwar
12 May 2021

Summary of Changes

In real life tests with rather large data sets and complex search queries, it turned out that there are some performance improvements possible in Smart Search. This PR improves this by changes to the database search query.

The database query results of Smart Search consist of 2 parts: The row ID belonging to the result to display and a serialized object of what to display as a result. This object can contain a lot of data, like the whole text of an article, prices, publishing dates, tags, categories and potentially lots of other data.

Right now, when retrieving the results, the binary blob column object is always "dragged along" during the calculations, which can result in long running queries. Basically, the database will first load all rows including that column and then filter all of it down to what we actually want. This change does not load the object column initially, since we don't need it to calculate the result set, but attaches it later on with a merge. The initial calculation of the result set is done in the $query variable, which is then used as a subquery in the $wrappingQuery. In this wrapping query, the links table is joined to the results again and instead of the content of the object column of all rows, we are now only loading it for those rows which we are actually going to return. In order to really only load those rows which we currently need, the subquery is using a LIMIT in the query, which requires additional code to properly calculate the total number of results.

The access check is changed to not use bound parameters, since the DatabaseQuery code here fails otherwise. For the total count of results, the query with the wrapping query is left in place, even though this could be simplified to simply returning the $query variable. However I feel it is easier to debug when in the frontend you still have mostly the same query. The retrieve() and store() methods were removed because we actually aren't using them anymore. These are from a time long gone.

Testing Instructions

  1. Setup a site with a large index. I've tested this on a data set from a customer with around 8000 articles, etc.
  2. Test the search in the frontend with a complex search query. A complex search query would not just be a search word, but further filtering by category, author or similar.

Actual result BEFORE applying this Pull Request

Retrieving the search results takes some time. In my specific tests it took more than 7 seconds to just run the query on the database.

Expected result AFTER applying this Pull Request

Retrieving the results is a lot faster. In my specific tests it took slightly more than 2 seconds to run the query on the database.

Explanation to counter criticism

Please be aware that the above mentioned times are from my personal development setup and don't represent the performance of Smart Search in real life. My setup is slow and intentionally not optimized. The search query I used is intentionally complex, specifically to measure the effect of changes like this one. On a live system the times are much more reasonable. I'm noting them here to give a feeling of the performance improvements we can expect.

avatar Hackwar Hackwar - open - 12 May 2021
avatar Hackwar Hackwar - change - 12 May 2021
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 12 May 2021
Category Front End com_finder
avatar PhilETaylor
PhilETaylor - comment - 13 May 2021

If you are interested, you might want to look at these performance and security issues too :) https://github.com/joomla/joomla-cms/issues?q=is%3Aopen+is%3Aissue+author%3APhilETaylor+finder+

avatar Hackwar Hackwar - change - 10 Aug 2021
Title
[4.0] Performance optimizations for Smart Search search query
[4.1] Performance optimizations for Smart Search search query
avatar Hackwar Hackwar - edited - 10 Aug 2021
avatar Hackwar Hackwar - change - 19 Jan 2022
Status Pending Closed
Closed_Date 0000-00-00 00:00:00 2022-01-19 09:41:55
Closed_By Hackwar
Labels Added: ?
avatar Hackwar Hackwar - close - 19 Jan 2022

Add a Comment

Login with GitHub to post a comment