Feature PR-5.0-dev PR-5.2-dev Pending

User tests: Successful: Unsuccessful:

avatar scout507
scout507
24 Aug 2021

Introduction

While testing the performance of the smart search using a large amount of articles (10.000+) we noticed a big increase in response time. Checking the queries with the debug tool revealed two queries which were performing quite poorly, taking up around 1.5s of the total 1.7s response time (12.000 articles). This effect scales up with an increasing amount of articles. In this PR we replaced these two queries with more efficent ones, reducing the total response time of these queries to approximately 350ms (from 1.5s).

Summary of Changes

We replaced the old query, which was responsible for searching the db given one or multiple search terms and returning the weighted results:

SELECT l.link_id, l.object,SUM(m.weight) AS ordering
FROM #__finder_links AS l
INNER JOIN `#__finder_links_terms` AS m ON m.link_id = l.link_id
WHERE `l`.`access` IN (:preparedArray1,:preparedArray2,:preparedArray3) AND l.state = 1 AND l.published = 1 AND (l.publish_start_date IS NULL OR l.publish_start_date <= '2021-08-24 12:03:00') AND (l.publish_end_date IS NULL OR l.publish_end_date >= '2021-08-24 12:03:00') AND m.term_id IN (18,103)
GROUP BY l.link_id,l.object
HAVING SUM(CASE WHEN m.term_id IN (18) THEN 1 ELSE 0 END) > 0 AND SUM(CASE WHEN m.term_id IN (103) THEN 1 ELSE 0 END) > 0
ORDER BY ordering DESC LIMIT 20

With an updated, faster query:

SELECT l.link_id, l.object,m.weight AS ordering
FROM (
	SELECT link_id, SUM(weight) as weight
	FROM #__finder_links_terms
	WHERE term_id IN (18,103)
	GROUP BY link_id
	HAVING COUNT(link_id) > 1) AS m
INNER JOIN `#__finder_links` AS l ON l.link_id = m.link_id
WHERE `l`.`access` IN (:preparedArray1,:preparedArray2,:preparedArray3) AND l.state = 1 AND l.published = 1 AND (l.publish_start_date IS NULL OR l.publish_start_date <= '2021-08-24 12:04:00') AND (l.publish_end_date IS NULL OR l.publish_end_date >= '2021-08-24 12:04:00')
ORDER BY ordering DESC LIMIT 20

The difference in performance comes from grouping the data before joining the two large tables. We also reduced the GROUP BY to only group by l.link_id, since there are no duplicate l.object for any given l.link_id because both are part of the #_finder_links table, where link_id is the primary key.

This also has the positive side effect, that the new query is more efficient to use with the _getListCount($query) of the BaseDatabaseModel-class.

These changes were implemented in the getListQuery() of the SearchModel-class. We've also refactored the method to work with the new query.

Testing Instructions

Test this on a clean version of 4.0

  1. Create a large amount of articles (overload), preferably 10k+.
  2. Activate the debug tool: System -> Global Config -> Debug System on
  3. Create a smart search Content->Site Modules->New->Smart Search , select a position
  4. Test the search with various terms ('et' should be the worst case for overload), also try the advanced filters
  5. While testing take a look at the debug tool and note the time
  6. Checkout this PR and test the search again (step 4&5)
  7. Check for difference in loading time and if the results are the same

Actual result BEFORE applying this Pull Request

The loading time is pretty slow (approximately 1.5 s for 12k articles). These were the two responsible queries before the changes:

grafik

grafik

Expected result AFTER applying this Pull Request

The loading time should be significantly lower and the search results should be the same. Also these two queries should appear:

grafik

grafik

Documentation Changes Required

Probably not.

Contributers

@socke300 @Waleet

Thanks to @Hackwar for answering our questions.

avatar scout507 scout507 - open - 24 Aug 2021
avatar scout507 scout507 - change - 24 Aug 2021
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 24 Aug 2021
Category Front End com_finder
avatar scout507 scout507 - change - 24 Aug 2021
The description was changed
avatar scout507 scout507 - edited - 24 Aug 2021
avatar PhilETaylor
PhilETaylor - comment - 24 Aug 2021

Please can you post the EXPLAIN results for each query (The before and after ones), as run, with your data set, as a screenshot.

avatar scout507
scout507 - comment - 24 Aug 2021

@PhilETaylor

Old:

grafik

New:

grafik

avatar scout507 scout507 - change - 24 Aug 2021
The description was changed
avatar scout507 scout507 - edited - 24 Aug 2021
avatar richard67
richard67 - comment - 25 Aug 2021

At a first view the change looks valid to me. I have to check more detailed when I find enough time.

For those who like to understand better: The thing in the new query in the "FROM" clause is not a subquery, it is an instant view. It would be a subquery if it would appear in the "WHERE" clause.

Instant views can be used to reduce the result set at an early point in a query and - if done well and having indexes where needed - can improve performance a lot according to my experience, in opposite to subqueries which can be bad for performance.

avatar scout507
scout507 - comment - 25 Aug 2021

@richard67

Thanks for the feedback and taking the time to provide a more detailed explanation!

avatar scout507 scout507 - change - 25 Aug 2021
Labels Added: ?
avatar PhilETaylor
PhilETaylor - comment - 25 Aug 2021

This new part

SELECT link_id, SUM(weight) as weight
	FROM #__finder_links_terms
	WHERE term_id IN (18,103)
	GROUP BY link_id
	HAVING COUNT(link_id) > 1)

is still pulling back almost 26,000 rows though...

avatar socke300
socke300 - comment - 25 Aug 2021

@PhilETaylor

This new part

SELECT link_id, SUM(weight) as weight
	FROM #__finder_links_terms
	WHERE term_id IN (18,103)
	GROUP BY link_id
	HAVING COUNT(link_id) > 1)

is still pulling back almost 26,000 rows though...

Yes, there are still 26.000 rows which have to be searched but it is still more performant because the new behaviour explained here.

avatar Quy
Quy - comment - 6 Feb 2022
avatar HLeithner
HLeithner - comment - 27 Jun 2022

This pull request has automatically rebased to 4.2-dev.

avatar HLeithner
HLeithner - comment - 2 May 2023

This pull request has been automatically rebased to 4.3-dev.

avatar obuisard obuisard - change - 9 Jun 2023
Title
[4.0] Smart search performance improvement
[5.0] Smart search performance improvement
avatar obuisard obuisard - edited - 9 Jun 2023
avatar HLeithner
HLeithner - comment - 24 Apr 2024

This pull request has been automatically rebased to 5.2-dev.

avatar HLeithner HLeithner - change - 24 Apr 2024
Title
[5.0] Smart search performance improvement
[5.2] Smart search performance improvement
avatar HLeithner HLeithner - edited - 24 Apr 2024
avatar Hackwar Hackwar - change - 20 Jul 2024
Status Pending Closed
Closed_Date 0000-00-00 00:00:00 2024-07-20 21:55:58
Closed_By Hackwar
Labels Added: Feature PR-5.0-dev PR-5.2-dev
Removed: ?
avatar Hackwar
Hackwar - comment - 20 Jul 2024

I'm really sorry for keeping you waiting for so long. I wanted to look at this several times in the last few years, but I feared that I wouldn't come with good news and thus procrastinated reviewing this PR. Unfortunately this wasn't entirely unfounded, since this PR unfortunately does not work. I tried a search in our testing sampledata with two taxonomies and the result was, that I only got one result instead of the 12 I expected. I also didn't see a way to fix that and thus unfortunately have to refuse this PR. Thank you for your work and I hope maybe you still want to contribute in the future.

avatar Hackwar Hackwar - close - 20 Jul 2024

Add a Comment

Login with GitHub to post a comment