Smart Search currently adds a join for every taxonomy group and for every required search term. This is the naive way of making sure that each required taxonomy and each required search term is included. Unfortunately, that means that for somewhat large indexes (in my specific case ~6000 indexed documents with half a million different terms and 2.2 million mappings between terms and documents) the load on the database system gets very high. If someone copies a sentence out of a text and searches for it, with maybe 15 words, that makes 15 joins over those 2.2 million mappings, so more than 30 million rows to process. (Again, that is the naive view; it works slightly differently internally.) Long story short, you can DoS a large website by simply making a few dozen calls to the search with a long search query, and the MySQL server will come to a total gridlock and die. This normally requires an admin to go in and restart MySQL to clear the table lock.
This change should fix that issue and improve performance greatly. Instead of doing 15 joins for a 15-word search query, it does 1 join that covers all search terms, and the HAVING clause checks that every search term has at least one match in the result. So now we do a single index lookup on the mapping table, and the HAVING clause only works on the (hopefully) small result set.
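To illustrate the difference, here is a minimal sketch in Python with an in-memory SQLite database. It is not the actual Smart Search code or schema: the table names `links` and `links_terms`, the helper functions, and the sample data are all hypothetical, and `HAVING COUNT(DISTINCT ...)` is just one way to express the "every term has at least one match" check, not necessarily the form the patch uses.

```python
import sqlite3


def make_demo_db():
    # Hypothetical, heavily simplified schema: documents plus a term-to-document mapping.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE links (link_id INTEGER PRIMARY KEY);
        CREATE TABLE links_terms (link_id INTEGER, term_id INTEGER);
        CREATE INDEX idx_links_terms ON links_terms (term_id, link_id);
    """)
    conn.executemany("INSERT INTO links VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO links_terms VALUES (?, ?)",
        [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (3, 12)],
    )
    return conn


def build_naive_query(term_ids):
    # Old approach: one join on the mapping table per required term.
    joins = [
        f"JOIN links_terms AS m{i} ON m{i}.link_id = l.link_id AND m{i}.term_id = ?"
        for i in range(len(term_ids))
    ]
    sql = (
        "SELECT DISTINCT l.link_id FROM links AS l "
        + " ".join(joins)
        + " ORDER BY l.link_id"
    )
    return sql, list(term_ids)


def build_single_join_query(term_ids):
    # New approach: a single join over all terms, then HAVING verifies
    # that every required term matched at least once per document.
    placeholders = ", ".join("?" for _ in term_ids)
    sql = (
        "SELECT l.link_id FROM links AS l "
        "JOIN links_terms AS m ON m.link_id = l.link_id "
        f"WHERE m.term_id IN ({placeholders}) "
        "GROUP BY l.link_id "
        "HAVING COUNT(DISTINCT m.term_id) = ? "
        "ORDER BY l.link_id"
    )
    return sql, list(term_ids) + [len(term_ids)]


def search(conn, build_query, term_ids):
    # De-duplicate the terms so the HAVING count matches the number of required terms.
    sql, params = build_query(sorted(set(term_ids)))
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = make_demo_db()
    for terms in ([10, 11], [10, 11, 12]):
        print(terms, search(conn, build_naive_query, terms),
              search(conn, build_single_join_query, terms))
```

Both builders return the same documents for the same terms; the difference is that the second one touches the mapping table once, no matter how many words are in the query.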
How to test?
Index a large site with at least 1,000 articles.
Go to a random article and copy a longer sentence from it.
Search for this sentence in Smart Search in the frontend.
See that it takes a very long time and maybe your MySQL server even crashes. Hooray!
Apply the patch, run the search again.
See that you get a result in a reasonable timeframe.
I'm not seeing any tests here - and I think the best way to get tests is to merge it in plenty of time for stable and see what happens here. More than happy to revert if any issues get found testing it in the main trunk