Hackwar
20 Nov 2024

Summary of Changes

Smart Search can currently fail when two terms that the database collation treats as identical are, because of a bug, handled as two separate terms. This violates the unique index on (term, language) in the terms table. This PR fixes the GROUP BY statement.

Testing Instructions

Unfortunately, this is rather difficult to reproduce. I had content containing the words messsystem and meßsystem, which triggered the problem on one server, but I can't reproduce it locally. So... code review?

Actual result BEFORE applying this Pull Request

Expected result AFTER applying this Pull Request

Link to documentations

Please select:

  • Documentation link for docs.joomla.org:

  • No documentation changes for docs.joomla.org needed

  • Pull Request link for manual.joomla.org:

  • No documentation changes for manual.joomla.org needed

Hackwar - open - 20 Nov 2024
Hackwar - change - 20 Nov 2024
Status: New -> Pending
joomla-cms-bot - change - 20 Nov 2024
Category: Administration, com_finder
Hackwar - comment - 20 Nov 2024

Some more explanation: This problem results in an exception that at least aborts mass indexing in Smart Search and potentially also causes a fatal error while saving the content in question. Although the content has already been saved at that point, a fatal error during that process is still rather bad. I've encountered this on several occasions in the past, especially in combination with Falang (probably mainly because sites with Falang have more non-English content). Unfortunately, it isn't as simple as putting those similar words into an article and saving it. I couldn't find out how to reproduce the problem or how to create a minimal example.

The main problem seems to be that the GROUP BY differentiates the terms based on (among other things) the weight and thus treats two terms as different, even though they are identical by the collation's standards.
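
The mechanism described above can be sketched outside Joomla. This is only an illustration under stated assumptions, not the query from this PR and not a reproduction of the original failure: it uses Python's sqlite3 module, hypothetical tables named tokens and terms, and a hypothetical collation named german_fold that treats "ß" and "ss" as equal. Picking MAX(weight) in the SELECT is an arbitrary choice for the sketch.

```python
import sqlite3


def german_fold(a: str, b: str) -> int:
    """Hypothetical collation: compare after case folding, so 'ß' equals 'ss'."""
    x, y = a.casefold(), b.casefold()
    return (x > y) - (x < y)


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with two tokens that collate as equal."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("german_fold", german_fold)
    conn.executescript(
        """
        CREATE TABLE tokens (
            term TEXT COLLATE german_fold,
            language TEXT,
            weight REAL
        );
        CREATE TABLE terms (
            term TEXT COLLATE german_fold,
            language TEXT,
            weight REAL,
            UNIQUE (term, language)
        );
        INSERT INTO tokens VALUES ('messsystem', 'de-DE', 1.0);
        INSERT INTO tokens VALUES ('meßsystem', 'de-DE', 2.0);
        """
    )
    return conn


def insert_grouped(group_by: str) -> int:
    """Copy tokens into terms using the given GROUP BY; return the row count."""
    conn = build_db()
    conn.execute(
        "INSERT INTO terms (term, language, weight) "
        "SELECT MIN(term), language, MAX(weight) FROM tokens "
        f"GROUP BY {group_by}"
    )
    return conn.execute("SELECT COUNT(*) FROM terms").fetchone()[0]


def violates_unique_index(group_by: str) -> bool:
    """Return True if the grouped insert trips the UNIQUE (term, language) index."""
    try:
        insert_grouped(group_by)
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    # Grouping by weight as well keeps the two spellings in separate groups.
    print(violates_unique_index("term, language, weight"))
    # Grouping only by the indexed columns merges them into one group.
    print(violates_unique_index("term, language"))
```

In this sketch the two spellings carry different weights, so including weight in the GROUP BY yields two rows that the unique index then considers duplicates. Grouping only by the columns covered by the index yields a single row.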
