PR-5.2-dev Pending

User tests: Successful: Unsuccessful:

avatar Hackwar
Hackwar
9 Sep 2024

Summary of Changes

Smart Search in certain conditions fails the indexing process with a violation of the unique index on #__finder_terms. This is because of these wrong and overly complex queries. I actually can't explain why this is the case right now... I will investigate this further.

Testing Instructions

I also don't have instructions on how to reliably recreate this right now.

Actual result BEFORE applying this Pull Request

Expected result AFTER applying this Pull Request

Link to documentations

Please select:

  • Documentation link for docs.joomla.org:

  • No documentation changes for docs.joomla.org needed

  • Pull Request link for manual.joomla.org:

  • No documentation changes for manual.joomla.org needed

avatar Hackwar Hackwar - open - 9 Sep 2024
avatar Hackwar Hackwar - change - 9 Sep 2024
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 9 Sep 2024
Category Administration com_finder
avatar richard67
richard67 - comment - 9 Sep 2024

I think this change is wrong and will cause an SQL error at least on PostgreSQL (which applies SQL syntax rules more strictly than MySQL or MariaDB, but the rule applies to standard SQL in general):

In a group by statement, columns which are not in the group by columns shall not appear in the select list of columns except if they are used with an aggregate function.

Here you remove columns from the group by but keep them without aggregate function in the select list. This violates the rule.

@alikon may confirm.

avatar richard67
richard67 - comment - 9 Sep 2024

P.S.: Here a link to an explanation of that rule for Oracle SQL, but this applies to ALL SQL databases following SQL standards, only that MariaDB and MySQL might be less strict with raising an error when that rule is violated due to compatibility with older versions which did not follow SQL standards: https://stackoverflow.com/questions/20074562/group-by-without-aggregate-function

avatar richard67
richard67 - comment - 9 Sep 2024

P.P.S. I know the Joomla core violates that rule at many places, but that doesn’t mean that it’s right and should be done more often.

avatar richard67
richard67 - comment - 9 Sep 2024

Proof of my previous comments is screenshot from system tests on PostgreSQL:

avatar richard67 richard67 - test_item - 9 Sep 2024 - Tested unsuccessfully
avatar richard67
richard67 - comment - 9 Sep 2024

I have tested this item 🔴 unsuccessfully on 26a46fe

Causes SQL error at least on PostgreSQL and possibly also on new 8.x MySQL versions, see my previous comments.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/44051.

avatar Hackwar Hackwar - change - 10 Sep 2024
Labels Added: PR-5.2-dev
avatar Hackwar Hackwar - change - 10 Sep 2024
Title
[5.2] Smart Search: Use sane GROUP BY when filling terms
[5.2] Smart Search: Refactor Queries when filling terms
avatar Hackwar Hackwar - edited - 10 Sep 2024
avatar Hackwar Hackwar - change - 10 Sep 2024
The description was changed
avatar Hackwar Hackwar - edited - 10 Sep 2024
avatar richard67
richard67 - comment - 10 Sep 2024

PostgreSQL system tests still failing. Not the usual random thing so it’s related to this PR.

avatar softarius
softarius - comment - 30 Oct 2024

If into potencial danger INSERT query (by primary key) to add predicate ON DUPLICATE for MySQL or ON CONFLICT for PostgreSQL it will be safe.

Violation just can not rise.

What about this code?

Joomla\Component\Finder\Administrator\Indexer index method

        $ondubl='';
        if ($serverType=='mysql') {$ondubl=' ON DUPLICATE KEY UPDATE links=links';}
        if ($serverType=='postgresql') {$ondubl=' ON CONFLICT (term_id) DO NOTHING';}

        $db->setQuery(
            'INSERT INTO ' . $db->quoteName('#__finder_terms') .
            ' (' . $db->quoteName('term') .
            ', ' . $db->quoteName('stem') .
            ', ' . $db->quoteName('common') .
            ', ' . $db->quoteName('phrase') .
            ', ' . $db->quoteName('weight') .
            ', ' . $db->quoteName('soundex') .
            ', ' . $db->quoteName('language') . ')' .
            ' SELECT ta.term, ta.stem, ta.common, ta.phrase, ta.term_weight, SOUNDEX(ta.term), ta.language' .
            ' FROM ' . $db->quoteName('#__finder_tokens_aggregate') . ' AS ta' .
            ' WHERE ta.term_id = 0' .
            ' GROUP BY ta.term, ta.stem, ta.common, ta.phrase, ta.term_weight, SOUNDEX(ta.term), ta.language' 
            . $ondubl 

        );

I check it in MySQL 8.0.39 and PostgreSQL 16.4.
It works.

If conflict violate on unique index (term, language) then modify on ON CONFLICT (term, language) for PostgreSQL
and insert ignore for MySQL
Like this

       $ondubl='';
        $ignore='';
        if ($serverType=='mysql') { $ignore=' IGNORE '; }
        if ($serverType=='postgresql') {$ondubl=' ON CONFLICT (term, language) DO NOTHING';}

        $db->setQuery(
            "INSERT $ignore INTO " . $db->quoteName('#__finder_terms') .
            ' (' . $db->quoteName('term') .
            ', ' . $db->quoteName('stem') .
            ', ' . $db->quoteName('common') .
            ', ' . $db->quoteName('phrase') .
            ', ' . $db->quoteName('weight') .
            ', ' . $db->quoteName('soundex') .
            ', ' . $db->quoteName('language') . ')' .
            ' SELECT ta.term, ta.stem, ta.common, ta.phrase, ta.term_weight, SOUNDEX(ta.term), ta.language' .
            ' FROM ' . $db->quoteName('#__finder_tokens_aggregate') . ' AS ta' .
            ' WHERE ta.term_id = 0' .
            ' GROUP BY ta.term, ta.stem, ta.common, ta.phrase, ta.term_weight, SOUNDEX(ta.term), ta.language' 
            . $ondubl 
        );

avatar alikon
alikon - comment - 31 Oct 2024

as it is now the pr for the postgresql it's working

avatar alikon
alikon - comment - 31 Oct 2024

i would like to avoid ON DUPLICATE or ON CONFLICT

avatar softarius
softarius - comment - 31 Oct 2024

i would like to avoid ON DUPLICATE or ON CONFLICT

On PG it's possible. ON CONFLICT works on unique index. MySQL ON DUBLICATE works only for primary key, not unique index . On MySQL may use REPLACE INTO.

What about it?

        $insert='INSERT ';
        $ondubl='';
        if ($serverType=='mysql') { $insert=' REPLACE'; }
        if ($serverType=='postgresql') {$ondubl=' ON CONFLICT (term, language) DO NOTHING';}

        $db->setQuery(
            "$insert INTO " . $db->quoteName('#__finder_terms') .
            ' (' . $db->quoteName('term') .
            ', ' . $db->quoteName('stem') .
            ', ' . $db->quoteName('common') .
            ', ' . $db->quoteName('phrase') .
            ', ' . $db->quoteName('weight') .
            ', ' . $db->quoteName('soundex') .
            ', ' . $db->quoteName('language') . ')' .
            ' SELECT ta.term, ta.stem, ta.common, ta.phrase, ta.term_weight, SOUNDEX(ta.term), ta.language' .
            ' FROM ' . $db->quoteName('#__finder_tokens_aggregate') . ' AS ta' .
            ' WHERE ta.term_id = 0' .
            ' GROUP BY ta.term, ta.stem, ta.common, ta.phrase, ta.term_weight, SOUNDEX(ta.term), ta.language' 
            . $ondubl 
        );

It works for me on PG and MySQL

Add a Comment

Login with GitHub to post a comment