? ? Pending

User tests: Successful: Unsuccessful:

avatar Hackwar
Hackwar
4 Jun 2020

Smart Search right now is broken for Postgres. This doesn't fix everything, but I hope with these issues fixed, solving the other issues should become easier.

Clearing where without bounded

In the frontend, we are clearing most of our query at one point in time. However with the parameterized queries, this fails since the bound parameters are still present. So we have to clear that one as well.
Unfortunately now, the search in frontend still doesn't work with Postgres, since it fails on the HAVING clause. If someone can help here, that would be good. This should be solved now.

Indexing doesn't work in Postgres

In the code for the postgres indexing, a bunch of errors have crept in. This fixes those and while doing just that, I noticed something: The code is identical except for just a few lines, so we can thus drop the whole db driver system and simplify the code a lot.

@wilsonge Can we merge this without fixing the frontend search completely? I hope that I can find someone who can fix this when they see the whole code...

avatar Hackwar Hackwar - open - 4 Jun 2020
avatar Hackwar Hackwar - change - 4 Jun 2020
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 4 Jun 2020
Category Administration com_finder Front End
avatar richard67
richard67 - comment - 4 Jun 2020

@alikon Could you have a look on the having clause mentioned in the description of this PR?

You can find it here:
https://github.com/joomla/joomla-cms/blob/4.0-dev/components/com_finder/src/Model/SearchModel.php#L175

I tried it already, and to me it seems to be wrong, but I dont know how it was intended to work.

The SUM(t.node_id IN (' . implode(',', $groups[$i]) . ')) definitely means to build the sum of a boolean expression, SUM(t.node_id IN (1,2,3)), with t.node_id IN (1,2,3) being the boolean condition. MySQL seems to automatically cast the boolean condition to integer for summing it up, PostgreSQL correctly doesn't do such a sh..t.

I don't really understand what shall be achieved with this ... maybe you can give me a help?

Even it if shall not be solved with this PR, but maybe we can solve it soon?

avatar Hackwar
Hackwar - comment - 5 Jun 2020

That is not the only place we are adding to the having clause.

avatar richard67
richard67 - comment - 5 Jun 2020

@Hackwar I am busy with real job now, so won't have time before tonight or weekend, but maybe you can test following.
I think it should be 2 having clauses and not the wrong one with summing up a boolean expression (IN clause):

$query->having('t.node_id IN (' . implode(',', $groups[$i]) . ')');
$query->having('SUM(t.node_id) > 0');

Can you test if this works on PostgreSQL and verify if it delivers the same result on MySQL as the existing stuff does?

avatar alikon
alikon - comment - 6 Jun 2020

as this pr fix the index from backend in postgresql + more clean code + etc....
i wish to have this merged asap
so after we can look at:

I don't really understand what shall be achieved with this ... maybe you can give me a help?
Even it if shall not be solved with this PR, but maybe we can solve it soon?

@richard67 i'm in your same shoes here..... better @Hackwar can explain us "what shall be achieved" with that query

to me
https://github.com/joomla/joomla-cms/blob/4.0-dev/components/com_finder/src/Model/SearchModel.php#L172-L176

// Iterate through each taxonomy group.
for ($i = 0, $c = count($groups); $i < $c; $i++)
{
	$query->having('SUM(t.node_id IN (' . implode(',', $groups[$i]) . ')) > 0');
}

smells wrong, but , again i still don't understand the goal of that query ....

avatar alikon alikon - test_item - 6 Jun 2020 - Tested successfully
avatar alikon
alikon - comment - 6 Jun 2020

I have tested this item successfully on 6a0cc68

index from backend works now on postgresql


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/29428.

avatar Hackwar Hackwar - change - 6 Jun 2020
Labels Added: ?
avatar richard67
richard67 - comment - 6 Jun 2020

@Hackwar Yes, I think the last change was it. LGTM.

avatar Hackwar
Hackwar - comment - 6 Jun 2020

The whole having clauses thing is related to making the query performant and making sure that all taxonomies and terms are included in the query result.

We have taxonomies like author, category, tags and we have the search phrase. The search phrase splits up into its terms and for each term we can have several hits in the terms table with stemmed terms, etc.
Consider a search for "Joomla awesome release" in the categories "Documentation" and "News" from the author "Hannes". Those taxonomies are all stored in the same table. The search phrase would be split up into its terms and by stemming the terms, we would end up with matches on "Joomla", "joomlaians", "awesome", "awesomeness", "release", "releaser", "released". We want only search results which contain a match for every term group (the term itself and all stemmed versions of it), which are from the author and which are from at least one of the 2 categories we selected. To achieve that in 3.x, we did a join for each taxonomy group and for every term group. So when I want to DoS a website, I just have to search for a 20-word-sentence and the database is cooking with a query with 20 joins...

In 4.0 I rewrote this to only join once for the taxonomies and once for the terms mapping table. In the where clause we restrict the result set down to whatever matches to any of the matching terms (so it matches not just to "Joomla", "awesome" and "release", but also to only "Joomla", only "awesome" and only "release") and then in the having clause, we require that it has to have at least one match for every term group. For this, we do a SUM() over the checks if the taxonomies or terms are in the results and if that is >0, we are good.

And while I'm writing all this, I found the solution for this and now it all works. Woohoo! Please test both for MySQL and Postgres, also with more complex searches.

avatar richard67
richard67 - comment - 6 Jun 2020

@alikon Could you test it again? Thanks in advance.

avatar alikon
alikon - comment - 7 Jun 2020

while testing i've found some issues with bytea and Gather Search Statistics enabled
sended a fix
Hackwar#46
Hackwar#47

avatar Hackwar
Hackwar - comment - 7 Jun 2020

Merged both. Good catch.

avatar richard67
richard67 - comment - 7 Jun 2020

@alikon Well, even if it was your fix: Could you test again so we have it counted on GitHub as a good test, too? It resets the counter after a new commit.

avatar richard67 richard67 - change - 7 Jun 2020
The description was changed
avatar richard67 richard67 - edited - 7 Jun 2020
avatar richard67 richard67 - test_item - 7 Jun 2020 - Tested successfully
avatar richard67
richard67 - comment - 7 Jun 2020

I have tested this item successfully on 24e8633

Tested with MySQLi, MySQL (PDO) and PostgreSQL (PDO).
Search in frontend works, also if search term is not found.
Backend works (index rebuild, gathering statistics).
No SQL errors in database server log file.
No PHP errors or warnings or notices in PHP log file.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/29428.

avatar alikon alikon - test_item - 7 Jun 2020 - Tested successfully
avatar alikon
alikon - comment - 7 Jun 2020

I have tested this item successfully on 24e8633


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/29428.

avatar alikon alikon - change - 7 Jun 2020
Status Pending Ready to Commit
avatar alikon
alikon - comment - 7 Jun 2020

rtc


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/29428.

avatar wilsonge
wilsonge - comment - 7 Jun 2020

@alikon @richard67 what I don't know enough about postgres is the pg_escape_bytea calls in the postgres files that we are removing here. That is going to be escaping quotes in some way. I want to be sure we're not going to be allowing in some sort of weird SQL Injection. I suspect with prepared statements we might now be covered by this. But I'm not really comfortable enough with postgres to be sure

avatar richard67
richard67 - comment - 7 Jun 2020

@wilsonge I don't know much about this either, so let's hope @alikon can answer that.

avatar alikon
alikon - comment - 11 Jun 2020

for what i can tell you we should be covered by:
https://github.com/joomla/joomla-cms/pull/29428/files#diff-7690a193a5a0fe8abc5651f7bddce847R95
->bind(2, $entry->query, ParameterType::LARGE_OBJECT)

avatar wilsonge wilsonge - change - 11 Jun 2020
Status Ready to Commit Fixed in Code Base
Closed_Date 0000-00-00 00:00:00 2020-06-11 20:51:26
Closed_By wilsonge
Labels Added: ?
avatar wilsonge wilsonge - close - 11 Jun 2020
avatar wilsonge wilsonge - merge - 11 Jun 2020
avatar wilsonge
wilsonge - comment - 11 Jun 2020

Thanks!

Add a Comment

Login with GitHub to post a comment