User tests: Successful: Unsuccessful:
Smart Search right now is broken for Postgres. This doesn't fix everything, but I hope with these issues fixed, solving the other issues should become easier.
In the frontend, we are clearing most of our query at one point in time. However with the parameterized queries, this fails since the bound parameters are still present. So we have to clear that one as well.
Unfortunately now, the search in frontend still doesn't work with Postgres, since it fails on the HAVING clause. If someone can help here, that would be good. This should be solved now.
In the code for the postgres indexing, a bunch of errors have crept in. This fixes those and while doing just that, I noticed something: The code is identical except for just a few lines, so we can thus drop the whole db driver system and simplify the code a lot.
@wilsonge Can we merge this without fixing the frontend search completely? I hope that I can find someone who can fix this when they see the whole code...
Status | New | ⇒ | Pending |
Category | ⇒ | Administration com_finder Front End |
That is not the only place we are adding to the having clause.
@Hackwar I am busy with real job now, so won't have time before tonight or weekend, but maybe you can test following.
I think it should be 2 having clauses and not the wrong one with summing up a boolean expression (IN clause):
$query->having('t.node_id IN (' . implode(',', $groups[$i]) . ')');
$query->having('SUM(t.node_id) > 0');
Can you test if this works on PostgreSQL and verify if it delivers the same result on MySQL as the existing stuff does?
as this pr fix the index from backend in postgresql + more clean code + etc....
i wish to have this merged asap
so after we can look at:
I don't really understand what shall be achieved with this ... maybe you can give me a help?
Even it if shall not be solved with this PR, but maybe we can solve it soon?
@richard67 i'm in your same shoes here..... better @Hackwar can explain us "what shall be achieved" with that query
// Iterate through each taxonomy group.
for ($i = 0, $c = count($groups); $i < $c; $i++)
{
$query->having('SUM(t.node_id IN (' . implode(',', $groups[$i]) . ')) > 0');
}
smells wrong, but , again i still don't understand the goal of that query ....
I have tested this item
index from backend works now on postgresql
Labels |
Added:
?
|
The whole having clauses thing is related to making the query performant and making sure that all taxonomies and terms are included in the query result.
We have taxonomies like author, category, tags and we have the search phrase. The search phrase splits up into its terms and for each term we can have several hits in the terms table with stemmed terms, etc.
Consider a search for "Joomla awesome release" in the categories "Documentation" and "News" from the author "Hannes". Those taxonomies are all stored in the same table. The search phrase would be split up into its terms and by stemming the terms, we would end up with matches on "Joomla", "joomlaians", "awesome", "awesomeness", "release", "releaser", "released". We want only search results which contain a match for every term group (the term itself and all stemmed versions of it), which are from the author and which are from at least one of the 2 categories we selected. To achieve that in 3.x, we did a join for each taxonomy group and for every term group. So when I want to DoS a website, I just have to search for a 20-word-sentence and the database is cooking with a query with 20 joins...
In 4.0 I rewrote this to only join once for the taxonomies and once for the terms mapping table. In the where clause we restrict the result set down to whatever matches to any of the matching terms (so it matches not just to "Joomla", "awesome" and "release", but also to only "Joomla", only "awesome" and only "release") and then in the having clause, we require that it has to have at least one match for every term group. For this, we do a SUM() over the checks if the taxonomies or terms are in the results and if that is >0, we are good.
And while I'm writing all this, I found the solution for this and now it all works. Woohoo! Please test both for MySQL and Postgres, also with more complex searches.
while testing i've found some issues with bytea and Gather Search Statistics enabled
sended a fix
Hackwar#46
Hackwar#47
Merged both. Good catch.
I have tested this item
Tested with MySQLi, MySQL (PDO) and PostgreSQL (PDO).
Search in frontend works, also if search term is not found.
Backend works (index rebuild, gathering statistics).
No SQL errors in database server log file.
No PHP errors or warnings or notices in PHP log file.
I have tested this item
Status | Pending | ⇒ | Ready to Commit |
rtc
@alikon @richard67 what I don't know enough about postgres is the pg_escape_bytea
calls in the postgres files that we are removing here. That is going to be escaping quotes in some way. I want to be sure we're not going to be allowing in some sort of weird SQL Injection. I suspect with prepared statements we might now be covered by this. But I'm not really comfortable enough with postgres to be sure
for what i can tell you we should be covered by:
https://github.com/joomla/joomla-cms/pull/29428/files#diff-7690a193a5a0fe8abc5651f7bddce847R95
->bind(2, $entry->query, ParameterType::LARGE_OBJECT)
Status | Ready to Commit | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2020-06-11 20:51:26 |
Closed_By | ⇒ | wilsonge | |
Labels |
Added:
?
|
Thanks!
@alikon Could you have a look on the
having
clause mentioned in the description of this PR?You can find it here:
https://github.com/joomla/joomla-cms/blob/4.0-dev/components/com_finder/src/Model/SearchModel.php#L175
I tried it already, and to me it seems to be wrong, but I dont know how it was intended to work.
The
SUM(t.node_id IN (' . implode(',', $groups[$i]) . '))
definitely means to build the sum of a boolean expression,SUM(t.node_id IN (1,2,3))
, witht.node_id IN (1,2,3)
being the boolean condition. MySQL seems to automatically cast the boolean condition to integer for summing it up, PostgreSQL correctly doesn't do such a sh..t.I don't really understand what shall be achieved with this ... maybe you can give me a help?
Even it if shall not be solved with this PR, but maybe we can solve it soon?