? ? ? Pending

User tests: Successful: Unsuccessful:

avatar Hackwar
Hackwar
13 May 2018

[UPDATE] After the initial discussion, I changed this to make the tuple length of com_finder be configurable. This should make this less controversial.

Original text for the old PR

Finder is indexing not only every word, but also every combination of 2- and 3-word-tuples in the text to be indexed. This results in gigantic storage consumption by the index and a performance decrease when indexing of about 20%. At the same time, it only marginally improves the search results. When removing this feature, we are sacrificing the possibility to search for specific phrases of up to 3 terms (like "Joomla is great") for a reduction in necessary storage space of about 2/3. (and the performance increase in indexing)

How to test

Install 4.0-dev and go to the backend of Smart Search. Enable the Smart Search Plugin and start the indexing process. Notice that it takes a long time (preferably measure the time with a stopwatch). Go to the frontend and open the Smart Search menu item. Search for something and notice the results.
Now apply these changes and go into the backend of Smart Search. Clear the index and rebuild the index, again checking with a stopwatch. You should see a 20% performance increase here. Now go to the frontend and do the same search. You should get the same result.
Do a search for a specific phrase, wrapping the phrase in quotes. See that this does not work anymore.

Please also see #20185 for further improvements for com_finder.

I want to thank cloudaccess.net for supporting these changes. Without their generous help, this wouldn't be possible.

avatar Hackwar Hackwar - open - 13 May 2018
avatar Hackwar Hackwar - change - 13 May 2018
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 13 May 2018
Category Administration com_finder
avatar Hackwar Hackwar - change - 13 May 2018
Labels Added: ?
avatar chrisdavenport
chrisdavenport - comment - 13 May 2018

I'd be opposed to this change as it is removing a feature that is required on some sites. A better solution would be to make the number of words per tuple a configurable parameter of the indexer (default to 1 for new installations).

avatar Hackwar
Hackwar - comment - 13 May 2018

Making this configurable would mean yet another option and quite a bit more code for something that 98% out there don't need. At the same time I think that smart search is not and can not be the ultimate site search engine, simply because we are writing a CMS and not a search engine. This component is a feature for all those sites out there that need a slightly better search than com_search, but if you need something really sophisticated, use a pro tool like Elastic Search.

So I'm advocating bit less complexity here. Right now this component is so complex that no one really worked out the issues in the last 5 years. Don't force this to stay abandonware.

avatar Hackwar
Hackwar - comment - 21 May 2018

Just to list all current PRs for Finder/Smart Search:

  • #20185 [4.0] Removing artificial sharding from com_finder
  • #20384 [4.0] Removing terms tuples from com_finder
  • #20391 [4.0] Using language specific tokeniser and stemmer for com_finder
avatar laoneo
laoneo - comment - 28 May 2018

I'm here with @chrisdavenport. This is a feature I like. Perhaps we can think of a new option to enable/disable tuples. But I would not remove the functionality. I'm sure with some use of the showon property, we can make to options more lightweight.

avatar joomla-cms-bot joomla-cms-bot - change - 3 Jun 2018
Category Administration com_finder Administration com_finder Language & Strings
avatar Hackwar Hackwar - change - 3 Jun 2018
Title
[4.0] Removing terms tuples from com_finder
[4.0] Make the tuple length of com_finder configurable
avatar Hackwar Hackwar - edited - 3 Jun 2018
avatar Hackwar Hackwar - change - 3 Jun 2018
The description was changed
avatar Hackwar Hackwar - edited - 3 Jun 2018
avatar Hackwar
Hackwar - comment - 3 Jun 2018

I've now made this configurable.

avatar Hackwar Hackwar - change - 3 Jun 2018
Labels Added: ?
avatar infograf768
infograf768 - comment - 4 Jun 2018

FYI I looked all over for a wording in French for tuple easily understandable (specialists use the word tuple).
Found "Ligne d'enregistrement d'une table relationnelle. " is also a definition for specialists...
Therefore, without a tip to explain what it is about, "Words per phrase" looks like the less bad solution.

avatar Hackwar
Hackwar - comment - 12 Jun 2018

I changed the wording. I would put the change with the advanced option for the configuration in another PR, since there are a few changes in the other PRs for com_finder.

avatar carlitorweb
carlitorweb - comment - 18 Jun 2018

WITHOUT THE PR (Index perform)

without-pr

WITH THE PR (Index perform)

with-pr

WITHOUT THE PR (Search perform)

search-without-pr

WITH THE PR (Search perform)

search-with-pr

IMO Sacrifice the ability to search for a specific phrase, if with this we obtain a better performance, worth it. It may be that after we have all these PRs inside, we can find a way to return this ability, or something similar.

avatar carlitorweb
carlitorweb - comment - 18 Jun 2018

Also, I notice, while I try search, the focus of the input gone, and I needed to click on the input again to continue writing. Some JS wrong? need check the console. Anyway, this is not related to this PR

avatar carlitorweb carlitorweb - test_item - 18 Jun 2018 - Tested successfully
avatar carlitorweb
carlitorweb - comment - 18 Jun 2018

I have tested this item successfully on 0e20307


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20384.

avatar Hackwar
Hackwar - comment - 18 Jun 2018

I've incorporated the changes that @Quy requested. One more tester and we are good to go. 😄

avatar infograf768
infograf768 - comment - 19 Jun 2018

One more tester and we are good to go.

Not sure. The stemmers PR has to go first as it corrects some bugs. Then this one has to be modified to fit as the indexer/helper.php is modified. Can't apply this PR with stemmers already merged.
I confirm it works after patching that file manually. (see my file below).
Choosing 4 doubles the number of rows compared to 3.
Still a small issue that can be related to this or to the stemmer patch: highlighting does not work when searching a full phrase, i.e. when the group of words is surrounded with double quotes.

Capture when searching for activer le plugin filtre

screen shot 2018-06-19 at 09 20 32

capture when searching for "activer le plugin filtre" (Number set to 4)

screen shot 2018-06-19 at 09 20 51

Possible issue: can we limit the Words per phrase to 3? As we have no tip and the field let's choose up to 10, it may create issues for users who have no idea what this means.

@Hackwar
This is the helper I did after modification, i.e. including the stemmer patch. Is it OK?
helper.php.zip

avatar infograf768
infograf768 - comment - 19 Jun 2018

Hmm, found another issue.
When searching a phrase between quotes and the tuple is set to a different number of words (default is 1...), then I get no results...

avatar Hackwar
Hackwar - comment - 19 Jun 2018

Searching for phrases is not possible with tuple count set to 1. We'll see what we can do with the rest when the stemmers are merged.

avatar carlitorweb
carlitorweb - comment - 19 Jun 2018

when the group of words is surrounded with double quotes

@infograf768 with this PR that was removed

avatar carlitorweb
carlitorweb - comment - 20 Jun 2018

Now with stemmers merged, we are ready to go with this one

avatar carlitorweb carlitorweb - test_item - 20 Jun 2018 - Tested successfully
avatar carlitorweb
carlitorweb - comment - 20 Jun 2018

I have tested this item successfully on 6b6fc4c


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20384.

avatar infograf768
infograf768 - comment - 21 Jun 2018

Searching for phrases is not possible with tuple count set to 1. We'll see what we can do with the rest when the stemmers are merged.

I am concerned by the We'll see what we can do... as you made a PR to take off com_search.

avatar Hackwar
Hackwar - comment - 21 Jun 2018

@infograf768 the code works as expected and in a future PR (most likely this weekend), I will hide most of the indexer options in the backend behind an "advanced" switch. So I don't really know where your issue is right now. At one point, we will have to remove com_search, or do you think that it is sane that we have 2 search implementations in core?

avatar infograf768
infograf768 - comment - 21 Jun 2018

@Hackwar
As I said, until I see that, even with tuple set to 1, we can search a phrase, I would consider finder as missing an important feature which is present in com_search. The stemmers have now been merged and hiding the word count somewhere with default set to 1 is not a solution.

avatar Hackwar
Hackwar - comment - 21 Jun 2018

It is impossible to search for phrases when you have tuple set to 1. It is that simple. Neither you nor I will be able to change that. If others see the wording of the option as an issue, I'm open to change that, but there will be no change on the feature itself. If tuplecount == 1, then you are effectively disabling searching for phrases.

You are acting as if we had to prevent people from ever entering the configuration of their site, but that is bullshit. People using Joomla will have to go through the different settings and learn what they mean and what they do. If they are not able to do that, then they have to use different means to create their website, maybe Siteground or hire an agency.

avatar infograf768
infograf768 - comment - 22 Jun 2018

Are these now broken (not only speaking about the phrase ?

Entering this and that into the search form will return results containing both "this" and "that".

Entering this not that into the search form will return results containing "this" and not "that".

Entering this or that into the search form will return results containing either "this" or "that".

Entering "this and that" (with quotes) into the search form will return results containing the exact phrase "this and that".
avatar carlitorweb
carlitorweb - comment - 22 Jun 2018

Doing a fast test with and without this PR, work
screenshot_20180622080722

avatar infograf768
infograf768 - comment - 22 Jun 2018

@carlitorweb
The search using and or or or not do not need to use doublequotes and tests should be done in another language stemmer than English. Doublequotes are normally only used to surround a phrase as per instructions displayed in Advanced..

and, not and or depend on the language. For example in French they are et, pas, ou

Strings are

COM_FINDER_QUERY_OPERATOR_AND="and"
COM_FINDER_QUERY_OPERATOR_OR="or"
COM_FINDER_QUERY_OPERATOR_NOT="not"

Phrases are queried in /administrator/components/com_finder/helpers/indexer/query.php and in the protected function processString($input, $lang, $mode) method starting line 869

		/*
		 * Extract the tokens enclosed in double quotes so that we can handle
		 * them as phrases.
		 */
		if (StringHelper::strpos($input, '"') !== false)
		{
			$matches = array();
[etc.]
avatar carlitorweb
carlitorweb - comment - 22 Jun 2018

I got now what you mean. Then yes, are broken.

avatar Hackwar
Hackwar - comment - 3 Jul 2018

I've worked over this feature again and did some changes. First of all, the search modifiers should now work correctly. (At least they do for me now.) I also renamed the parameter and instead of explaining people complex stuff around tuplecounts and all, I simply named it "Allow searching for phrases" and you can enable or disable that, which either sets the tuplecount to 1 or 3. I hope that makes it a lot simpler for people.

Please test again and maybe we can finally get this merged. 😉

avatar infograf768
infograf768 - comment - 3 Jul 2018

With these changes a lot of the "Advanced" tips are not necessary in the frontend form when "tuplecount' is set to 1. String is COM_FINDER_ADVANCED_TIPS

I am speaking of
screen shot 2018-07-03 at 16 33 31

I suggest therefore to condition their display.

Other than that, this looks fine. Thank you.

avatar joomla-cms-bot joomla-cms-bot - change - 4 Jul 2018
Category Administration com_finder Language & Strings Administration com_finder Language & Strings Front End
avatar Hackwar
Hackwar - comment - 4 Jul 2018

I added separate translation strings for each search feature and an intro and outro string.

avatar infograf768
infograf768 - comment - 5 Jul 2018

will test asap

avatar Hackwar
Hackwar - comment - 7 Jul 2018

@carlitorweb @infograf768 would you have a few minutes to test this again?

avatar Quy
Quy - comment - 8 Jul 2018

Installed the sample demo, searched "blog" and not "sample" and got the following error:

PHP Notice:  Undefined offset: 0 in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1180
PHP Notice:  Trying to get property 'phrase' of non-object in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1262
PHP Notice:  Trying to get property 'term' of non-object in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1271
PHP Notice:  Trying to get property 'stem' of non-object in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1271
PHP Warning:  Creating default object from empty value in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1280
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1301
PHP Notice:  Undefined property: stdClass::$phrase in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1302
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice:  Undefined offset: 0 in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1191
PHP Notice:  Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 458
avatar infograf768
infograf768 - comment - 8 Jul 2018

@Quy
The format is not
"blog" and not "sample"
but
blog not sample

EDIT: One should be able to also obtain the right results with
"blog" not "sample"
can you test again?

avatar infograf768
infograf768 - comment - 8 Jul 2018

I have fatal error here before and after patch
[08-Jul-2018 11:02:36 Europe/Berlin] PHP Fatal error: Cannot use Joomla\CMS\Factory as Factory because the name is already in use in /Applications/MAMP/htdocs/installmulti/joomla40/components/com_finder/View/Search/HtmlView.php on line 20

Looking at the file I see that
use Joomla\CMS\Factory;
is used twice
IMHO this needs a specific urgent PR or direct change by maintainers of 4.0 ( @Webdongle @laoneo @mbabker )

avatar infograf768
infograf768 - comment - 8 Jul 2018

After correcting this locally, all works fine.

avatar infograf768
infograf768 - comment - 8 Jul 2018

PR for the Fatal Error : #21016

avatar Hackwar
Hackwar - comment - 8 Jul 2018

@Quy This is an interesting error. I just tested this and this is also present in 3.x. Can we merge/test this one without fixing that bug here and now? I will provide another PR to fix this soon. Just want this one merged in...

avatar Hackwar
Hackwar - comment - 8 Jul 2018

@infograf768 thanks for your PR. 😄

avatar infograf768
infograf768 - comment - 8 Jul 2018

How may using a wrong format be a code error?

avatar Hackwar
Hackwar - comment - 8 Jul 2018

using a "wrong" format should indeed not be a big issue. In worst case you get no result at all. But it should NEVER throw an error like it does right now. @Quy might not get the result that he is looking for, but he definitely should not get the warnings that he got right now. That is an issue that we also have to fix in 3.x and you could consider that a (low-level) security issue. It is an information disclosure.

avatar carlitorweb
carlitorweb - comment - 8 Jul 2018

@wilsonge

Can we merge/test this one without fixing that bug here and now? I will provide another PR to fix this soon. Just want this one merged in...

avatar infograf768
infograf768 - comment - 10 Jul 2018

I have tested this item successfully on 5f21dc9


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20384.

avatar infograf768 infograf768 - test_item - 10 Jul 2018 - Tested successfully
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 10 Jul 2018

@carlitorweb can you please retest?

avatar carlitorweb carlitorweb - test_item - 10 Jul 2018 - Tested successfully
avatar carlitorweb
carlitorweb - comment - 10 Jul 2018

I have tested this item successfully on 5f21dc9


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20384.

avatar franz-wohlkoenig franz-wohlkoenig - change - 10 Jul 2018
Status Pending Ready to Commit
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 10 Jul 2018

Ready to Commit after two successful tests.

avatar Hackwar Hackwar - change - 16 Jul 2018
Labels Added: ?
avatar wilsonge wilsonge - close - 18 Jul 2018
avatar wilsonge wilsonge - merge - 18 Jul 2018
avatar wilsonge wilsonge - change - 18 Jul 2018
Status Ready to Commit Fixed in Code Base
Closed_Date 0000-00-00 00:00:00 2018-07-18 15:11:06
Closed_By wilsonge
avatar Hackwar
Hackwar - comment - 18 Jul 2018

Thank you!

avatar wilsonge
wilsonge - comment - 18 Jul 2018

Thankyou!

Add a Comment

Login with GitHub to post a comment