User tests: Successful: Unsuccessful:
[UPDATE] After the initial discussion, I changed this to make the tuple length of com_finder be configurable. This should make this less controversial.
Finder is indexing not only every word, but also every combination of 2- and 3-word-tuples in the text to be indexed. This results in gigantic storage consumption by the index and a performance decrease when indexing of about 20%. At the same time, it only marginally improves the search results. When removing this feature, we are sacrificing the possibility to search for specific phrases of up to 3 terms (like "Joomla is great") for a reduction in necessary storage space of about 2/3. (and the performance increase in indexing)
Install 4.0-dev and go to the backend of Smart Search. Enable the Smart Search Plugin and start the indexing process. Notice that it takes a long time (preferably measure the time with a stopwatch). Go to the frontend and open the Smart Search menu item. Search for something and notice the results.
Now apply these changes and go into the backend of Smart Search. Clear the index and rebuild the index, again checking with a stopwatch. You should see a 20% performance increase here. Now go to the frontend and do the same search. You should get the same result.
Do a search for a specific phrase, wrapping the phrase in quotes. See that this does not work anymore.
Please also see #20185 for further improvements for com_finder.
I want to thank cloudaccess.net for supporting these changes. Without their generous help, this wouldn't be possible.
Status | New | ⇒ | Pending |
Category | ⇒ | Administration com_finder |
Labels |
Added:
?
|
Making this configurable would mean yet another option and quite a bit more code for something that 98% out there don't need. At the same time I think that smart search is not and can not be the ultimate site search engine, simply because we are writing a CMS and not a search engine. This component is a feature for all those sites out there that need a slightly better search than com_search, but if you need something really sophisticated, use a pro tool like Elastic Search.
So I'm advocating bit less complexity here. Right now this component is so complex that no one really worked out the issues in the last 5 years. Don't force this to stay abandonware.
I'm here with @chrisdavenport. This is a feature I like. Perhaps we can think of a new option to enable/disable tuples. But I would not remove the functionality. I'm sure with some use of the showon
property, we can make to options more lightweight.
Category | Administration com_finder | ⇒ | Administration com_finder Language & Strings |
Title |
|
I've now made this configurable.
Labels |
Added:
?
|
FYI I looked all over for a wording in French for tuple
easily understandable (specialists use the word tuple).
Found "Ligne d'enregistrement d'une table relationnelle. " is also a definition for specialists...
Therefore, without a tip to explain what it is about, "Words per phrase" looks like the less bad solution.
I changed the wording. I would put the change with the advanced option for the configuration in another PR, since there are a few changes in the other PRs for com_finder.
WITHOUT THE PR (Index perform)
WITH THE PR (Index perform)
WITHOUT THE PR (Search perform)
WITH THE PR (Search perform)
IMO Sacrifice the ability to search for a specific phrase, if with this we obtain a better performance, worth it. It may be that after we have all these PRs inside, we can find a way to return this ability, or something similar.
Also, I notice, while I try search, the focus of the input gone, and I needed to click on the input again to continue writing. Some JS wrong? need check the console. Anyway, this is not related to this PR
I have tested this item
One more tester and we are good to go.
Not sure. The stemmers PR has to go first as it corrects some bugs. Then this one has to be modified to fit as the indexer/helper.php
is modified. Can't apply this PR with stemmers already merged.
I confirm it works after patching that file manually. (see my file below).
Choosing 4 doubles the number of rows compared to 3.
Still a small issue that can be related to this or to the stemmer patch: highlighting does not work when searching a full phrase, i.e. when the group of words is surrounded with double quotes.
Capture when searching for activer le plugin filtre
capture when searching for "activer le plugin filtre"
(Number set to 4)
Possible issue: can we limit the Words per phrase to 3? As we have no tip and the field let's choose up to 10, it may create issues for users who have no idea what this means.
@Hackwar
This is the helper I did after modification, i.e. including the stemmer patch. Is it OK?
helper.php.zip
Hmm, found another issue.
When searching a phrase between quotes and the tuple is set to a different number of words (default is 1...), then I get no results...
Searching for phrases is not possible with tuple count set to 1. We'll see what we can do with the rest when the stemmers are merged.
when the group of words is surrounded with double quotes
@infograf768 with this PR that was removed
Now with stemmers merged, we are ready to go with this one
I have tested this item
Searching for phrases is not possible with tuple count set to 1. We'll see what we can do with the rest when the stemmers are merged.
I am concerned by the We'll see what we can do...
as you made a PR to take off com_search.
@infograf768 the code works as expected and in a future PR (most likely this weekend), I will hide most of the indexer options in the backend behind an "advanced" switch. So I don't really know where your issue is right now. At one point, we will have to remove com_search, or do you think that it is sane that we have 2 search implementations in core?
@Hackwar
As I said, until I see that, even with tuple set to 1, we can search a phrase, I would consider finder as missing an important feature which is present in com_search. The stemmers have now been merged and hiding the word count somewhere with default set to 1 is not a solution.
It is impossible to search for phrases when you have tuple set to 1. It is that simple. Neither you nor I will be able to change that. If others see the wording of the option as an issue, I'm open to change that, but there will be no change on the feature itself. If tuplecount == 1, then you are effectively disabling searching for phrases.
You are acting as if we had to prevent people from ever entering the configuration of their site, but that is bullshit. People using Joomla will have to go through the different settings and learn what they mean and what they do. If they are not able to do that, then they have to use different means to create their website, maybe Siteground or hire an agency.
Are these now broken (not only speaking about the phrase ?
Entering this and that into the search form will return results containing both "this" and "that".
Entering this not that into the search form will return results containing "this" and not "that".
Entering this or that into the search form will return results containing either "this" or "that".
Entering "this and that" (with quotes) into the search form will return results containing the exact phrase "this and that".
@carlitorweb
The search using and
or or
or not
do not need to use doublequotes and tests should be done in another language stemmer than English. Doublequotes are normally only used to surround a phrase as per instructions displayed in Advanced..
and
, not
and or
depend on the language. For example in French they are et
, pas
, ou
Strings are
COM_FINDER_QUERY_OPERATOR_AND="and"
COM_FINDER_QUERY_OPERATOR_OR="or"
COM_FINDER_QUERY_OPERATOR_NOT="not"
Phrases are queried in /administrator/components/com_finder/helpers/indexer/query.php
and in the protected function processString($input, $lang, $mode)
method starting line 869
/*
* Extract the tokens enclosed in double quotes so that we can handle
* them as phrases.
*/
if (StringHelper::strpos($input, '"') !== false)
{
$matches = array();
[etc.]
I got now what you mean. Then yes, are broken.
I've worked over this feature again and did some changes. First of all, the search modifiers should now work correctly. (At least they do for me now.) I also renamed the parameter and instead of explaining people complex stuff around tuplecounts and all, I simply named it "Allow searching for phrases" and you can enable or disable that, which either sets the tuplecount to 1 or 3. I hope that makes it a lot simpler for people.
Please test again and maybe we can finally get this merged.
Category | Administration com_finder Language & Strings | ⇒ | Administration com_finder Language & Strings Front End |
I added separate translation strings for each search feature and an intro and outro string.
will test asap
@carlitorweb @infograf768 would you have a few minutes to test this again?
Installed the sample demo, searched "blog" and not "sample"
and got the following error:
PHP Notice: Undefined offset: 0 in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1180
PHP Notice: Trying to get property 'phrase' of non-object in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1262
PHP Notice: Trying to get property 'term' of non-object in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1271
PHP Notice: Trying to get property 'stem' of non-object in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1271
PHP Warning: Creating default object from empty value in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1280
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1301
PHP Notice: Undefined property: stdClass::$phrase in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1302
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1321
PHP Notice: Undefined offset: 0 in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 1191
PHP Notice: Undefined property: stdClass::$term in C:\xampp\htdocs\joomla-cms-j4finder_termsphrases\administrator\components\com_finder\helpers\indexer\query.php on line 458
I have fatal error here before and after patch
[08-Jul-2018 11:02:36 Europe/Berlin] PHP Fatal error: Cannot use Joomla\CMS\Factory as Factory because the name is already in use in /Applications/MAMP/htdocs/installmulti/joomla40/components/com_finder/View/Search/HtmlView.php on line 20
Looking at the file I see that
use Joomla\CMS\Factory;
is used twice
IMHO this needs a specific urgent PR or direct change by maintainers of 4.0 ( @Webdongle @laoneo @mbabker )
After correcting this locally, all works fine.
@infograf768 thanks for your PR.
How may using a wrong format be a code error?
using a "wrong" format should indeed not be a big issue. In worst case you get no result at all. But it should NEVER throw an error like it does right now. @Quy might not get the result that he is looking for, but he definitely should not get the warnings that he got right now. That is an issue that we also have to fix in 3.x and you could consider that a (low-level) security issue. It is an information disclosure.
I have tested this item
@carlitorweb can you please retest?
I have tested this item
Status | Pending | ⇒ | Ready to Commit |
Ready to Commit after two successful tests.
Labels |
Added:
?
|
Status | Ready to Commit | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2018-07-18 15:11:06 |
Closed_By | ⇒ | wilsonge |
Thank you!
Thankyou!
I'd be opposed to this change as it is removing a feature that is required on some sites. A better solution would be to make the number of words per tuple a configurable parameter of the indexer (default to 1 for new installations).