dwreski
26 Mar 2020

Steps to reproduce the issue

  • update finder terms using finder_indexer.php script

Expected result

terms listed in ###_finder_terms_common table to be ignored

Actual result

terms listed in ###_finder_terms_common table are not ignored and continue to appear in finder tables

System information (as much as possible)

joomla-3.9.16
fedora31
php-7.3.14

Additional comments

Am I misunderstanding the intention of the finder_terms_common table? Is there an interface for updating it, or must it be done manually?

Words like 'of' and 'as' continue to appear in the terms table. We have a very large site with a multi-gigabyte finder_terms table that we'd like to minimize.

dwreski - open - 26 Mar 2020
joomla-cms-bot - labeled - 26 Mar 2020
richard67 - comment - 26 Mar 2020

@dwreski Which kind and version of database?

@Hackwar Could you have a look?

dwreski - comment - 26 Mar 2020

I'm using mariadb-10.3.21

Thank you. Sorry I forgot to include that.

dwreski - comment - 29 Mar 2020

Any updates on this? Can I ask someone to test ###_finder_terms_common on their system and see if it's working properly to exclude terms from the finder tables on joomla-3.9.15/16?


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/28467.
richard67 - comment - 29 Mar 2020

"terms listed in #_finder_terms_common table to be ignored"

"Am I misunderstanding the intention of the finder_terms_common table? Is there an interface for updating it, or must it be done manually?"

@Hackwar @chrisdavenport Could someone answer this question? I am unfortunately not familiar with smart search (com_finder).

richard67 - comment - 29 Mar 2020

@dwreski I can confirm that on both Joomla 3.x and 4.0-dev, all common terms defined in database table #__finder_terms_common also appear in table #__finder_terms.

The good news: In J4 I found an option "Filter Common Words" in the "Index" tab of the Smart Search component's options. If you switch that on and do a reindex, the common terms should disappear from the terms table.

The bad news: In J3 I didn't find this option.

What I also found out in the meantime by asking around is that the #__finder_terms_common table is intended to be maintained "manually", i.e. with SQL in a database client (e.g. phpMyAdmin).
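
For anyone maintaining the table by hand, here is a minimal sketch of what that could look like. It uses Python with an in-memory SQLite database purely as a stand-in for the site's MySQL/MariaDB database. The `jos_` prefix, the `term` and `language` column names, and the `add_common_terms` helper are assumptions for illustration, so check your actual table structure before adapting it.

```python
import sqlite3

# In-memory SQLite stands in for the site's MySQL/MariaDB database.
# Table prefix and column names are assumptions; adjust them to your site.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jos_finder_terms_common (term TEXT NOT NULL, language TEXT NOT NULL)"
)


def add_common_terms(conn, terms, language="en"):
    """Insert each term once per language, skipping rows that already exist."""
    for term in terms:
        exists = conn.execute(
            "SELECT 1 FROM jos_finder_terms_common WHERE term = ? AND language = ?",
            (term, language),
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO jos_finder_terms_common (term, language) VALUES (?, ?)",
                (term, language),
            )
    conn.commit()


# "of" is passed twice on purpose to show that duplicates are skipped.
add_common_terms(conn, ["of", "as", "of"])
rows = conn.execute(
    "SELECT term, language FROM jos_finder_terms_common ORDER BY term"
).fetchall()
```

The same SELECT and INSERT statements could be run directly in phpMyAdmin. Per the findings in this thread, adding rows in J3 does not by itself remove those terms from #__finder_terms.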

dwreski - comment - 29 Mar 2020

@richard67 thanks so much for your help.

So have we confirmed that there is a bug because the terms exist in both #__finder_terms_common and #__finder_terms?

richard67 - comment - 29 Mar 2020

@dwreski Not sure if it is a bug or just missing functionality.

In J4 the finder, aka Smart Search, was restructured and largely rewritten, and missing functionality has been added.

@HLeithner Would you say it's a bug that common terms like "and" are included in finder terms and there is no way to filter them out in J3 like we have it in J4? Without that filter, the #__finder_terms_common table is useless in J3.

HLeithner - comment - 29 Mar 2020

It depends. If it really should work, and it is documented that it should work by adding terms with phpMyAdmin (wth), then it should work. If only the table exists, without any reference saying that it should work, then it would be a new feature. If it's only a missing join or similar, it should be fixed; if you need a new GUI for this, I think it has to wait for J4.

richard67 - comment - 29 Mar 2020

In J4 we have:
[screenshot: j4-finder-options-index]

In J3 we don't have that, or I did not find it.

HLeithner - comment - 29 Mar 2020

Why is there an option for this?!

richard67 - comment - 29 Mar 2020

Good question. And I haven't tested if it really works in J4.

dwreski - comment - 29 Mar 2020

It's not even ignoring the default terms added in J3. See attached.

joomla-3.9.15-common-terms.txt

Hackwar - comment - 7 Jul 2020

It is not really clear how the common words feature should work. In J3 the feature is only supposed to mark terms as "common" words by setting the flag in the table, and later on this is supposed to somehow improve the search results by omitting common words in the search query. However, this works sketchily at best and is not really reliable from my POV. There is no GUI or API to change that table except by modifying it in phpMyAdmin directly. I don't think we will modify this in J3 anymore.

J4 has the option to work like J3, but also to not index common words at all. There is a way to add additional (language-specific) words to the table with the language packs by adding a txt file; otherwise, editing via phpMyAdmin is a possibility. One idea in J4 was also to allow for filtering out problematic words from the search index that you may not want to be easily found, for example for political dissidents.
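
To make the two modes concrete, here is a rough sketch in Python. It is not the com_finder code: the `index_tokens` function, its `filter_common` parameter, and the sample word list are hypothetical and only illustrate the difference between flagging common words and not indexing them at all.

```python
def index_tokens(tokens, common_words, filter_common=False):
    """Return {term: is_common} for the terms that would be stored."""
    stored = {}
    for token in tokens:
        term = token.lower()
        is_common = term in common_words
        if filter_common and is_common:
            # J4-style option: common words are not indexed at all.
            continue
        # J3-style behaviour: the term is stored, only flagged as common.
        stored[term] = is_common
    return stored


tokens = "The history of Joomla as a project".split()
common = {"the", "of", "as", "a"}

flagged = index_tokens(tokens, common)
filtered = index_tokens(tokens, common, filter_common=True)
```

In the first mode all seven terms are stored and four of them carry the common flag; in the second only `history`, `joomla` and `project` are stored.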

Unfortunately what you are seeing is the "correct" behavior. Fortunately for you, the effects of this aren't as severe as they seem. The common words will only appear once in the terms table and there will be only one mapping entry per content item per common word. The additional storage space for this wouldn't really make THAT big a difference...
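
As a back-of-the-envelope illustration of that point, the following Python sketch counts rows in a toy index. The `index_size` function and the sample documents are made up for this example; the real tables hold more columns and the real indexer also stores phrases, so treat the numbers as illustrative only.

```python
def index_size(documents):
    """Count distinct term rows and (document, term) mapping rows."""
    terms = set()
    mappings = set()
    for doc_id, text in documents.items():
        for token in text.lower().split():
            terms.add(token)
            mappings.add((doc_id, token))
    return len(terms), len(mappings)


docs = {
    1: "history of the project of joomla",
    2: "state of the search index",
}
term_rows, mapping_rows = index_size(docs)

# How many mapping rows the common word "of" alone would account for.
of_mappings = sum(1 for text in docs.values() if "of" in text.lower().split())
```

Here "of" occupies a single term row no matter how often it occurs, and one mapping row per document that contains it (2 of the 10 mapping rows in this toy example).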

richard67 - comment - 7 Jul 2020

@Hackwar Do I get you right that this issue should be closed?

Hackwar - comment - 7 Jul 2020

yes

richard67 - change - 7 Jul 2020
Status: New → Expected Behaviour
Closed_Date: 0000-00-00 00:00:00 → 2020-07-07 19:26:44
Closed_By: richard67
richard67 - close - 7 Jul 2020
richard67 - comment - 7 Jul 2020

Closing as expected behavior. Thanks @Hackwar for the explanations.

