This PR filters common words and numeric terms from the index and the search query and also fixes a multilanguage issue in com_finder.
Searching for I, and, or, it, ... in com_finder will give you lots of results, but they might not be really useful. So it is common for index-based searches to filter such common words both from the index (in order to keep it smaller) and from the search queries (in order to search only for the relevant words). With #20781 I provided a way to add language-specific sets of common words; this PR provides a switch to filter out these common words and a second one to filter out numeric terms.
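As a rough illustration of the two switches described above, here is a minimal Python sketch. It is not the com_finder implementation: the names `COMMON_WORDS`, `filter_terms`, `filter_common` and `filter_numeric` are hypothetical, the word list is a tiny stand-in for the language-specific sets from #20781, and "numeric" is assumed to mean a term made up of digits only.

```python
# Hypothetical per-language common-word sets; stand-ins for the real lists.
COMMON_WORDS = {
    "en": {"i", "and", "or", "it"},
}


def filter_terms(terms, language, filter_common=True, filter_numeric=True):
    """Return the terms that survive the two optional filters."""
    common = COMMON_WORDS.get(language, set())
    kept = []
    for term in terms:
        lowered = term.lower()
        # Switch 1: drop common words of the given language.
        if filter_common and lowered in common:
            continue
        # Switch 2: drop terms consisting only of digits (an assumption
        # about what counts as "numeric" for this sketch).
        if filter_numeric and lowered.isdigit():
            continue
        kept.append(term)
    return kept
```

The same function would be applied both when indexing content and when parsing the search query, so that the index and the query agree on which terms exist.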
I also found an issue with the multi-language capabilities of com_finder. Up to now, com_finder did not differentiate between terms from different languages: terms that were spelled the same were treated as one term. However, with stemming and per-language common words, we run into issues here. A common word might still appear in the index for a content item, because it is correctly filtered out in English, but in a French article that contains an English citation, it is still included. Thus it reappears and has some (albeit minor) negative effects on the search results. We also have issues with different words from different languages that have the same stemmed form. This change therefore no longer silently drops identical words from different languages, but instead inserts each duplicate term with its specific language flag.
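The effect of storing a language flag with each term can be sketched as follows. This is an illustration under assumptions, not the actual com_finder schema or code: `COMMON`, `index_terms` and `lookup` are hypothetical names, and the index is modelled as a dictionary keyed by `(term, language)` instead of by the term alone.

```python
from collections import defaultdict

# Hypothetical per-language common words, for illustration only.
COMMON = {"en": {"the"}, "fr": {"le"}}


def index_terms(documents):
    """Build an index keyed by (term, language).

    documents: iterable of (doc_id, language, terms) tuples.
    """
    index = defaultdict(set)
    for doc_id, language, terms in documents:
        for term in terms:
            # Common words are filtered per document language only.
            if term in COMMON.get(language, set()):
                continue
            # Identical spellings from different languages stay separate.
            index[(term, language)].add(doc_id)
    return dict(index)


def lookup(index, term, language):
    """Return the ids of documents containing the term in that language."""
    return index.get((term, language), set())
```

With an English document `(1, "en", ["the", "search"])` and a French document `(2, "fr", ["the", "search", "le"])`, the English "the" is filtered out, while the "the" from the French article is stored under the French flag and no longer mixes with the English entries.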
Please note that this PR has DB updates which need to be applied before testing.
Status | New | ⇒ | Pending |
Category | ⇒ | Administration com_finder Language & Strings Front End |
Category | Administration com_finder Language & Strings Front End | ⇒ | SQL Administration com_admin Postgresql com_finder Language & Strings Front End Installation |
Yes.
On 11.08.2018 at 13:47, Tobi (notifications@github.com) wrote: Can this be tested without #20781?
Status | Pending | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2019-01-30 20:32:44 |
Closed_By | ⇒ | wilsonge |
Thank you!
And now?