User tests: Successful: Unsuccessful:
Pull Request for Issue #30372 .
Try to find the correct language when "default star (*)" is given in finder stemmmer.
See: #30372
Current language not found when "*"
Current language found.
Status | New | ⇒ | Pending |
Category | ⇒ | Administration com_finder |
Labels |
Added:
?
?
|
There are loads of languages which have no stemmer, thus we can't assume a stemmer is present. If the language is *
, we shouldn't even try finding a stemmer in the constructor of the Language
class. Generally however we are handling this correctly from my perspective, since we are catching the exception correctly. If xdebug is too eager here, I don't really think this is our fault... Anyway, the correct and easiest solution here is to simply exclude *
from getting a stemmer and that's it.
To extend a bit: stemming is a process that takes processing power and I can see several situations where you don't want to have a stemmer at all. *
is the equivalent of "I don't want a stemmer"
It was not the intention to prevent the exception but to find a stemmer if possible.
But I have no feelings if this improvement should be merge or not.
Anyway, the correct and easiest solution here is to simply exclude
*
from getting a stemmer and that's it.
@bembelimen Could you change this PR so it follows @Hackwar 's suggestion? To me it seems to be the better way, too.
That would mean that a default install will never use the stemmer even though the stemmer is available for english, wouldnt it?
That would mean that a default install will never use the stemmer even though the stemmer is available for english, wouldnt it?
Hmm, that's true ... not what's desired.
That is intended behavior. The stemmer is resource intensive and I want people to make a conscious decision to enable this, also to select the right one/configure the search in its entirety for their system instead of simply enabling it and not looking into it further.
That is intended behavior. The stemmer is resource intensive and I want people to make a conscious decision to enable this, also to select the right one/configure the search in its entirety for their system instead of simply enabling it and not looking into it further.
And where do they do that?
*
is the equivalent of "I don't want a stemmer"
*
is used to mean either required or everything - I have never seen it used to mean nothing
That is intended behavior. The stemmer is resource intensive and I want people to make a conscious decision to enable this, also to select the right one/configure the search in its entirety for their system instead of simply enabling it and not looking into it further.
And where do they do that?
The option is in the component configuration.
*
is the equivalent of "I don't want a stemmer"
*
is used to mean either required or everything - I have never seen it used to mean nothing
That is why that isn't visible to the user. I did not decide that *
is the internal identifier for items without a language. *
means I don't know the language (unless set in the component configuration) and thus I wouldn't know which stemmer to use. Since the wrong stemmer would be worse than no stemmer, people should make a conscious decision what they need.
That is intended behavior. The stemmer is resource intensive and I want people to make a conscious decision to enable this, also to select the right one/configure the search in its entirety for their system instead of simply enabling it and not looking into it further.
And where do they do that?
The option is in the component configuration.
I asked as I dont see anything regarding a stemmer in the component configuration for Joomla 4.
I can see it in Joomla 3 but not 4
It is the "Default Language" option.
Oh that is so obvious - not
How on earth is anyone to know then that by selecting a default language there is (as you claim) a performance hit.
That list is showing the languages on the site. Nothing to do with if those languages have a stemmer available
And as the j3 field no longer exists you're breaking functionality on upgrade without any documentation
Can we fix this with better text in the default language select?
Maybe
None -> None (Stemmer disabled)
Default Site Languge (Stemmer enabled)
Because it's not only the stemmer. This setting selects which language to use for indexing the content. That selection both influences how the text is tokenised as well as the stemmer to use. For example chinese texts don't have spaces between words, so we can't simply do a $words = explode(' ', $text);
to split up the text into words. It also influences which list of stopwords to use and maybe in the future this could also select a synonym/thesaurus system. If it were just about the stemmer, it would be called "Select the stemmer".
WOW - it's no wonder we get nowhere when you keep changing your mind on what information you will share about your code.
This is the first time in this entire thread that you have said it was about something more than a stemmer. Every comment you have made here has only referred to this as being a stemmer. @bembelimen obviously thought it was just about a stemmer as he wrote in the PR. You didnt correct him then. I thought it was about a stemmer and you didnt correct me then.
Only now three months later you wake up and start telling everyone its more than a stemmer.
As I said before and now even more re-enforced by your latest revelation this is all an undocumented backwards compatibility break that is being hidden by you from everyone.
Since this feature was broken in J3, there is no b/c to be broken. Besides the fact that our b/c claims don't actually extend to components and a major version actually does allow us to break b/c.
I've described the functionality in the original PR here: #20391
I did not go into an elaborate rant about how the complete indexing system of Smart Search works, because I tend to not write 6-page-long essays to shift blame or try to make others feel stupid. I was asked to give feedback to this PR and concentrated on the context of this PR, which was the code to generate a stemmer object.
All the relevant code in the Joomla\Component\Finder\Administrator\Indexer\Language
class has been there since Joomla 2.5, except it was spread out over several classes and actually did not work, especially for languages other than english. It didn't bother you until now.
Breaking b/c is fine it just has to be documented
So the missing point is that we need to document this somewhere, can @Hackwar or @bembelimen do that please, Thanks.
I dont understand why you have removed the release blocker. Its not resolved
Labels |
Added:
?
Removed: ? ? |
This pull request has automatically rebased to 4.2-dev.
This pull requests has been automatically converted to the PSR-12 coding standard.
Status | Pending | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2022-10-21 11:05:12 |
Closed_By | ⇒ | rdeutz | |
Labels |
Added:
?
?
Removed: ? |
@Hackwar could you please check here and give your feedback?