? ? Pending

User tests: Successful: Unsuccessful:

avatar bembelimen
bembelimen
28 Dec 2020

Pull Request for Issue #30372 .

Summary of Changes

Try to find the correct language when "default star (*)" is given in finder stemmmer.

Testing Instructions

See: #30372

Actual result BEFORE applying this Pull Request

Current language not found when "*"

Expected result AFTER applying this Pull Request

Current language found.

avatar bembelimen bembelimen - open - 28 Dec 2020
avatar bembelimen bembelimen - change - 28 Dec 2020
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 28 Dec 2020
Category Administration com_finder
avatar bembelimen bembelimen - change - 28 Dec 2020
Labels Added: ? ?
avatar bembelimen
bembelimen - comment - 28 Dec 2020

@Hackwar could you please check here and give your feedback?

avatar Hackwar
Hackwar - comment - 28 Dec 2020

There are loads of languages which have no stemmer, thus we can't assume a stemmer is present. If the language is *, we shouldn't even try finding a stemmer in the constructor of the Language class. Generally however we are handling this correctly from my perspective, since we are catching the exception correctly. If xdebug is too eager here, I don't really think this is our fault... Anyway, the correct and easiest solution here is to simply exclude * from getting a stemmer and that's it.

avatar Hackwar
Hackwar - comment - 28 Dec 2020

To extend a bit: stemming is a process that takes processing power and I can see several situations where you don't want to have a stemmer at all. * is the equivalent of "I don't want a stemmer"

avatar bembelimen
bembelimen - comment - 28 Dec 2020

It was not the intention to prevent the exception but to find a stemmer if possible.

But I have no feelings if this improvement should be merge or not.

avatar richard67
richard67 - comment - 1 Feb 2021

Anyway, the correct and easiest solution here is to simply exclude * from getting a stemmer and that's it.

@bembelimen Could you change this PR so it follows @Hackwar 's suggestion? To me it seems to be the better way, too.

avatar brianteeman
brianteeman - comment - 1 Feb 2021

That would mean that a default install will never use the stemmer even though the stemmer is available for english, wouldnt it?

avatar richard67
richard67 - comment - 1 Feb 2021

That would mean that a default install will never use the stemmer even though the stemmer is available for english, wouldnt it?

Hmm, that's true ... not what's desired.

avatar Hackwar
Hackwar - comment - 1 Feb 2021

That is intended behavior. The stemmer is resource intensive and I want people to make a conscious decision to enable this, also to select the right one/configure the search in its entirety for their system instead of simply enabling it and not looking into it further.

avatar brianteeman
brianteeman - comment - 1 Feb 2021

That is intended behavior. The stemmer is resource intensive and I want people to make a conscious decision to enable this, also to select the right one/configure the search in its entirety for their system instead of simply enabling it and not looking into it further.

And where do they do that?

* is the equivalent of "I don't want a stemmer"

* is used to mean either required or everything - I have never seen it used to mean nothing

avatar Hackwar
Hackwar - comment - 1 Feb 2021

That is intended behavior. The stemmer is resource intensive and I want people to make a conscious decision to enable this, also to select the right one/configure the search in its entirety for their system instead of simply enabling it and not looking into it further.

And where do they do that?

The option is in the component configuration.

* is the equivalent of "I don't want a stemmer"

* is used to mean either required or everything - I have never seen it used to mean nothing

That is why that isn't visible to the user. I did not decide that * is the internal identifier for items without a language. *means I don't know the language (unless set in the component configuration) and thus I wouldn't know which stemmer to use. Since the wrong stemmer would be worse than no stemmer, people should make a conscious decision what they need.

avatar brianteeman
brianteeman - comment - 1 Feb 2021

That is intended behavior. The stemmer is resource intensive and I want people to make a conscious decision to enable this, also to select the right one/configure the search in its entirety for their system instead of simply enabling it and not looking into it further.

And where do they do that?

The option is in the component configuration.

I asked as I dont see anything regarding a stemmer in the component configuration for Joomla 4.

I can see it in Joomla 3 but not 4

image

avatar Hackwar
Hackwar - comment - 1 Feb 2021

It is the "Default Language" option.

avatar brianteeman
brianteeman - comment - 2 Feb 2021

Oh that is so obvious - not

How on earth is anyone to know then that by selecting a default language there is (as you claim) a performance hit.

That list is showing the languages on the site. Nothing to do with if those languages have a stemmer available

avatar brianteeman
brianteeman - comment - 2 Feb 2021

And as the j3 field no longer exists you're breaking functionality on upgrade without any documentation

avatar rdeutz
rdeutz - comment - 23 Mar 2021

Can we fix this with better text in the default language select?

Maybe

None -> None (Stemmer disabled)
Default Site Languge (Stemmer enabled)

avatar brianteeman
brianteeman - comment - 23 Mar 2021

From what @Hackwar has said then no because the default site language might not have a stemmer available.

avatar brianteeman
brianteeman - comment - 23 Mar 2021

Plus there is nothing in the option to indicate that it is anything to do with a stemmer at all

image

avatar Hackwar
Hackwar - comment - 23 Mar 2021

Because it's not only the stemmer. This setting selects which language to use for indexing the content. That selection both influences how the text is tokenised as well as the stemmer to use. For example chinese texts don't have spaces between words, so we can't simply do a $words = explode(' ', $text); to split up the text into words. It also influences which list of stopwords to use and maybe in the future this could also select a synonym/thesaurus system. If it were just about the stemmer, it would be called "Select the stemmer".

avatar brianteeman
brianteeman - comment - 23 Mar 2021

WOW - it's no wonder we get nowhere when you keep changing your mind on what information you will share about your code.

This is the first time in this entire thread that you have said it was about something more than a stemmer. Every comment you have made here has only referred to this as being a stemmer. @bembelimen obviously thought it was just about a stemmer as he wrote in the PR. You didnt correct him then. I thought it was about a stemmer and you didnt correct me then.

Only now three months later you wake up and start telling everyone its more than a stemmer.

As I said before and now even more re-enforced by your latest revelation this is all an undocumented backwards compatibility break that is being hidden by you from everyone.

avatar Hackwar
Hackwar - comment - 23 Mar 2021

Since this feature was broken in J3, there is no b/c to be broken. Besides the fact that our b/c claims don't actually extend to components and a major version actually does allow us to break b/c.

I've described the functionality in the original PR here: #20391

I did not go into an elaborate rant about how the complete indexing system of Smart Search works, because I tend to not write 6-page-long essays to shift blame or try to make others feel stupid. I was asked to give feedback to this PR and concentrated on the context of this PR, which was the code to generate a stemmer object.

All the relevant code in the Joomla\Component\Finder\Administrator\Indexer\Language class has been there since Joomla 2.5, except it was spread out over several classes and actually did not work, especially for languages other than english. It didn't bother you until now.

avatar brianteeman
brianteeman - comment - 24 Mar 2021

Breaking b/c is fine it just has to be documented

avatar rdeutz
rdeutz - comment - 6 Apr 2021

So the missing point is that we need to document this somewhere, can @Hackwar or @bembelimen do that please, Thanks.

avatar brianteeman
brianteeman - comment - 6 Apr 2021

I dont understand why you have removed the release blocker. Its not resolved

avatar chmst chmst - change - 31 Jan 2022
Labels Added: ?
Removed: ? ?
avatar HLeithner
HLeithner - comment - 27 Jun 2022

This pull request has automatically rebased to 4.2-dev.

avatar joomla-bot
joomla-bot - comment - 27 Jun 2022

This pull requests has been automatically converted to the PSR-12 coding standard.

avatar rdeutz
rdeutz - comment - 21 Oct 2022

Closing because this change is not a solution for the initial problem. Reopened issue #30372

avatar rdeutz rdeutz - change - 21 Oct 2022
Status Pending Closed
Closed_Date 0000-00-00 00:00:00 2022-10-21 11:05:12
Closed_By rdeutz
Labels Added: ? ?
Removed: ?
avatar rdeutz rdeutz - close - 21 Oct 2022

Add a Comment

Login with GitHub to post a comment