Joomla! Issue Tracker | Joomla! CMS #20776 - Optimize plural suffix logic for en-GB


Closed
16 Jun 2018
Medium
Build: staging
# 20776
mbabker:plural-suffix

Pending

User tests: Successful: Unsuccessful:

mbabker
16 Jun 2018

Summary of Changes

In English, the text string for 0 items often reads the same as the one for 2 or more items. Our plural suffixes should therefore allow a 0 count to use the _MORE suffixed key. For our en-GB strings, this can eliminate redundant plural strings where the _0 and _MORE suffixed strings are identical.
JText::plural() automatically prepends the item count to the suffix array when it looks up language keys, so the callback method in the localise class does not need to define an extra suffix for the count itself. With the current code, when the count is 1 the suffix lookup array is effectively array('1', '1'), and for 2 or more in en-GB it is array($count, 'MORE'). We can therefore simplify the logic here and return the MORE suffix only when the count is anything other than one.
Added an inline comment to the method to explain the rationale, and to make clear that you only need to define extra suffixes for whatever rules your language follows, since Joomla will automatically try to find a matching string for the specific item count before processing the extra suffixes.
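The lookup order described above can be modeled in a few lines. This is a Python sketch, not Joomla's PHP, and the function names are hypothetical. It assumes, as the description implies, that the pre-PR en-GB callback returns a single '0', '1' or 'MORE' suffix, and that JText::plural() tries the raw count before those suffixes.

```python
from typing import Callable, List


def old_en_gb_suffixes(count: int) -> List[str]:
    # Assumed pre-PR en-GB callback: one dedicated suffix per case.
    if count == 0:
        return ["0"]
    if count == 1:
        return ["1"]
    return ["MORE"]


def new_en_gb_suffixes(count: int) -> List[str]:
    # Proposed callback: the count itself is already tried first,
    # so only "anything but one" needs the extra MORE suffix.
    return [] if count == 1 else ["MORE"]


def lookup_order(count: int, suffixes: Callable[[int], List[str]]) -> List[str]:
    # Models JText::plural() prepending the count to the callback's suffixes.
    return [str(count)] + suffixes(count)


if __name__ == "__main__":
    for n in (0, 1, 5):
        print(n, lookup_order(n, old_en_gb_suffixes), lookup_order(n, new_en_gb_suffixes))
```

For a count of 1 the old order is ['1', '1'] and the new one is just ['1']. For 0 the new order is ['0', 'MORE'], so an existing _0 key still wins and _MORE is used only when no _0 key is defined.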

Testing Instructions

Pluralized strings should still be processed correctly for en-GB. With this change, some of the _0 suffixed strings for mod_status could be removed (e.g. MOD_STATUS_TOTAL_USERS_0), since the corresponding _MORE suffixed string has the same text and there is no need in this case for a contextually different string for a 0 count versus 2+.
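The expected test result can be sketched as a key lookup against an en-GB file from which the redundant _0 string has been dropped. This is a Python model, not Joomla's code: resolve_plural_key() and plural() are hypothetical helpers, and the strings are the COM_PRIVACY ones quoted later in this thread. The %d substitution is a simplified stand-in for PHP's sprintf().

```python
from typing import Dict, List


def en_gb_suffixes(count: int) -> List[str]:
    # Proposed en-GB rule from this PR: MORE for every count except 1.
    return [] if count == 1 else ["MORE"]


def resolve_plural_key(strings: Dict[str, str], base: str, count: int) -> str:
    # Try BASE_<count> first, then BASE_<suffix> for each extra suffix,
    # and fall back to the bare key if no pluralized key exists.
    for suffix in [str(count)] + en_gb_suffixes(count):
        key = f"{base}_{suffix}"
        if key in strings:
            return key
    return base


def plural(strings: Dict[str, str], base: str, count: int) -> str:
    key = resolve_plural_key(strings, base, count)
    if key not in strings:
        return key  # untranslated key, shown as-is
    # Simplified sprintf(): only the %d placeholder is substituted.
    return strings[key].replace("%d", str(count))


# en-GB strings from this discussion, with the redundant _0 entry dropped.
BASE = "COM_PRIVACY_DASHBOARD_BADGE_ACTIVE_REQUESTS"
STRINGS = {
    BASE + "_1": '<span class="badge badge-warning">%d</span> Active Request',
    BASE + "_MORE": '<span class="badge badge-warning">%d</span> Active Requests',
}

if __name__ == "__main__":
    for n in (0, 1, 3):
        print(plural(STRINGS, BASE, n))
```

A count of 0 now renders the plural "Active Requests" text from the _MORE key, while a file that still defines a _0 key would keep using it.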

mbabker - open - 16 Jun 2018

mbabker - change - 16 Jun 2018

Status	New	⇒	Pending

joomla-cms-bot - change - 16 Jun 2018

Category	Administration Language & Strings	⇒	Administration Language & Strings Unit Tests

infograf768 - comment - 16 Jun 2018

Although I do understand this change in localise.php for en-GB, I am concerned that removing en-GB strings of the WHATEVER_0 type would have an impact on other languages, as their value would in many cases differ from the value for WHATEVER_MORE.
Most languages do use the existing default en-GB plural format, and removing the reference strings would create havoc.

Therefore, if the intention of this PR is to remove a number of strings with the _0 suffix, I would not be in favour of it, as those strings would disappear from Crowdin and com_localise.

Example for French:

MOD_STATUS_BACKEND_USERS_0="Administrateur"
MOD_STATUS_BACKEND_USERS_1="Administrateur"
MOD_STATUS_BACKEND_USERS_MORE="Administrateurs"

MOD_STATUS_USERS_0="Utilisateur"
MOD_STATUS_USERS_1="Utilisateur connecté"
MOD_STATUS_USERS_MORE="Utilisateurs connectés"

Bakual - comment - 16 Jun 2018

JM is right here. We need the _0 strings in the source so that Crowdin (and com_localise) will propose them for translation. Those tools do not handle the "zero" case automatically the way they handle the real plural forms (singular, paucal, ...), since they don't know whether that case can happen or not.
So if the case zero can happen (eg "0 users online"), we need the string in the en-GB source anyway.

This PR is still fine as it is, but if the intention is to remove the now unused _0 strings, that would not work.

infograf768 - comment - 16 Jun 2018

Let me add that, obviously, if some strings with the _0 suffix are useless (in any language, and there are quite a few in core), they should be removed in 4.0. There is no point in marking them as deprecated in 3.x, as it is obvious, and there is no need to give the translation teams (TTs) more work in 3.x just for that.

Example: in the past (1.6/1.7, I guess), when you chose multiple items to be checked in and none of them needed to be checked in, the message was "0 items were checked-in".
Now the message is always positive, even if no items needed to be checked in.
We get "# items were checked in" (where # is the number of items chosen), which does not really match the actual action, but that is not a big deal.

brianteeman - comment - 16 Jun 2018

There is already a PR to remove those 0-value checked-in strings.

mbabker - comment - 16 Jun 2018

@infograf768 & @Bakual unless translators are ONLY providing plural strings for the same values we have in en-GB (0, 1, and MORE), there is no translation issue other than potentially dealing with Crowdin needing a way to give more strings to a translation than the source data has. If I'm not mistaken, there are some languages out there which have pluralization rules where they already have to add additional keys/suffixes for their needs. So no, this change actually should not affect translation workflow at all.

@infograf768 your case for the French strings would still be supported; the source en-GB language does not have to have 0, 1, and MORE strings for French to make use of them, otherwise those languages with extra suffix logic would not be able to build their strings/rules correctly or en-GB would have to have all the possible keys in the main language file for them to be used.
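At runtime, the point above can be illustrated with a hypothetical language pack whose callback returns a suffix that en-GB never defines. This Python sketch follows the CLDR integer plural rules for Polish (one / few / many); the suffix names '1', '2' and 'MORE' and the EXAMPLE_ITEMS key are assumptions for illustration (a _2 suffix for Polish is mentioned later in this thread), not taken from the actual pl-PL pack.

```python
from typing import Callable, List, Set


def polish_style_suffixes(count: int) -> List[str]:
    # Hypothetical callback based on the CLDR Polish integer rules:
    # "one" for 1, "few" when the last digit is 2-4 except 12-14,
    # and "many" for everything else, including 0.
    if count == 1:
        return ["1"]
    if 2 <= count % 10 <= 4 and not 12 <= count % 100 <= 14:
        return ["2"]
    return ["MORE"]


def resolve_plural_key(keys: Set[str], base: str, count: int,
                       suffixes: Callable[[int], List[str]]) -> str:
    # Same order as described for JText::plural(): the count first, then
    # the active language's own suffixes, then the bare key as a fallback.
    for suffix in [str(count)] + suffixes(count):
        key = f"{base}_{suffix}"
        if key in keys:
            return key
    return base


# Keys defined only in the hypothetical translation; en-GB has no _2 key.
PL_KEYS = {"EXAMPLE_ITEMS_1", "EXAMPLE_ITEMS_2", "EXAMPLE_ITEMS_MORE"}

if __name__ == "__main__":
    for n in (0, 1, 3, 5, 12, 22):
        print(n, resolve_plural_key(PL_KEYS, "EXAMPLE_ITEMS", n, polish_style_suffixes))
```

Counts 3 and 22 resolve to EXAMPLE_ITEMS_2 and 12 resolves to EXAMPLE_ITEMS_MORE, even though the en-GB source defines no _2 key. The lookup only consults the active language's own strings.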

The intent is not to turn around and remove all the 0 suffixed strings. For most cases where they are used in en-GB a message specific to the 0 count is used, which is fine. For other cases, like those in the status module, they are redundant strings in the en-GB context. The entire reason I pushed this PR is because last night I ended up writing strings where I needed a redundant string for a 0 value:

COM_PRIVACY_DASHBOARD_BADGE_ACTIVE_REQUESTS_0="<span class=\"badge badge-warning\">%d</span> Active Requests"
COM_PRIVACY_DASHBOARD_BADGE_ACTIVE_REQUESTS_1="<span class=\"badge badge-warning\">%d</span> Active Request"
COM_PRIVACY_DASHBOARD_BADGE_ACTIVE_REQUESTS_MORE="<span class=\"badge badge-warning\">%d</span> Active Requests"

infograf768 - comment - 16 Jun 2018

@mbabker
I am not sure you understood.
If some en-GB strings with the _0 suffix are no longer in the en-GB language folder, they will not be displayed for translation on Crowdin or in com_localise, as en-GB IS the reference language. This means we would have to add them manually.

ggppdk - comment - 16 Jun 2018

Also needed in Greek.

mbabker - comment - 16 Jun 2018

Never mind, not going to sit here and argue over stupid crap.

mbabker - change - 16 Jun 2018

Status	Pending	⇒	Closed
Closed_Date	0000-00-00 00:00:00	⇒	2018-06-16 13:56:05
Closed_By		⇒	mbabker
Labels	Added: ?

mbabker - close - 16 Jun 2018

infograf768 - comment - 16 Jun 2018

Thanks. A few redundant strings in en-GB are indeed not worth creating issues for other languages.

mbabker - comment - 16 Jun 2018

It's not creating an issue that doesn't already exist. By your argument, we should have a _234 suffix in en-GB because Serbian uses that, or a _2 suffix because Polish uses that, or whatever other suffixes exist in languages that I don't have an immediate backup of by way of the downloads site.

Your argument is "m'eh, we have a software limitation in that the software WE WROTE doesn't know how to deal with pluralized strings, no changes for you". That's ? and you know it.

ggppdk - comment - 16 Jun 2018

Quoting @mbabker: "By your argument, we should have a _234 suffix in en-GB"

Right, but we are not asking to add such edge cases to the English language file; we are only asking to keep the current approach, which covers the great majority of languages.
So it is worth the trouble and the redundancy in the English language file; it is a trade-off.

infograf768 - comment - 16 Jun 2018

I did not write Crowdin... The 0 plural state is different from the other states on Crowdin. Crowdin only deals with plural forms, i.e. from 2 up to whatever number of forms a specific language needs.
Ask @Bakual about this. He knows better.
If there were a comment before the _1 string explaining that some languages may need to add a _0 string, it would be easier to do this in com_localise (which I did indeed participate heavily in writing), because we can edit a raw version of the file in the component itself, which is totally impossible on Crowdin.

brianteeman - comment - 16 Jun 2018

Based on the information here http://www.unicode.org/cldr/charts/33/supplemental/language_plural_rules.html about languages with more than one plural form, I did a very quick install and check of several of those languages, and I didn't find a single one using any strings beyond those provided in the en-GB files.

mbabker - comment - 16 Jun 2018

You know what's funny, had I not even mentioned the possibility of potentially removing redundant strings nobody would've batted an eye at this PR. Instead, it's flat out rejected because heaven forbid someone mention touching the language files. There are exactly ZERO changes to the existing language strings and not once did I explicitly say "oh we should remove them", I pointed out what is redundant and removable to simplify testing of this pull request. Whatever. I'm done here. Glad to see a couple of people still get to dictate things in a supposedly open project.

brianteeman - comment - 16 Jun 2018

They only dictate because they are allowed to.

Bakual - comment - 16 Jun 2018

Quoting @mbabker: "Instead, it's flat out rejected"

@mbabker Actually, both JM and I said the PR itself is fine. It could be merged.
The thing is that it becomes quite useless if you can't remove the _0 strings, and it may even cause confusion.

Quoting @mbabker: "unless translators are ONLY providing plural strings for the same values we have in en-GB (0, 1, and MORE), there is no translation issue other than potentially dealing with Crowdin needing a way to give more strings to a translation than the source data has. If I'm not mistaken, there are some languages out there which have pluralization rules where they already have to add additional keys/suffixes for their needs. So no, this change actually should not affect translation workflow at all."

With all due respect, you're mistaken here.
Crowdin knows the possible plural forms for each language and offers the proper tabs in the translation form, e.g. in German (which has two forms, like English).

Russian, with its 4 variants, looks like this.

That is automatically done by Crowdin.
Please note that no plural form is proposed for the _0 case. That string is proposed separately for translation if it is present in the source. If it is missing from the source, it will rightfully not be proposed.

Quoting @mbabker: "By your argument, we should have a _234 suffix in en-GB because Serbian uses that, or a _2 suffix because Polish uses that, or whatever other suffixes exist in languages that I don't have an immediate backup of by way of the downloads site."

No, it is not. The plural forms are taken care of automatically. It's just the _0 case.

mbabker - comment - 16 Jun 2018

Can we just stop here? I offered an improvement and withdrew it because everyone decided to focus on something completely off-topic and unrelated to the change proposal. I'm not going to keep arguing against the gatekeepers of the longest standing silo of Joomla.

brianteeman - comment - 16 Jun 2018

Blow up the silo!

Bakual - comment - 16 Jun 2018

It would of course help if you stop insulting people and making wrong assumptions.

mbabker - comment - 16 Jun 2018

And it would help if other people would stop assuming I'm out to cause malicious harm to teams and workflows.

Bakual - comment - 16 Jun 2018

Agreed.
