7at1blow
1 Aug 2021

Steps to reproduce the issue

  • J!4.0.0-rc5
  • add an article with a title which includes a German Umlaut (or several)
  • save article
  • see what the alias looks like

Expected result

  • The German Umlaut(s) "ä, ü and ö" are converted in the article alias into "ae, ue and oe", as is done automatically in J!3.x.

Actual result

  • The German Umlaut(s) "ä, ü and ö" are not converted automatically in the article alias into "ae, ue and oe".

System information (as much as possible)

  • J!4.0.0-rc5
  • PHP 8.0.8

Additional comments

I hope that I have not posted something that is already known.

avatar 7at1blow 7at1blow - open - 1 Aug 2021
avatar joomla-cms-bot joomla-cms-bot - change - 1 Aug 2021
Labels Added: ?
avatar joomla-cms-bot joomla-cms-bot - labeled - 1 Aug 2021
avatar 7at1blow 7at1blow - change - 1 Aug 2021
Title
Old: The German umlauts of an article title are no longer considered in its alias (like it is in J!3.x)
New: J!4.0.0-rc5: The German Umlauts of an article title are no longer considered in its alias (like it is in J!3.x)
avatar 7at1blow 7at1blow - edited - 1 Aug 2021
avatar Kostelano
Kostelano - comment - 1 Aug 2021

This is not something Joomla itself handles; it should be handled by a localization file (in your case, the German one). Here is an example of how this is implemented in the Russian-language package.

If I understand correctly, there is no such function in the file of the German language pack.

Sorry if I misunderstood the problem.
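
As a rough illustration of the localisation-file approach, a language pack can ship a class with a static transliterate() method that maps characters before the alias is built. The class name De_DELocalise and the mapping table below are assumptions made for this sketch, not the contents of the actual German or Russian pack:

```php
<?php
// Hypothetical localise class for a German language pack (sketch only).
abstract class De_DELocalise
{
    /**
     * Replace German umlauts and sharp s with their ASCII digraphs.
     */
    public static function transliterate(string $string): string
    {
        // Assumed mapping; a real pack may cover more characters.
        $map = [
            'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
            'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
            'ß' => 'ss',
        ];

        return strtr($string, $map);
    }
}

echo De_DELocalise::transliterate('Räsepöse'), PHP_EOL;
```

The sketch only replaces characters; lowercasing and the rest of the alias clean-up would happen elsewhere.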

avatar Fedik
Fedik - comment - 1 Aug 2021

Check the Global Configuration.
The option "unicode slugs" is set to "Yes" by default (that keeps umlauts and all UTF-8 symbols untouched).
If you do not want that, you should set it to "No".

avatar 7at1blow
7at1blow - comment - 1 Aug 2021

Yes, when I set the option "unicode slugs" (in German: "Unicode Aliase") to ON in J!4.0.0-rc5, all the Umlauts in the alias stay as they are in the title: an "ä" stays an "ä", and likewise for "ü" and "ö".
But when I switch this option to OFF, the "ä" is only converted into an "a" instead of an "ae".

In J!3.x the Umlauts are all converted correctly from the title to the title alias: "ä" automatically becomes "ae", "ö" becomes "oe" and "ü" becomes "ue". I'm missing this J!3 feature in J!4.

avatar richard67
richard67 - comment - 1 Aug 2021

In J!3.x the Umlauts are all converted correctly from the title to the title alias: "ä" automatically becomes "ae", "ö" becomes "oe" and "ü" becomes "ue". I'm missing this J!3 feature in J!4.

That’s what @Kostelano wrote, transliteration.

avatar brianteeman
brianteeman - comment - 1 Aug 2021

there is a library function for this.

avatar richard67
richard67 - comment - 1 Aug 2021

Can it be that it depends on the availability of the ICONV library?

avatar richard67
richard67 - comment - 1 Aug 2021

or was it INTL? Or either one of these?

avatar brianteeman
brianteeman - comment - 1 Aug 2021

libraries\src\Language\Transliterate.php

avatar Fedik
Fedik - comment - 1 Aug 2021

if I understand correctly, there is no such function in the file of the German language pack.

Hm, there is also no such thing in the Joomla 3 version, but it works there.
Something else has changed.

avatar ReLater
ReLater - comment - 1 Aug 2021

Just for the sake of completeness. Additional test on same server, same environment, same PHP libraries.
Joomla 4: Räsepöse gets converted to rasepose and that's wrong.
Joomla 3: Räsepöse gets converted to raesepoese and that's right.

"Same" Joomla configuration.

Please don't tell me that users now have to adapt any obscure files ;-)

avatar brianteeman
brianteeman - comment - 1 Aug 2021

@Fedik it works because the library is being used. I suspect the Russian language has to override it.
Sorry, I don't have time right now to look and see how/where it is being used.

Language isn't exactly my thing, but I would start by looking at the code that generates the alias.

avatar brianteeman
brianteeman - comment - 2 Aug 2021
Maybe I am missing something, but I couldn't get the umlauts to be transliterated as expected even with Joomla 3.

The relevant commit from 5 years ago, with a detailed explanation in the comments of how it is supposed to work, is #10348.

avatar brianteeman
brianteeman - comment - 2 Aug 2021

Maybe I am missing something, but I couldn't get the umlauts to be transliterated as expected even with Joomla 3.

Worked it out for Joomla 3: the UTF-8 alias option must be set to No, and either German has to be the only content language or German has to be selected as the language for the item.

Doesn't seem to work in J4 though

avatar Fedik
Fedik - comment - 2 Aug 2021

@richard67 was right, it somehow depends on the iconv library and its behavior.

In J4 it does not reach Transliterate::utf8_latin_to_ascii if the library exists:

// Override custom and core transliterate method with native php function if enabled
if (function_exists('transliterator_transliterate') && function_exists('iconv'))
{
    return iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $string));
}

But it is strange that iconv returns "a" instead of "ae".

avatar Fedik
Fedik - comment - 2 Aug 2021

Quick googling suggests using de-ASCII for it, e.g. de-ASCII; Any-Latin; Latin-ASCII ...
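
To compare the two rule chains on a given host, something like the following could be run. It is only a sketch: compareRules is a made-up helper, the intl extension must be loaded, and what each chain returns depends on the ICU version installed on the server, so no output is shown here.

```php
<?php
// Hypothetical helper: run the same string through both ICU rule chains.
function compareRules(string $text): ?array
{
    // transliterator_transliterate() is only available with the intl extension.
    if (!function_exists('transliterator_transliterate')) {
        return null;
    }

    $chains = [
        'generic' => 'Any-Latin; Latin-ASCII; Lower()',
        'german'  => 'de-ASCII; Any-Latin; Latin-ASCII; Lower()',
    ];

    $results = [];
    foreach ($chains as $name => $rules) {
        // The call yields false if ICU cannot build a transliterator from the rules.
        $results[$name] = @transliterator_transliterate($rules, $text);
    }

    return $results;
}

var_dump(compareRules('Räsepöse'));
```

A false entry for a chain would mean the installed ICU does not know one of the rules in it.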

avatar 7at1blow
7at1blow - comment - 2 Aug 2021

Thank you all for taking on the matter!

@ReLater
Were the two installations in your two tests including an installed German language file or still only with the English?

avatar brianteeman
brianteeman - comment - 2 Aug 2021

@Fedik
Wow, that's weird, to override both the custom and the core transliterations.

avatar brianteeman
brianteeman - comment - 2 Aug 2021

So the PR that introduced that (#27974) claimed to be fully b/c but was not tested by anyone before it was merged.

Now we see why every PR must be tested no matter who the contributor is.

@richard67 please mark this as a release blocker

As it is right now, any custom transliteration, e.g. the Russian one shown above, will never be used, which is surely wrong.

avatar richard67 richard67 - change - 2 Aug 2021
Labels Added: ?
avatar richard67 richard67 - labeled - 2 Aug 2021
avatar Fedik
Fedik - comment - 2 Aug 2021

I think using transliterator_transliterate is a good idea in general,
it just needs some more config.

confirming that removing that code does correctly use the core transliterations and for german you correctly get

can you please also check whether this iconv() call will work:

return iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", transliterator_transliterate('de-ASCII; Any-Latin; Latin-ASCII; Lower()', $string)); 
avatar Fedik
Fedik - comment - 2 Aug 2021

In a quick test, 'de-ASCII; Any-Latin; Latin-ASCII; Lower()' works well for umlauts, Cyrillic and Chinese.

I will try to make a PR later, if no one is faster ;)

avatar brianteeman
brianteeman - comment - 2 Aug 2021

can you please also check whether this iconv() call will work:

confirmed for german

avatar ReLater
ReLater - comment - 2 Aug 2021

Sorry, but when my preferred backend or main language is English or whatever and I edit a German article, how do you want to identify de-ASCII then? That's just another bad solution for something that we never needed, or only need in rare cases.

It's not possible to select a language for articles in Joomla 4 on monolingual sites; and that would only be an emergency workaround anyway: select the language to create the correct alias, then switch back to All and save again.

All we can do then is write custom plugins on affected sites that create the alias in onContentBeforeSave.

Let people at least decide by configuration if they really want this unnecessary iconv-transliterate sh... or want to use the well working old behavior.

avatar ReLater
ReLater - comment - 2 Aug 2021

Were the two installations in your two tests including an installed German language file or still only with the English?

Both English and German installed. Backend set to German while writing the article. In the end I disabled content language English in J!4. No changes at all with added function in de-DE.localise.php on Joomla 4.

avatar Fedik Fedik - change - 2 Aug 2021
Status: New → Closed
Closed_Date: 0000-00-00 00:00:00 → 2021-08-02 15:00:10
Closed_By: Fedik
avatar Fedik Fedik - close - 2 Aug 2021
avatar Fedik
Fedik - comment - 2 Aug 2021

please test #35029

avatar richard67 richard67 - unlabeled - 2 Aug 2021
avatar 7at1blow
7at1blow - comment - 4 Aug 2021

May I ask a few more comprehension questions on the matter?

1.)
As I understood it, the (or one) cause of the incorrect conversion of the umlauts in the title alias of my 4.0-rc5 installation was old files (ICU Data, ...) on the server OS of the provider.

On the same webspace (i.e. with the same environment as for the 4.0-rc5-installation) I also have a J3.9 installation. But there the umlaut problem never existed.

Why is it that the problem with the umlaut conversion does not exist in my 3.9? Has the cause in my 4.0-rc5 perhaps been different (than too old ICU Data, ...)?

2.)
I've compared the Language.php that my 4.0-rc5 had until yesterday with the Language.php in the current zip file for the complete installation of 4.0-rc5. The code in both files is 100 % identical. In this respect I am glad that (at least in this point) everything went correctly with my updates from beta6 (at installation) to my rc5 now.

3.)
I have another 4.0-rc5 installation that I started in March with beta7. I'm using this installation to build a private site (with a few hundred articles) and have planned that it will go productive hopefully later this year. In light of this thread, I wonder if it was a mistake of mine to start with beta7 for my future productive site.

What is your opinion about this? Would it have been better to wait until the first official 4.0.0 is released to start building a site that will be productive in the future? (I hope not, because I've already put a lot of work into it and don't want to start all over again. But better now than to find out in a few months that I have to start all over again because of some problems.)

Thank you very much for your advice!

avatar richard67
richard67 - comment - 4 Aug 2021

Why is it that the problem with the umlaut conversion does not exist in my 3.9? Has the cause in my 4.0-rc5 perhaps been different (than too old ICU Data, ...)?

The problem existed only in J4 because J4 used an additional transliteration method to fix transliteration problems with languages other than German. As that is new functionality, and for these languages a potential backwards compatibility break, it was decided to do that in J4 only and not in J3.

As your issue has now shown, it is not guaranteed that the new method always produces the desired result if the ICU library, which is part of the OS and used by the PHP method, is outdated.

Therefore the fix in J4 changes the order of processing, i.e. which of the transliteration methods are tried first (not all methods may be available on all hosts), and improves the test for incomplete or missing transliteration.
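
The reordering described above might look roughly like the sketch below. It is not the code from the actual fix: transliterateSketch, its $custom parameter and the tiny $coreMap table are stand-ins invented here. The idea is to try the language pack's own method first, then a built-in table, and only fall back to the native PHP functions if non-ASCII bytes remain.

```php
<?php
// Sketch of a fallback chain; all names here are hypothetical.
function transliterateSketch(string $string, ?callable $custom = null): string
{
    // 1. A transliterator supplied by the language pack takes precedence.
    if ($custom !== null) {
        $string = $custom($string);

        // Done if only ASCII bytes are left.
        if (!preg_match('/[\x80-\xff]/', $string)) {
            return strtolower($string);
        }
    }

    // 2. Built-in table (tiny stand-in for a full Latin-to-ASCII map).
    $coreMap = [
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ß' => 'ss',
    ];
    $string = strtr($string, $coreMap);

    // 3. Native functions only if something is still untransliterated.
    if (preg_match('/[\x80-\xff]/', $string)
        && function_exists('transliterator_transliterate')
        && function_exists('iconv')
    ) {
        $latin = @transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $string);

        if ($latin !== false) {
            $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $latin);

            if ($ascii !== false) {
                return $ascii;
            }
        }
    }

    return strtolower($string);
}

echo transliterateSketch('Räsepöse'), PHP_EOL;
```

With this ordering, 'Räsepöse' is resolved by the table in step 2 and comes out as raesepoese, whatever the state of the ICU data on the host.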

What is your opinion about this? Would it have been better to wait until the first official 4.0.0 is released to start building a site that will be productive in the future? (I hope not, because I've already put a lot of work into it and don't want to start all over again. But better now than to find out in a few months that I have to start all over again because of some problems.)

Updating between beta and RC versions is officially supported and has been working, except for Beta 3, when we had a small problem which required an additional step (SQL fix). But as you started with Beta 7, all should be OK if you stay with the release candidates now and then go to the stable package. There could only be a problem if you had used nightly developer builds, or if one of the past updates had failed and you had to fix database problems. But if none of this is the case, your site should be OK.

So in shorter words: There should not be a problem, and there were no big changes since Beta 7 and will not be now with RC which would make everything look or work completely different.

So I think it was not a bad idea to start with 4.0 Beta 7.

The only reason to wait with J4 now would be if you want to use some 3rd party extensions and they are not ready for J4 yet.

avatar 7at1blow
7at1blow - comment - 4 Aug 2021

Thank you very, very much, Richard! For your feedback, your assessment, your very understandable explanations and all your helpful support!

avatar 7at1blow 7at1blow - change - 7 Aug 2021
Title
Old: J!4.0.0-rc5: The German Umlauts of an article title are no longer considered in its alias (like it is in J!3.x)
New: [4.0-rc5] The German Umlauts of an article title are no longer considered in its alias (like it is in J!3.x)
avatar 7at1blow 7at1blow - edited - 7 Aug 2021
