No Code Attached Yet bug
weeblr
19 Oct 2021

Steps to reproduce the issue

Have an article with an apostrophe in its title, such as Let's go.

Expected result

Generated alias is let-s-go

Actual result

The expected alias above is what Joomla 3 generates. In Joomla 4, the apostrophe is stripped and the alias is lets-go
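To make the difference concrete, here is a minimal Python sketch of the two behaviors described above. It is not the Joomla code: the function names are hypothetical, transliteration is left out, and the regular expression is only an assumption about how non-alphanumeric characters end up as dashes.

```python
import re

def alias_keep_dash(title: str) -> str:
    """Reported Joomla 3 behavior: an apostrophe ends up as a dash."""
    s = title.replace("-", " ").strip().lower()
    # Collapse whitespace and any other non-alphanumeric run into one dash.
    s = re.sub(r"(\s|[^a-z0-9\-])+", "-", s)
    return s.strip("-")

def alias_strip_apostrophe(title: str) -> str:
    """Reported Joomla 4 behavior: apostrophes are removed before dashes are added."""
    s = title.replace("-", " ").replace("'", "").strip().lower()
    s = re.sub(r"(\s|[^a-z0-9\-])+", "-", s)
    return s.strip("-")

print(alias_keep_dash("Let's go"))         # let-s-go
print(alias_strip_apostrophe("Let's go"))  # lets-go
```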

Additional comments

I think this is a significant backward compatibility issue that will break a lot of URLs if people upgrade a Joomla 3 site to Joomla 4.

I have seen a number of reported issues with non-latin characters such as #27875 and #35014. Those appeared to be related to transliteration happening in a different way, and the language pack being involved.

This is a bit different in that it happens with stock, en-GB Joomla content.

I have not taken the time to look into the code. This was reported on a French group by others, and I tested to see whether it also appears on stock en-GB Joomla. I think this should be reverted to the Joomla 3 behavior, mostly for content B/C reasons and because of the expected ranking drops when a URL changes without the site owner noticing.

Note that I tested with other usual suspects such as $ or #, and Joomla 3 and 4 behavior appears to be the same for these characters.

Has this been reported before?

weeblr - open - 19 Oct 2021
joomla-cms-bot - change - 19 Oct 2021
Labels Added: No Code Attached Yet
joomla-cms-bot - labeled - 19 Oct 2021
weeblr - change - 19 Oct 2021
The description was changed
weeblr - edited - 19 Oct 2021
brianteeman - comment - 19 Oct 2021

It was a deliberate change

weeblr - comment - 19 Oct 2021

Care to expand, or link to somewhere this was discussed, so we can get an idea of the reasons behind it?

(I looked through 18 pages of "alias" related Github issues before posting the above, could not find anything)

infograf768 - comment - 19 Oct 2021

In any case, the URL(s) that were created in J3 will not be modified.

weeblr - comment - 19 Oct 2021

Hi @infograf768 :)

You're right, existing aliases won't be modified, except in edge cases where articles or menu items are deleted and then re-created for instance.

So this likely is not a general B/C break I guess.

brianteeman - comment - 19 Oct 2021
weeblr - comment - 19 Oct 2021

@brianteeman OK, I see. I can see it kind of works OK for the English language, but not so much for French, where the apostrophe is often going to be near the start of a word (d'évaluer and dévaluer really are not the same word, for instance).

Too late to change now anyway.

brianteeman - comment - 19 Oct 2021

Not tested, but can't the custom transliterate function in a language pack address this? https://docs.joomla.org/J3.x:Making_a_Language_Pack_for_Joomla#Example_2_-_Custom_transliteration_implemented

weeblr - comment - 19 Oct 2021

It should: the call to transliterate() happens before the regular expression that drops anything but Latin alphanumeric characters.

That's likely where the change should have happened, in the en-* localise.php files.

I'll suggest that to the French localization team.

PS: I noticed that your change was done in src/j4/libraries/src/Filter/OutputFilter.php but not in the vendored folder: src/j4/libraries/vendor/joomla/filter/src/OutputFilter.php, so there's likely an inconsistency risk down the line, right?

weeblr - comment - 19 Oct 2021

OK, this kind of works, but likely won't with multilingual sites.

The thing is, the transliterator called is the one corresponding to the current admin language.

So if the backend is displayed in French and you create a French article (language set to FR in the article options), then the transliterate() method in fr-FR.localise.php is used, and I can put a change there to preserve the apostrophe.

However, if the backend is set to English and I create/edit the same article, then the default localise.php file is used, and my custom transliteration for French is not applied.

So for most people I guess adding ' ' => '\'', at the top of the $glyph_array in transliterate() will do the job.

For multilingual sites, they'll be better off adding a transliterate method in the default localise.php, so that it applies to all items.

Again, I'd think this should be done in per language transliterate but it's too late now for 4.x.
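The glyph-array tweak suggested above can be sketched as follows. This is a Python stand-in rather than the PHP from a localise.php file: `GLYPHS` only mirrors the replacement => comma-separated glyphs shape of `$glyph_array`, the function names are hypothetical, and the apostrophe-stripping step is assumed to run after transliteration, as described earlier in this thread.

```python
import re

# Replacement => comma-separated glyphs, mirroring the shape of $glyph_array.
GLYPHS = {
    " ": "'",        # apostrophe becomes a space, which later turns into a dash
    "e": "é,è,ê,ë",  # tiny sample of accented letters, for illustration only
}

def transliterate(text: str) -> str:
    """Replace every listed glyph with its replacement, after lowercasing."""
    text = text.lower()
    for replacement, glyphs in GLYPHS.items():
        for glyph in glyphs.split(","):
            text = text.replace(glyph, replacement)
    return text

def make_alias(title: str) -> str:
    s = transliterate(title.replace("-", " "))
    # Assumed Joomla 4 step: strip any apostrophe that is still present.
    s = s.replace("'", "")
    # Collapse whitespace and other non-alphanumeric runs into a single dash.
    s = re.sub(r"(\s|[^a-z0-9\-])+", "-", s.strip())
    return s.strip("-")

print(make_alias("Let's go"))   # let-s-go
print(make_alias("d'évaluer"))  # d-evaluer
print(make_alias("dévaluer"))   # devaluer
```

Because the apostrophe is already a space by the time the stripping step runs, the two French words keep distinct aliases in this sketch.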

Hackwar - change - 22 Feb 2023
Labels Added: bug
Hackwar - labeled - 22 Feb 2023
brianteeman - comment - 23 Aug 2024

Again, I'd think this should be done in per language transliterate but it's too late now for 4.x.

Is there anything left to do here, or should it be closed?
