? ? Failure

User tests: Successful: Unsuccessful:

avatar infograf768
infograf768
27 Jun 2017

See #12049

Summary of Changes

This PR takes advantage of the PHP extension intl when it is enabled to use the transliterator_transliterate() method, itself using ICU library.

The php extension is available since php 5.4.0, but may not be enabled on some hosts.

If disabled, former behavior is used which does not accept to upload files with UTF8 names and displays an error. This PR is therefore totally BC.

Using iconv and IGNORE let's get rid of some prime-characters that can't be transliterated, like the Cyrillic letter ь.

Testing Instructions

Check in System Information => PHP Information that the extension is enabled:
You should get something like this:

screen shot 2017-06-27 at 10 14 47
(It is more likely to be enabled by default when using php 5.5.x or higher).
If it is not enabled on your local environment, try to modify your PHP.ini.

Change an image file name to utf8.
I used here '完 成'".png to be sure we also get rid of unwanted characters.

screen shot 2017-06-27 at 10 02 22

Go to Media Manager =>Upload, select the file and click upload.
In the case above I get a pure ascii name wan-cheng.png:

screen shot 2017-06-27 at 10 03 28

Test with other utf8 fine names.

NOTE: J4 may use a similar code.

@ggppdk @schnuti

avatar infograf768 infograf768 - open - 27 Jun 2017
avatar infograf768 infograf768 - change - 27 Jun 2017
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 27 Jun 2017
Category Administration com_media
avatar brianteeman
brianteeman - comment - 27 Jun 2017

What will happen if I move a site from a server that does use this php function to a server that does not?

avatar infograf768
infograf768 - comment - 27 Jun 2017

What will happen if I move a site from a server that does use this php function to a server that does not?

Nothing, as the files already uploaded are OK with their ascii names.
New uploads of files with utf8 names will display the usual error.

avatar ggppdk
ggppdk - comment - 27 Jun 2017

Just i still think this new code should go inside makeSafe() !
so that benefit will be everywhere that makeSafe() is used

avatar infograf768
infograf768 - comment - 27 Jun 2017

I was not sure we could do that with unwanted consequences. Therefore why I limited it to com_media.
@joomla/cms-maintainers
What do you think?
I can modify easily.

avatar zero-24
zero-24 - comment - 27 Jun 2017

I was not sure we could do that with unwanted consequences.

What kind of consequences do you expect by moving that code?

avatar infograf768
infograf768 - comment - 27 Jun 2017

Will move the code to makesafe()

avatar infograf768 infograf768 - change - 27 Jun 2017
Labels Added: ?
avatar joomla-cms-bot joomla-cms-bot - change - 27 Jun 2017
Category Administration com_media Libraries
avatar infograf768
infograf768 - comment - 27 Jun 2017

Moved the code to the makeSafe() method.

@ggppdk @zero-24 @schnuti
Please test.

avatar infograf768 infograf768 - change - 27 Jun 2017
Title
Media Manager: Transliteration of UTF8 file names when uploading
Media Manager: Transliteration of UTF8 file names when uploading (makeSafe)
avatar infograf768 infograf768 - edited - 27 Jun 2017
avatar infograf768 infograf768 - change - 27 Jun 2017
Title
Media Manager: Transliteration of UTF8 file names when uploading
Media Manager: Transliteration of UTF8 file names when uploading (makeSafe)
avatar infograf768 infograf768 - change - 27 Jun 2017
The description was changed
avatar infograf768 infograf768 - edited - 27 Jun 2017
avatar schnuti
schnuti - comment - 30 Jun 2017

👍 Thanks @infograf768. It works for me.
Tested with different filenames. German, French, Swedish, Czech, Greek, Bulgarian, Chinese.

Personally I would prefer if only one hyphen is used between characters and no hyphen at the end of the name.
e.g. Åländsk höna.jpg (5 blanks) => alandsk-----hona.jpg

Underscore seems to be an allowed character. Is it? And in this case no hyphen was added.

# % & { } $ ! @ _.jpg (blanks between the characters) => _.jpg

Edit: The first character made a header of this line.

As I think the underscore should be transformed to a hyphen and no hyphen at the end of the file name, this one should fail.

I'll make some more tests with disallowed characters but I find this only as details and do not really influence the successful test.

To testers: Check if intl is activated as described above.
On my XAMPP system I had to activate the intl extension in php.ini
Look for the line
;extension=php_intl.dll
Remove the semicolon, save the file and restart the server.

avatar ggppdk ggppdk - test_item - 30 Jun 2017 - Tested successfully
avatar ggppdk
ggppdk - comment - 30 Jun 2017

I have tested this item successfully on c75e759

With intl enabled i get transliteration + existing behaviour
Without intl enabled i get existing behaviour

about:

As I think the underscore should be transformed to a hyphen and no hyphen at the end of the file name, this one should fail.

underscore is a valid character, let's not add such a replacement


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/16878.

avatar schnuti schnuti - test_item - 30 Jun 2017 - Tested successfully
avatar schnuti
schnuti - comment - 30 Jun 2017

I have tested this item successfully on c75e759

I now agree. Keep character tests as is. What I wrote is too opinioned.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/16878.

avatar franz-wohlkoenig franz-wohlkoenig - change - 30 Jun 2017
Status Pending Ready to Commit
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 30 Jun 2017

RTC after two successful tests.

avatar infograf768 infograf768 - change - 30 Jun 2017
Milestone Added:
avatar infograf768
infograf768 - comment - 30 Jun 2017

Thanks for testing.
please also test #16880

@ggppdk @schnuti

avatar brianteeman
brianteeman - comment - 30 Jun 2017

i still have doubts about introducing functionality that will not exist for over 10% of the users

avatar infograf768
infograf768 - comment - 30 Jun 2017

Always so positive... even if it was only 10% (where do you take this figure from?) it will help those concerned.
See btw https://w3techs.com/technologies/details/pl-php/5/all = 51% use php >=5.4
php 7 getting in slowly https://w3techs.com/technologies/details/pl-php/7/all

Already shows the extension is present on all these servers. It may or may not be enabled. Impossible to know... I know it is enabled by default on my share host which runs 5.6.30

In a discussion with @mbabker in maintainers chat, he proposed we add in j4 the symphony INTL as a J library, which will make this "new" functionality independent from the php extension itself.

avatar brianteeman
brianteeman - comment - 30 Jun 2017

Always so positive... even if it was only 10% (where do you take this figure from?) it will help those concerned.

From the recorded Joomla stats https://developer.joomla.org/about/stats.html

Over the years there have been lots of great proposals that couldn't move forward because of the php version that we support.

avatar infograf768
infograf768 - comment - 30 Jun 2017

We do not read these stats with the same glasses. They just say that 5.3 is used by 13% and all others >=5.4 therefore a possible share of 87%.
The stats do not check for now if a specific function exists. And, anyway, this PR is totally B/C.
Thanks for your concern.

avatar brianteeman
brianteeman - comment - 30 Jun 2017

For someone who cares so much about B/C thats a disappointing viewpoint

avatar dgrammatiko
dgrammatiko - comment - 30 Jun 2017

@brianteeman mate you are wrong here! What @infograf768 is doing here is totally B/C and also is progressive enhancement. I can't see what more can be done in a sensible manner here (given that coding for an unsupported version of PHP is just a waste of time).

avatar ggppdk
ggppdk - comment - 30 Jun 2017

The code does not break for PHP 5.3, so it is B/C enough

They just say that 5.3 is used by 13% and all others >=5.4 therefore a possible share of 87%.

possibly you could subtract those sites using mostly latin-based filenames, english, french, etc, because these already have little problems, since they get their filename proper already with just the accented characters removed

And finally users with of non-latin based languages, will be able to both

  • upload files even if all filename characters are non latin characters
  • and get some "usuable" filenames too

what is currently happening with these filenames really makes Joomla look bad

avatar mbabker
mbabker - comment - 30 Jun 2017

We've been very careful to never add functionality that results in different behaviors on different PHP versions. This is crossing that line (hell I'm not even comfortable with my last libsodium PR because it isn't available on PHP 5.3). There really needs to be a majorly strong supporting use case to break this philosophy now, especially as we are slowing down development on the 3.x branch and begin to focus on stability for our current definition of "long term support".

avatar brianteeman
brianteeman - comment - 30 Jun 2017

thank you for expressing more eloquently what I tried to say

avatar schnuti
schnuti - comment - 30 Jun 2017

possibly you could subtract those sites using mostly latin-based filenames, english, french, etc, because these already have little problems, since they get their filename proper already with just the accented characters removed

You can't really subtract Latin filenames, only the ASCII characters (English). For others you often or mostly get unreadable filenames as the result.

avatar brianteeman
brianteeman - comment - 30 Jun 2017

Can we please remove the RTC label -

avatar infograf768
infograf768 - comment - 30 Jun 2017

Can we please remove the RTC label -

I don't see why. Release leaders may decide to merge or not, not you.

@mbabker
As it is totally B/C, I see no reason to not include this in the next 3.x version.
It will save time for many people who have the extension present (as it will avoid time loss for those who want to use transliteration instead of unicode slugs to get auto-ascii aliases) and 3.x is going to be available for 2 years more! It does not cost a cent to get it in.

avatar brianteeman
brianteeman - comment - 30 Jun 2017

It is not about BC but exactly what the release leader for the next release has stated.

We've been very careful to never add functionality that results in different behaviors on different PHP versions

avatar infograf768
infograf768 - comment - 30 Jun 2017

The behavior is NOT different fundamentaly, it just saves time for people who are lucky enough to have that extension loaded.

avatar mbabker
mbabker - comment - 30 Jun 2017

As it is totally B/C, I see no reason to not include this in the next 3.x version.

Yes it is B/C. It does not change existing behavior for PHP 5.3 users. That part is not in dispute. What is in dispute though is this introduces a behavior where PHP 5.3 users get logic based on one workflow and PHP 5.4+ users potentially get logic based on another workflow (IIRC the transliterator stuff is always enabled, but iconv is optional). This can result in this function producing different results in different hosting environments, or even pre- and post- upgrade for users moving off of PHP 5.3. That is what makes me uncomfortable accepting this into 3.x. Our API should be consistent regardless of the user being hosted on a PHP 5.3 platform or PHP 7.1, and generally we've done a good job at keeping things consistent.

To be clear, I'm fine with this improvement in general. I'm fine with it landing in 4.0. I'm just not 100% comfortable introducing this into 3.x.

avatar brianteeman
brianteeman - comment - 30 Jun 2017

If a = 1 do X
If a = 2 do Y

That is a fundamental different behaviour to me

avatar infograf768
infograf768 - comment - 30 Jun 2017

if a = 1 displays Error
if a = 2 does not display Error

Is indeed different. ROTFLOL
Whatever...

avatar zero-24
zero-24 - comment - 30 Jun 2017

What is about that symfony lib? Don't include this lib the fallbacks?

avatar mbabker
mbabker - comment - 30 Jun 2017

There is not a polyfill for the transliterator_transliterate function. There are polyfills available for the iconv, intl, and mbstring PHP extensions.

avatar infograf768
infograf768 - comment - 30 Jun 2017

intl does contains transliterator_transliterate afaik

avatar mbabker
mbabker - comment - 30 Jun 2017

Even though extensions get polyfilled, not all of their functionality gets replicated in all cases. In Symfony's package, transliterator_transliterate is not one of the replicated things. And on a quick Google search, I did not see a polyfill for that function.

avatar infograf768
infograf768 - comment - 1 Jul 2017

Therefore, if I understand well, even for 4.0, if the php intl extension is not enabled, we will be exactly in the same situation as for 3.x.

This can result in this function producing different results in different hosting environments

avatar rdeutz
rdeutz - comment - 6 Jul 2017

I also think we shouldn't include it in 3.x, it is a difference if we have a different system behaviour on enable/disables php extensions or different php versions. The first one is something you can fix be enabling the extension the second one can't be fixed. Even if this will help some users and improve the system for them, we shouldn't start following the path making the system behave different for different PHP Version on purpose.

avatar rdeutz rdeutz - change - 6 Jul 2017
Milestone Removed:
avatar rdeutz rdeutz - change - 6 Jul 2017
Status Ready to Commit Needs Review
Labels Added: ?
avatar franz-wohlkoenig franz-wohlkoenig - change - 19 Aug 2017
Status Needs Review Discussion
avatar schnuti
schnuti - comment - 14 Oct 2017

@rdeutz @infograf768 @ggppdk @mbabker and all of you
I suppose it's time to re-evaluate now and set a 4.0 milestone label.
It should really be used in the new media manager.
As it is at the moment, it even got worse as any dashes (-) get removed.

avatar franz-wohlkoenig franz-wohlkoenig - change - 27 Oct 2017
Status Discussion Needs Review
avatar brianteeman
brianteeman - comment - 29 Dec 2019

@HLeithner This really should be closed. It's never going to get merged in J3

avatar HLeithner
HLeithner - comment - 29 Dec 2019

Similar pr then the other, a rebase to j4 would make sense if it's not fixed already...

avatar HLeithner HLeithner - change - 29 Dec 2019
Status Needs Review Closed
Closed_Date 0000-00-00 00:00:00 2019-12-29 22:52:16
Closed_By HLeithner
avatar HLeithner HLeithner - close - 29 Dec 2019
avatar schnuti
schnuti - comment - 30 Dec 2019

Are you aware that filenames coming from MS-Windows (Unicode) are translated to utf-8 since PHP version 7.2. i.e. makeFilesafe can be modified now that J4 Beta has the requirement min 7.2.5
It was not possible to change it for the New MediaManager until this decision was made.
I use this behaviour myself since the PHP 7.2 release. I e.g. store filenames with UTF-8 characterset in the database without any noticed problems
I have no clue if the New MediaManager will be backported to J3.10

avatar infograf768
infograf768 - comment - 30 Dec 2019

@schnuti
Not sure what you mean. Afaik one can’t use pure utf8 characters in a path (the image name or anything passing through makeSafe() will be part of a path).
The only way is using percent encodings (same method we use for unicode slugs) which may create problems for file names.
I am discussing now with @HLeithner about including or not the patch above (and the other one specific for aliases) in J4.

avatar schnuti
schnuti - comment - 30 Dec 2019

I don't know any cases where percent encoding is needed. As far as I can see, the browser or the javascript translates the characters as needed.
Exampel:
pasted in the browsers address field or used browsers Open File:
file:///D:/images/_Üb er_größen-längeå.jpg => shows the image

This copied from the browsers address field, pasted to the windows simple editor:
file:///D:/images/_%C3%9Cb%20er_gr%C3%B6%C3%9Fen-l%C3%A4nge%C3%A5.jpg

javascript is internally using %-encoding if needed as I've seen when debugging.

The same is valid for using alias with those characters. No router problems as far as I can see.

avatar HLeithner
HLeithner - comment - 30 Dec 2019

I have no clue if the New MediaManager will be backported to J3.10
This part is easy, no it will not be backported, 3.10 is a compatibility release for J4 with no new planed features.

This patch is for saving files and to have a filename that does a bit of a meaning for non latin languages. I'm not sure if restricting filenames to latin characters only is necessary but I also see no reason to remove this limitation atm.

As poly-fill to intl extension we could use https://github.com/LukeMadhanga/transliterator for example. But only for J4+

What's @wilsonge opinion?

avatar infograf768
infograf768 - comment - 30 Dec 2019

As poly-fill to intl extension we could use https://github.com/LukeMadhanga/transliterator for example. But only for J4+

Looks like it is rather limited.

avatar brianteeman
brianteeman - comment - 30 Dec 2019

what can it not do that the intl extension can do?

avatar infograf768
infograf768 - comment - 30 Dec 2019

Look by yourself which character sets it deals with in its description...

avatar brianteeman
brianteeman - comment - 30 Dec 2019

what can it not do that the intl extension can do?

avatar HLeithner
HLeithner - comment - 30 Dec 2019

I would say it misses any asia language.

avatar infograf768
infograf768 - comment - 30 Dec 2019

and arabic, and hebrew, persian, etc.
it is a simple transliterator, just a bit more complex than the default core one.

the php intl extension uses ICU as clearly explained in this PR description.

avatar wilsonge
wilsonge - comment - 16 Feb 2020

Sorry for the delay replying here - There's no intention to backport new media manager into 3.10 - if nothing else browser requirements will kill us. I'm more than happy to see this go into J4 media manager though - will leave 3.x decision to @HLeithner

avatar HLeithner
HLeithner - comment - 20 Feb 2020

We have it in j4 now, I don't want to change this in J3. thx

avatar infograf768
infograf768 - comment - 20 Feb 2020

@HLeithner
Nope, we do not have it in J4.
It was only done for the url alias.
I have prepared a patch for J4 and asked @wilsonge if Ok to propose.

Add a Comment

Login with GitHub to post a comment