avatar osignell
osignell
9 Sep 2015

Steps to reproduce the issue

  • Create an image with the name "åäö.jpg" or any other filename consisting ONLY of non-English characters.
  • Upload it via the Media Manager in the backend.

Expected result

The file is uploaded and stored with a "safe" filename.

Actual result

The file is not uploaded, and the error message "This file type is not supported." is displayed.

System information (as much as possible)

Joomla 3.4.3, Linux, Apache, PHP 5.5

Additional comments

The makeSafe function strips all non-English characters from the filename, leaving only "jpg", which does not qualify as a valid filename.

Could we do this the same way as in JFilterOutput::stringURLSafe() and use JFactory::getLanguage()->transliterate() on the filename before running the stripping regexp?
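
Editor's sketch of the failure described above, under stated assumptions: the function name `stripNonAscii` is hypothetical, and the patterns are borrowed from the makeSafe snippet quoted later in this thread, so they may not match the Joomla 3.4.3 source exactly. The point is only that an ASCII-only character class removes every byte of "åäö" and the leading dot, so just the extension survives.

```php
<?php
// Hypothetical helper: approximates an ASCII-only filename filter.
// Assumption: patterns taken from the makeSafe snippet quoted in this thread.
function stripNonAscii($name)
{
    $regex = array(
        '#(\.){2,}#',             // collapse runs of two or more dots
        '#[^A-Za-z0-9\.\_\- ]#',  // drop everything outside the ASCII whitelist
        '#^\.#',                  // drop a leading dot
    );

    // Patterns are applied in order, each on the result of the previous one.
    return preg_replace($regex, '', $name);
}

var_dump(stripNonAscii('åäö.jpg'));  // string(3) "jpg"
var_dump(stripNonAscii('1åäö.jpg')); // string(5) "1.jpg"
```

With only "jpg" left there is no dot and therefore no recognizable extension, which would explain the "file type is not supported" message.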

avatar osignell osignell - open - 9 Sep 2015
avatar brianteeman
brianteeman - comment - 9 Sep 2015

Issue confirmed
Another example is a file named 1åäö.jpg, which will be uploaded as 1.jpg

avatar ryandemmer
ryandemmer - comment - 10 Sep 2015

This needs to be language independent, so it should not require additional processing through a language file.

The regular expression used by JFile::makeSafe needs to use a PCRE modifier to allow UTF-8 characters, eg:

$regex = array('#(\.){2,}#', '#[^a-zA-Z0-9_\.\-~\p{L}\p{N}\s ]#u');

However, this requires PCRE 5 support.

See http://php.net/manual/en/regexp.reference.unicode.php

JCE already handles this quite well I think, so one might consider a variation of:
https://github.com/widgetfactory/jce/blob/master/components/com_jce/editor/libraries/classes/utility.php#L146-L215
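
Editor's sketch of how the Unicode-aware pattern above behaves, assuming a PHP build whose PCRE has Unicode property support. The function name `stripUnsafeUnicode` is hypothetical; the two patterns are the ones from the comment above.

```php
<?php
// Hypothetical helper wrapping the Unicode-aware patterns from the comment above.
function stripUnsafeUnicode($name)
{
    $regex = array(
        '#(\.){2,}#',                            // collapse runs of two or more dots
        '#[^a-zA-Z0-9_\.\-~\p{L}\p{N}\s ]#u',    // keep any Unicode letter or number
    );

    // Note: with the "u" modifier, preg_replace() returns null
    // if the subject is not valid UTF-8.
    return preg_replace($regex, '', $name);
}

var_dump(stripUnsafeUnicode('åäö.jpg'));    // unchanged: "åäö.jpg"
var_dump(stripUnsafeUnicode('代替品.jpg')); // unchanged: "代替品.jpg"
var_dump(stripUnsafeUnicode('a<b>.jpg'));   // "ab.jpg"
```

This approach keeps the original characters rather than transliterating them, so the stored filename is only as portable as the filesystem and web server allow.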

avatar Bakual
Bakual - comment - 10 Sep 2015

I wouldn't make it too complicated.
I would use the JApplication::stringURLSafe() method which takes into account the setting from the config if we want to allow unicode alias or not.
If the resulting filename doesn't contain any valid characters anymore, either show an error or just create a filename based on datetime.

Something along the lines of what I do in my own extension: https://github.com/Bakual/SermonSpeaker/blob/master/com_sermonspeaker/admin/controllers/file.json.php#L75-L88

// Make filename URL safe. Eg replaces ä with ae.
$file['name'] = JFilterOutput::stringURLSafe(JFile::stripExt($file['name'])) . '.' . $ext;

// Make the filename safe
$file['name'] = JFile::makeSafe($file['name']);

// Replace spaces in filename as long as makeSafe doesn't do this.
$file['name'] = str_replace(' ', '_', $file['name']);

// Check if filename has more chars than only underscores, making a new filename based on current date/time if not.
if (count_chars(JFile::stripExt($file['name']), 3) == '_')
{
    $file['name'] = JFactory::getDate()->format("Y-m-d-H-i-s") . '.' . $ext;
}
avatar ryandemmer
ryandemmer - comment - 10 Sep 2015

JApplication::stringURLSafe() makes the string lowercase and removes the period character.

avatar mbabker
mbabker - comment - 10 Sep 2015

It still uses the language system, and URL-safe strings don't follow the same rules as filesystem-safe strings, so I wouldn't necessarily go that route.


avatar Bakual
Bakual - comment - 10 Sep 2015

I don't have an issue with it using language functions. I'd say there is a high chance that the special characters in the filename come from the same language the user has active.
Also, lowercasing the filename isn't necessarily a bad thing when it comes to filenames that end up in URLs. It solved a lot of support requests in the case of my extension :smile:
But it may indeed be the wrong approach for the core.

avatar Fedik
Fedik - comment - 10 Sep 2015

What about using JLanguageTransliterate::utf8_latin_to_ascii() directly? :smile:

avatar JoomliC
JoomliC - comment - 18 Dec 2015

So it seems that the issue with special characters was solved?

But isn't it possible to allow spaces in uploaded filenames and convert each space to a dash?

I agree with @Bakual that a URL-safe name could be good, to prevent issues with PDF files in old browsers if we allow spaces (e.g. allow spaces in the uploaded file, convert with stringURLSafe, and the file is then readable in IE7/XP).

avatar brianteeman brianteeman - change - 21 Mar 2016
Status: New → Confirmed
avatar ZoFx
ZoFx - comment - 8 Sep 2016

@JoomliC, what makes you think the issue was solved? I didn't find anything about this. If it isn't solved yet, I would vote for Fedik's solution of using JLanguageTransliterate::utf8_latin_to_ascii(). It works well. It might not be the perfect solution, but it is far better than just stripping all the special characters from the filename. In languages like German, French and Spanish (and surely many more), which rely heavily on accents, filenames look pretty ugly when all the accented letters are simply stripped. It could be considered a temporary solution until a more in-depth one has been developed.

Of course, after the transliteration the filename can and should still be "treated" by the makeSafe function.


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/7841.

avatar JoomliC
JoomliC - comment - 14 Sep 2016

@ZoFx I found nothing about this either, but when testing since 3.5 with åäö.jpg, it works and the file is renamed aao.jpg (I just tested again on Joomla 3.6.2, and it still works for me).
That's why I supposed this was fixed, but I didn't see where the change was applied...

What does not work yet is a filename like this: 代替品.jpg
This is why I proposed this in comment #9608 (comment). (I can do a PR that generates a datetime-based filename for Asian, Arabic, ... characters, if that is accepted as a possible workaround?)

avatar ryandemmer
ryandemmer - comment - 14 Sep 2016

I would recommend using transliterator_transliterate for this:

http://php.net/manual/en/transliterator.transliterate.php

eg:

if (function_exists('transliterator_transliterate')) {
    return transliterator_transliterate('Any-Latin; Latin-ASCII;', $subject);
} else {
    return JLanguageTransliterate::utf8_latin_to_ascii($subject);
}

This does work on 代替品, resulting in dai ti pin

avatar brianteeman
brianteeman - comment - 14 Sep 2016

Sadly, that is PHP 5.4 and above only


avatar JoomliC
JoomliC - comment - 14 Sep 2016

@brianteeman I think this is why @ryandemmer had this:
if (function_exists('transliterator_transliterate')) {

;-)

@ryandemmer will you propose a PR to allow this? (That would be great!)

avatar ryandemmer
ryandemmer - comment - 14 Sep 2016

The changes should probably be made to JFile::makeSafe - https://github.com/joomla/joomla-cms/blob/staging/libraries/joomla/filesystem/file.php#L58-L66

eg:

public static function makeSafe($file)
{
    // Remove any trailing dots, as those aren't ever valid file names.
    $file = rtrim($file, '.');

    if (function_exists('transliterator_transliterate')) {
        $transformed = transliterator_transliterate('Any-Latin; Latin-ASCII;', $file);
    } else {
        $transformed = JLanguageTransliterate::utf8_latin_to_ascii($file);
    }
    $regex = array('#(\.){2,}#', '#[^A-Za-z0-9\.\_\- ]#', '#^\.#');

    if ($transformed !== false) {
        return trim(preg_replace($regex, '', $transformed));
    }

    return trim(preg_replace($regex, '', $file));
}
avatar ZoFx
ZoFx - comment - 14 Sep 2016

@JoomliC I tested it on a completely fresh and clean 3.6.2 installation: uploading åäö.jpg through the Media Manager results in "Notice: This file type is not supported." So I confirm it's not fixed yet. But you seem to be about to fix it :-).

avatar JoomliC
JoomliC - comment - 14 Sep 2016

@ZoFx I tested on a fresh local 3.6.2 install, and here is the result:

[screenshot, 2016-09-14 16:16:55]

[screenshot, 2016-09-14 16:13:03]

As you can see, for me it works with a jpg named åäöéèû.jpg

Are you sure of the MIME type of this file?

Could someone else confirm whether it works or not with an image named åäöéèû.jpg?

avatar ZoFx
ZoFx - comment - 14 Sep 2016

@JoomliC Wow, that's strange. So it might depend on the setup ... Unfortunately, I'm already well beyond my knowledge when it comes to charsets and Joomla. I'll leave it to you to investigate and hopefully fix it. Please let me know if I can assist you with additional information.

avatar ZoFx
ZoFx - comment - 14 Sep 2016

Just an idea: could it be that your server setup somehow already transliterates the filename before handing it over to PHP, while mine does not? I have only limited knowledge of PHP programming, so please forgive me if this is complete nonsense :-).

avatar franz-wohlkoenig
franz-wohlkoenig - comment - 28 Jan 2017

Original: 1öäü.jpg,
Upload: 1oau.jpg

Tested on:
Joomla! 3.7.0-beta1
macOS Sierra, 10.12.3
Firefox 50.1.0
PHP 7.0.4
MySQLi 5.5.53-0

avatar rvbgnu
rvbgnu - comment - 3 Jun 2017

Waiting for PR #12049 to be updated

avatar franz-wohlkoenig
franz-wohlkoenig - comment - 3 Jun 2017

@rvbgnu Is #12049 a PR for this issue?

avatar franz-wohlkoenig franz-wohlkoenig - change - 3 Jun 2017
The description was changed
Status: Confirmed → Information Required
avatar joomla-cms-bot joomla-cms-bot - edited - 3 Jun 2017
avatar rvbgnu
rvbgnu - comment - 3 Jun 2017

Yes, it is referenced a few lines above, but the present issue still has the "no code attached" label.
And the PR has the status "This branch is out-of-date with the base branch".

avatar franz-wohlkoenig franz-wohlkoenig - change - 3 Jun 2017
Status: Information Required → Closed
Closed_Date: 0000-00-00 00:00:00 → 2017-06-03 15:49:31
Closed_By: franz-wohlkoenig
avatar joomla-cms-bot joomla-cms-bot - change - 3 Jun 2017
Closed_Date: 2017-06-03 15:49:31 → 2017-06-03 15:49:32
Closed_By: franz-wohlkoenig → joomla-cms-bot
avatar joomla-cms-bot joomla-cms-bot - close - 3 Jun 2017
avatar joomla-cms-bot
joomla-cms-bot - comment - 3 Jun 2017
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 3 Jun 2017

Closed as having PR #12049


avatar ggppdk
ggppdk - comment - 3 Jun 2017

I will update that PR; I just need to remove an unwanted dependency on a class (see the PR).
