User tests: Successful: Unsuccessful:
Pull Request for Issue #7841
Upload files that are using characters in SITE or ADMIN default language of your website
Please also do a code review if you can
A new optional parameter $language = null is added to the method ... for doing a specific language transliteration
Status | New | ⇒ | Pending |
Labels |
Added:
?
|
Category | ⇒ | Libraries |
i will remove dependency on language component configuration, and fix code styling, please do not test yet
I have tested this item
By appliing the PR the site give this error
Parse error: syntax error, unexpected 'if' (T_IF) in /home/xxx/public_html/nl/xxxxxx/libraries/joomla/filesystem/file.php on line 126
Sorry I was testing on a joomla 3.7rc1 site
By appliing the PR the site give this error
Parse error: syntax error, unexpected 'if' (T_IF) in /home/xxx/public_html/nl/xxxxxx/libraries/joomla/filesystem/file.php on line 126
Sorry I was testing on a joomla 3.7rc1 site
hi, having more family members now, takes up a good portion of my time, now i can only work on to updating my extensions for J3.7 and on my other obligations, i can not update / look into this PR any more
you can close or rewrite this PR, it should work with only a few changes
Status | Pending | ⇒ | Needs Review |
Label removed
@ggppdk
Can't you send an array with the languages you now look up in the J library file?
e.g. a "helper" method in the CMS library could do the job and send the filename + languages to the J library.
I have to ask. Are those languages needed to "change" the translation-"table". I once did that "manually" in Joomla 1.5 for a cyrillic language..
@ggppdk
Didn't you read my suggestion and question? I saw you had the same idea somewhere above. Let the caller send the languages.
I found confusing use of makeSafe.
Correct in old com_media file controller
$file['name'] = JFile::makeSafe($file['name']);
But what happens here? In cms/helper/media method canUpload($file...
if (str_replace(' ', '', $file['name']) != $file['name'] || $file['name'] !== JFile::makeSafe($file['name']))
This tells me (the user) to make the filename safe with Joomla logic before uploading. (Currently in new mediamanager 403 forbidden.) i.e. even if the makeSafe code is modified this will kill the upload.
Didn't you read my suggestion and question? I saw you had the same idea somewhere above. Let the caller send the languages.
This PR already includes this new parameter for makeSafe
but
so i was hoping to avoid it
That was the second step i had.
e.g. a "helper" method in the CMS library could do the job and send the filename + languages to the J library.
Now is the question if this is possible in a BC way. I'm not fit enough with namespaces to know if a call to makeSafe can execute a CMS method and this method can call a makeSafe Joomla method. I hope so!
I*ve implemented the code in J 4.0 and tested to upload in the new Media Manager.
I moved the language "loading" out of the library file to the controller taking care of the AJAX call.
// Language transliteration should include given language, and also site + admin defaults (most useful is site default)
$lang_params = ComponentHelper::getParams('com_languages');
$lang_site_default = $lang_params->get('site', '*');
$lang_admin_default = $lang_params->get('admin', '*');
//$languages[$language] = $language && $language != '*';
$languages[$lang_site_default] = $lang_site_default != '*';
$languages[$lang_admin_default] = $lang_admin_default != '*';
// Call to transliteration method
$name = \JFile::makeSafe($name, $languages);
There are some issues not related directly to this as I mentioned above.
To my surprise a method is called that modifies the file name with Punycode! Why?
After removing those obstacles I tested with german and swedish characters with result as expected. Without installing the languages!
I had problems with french characters so I installed the language pack and set it to default language. I still have a problem here so I get Datetime as filename. I found that normal french characters are translated. It's the character in d'Artagnan that is NOT removed by the transliteration method. The character gets removed by the regex but then the check translated? fails.
I got some problems with the regex not working. As I do not understand what it is supposed to do I used my smaller version that so far works.
$regex = '/(\s|[^A-Za-z0-9-.])+/';
How to fix that d'Artagnan gets dArtaggnan or d-Artagnan and not datetime?
I tested the language specific characters + blanks. I had to make some changes to handle the blanks - to replace them with hyphens.
It has to be tested in J3 to find out if there are three different handlings of the filename in ONE TRANSACTION as in J 4.0. Then it should be cleaned up.
I've made some more testing (Joomla 4.0 MM upload file)
Western and to my surprise Eastern European Latin characters are translated by default using the en-GB language . Probably without a language parameter.
e.g.
Über größen länge .jpg => ueber-groessen-laenge.jpg (German)
d'Édimbourg a été admis à l'hô .jpg => d-edimbourg-a-ete-admis-a-l-ho.jpg (French)
Åländsk höna.jpg => alaendsk-hoena.jpg (Swedish)
výročí na příští.jpg => vyroci-na-pristi.jpg (Czech)
But sorry they are all using Latin Characters
Πωλούν ακίνητα .jpg => 2017-06-22_23-00-42_000000.jpg (Greek)
I'm not able to test with Greek installed as default language.
I'll try to exclude (replace with hyphens) some more characters with regex. I'll follow the recommendations from Michigan Technological University. Link in my "Normalize filenames" issue in the "New Media Manager" project.
You can see the edited method code in my gist -
file-makeSafe.php
I'll edit the regex when I've tested the changes I have in mind.
Shouldn't the length be checked and limited? 190 characters or less? I believe 190 is the limit if you later want to index the filenames in a mySql table. MTU's 30 is a little too minimized;)
We can close the idea of using the filenames from the OS (Unicode/UTF or whatever).
In version 1 of my Gist I had methods to save the filenames as is. They were ok in the PHP code but Joomla's file write method wrote the files to disk with unreadable characters if not in the range a-zA-Z.
If there are some Joomla hackers that want to invest time in such an issue he/she would be very welcome.
In version 1 of my Gist I had methods to save the filenames as is. They were ok in the PHP code but Joomla's file write method wrote the files to disk with unreadable characters if not in the range a-zA-Z.
That must have been in windows ? right ?
See my comment here
#16595 (comment)
About this PR, i do not know how to proceed,
can it be accepted to use
$lang_params = JComponentHelper::getParams('com_languages');
inside makeSafe() or is it strictly a "Not do it" ?
Yes local Wiindows10, XAMPP, PHP7
If it doesn't work in Windows it's a no go for me You can't allow OS dependent functionality in a CMS for known reasons.
I moved the language look up out of the method for the test. My hope is that the call to makeSafe can be directed to a CMS method and then via the use of namespace be redirected to the Joomla library. I see no problem to move the lines back if this is not possible.
You should though have a look at the gist code. I've made some editing to solve the issues I found. e.g. French texts did not work. Not allowed characters have to be removed before the transliteration loop, where those characters are checked and gives a *false" result. And this result is false!
I had a short look at proposed LINUX solution. It is NOT a solution. The changes are only in the library file and folder classes. i.e. Any checks before calling those methods in the library e.g. in makeSafe also have to react on the OS version! A complete mess!
I've updated my gist
Please check the regex. As I wrote - the filename transferred might not come from an OS. With AJAX and base64 encoded data anything is possible!
I added a length restriction. Has nothing to do with the file size restriction.
I think I have everything in my mind covered by now.
EDIT: And tested!
Maybe I should open a new issue to ask for some kind of input. As a user not related to the English world the supported filename conversion is a disaster! Since Joomla 1.5! As I understand you have your workaround and I have another one.
You should though have a look at the gist code. I've made some editing to solve the issues I found. e.g. French texts did not work. Not allowed characters have to be removed before the transliteration loop, where those characters are checked and gives a *false" result. And this result is false!
Well i have made it return false (meaning result of transliteration is not acceptable)
on purpose and forced the date('Y-m-d_H-i-s_u') as safe filename
Reason is this:
you suggest that:
$file = '\ / : * ? <> myfile| " # % & { } $ ! @ _.jpg';
should give the safe filename: myfile.jpg
but with your suggest if e.g. you have
$file = '\ / : * ? <> my| " # % & { } $ ! @ _.jpg';
then you will get safe filename my.jpg
and if you have
$file = '\ / : * ? <> | " # % & { } $ ! @ _.jpg';
then you will get safe filename .jpg
Thus the result may have very few or zero corresponding characters from the original filename
that is why i choose to force it to date even if 1 character is not transliterated
Again probably my choise is not optimal either,
e.g. it would be better if 90% of the LENGTH of original filename is preserved after
-- both (1) transliteration and (2) removal of remaining bad characters,
then accept it as safe filename ...
Well, as in the case of the French filename one (1), the single quote, character was the problem. m.jpg is a valid filename. The case without any character .jpg is catched at the bottom, before acceptance and gives a datetime filename . How will you count 90 % if -my.jpg or --m.jpg is used? m.jpg or my.jpg should be accepted as valid filenames. With your duplicate check it's still possible to store a lot of m.jpg ;)
I guess in the real world most cases will use very few forbidden characters but all of them might appear.
The check you do in the the loop should only give false if any "real" characters still remains. e.g if I (on a Latin system) use Greek or Bulgarian characters in my filenames.
My example was constructed around the invalid characters to test that my regex works. Not from the real world. Most of the characters are not allowed on the filesystem or are very rarely used in filenames. Personally I use the DB objects title or alias to generate the uploaded main image's name. This can be long and include some invalid characters. Those files are though still reusable.
Forgot
If the user sends
$file = 'My\ / : * ? <> very| " # big% & file{ } $ ! @ _.jpg';
He should get
my-very-big-file.jpg
OK to me! Not depending on how many invalid characters was removed.
And don't forget that Apple have just introduced a new file system with its own unique issues with character sets
BTW, here is a pure php solution (since php 5.4) for transliteration
echo transliterator_transliterate('Any-Latin; Latin-ASCII;', 'Ολοκλήρωση.png');
will give Oloklerose.png
in Persian
transliterator_transliterate('Any-Latin; Latin-ASCII;', 'چکیده.png');
will give
chkydh.png
in Chinese
transliterator_transliterate('Any-Latin; Latin-ASCII;', '完成.png');
wan cheng.png
we can also use
transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', 'filename.jpg');
to only get lowercase file names
and to be sure some characters are taken care of
iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", transliterator_transliterate('Any-Latin; Latin-ASCII'; Lower()', etc.
As it is php 5.4+ then it cannot be used in j3 it will only be possible in j4
(this proposal came up before=
@infograf768
How do you upload files in French language to Joomla?
I get
ddimbourg-a-t-admis--lh-.jpg
for
d'Édimbourg a été admis à l'hô .jpg
Sorry for the filename I just picked some French words with some French specific characters.
To me that is a bug since J 1.5
Thanks for the link. With this transliteration we try to bypass the encoding problems.
If there are other better methods that can be introduced/backported to J3 I'm sure everyone is open to it.
As I understand there is a project that is targeting the cloud communication.
And I hope you still want to support Windows systems.
I was not considering this transliteration function for J3
d'Édimbourg a été admis à l'hô .jpg
I would never call a file like this.
I would never call a file like this.
Then we are two ;)
@brianteeman
Using it in j4 would be great as a default transliterate to replace the simple limited one I added in 1.6 (https://github.com/joomla/joomla-cms/blob/staging/libraries/joomla/language/transliterate.php) . TTs would still be able to override it if they wish, or just take of their custom method in their localise.php.
Are you telling us
"Leave it as is and wait for some future solution"?
Yep
That's your opinion. Yes it will be easier in j4 where we dont have the php 5.3 limitation but j3 is going to be supported for at least 2.5 more years it doesnt mean that we should give up trying to find a solution to this issue for j3
@ggppdk
I only wanted to test and help you get a solid solution. And that was the intention of my comment above. I can't really test with other languages than Latin ones. The interest in fixing things (old bugs) in J3 seems to be a little low. See some of the "funny" comments above where I didn't at all understand the first hmm with a useless link.
I'll still want to go on as I want to use it for my own transliterations anyway. I though understand if you want to close the issue. I hope not!
Until then I'll give feedback from my tests.
In my first tests with J3 I can confirm the results from the tests with J4. Everything ok with Latin characters. Are you sure about other languages character encodings?
I know that you had to use custom overrides for transliterations in J1.5 Are you sure that this is solved in J3?
One issue I found is that you can't use the datetime as is for none conform names. Any blank has to be replaced by hyphens or it will fail in the media helper media.php.
if (str_replace(' ', '', $file['name']) != $file['name'] || $file['name'] !== JFile::makeSafe($file['name']))
{
Heads up!
.
@ggppdk
The message given to the user should be changed
JLIB_MEDIA_ERROR_WARNFILENAME
IS:
File name must only contain alphanumeric characters and no spaces.
SHOULD depending on the solution be something like:
File name must only contain alphanumeric characters a-z, A-Z or 0-9 and no spaces.
@brianteeman
What is your feeling about a solution that only helps Latin languages. It solves the issue for a lot of users around the world but is not optimal.
I hope though that this proposal can fix more.
@ggppdk
Jippee
I have no real use of it but I installed the Greek language, set it to default language and uploaded a file with Greek characters. A file that completely failed without the Greek language installed.
A proof that it works as you say. I'm impressed.
Great!
@brianteeman
What is your feeling about a solution that only helps Latin languages.
Personally I would not be in favour of anything that makes joomla different depending on language
The other solution in J3 is to puny encode file names when they are not latin.
They could be saved puny encoded and displayed decoded (or not) in the Media Manager.
Example:
點看.jpg
would be saved as xn--c1yn36f.jpg
Ακρόπολη Αθηνών.jpg
would give xn-- -ylbbwcfkixaolm8mpb.jpg
This would evidently not solve the issue of adding manually a file with its utf8 name in the images folder.
Quoting
we can also use
transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', 'filename.jpg');
to only get lowercase file namesand to be sure some characters are taken care of
iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", transliterator_transliterate('Any-Latin; Latin-ASCII'; Lower()', etc.
So is it bad if we use for PHP 5.4+ ?
transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', 'filename.jpg');
and if older version then fallback to current behaviour ?
Just tested on MAMP with php 5.4.4, and I get
Call to undefined function transliterator_transliterate()
In php 7.0.8 it works fine. Looks like there is an issue with php 5.4.x
[EDIT] Looks like PHP needs the intl
extension to be present to use transliterator.
nice,
i will close this PR in favour of them
Status | Needs Review | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2017-06-27 10:08:50 |
Closed_By | ⇒ | ggppdk |
@ralain
Yes i know, and in my own custom code i have extra code to make sure that a unique name is created by examining the path too, but we should not do it inside makeSafe,
best option is to add seperate method for it,
@wilsonge
about dependency i agree, any better ideas ? it would be nice to avoid it,
about no language, well it makes sense to try frontend language default
[EDIT]
@wilsonge
we could remove the dependency to the configuration of the language component and make it the responsibilty of the caller, ... to decide and pass an array of languages to try for transliteration
should i do this change ?? and remove the dependency ?