This PR allows UTF-8 filenames to be uploaded with the media manager and lets the makeSafe method handle UTF-8 file names:
Windows with PHP >= v7.1 - the media manager uploads files with non-ASCII names.
Windows with PHP < v7.1 - the media manager prevents files with non-ASCII names from being uploaded.
Other OS with any PHP version - the media manager uploads files with non-ASCII names.
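A minimal Python sketch of the decision listed above (the Joomla code itself is PHP). The function name and the `php_os` / `php_version` parameters are hypothetical stand-ins for PHP's `PHP_OS` constant and version number; the sketch only encodes the three listed cases and ignores the server locale, which is debated further down the thread.

```python
from typing import Tuple


def accepts_non_ascii_names(php_os: str, php_version: Tuple[int, int]) -> bool:
    """Hypothetical helper mirroring the three cases in the PR description.

    php_os mimics PHP's PHP_OS constant (e.g. "WINNT", "Linux", "Darwin").
    """
    if php_os.upper().startswith("WIN"):
        # PHP 7.1 switched Windows file streams to the Unicode API;
        # older versions would mangle UTF-8 names, so they are rejected.
        return php_version >= (7, 1)
    # Any other OS: the name is passed through unchanged as UTF-8 bytes.
    return True


print(accepts_non_ascii_names("WINNT", (7, 0)))  # False
print(accepts_non_ascii_names("WINNT", (7, 1)))  # True
print(accepts_non_ascii_names("Linux", (5, 6)))  # True
```

The `False` branch is where a transliteration fallback could replace an outright rejection, which is the alternative raised later in the thread.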
Testing instructions: try to upload an image whose name uses letters other than ASCII, e.g. файл.jpg
Expected result: the image is uploaded.
Actual result: Error (unsupported format)
Status | New | ⇒ | Pending |
Category | ⇒ | Libraries |
What you are doing with your PR is
which is not desirable, because it is not "safe" (please read my previous answer)
I think your problem with "Error (unsupported format)" lies elsewhere:
you are probably getting the error
because ALL filename characters are removed,
so the file name файл.jpg
becomes just .jpg
Instead of returning UTF-8 to get a non-empty filename,
you can transliterate to get a non-empty but this time safe filename.
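To illustrate the failure mode described above, here is a small Python sketch, assuming an ASCII-only whitelist filter. The pattern is an approximation and not Joomla's actual `makeSafe` regex, and `CYRILLIC_TO_LATIN` is a hypothetical, deliberately tiny table that only covers this example.

```python
import re

# Hypothetical transliteration table covering only the letters of this example.
CYRILLIC_TO_LATIN = {"ф": "f", "а": "a", "й": "y", "л": "l"}


def make_safe_ascii(name: str) -> str:
    """Drop every character outside a small ASCII whitelist
    (an approximation, not Joomla's real makeSafe pattern)."""
    return re.sub(r"[^A-Za-z0-9_.\-]", "", name)


def transliterate(name: str) -> str:
    """Map known Cyrillic letters to Latin ones; leave everything else untouched."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in name)


print(make_safe_ascii("файл.jpg"))                 # .jpg (the whole stem disappears)
print(make_safe_ascii(transliterate("файл.jpg")))  # fayl.jpg
```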
UTF-8 file names are not safe? Why is that? Please refer to the PHP manual: the \w escape is the same as [a-zA-Z0-9_], but if you use the u modifier, the regexp will match any letter in any language. Why is it not safe, can you explain? And why should I transliterate my file names? Is that required to be safe? I want to use file names in my language, such as файл.jpg. And with this PR such file names are uploaded correctly without being transformed into "garbage": I upload файл.jpg and see файл.jpg on the disk.
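The `\w` point can be checked in Python, whose `str` patterns are Unicode-aware by default (roughly what the comment describes for PHP's `u` modifier), while the `re.ASCII` flag restores the `[a-zA-Z0-9_]` meaning. This is an analogy in a different regex engine, not a test of PHP's PCRE itself.

```python
import re

name = "файл.jpg"

# Keep word characters, dots and hyphens; drop everything else.
unicode_safe = re.sub(r"[^\w.\-]", "", name)                  # \w matches Cyrillic letters
ascii_safe = re.sub(r"[^\w.\-]", "", name, flags=re.ASCII)    # \w is [a-zA-Z0-9_] only

print(unicode_safe)  # файл.jpg
print(ascii_safe)    # .jpg
```

Either way the same whitelist still strips characters such as `/` or `:`, so the Unicode variant widens the set of letters without dropping the filtering itself.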
Did you test on all filesystems and servers, and with all languages? If you did, then let's go ahead and accept this PR; a million others and I would like this change.
I have spent a lot of time dealing with this in the past; currently in our app we are transliterating. I am happy to hear it is no longer needed.
I have tested this item
@ggppdk is right: on Windows (XAMPP on Windows 10) you get a file named "файл.jpg" and the media manager shows a broken image.
Windows uses UTF-16:
https://msdn.microsoft.com/en-us/library/windows/desktop/dd317752(v=vs.85).aspx
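A rough Python illustration of why the same name can end up garbled on Windows: NTFS stores names as UTF-16, while a byte-oriented (ANSI) file API interprets whatever bytes it receives in the system code page. The sketch assumes a Russian Windows system whose ANSI code page is cp1251; the exact garbled text depends on that code page.

```python
name = "файл.jpg"

utf8_bytes = name.encode("utf-8")       # what PHP receives from a UTF-8 form
utf16_bytes = name.encode("utf-16-le")  # how NTFS stores the name (UTF-16 code units)

# Assumption: the ANSI code page is cp1251, so an ANSI API reads
# the UTF-8 bytes as cp1251 characters instead of as UTF-8.
garbled = utf8_bytes.decode("cp1251")

print(len(utf8_bytes), len(utf16_bytes))  # 12 16
print(garbled)                            # mojibake that still ends in .jpg
```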
Most people use Linux for production servers, and so do I. On Debian 8 it works perfectly; I even uploaded a file with a Japanese name. As for Windows, I think maybe we should add a condition:
if (strtoupper(PHP_OS) === 'LINUX')
I think we should take advantage of Linux-based platforms. If anybody is interested, I can rewrite this PR so that only Linux-based servers can receive files with UTF-8 names.
Using utf-8 filenames in URLs
e.g. an image
instead it will depend on whether the user / process of Apache is using a UTF-8 locale (in Linux, the locale is per user);
indeed on many installations it is the default, but not on all
We really need to be careful about introducing OS specific handling of files into the code. Also take into consideration that the site should still work if moved to another platform, and from the sounds of it if you upload files on Linux with UTF-8 support then restore a backup of it to a Windows environment it would cause issues.
Once a file is uploaded, its name will stay "clear" regardless of the platform. You can compress a directory using zip/tar/gz and copy it via (S)FTP to your Windows server; when you uncompress the archive, you will not see "broken" files. The problem can occur when using PHP-based archivers, like Akeeba, which break all UTF-8 file names.
Users often upload files with UTF-8 names via FTP and do not know that they can become broken. But when they try to upload these files with the media manager, they get an error (unsupported format).
As Linux uses UTF-8, it will certainly work on Linux-based servers. We only have a problem with Windows and with transferring files between platforms.
This PR can be closed by maintainers, because I ran out of arguments.
What would happen when a Windows user uploads a UTF-16 file to a Linux server using UTF-8?
@brianteeman
I have just uploaded a file named файл.jpg using Windows 10 to my Debian server, and it looks great!
I edited this PR to make it platform-dependent.
What would happen when a Windows user uploads a UTF-16 file to a Linux server using UTF-8?
When a <form> lacks the accept-charset="..." attribute,
the form is sent in the encoding of the document,
and our HTML document declares UTF-8 encoding,
so the form is sent as UTF-8,
so the UTF-16 filename (string) is converted to a UTF-8 string;
therefore the filesystem character set (e.g. UTF-16) of the client sending the form is not / should not be a problem.
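The reasoning above can be sketched in Python: the name that NTFS stores as UTF-16 is plain text once the browser has it, and a UTF-8 page re-encodes it as UTF-8 on submission. This is a simplified model (it ignores the multipart framing and assumes no `accept-charset` attribute), not a trace of what a specific browser does.

```python
client_name = "файл.jpg"

# On the client, NTFS hands the browser the name as UTF-16 ...
on_disk = client_name.encode("utf-16-le")
# ... the browser works with it as text ...
text = on_disk.decode("utf-16-le")
# ... and a page declared as UTF-8 submits the name as UTF-8 bytes.
submitted = text.encode("utf-8")

print(submitted.decode("utf-8") == client_name)  # True
```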
If we can get it to work on Windows as well, that would be interesting. Otherwise I don't feel confident with this.
Also, do we want to have it configurable like the SEF setting to allow unicode characters in URLs? After all those files usually end up as URLs in the page (and Google search results) as well.
In Linux the locale is configurable; it is not guaranteed to be UTF-8,
but I guess we can assume that the great majority of Linux servers will be using UTF-8
If we have learnt anything, it is that we cannot assume anything about a server's optional configuration
You are right, and that is why I said assume, but not rely on it,
and that is why I am against this PR.
We can have a solid solution with
[EDIT]
I myself am using the above 2 things to solve the problem of having UTF-8 filenames.
For Joomla 4 this will be much easier, as we can use the PHP transliterate function, which we can't use in J3 as it requires PHP > 5.4.
See #7841 (comment)
Transliteration is a step back. And locales can be configured by hosting providers. I do not think it is a problem. I even think locales are not important. See, I have a Russian locale and successfully uploaded a file with a Japanese name. In Linux you can create any files with PHP, and if you cannot, you simply do not upload files with non-ASCII names.
Yes, you are partly right: regardless of the exact locale, it is enough to have any *.utf-8 locale,
but if the Linux locale is not *.utf-8 it will not work, so your current check that the OS is Linux is not enough.
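A minimal sketch, in Python, of checking the codeset part of a locale name instead of only the OS. The function name is hypothetical, and a real check would have to inspect the locale of the web server process itself (in Python, for example, `sys.getfilesystemencoding()` reports what the running interpreter uses).

```python
def locale_is_utf8(locale_name: str) -> bool:
    """Return True for names such as 'ru_RU.UTF-8' or 'en_US.utf8'.

    Locale names roughly follow language_TERRITORY.codeset@modifier;
    only the codeset says whether file-name bytes are treated as UTF-8.
    """
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.replace("-", "").lower() == "utf8"


for name in ("ru_RU.UTF-8", "en_US.utf8", "C", "ru_RU.KOI8-R"):
    print(name, locale_is_utf8(name))
```

Checking the codeset rather than the OS name covers the case raised above: a Linux server whose locale is `C` or a legacy codeset would correctly be treated as not UTF-8.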
Transliteration is a step back. And locales can be configured by hosting providers. I do not think it is a problem.
In order to be accepted, something should be supported on all Joomla-supported platforms,
or there should at least be a more complete solution that gives feedback to users who upload files and find that UTF-8 does not work.
Also, it is not so uncommon to have a website on Linux and a testing copy on Windows!
as @mbabker said:
We really need to be careful about introducing OS specific handling of files into the code. Also take into consideration that the site should still work if moved to another platform, and from the sounds of it if you upload files on Linux with UTF-8 support then restore a backup of it to a Windows environment it would cause issues.
it is enough to have any *.utf-8 locale
but if linux locale is not *.utf-8 it will not work
In Linux you can add as many locales as you want. It is not just one locale. I have two, English and Russian, and if I need to, I will add a new one. This is really not a problem for most users. Files with names in any language are safely uploaded via FTP, but PHP does not work correctly on Windows. So, instead of dropping this useful feature everywhere, wouldn't it be smart to enable it where it is supported and most commonly used?
My two cents on this:
In the past I have been bitten many times by character encoding issues between different systems, between different programs on the same system, and even with documents that had non-ASCII filenames that were printed out as garbage or refused to print entirely.
Therefore I agree with @mbabker on his statement.
And that is the reason why I generally tell people to prefer using safe ASCII characters for filenames.
As for transliteration, I am no big fan of it. I would also prefer a nice solution with international characters.
Nevertheless, as this would be a nice feature to have, there might be a way to do this by providing the user with a choice of allowing UTF-8 (or better said international characters) in filenames, if the system allows it.
So, for example, if Joomla runs on a Linux system and UTF-8 is set as a locale on the system, it is allowed, and the user can choose to enable it.
If Joomla runs on Windows (with UTF-16), again, it would be allowed, but in this case the filename would be converted from UTF-8 to UTF-16 before the file is saved.
A helper could also be provided to check the system and locale during the installation and prompt the user accordingly. Again, if the system would not allow it, the user could be informed, that international characters are not supported because of the underlying system settings.
And, of course, the user must be informed that if the user has the site running on multiple systems (as in a test and production system), then those must match the criteria in order for this to work.
I hope this makes sense. This is just an idea; I haven't looked at Joomla's internals to see whether it would be possible.
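The proposal above can be sketched as a small decision function in Python. Everything here is hypothetical: `option_enabled` stands for a media manager setting that is off by default, `system_supports_unicode` for an install-time check of OS, locale and PHP version, and `fallback` for whatever renaming (for example transliteration) the site would otherwise use.

```python
from typing import Callable, Optional, Tuple


def choose_stored_name(
    name: str,
    option_enabled: bool,
    system_supports_unicode: bool,
    fallback: Callable[[str], str],
) -> Tuple[str, Optional[str]]:
    """Return the name to store and an optional warning for the user."""
    if not option_enabled:
        # Default behaviour: the user has not opted in to international names.
        return fallback(name), None
    if not system_supports_unicode:
        # The user opted in, but the server cannot handle it: fall back and explain why.
        return fallback(name), "International characters are not supported by this server's settings."
    return name, None


print(choose_stored_name("файл.jpg", True, True, lambda n: "file.jpg"))
```

Returning a warning instead of raising keeps the upload working while still informing the user, which is the behaviour the proposal asks for.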
.... And how would that work if a user moves a site from a windows to a Linux server?
If Joomla creates
UTF-8 filenames on Linux / Unix and
UTF-16 filenames on Windows,
some archiving programs will need some parameters / extra configuration when extracting,
so that they are informed of the character set that was used when the archive was created.
The above may depend on the archive format type and version; some formats may include this information.
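The zip format is one example of a format that records this: PKWARE's APPNOTE defines general-purpose flag bit 11 (0x800) as "file name is UTF-8", and archives without it are commonly read as CP437 or a local code page. The Python sketch below, using only the standard `zipfile` module, shows that Python sets this flag for a non-ASCII member name; other archivers may behave differently.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    # Python sets bit 11 (0x800, the "language encoding flag") when a member
    # name cannot be encoded as ASCII, and then stores the name as UTF-8.
    archive.writestr("файл.jpg", b"not really a jpeg")
    archive.writestr("file.jpg", b"plain ascii name")

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as archive:
    utf8_flag = {info.filename: bool(info.flag_bits & 0x800) for info in archive.infolist()}

print(utf8_flag)  # {'файл.jpg': True, 'file.jpg': False}
```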
...more than enough reason, then, not to proceed with this PR.
On the archiving programs: 7zip and rar seem to support UTF-8 vs UTF-16 encoding.
I went ahead and tested it with 7zip on a windows VM, and copied it over to linux. Worked fine. Also the other way around. Did not yet test with rar though. I tested filenames with Greek, German and French letters without any issues.
With or without this patch, users will upload their files via FTP with names in their languages. Joomla! cannot prevent them from doing this. But the media manager will block non-ASCII file names as an invalid format.
Moving the site from Windows to Linux, and vice versa, is fine if you do not use PHP-based archivers. Native archivers (zip/tar) do it well.
@philip-sorokin I remember reading that zip had encoding issues between Linux and Windows. There are reports that tar does well, though.
I often use zip on my Linux server, and it compresses my files without problems, so that I can unpack them on Windows. Maybe it depends on the version, and old versions of zip can cause issues.
@philip-sorokin and you then extract them on windows? (and the other way around?) If this is tested, yes maybe they were old versions, which had the issues.
Yes, I can.
Which, of course, leaves the question: will existing backup/migration extensions be able to adapt to this? Or will users choosing this strategy lose the "one click" backup/restore solutions they know and love? Theoretically they could call 7z, rar, zip if any one is installed, but what if not (for example on a Windows system)?
I have not tested the corresponding PHP extensions and do not know if they work correctly with UTF-8 filenames. But I think it is a question of computer literacy. If your server does not have a UTF-8 locale, perhaps you do not need to upload files with UTF-8 names.
BTW... PHP allows running native archivers via exec, shell_exec. These functions are considered unsafe, but are enabled by default.
Yes, that's what I meant with:
Theoretically they could call 7z, rar, zip if any one is installed, but what if not (for example on a Windows system)?
Test it with AkeebaBackup and you know if it would work
In case you didn't read that far in Brian's link to MSDN, here is a nice-to-know:
NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set.
i.e. NTFS (UTF-16) and FAT32 (ANSI) can be used together on a Windows system.
FYI with PHP 7.1 streams switched to the Unicode API on Windows. See http://php.net/manual/en/migration71.windows-support.php
Thanks.
@weltling
Thanks for the tip. I made a quick test on a local XAMPP system with PHP 7.1.10, out of the box.
I think valid URIs still have to be %-encoded so there will be a lot of rawurlencode() used when I change to allow all UTF-8 characters.
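A quick check of what the %-encoded form of a UTF-8 file name looks like, using Python's `urllib.parse.quote` as a stand-in for PHP's `rawurlencode()`. It assumes the usual convention of encoding non-ASCII characters as UTF-8 bytes before percent-encoding each byte.

```python
from urllib.parse import quote, unquote

name = "файл.jpg"

# Each UTF-8 byte of a non-ASCII character becomes %XX; "." is unreserved and stays.
encoded = quote(name, safe="")

print(encoded)           # %D1%84%D0%B0%D0%B9%D0%BB.jpg
print(unquote(encoded))  # файл.jpg
```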
Do other OSes (e.g. OS X) have the same issue?
Is the PHP 7.0 requirement for Joomla 4.x written in stone?
PHP 7 requirement is written in stone. There is no possibility of restoring PHP 5 support.
7.0 is the hard minimum. Anything higher is possible but it's going to have to be strongly justified.
I find that the many long discussions about filenames justify requiring PHP 7.1 to resolve the Windows issue. Linux should be OK with its default settings. But as I wrote: I do not know if other OSes supported by Joomla have problems with the filesystem and UTF-8 character encoding.
The transliteration solution could be sidestepped in favour of allowing Unicode filenames. For sure this is the (near) future.
The URL encoding is not a PHP thing. It is something to be configured on the server. If a browser sends a URL in UTF-8, the server is in charge.
Regarding Apple, I couldn't tell of any issue. I had to adapt some tests, because Apple uses a different UTF-8 normalization than Linux and has configurable case sensitivity for the FS, but that doesn't affect the functionality in most cases. It's only about testing an exact byte sequence.
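The normalization difference can be seen with Python's `unicodedata` module. Apple's HFS+ is known to store names in a decomposed (NFD-like) form, whereas Linux file systems generally keep whatever bytes they are given, so the same visible name can be two different byte sequences. In `файл.jpg` the letter "й" decomposes into "и" plus U+0306 COMBINING BREVE.

```python
import unicodedata

name = "файл.jpg"
nfc = unicodedata.normalize("NFC", name)  # precomposed "й" (U+0439)
nfd = unicodedata.normalize("NFD", name)  # "и" (U+0438) + U+0306 COMBINING BREVE

print(len(nfc.encode("utf-8")), len(nfd.encode("utf-8")))  # 12 14
print(nfc == nfd)                                          # False
print(unicodedata.normalize("NFC", nfd) == nfc)            # True
```

Comparing names after normalizing both sides to one form is what makes such tests independent of the exact byte sequence.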
I can also mention that many popular apps already catch up with this 7.1 change, as one can observe with WordPress, MediaWiki, etc. All that is needed is a PHP version check, with transliteration as the way for PHP < 7.1.
Thanks.
The URI encoding is an Internet Standard thing: RFC 3986. It might still work with invalid URIs as well.
As I understand it, the Joomla deciders do not like PHP version tests or different behaviour dependent on PHP versions. That's why there is no transliteration in J 3.latest
It's not that we don't like to do PHP version tests. Rather you should be able to run a site consistently even when moving across platforms. So even if you build a site on a Windows PHP 7.1 set up and move to a Linux PHP 5.6 server, things should continue to just work and if we have APIs with completely differing behaviors based on the root operating system or PHP version then we can't keep things just working.
... the Joomla deciders do not like ...
I understand the arguments. That's why I ask about other supported OSes. If there is a problem with UTF-8 there, we might have to live with an OS-independent transliteration solution forever (yet to be implemented). Or at least until Joomla 5.
Just a note on this - why wouldn't one export this part as a setting available to the user? 7.0 is about to run out of the active support. There are for sure users aware of possible impacts and willing to access new features.
If there were a clear and visible warning that using something would not be backward compatible with an older PHP version, then for a setting that is off by default perhaps it would be OK. Besides the case of saving files as is, the server configuration for UTF-8 encoded URLs is actually in the same category. Of course I couldn't tell how much effort it would be from the Joomla internal perspective; surely there'll be some. The only question is whether the cost falls on the user or on the developer.
In general, 7.2 already shows some differences from 7.1 in other areas, and 7.3 might diverge even more. Also, take into account external dependencies like ICU, where ICU 58 introduced several BC breaks, which can have an impact on any PHP version depending on which ICU version was linked.
Thanks.
Very good news! It seems the issue is not a problem anymore in PHP 7.1, and you can safely upload files with UTF-8 names. I have tested it very successfully using a WAMP server with PHP 7.1 installed.
The PR has been changed to allow Unicode file names on Windows-based environments starting from PHP 7.1, as well as on any other OS.
Voters and supporters, please test this patch.
@philip-sorokin is this PR only for testing with PHP > 7.1, and can it be tested using MAMP?
3.8.2-rc
Multilanguage Site
macOS Sierra, 10.12.6
Firefox 56 (64-bit)
Tested here. Works fine on OSX MAMP php 7.1.6 or php 7.0.20. Firefox. No problem (which was expected). URLs percent-encoded OK.
Can't test on Windows here.
I have tested this item
PHP 7.1.2 Joomla 3.8.2
I have tested this item
Tested on Windows with PHP 7.1.6
Status | Pending | ⇒ | Ready to Commit |
Ready to Commit after two successful tests.
Thanks for tests.
Please remove RTC. We already discussed that we don't want Joomla to work differently depending on the phone version
@brianteeman At the moment it is impossible to upload an image with a Cyrillic name.
For example лого.jpg
This PR fixes this problem
But only on specific PHP versions and not all the supported versions. That's the problem
It's not the client's phone version; it is the server's OS and PHP version.
@philip-sorokin maybe rename file before upload?
For example:
$name = OutputFilter::stringURLSafe($name);
@philip-sorokin yes exactly. Joomla would be working differently depending on the php version. For me that's not acceptable.
@philip-sorokin I rename Cyrillic files before trying to upload them =) to avoid problems in future.
Status | Ready to Commit | ⇒ | New |
Removed the Ready to Commit status as stated above.
You are a webmaster. Other people upload files with Cyrillic names, and it is not prohibited in FTP. Joomla cannot prevent the files from being uploaded. The problem was in the past. In the future there will be no problem from either PHP or ZIP/TAR archivers.
Imagine someone uploads a file in Cyrillic. Then he tries to find it using a Joomla library for whatever reason. Joomla! fails because it cannot work with Cyrillic filenames.
This is a reason I do not use standard Joomla! media manager in my websites. But I would like to use it.
What if we add a new setting to the media manager options? This setting would be disabled by default, but users would be able to enable it.
@philip-sorokin yes, and with a clear warning that if one goes down that route, there may or may not be compatibility issues when going from a higher PHP version to a lower one (>= 7.1 to < 7.1) or from Linux to Windows, etc.
@mbabker, @brianteeman Renaming each file where the file name is not ASCII is always a pain and extra time for the users.
IMHO this option - with a clear warning - should be in Joomla 3.x, since 3.x will still be supported for some time.
With Joomla 4, though, we should make sure that this is not an issue anymore, as Joomla 4 should start with PHP 7.1 anyway (I think I read that somewhere, but even if it was not decided yet, it should start with PHP 7.1, as 7.0 support is in maintenance mode as of 3 Dec 2017).
The Media manager team might have solved this there already, though, no idea.
PHP 7.1 is currently only in Debian's unstable packages. Joomla! 4 is scheduled for the summer. It is not clear whether PHP 7.1 will be stable by then. Control panels now have unofficial packages to install, but until PHP 7.1 is stable, users should not be forced to switch to it.
But the vast majority of users use Linux-based OSes in production. Some people use Windows-based servers like WAMP. These servers now come with PHP 7.1 included. Thus, this is not a problem anymore.
@philip-sorokin I don't know why PHP 7.1 which is in its mid-life (of the normal lifespan, excluding the security only support), is still unstable in Debian (I guess you mean main package repos).
However, Ondřej Surý does provide PHP 7.2 packages for Debian and Ubuntu.
As for Fedora/RHEL/Centos, I remember Remi Collet providing fresh PHP repos in the past, and still does.
Those are two very dependable sources for PHP 7.2, and even if one does not like those 3rd party package providers for whatever reason, they still can compile PHP themselves.
So this should not be an issue for Joomla to consider, IMHO.
I would find it a major step back to require a minimum PHP version that already has one and a half feet out the door.
If Joomla 4 is released in Summer 2018, PHP 7.1 will only have 4 months of minimal life support to look forward to. After that, it's gone... And 7.3/8.0 will be out at the end of the year anyway...
Besides that, there are a few goodies in PHP 7.2 that would of course be nice to use in Joomla 4.
I am closing this PR as it is not going to be merged.
Status | New | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2018-01-14 08:04:00 |
Closed_By | ⇒ | philip-sorokin |
Please correct me if I'm wrong,
but Windows uses UTF-16 for filenames,
and Linux and others use different character sets, often UTF-8.
So on Windows,
your filenames will probably appear fine inside PHP,
but they will look like garbage in Windows Explorer (I remember confirming this myself too),
and furthermore, because of the UTF-16 usage, it is possible to create invalid filenames too.
About Linux and similar systems, as said above,
you have no guarantee of which character set the filesystem is using;
the only safe way is to transliterate the filename.
please see my PR here:
#12049