Hi guys,
PHP 7.4.12 + MariaDB 10.3.24 + Joomla 3.9.22 + Admin Tools Pro 5.9.0:
Why I have back Joomla > System Information > Database Connection Collation: utf8mb4_general_ci ? > image
PS - I cleared all my caches and I ask to Server Provider (A2hosting) to clear also the Server Cache, but the result is the same
It should be utf8mb4_unicode_ci, Am I right ?
Please, Am I missing anything ?
Labels |
Added:
?
|
Hi richard67,
thanks for reply and for the useful info. Anyway:
So, this is why, in first of all I am asking: Am I missing anything ?
Or it should be better to investigate it in Joomla...
PS - I tested it on a new Joomla 4 beta 5 new installation and I have back the same issue...
The connection with phpMyAdmin is not the same as the one with Joomla. No idea right now why they could be different, but the SHOW VARIABLES LIKE "collation_connection"
always shows the value for the current connection. SO you are comparing 2 different connections, not the same one.
Will see if I can find the time to investigate if there really does something wrong, but if someone is faster it would be good.
Hi richard67,
"...the connection with phpMyAdmin is not the same as the one with Joomla..." = Is there anything else I can do to test it to exclude doubts ?
I have to check, but I won’t have time before weekend. Maybe someone else has an idea and is faster.
@joomleb Maybe an additional explanation: As you can see here, the "collation_connection" variable has scope "Global, Session".
https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_collation_connection
I.e. if you do a SHOW VARIABLES LIKE "collation_connection"
in your phpMyAdmin session, it will show you the value in that session, for which the global, server side-defined value is a default, but as you could see in the link I've provided in one of my previous posts, it is negotiated between the client and the server which character set and collation finally is used for a particular session. The system information in backend is a different connection with a different client than the phpMyAdmin connection, so they might at the end use different connection collations for the session. In backend we show the session value and not the global value, same as the SHOW VARIABLES LIKE "collation_connection"
in your phpMyAdmin session.
Does that at least explain to you why you can get different results for both, system info in backend and phpMyAdmin?
E.g. here on my local Linux virtual machines I get "utf8mb4_0900_ai_ci" in backend in System Information and "utf8mb4_unicode_ci" as connection collation in phpMyAdmin with Joomla 3 and 4.
It strongly depends on operating system settings and what the particular database client in its particular version does to determine the character set and collation it wants to use.
E.g. Joomla can use either the MySQLi or the MySQL PDO client, and on Joomla 3 also the native MySQL client, depending on your configuration. PhpMyAdmin might use a different client and apply own methods to determine the character set and collation it wants to use.
Another reason for differences could be that Joomla directly connects to the database which is used for Joomla, while when connecting with phpMyAdmin as root user, you normally connect not to any of the Joomla databases (you can have more than one) but to the phpmyadmin database, if available, and this might have a different database collation than the Joomla database, and so for that connection a different connection character set and collation might be negotiated between the client and the server.
So up to now I see no problem related to Joomla here, that's just the normal way how things work with MySQL or MariaDB.
Hi richard67,
really many thanks for all details and explications. I red all.
So, being that "...we show the session value and not the global value..." the first thing would be to exclude the session value doubt.
(Sorry, I'm really not an expert) As far as I understand, "session value", it is anything that can be set on Server, where:
If so, I think this is the first doubt step to exclude, "the operating system setting" (linux).
Please, Do you have any suggestions here ? Can you address me on the right way on How to check it and How can I set it ?
* B - "...Each client can autodetect which character set to use based on the operating system setting, such as the value of the LANG or LC_ALL locale environment variable on Unix systems or the code page setting on Windows systems..." = Does it mean the Server operating system ?
No, it means the client OS. If your webserver and your database server are on the same host, database client (= webserver) and database server are the same, but if like e.g. on shared hosting the database server is another server, the webserver is the database client.
Hi richard67,
Many Thanks! I'm going to investigate it and I'll report here soon...
@joomleb Anything new? I still don't see a bug here. In Joomla backend the actual session's connection collation is shown, which is not the same as the default setting for that on the server, and as long as you don't have issues with sorting or unique keys I don't see a problem with that connection collation.
I think we should move this from issues to discussions.
Hi @richard67
sorry for my later due to a really precarious Covid-19 situation. So, we made it clear that:
A - "...In the absence of other information, each client uses the compiled-in default character set, ..." = Server / DB / Tables / Columns are utf8mb4_unicode_ci
B - "...Each client can autodetect which character set to use based on the operating system setting, such as the value of the LANG or LC_ALL locale environment variable on Unix systems or the code page setting on Windows systems..."
I investigated it with my Server provider and yes:
C - "...I have checked the server and I could find the Server difference in global and session values of this variable. Since you are on a shared server, it will not be possible to make such changes for one website..." = I asked them more information thinking to report here to do some local tests, but they never answered to me about it...
D - "...As checked, the collation on your Database in our server is utf8mb4_unicode_ci, however, we are not sure why Joomla is still detecting utf8mb4_general_ci...."
As far as I see the issue is not causing any errors while using this collation, till today. Anyway, being that a lot of Joomla users are on shared Servers, would be good:
1 - to find a way to read the value Joomla > System Information > Database Connection Collation: ...
on a Database level and not on a Server setting level
2 - Or, at least, to add an Info tip to that field to explain well what it is reading, so users don't get confused like I did
I see no need for that. Can we close this issue?
Labels |
Added:
Information Required
|
@richard67
no, I'm not agree. At least into the fiend title > documentation it should be explained in detail so as not to incur other reports.
As indicated today it is quite promiscuous, here an image
and would be a really simple "fix"
I also do not agree... I am running into strange errors 'Save failed with the following error: Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation '='
and they only occur on joomla instances where the system reports a mismatch as noted above. I have a different joomla instance where the two reported collations MATCH, and it doesn't throw these errors. Co-incidence? I don't know...
It would be EXCELLENT to know where "in joomla" this 'connection' collation is set, and how to change it...
@rotech1 The illegal mix of collations has nothing to do with the connection collation. It comes from 3rd party extensions who do joins or unions over many tables, also from old other 3rd party extensions from times before utf8mb4 was established in Joomla, and so they fail with that error when trying to join different tables. One example for that is a module named "word cloud".
Another thing which can cause this is people manipulating their character sets and collations with certain tools and then installing extensions afterwards, and those then use the character set and collation which would have been used before this manipulation.
When using a not too old Joomla 3 without any outdated extension, that error doesn't happen. In 100% of all cases which I have obvserved, and they are quite a number, the illegal mix of collation was caused by what I said above.
So please stop blaming the CMS core when using outdated or erroneous extensions or manipulating character sets and collations with tools without any real reason.
The worst example I ever saw was a site with more than 200 extensions, of which maybe 10 were really used, but all installed just because they were cheaper in a bundle, and some 50 of them outdated and not ready for utf8mb4.
Thanks for the info! It's really odd as I have checked the databases extensively but cannot find any tables that are set to utf8mb4_general_ci ( all are utf8mb4_unicode_ci) - and checking INSIDE the tables, the only one I can see that has 'wrong' column collations for some fields is xxx_finder_terms ( it has table collation of utf8mb4_unicode_ci, but 4 fields within have collations of utf8mb4_general_ci). I did an export of that table and there is no mention of utf8mb4_general_ci in its SQL. But when I dropped and then imported it, those four fields are still utf8mb4_general_ci! ( very strange, dunno where its getting that from - the database collation is utf8mb4_unicode_ci...)
`/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT /;
/!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS /;
/!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION /;
/!40101 SET NAMES utf8mb4 */;
xxx
xxx_finder_terms
CREATE TABLE xxx_finder_terms
(
term_id
int(10) UNSIGNED NOT NULL,
term
varchar(75) CHARACTER SET utf8mb4 NOT NULL,
stem
varchar(75) CHARACTER SET utf8mb4 NOT NULL,
common
tinyint(1) UNSIGNED NOT NULL DEFAULT '0',
phrase
tinyint(1) UNSIGNED NOT NULL DEFAULT '0',
weight
float UNSIGNED NOT NULL DEFAULT '0',
soundex
varchar(75) CHARACTER SET utf8mb4 NOT NULL,
links
int(10) NOT NULL DEFAULT '0',
language
char(3) CHARACTER SET utf8mb4 NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;`
@rotech1 In Joomla 3, some of the finder tables have general collation by purpose, because that can be better combined with others without causing that illegal mix of collations. Which character sets and collations are used with the CMS core you can find out by downloading a full installation package, unzipping it and then checking the joomla.sql for MySQl below the installation folder. If you say that all finder tables have utf8mb4_unicode_ci default character set and collation, then they have been very likely manipulated by some tool. What you can do it to check the differences to the joomla.sql and restore it like it should be.
Or are you talking about Joomla 4? In this case all should be utf8mb4_unicode_ci, except some of the columns with name "alias" in particular tables, they should be "utf8mb4_bin". If this is also not the case, then again something has been manipulated.
Another example which I once had seen and which causes illegal mix of collations was an extension not specifying the character set and collation in the create table statements in its installation SQL script, and so the database default was used. It was an old MySQL database where that was still Swedish "latin1_swedish_ci", and so the tables were created with this character set and collation, which leads to illegal mix of collation in almost all cases when doing a UNION or JOIN with a table which has utf8 or utf8mb4 character set.
The utf8mb4_general_ci is a more "friendly" character set and collation which can be combined with most if not all other utf8mb4 and most utf8 collations. That's why J3 used it for the finder tables for b/c with 3rd party extensions (finder plugins). Having that as connection collation does not cause illegal mix of character set errors.
Doing JOIN or UNIONS between tables with not matching character sets cause that error.
Much appreciate the information, Thanks!! This is Joomla! 3.9.26
And yes, I had used AdminTools to change all the 'general_ci' to 'unicode_ci' ( on advice of a extension developer who insists that unicode_ci is the collation of choice now and for the future...). Admin tools actually didn't change a couple of tables ( stuff like community builder tables that were not mb4 to begin with). For now I had to roll back my collation changes - putting the (former) general_ci ones back to general_ci and the illegal mix errors went away!
At any rate, I am not going to mess with it in the future - hopefully the (eventual) upgrade to Joomla 4 will take care of any discrepancies
@rotech1 Glad you could fix it. I hope I did not sound unfriendly with my above comments, but if so, then sorry, it wasn't meant to be. I'm not a native English speaker and sometimes type faster than I think and so it can be it sounds harsh where it shouldn't.
Admin tools are fine, just you have to know what you are doing, and messing with character sets and collations is something which has to be handled with care.
Status | New | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2021-05-04 18:43:51 |
Closed_By | ⇒ | alikon |
i think this can be closed
reopen if you disagree
Maybe read https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html .
I also have connection collation utf8mb4_general_ci and all database tables are utf8mb4_unicode_ci, and I never had issues with that and with unicode characters.