No Code Attached Yet
david-fores
12 Mar 2025

Steps to reproduce the issue

Perform a clean installation of Joomla after downloading the installer (e.g. Joomla_5.2.5-Stable-Full_Package.zip).

The problem occurs both when the database is created during the installation process and when the user and database are created beforehand.

Expected result

The database collation should be utf8mb4_unicode_ci.

Actual result

The database collation is utf8mb4_general_ci.

System information (as much as possible)

PHP Built On Windows NT SERVER-01 10.0 build 19045 (Windows 10) AMD64
Database Type mysql
Database Version 10.11.10-MariaDB
Database Collation utf8mb4_general_ci
Database Connection Collation utf8mb4_general_ci
Database Connection Encryption None
Database Server Supports Connection Encryption No
PHP Version 8.3.13
Web Server Apache/2.4.58 (Win64) OpenSSL/3.1.3 PHP/8.3.13
WebServer to PHP Interface apache2handler
Joomla! Version Joomla! 5.2.5 Stable [ Uthabiti ] 11-March-2025 16:00 GMT
Joomla Backward Compatibility Plugin Enabled (classes_aliases:"1", es5_assets:"1")
User Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36

Additional comments

As can be seen in the SQL scripts, all tables are created by default in Joomla with the collation:

COLLATE=utf8mb4_unicode_ci

The database, if created during the installation process, is also created with the utf8mb4_unicode_ci collation, as can be seen in function getCreateDatabaseQuery in:

\libraries\vendor\joomla\database\src\Mysqli\MysqliDriver.php

protected function getCreateDatabaseQuery($options, $utf)
{
    if ($utf) {
        $charset   = $this->utf8mb4 ? 'utf8mb4' : 'utf8';
        $collation = $charset . '_unicode_ci';

        return 'CREATE DATABASE ' . $this->quoteName($options->db_name) . ' CHARACTER SET `' . $charset . '` COLLATE `' . $collation . '`';
    }

    return 'CREATE DATABASE ' . $this->quoteName($options->db_name);
}

The problem is that after the database is created, the function alterDbCharacterSet is called, which in turn calls getAlterDbCharacterSet.

public function alterDbCharacterSet($dbName)
{
    if ($dbName === null) {
        throw new \RuntimeException('Database name must not be null.');
    }

    $this->setQuery($this->getAlterDbCharacterSet($dbName));

    return $this->execute();
}

This function changes the character set of the database without specifying a collation. Because the ALTER DATABASE statement names only the character set, the server applies its default collation for that character set. In this case the database had the collation utf8mb4_unicode_ci, and after the query is executed it becomes utf8mb4_general_ci.

public function getAlterDbCharacterSet($dbName)
{
    $charset = $this->utf8mb4 ? 'utf8mb4' : 'utf8';

    return 'ALTER DATABASE ' . $this->quoteName($dbName) . ' CHARACTER SET `' . $charset . '`';
}
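To make the difference concrete, here is a minimal sketch of the two queries as standalone functions. It is not the driver code and not taken from any pull request: quoteIdentifier, alterCharsetOnly and alterCharsetAndCollation are hypothetical names, quoteIdentifier is a simplified stand-in for the driver's quoteName, and joomla_db is a placeholder database name. The second function assumes the collation would be derived the same way as in getCreateDatabaseQuery.

```php
<?php
// Hypothetical stand-in for the driver's quoteName(): wraps an identifier in backticks.
function quoteIdentifier(string $name): string
{
    return '`' . str_replace('`', '``', $name) . '`';
}

// Mirrors the query built by getAlterDbCharacterSet() as quoted above: no COLLATE clause,
// so the server falls back to its default collation for the character set.
function alterCharsetOnly(string $dbName, bool $utf8mb4): string
{
    $charset = $utf8mb4 ? 'utf8mb4' : 'utf8';

    return 'ALTER DATABASE ' . quoteIdentifier($dbName) . ' CHARACTER SET `' . $charset . '`';
}

// Assumed variant: also names the collation, built the same way as in getCreateDatabaseQuery().
function alterCharsetAndCollation(string $dbName, bool $utf8mb4): string
{
    $charset   = $utf8mb4 ? 'utf8mb4' : 'utf8';
    $collation = $charset . '_unicode_ci';

    return 'ALTER DATABASE ' . quoteIdentifier($dbName)
        . ' CHARACTER SET `' . $charset . '` COLLATE `' . $collation . '`';
}

// Print both statements for comparison.
echo alterCharsetOnly('joomla_db', true), PHP_EOL;
echo alterCharsetAndCollation('joomla_db', true), PHP_EOL;
```

With the second form, the statement executed after database creation would keep the default collation consistent with the one set by CREATE DATABASE.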
david-fores - open - 12 Mar 2025
david-fores - change - 12 Mar 2025
Labels Removed: ?
joomla-cms-bot - change - 12 Mar 2025
Labels Added: No Code Attached Yet
joomla-cms-bot - labeled - 12 Mar 2025
Hardik301002 - comment - 12 Mar 2025

Hi @david-fores ,

The issue of database collation being overwritten during installation can cause inconsistencies, especially for multilingual support. A simple fix is to update the SQL schema in installation/sql/mysql to use utf8mb4_unicode_ci instead of utf8mb4_general_ci.

For existing installations, running this SQL command will update the collation:

ALTER DATABASE your_joomla_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
richard67 - comment - 12 Mar 2025

The database character set and collation are just default values which are used when a CREATE TABLE statement does not specify a character set and collation.

Which character set and collation are then used for particular tables and columns is the relevant thing, not the database collation or database charset.

All Joomla CMS core tables are created with character set utf8mb4 and unicode collation. Some alias columns in particular tables have binary collations.

Please check and report back if this is the case in your installation.

I assume that's the case, and if so, we will close this issue here as expected behaviour.

Joomla does not touch the database character set and collation values during installation.

You chose these values when creating a database and can change them e.g. in phpMyAdmin.

If you install a 3rd party extension which does not specify character set and collation in the CREATE TABLE statements in its installation SQL, then the table(s) of that extension might have different character set and collations than the Joomla core tables.

There are tools like Akeeba Admin Tools which allow you to change the character set and collation of the tables and table columns.

richard67 - comment - 12 Mar 2025

Hmm, I just saw that I was partly wrong: the core does touch the database character set and collation on installation. But my other statements are still right, these values are just defaults. Nevertheless, the issue should be fixed so that the collation is utf8mb4_unicode_ci at the end and 3rd party extensions' tables are created correctly if they don't specify character set and collation.

Thanks for reporting.

richard67 - comment - 12 Mar 2025

@david-fores Could you report the issue here https://github.com/joomla-framework/database/issues ? Thanks in advance.

Sorry, no, the issue is that the CMS calls alterDbCharacterSet. So for now it's ok here.

Hardik301002 - comment - 12 Mar 2025

Thanks for the clarification! I appreciate the detailed explanation. I agree that ensuring utf8mb4_unicode_ci is set correctly at the end of installation would help maintain consistency, especially for third-party extensions that don’t specify collation.

Let me know if there’s anything else I can do to help with this fix or if you need further testing. Thanks again for your time and support!

Hardik301002 - comment - 12 Mar 2025

@richard67
Thanks for the clarification. I understand now that Joomla CMS modifies the database character set using alterDbCharacterSet(), and this is expected behavior.

I appreciate your time in explaining this. I'll keep this in mind when working with third-party extensions. Thanks again!

Best,
Hardik

richard67 - comment - 12 Mar 2025

"I understand now that Joomla CMS modifies the database character set using alterDbCharacterSet(), and this is expected behavior."

@Hardik301002 Right now it is expected behaviour, but it might cause problems and maybe should be changed, so I leave this issue here open and will discuss with other maintainers how to proceed.

@david-fores Thanks for your detailed report. I should have read it completely before starting to comment.

Hardik301002 - comment - 12 Mar 2025

I understand that this is expected behavior for now but might need adjustments to prevent potential issues. I appreciate you bringing it up for discussion with the other maintainers.
If any testing or further input is needed, I’d be happy to help. Looking forward to the outcome of the discussion.

david-fores - comment - 12 Mar 2025

Hi @richard67,

Maybe I could have explained a little better that the problem is not in the Joomla tables themselves, since as you said, the collation is specified in the table creation scripts.

The problem is in the modification after the database is created, which breaks the consistency between the default database collation and the Joomla core tables. It can also lead to inconsistency with third-party extensions if they do not specify the collation in their SQL scripts.

Thanks for your time making Joomla better.

richard67 - comment - 22 Mar 2025

Please test joomla-framework/database#331.

I suggest leaving this issue open until that change has been merged and released in the framework and a pull request has then been created in this repository to fetch that change.

richard67 - close - 2 Apr 2025
richard67 - comment - 2 Apr 2025

Closing as having a pull request. Please test #45273 . Thanks in advance.

As 4.4 is in security fix only mode, this issue won't be fixed for Joomla 4.

richard67 - change - 2 Apr 2025
Status: New -> Closed
Closed_Date: 0000-00-00 00:00:00 -> 2025-04-02 20:31:12
Closed_By: richard67
