J3 Issue ?
ot2sen
15 Aug 2016

A user asked how to sort a category list correctly when using the Danish alphabet, which extends the English a-z alphabet with æ, ø and å as its last three letters.

They had an article whose title starts with 'Ø' listed before 'P', as if it were an 'O'. In Danish, 'Ø' is the second-to-last letter of the alphabet.

I am uncertain whether this should be handled in the localise.php of the individual language packs, or whether it is an i18n issue in Joomla 3.6.2.

I could reproduce the issue using any of the Nordic languages, and the same issue appears when sorting search results alphabetically.

Steps to reproduce the issue

  1. Set Danish (or Norwegian or Swedish) as the default language.
  2. Create a series of articles in their own category, with titles starting with the letters a-z and some starting with æ, ø and å.
  3. Create a menu item of the type 'Category list' with setting 'Title sort alphabetically' for articles.

Expected result

Articles were expected to be listed in the alphabetical order of the chosen language.

Actual result

For the tested languages, the special letters æ and å are listed after a, and ø is listed after n, as if they were considered to be a and o.

The search results show the same pattern when alphabetical ordering of results is selected.

System information (as much as possible)

Joomla 3.6.2, non-English language packs with extra letters beyond a-z

Additional comments

ot2sen - open - 15 Aug 2016
brianteeman - comment - 15 Aug 2016

For clarification, is this something new in 3.6.2 or has it always been this way?

ot2sen - comment - 15 Aug 2016

I don't know. I never had the need to create an a-z category list of articles, nor to sort search results by a-z, so I haven't noticed it before.

The attachment for the initial post didn't get attached, so here is an active link:
http://j35.codeqa.eu/sprogtest
It should list 1,2,3,5,6,7 to follow the ordering of the Nordic alphabets, but as you can see it presents the extra characters as if they were a's and o's.

brianteeman - comment - 15 Aug 2016

It would be useful to know if this existed before 3.6.2 so we can try to
track it down.

For anyone looking, the correct order is:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Æ Ø Å

I am going to guess that the problem is with the nb-NO.localise.php -
specifically this

//Specific language transliteration.
//This one is for latin 1, latin supplement , extended A, Cyrillic, Greek

$glyph_array = array(
'a' => 'à,á,â,ã,ä,å,ā,ă,ą,ḁ,α,ά',
'ae' => 'æ',

ot2sen - comment - 15 Aug 2016

Tested with 3.5.1 and it is the same there. Using Danish da-DK for test.
[Screenshot: 15-08-2016_sprogtest351]

brianteeman - comment - 15 Aug 2016

I can confirm the issue and I was wrong about it being related to localise.php

andrepereiradasilva - comment - 15 Aug 2016

Does this also happen in the backend?

brianteeman - comment - 15 Aug 2016

yes

ot2sen - comment - 15 Aug 2016

Tested with 3.4.3 and there is a difference. Using da-DK again, only 'Å' is listed wrongly, right after 'A' instead of at the end.
'Æ' and 'Ø' are listed at the end, as they should be.

[Screenshot: 15-08-2016_sprogtest343]

ot2sen - comment - 15 Aug 2016

3.4.1 same as 3.4.3

ot2sen - comment - 15 Aug 2016

3.3.6 is still the same as 3.4.1 and 3.4.3: 'Å' is wrong, but 'Æ' and 'Ø' are correctly at the end.
[Screenshot: 15-08-2016_sprogtest336]

brianteeman - comment - 15 Aug 2016

I am going to make another guess, which is that it is related to the change
to utf8mb4 in the database - perhaps you need a special collation for
Scandinavian languages?

ot2sen - comment - 15 Aug 2016

The last three tests, where only 'Å' is an issue, all use utf8_general_ci, which seems to list 'æ' and 'ø' in the correct order.

ot2sen - comment - 15 Aug 2016

Tested with a clean 2.5.28 with da-DK.
Same pattern as 3.3-3.4.3: 'Å' is listed wrongly, while 'Æ' and 'Ø' are correctly at the end.

ot2sen - comment - 15 Aug 2016

[Screenshot: 15-08-2016_sprogtest2528]

brianteeman - change - 15 Aug 2016
Status: New -> Confirmed
yild - comment - 15 Aug 2016

The order should be correct if you create the database with the utf8_danish_ci collation.

ot2sen - comment - 15 Aug 2016

I don't know. Even if you create a database with the collation utf8mb4_danish_ci, the tables created during installation get DEFAULT CHARSET=utf8mb4 DEFAULT COLLATE=utf8mb4_unicode_ci.

ot2sen - comment - 16 Aug 2016

I have given this some thought and am concerned if the output at the application level has to rely on the charset and collation preset when the database is created.
Many users will not be able to change how the database is created.
Furthermore, it would lock the site to a specific comparison and sorting order, say Danish as in this case. This could cause trouble when adding a second, third, etc. language to the site if these have a different collation and sort order.

With the standard utf8mb4 charset and utf8mb4_unicode_ci collation set on installation, we are good for storing all the supported languages, and perhaps we should ensure that output sorting is handled in the application only.

Would adding a COLLATE to the alpha ordering query, taking installed languages into account, be a solution to that? Meaning: detect the installed languages, or the one set as default site language, and dynamically add, for example, COLLATE utf8mb4_danish_ci to the query.
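
A minimal sketch of that idea in PHP 7, for illustration only. The lookup table, the function name and the `a.title` column are hypothetical and not part of the Joomla API. The collation is taken from a fixed whitelist rather than from user input, since it ends up directly in the SQL text:

```php
<?php
// Hypothetical lookup table: site language tag => MySQL collation.
// Only two entries are shown; the table is an assumption for illustration.
const COLLATION_BY_LANGUAGE = [
    'da-DK' => 'utf8mb4_danish_ci',
    'sv-SE' => 'utf8mb4_swedish_ci',
];

// Build the ORDER BY clause for an alphabetical title sort.
// Languages without an entry keep the column's default collation.
function orderByTitle(string $column, string $languageTag): string
{
    $collation = COLLATION_BY_LANGUAGE[$languageTag] ?? null;

    if ($collation === null) {
        return 'ORDER BY ' . $column . ' ASC';
    }

    return 'ORDER BY ' . $column . ' COLLATE ' . $collation . ' ASC';
}

echo orderByTitle('a.title', 'da-DK'), PHP_EOL;
echo orderByTitle('a.title', 'en-GB'), PHP_EOL;
```

With this shape, a language that has no entry simply sorts as it does today, so nothing changes for sites that do not need a language-specific order.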

brianteeman - comment - 16 Aug 2016

Thinking overnight, I doubt the database is a real solution as it wouldn't help on
a multilingual site.

ralain - comment - 16 Aug 2016

It is theoretically possible to override the default collation directly in the SQL statement:

SELECT title FROM table ORDER BY title COLLATE utf8mb4_danish_ci;

chrisdavenport - comment - 16 Aug 2016

This can't be fixed by sorting in PHP. It has to be fixed by changing the database collation on a per-query basis as @ralain has suggested. That means adding database collation rules to the language packs in a way that will work across all database types.

Maybe something like a collate($dbType) method in localise.php, which can be called from the database query methods?
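
One way such a method might look, as a rough sketch only. The class name, the driver keys and the PostgreSQL collation name are assumptions for illustration; which collations actually exist depends on the database server:

```php
<?php
// Hypothetical addition to a language pack's localise.php (PHP 7.1+).
// This is not existing Joomla code.
abstract class Da_DKLocaliseSketch
{
    /**
     * Return the collation to use for alphabetical sorting with the given
     * database driver, or null when no collation is known for that driver.
     */
    public static function collate(string $dbType): ?string
    {
        // Assumed driver names and collation names.
        $collations = [
            'mysql'      => 'utf8mb4_danish_ci',
            'mysqli'     => 'utf8mb4_danish_ci',
            'postgresql' => 'da_DK.utf8', // placeholder: depends on the locales installed on the server
        ];

        return $collations[$dbType] ?? null;
    }
}

var_dump(Da_DKLocaliseSketch::collate('mysqli'));
var_dump(Da_DKLocaliseSketch::collate('sqlsrv'));
```

Returning null for an unknown driver would let the query code leave out the COLLATE clause and keep the current behaviour.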

brianteeman - comment - 16 Aug 2016

@chrisdavenport but then it still won't work on a multilingual site as all
content is in a single table

ot2sen - comment - 16 Aug 2016

@brianteeman @chrisdavenport exactly my concern above for sites having more than 1 language.

Would adding a COLLATE to the alpha ordering query, taking installed languages into account, be a solution to that? Meaning detect languages installed or set default as site language, and dynamically add for example COLLATE utf8mb4_danish_ci to the query.

If XYZ is default site language and alpha set as parameter, then dynamically use utf8mb4_XYZ_ci for the COLLATE in the alpha query, and take into account if more languages installed.

chrisdavenport - comment - 16 Aug 2016

@brianteeman Yes, all content is in a single table and it has a default collation defined when the table was created, but it can be overridden in the SELECT statements to get ordering by whatever collation you require for that particular query.

The tricky part is doing this across database types. In MySQL the collation might be utf8mb4_danish_ci, but in Postgres it's probably something completely different. That means adding the correct collation codes for all supported databases into localise.php.

Or adding a method into the database code that will return the correct collation when you give it an ISO language code. Would there ever be a case where a multilingual site would not use the collation defined for that language by the database vendor?

What if the collation selected in localise.php or in the database code is not installed?
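
The last question could be handled with a fallback, sketched here under assumptions: the helper name is hypothetical, and the list of installed collations is assumed to have been read from the server beforehand (for MySQL, for example, from the output of `SHOW COLLATION`).

```php
<?php
// Hypothetical helper (PHP 7.1+): return the first preferred collation that
// the server actually offers, or null when none of them is installed.
function pickCollation(array $preferred, array $installed): ?string
{
    foreach ($preferred as $name) {
        if (in_array($name, $installed, true)) {
            return $name;
        }
    }

    return null;
}

// Example data; a real list would come from the database server.
$installed = ['utf8mb4_general_ci', 'utf8mb4_unicode_ci'];

var_dump(pickCollation(['utf8mb4_danish_ci', 'utf8mb4_unicode_ci'], $installed));
var_dump(pickCollation(['utf8mb4_danish_ci'], $installed));
```

When the helper returns null, the caller could omit the COLLATE clause entirely, so a missing collation degrades to the current sort order instead of a failing query.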

schnuti - comment - 16 Aug 2016

This problem is as old as Joomla, i.e. not caused by a new version.
On single-language sites the user can change the collation, if they have the skill. It is hard work, since you have to change every single alpha field you want sorted correctly.
As far as I know, the collation language only affects automatic alpha sorting and does not influence anything else.

On multilanguage sites the users probably keep the "international" sorting.
In some cases I have used a hardcoded SQL query like this for alpha sorting on title (MySQL). I added the condition, as COLLATE might be a bit slower.


$orderCol = $orderCol != 'r.title' ? $orderCol : $orderCol.' COLLATE utf8mb4_xyz_ci';

I see the problem as even larger in selection lists than in item (article) sorting, e.g. lists of user-created tags, user groups ...

It would be nice, if you find a way to add the collation to the sorting depending on language and db type.

baijianpeng - comment - 24 Sep 2016

When sorting article titles in Chinese language, this issue also happens.

It seems that Joomla core can not correctly sort Chinese characters in UTF-8 encoding.

Hope someone will improve this sorting/order issue to solve it completely for all languages.

rvbgnu - comment - 3 Jun 2017

Hi @ot2sen, is there any progress on this issue? Or does it need to be pushed to a "team"?

franz-wohlkoenig - change - 8 Nov 2017
The description was changed
Status: Confirmed -> Discussion
joomla-cms-bot - edited - 8 Nov 2017
brianteeman - labeled - 25 Mar 2018
brianteeman - comment - 10 Apr 2018

I don't think it can ever be resolved to satisfy all the use cases, but perhaps @mbabker can comment on this and then we can either close or move forward.

mbabker - comment - 10 Apr 2018

If the sorting is happening at the database level, it's out of our hands unless we want to do sorting in PHP as well. And even then there are no guarantees that the sort will result in a locale-correct listing.
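
For reference, sorting in PHP could look like the sketch below (PHP 7). It is an illustration under assumptions, not a proposed fix: the function names are hypothetical, it only handles the plain Danish alphabet given earlier in this thread (a-z, then æ, ø, å), and it can only reorder a list that has already been fetched, so it would not help with paginated queries.

```php
<?php
// Alphabet order as given in this thread: a-z followed by æ, ø, å.
// The file must be saved as UTF-8.
const DANISH_LOWER = 'abcdefghijklmnopqrstuvwxyzæøå';
const DANISH_UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ';

// Map every upper- and lower-case letter to its position in the alphabet.
function danishRanks(): array
{
    $lower = preg_split('//u', DANISH_LOWER, -1, PREG_SPLIT_NO_EMPTY);
    $upper = preg_split('//u', DANISH_UPPER, -1, PREG_SPLIT_NO_EMPTY);
    $ranks = [];

    foreach ($lower as $position => $letter) {
        $ranks[$letter] = $position;
        $ranks[$upper[$position]] = $position;
    }

    return $ranks;
}

// Compare two titles character by character using the ranks above.
function compareDanish(string $a, string $b): int
{
    static $ranks = null;

    if ($ranks === null) {
        $ranks = danishRanks();
    }

    $charsA = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $charsB = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY);
    $shared = min(count($charsA), count($charsB));

    for ($i = 0; $i < $shared; $i++) {
        // Characters outside the alphabet (digits, spaces, punctuation) rank first.
        $rankA = $ranks[$charsA[$i]] ?? -1;
        $rankB = $ranks[$charsB[$i]] ?? -1;

        if ($rankA !== $rankB) {
            return $rankA <=> $rankB;
        }
    }

    // All shared characters are equal: the shorter title comes first.
    return count($charsA) <=> count($charsB);
}

$titles = ['Østers', 'Zebra', 'Åben', 'Orange', 'Ærlig', 'Pære', 'Apple'];
usort($titles, 'compareDanish');

// $titles is now: Apple, Orange, Pære, Zebra, Ærlig, Østers, Åben
print_r($titles);
```

The sketch ignores accented letters and any finer rules of real Danish collation. Where PHP's intl extension is available, its `Collator` class is an alternative to a hand-written table.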

brianteeman - change - 24 Jul 2018
Status: Discussion -> Closed
Closed_Date: 0000-00-00 00:00:00 -> 2018-07-24 15:10:25
Closed_By: brianteeman
Labels: Added: J3 Issue
brianteeman - comment - 24 Jul 2018
brianteeman - comment - 24 Jul 2018

Sorry, but I am going to have to close this as something that cannot be fixed.

brianteeman - close - 24 Jul 2018
