Joomla! Issue Tracker | Joomla! CMS #25845 - Support non-english chars in JS

Fixed in Code Base
8 Apr 2020
Medium
Build: staging
# 25845
Hackwar:patch-19



Hackwar
14 Aug 2019

The previous code doesn't really support non-English characters and, for example, fails on German umlauts. This code is used in com_finder, so when you search for something like "Europäer", you won't get any highlighting for the searched term in the search results.
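
As an illustration of the general idea (not the patch itself, which is PHP and is not shown on this page), a JS-safe output filter can replace every non-ASCII character with a JavaScript `\uXXXX` escape so the resulting string literal stays pure ASCII. The sketch below is written in Python only for demonstration; the function name `js_escape_bmp` is hypothetical, and it deliberately handles only code points up to U+FFFF.

```python
def js_escape_bmp(text: str) -> str:
    """Escape non-ASCII characters as JavaScript \\uXXXX sequences.

    Hypothetical helper for illustration; only correct for code points
    up to U+FFFF (the Basic Multilingual Plane).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)              # plain ASCII passes through unchanged
        else:
            out.append(f"\\u{cp:04x}")  # e.g. U+00E4 becomes \u00e4
    return "".join(out)


escaped = js_escape_bmp("Europäer")
print(escaped)  # Europ\u00e4er
```

A JavaScript engine decodes `\u00e4` back to "ä" when it parses the literal, so the search term can be matched against the page text again.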

How to test

Enable Smart Search content plugin
Edit/create an article with the word "Europäer"
Search for that word in the frontend
See that the word "Europäer" is not highlighted in the search results
Apply patch
Search again and now see it being highlighted.

ce45420 3 Jun 2017

Removing ContactModelContact::getContactQuery()

3a7ecc6 4 Jun 2017

Merge branch 'staging' into patch-19

3ea6a47 14 Aug 2019

Support non-english chars in JS

Hackwar - open - 14 Aug 2019

Hackwar - change - 14 Aug 2019

Status

New

⇒

Pending

joomla-cms-bot - change - 14 Aug 2019

Category

⇒

Libraries

brianteeman - comment - 14 Aug 2019

This is something we really should have a test for, specifically to ensure that it supports accented characters not only in Latin character sets but also in non-Latin and multibyte character sets.

infograf768 - test_item - 15 Aug 2019 - Tested successfully

infograf768 - comment - 15 Aug 2019

Although it does not always work (see below), it is a great improvement.
I have tested this item ✅ successfully on 3ea6a47

_{This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/25845.}


infograf768 - comment - 15 Aug 2019

Example with Chinese

infograf768 - comment - 15 Aug 2019

Note: It does not work for a single 4-byte character.
Example: ?

Edit:
"They are in the Supplementary Plane and will test support for code points above U+FFFF"
See: http://www.i18nguy.com/unicode/supplementary-test.html

infograf768 - comment - 15 Aug 2019

Dumping some of these 4-byte characters gives the correct result, though:

 Assuming ? is required, the following results were found.
string(7) "\u20e9d"
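
The dumped value is seven characters long: `\u` followed by five hex digits. A JavaScript `\uXXXX` escape reads exactly four hex digits, so `\u20e9d` is parsed as U+20E9 followed by a literal "d", which may explain why an isolated supplementary-plane character is not matched. Code points above U+FFFF have to be written as a UTF-16 surrogate pair instead. The following is a minimal Python sketch of that calculation, assuming the character in question is U+20E9D; the function name `js_escape` is hypothetical and this is not the code from the PR.

```python
def js_escape(text: str) -> str:
    """Escape non-ASCII characters for a JavaScript string literal.

    Hypothetical helper for illustration. Code points above U+FFFF are
    emitted as a UTF-16 surrogate pair (two \\uXXXX escapes).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                    # ASCII stays as-is
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")        # one four-digit escape
        else:
            v = cp - 0x10000                  # 20-bit offset into the supplementary planes
            high = 0xD800 + (v >> 10)         # top 10 bits -> high surrogate
            low = 0xDC00 + (v & 0x3FF)        # bottom 10 bits -> low surrogate
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


pair = js_escape("\U00020E9D")
print(pair)  # \ud843\ude9d
```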

wilsonge - comment - 15 Aug 2019

Agree with @brianteeman: this is an easy function to test, as it's static and has no dependencies.

richard67 - test_item - 15 Aug 2019 - Tested successfully

richard67 - comment - 15 Aug 2019

I have tested this item ✅ successfully on 3ea6a47

Tested with "Überlingen", "Europäer", "Музыка", "Curaçao" and "香肠"


richard67 - comment - 15 Aug 2019

@infograf768 I just noticed your posts above. Does this mean your test was not successful? Should 4-byte UTF-8 work? I know we did a lot for MySQL to support utf8mb4, and PHP should be OK, but maybe we have forgotten something in JS?


richard67 - test_item - 15 Aug 2019 - Not tested

richard67 - comment - 15 Aug 2019

I have not tested this item.


richard67 - test_item - 15 Aug 2019 - Tested successfully

richard67 - comment - 15 Aug 2019

I have tested this item ✅ successfully on 3ea6a47

Agree with JM: A big improvement compared to now.


Fedik - comment - 15 Aug 2019

Hm, can it just be this?

return json_encode($string);
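
For comparison, a generic JSON encoder also escapes non-ASCII characters as `\uXXXX` (using surrogate pairs above U+FFFF), but it returns a complete, double-quoted string literal rather than just the escaped contents. The sketch below uses Python's `json.dumps` as a stand-in for PHP's `json_encode`; it only illustrates the shape of the output and says nothing about how the highlighter consumes it.

```python
import json

# Python's json.dumps escapes non-ASCII by default (ensure_ascii=True),
# similar to PHP's json_encode without JSON_UNESCAPED_UNICODE.
words = ["Europäer", "\U00020E9D"]
encoded_words = [json.dumps(word) for word in words]

for encoded in encoded_words:
    print(encoded)
# "Europ\u00e4er"
# "\ud843\ude9d"
```

Note that the surrounding double quotes are part of the returned value, so a caller that already supplies its own quotes would need to account for them.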

infograf768 - comment - 16 Aug 2019

@Fedik
with json_encode($new_str) we lose highlighting.

@richard67
In my posts above I was just wondering why it does not fully work with 4-byte glyphs when they are isolated.

FYI, this code is already present in 4.0 and we get the same results as here.

richard67 - comment - 23 Aug 2019

@Hackwar Is this code also used by the media manager? If so, could it be that it solves issue #25997?
Edit: No, it seems to be used by the highlight behavior, and I don't think the media manager uses that. But the media manager seems to have a problem similar to the one handled in this PR.

brianteeman - comment - 23 Aug 2019

Richard, the media issue is completely different, and a large part of it is the filesystem.

1c1c25a 21 Oct 2019

Merge branch 'staging' into patch-19

Hackwar - change - 21 Oct 2019

Labels

Added: ?

Hackwar - comment - 21 Oct 2019

I'll put writing tests for this on my todo list, but can we still merge this for 4.0, @wilsonge?

Quy - change - 27 Nov 2019

Status

Pending

⇒

Ready to Commit

Quy - comment - 27 Nov 2019

RTC


5728233 27 Nov 2019

Merge branch 'staging' into patch-19

Quy - change - 27 Nov 2019

Labels

Added: ?

be4aaf2 3 Dec 2019

Merge branch 'staging' into patch-19

HLeithner - comment - 5 Dec 2019

@infograf768 does this support mb4?

@Hackwar you wrote this PR against J3. Have you had time to write some tests for this?

infograf768 - comment - 6 Dec 2019

@infograf768 does this support mb4?

Yes, as long as we do not search for an isolated utf8mb4 glyph.
See above: #25845 (comment)

HLeithner - comment - 6 Dec 2019

OK, so can we fix this case for a single mb4 character and have a test for it, @Hackwar?

Then we can fix this issue in staging. Thanks

e44ceff 7 Dec 2019

Merge branch 'staging' into patch-19

Hackwar - comment - 7 Dec 2019

Sorry, I would really love to help you right now, but I'm swamped with work for the next few weeks. I doubt that I can get to this before Christmas. If anybody wants to make a PR against my branch, that would be appreciated.

HLeithner - comment - 11 Dec 2019

Since this is our JS-safe output filter function, I would like some feedback from @joomla/security and a complete PR that covers the single mb4 character case mentioned above.

f1e9834 12 Dec 2019

Merge branch 'staging' into patch-19

infograf768 - change - 12 Dec 2019

Labels

Added: ?

9c44239 28 Mar 2020

Adding JFilterOutput::stringJSSafe() test

Hackwar - change - 28 Mar 2020

Labels

Added: ?
Removed: ? ?

joomla-cms-bot - change - 28 Mar 2020