The previous code doesn't really support non-English characters and, for example, fails on German umlauts. This code is used in com_finder, so when you search for something like "Europäer", you won't get any highlighting for the searched term in the search results.
Status | New | ⇒ | Pending |
Category | ⇒ | Libraries |
Although it does not always work (see below), it is a great improvement.
I have tested this item
I have tested this item
Note: It does not work for single 4-byte characters.
Example: ?
Edit:
"They are in the Supplementary Plane and will test support for code points above U+FFFF"
See: http://www.i18nguy.com/unicode/supplementary-test.html
Dumping some of these 4-byte characters is correct, though.
Assuming ? is required, the following results were found.
string(7) "\u20e9d"
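A minimal sketch of why an isolated 4-byte character is a special case on the JavaScript side. It assumes the dump above corresponds to the code point U+20E9D (an assumption, the original glyph is not visible here) and uses only standard JavaScript built-ins:

```javascript
// Assumption: the glyph behind the dump above is U+20E9D (Supplementary Ideographic Plane).
const glyph = String.fromCodePoint(0x20E9D);

// In UTF-8 the glyph takes 4 bytes ...
const utf8Bytes = new TextEncoder().encode(glyph).length;

// ... and in a JavaScript string it takes two 16-bit code units (a surrogate pair).
const codeUnits = glyph.length;
const surrogates = [glyph.charCodeAt(0), glyph.charCodeAt(1)].map((u) => u.toString(16));

// A classic \uXXXX escape reads exactly four hex digits, so this literal
// is U+20E9 followed by the plain letter "d", not the glyph U+20E9D.
const misparsed = "\u20e9d";

console.log(utf8Bytes, codeUnits, surrogates, misparsed === glyph);
```

Under that assumption, an escape written as a backslash, `u` and five hex digits cannot round-trip to the original character, which would match the behaviour reported for single 4-byte characters.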
Agree with @brianteeman: this is an easy function to test, as it's static and has no dependencies.
I have tested this item
Tested with "Überlingen", "Europäer", "Музыка", "Curaçao" and "香肠"
@infograf768 I just noticed your posts above. Do they mean your test was not successful? Should 4-byte UTF-8 work? I know we did a lot for MySQL to support utf8mb4, and PHP should be OK, but maybe we have forgotten something in JS?
I have not tested this item.
I have tested this item
Agree with JM: A big improvement compared to now.
Hm, can it just be this?
return json_encode($string);
@Fedik
With json_encode($new_str) we lose highlighting.
@richard67
In my posts above I was just wondering why it does not fully work with 4-byte glyphs when they are isolated.
FYI, this code is already present in 4.0 and we get the same results as here.
Richard. The media issue is completely different and a large part of that is the filesystem
Status | Pending | ⇒ | Ready to Commit |
RTC
@infograf768 does this support mb4?
@Hackwar you wrote this PR against J3. Did you have time to write some tests for this?
@infograf768 does this support mb4?
Yes, as long as we do not search for an isolated utf8mb4 glyph.
see above : #25845 (comment)
Sorry, I would really love to help you right now, but I'm really swamped with work for the next few weeks. I doubt that I can get to this before Christmas. If anybody wants to make a PR against my branch, that would be appreciated.
Since this is our JS-safe output filter function, I would request some feedback from @joomla/security and a complete PR that covers the mentioned single mb4 character.
Category | Libraries | ⇒ | Libraries Unit Tests |
I added a unit test. @infograf768 can you have a look if I did this right with the utf8mb4 glyph?
Will test tomorrow
I tested with your strings: the second string worked, while the first indeed did not. I looked into this and again noticed that I'm not a JS developer. JavaScript strings are sequences of 16-bit code units (UCS-2/UTF-16), which means the code we've been using only works for Unicode characters up to U+FFFF. Everything above that would have to be converted in a way that I found too complicated to grasp. I took the cheap way out, and for such characters I'm now using the ECMAScript 6 notation with curly braces around the escape sequence. I did not want to use that for all characters, since I fear it would be a B/C break: it would require every browser to support ECMAScript 6. Since this feature didn't work at all until now, I think it would be okay to require that compatibility for higher code points... What do you think?
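The approach described above can be sketched roughly as follows. This is an illustration in JavaScript under stated assumptions, not the actual PHP code of the PR: the function name `escapeNonAscii` is hypothetical, and the sketch leaves ASCII untouched, so it is not a safe output filter on its own.

```javascript
// Hypothetical helper: escapes every non-ASCII character of a string so the
// result can sit inside a JavaScript string literal.
// Assumption: ASCII is passed through unchanged, which a real filter would not do.
function escapeNonAscii(input) {
  let out = "";
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      out += ch;
    } else if (cp <= 0xffff) {
      // Classic four-digit escape, understood by pre-ES6 engines.
      out += "\\u" + cp.toString(16).padStart(4, "0");
    } else {
      // ES6 code point escape with curly braces, only for code points above U+FFFF.
      out += "\\u{" + cp.toString(16) + "}";
    }
  }
  return out;
}

const escapedWord = escapeNonAscii("Europäer");
const escapedGlyph = escapeNonAscii(String.fromCodePoint(0x20e9d));
console.log(escapedWord, escapedGlyph);
```

Only the last branch depends on ECMAScript 6 support, so output for characters up to U+FFFF stays the same for older browsers.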
Status | Ready to Commit | ⇒ | Pending |
不能创建文件 is still not highlighted here (same result as above, #25845 (comment)), while it is when we use com_search.
@infograf768 The Chinese problem could come from a collation problem in MySQL, see my comments in the J4 PR: #28493 (comment) and previous.
@richard67
Read. We should keep com_search.
@infograf768 I'd like to keep com_search too, at least as alternative for small sites like my private site.
However, this has nothing to do with com_search or com_finder. You have the exact same issue in com_search as well. It is an issue with the DB and its poor support for Chinese characters.
@wilsonge @HLeithner Are you saying that you are requiring some check if this new notation is allowed to be used? Because I don't see how that could be done... Considering that it is broken right now and this would fix it for all users except for <IE11, I would accept this change as is...
Status | Pending | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2020-04-08 10:08:36 |
Closed_By | ⇒ | HLeithner | |
Thanks
This is something we really should have a test for, specifically to ensure that it supports not only accented characters in Latin character sets but also non-Latin and multibyte character sets.
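One possible shape for such a test, sketched in JavaScript: a round-trip check over the strings mentioned in this thread plus one supplementary-plane glyph. The helper names `failingSamples` and `naiveEscape` are hypothetical, the glyph U+20E9D is an assumed example, and the naive escaper is only a stand-in to show what a failure looks like, not a copy of any Joomla code.

```javascript
// Sample strings from the thread, plus one assumed supplementary-plane glyph (U+20E9D).
const samples = [
  "Überlingen", "Europäer", "Музыка", "Curaçao", "香肠", "不能创建文件",
  String.fromCodePoint(0x20e9d),
];

// Returns the inputs that do not survive: escape -> JS string literal -> evaluation.
function failingSamples(escapeFn, inputs) {
  return inputs.filter((input) => {
    const literal = '"' + escapeFn(input) + '"';
    const decoded = new Function("return " + literal + ";")();
    return decoded !== input;
  });
}

// Hypothetical naive escaper: always emits \u plus bare hex digits, never braces.
function naiveEscape(input) {
  return Array.from(input, (ch) =>
    "\\u" + ch.codePointAt(0).toString(16).padStart(4, "0")
  ).join("");
}

const failures = failingSamples(naiveEscape, samples);
console.log(failures.length);
```

With the naive escaper only the supplementary-plane glyph fails the round trip. Passing the function under test as `escapeFn` would let the same sample list cover Latin, non-Latin and multibyte input.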