? bug PR-4.3-dev Pending


Hackwar
29 Jul 2023

Summary of Changes

Smart Search splits text into single words and indexes those. In Chinese, each character is treated as a word, so we can't just split on whitespace. For that, we have additional tokenisation code for Chinese content, which unfortunately has been broken until now.

The strategy was to get a list of all Chinese characters in the term, remove them all from the term, and append each Chinese character to the end of the list as a new term. If a term is empty after all Chinese characters have been removed (because it contained only those characters and no numbers or Latin characters), that term has to be removed from the list.

The problem was that a character could be present more than once in the input, but the first replacement of that character replaced all of its occurrences at once. The list of to-be-replaced characters, however, contained one entry per occurrence in the input term, and when the input term ran empty before the whole list had been processed, this threw a notice.
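The failure mode described above can be modelled in a few lines. This is only a sketch in Python, not the PHP code from Indexer.php: the function name `old_strategy`, the use of a dict to mimic a PHP array, and the approximation of "Chinese character" by the CJK Unified Ideographs block are all assumptions made for illustration. The `KeyError` stands in for the PHP notice.

```python
import re

# Assumption: "Chinese character" is approximated by the CJK Unified Ideographs block.
HAN = re.compile(r"[\u4e00-\u9fff]")


def old_strategy(term: str) -> list[str]:
    """Hypothetical model of the old approach, not the actual Joomla code."""
    terms = {0: term}            # dict keyed by position, mimicking a PHP array
    extra = []                   # single characters appended as new terms
    matches = HAN.findall(term)  # one entry per occurrence, duplicates included

    for char in matches:
        # str.replace removes *every* occurrence of char in one go.
        # Once the term has been removed below, this lookup fails.
        terms[0] = terms[0].replace(char, "")
        extra.append(char)

        if terms[0] == "":
            # The term consisted only of Chinese characters: drop it.
            del terms[0]

    return list(terms.values()) + extra
```

With an input such as `"调试调"`, the first pass removes both occurrences of 调, the second pass empties the term and removes it, and the third pass (for the second 调 in the match list) then accesses an entry that no longer exists.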

This PR tries to fix that. In a first attempt, I tried to replace all characters at once and then add all matches as new terms at the end. I wrote lots of comments and it took quite some work, as you can see from the first commit in this PR. Then I noticed that there is a much easier way. Now I'm just modifying the input before handing it to our default tokenisation routine, by adding whitespace around each Chinese character. That is a lot easier, shorter and easier to understand than the previous attempt.
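The idea of the final approach can be sketched as follows. Again, this is a Python illustration under assumptions, not the PHP from this PR: the name `tokenise`, the character range used to detect Chinese characters, and the plain whitespace split standing in for the default tokenisation routine are all hypothetical.

```python
import re

# Assumption: "Chinese character" is approximated by the CJK Unified Ideographs block.
HAN = re.compile(r"[\u4e00-\u9fff]")


def tokenise(text: str) -> list[str]:
    """Pad every Chinese character with spaces, then split on whitespace."""
    spaced = HAN.sub(lambda match: f" {match.group(0)} ", text)
    # str.split() without arguments collapses runs of whitespace,
    # so the doubled spaces between adjacent characters do no harm.
    return spaced.split()
```

Because every occurrence is padded in place, repeated characters simply show up as repeated tokens, and no separate list of matches has to be kept in sync with the term.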

Testing Instructions

  1. Install Simplified Chinese on your site and set up a multilingual site.
  2. Edit administrator/components/com_finder/src/Indexer/Indexer.php and add a die; on line 633 (before the return $linkId;) to abort the redirect when saving an article.
  3. Copy the following string into an article: 标签印刷机在更换印刷工艺时的调试时间灵活适用于所有应用 and set the article's language to Chinese.
  4. Save the article.

Actual result BEFORE applying this Pull Request

You get a white page with a notice Undefined Offset at X

Expected result AFTER applying this Pull Request

The page is white without any notices at all.

When you remove the die; from the Indexer.php, saving works normally again.

Link to documentations

Please select:

  • Documentation link for docs.joomla.org:

  • No documentation changes for docs.joomla.org needed

  • Pull Request link for manual.joomla.org:

  • No documentation changes for manual.joomla.org needed

Thanks to @coolcat-creations for reporting this to me and helping with the debugging.

joomla-cms-bot - change - 29 Jul 2023
Category Administration com_finder
Hackwar - open - 29 Jul 2023
Hackwar - change - 29 Jul 2023
Status New Pending
Hackwar - change - 1 Aug 2023
Labels Added: PR-4.3-dev
coolcat-creations - test_item - 1 Aug 2023 - Tested successfully
coolcat-creations - comment - 1 Aug 2023

I have tested this item successfully on ba36ce2

Thank you for the patch. I tested it: the search did not break and the article with the string was indexed.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/41275.

richard67 - test_item - 6 Aug 2023 - Tested successfully
richard67 - comment - 6 Aug 2023

I have tested this item successfully on ba36ce2

It seems this also fixes the failing blog sample data installation when the backend language is Chinese.


richard67 - change - 6 Aug 2023
Status Pending Ready to Commit
richard67 - comment - 6 Aug 2023

RTC


obuisard - change - 7 Aug 2023
Labels Added: ?
obuisard - change - 7 Aug 2023
Labels Added: bug
obuisard - close - 7 Aug 2023
obuisard - merge - 7 Aug 2023
obuisard - change - 7 Aug 2023
Status Ready to Commit Fixed in Code Base
Closed_Date 0000-00-00 00:00:00 2023-08-07 19:58:33
Closed_By obuisard
obuisard - comment - 7 Aug 2023

Thank you Hannes @Hackwar for this PR.

coolcat-creations - comment - 7 Aug 2023

Thank you @Hackwar
