User tests: Successful: Unsuccessful:
Pull Request for Issues #9361 and #9249
This PR changes back the collation from utf8mb4_unicode_ci or utf8_unicode_ci to utf8mb4_general_ci or utf8_general_ci for the database tables of com_finder in the utf8(mb4) conversion procedure and joomla.sql, so there will not be duplicated search terms anymore after utf8(mb4) conversion, and the behavior of com_finder will be as it was before 3.5.0 Beta 5.
2 tests have to be done:
This test has to be done with Joomla! in a language which has the problem described here http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html e.g. German or Greece:
To further illustrate, the following equalities hold in both utf8_general_ci and utf8_unicode_ci (for the effect this has in comparisons or when doing searches, see Section 10.1.7.8, “Examples of the Effect of Collation”):
Ä = A
Ö = O
Ü = U
A difference between the collations is that this is true for utf8_general_ci:
ß = s
Whereas this is true for utf8_unicode_ci, which supports the German DIN-1 ordering (also known as dictionary order):
ß = ss
If you did not observe the error mentioned in the issues referred to on top here, makeon a Joomla! version lower than 3.5.0 Beta 1, e.g. a 3.4.8, com_finder contain search terms which would result in duplicate key with unicode sorting, e.g. "ausschließlich" and "ausschliesslich" in German.
Then update to 3.5.0 RC and you will run into the problem reported in these issues.
Now do the same on the e.g 3.4.8 and update to "3.5.0 RC plus patch from this PR".
Result: The updates succeeds without the problem reported in the issues.
Use following custom URL to update to "3.5.0 RC plus patch from this PR" in the Joomla! Update component, channel "Custom URL":
Check that a new installation of "3.5.0 RC plus patch from this PR" does not show any SQL errors.
After installation, run the smart search indexer and verify it can complete without any SQL error related to collations.
You can find the installation sources here: https://github.com/richard67/joomla-cms/archive/finder-search-terms-collation.zip.
Status | New | ⇒ | Pending |
Labels |
Added:
?
|
Labels |
Added:
?
|
I have tested this item successfully on cacc15b
Had to click one Time at "Fix"-Button.
@franz-wohlkoenig Thanks for testing. Why did you have to use the "fix" button, which problem was shown? The missing UTF-8 Multibyte (utf8mb4) conversion? Or something else? Just am curious.
Nothing about UTF-8 Multibyte (utf8mb4) conversion. I was to fast.
If you wan't i can do same again.
Would be interesting to know, yes.
behavior of com_search will be as it was before 3.5.0 Beta 5.
I guess you meant "com_finder".
@infograf768 yes, sorry, will correct that.
@richard67 For your Interest why i have to click Fix-Button:
Index "whoisonline" is not in Table "#_session". (By File: "3.0.0.sql".)
(Translated by: me).
I have tested this item unsuccessfully on cacc15b
Sadly unsuccessful test:
I installed a fresh copy of staging with this PR applied with the "Testing Sample Data" and then tried to run the smart search indexer. I hit the following error:
Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation '=' SQL=INSERT INTO `#__finder_tokens_aggregate` (`term_id`, `term`, `stem`, `common`, `phrase`, `term_weight`, `context`, `context_weight`, `language`) SELECT t.term_id, t1.term, t1.stem, t1.common, t1.phrase, t1.weight, t1.context, ROUND( t1.weight * COUNT( t2.term ) * 1.700000, 8 ) AS context_weight, t1.language FROM ( SELECT DISTINCT t1.term, t1.stem, t1.common, t1.phrase, t1.weight, t1.context, t1.language FROM `#__finder_tokens` AS t1 WHERE t1.context = 1 ) AS t1 JOIN `#__finder_tokens` AS t2 ON t2.term = t1.term LEFT JOIN `#__finder_terms` AS t ON t.term = t1.term WHERE t2.context = 1 GROUP BY t1.term ORDER BY t1.term DESC
Hmm seems we have to change a few other columns back, too ... or change all back to general_ci? Let me know if you have some decision or opinion on that. I go sleep now, continue tomorrow then.
OK So my thoughts are let's make the entire finder component tables run _general_ci - treat it as if it were a standalone component that can't use unicode yet. In the future when our PHP checks are upgraded we can move it to unicode_ci
ok, got it, will do that after sleep and coffee
thanks!
Not sure if it is still necessary or not, but I could update my site (3.4.8) with this PR right now (http://test5.richard-fath.de/list_test4.xml).
On the other hand, if I try to run the smart search indexter I get this 'Illegal mix of collations' error...
My double used word is 'Oberrußbach' and 'Oberrussbach'.
My thought would be (and I am not a joomla developer, I was just thinking - have a look onto the next joomla Version, try it now ) .If you do this with the canonicalise function when entering data into the table (maybe in 3.5.1) - you could just delete the duplicate data with the SQL upgrade ?
It may be usefult to know that the default transliteration for ß
in core
https://github.com/joomla/joomla-cms/blob/staging/libraries/joomla/language/transliterate.php#L47
In French custom transliteration we also use ss
https://github.com/infograf768/fr-FR-3.x/blob/master/admin_fr-FR/fr-FR.localise.php#L133:
This PR has received new commits.
CC: @franz-wohlkoenig, @wilsonge
Title |
|
Title |
|
@franz-wohlkoenig @wilsonge Please test again. Zip container for my custom update URL has been updated already by last commit.
Note the changed testing instructions:
In addition to the test like before ("Test 1" now), the test of a new installation ("Test 2" now) is not additional, optional anymore but mandatory now, in order to exclude the problem @wilsonge noticed first.
@hmayer1980 @swiffer Could you test this PR here, too?
@richard67 http://test5.richard-fath.de/Test1.xml
don't work, so i miss something in your instructions. And if i understand rigth i test both url.
@richard67 i did only test 1 from 2.5.28
Steps done:
1. Clean install of 2.5.28 with sample data
2. Created two articles: one with the title "ausschließlich" and the other with the title "ausschliesslich" (used different alias from then)
3. Smart Search Component "Index", all fine can see the two words indexed
4. Joomla update "Custom URL" with your custom 2.5.x url (http://test5.richard-fath.de/list_test4_j25.xml)
5. Updated fine
6. Extensions -> Database: all fine
7. Database tables all fine (only the #_finder_*
tables are utf8mb4_general_ci the rest are utf8mb4_unicode_ci)
8. Smart Search Component all fine can see the two words indexed
9. Somehow there is a message i need to activate the plugin ... activated the plugin
10. Pressed "Index" -> problem ... undefined null
I don't know if this last problem is related to your PR
@franz-wohlkoenig Here the custom url are
"test1" stuff was my other PR, and so or so the "t" in my "test" URLs are always lowercase.
It is enough to test 1 of these URLs, either the 3.x or the 2.5 one, depending on what you want to update.
@richard67 wait i have some javascript error, please ignore my test until i find if this is because of my server
UPDATE: it's not my server problem. Got a undefined null
because somehow the indexer ajax request (/administrator/index.php?option=com_finder&tmpl=component&format=json&<token>=1&task=indexer.batch
) returns a http 500 error.
@andrepereiradasilva As far as I know, the plugin is only necessary to keep the index up to date when you change articles.
I get here a "[13-Mar-2016 14:26:33 Europe/Berlin] PHP Parse error: syntax error, unexpected '?' in /homepages/38/d391032706/htdocs/test1/media/editors/tinymce/plugins/codesample/plugin.min.js on line 1" error in my PHP log, but not on the patched system for this PR, on another one, also updated to 3.5.0 RC. And this results in not beign able to edit some article. On the system I patched with this PR I do not have this ... strange.
@andrepereiradasilva Ah but this could be related to my server configuration, I use php tozip js because have neither mod_deflate nor mod_zip, and so have to keep short open tags = false. This could me my problem here, not related to yours.
Update: Yes, it is my special server configuration which the new js for the codesample tinymce plugin breaks.
@andrepereiradasilva Can you replicate the error you ran into on a normal staging, i.e. without patch for this PR?
I have tested this item successfully on c59e7c7
@franz-wohlkoenig That means a success. The database fix you refer to is not related to this PR, you will have to do it also when doing the same update with a normal 3.5.0 RC not patched by this PR here.
Ah, yes, you already did it. Thanks for testing, @franz-wohlkoenig .
It may be usefult to know that the default transliteration for ß in core
https://github.com/joomla/joomla-cms/blob/staging/libraries/joomla/language/transliterate.php#L47In French custom transliteration we also use ss
https://github.com/infograf768/fr-FR-3.x/blob/master/admin_fr-FR/fr-FR.localise.php#L133:
We could apply UTF8 to latin transliteration to canonicalise all words in Smart Search and that would seem to have the desired effect for German and French, but if we did it for Greek we'd have the problem that George pointed us to where two different words with completely different meanings (one of which is apparently a bit naughty) would be mapped to the same word under transliteration.
Since we can't rely on _unicode_ci to do a correct implicit canonicalisation in all cases either, it looks like we need a separate canonicalisation method in the language packs to cope with these variations.
Anyway, I think this should be opened as a separate issue rather than cluttering up this one.
hi @richard67 ,
I am confused. If I install with this custom URL (http://test5.richard-fath.de/list_test4.xml). I do not get an Error - OK. But the DB does not look like it is using utf8mb4... (My tables are still utf8_general_ci if I look in phpmyadmin onto some table structures)
And the Search Indexer still produces an Error (a different one, but an error)?
Duplicate entry '4-61' for key 'PRIMARY' SQL=INSERT INTO
#__finder_links_terms2(
link_id,
term_id,
weight) SELECT 4,
term_id, ROUND(SUM(
context_weight), 8) FROM
#__finder_tokens_aggregateWHERE
map_suffix= '2' GROUP BY
termORDER BY
termDESC
...
Just for safety I will rerun the upgrade?
@hmayer1980 I am confused, too. What is your mysql server version? Does it and the client api support utf8mb4? Do you use mysqli or pdomysql as driver in your global settings?
If your server and client support utf8mb4, then all Joomla! core tables should have "utf8mb4..." collations. The com_finder tables `#__finder_..." should have "..._general_ci" and all the others "..._unicode_ci" collations.
There is no utf8mb4 support if mysql database server version is lower than 5.5.3, or if mysqlnd client is used with a version lower than 5.0.9, or if other database client is used with a version lower than 5.5.3.
Just for safety I will rerun the upgrade?
You should start with the same starting point, i.e. same Joomla! files and database, e.g. from a 3.4.8 or whatever you upgraded. You should not upgrade any 3.5.0 Beta or RC or latest staging.
Now I did drop the joomla Directory and all database tables and copied my site again. ( I have a script which copies everything except my Images and Video directory because ist 500 GB in size - I do a link there - A procedure I regular use when testing new Versions of Joomla or components.)
This time the mysql db is on utf8mb4. (on the prev. run I did not receive and error, but the run before this I had an AJAX Error while the Progress bar of the Installation was shown)
Anyway, now the Database seems to be on utf8mb4. No Error while upgrading. So all is fine.
My System Information is
PHP Built On Linux rubinstein 3.2.61-grsec-modsign #1 SMP Tue Aug 12 09:58:26 UTC 2014 x86_64
Database Version 5.6.25-log
Database Collation latin1_swedish_ci
Database Connection Collation utf8mb4_general_ci
PHP Version 5.6.18
Web Server Apache
WebServer to PHP Interface cgi-fcgi
Joomla! Version Joomla! 3.5.0-rc Release Candidate [ Ember ] 10-March-2016 00:30 GMT
Joomla! Platform Version Joomla Platform 13.1.0 Stable [ Curiosity ] 24-Apr-2013 00:00 GMT
User Agent Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko
I have never used the Smart Search Index, but when I run it, I receive this dupplicate entry error. This error must not block this pull request, because in Version 3.4.8 the Smart Search Indexer does not return (it does not raise an error, but looks like running endless - it is stuck on the Progress "Your content is being indexed. Do not close this window.". In the 3.5.0 Version it returns with error and one more row is shown on the list. A new click results in a new error and a new row??? )
@hmayer1980 Looks like a good test result to me then.
The AJAX error you mentioned could be same as @andrepereiradasilva had?
I think it is not related to this PR bug might be related to cached js in the broswer?
Maybe it needs the browser to force load all from server and not from cache after an update, e.g. in Firefox with "shift" key + reload button, before doing anything else after the update?
@hmayer1980 Do you have SSH access to the server so you can run CLI scripts? If you can, try running the indexer from the command line:
php cli/finder_indexer.php
That will give you more information about what is failing.
i think the 500 error is something related to weblinks
I run the cli script and got:
Smart Search INDEXER
============================
Starting Indexer
Setting up Smart Search plugins
PHP Warning: require_once(/path/to/joomla-25/includes/application.php): failed to open stream: No such file or directory in /path/to/joomla-25/plugins/finder/weblinks/weblinks.php on line 313
Warning: require_once(/path/to/joomla-25/includes/application.php): failed to open stream: No such file or directory in /path/to/joomla-25/plugins/finder/weblinks/weblinks.php on line 313
PHP Fatal error: require_once(): Failed opening required '/path/to/joomla-25/includes/application.php' (include_path='xxxx') in /path/to/joomla-25/plugins/finder/weblinks/weblinks.php on line 313
Fatal error: require_once(): Failed opening required '/path/to/joomla-25/includes/application.php' (include_path='xxxx') in /path/to/joomla-25/plugins/finder/weblinks/weblinks.php on line 313
Ha, that's why I don't see it ... have uninstalled the weblinks once it was separated from core, and am using for testing a copy of the site where I have done so.
So it's not related to this PR? does it happen with an unpatched 3.5.0 RC, too?
The weblinks plugin for Smart Search is no longer in the core distribution so you won't see that error in 3.5x. It's happening because you uninstalled weblinks but didn't uninstall the weblinks Smart Search plugin. Even just disabling the plugin will fix the problem.
@hmayer1980 @andrepereiradasilva Can you check chrisdavenport's comment above and do as he recommends and then continue testing this PR? Thanks in advance if so.
I am Fixing my smartsearch Error in 3.4.8 (Production Site) right now.I am still running that smart search indexer - after a few Errors it is now running - I think the retries fixed the Errors one by one.
Running since 1.5 hours...
(
Setup 129728 items in 0.506 seconds.
....
When this run completed I will copy the Site again and try to run an upgrade - but I fear that this new large tables (finder tables are still growing - some > 100.000, for example finder_terms - 18 MB, 82.000 records) my shared hosted Server will timeout on the upgrade script if all this large tables will be rebuilt with a new collation... - another good test.
If I where you I would not count on me running this test again Maybe the indexer will run for hours - I do not know.
Oh ... then I only can wish good luck
@hmayer1980 @andrepereiradasilva Can you check chrisdavenport's comment above and do as he recommends and then continue testing this PR? Thanks in advance if so.
It's true, if i disable "Smart Search - Web Links" plugin the indexer runs fine. Sorry for the late response, but many things to do today.
9.Somehow there is a message i need to activate the plugin ... activated the plugin
Probably was this plugin i enabled in step 9 of my tests and cause the error. But i will retest as soon as i can and post the results after.
But probably this was the error so you can move on.
repeated test with the same steps, didn't enabled or disabled any plugin: same error.
But, after the update, i noticed i have also an update for weblinks package, and when updated weblinks ... the error is gone, and the indexer works again!
So we just need to update weblinks, nothing to do with this PR i think.
So for me Test 1: OK
UPDATE
Test 2:
#__finder_*
thata are utf8mb4_general_ci: OKI have tested this item successfully on c59e7c7
As described in tests in above comments
I have tested this item successfully on c59e7c7
Starting site from scatch works as expected with indexing working as expected
Before Patch applied when upgrading from 3.4.8 to the RC I get when creating two articles with titles"ausschließlich" and "ausschliesslich" I got an SQL Error:
Error
Duplicate entry 'ausschliesslich' for key 'idx_term' SQL=ALTER TABLE `#__finder_terms` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
After Patch no issues.
Milestone |
Added: |
Status | Pending | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2016-03-13 19:46:58 |
Closed_By | ⇒ | wilsonge |
Labels |
Removed:
?
|
I had the same error when updating to 3.5.0 (FINAL) just with the word hätte
and fixed it like this http://forum.joomla.org/viewtopic.php?f=707&p=3379083#p3379211
Also broken with Czech (had "Domů" in weblinks, after editing the item to "Domu" the conversion completed successfully.)
@wilsonge 3.5.0-blocker label for this here please.