? Success

User tests: Successful: Unsuccessful:

avatar richard67
richard67
12 Mar 2016

Pull Request for Issues #9361 and #9249

Summary of Changes

This PR changes back the collation from utf8mb4_unicode_ci or utf8_unicode_ci to utf8mb4_general_ci or utf8_general_ci for the database tables of com_finder in the utf8(mb4) conversion procedure and joomla.sql, so there will not be duplicated search terms anymore after utf8(mb4) conversion, and the behavior of com_finder will be as it was before 3.5.0 Beta 5.

Testing Instructions

2 tests have to be done:

  1. Updating a Joomla! with version lower than 3.5.0. Beta 1, e.g. a 3.4.8, to latest staging + this PR to see if the issue is solved.
  2. Install a clean Joomla! latest staging + this PR with testing sample data and see if running the indexer still works.
Test 1: Update

This test has to be done with Joomla! in a language which has the problem described here http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html e.g. German or Greece:

To further illustrate, the following equalities hold in both utf8_general_ci and utf8_unicode_ci (for the effect this has in comparisons or when doing searches, see Section 10.1.7.8, “Examples of the Effect of Collation”):
Ä = A
Ö = O
Ü = U
A difference between the collations is that this is true for utf8_general_ci:
ß = s
Whereas this is true for utf8_unicode_ci, which supports the German DIN-1 ordering (also known as dictionary order):
ß = ss

If you did not observe the error mentioned in the issues referred to on top here, makeon a Joomla! version lower than 3.5.0 Beta 1, e.g. a 3.4.8, com_finder contain search terms which would result in duplicate key with unicode sorting, e.g. "ausschließlich" and "ausschliesslich" in German.

Then update to 3.5.0 RC and you will run into the problem reported in these issues.

Now do the same on the e.g 3.4.8 and update to "3.5.0 RC plus patch from this PR".

Result: The updates succeeds without the problem reported in the issues.

Use following custom URL to update to "3.5.0 RC plus patch from this PR" in the Joomla! Update component, channel "Custom URL":

Test 2: New installation and indexer run

Check that a new installation of "3.5.0 RC plus patch from this PR" does not show any SQL errors.

After installation, run the smart search indexer and verify it can complete without any SQL error related to collations.

You can find the installation sources here: https://github.com/richard67/joomla-cms/archive/finder-search-terms-collation.zip.

avatar richard67 richard67 - open - 12 Mar 2016
avatar richard67 richard67 - change - 12 Mar 2016
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 12 Mar 2016
Labels Added: ?
avatar richard67
richard67 - comment - 12 Mar 2016

@wilsonge 3.5.0-blocker label for this here please.

avatar wilsonge wilsonge - change - 12 Mar 2016
Labels Added: ?
avatar richard67 richard67 - change - 12 Mar 2016
The description was changed
avatar franz-wohlkoenig franz-wohlkoenig - test_item - 12 Mar 2016 - Tested successfully
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 12 Mar 2016

I have tested this item :white_check_mark: successfully on cacc15b


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/9387.

Had to click one Time at "Fix"-Button.

avatar richard67
richard67 - comment - 12 Mar 2016

@franz-wohlkoenig Thanks for testing. Why did you have to use the "fix" button, which problem was shown? The missing UTF-8 Multibyte (utf8mb4) conversion? Or something else? Just am curious.

avatar franz-wohlkoenig
franz-wohlkoenig - comment - 12 Mar 2016

Nothing about UTF-8 Multibyte (utf8mb4) conversion. I was to fast.

If you wan't i can do same again.

avatar richard67
richard67 - comment - 12 Mar 2016

Would be interesting to know, yes.

avatar infograf768
infograf768 - comment - 12 Mar 2016

behavior of com_search will be as it was before 3.5.0 Beta 5.

I guess you meant "com_finder". :smiley:

avatar richard67
richard67 - comment - 12 Mar 2016

@infograf768 yes, sorry, will correct that.

avatar richard67 richard67 - change - 12 Mar 2016
The description was changed
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 12 Mar 2016

@richard67 For your Interest why i have to click Fix-Button:

Index "whoisonline" is not in Table "#_session". (By File: "3.0.0.sql".) (Translated by: me).

avatar wilsonge wilsonge - test_item - 13 Mar 2016 - Tested unsuccessfully
avatar wilsonge
wilsonge - comment - 13 Mar 2016

I have tested this item :red_circle: unsuccessfully on cacc15b

Sadly unsuccessful test:

I installed a fresh copy of staging with this PR applied with the "Testing Sample Data" and then tried to run the smart search indexer. I hit the following error:

Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation '=' SQL=INSERT INTO `#__finder_tokens_aggregate` (`term_id`, `term`, `stem`, `common`, `phrase`, `term_weight`, `context`, `context_weight`, `language`) SELECT t.term_id, t1.term, t1.stem, t1.common, t1.phrase, t1.weight, t1.context, ROUND( t1.weight * COUNT( t2.term ) * 1.700000, 8 ) AS context_weight, t1.language FROM ( SELECT DISTINCT t1.term, t1.stem, t1.common, t1.phrase, t1.weight, t1.context, t1.language FROM `#__finder_tokens` AS t1 WHERE t1.context = 1 ) AS t1 JOIN `#__finder_tokens` AS t2 ON t2.term = t1.term LEFT JOIN `#__finder_terms` AS t ON t.term = t1.term WHERE t2.context = 1 GROUP BY t1.term ORDER BY t1.term DESC


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/9387.

avatar richard67
richard67 - comment - 13 Mar 2016

Hmm seems we have to change a few other columns back, too ... or change all back to general_ci? Let me know if you have some decision or opinion on that. I go sleep now, continue tomorrow then.

avatar wilsonge
wilsonge - comment - 13 Mar 2016

OK So my thoughts are let's make the entire finder component tables run _general_ci - treat it as if it were a standalone component that can't use unicode yet. In the future when our PHP checks are upgraded we can move it to unicode_ci

avatar richard67
richard67 - comment - 13 Mar 2016

ok, got it, will do that after sleep and coffee

avatar wilsonge
wilsonge - comment - 13 Mar 2016

thanks!

avatar hmayer1980
hmayer1980 - comment - 13 Mar 2016

Not sure if it is still necessary or not, but I could update my site (3.4.8) with this PR right now (http://test5.richard-fath.de/list_test4.xml).

On the other hand, if I try to run the smart search indexter I get this 'Illegal mix of collations' error...

My double used word is 'Oberrußbach' and 'Oberrussbach'.

My thought would be (and I am not a joomla developer, I was just thinking - have a look onto the next joomla Version, try it now ) .If you do this with the canonicalise function when entering data into the table (maybe in 3.5.1) - you could just delete the duplicate data with the SQL upgrade ?

avatar infograf768
infograf768 - comment - 13 Mar 2016

It may be usefult to know that the default transliteration for ß in core
https://github.com/joomla/joomla-cms/blob/staging/libraries/joomla/language/transliterate.php#L47

In French custom transliteration we also use ss
https://github.com/infograf768/fr-FR-3.x/blob/master/admin_fr-FR/fr-FR.localise.php#L133:

avatar franz-wohlkoenig
franz-wohlkoenig - comment - 13 Mar 2016

As @wilsonge reported got same Illegal mix of collations...-Message.

avatar joomla-cms-bot
joomla-cms-bot - comment - 13 Mar 2016

This PR has received new commits.

CC: @franz-wohlkoenig, @wilsonge


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/9387.

avatar richard67 richard67 - change - 13 Mar 2016
The description was changed
Title
Correct issue #9361 - Change finder terms to general collation
Correct issue #9361 - Change com_finder tables to general collation
avatar richard67 richard67 - change - 13 Mar 2016
Title
Correct issue #9361 - Change finder terms to general collation
Correct issue #9361 - Change com_finder tables to general collation
avatar richard67 richard67 - change - 13 Mar 2016
The description was changed
avatar richard67
richard67 - comment - 13 Mar 2016

@franz-wohlkoenig @wilsonge Please test again. Zip container for my custom update URL has been updated already by last commit.

Note the changed testing instructions:

In addition to the test like before ("Test 1" now), the test of a new installation ("Test 2" now) is not additional, optional anymore but mandatory now, in order to exclude the problem @wilsonge noticed first.

@hmayer1980 @swiffer Could you test this PR here, too?


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/9387.

avatar franz-wohlkoenig
franz-wohlkoenig - comment - 13 Mar 2016

@richard67 http://test5.richard-fath.de/Test1.xml don't work, so i miss something in your instructions. And if i understand rigth i test both url.

avatar andrepereiradasilva
andrepereiradasilva - comment - 13 Mar 2016

@richard67 i did only test 1 from 2.5.28

Steps done:
1. Clean install of 2.5.28 with sample data
2. Created two articles: one with the title "ausschließlich" and the other with the title "ausschliesslich" (used different alias from then)
3. Smart Search Component "Index", all fine can see the two words indexed
4. Joomla update "Custom URL" with your custom 2.5.x url (http://test5.richard-fath.de/list_test4_j25.xml)
5. Updated fine
6. Extensions -> Database: all fine
7. Database tables all fine (only the #_finder_* tables are utf8mb4_general_ci the rest are utf8mb4_unicode_ci)
8. Smart Search Component all fine can see the two words indexed
9. Somehow there is a message i need to activate the plugin ... activated the plugin
10. Pressed "Index" -> problem ... undefined null

I don't know if this last problem is related to your PR

avatar richard67
richard67 - comment - 13 Mar 2016

@franz-wohlkoenig Here the custom url are

"test1" stuff was my other PR, and so or so the "t" in my "test" URLs are always lowercase.

It is enough to test 1 of these URLs, either the 3.x or the 2.5 one, depending on what you want to update.

avatar andrepereiradasilva
andrepereiradasilva - comment - 13 Mar 2016

@richard67 wait i have some javascript error, please ignore my test until i find if this is because of my server

UPDATE: it's not my server problem. Got a undefined null because somehow the indexer ajax request (/administrator/index.php?option=com_finder&tmpl=component&format=json&<token>=1&task=indexer.batch) returns a http 500 error.

avatar richard67
richard67 - comment - 13 Mar 2016

@andrepereiradasilva As far as I know, the plugin is only necessary to keep the index up to date when you change articles.

I get here a "[13-Mar-2016 14:26:33 Europe/Berlin] PHP Parse error: syntax error, unexpected '?' in /homepages/38/d391032706/htdocs/test1/media/editors/tinymce/plugins/codesample/plugin.min.js on line 1" error in my PHP log, but not on the patched system for this PR, on another one, also updated to 3.5.0 RC. And this results in not beign able to edit some article. On the system I patched with this PR I do not have this ... strange.

avatar richard67
richard67 - comment - 13 Mar 2016

@andrepereiradasilva Ah but this could be related to my server configuration, I use php tozip js because have neither mod_deflate nor mod_zip, and so have to keep short open tags = false. This could me my problem here, not related to yours.

Update: Yes, it is my special server configuration which the new js for the codesample tinymce plugin breaks.

avatar franz-wohlkoenig
franz-wohlkoenig - comment - 13 Mar 2016

@richard67

  1. 3.4.8 > 3.5.0-rc
  2. Database need a Fix like in 195810666
  3. Fix done with success
  4. New Index of Smart Search with success
  5. in #_finder_terms "ausschließlich"/"ausschliesslich" are like in J-3.4.8.
avatar richard67
richard67 - comment - 13 Mar 2016

@andrepereiradasilva Can you replicate the error you ran into on a normal staging, i.e. without patch for this PR?

avatar franz-wohlkoenig franz-wohlkoenig - test_item - 13 Mar 2016 - Tested successfully
avatar franz-wohlkoenig
franz-wohlkoenig - comment - 13 Mar 2016

I have tested this item :white_check_mark: successfully on c59e7c7


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/9387.

avatar richard67
richard67 - comment - 13 Mar 2016

@franz-wohlkoenig That means a success. The database fix you refer to is not related to this PR, you will have to do it also when doing the same update with a normal 3.5.0 RC not patched by this PR here.

avatar richard67
richard67 - comment - 13 Mar 2016

Ah, yes, you already did it. Thanks for testing, @franz-wohlkoenig .

avatar chrisdavenport
chrisdavenport - comment - 13 Mar 2016

@infograf768

It may be usefult to know that the default transliteration for ß in core
https://github.com/joomla/joomla-cms/blob/staging/libraries/joomla/language/transliterate.php#L47

In French custom transliteration we also use ss
https://github.com/infograf768/fr-FR-3.x/blob/master/admin_fr-FR/fr-FR.localise.php#L133:

We could apply UTF8 to latin transliteration to canonicalise all words in Smart Search and that would seem to have the desired effect for German and French, but if we did it for Greek we'd have the problem that George pointed us to where two different words with completely different meanings (one of which is apparently a bit naughty) would be mapped to the same word under transliteration.

Since we can't rely on _unicode_ci to do a correct implicit canonicalisation in all cases either, it looks like we need a separate canonicalisation method in the language packs to cope with these variations.

Anyway, I think this should be opened as a separate issue rather than cluttering up this one.

avatar hmayer1980
hmayer1980 - comment - 13 Mar 2016

hi @richard67 ,

I am confused. If I install with this custom URL (http://test5.richard-fath.de/list_test4.xml). I do not get an Error - OK. But the DB does not look like it is using utf8mb4... (My tables are still utf8_general_ci if I look in phpmyadmin onto some table structures)

And the Search Indexer still produces an Error (a different one, but an error)?

Duplicate entry '4-61' for key 'PRIMARY' SQL=INSERT INTO#__finder_links_terms2(link_id,term_id,weight) SELECT 4,term_id, ROUND(SUM(context_weight), 8) FROM#__finder_tokens_aggregateWHEREmap_suffix= '2' GROUP BYtermORDER BYtermDESC

...
Just for safety I will rerun the upgrade?

avatar richard67
richard67 - comment - 13 Mar 2016

@hmayer1980 I am confused, too. What is your mysql server version? Does it and the client api support utf8mb4? Do you use mysqli or pdomysql as driver in your global settings?

If your server and client support utf8mb4, then all Joomla! core tables should have "utf8mb4..." collations. The com_finder tables `#__finder_..." should have "..._general_ci" and all the others "..._unicode_ci" collations.

There is no utf8mb4 support if mysql database server version is lower than 5.5.3, or if mysqlnd client is used with a version lower than 5.0.9, or if other database client is used with a version lower than 5.5.3.

Just for safety I will rerun the upgrade?

You should start with the same starting point, i.e. same Joomla! files and database, e.g. from a 3.4.8 or whatever you upgraded. You should not upgrade any 3.5.0 Beta or RC or latest staging.

avatar hmayer1980
hmayer1980 - comment - 13 Mar 2016

Now I did drop the joomla Directory and all database tables and copied my site again. ( I have a script which copies everything except my Images and Video directory because ist 500 GB in size - I do a link there - A procedure I regular use when testing new Versions of Joomla or components.)

This time the mysql db is on utf8mb4. (on the prev. run I did not receive and error, but the run before this I had an AJAX Error while the Progress bar of the Installation was shown)

Anyway, now the Database seems to be on utf8mb4. No Error while upgrading. So all is fine.

My System Information is
PHP Built On Linux rubinstein 3.2.61-grsec-modsign #1 SMP Tue Aug 12 09:58:26 UTC 2014 x86_64

Database Version 5.6.25-log

Database Collation latin1_swedish_ci

Database Connection Collation utf8mb4_general_ci

PHP Version 5.6.18

Web Server Apache

WebServer to PHP Interface cgi-fcgi

Joomla! Version Joomla! 3.5.0-rc Release Candidate [ Ember ] 10-March-2016 00:30 GMT

Joomla! Platform Version Joomla Platform 13.1.0 Stable [ Curiosity ] 24-Apr-2013 00:00 GMT

User Agent Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko

I have never used the Smart Search Index, but when I run it, I receive this dupplicate entry error. This error must not block this pull request, because in Version 3.4.8 the Smart Search Indexer does not return (it does not raise an error, but looks like running endless - it is stuck on the Progress "Your content is being indexed. Do not close this window.". In the 3.5.0 Version it returns with error and one more row is shown on the list. A new click results in a new error and a new row??? )

avatar richard67
richard67 - comment - 13 Mar 2016

@hmayer1980 Looks like a good test result to me then.

The AJAX error you mentioned could be same as @andrepereiradasilva had?

I think it is not related to this PR bug might be related to cached js in the broswer?

Maybe it needs the browser to force load all from server and not from cache after an update, e.g. in Firefox with "shift" key + reload button, before doing anything else after the update?

avatar chrisdavenport
chrisdavenport - comment - 13 Mar 2016

@hmayer1980 Do you have SSH access to the server so you can run CLI scripts? If you can, try running the indexer from the command line:


php cli/finder_indexer.php

That will give you more information about what is failing.

avatar andrepereiradasilva
andrepereiradasilva - comment - 13 Mar 2016

i think the 500 error is something related to weblinks
I run the cli script and got:

Smart Search INDEXER
============================
Starting Indexer
Setting up Smart Search plugins
PHP Warning:  require_once(/path/to/joomla-25/includes/application.php): failed to open stream: No such file or directory in /path/to/joomla-25/plugins/finder/weblinks/weblinks.php on line 313

Warning: require_once(/path/to/joomla-25/includes/application.php): failed to open stream: No such file or directory in /path/to/joomla-25/plugins/finder/weblinks/weblinks.php on line 313
PHP Fatal error:  require_once(): Failed opening required '/path/to/joomla-25/includes/application.php' (include_path='xxxx') in /path/to/joomla-25/plugins/finder/weblinks/weblinks.php on line 313

Fatal error: require_once(): Failed opening required '/path/to/joomla-25/includes/application.php' (include_path='xxxx') in /path/to/joomla-25/plugins/finder/weblinks/weblinks.php on line 313
avatar richard67
richard67 - comment - 13 Mar 2016

Ha, that's why I don't see it ... have uninstalled the weblinks once it was separated from core, and am using for testing a copy of the site where I have done so.

So it's not related to this PR? does it happen with an unpatched 3.5.0 RC, too?

avatar chrisdavenport
chrisdavenport - comment - 13 Mar 2016

The weblinks plugin for Smart Search is no longer in the core distribution so you won't see that error in 3.5x. It's happening because you uninstalled weblinks but didn't uninstall the weblinks Smart Search plugin. Even just disabling the plugin will fix the problem.

avatar richard67
richard67 - comment - 13 Mar 2016

@hmayer1980 @andrepereiradasilva Can you check chrisdavenport's comment above and do as he recommends and then continue testing this PR? Thanks in advance if so.


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/9387.

avatar hmayer1980
hmayer1980 - comment - 13 Mar 2016

I am Fixing my smartsearch Error in 3.4.8 (Production Site) right now.I am still running that smart search indexer - after a few Errors it is now running - I think the retries fixed the Errors one by one.
Running since 1.5 hours...
(
Setup 129728 items in 0.506 seconds.
....

  • Processed batch 551 in 10.866 seconds.
  • Processed batch 552 in 12.627 seconds. )

When this run completed I will copy the Site again and try to run an upgrade - but I fear that this new large tables (finder tables are still growing - some > 100.000, for example finder_terms - 18 MB, 82.000 records) my shared hosted Server will timeout on the upgrade script if all this large tables will be rebuilt with a new collation... - another good test.
If I where you I would not count on me running this test again Maybe the indexer will run for hours - I do not know.

avatar richard67
richard67 - comment - 13 Mar 2016

Oh ... then I only can wish good luck :smile:

avatar andrepereiradasilva
andrepereiradasilva - comment - 13 Mar 2016

@hmayer1980 @andrepereiradasilva Can you check chrisdavenport's comment above and do as he recommends and then continue testing this PR? Thanks in advance if so.

It's true, if i disable "Smart Search - Web Links" plugin the indexer runs fine. Sorry for the late response, but many things to do today.

9.Somehow there is a message i need to activate the plugin ... activated the plugin

Probably was this plugin i enabled in step 9 of my tests and cause the error. But i will retest as soon as i can and post the results after.
But probably this was the error so you can move on.

avatar andrepereiradasilva
andrepereiradasilva - comment - 13 Mar 2016

repeated test with the same steps, didn't enabled or disabled any plugin: same error.

But, after the update, i noticed i have also an update for weblinks package, and when updated weblinks ... the error is gone, and the indexer works again!

So we just need to update weblinks, nothing to do with this PR i think.

So for me Test 1: OK

UPDATE
Test 2:

  • Database no error: OK
  • All tables utf8mb4_unicode_ci except the #__finder_*thata are utf8mb4_general_ci: OK
  • Index: OK
avatar andrepereiradasilva andrepereiradasilva - test_item - 13 Mar 2016 - Tested successfully
avatar andrepereiradasilva
andrepereiradasilva - comment - 13 Mar 2016

I have tested this item :white_check_mark: successfully on c59e7c7

As described in tests in above comments


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/9387.

avatar wilsonge wilsonge - test_item - 13 Mar 2016 - Tested successfully
avatar wilsonge
wilsonge - comment - 13 Mar 2016

I have tested this item :white_check_mark: successfully on c59e7c7

Starting site from scatch works as expected with indexing working as expected

Before Patch applied when upgrading from 3.4.8 to the RC I get when creating two articles with titles"ausschließlich" and "ausschliesslich" I got an SQL Error:

Error

Duplicate entry 'ausschliesslich' for key 'idx_term' SQL=ALTER TABLE `#__finder_terms` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

After Patch no issues.


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/9387.

avatar wilsonge wilsonge - change - 13 Mar 2016
Milestone Added:
avatar wilsonge wilsonge - change - 13 Mar 2016
Status Pending Fixed in Code Base
Closed_Date 0000-00-00 00:00:00 2016-03-13 19:46:58
Closed_By wilsonge
avatar wilsonge wilsonge - close - 13 Mar 2016
avatar wilsonge wilsonge - merge - 13 Mar 2016
avatar wilsonge wilsonge - reference | 161bc19 - 13 Mar 16
avatar wilsonge wilsonge - merge - 13 Mar 2016
avatar wilsonge wilsonge - close - 13 Mar 2016
avatar wilsonge wilsonge - close - 13 Mar 2016
avatar richard67 richard67 - head_ref_deleted - 13 Mar 2016
avatar wilsonge wilsonge - change - 18 Mar 2016
Labels Removed: ?
avatar Stanzilla
Stanzilla - comment - 22 Mar 2016

I had the same error when updating to 3.5.0 (FINAL) just with the word hätte and fixed it like this http://forum.joomla.org/viewtopic.php?f=707&p=3379083#p3379211

avatar doktornotor
doktornotor - comment - 2 Apr 2016

Also broken with Czech (had "Domů" in weblinks, after editing the item to "Domu" the conversion completed successfully.)

Add a Comment

Login with GitHub to post a comment