Joomla! Issue Tracker | Joomla! CMS #28425 - [4.0] Change collations of com_finder tables from general to unicode on MySQL databases.

? ?

Fixed in Code Base
23 Mar 2020
Urgent
Build: 4.0-dev
# 28425
Diff
richard67:4.0-dev-com-finder-unicode-collations

Pending

Pending Hound Hound is busy sniffing around... Details

User tests: Successful: Unsuccessful:

richard67
21 Mar 2020

Pull Request for Issue #9361 .

Redo of Pull Request (PR) #16617 but for 4.0 and in a different way.

Summary of Changes

This Pull Request (PR) changes collations of database tables for com_finder from general to unicode collations for MySQL databases.

We had left them with general collations when we did the utf8mb4 conversion because otherwise things broke when tables from 3rd party extensions where joined to com_finder tables.

Now with J4 com_finder database tables have been restructured anyway, so most of the tables are either dropped and created again in the update sql script changed with this PR, or they are at least truncated, so it is a good chance to change the table collations now without having to do what was before tried with PR #16617 for that purpose, which was a kind of 2nd stage of the utf8mb4 conversion procedure. Especially the #__finder_terms table required a special conversion in PR #16617 because of possible duplicate records after conversion, but now in J4 this is not a problem anymore, because the #__finder_terms table is truncated anyway in the update sql, also before this PR here. That's why it now can be done just with the update sql script.

For PostgreSQL databases, it adds truncation of tables #__finder_taxonomy_map, #__finder_tokens and #__finder_tokens_aggregate in the same way as it adds it for MySQL databases.

Testing Instructions

All tests related to collations have to be done using a MySQL (or MariaDB) database.

For PostgreSQL nothing has to be done for new installation test. For the update test, just check as described if everything works as well as before.

There are 2 tests to be done:

New installation
Update

Instructions for new installation test

On a current 4.0dev branch, apply the patch of this PR and make a new installation. If you are not woorking on a git clone you can use following installation package, which is equal to last nightly build plus the patch of this PR applied: https://test5.richard-fath.de/Joomla_4.0.0-beta1-dev-Development-Full_Package_2020-03-23_pr-28425.zip.
When the update has finished, log in to backend and check that everything works.
Check in your database using e.g. PhpMyAdmin that all database tables have collation utf8mb4_unicode_ci except of table #__finder_terms_common, which has collation utf8mb4_bin.

Instructions for update test

Install a fresh copy of 3.10 nightly or a git clone of the current 3.10-dev branch and install some sample data, or have a 3.9 installation with much content and update that one to 3.10.
Enable the Smart Search plugin, if not enabled yet, enable statistics in Smart Search options and use smart search in frontend. Make sure you have indexed content and statistics.
Go to the Joomla Update Component component options, set the update channel to "Custom ULR", set the URL to https://test5.richard-fath.de/next_major_list_pr-28425.xml and the status to "Development".
Update to Joomla 4.0.
When the update has finished, log in to backend and check that everything works. There is an error message shown about not existing template style, this is known and happens also without this PR. Beside this there should not be any error, and no errors or warnings or notices should be shown in PHP error log.
Check in your database using e.g. PhpMyAdmin that all database tables have collation utf8mb4_unicode_ci except of table #__finder_terms_common, which has collation utf8mb4_bin.

Expected result

All database tables of the Joomla CMS core have collation utf8mb4_unicode_ci except of table #__finder_terms_common, which has collation utf8mb4_bin.

Actual result

All database tables of the Joomla CMS core have collation utf8mb4_unicode_ci except of tables with names starting with #__finder. Those have either collation utf8mb4_general_ci or utf8mb4_bin.

Documentation Changes Required

None.

e8f5c3e 8 Jul 2019

Update 4.0.0-2018-07-29.sql

eb8858a 15 Mar 2020

Merge remote-tracking branch 'upstream/4.0-dev' into 4.0-dev-com-finder-unicode-collations

3a3f24e 15 Mar 2020

Add missing conversions to update sql script

3b26c0c 15 Mar 2020

Update joomla.sql

4beb037 15 Mar 2020

Update 4.0.0-2018-07-29.sql

23b939c 16 Mar 2020

Change back language column change statement

07b0f72 21 Mar 2020

Merge remote-tracking branch 'upstream/4.0-dev' into 4.0-dev-com-finder-unicode-collations

richard67 - open - 21 Mar 2020

richard67 - change - 21 Mar 2020

Status

New

⇒

Pending

joomla-cms-bot - change - 21 Mar 2020

Category

⇒

SQL Administration com_admin Installation

wilsonge - comment - 21 Mar 2020

Adding beta blocker to this. It's actually pretty important and this needs to be in before beta if it's going.

richard67 - comment - 21 Mar 2020

@wilsonge Question is if it will work like it is now, or if we will run in some timout on large amounts of data when converting character set of some tables when updating. But most of the tables are deleted and recreated anyway with the update sql script.

@Hackwar What do you think? Could you have a look on the diff in changed files here and report back if you see any problems?

So or so it should be tested with an update package which contains the changes from this PR on a J 3.10 with some data for smart search, indexed stuff, search terms and so on. ~~I'll provide such an update package as soon as I am ready for test here and will undraft.~~ Update package done, PR undraftet.

richard67 - change - 21 Mar 2020

The description was changed

richard67 - edited - 21 Mar 2020

richard67 - change - 21 Mar 2020

The description was changed

richard67 - edited - 21 Mar 2020

richard67 - change - 21 Mar 2020

The description was changed

richard67 - edited - 21 Mar 2020

richard67 - change - 21 Mar 2020

The description was changed

richard67 - edited - 21 Mar 2020

richard67 - change - 21 Mar 2020

The description was changed

richard67 - edited - 21 Mar 2020

richard67 - change - 21 Mar 2020

Title

…

[4.0] ~~[WiP]~~ Change collations of com_finder tables from general to unicode on MySQL databases.

[4.0] Change collations of com_finder tables from general to unicode on MySQL databases.

richard67 - edited - 21 Mar 2020

richard67 - change - 21 Mar 2020

The description was changed

richard67 - edited - 21 Mar 2020

richard67 - change - 22 Mar 2020

The description was changed

richard67 - edited - 22 Mar 2020

richard67 - comment - 22 Mar 2020

@wilsonge @Hackwar Regarding the duplicate records after conversion, which were a problem in the old PR #16617 and issue #9361 , I think we are safe here. Those were a problem in the finder terms table, which is cleared anyway in the update sql script.

The only question which remains is if the conversion of the remaining tables which are not recreated or cleared or new may lead to timeout errors. These tables are #__finder_taxonomy_map, #__finder_tokens, #__finder_tokens_aggregate and #__finder_types.

wilsonge - comment - 22 Mar 2020

#__finder_types should be safe. the rest potentially are problematic as they are finder data. @Hackwar or @chrisdavenport would know more though

c40c342 22 Mar 2020

Merge remote-tracking branch 'upstream/4.0-dev' into 4.0-dev-com-finder-unicode-collations

richard67 - change - 22 Mar 2020

Labels

Added: ? ?

richard67 - change - 22 Mar 2020

The description was changed

richard67 - edited - 22 Mar 2020

a6b2b2c 23 Mar 2020

Merge remote-tracking branch 'upstream/4.0-dev' into 4.0-dev-com-finder-unicode-collations

richard67 - change - 23 Mar 2020

The description was changed

richard67 - edited - 23 Mar 2020

alikon - comment - 23 Mar 2020

maybe we can circumvent the problem truncating the data tables and with a postinstall message inform to reindex

wilsonge - comment - 23 Mar 2020

maybe we can circumvent the problem truncating the data tables and with a postinstall message inform to reindex

That was originally the idea anyhow. But I'm not sure how the changes made by @Hackwar in finder more generally affect that

richard67 - comment - 23 Mar 2020

@alikon @wilsonge Since the changes by @Hackwar most of the tables are already either truncated or dropped and created again or they are new, so for those no problem. Only those I've listed above remain to be checked (minus the #__finder_types which seems to be no problem): #__finder_taxonomy_map, #__finder_tokens, #__finder_tokens_aggregate. @Hackwar @chrisdavenport Couldn't we just truncate those, too, if all the rest is already truncated?

Hackwar - comment - 23 Mar 2020

For me this is fine. Those tables are all truncated as well or are being truncated in another place, so don't worry about those.

richard67 - comment - 23 Mar 2020

@Hackwar Thanks for feedback. Am happy to read that.

richard67 - comment - 23 Mar 2020

I have updated the patched full install package for the installation test and the update package behind the custom update URL for the update test, so people can test now without having to care for the other, meanwhile solved issues with updating.

wilsonge - comment - 23 Mar 2020

@richard67 so truncate these 3 please then this is ready for testing #__finder_taxonomy_map, #__finder_tokens, #__finder_tokens_aggregate

richard67 - comment - 23 Mar 2020

@wilsonge As far as I understood Hannes they are truncated at other places, but to be on the safer side I'll add that to the PR here. Stay tuned.

Hackwar - comment - 23 Mar 2020

the tokens and tokens_aggregate tables are memory tables anyway. Those are just helpers, which are cleared upon every indexing run. No need to truncate those.

wilsonge - comment - 23 Mar 2020

If there's data in them potentially they can cause SQL errors on changing collation?

richard67 - comment - 23 Mar 2020

@Hackwar But it would do no harm if we truncated them in the update sql before converting character set? Just to be on the safer side.

Hackwar - comment - 23 Mar 2020

sure, truncate them if you want.

1764f30 23 Mar 2020

Truncate some more tables

746c75b 23 Mar 2020

Merge remote-tracking branch 'upstream/4.0-dev' into 4.0-dev-com-finder-unicode-collations

joomla-cms-bot - change - 23 Mar 2020

Add a Comment

Older
Newer

Joomla! Issue Tracker - CMS

[#28425] - [4.0] Change collations of com_finder tables from general to unicode on MySQL databases.

Summary of Changes

Testing Instructions

Instructions for new installation test

Instructions for update test

Expected result

Actual result

Documentation Changes Required

Add a Comment

Status	Pending	⇒	Fixed in Code Base
Closed_Date	0000-00-00 00:00:00	⇒	2020-03-23 21:49:55
Closed_By		⇒	wilsonge