? ? Pending

User tests: Successful: Unsuccessful:

avatar richard67
richard67
21 Mar 2020

Pull Request for Issue #9361 .

Redo of Pull Request (PR) #16617 but for 4.0 and in a different way.

Summary of Changes

This Pull Request (PR) changes collations of database tables for com_finder from general to unicode collations for MySQL databases.

We had left them with general collations when we did the utf8mb4 conversion because otherwise things broke when tables from 3rd party extensions where joined to com_finder tables.

Now with J4 com_finder database tables have been restructured anyway, so most of the tables are either dropped and created again in the update sql script changed with this PR, or they are at least truncated, so it is a good chance to change the table collations now without having to do what was before tried with PR #16617 for that purpose, which was a kind of 2nd stage of the utf8mb4 conversion procedure. Especially the #__finder_terms table required a special conversion in PR #16617 because of possible duplicate records after conversion, but now in J4 this is not a problem anymore, because the #__finder_terms table is truncated anyway in the update sql, also before this PR here. That's why it now can be done just with the update sql script.

For PostgreSQL databases, it adds truncation of tables #__finder_taxonomy_map, #__finder_tokens and #__finder_tokens_aggregate in the same way as it adds it for MySQL databases.

Testing Instructions

All tests related to collations have to be done using a MySQL (or MariaDB) database.

For PostgreSQL nothing has to be done for new installation test. For the update test, just check as described if everything works as well as before.

There are 2 tests to be done:

  1. New installation
  2. Update

Instructions for new installation test

  1. On a current 4.0dev branch, apply the patch of this PR and make a new installation. If you are not woorking on a git clone you can use following installation package, which is equal to last nightly build plus the patch of this PR applied: https://test5.richard-fath.de/Joomla_4.0.0-beta1-dev-Development-Full_Package_2020-03-23_pr-28425.zip.
  2. When the update has finished, log in to backend and check that everything works.
  3. Check in your database using e.g. PhpMyAdmin that all database tables have collation utf8mb4_unicode_ci except of table #__finder_terms_common, which has collation utf8mb4_bin.

Instructions for update test

  1. Install a fresh copy of 3.10 nightly or a git clone of the current 3.10-dev branch and install some sample data, or have a 3.9 installation with much content and update that one to 3.10.
  2. Enable the Smart Search plugin, if not enabled yet, enable statistics in Smart Search options and use smart search in frontend. Make sure you have indexed content and statistics.
  3. Go to the Joomla Update Component component options, set the update channel to "Custom ULR", set the URL to https://test5.richard-fath.de/next_major_list_pr-28425.xml and the status to "Development".
  4. Update to Joomla 4.0.
  5. When the update has finished, log in to backend and check that everything works. There is an error message shown about not existing template style, this is known and happens also without this PR. Beside this there should not be any error, and no errors or warnings or notices should be shown in PHP error log.
  6. Check in your database using e.g. PhpMyAdmin that all database tables have collation utf8mb4_unicode_ci except of table #__finder_terms_common, which has collation utf8mb4_bin.

Expected result

All database tables of the Joomla CMS core have collation utf8mb4_unicode_ci except of table #__finder_terms_common, which has collation utf8mb4_bin.

Actual result

All database tables of the Joomla CMS core have collation utf8mb4_unicode_ci except of tables with names starting with #__finder. Those have either collation utf8mb4_general_ci or utf8mb4_bin.

Documentation Changes Required

None.

avatar richard67 richard67 - open - 21 Mar 2020
avatar richard67 richard67 - change - 21 Mar 2020
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 21 Mar 2020
Category SQL Administration com_admin Installation
avatar wilsonge
wilsonge - comment - 21 Mar 2020

Adding beta blocker to this. It's actually pretty important and this needs to be in before beta if it's going.

avatar richard67
richard67 - comment - 21 Mar 2020

@wilsonge Question is if it will work like it is now, or if we will run in some timout on large amounts of data when converting character set of some tables when updating. But most of the tables are deleted and recreated anyway with the update sql script.

@Hackwar What do you think? Could you have a look on the diff in changed files here and report back if you see any problems?

So or so it should be tested with an update package which contains the changes from this PR on a J 3.10 with some data for smart search, indexed stuff, search terms and so on. I'll provide such an update package as soon as I am ready for test here and will undraft. Update package done, PR undraftet.

avatar richard67 richard67 - change - 21 Mar 2020
The description was changed
avatar richard67 richard67 - edited - 21 Mar 2020
avatar richard67 richard67 - change - 21 Mar 2020
The description was changed
avatar richard67 richard67 - edited - 21 Mar 2020
avatar richard67 richard67 - change - 21 Mar 2020
The description was changed
avatar richard67 richard67 - edited - 21 Mar 2020
avatar richard67 richard67 - change - 21 Mar 2020
The description was changed
avatar richard67 richard67 - edited - 21 Mar 2020
avatar richard67 richard67 - change - 21 Mar 2020
The description was changed
avatar richard67 richard67 - edited - 21 Mar 2020
avatar richard67 richard67 - change - 21 Mar 2020
Title
[4.0] [WiP] Change collations of com_finder tables from general to unicode on MySQL databases.
[4.0] Change collations of com_finder tables from general to unicode on MySQL databases.
avatar richard67 richard67 - edited - 21 Mar 2020
avatar richard67 richard67 - change - 21 Mar 2020
The description was changed
avatar richard67 richard67 - edited - 21 Mar 2020
avatar richard67 richard67 - change - 22 Mar 2020
The description was changed
avatar richard67 richard67 - edited - 22 Mar 2020
avatar richard67
richard67 - comment - 22 Mar 2020

@wilsonge @Hackwar Regarding the duplicate records after conversion, which were a problem in the old PR #16617 and issue #9361 , I think we are safe here. Those were a problem in the finder terms table, which is cleared anyway in the update sql script.

The only question which remains is if the conversion of the remaining tables which are not recreated or cleared or new may lead to timeout errors. These tables are #__finder_taxonomy_map, #__finder_tokens, #__finder_tokens_aggregate and #__finder_types.

avatar wilsonge
wilsonge - comment - 22 Mar 2020

#__finder_types should be safe. the rest potentially are problematic as they are finder data. @Hackwar or @chrisdavenport would know more though

avatar richard67 richard67 - change - 22 Mar 2020
Labels Added: ? ?
avatar richard67 richard67 - change - 22 Mar 2020
The description was changed
avatar richard67 richard67 - edited - 22 Mar 2020
avatar richard67 richard67 - change - 23 Mar 2020
The description was changed
avatar richard67 richard67 - edited - 23 Mar 2020
avatar alikon
alikon - comment - 23 Mar 2020

maybe we can circumvent the problem truncating the data tables and with a postinstall message inform to reindex

avatar wilsonge
wilsonge - comment - 23 Mar 2020

maybe we can circumvent the problem truncating the data tables and with a postinstall message inform to reindex

That was originally the idea anyhow. But I'm not sure how the changes made by @Hackwar in finder more generally affect that

avatar richard67
richard67 - comment - 23 Mar 2020

@alikon @wilsonge Since the changes by @Hackwar most of the tables are already either truncated or dropped and created again or they are new, so for those no problem. Only those I've listed above remain to be checked (minus the #__finder_types which seems to be no problem): #__finder_taxonomy_map, #__finder_tokens, #__finder_tokens_aggregate. @Hackwar @chrisdavenport Couldn't we just truncate those, too, if all the rest is already truncated?

avatar Hackwar
Hackwar - comment - 23 Mar 2020

For me this is fine. Those tables are all truncated as well or are being truncated in another place, so don't worry about those.

avatar richard67
richard67 - comment - 23 Mar 2020

@Hackwar Thanks for feedback. Am happy to read that.

avatar richard67
richard67 - comment - 23 Mar 2020

I have updated the patched full install package for the installation test and the update package behind the custom update URL for the update test, so people can test now without having to care for the other, meanwhile solved issues with updating.

avatar wilsonge
wilsonge - comment - 23 Mar 2020

@richard67 so truncate these 3 please then this is ready for testing #__finder_taxonomy_map, #__finder_tokens, #__finder_tokens_aggregate

avatar richard67
richard67 - comment - 23 Mar 2020

@wilsonge As far as I understood Hannes they are truncated at other places, but to be on the safer side I'll add that to the PR here. Stay tuned.

avatar Hackwar
Hackwar - comment - 23 Mar 2020

the tokens and tokens_aggregate tables are memory tables anyway. Those are just helpers, which are cleared upon every indexing run. No need to truncate those.

avatar wilsonge
wilsonge - comment - 23 Mar 2020

If there's data in them potentially they can cause SQL errors on changing collation?

avatar richard67
richard67 - comment - 23 Mar 2020

@Hackwar But it would do no harm if we truncated them in the update sql before converting character set? Just to be on the safer side.

avatar Hackwar
Hackwar - comment - 23 Mar 2020

sure, truncate them if you want.

avatar joomla-cms-bot joomla-cms-bot - change - 23 Mar 2020
Category SQL Administration com_admin Installation SQL Administration com_admin Postgresql Installation
avatar richard67
richard67 - comment - 23 Mar 2020

Test packages updated.

avatar richard67 richard67 - change - 23 Mar 2020
The description was changed
avatar richard67 richard67 - edited - 23 Mar 2020
avatar richard67
richard67 - comment - 23 Mar 2020

Testing instructions updated to reflect the additional table truncation for PostgreSQL.

Ready for test now.

avatar richard67
richard67 - comment - 23 Mar 2020

Can it be that smart search is broken on 3.10-dev with PostgreSQL (PDO)?

@alikon Can you confirm that?

avatar alikon
alikon - comment - 23 Mar 2020

added to my to-do list

avatar jwaisner jwaisner - change - 23 Mar 2020
Priority Medium Urgent
avatar wilsonge
wilsonge - comment - 23 Mar 2020

I have tested this item successfully on 746c75b

Works for me. Data all seems to be truncated and a reindex works successfully


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/28425.

avatar wilsonge wilsonge - test_item - 23 Mar 2020 - Tested successfully
avatar wilsonge wilsonge - change - 23 Mar 2020
Status Pending Fixed in Code Base
Closed_Date 0000-00-00 00:00:00 2020-03-23 21:49:55
Closed_By wilsonge
avatar wilsonge wilsonge - close - 23 Mar 2020
avatar wilsonge wilsonge - merge - 23 Mar 2020
avatar wilsonge
wilsonge - comment - 23 Mar 2020

Thanks!

avatar richard67
richard67 - comment - 23 Mar 2020

Thanks.

avatar richard67
richard67 - comment - 25 Mar 2020

@wilsonge While checking another PR, I noticed that in this PR here I have forgotten file administrator/components/com_finder/sql/install.mysql.sql. Is that file really still needed? If so, I have to make a PR for that with he changes from here.

avatar wilsonge
wilsonge - comment - 25 Mar 2020

It’s not really used but I think let’s keep it updated for now

avatar richard67
richard67 - comment - 25 Mar 2020

Will make PR tonight then

avatar richard67
richard67 - comment - 25 Mar 2020

Ah, and I meant the mysql file of course, not the postgresql ;-)

avatar richard67
richard67 - comment - 25 Mar 2020

@wilsonge PR for the install.XXX.sql files is #28455 . Can be tested or even merged by review.

Add a Comment

Login with GitHub to post a comment