User tests: Successful: Unsuccessful:
Pull Request for Issue #9361 .
Redo of Pull Request (PR) #16617 but for 4.0 and in a different way.
This Pull Request (PR) changes collations of database tables for com_finder from general to unicode collations for MySQL databases.
We had left them with general collations when we did the utf8mb4 conversion because otherwise things broke when tables from 3rd party extensions where joined to com_finder tables.
Now with J4 com_finder database tables have been restructured anyway, so most of the tables are either dropped and created again in the update sql script changed with this PR, or they are at least truncated, so it is a good chance to change the table collations now without having to do what was before tried with PR #16617 for that purpose, which was a kind of 2nd stage of the utf8mb4 conversion procedure. Especially the #__finder_terms
table required a special conversion in PR #16617 because of possible duplicate records after conversion, but now in J4 this is not a problem anymore, because the #__finder_terms
table is truncated anyway in the update sql, also before this PR here. That's why it now can be done just with the update sql script.
For PostgreSQL databases, it adds truncation of tables #__finder_taxonomy_map
, #__finder_tokens
and #__finder_tokens_aggregate
in the same way as it adds it for MySQL databases.
All tests related to collations have to be done using a MySQL (or MariaDB) database.
For PostgreSQL nothing has to be done for new installation test. For the update test, just check as described if everything works as well as before.
There are 2 tests to be done:
utf8mb4_unicode_ci
except of table #__finder_terms_common
, which has collation utf8mb4_bin
.utf8mb4_unicode_ci
except of table #__finder_terms_common
, which has collation utf8mb4_bin
.All database tables of the Joomla CMS core have collation utf8mb4_unicode_ci
except of table #__finder_terms_common
, which has collation utf8mb4_bin
.
All database tables of the Joomla CMS core have collation utf8mb4_unicode_ci
except of tables with names starting with #__finder
. Those have either collation utf8mb4_general_ci
or utf8mb4_bin
.
None.
Status | New | ⇒ | Pending |
Category | ⇒ | SQL Administration com_admin Installation |
@wilsonge Question is if it will work like it is now, or if we will run in some timout on large amounts of data when converting character set of some tables when updating. But most of the tables are deleted and recreated anyway with the update sql script.
@Hackwar What do you think? Could you have a look on the diff in changed files here and report back if you see any problems?
So or so it should be tested with an update package which contains the changes from this PR on a J 3.10 with some data for smart search, indexed stuff, search terms and so on. I'll provide such an update package as soon as I am ready for test here and will undraft. Update package done, PR undraftet.
Title |
|
@wilsonge @Hackwar Regarding the duplicate records after conversion, which were a problem in the old PR #16617 and issue #9361 , I think we are safe here. Those were a problem in the finder terms table, which is cleared anyway in the update sql script.
The only question which remains is if the conversion of the remaining tables which are not recreated or cleared or new may lead to timeout errors. These tables are #__finder_taxonomy_map
, #__finder_tokens
, #__finder_tokens_aggregate
and #__finder_types
.
#__finder_types
should be safe. the rest potentially are problematic as they are finder data. @Hackwar or @chrisdavenport would know more though
Labels |
Added:
?
?
|
maybe we can circumvent the problem truncating the data tables and with a postinstall message inform to reindex
@alikon @wilsonge Since the changes by @Hackwar most of the tables are already either truncated or dropped and created again or they are new, so for those no problem. Only those I've listed above remain to be checked (minus the #__finder_types
which seems to be no problem): #__finder_taxonomy_map
, #__finder_tokens
, #__finder_tokens_aggregate
. @Hackwar @chrisdavenport Couldn't we just truncate those, too, if all the rest is already truncated?
For me this is fine. Those tables are all truncated as well or are being truncated in another place, so don't worry about those.
I have updated the patched full install package for the installation test and the update package behind the custom update URL for the update test, so people can test now without having to care for the other, meanwhile solved issues with updating.
@richard67 so truncate these 3 please then this is ready for testing #__finder_taxonomy_map, #__finder_tokens, #__finder_tokens_aggregate
the tokens and tokens_aggregate tables are memory tables anyway. Those are just helpers, which are cleared upon every indexing run. No need to truncate those.
If there's data in them potentially they can cause SQL errors on changing collation?
sure, truncate them if you want.
Category | SQL Administration com_admin Installation | ⇒ | SQL Administration com_admin Postgresql Installation |
Test packages updated.
Testing instructions updated to reflect the additional table truncation for PostgreSQL.
Ready for test now.
added to my to-do list
Priority | Medium | ⇒ | Urgent |
I have tested this item
Works for me. Data all seems to be truncated and a reindex works successfully
Status | Pending | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2020-03-23 21:49:55 |
Closed_By | ⇒ | wilsonge |
Thanks!
Thanks.
It’s not really used but I think let’s keep it updated for now
Will make PR tonight then
Ah, and I meant the mysql file of course, not the postgresql ;-)
Adding beta blocker to this. It's actually pretty important and this needs to be in before beta if it's going.