User tests: Successful: Unsuccessful:
Pull Request for Issue #37141 and partly #35664.
Requires #36506.
Both PRs, #36506 and this one here, together will solve issue #35664 completely.
This pull request (PR) changes the update SQL scripts so they don't fail with SQL errors when being executed a 2nd (or 3rd, 4th, ..., nth) time.
This makes it possible to fix an update (especially one from 3.10) which failed in the middle of the SQL updates: fix the root cause, then update again using "Upload & Update". It also makes it possible to update from 3.10 when the database still contains remainders of a failed previous attempt to update to 4, because the backup of the 3.10 site taken before that attempt was not restored into an empty database.
In general, the following 3 kinds of changes are made:

1. `IF EXISTS` and `IF NOT EXISTS` for structure changes where possible, i.e. supported by that database type (by both MySQL and MariaDB in case of the MySQL files).
2. The `/** CAN FAIL **/` installer hint introduced with PR #36506 where 1. is not possible but would be needed, e.g. when a column is added to a table and that column already exists, because only MariaDB supports `IF NOT EXISTS` in this case, but MySQL and PostgreSQL don't.
3. `INSERT IGNORE` on MySQL and `ON CONFLICT DO NOTHING` on PostgreSQL when inserting new records where we would get duplicate primary or unique keys when running the particular statement a 2nd time.

In addition, the statement to drop the `system_data` column of the `#__extensions` table has been moved up in the files 4.0.0-2018-03-05.sql, and insert statements using that column in these files have been changed so that they don't fail when that column had already been removed by a previous execution of the same update SQL script. This had to be done because that column did not have a default value and so had to be included in those insert statements before.
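To illustrate the three kinds of changes, here is a sketch with made-up table and column names (`#__` stands for the site's database prefix; only the patterns themselves, not the statement placement, are taken from this PR):

```sql
-- 1. IF EXISTS / IF NOT EXISTS where the database type supports it
--    (both MySQL and MariaDB do for DROP TABLE):
DROP TABLE IF EXISTS `#__obsolete_example`;

-- 2. Where that syntax is not available, e.g. ADD COLUMN on MySQL
--    (only MariaDB supports IF NOT EXISTS there), mark the statement
--    with the installer hint from PR #36506 so a failure is tolerated:
ALTER TABLE `#__example` ADD COLUMN `new_col` int NOT NULL DEFAULT 0 /** CAN FAIL **/;

-- 3. Ignore duplicate primary/unique keys on repeated inserts:
INSERT IGNORE INTO `#__example` (`id`, `title`) VALUES (1, 'demo');  -- MySQL/MariaDB
-- PostgreSQL equivalent:
-- INSERT INTO "#__example" ("id", "title") VALUES (1, 'demo') ON CONFLICT DO NOTHING;
```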
Finally, one change in file 4.0.0-2018-08-29.sql was necessary for the MySQL file only. On this kind of database, a table column had the wrong data type (varchar instead of datetime) on installations with a long update history, and so a special update statement was needed for the data before converting the column to the right type. This statement fails with a wrong data type error when it runs again after that column has already been converted. This PR adds the necessary type cast to that statement and a check for NULL values (which are valid in the converted table and so should not be touched).
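A sketch of that kind of fix, with a hypothetical table and column (the real statement lives in the MySQL 4.0.0-2018-08-29.sql file; all names and values here are made up for illustration):

```sql
-- Hypothetical names. Repair legacy varchar data before the column is
-- converted to datetime. The CAST makes the comparison valid whether the
-- column is still varchar or already datetime, and NULL values (which are
-- valid after conversion) are left untouched:
UPDATE `#__example`
SET `modified_time` = `created_time`
WHERE CAST(`modified_time` AS CHAR) = '0000-00-00 00:00:00'
  AND `modified_time` IS NOT NULL;
```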
Currently, when a core update fails in the middle of the SQL updates for whatever reason, one of two things can happen:

1. The update failed because the database server went down or the connection to the database broke. In this case the schema version in the database will be the one from before the update, so when you try to redo the update, the core updater will run again all the SQL update scripts which had already been executed before the previous update broke.
2. The SQL updates failed, e.g. due to bad data or data structures. In this case the core updater sets the schema version in the database as if all the SQL updates had succeeded, i.e. to the version of the latest update SQL script present on the file system for that kind of database, currently "4.1.1-2022-02-20".

In both cases you can try to fix that with the database fixer, but this fixes structure only and not data, so you might be missing all data changes, which, especially for updates from 3.10 to 4, leaves broken functionality.
@nikosdion's PR #36506 fixes this so that the schema version is set to the version of the last successfully executed update SQL script; a further update attempt will then not run all the scripts again but pick up with the one which failed.
This means that after you have fixed the root cause of a failed update, you can use the "Upload & Update" method to update again to the same target version and so complete your update.
Depending on at which place in which update SQL script the previous update failed, the SQL statements in that script up to that point will be executed again with the next update attempt. Some of these statements cause SQL errors when they run a second time.
This PR fixes the update SQL scripts so there will not be any such SQL error.
PR #36506 and this one here together will even enable us to implement something to detect and finalize an unfinished core update e.g. with a new button in the database checker and/or the Joomla Update Component default view.
But there still might be duplicate data when some insert statement runs a 2nd time without causing an SQL error because no primary or unique key is violated, e.g. duplicate records in the extensions or the modules tables.
This is not handled by this PR but might be fixed with a future PR. It could have disadvantages, though; see item "RFC 2" in the section "RFC: What is worth a discussion - feedback is welcome" further below.
This happened often when people updated from 3.10 to 4 and the update broke, e.g. due to incompatible 3rd party system plugins. They then restored the backup from before the update, fixed that problem, tried to update again, and that attempt failed with a primary key violation.
The reason was that they had not restored the database backup into an empty database, so it still contained J4-specific tables.
This is described in detail at https://www.akeeba.com/news/1748-joomla-3-10-and-4-0-common-issues.html when you scroll down to the section "I cannot restore a Joomla 3 backup on top of Joomla 4; the site is broken.".
The primary key violation is just the first thing which happens. If they fixed that by clearing the table, other problems would appear, e.g. from structure changes having already been applied by the last update attempt.
For a long time I was very reluctant to fix that, because from my point of view it should be clear that restoring a database backup includes removing tables not belonging to that backup (or, better, using an empty database). I've done all I can to spread that knowledge on social media, in forums and here in corresponding issues.
But there are still so many issues reported everywhere which fit this scenario that I finally decided to solve it with this PR.
My PR plus PR #36506 together solve this scenario completely: here the tables from 3.10 are restored with the backup, so there is no issue with silent duplicates produced by insert statements (e.g. into the extensions table) where no primary or unique key is violated, as there would be in the other scenario 1 when these statements are repeated without a backup having been restored before.
RFC: What is worth a discussion - feedback is welcome

RFC 1: `INSERT IGNORE`/`ON CONFLICT DO NOTHING` for all insert statements

Currently this PR uses `INSERT IGNORE` on MySQL/MariaDB and `ON CONFLICT DO NOTHING` on PostgreSQL only where it is necessary because otherwise a primary key or a unique key would be violated.
When core contributors create new update SQL scripts in future, or 3rd party extension developers want to write their update SQL scripts in the same way, this approach requires knowledge of where it is necessary and where not, which makes update SQL scripts harder to maintain.
It would require less knowledge if we made that change for every insert statement (and recommended 3rd party devs to do the same with their scripts), because it's easier to remember, and it would be more visible when people look up how inserts should be done in an existing core update SQL script.
The `INSERT IGNORE` or `ON CONFLICT DO NOTHING` shouldn't do any harm when used with an insert statement on a table which doesn't have a unique key, or when the primary key is not used in the insert (I have to check this to be sure).
It would also be good to always do that for the case that a new unique index is added to a table, e.g. in 4.1. If the update scripts e.g. in 4.2 were already prepared like that, they would not need any change if they insert data into such a table and would otherwise run into the problem handled by this PR.
So it's worth a thought to do it for every insert.
Let me know your opinion please.
If there is wide acceptance of doing it for all insert statements, I would prefer not to add that to this PR but to make a follow-up PR.
RFC 2: Duplicate data produced by insert statements running a 2nd time

Let's say in use case scenario 1 described above, the SQL updates break, e.g. here:
So when we fix the root cause of that problem and update again (or finalize the update with a new, future functionality for that), the previous insert statements into the extensions table will run again and produce duplicate records, because we don't have a unique key on that table. For the extensions table, the index "extension" is just an ordinary index on the column combination type, element, folder and client_id, but it should be a unique key. And this is just one example; we are missing unique keys at many other places.
The clean way to fix that would be to provide unique keys and to use `INSERT IGNORE`/`ON CONFLICT DO NOTHING` for inserts.
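A sketch of that clean way for the extensions table example above (not part of this PR; values are made up, and real index handling would need care on installations with existing duplicates):

```sql
-- Replace the plain index on #__extensions by a unique key ...
ALTER TABLE `#__extensions` DROP INDEX `extension`;
ALTER TABLE `#__extensions` ADD UNIQUE KEY `extension` (`type`, `element`, `folder`, `client_id`);

-- ... so repeated inserts can simply ignore the duplicate:
INSERT IGNORE INTO `#__extensions` (`name`, `type`, `element`, `folder`, `client_id`)
VALUES ('plg_example_demo', 'plugin', 'demo', 'example', 0);
-- PostgreSQL: INSERT INTO ... ON CONFLICT DO NOTHING;
```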
But we know from the past that when introducing unique keys on installations with a long update history, we could not be sure that the data really was unique. I only say "username column of the users table", and those with a good memory will remember. In that case we helped ourselves with an FAQ article on how to manually fix that, because at update time we cannot know which of the duplicates is the right one.
So we might not want to go the clean way.
Another way would be to modify these insert statements so they don't produce duplicates, using tricky select statements in the insert, but that could result in complicated and so hard-to-maintain SQL statements.
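Such a duplicate-avoiding insert could look like this sketch (MySQL syntax, illustrative values):

```sql
-- Insert the record only if an equivalent one doesn't exist yet; works
-- without a unique key, but is noticeably harder to read and maintain:
INSERT INTO `#__extensions` (`name`, `type`, `element`, `folder`, `client_id`)
SELECT 'plg_example_demo', 'plugin', 'demo', 'example', 0
FROM DUAL
WHERE NOT EXISTS (
    SELECT 1 FROM `#__extensions`
    WHERE `type` = 'plugin' AND `element` = 'demo'
      AND `folder` = 'example' AND `client_id` = 0
);
```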
As said, this only concerns SQL inserts in the same file before the failed statement, so depending on where the SQL error happened at the first update attempt, you might be lucky and there are no critical insert statements.
And in most cases duplicates don't do harm.
But in other cases they might cause problems which you don't notice right after the update but only later, when using some rarely used backend functionality.
So I'm not 100% sure right now what the best way is to deal with that, or whether we should deal with it at all.
Personally, right now I tend to prefer the clean way.
Opinions are welcome.
This PR cannot be tested using the pre-built packages from Drone. As long as PR #36506 hasn't been merged, it needs an update package which includes the changes from that PR in addition to the changes from this PR here. Such a package can be downloaded from here: https://test5.richard-fath.de/Joomla_4.1.1-dev-Development-Update_Package_pr-37156_2022-02-27.zip .
This PR should be tested with both database types MySQL (or MariaDB) and PostgreSQL. Testers please report back what kind of database you have used. If you have different ones, test with all you can.
Testing the real use case scenario 1 for all possible cases would mean causing an SQL error at some place in an update SQL script, fixing that error and updating again to finalize the update, and doing this for all SQL scripts at all possible places.
Because "for all SQL scripts at all possible places" would be rather time consuming, we instead do a hardcore test here with ALL SQL statements of ALL J4 update SQL scripts being executed on a J4 database.
This has the same effect as executing all the SQL updates a second time on a site updated from 3.10 to 4.
This hardcore scenario also covers the real use case scenario 2 regarding the J4-specific tables and structure changes which are the problem in that scenario.
On a clean, current 4.1-dev or latest 4.1 nightly build, apply the changes from this PR here and from PR #36506, and, in case of PostgreSQL on a nightly build, also from PR #37154.
Make a new installation using an empty database.
Set error reporting to maximum and switch on "Debug System" in global configuration.
In phpMyAdmin (or, if on PostgreSQL, phpPgAdmin), modify the schema version of the CMS core as follows, replacing `#__` by your database prefix and, on PostgreSQL, changing the identifier quoting:

```sql
UPDATE `#__schemas`
SET `version_id` = '3.10.7-2022-02-20'
WHERE `extension_id` = (SELECT `extension_id` FROM `#__extensions` WHERE `type` = 'file' AND `element` = 'joomla' AND `folder` = '' AND `client_id` = 0);
```
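For PostgreSQL, the quoting change mentioned in the step above turns that statement into (same values, double-quoted identifiers):

```sql
UPDATE "#__schemas"
SET "version_id" = '3.10.7-2022-02-20'
WHERE "extension_id" = (SELECT "extension_id" FROM "#__extensions" WHERE "type" = 'file' AND "element" = 'joomla' AND "folder" = '' AND "client_id" = 0);
```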
Download the following patched update package, which includes the changes from this PR here and from PRs #36506 and #37154: https://test5.richard-fath.de/Joomla_4.1.1-dev-Development-Update_Package_pr-37156_2022-02-27.zip .
Use the "Upload & Update" method of the Joomla Update component to update to the package downloaded in the previous step.
Watch that no SQL errors are shown at the end of the update.
After the update has finished, check in the update log "administrator/logs/joomla_update.php" that all SQL statements from all J4 update SQL scripts (currently 511 statements for MySQL/MariaDB and 615 for PostgreSQL) have been executed and the update has finished with success. It should show the following messages (with `...` standing for the many SQL statements between the first and the last one):
```
Starting installation of new version.
Finalising installation.
Start of SQL updates.
The current database version (schema) is .
Ran query from file 4.0.0-2018-03-05. Query text: ALTER TABLE `#__extensions` DROP COLUMN `system_data` ;.
...
Ran query from file 4.1.1-2022-02-20. Query text: DELETE FROM `#__postinstall_messages` WHERE `title_key` = 'COM_ADMIN_POSTINSTALL.
End of SQL updates.
Deleting removed files and folders.
Cleaning up after installation.
Update to version 4.1.1-dev is complete.
```
In both use case scenarios 1 and 2, a 2nd attempt to update after the root cause is fixed will result in SQL errors either due to structure changes being applied already or due to primary or unique keys being violated.
No SQL error when running all SQL statements from all update SQL scripts again on a J4 database.
I.e. in both use case scenarios 1 and 2, a 2nd attempt to update after the root cause is fixed will be successful.
None required, but if there is some documentation about how update SQL scripts should be written (I don't remember any), information should be added about when to use the changes made here for the core update SQL scripts. The same should also be added to the documentation for extension developers about how their update SQL scripts should look, if there is any.
Status | New | ⇒ | Pending |
Category | ⇒ | SQL Administration com_admin Postgresql |
Just a thought as I'm reading the changes. As you have had to edit all the sql is there any benefit to merge the files?
@brianteeman The benefit (fewer files) would be smaller than the additional problems (obsolete files to be deleted, and trouble with them being executed if they were not deleted).
Stay tuned, it may take some time until I've finished the WiP.
ok - was just a thought
I have tested this item
@Aman21591 May I ask you which kind of database(s) you have used for testing? The testing instructions ask to report that back:
> This PR should be tested with both database types MySQL (or MariaDB) and PostgreSQL. Testers please report back what kind of database you have used. If you have different ones, test with all you can.
And did you follow the testing instructions? Code review is not enough here.
rfc 1) is not the way to go imho
even rfc 2 may have some concerns, as you already clearly stated, but we need at least to fix this:
> we are missing unique keys at many other places.
I have tested this item
postgresql 11.7
Status | Pending | ⇒ | Ready to Commit |
RTC
@bembelimen Does it need new human tests now after the rebase?
Status | Ready to Commit | ⇒ | Fixed in Code Base |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2022-03-24 17:34:19 |
Closed_By | ⇒ | roland-d |
Thank you everybody.