User tests: Successful: Unsuccessful:
Pull Request for Issue #43701
Based on the PR #44079 by @Fedik , slightly adjusted and ported to 5.4
In our company we are supporting a rather large Joomla 5.x client site with more than 50.000 articles that makes use of the workflows. For years, the performance in the backend was perfectly fine, however recently we experienced significant performance issues in the article list views in the frontend but most importantly in the backend.
During my debugging I realized that the main query for the backend article list could use indexes on all relevant joins but doesn't do so. Instead, it creates a temporary table which has a significant performance impact. During further diagnosis it became apparent that the MySQL query optimizer stops using the available keys if a certain number of articles are in the #__content table, which is a document behavior (see https://stackoverflow.com/questions/72807286/index-not-being-utilized-in-mysql-when-larger-dataset) and helpful most of the times. In Joomla's usecase however the performance impact is significant: By enforcing the index usage as suggested in this PR, the query time was reduced from 2.6s down to 48ms.
Example screenshot of an EXPLAIN output WITHOUR the PR:
No key is used, instead temporary table with filesort
Example screenshot WITH the PR:
Please note: this is an issue that only affects MariaDB/MySQL servers. Postgres is unaffected as it uses it's own query optimization logic.
Please select:
Status | New | ⇒ | Pending |
Category | ⇒ | Administration com_content Front End |
Labels |
Added:
PR-5.4-dev
|
I reviewed the affected query on small (< 100 articles) and medium size (1200 articles) sites, in both cases the indexes that are enforced by this PR have been in use anyways. Only on large scale size the optimizer decides to skip the index.
Well, we can do like that also, a bit hacky, but should work.
I thought there is something smarter to trick the optimizer, but did not found anything.
I thought there is something smarter to trick the optimizer, but did not found anything.
Same for me. I played around with various changes to the indexes, join types and even tested multiple mariadb version - no effect.
I have tested this item ✅ successfully on e55df52
btw, the query works faster when use filter by category, that is why the frontend does not have the same slow down effect, even though query almost identical.
@richard67 MySQL
Labels |
Added:
Performance
|
I have tried to reproduce the actual and the expected result on diverse Ubuntu VMs with different MySQL 8.0 versions, but I was not able to get the same results. Unfortunately I don't have MariaDB available for tests.
What I did:
I've made a new installation of a current 5.4-dev. The I have disabled the "Content - Smart Search" and "Behaviour - Versionable" plugins as recommended in the Readme of @Fedik 's plugin https://github.com/Fedik/plg_sampledata_bigdata , and I have enabled Workflows in the integration options of com_content.
Then I've installed an enabled the latest release of that plugin plg_sampledata_bigdata_1.0.0_328f117_20240927_1253.zip .
Then I have run it several times to that I had more than 50k articles and workflow associations on one site, and on another I had more than 30k.
In both cases the same result:
Without the FORCE INDEX
I had only one index not being used and not 2 like shown in the screenshot of the actual result.
EXPLAIN SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `j3ux0_content` AS `a`
LEFT JOIN `j3ux0_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `j3ux0_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `j3ux0_users` AS `uc` ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `j3ux0_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `j3ux0_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `j3ux0_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `j3ux0_users` AS `ua` ON `ua`.`id` = `a`.`created_by`
INNER JOIN `j3ux0_workflow_associations` AS `wa` ON `wa`.`item_id` = `a`.`id`
INNER JOIN `j3ux0_workflow_stages` AS `ws` ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `j3ux0_workflows` AS `w` ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20;
As you can see, the primary key is used for the workflows table "w", in opposite to the screen shot for the actual result in the description of this PR.
With the FORCE INDEX
2 indexes were not used, and the execution time of the query did not differ much, it was in both cases something between 350 and 390 ms.
EXPLAIN SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `j3ux0_content` AS `a`
LEFT JOIN `j3ux0_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `j3ux0_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `j3ux0_users` AS `uc` FORCE INDEX(PRIMARY) ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `j3ux0_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `j3ux0_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `j3ux0_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `j3ux0_users` AS `ua` FORCE INDEX(PRIMARY) ON `ua`.`id` = `a`.`created_by`
INNER JOIN `j3ux0_workflow_associations` AS `wa` FORCE INDEX(PRIMARY) ON `wa`.`item_id` = `a`.`id`
INNER JOIN `j3ux0_workflow_stages` AS `ws` FORCE INDEX(PRIMARY) ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `j3ux0_workflows` AS `w` FORCE INDEX(PRIMARY) ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20;
As you can see, forcing the PRIMARY key makes other keys disappear from the list of possible keys, and in case of the workflow associations "wa" it results in the "idx_extensions" not being used anymore.
The reason for that might be that in my scenario, all categories and all articles belong to the the same stage of the same workflow, and there are no other workflows and workflow stages.
This fits to the first answer to the stackoverflow question here https://stackoverflow.com/questions/945925/mysql-index-being-ignored :
You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.
Since you aren't actually "finding" any rows (you are selecting them all), a table scan is always going to be fastest, and the optimizer is smart enough to know that in spite of what you are telling them.
The citation is taken from here: https://dev.mysql.com/doc/refman/8.4/en/index-hints.html
Now if you scroll a bit down on that MySQL documentation, you will find an important note:
Note
MySQL 8.4 supports the index-level optimizer hints JOIN_INDEX, GROUP_INDEX, ORDER_INDEX, and INDEX, which are equivalent to and intended to supersede FORCE INDEX index hints, as well as the NO_JOIN_INDEX, NO_GROUP_INDEX, NO_ORDER_INDEX, and NO_INDEX optimizer hints, which are equivalent to and intended to supersede IGNORE INDEX index hints. Thus, you should expect USE INDEX, FORCE INDEX, and IGNORE INDEX to be deprecated in a future release of MySQL, and at some time thereafter to be removed altogether.
These index-level optimizer hints are supported with both single-table and multi-table DELETE statements.
For more information, see Index-Level Optimizer Hints.
When the "Thus, you should expect USE INDEX, FORCE INDEX, and IGNORE INDEX to be deprecated in a future release of MySQL," will happen, we will have to change our code to be different for MySQL and MariaDB, as the latter seems not to make that change, see https://mariadb.com/kb/en/force-index/ , and optimizer hints seem to work very differently to MySQL, see https://mariadb.com/kb/en/index-hints-how-to-force-query-plans/ .
So I am not sure if this PR here is the way to go.
@SniperSister @Fedik Maybe you can provide more details about your scenario, e.g. how workflows are used? Or could you test my scenario if you get the same results as I did? Maybe something is special with my MySQL installation on Ubuntu (however, I am not aware of having changed anything on the default configuration regarding the opimizer such stuff).
I am testing on local Mysql 8.0.32 without any extra configurations.
Originally I made installation and all dummy content with joomla 5.2 and then updated to 5.4. With Workflow disabled.
My test result is:
~2.5s vs ~1.5s (for full page loading)
~1.6s vs ~1s (only this query)
with this PR and ~44k dummy articles.
After PR:
I just noticed that this PR is missing FORCE INDEX(idx_langcode)
from my original PR. With this I get few ms faster (~980ms).
But I see in your "explain" this index is already used, that is strange, may it be some server configuration difference?
Or maybe newest mysql is smarter?
And I think you have a faster setup 😉
Sure,
Without:
SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `bd5_content` AS `a`
LEFT JOIN `bd5_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `bd5_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `bd5_users` AS `uc` ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `bd5_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `bd5_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `bd5_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `bd5_users` AS `ua` ON `ua`.`id` = `a`.`created_by`
INNER JOIN `bd5_workflow_associations` AS `wa` ON `wa`.`item_id` = `a`.`id`
INNER JOIN `bd5_workflow_stages` AS `ws` ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `bd5_workflows` AS `w` ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20
With:
SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `bd5_content` AS `a`
LEFT JOIN `bd5_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `bd5_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `bd5_users` AS `uc` FORCE INDEX(PRIMARY) ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `bd5_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `bd5_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `bd5_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `bd5_users` AS `ua` FORCE INDEX(PRIMARY) ON `ua`.`id` = `a`.`created_by`
INNER JOIN `bd5_workflow_associations` AS `wa` FORCE INDEX(PRIMARY) ON `wa`.`item_id` = `a`.`id`
INNER JOIN `bd5_workflow_stages` AS `ws` FORCE INDEX(PRIMARY) ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `bd5_workflows` AS `w` FORCE INDEX(PRIMARY) ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20
The query looks the same to what you have with enabled Workflow.
I just tried to enable workflow, but I did not get any time difference. It stays the same 1.6s vs 1s for the query on the same data. But maybe because I do not have any real workflow in use.
So it seems that for you the PRIMARY key of the "ws" table is also not used with the PR?
yeap
My guess, MySQL optimizer have some kind of internal performance rating for its server, and based on this doing its optimization. Or maybe some memory limit config.
That can explain difference in our results.
My guess, MySQL optimizer have some kind of internal performance rating for its server, and based on this doing its optimization. Or maybe some memory limit config.
Yep, that's my assumption too. Getting the exact same EXPLAIN results is going to be hard if the query is performed on different machines / setups.
If we could find which configuration could improve performance with large amount of data, we then probably can skip this hacks, and add "mysql configuration hint" in to joomla manual, somewhere.
This is interesting https://stackoverflow.com/questions/50508632/mysql-not-picking-up-the-optimal-index
There also a link to how to trace the optimizer https://oysteing.blogspot.com/2016/01/how-to-get-optimizer-trace-for-query.html
Here is my trace the query optimiser-trace.txt
But hard to read, for example for index idx_langcode
it say chosen: true
but in the end it still not used, hm.
A long time ago we had, what sounds like, a very similar issue and in that case although the indexing improved the performance on a large site it had a negative experience impact on sites with a more typical number of items. I am not saying that is the case here just that it needs to be checked