User tests: Successful: Unsuccessful:
Pull Request for Issue #43701
Based on the PR #44079 by @Fedik , slightly adjusted and ported to 5.4
In our company we are supporting a rather large Joomla 5.x client site with more than 50.000 articles that makes use of the workflows. For years, the performance in the backend was perfectly fine, however recently we experienced significant performance issues in the article list views in the frontend but most importantly in the backend.
During my debugging I realized that the main query for the backend article list could use indexes on all relevant joins but doesn't do so. Instead, it creates a temporary table which has a significant performance impact. During further diagnosis it became apparent that the MySQL query optimizer stops using the available keys if a certain number of articles are in the #__content table, which is a document behavior (see https://stackoverflow.com/questions/72807286/index-not-being-utilized-in-mysql-when-larger-dataset) and helpful most of the times. In Joomla's usecase however the performance impact is significant: By enforcing the index usage as suggested in this PR, the query time was reduced from 2.6s down to 48ms.
Example screenshot of an EXPLAIN output WITHOUR the PR:
No key is used, instead temporary table with filesort
Example screenshot WITH the PR:
Please note: this is an issue that only affects MariaDB/MySQL servers. Postgres is unaffected as it uses it's own query optimization logic.
Please select:
Status | New | ⇒ | Pending |
Category | ⇒ | Administration com_content Front End |
Labels |
Added:
PR-5.4-dev
|
I reviewed the affected query on small (< 100 articles) and medium size (1200 articles) sites, in both cases the indexes that are enforced by this PR have been in use anyways. Only on large scale size the optimizer decides to skip the index.
Well, we can do like that also, a bit hacky, but should work.
I thought there is something smarter to trick the optimizer, but did not found anything.
I thought there is something smarter to trick the optimizer, but did not found anything.
Same for me. I played around with various changes to the indexes, join types and even tested multiple mariadb version - no effect.
I have tested this item ✅ successfully on e55df52
btw, the query works faster when use filter by category, that is why the frontend does not have the same slow down effect, even though query almost identical.
@richard67 MySQL
Labels |
Added:
Performance
|
I have tried to reproduce the actual and the expected result on diverse Ubuntu VMs with different MySQL 8.0 versions, but I was not able to get the same results. Unfortunately I don't have MariaDB available for tests.
What I did:
I've made a new installation of a current 5.4-dev. The I have disabled the "Content - Smart Search" and "Behaviour - Versionable" plugins as recommended in the Readme of @Fedik 's plugin https://github.com/Fedik/plg_sampledata_bigdata , and I have enabled Workflows in the integration options of com_content.
Then I've installed an enabled the latest release of that plugin plg_sampledata_bigdata_1.0.0_328f117_20240927_1253.zip .
Then I have run it several times to that I had more than 50k articles and workflow associations on one site, and on another I had more than 30k.
In both cases the same result:
Without the FORCE INDEX
I had only one index not being used and not 2 like shown in the screenshot of the actual result.
EXPLAIN SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `j3ux0_content` AS `a`
LEFT JOIN `j3ux0_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `j3ux0_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `j3ux0_users` AS `uc` ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `j3ux0_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `j3ux0_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `j3ux0_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `j3ux0_users` AS `ua` ON `ua`.`id` = `a`.`created_by`
INNER JOIN `j3ux0_workflow_associations` AS `wa` ON `wa`.`item_id` = `a`.`id`
INNER JOIN `j3ux0_workflow_stages` AS `ws` ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `j3ux0_workflows` AS `w` ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20;
As you can see, the primary key is used for the workflows table "w", in opposite to the screen shot for the actual result in the description of this PR.
With the FORCE INDEX
2 indexes were not used, and the execution time of the query did not differ much, it was in both cases something between 350 and 390 ms.
EXPLAIN SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `j3ux0_content` AS `a`
LEFT JOIN `j3ux0_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `j3ux0_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `j3ux0_users` AS `uc` FORCE INDEX(PRIMARY) ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `j3ux0_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `j3ux0_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `j3ux0_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `j3ux0_users` AS `ua` FORCE INDEX(PRIMARY) ON `ua`.`id` = `a`.`created_by`
INNER JOIN `j3ux0_workflow_associations` AS `wa` FORCE INDEX(PRIMARY) ON `wa`.`item_id` = `a`.`id`
INNER JOIN `j3ux0_workflow_stages` AS `ws` FORCE INDEX(PRIMARY) ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `j3ux0_workflows` AS `w` FORCE INDEX(PRIMARY) ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20;
As you can see, forcing the PRIMARY key makes other keys disappear from the list of possible keys, and in case of the workflow associations "wa" it results in the "idx_extensions" not being used anymore.
The reason for that might be that in my scenario, all categories and all articles belong to the the same stage of the same workflow, and there are no other workflows and workflow stages.
This fits to the first answer to the stackoverflow question here https://stackoverflow.com/questions/945925/mysql-index-being-ignored :
You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.
Since you aren't actually "finding" any rows (you are selecting them all), a table scan is always going to be fastest, and the optimizer is smart enough to know that in spite of what you are telling them.
The citation is taken from here: https://dev.mysql.com/doc/refman/8.4/en/index-hints.html
Now if you scroll a bit down on that MySQL documentation, you will find an important note:
Note
MySQL 8.4 supports the index-level optimizer hints JOIN_INDEX, GROUP_INDEX, ORDER_INDEX, and INDEX, which are equivalent to and intended to supersede FORCE INDEX index hints, as well as the NO_JOIN_INDEX, NO_GROUP_INDEX, NO_ORDER_INDEX, and NO_INDEX optimizer hints, which are equivalent to and intended to supersede IGNORE INDEX index hints. Thus, you should expect USE INDEX, FORCE INDEX, and IGNORE INDEX to be deprecated in a future release of MySQL, and at some time thereafter to be removed altogether.
These index-level optimizer hints are supported with both single-table and multi-table DELETE statements.
For more information, see Index-Level Optimizer Hints.
When the "Thus, you should expect USE INDEX, FORCE INDEX, and IGNORE INDEX to be deprecated in a future release of MySQL," will happen, we will have to change our code to be different for MySQL and MariaDB, as the latter seems not to make that change, see https://mariadb.com/kb/en/force-index/ , and optimizer hints seem to work very differently to MySQL, see https://mariadb.com/kb/en/index-hints-how-to-force-query-plans/ .
So I am not sure if this PR here is the way to go.
@SniperSister @Fedik Maybe you can provide more details about your scenario, e.g. how workflows are used? Or could you test my scenario if you get the same results as I did? Maybe something is special with my MySQL installation on Ubuntu (however, I am not aware of having changed anything on the default configuration regarding the opimizer such stuff).
I am testing on local Mysql 8.0.32 without any extra configurations.
Originally I made installation and all dummy content with joomla 5.2 and then updated to 5.4. With Workflow disabled.
My test result is:
~2.5s vs ~1.5s (for full page loading)
~1.6s vs ~1s (only this query)
with this PR and ~44k dummy articles.
After PR:
I just noticed that this PR is missing FORCE INDEX(idx_langcode)
from my original PR. With this I get few ms faster (~980ms).
But I see in your "explain" this index is already used, that is strange, may it be some server configuration difference?
Or maybe newest mysql is smarter?
And I think you have a faster setup 😉
Sure,
Without:
SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `bd5_content` AS `a`
LEFT JOIN `bd5_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `bd5_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `bd5_users` AS `uc` ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `bd5_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `bd5_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `bd5_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `bd5_users` AS `ua` ON `ua`.`id` = `a`.`created_by`
INNER JOIN `bd5_workflow_associations` AS `wa` ON `wa`.`item_id` = `a`.`id`
INNER JOIN `bd5_workflow_stages` AS `ws` ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `bd5_workflows` AS `w` ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20
With:
SELECT `a`.`id`,`a`.`asset_id`,`a`.`title`,`a`.`alias`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`state`,`a`.`access`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`ordering`,`a`.`featured`,`a`.`language`,`a`.`hits`,`a`.`publish_up`,`a`.`publish_down`,`a`.`introtext`,`a`.`fulltext`,`a`.`note`,`a`.`images`,`a`.`metakey`,`a`.`metadesc`,`a`.`metadata`,`a`.`version`,`fp`.`featured_up`,`fp`.`featured_down`,`l`.`title` AS `language_title`,`l`.`image` AS `language_image`,`uc`.`name` AS `editor`,`ag`.`title` AS `access_level`,`c`.`title` AS `category_title`,`c`.`created_user_id` AS `category_uid`,`c`.`level` AS `category_level`,`c`.`published` AS `category_published`,`parent`.`title` AS `parent_category_title`,`parent`.`id` AS `parent_category_id`,`parent`.`created_user_id` AS `parent_category_uid`,`parent`.`level` AS `parent_category_level`,`ua`.`name` AS `author_name`,`wa`.`stage_id` AS `stage_id`,`ws`.`title` AS `stage_title`,`ws`.`workflow_id` AS `workflow_id`,`w`.`title` AS `workflow_title`
FROM `bd5_content` AS `a`
LEFT JOIN `bd5_languages` AS `l` ON `l`.`lang_code` = `a`.`language`
LEFT JOIN `bd5_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
LEFT JOIN `bd5_users` AS `uc` FORCE INDEX(PRIMARY) ON `uc`.`id` = `a`.`checked_out`
LEFT JOIN `bd5_viewlevels` AS `ag` ON `ag`.`id` = `a`.`access`
LEFT JOIN `bd5_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `bd5_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `bd5_users` AS `ua` FORCE INDEX(PRIMARY) ON `ua`.`id` = `a`.`created_by`
INNER JOIN `bd5_workflow_associations` AS `wa` FORCE INDEX(PRIMARY) ON `wa`.`item_id` = `a`.`id`
INNER JOIN `bd5_workflow_stages` AS `ws` FORCE INDEX(PRIMARY) ON `ws`.`id` = `wa`.`stage_id`
INNER JOIN `bd5_workflows` AS `w` FORCE INDEX(PRIMARY) ON `w`.`id` = `ws`.`workflow_id`
WHERE `wa`.`extension` = 'com_content.article' AND `a`.`state` IN (1,0)
ORDER BY a.id desc LIMIT 20
The query looks the same to what you have with enabled Workflow.
I just tried to enable workflow, but I did not get any time difference. It stays the same 1.6s vs 1s for the query on the same data. But maybe because I do not have any real workflow in use.
So it seems that for you the PRIMARY key of the "ws" table is also not used with the PR?
yeap
My guess, MySQL optimizer have some kind of internal performance rating for its server, and based on this doing its optimization. Or maybe some memory limit config.
That can explain difference in our results.
My guess, MySQL optimizer have some kind of internal performance rating for its server, and based on this doing its optimization. Or maybe some memory limit config.
Yep, that's my assumption too. Getting the exact same EXPLAIN results is going to be hard if the query is performed on different machines / setups.
If we could find which configuration could improve performance with large amount of data, we then probably can skip this hacks, and add "mysql configuration hint" in to joomla manual, somewhere.
This is interesting https://stackoverflow.com/questions/50508632/mysql-not-picking-up-the-optimal-index
There also a link to how to trace the optimizer https://oysteing.blogspot.com/2016/01/how-to-get-optimizer-trace-for-query.html
Here is my trace the query optimiser-trace.txt
But hard to read, for example for index idx_langcode
it say chosen: true
but in the end it still not used, hm.
@SniperSister @Fedik Maybe I have an alternative to this PR here. Could you check if the folloiwing change compared to a normal 5.4-dev (without this PR here) results in a performance increase with your huge data site? 5.4-dev...richard67:5.4-dev-articles-list-workflow-none
You can download the ArticlesModel file for the administrator from my branch: https://raw.githubusercontent.com/richard67/joomla-cms/7892c604e28276222cbf77c64e19e88b7e5402ae/administrator/components/com_content/src/Model/ArticlesModel.php
I have started to work on a change to also show articles without workflow association, which would fix some issues, and here with my huge data test site it seems my change results in a huge performance gain.
P.S.: ... and the explain plan shows indexes being used wherever possible.
@SniperSister @Fedik P.P.S.: ... or just check my PR #45660 . It is work in progress and so still draft, but the main thing is already done there.
@richard67 thanks for looking in to it.
I have started to work on a change to also show articles without workflow association, which would fix some issues, and here with my huge data test site it seems my change results in a huge performance gain.
I also think this would be a better way, in past I made a quick test and the performance difference was huge.
I had an idea to move it to getItems() maybe, but did not had a time.
This is the case when two smaller query will be faster than one complex query on large scale.
Maybe we drop the idea with hacking the indexes, and do that?
PS. I will test your PR a bit later
Maybe we drop the idea with hacking the indexes, and do that?
Fine for me! Will test the changes next week!
My PR has an issue, will fix tomorrow. It might result in the performance issue not being solved.
@richard67 sorry I misunderstood your comment, about "show articles without workflow association", you actually were working on bug :)
I thought about removing workflow
query from the main query, and run them separately inside getItems() when workflow is enabled
LEFT JOIN `#__workflows` AS `w` ON `w`.`id` = `ws`.`workflow_id`
LEFT JOIN `#__workflow_associations` AS `wa` ON `wa`.`stage_id` = `ws`.`id`
LEFT JOIN `#__content` AS `a` ON `wa`.`item_id` = `a`.`id`
WHERE `wa`.`extension` = 'com_content.article' AND a.id IN (1,2,3)
But I now see you use subquery, that maybe even smarter :)
@Fedik Unfortunately my solution breaks the mod_latest on MySQL (but not on MariaDB) because MySQL abuse the max_join_size also for other purpose than the max. join size, and the mod_latest module uses the list query from the articles model and seems to wrap that into an outer query or something like that, so that on my testing site with 30k articles the list works well, but the home dashboard crashes due to that module.
mod_latest works fine for me with your PR.
mod_latest module uses the list query from the articles model and seems to wrap that into an outer query or something like that,
It just adds a few extra fields to select, and define the ordering.
upd: it set new list.select
(list of fields to select) not add
Can be, I have default docker container provided by mysql and it has sql_big_selects=ON by default,
when I trying to disable it then the mysql refuses to work at all 😄
But it still strange, because mod_latest only changes the list of fields to select, and ordering. It does not do anything with joins.
Or maybe I have modified the max_join_size long time ago on my DB to test issues with that and have forgotten to change it back?
It probably it 😄
I just tried to change max_join_size=300000
and got the error,
However by default the value is huge https://dev.mysql.com/doc/refman/8.4/en/server-system-variables.html#sysvar_max_join_size
@SniperSister @Fedik I've reverted the subquery part in my PR, so the performance gain is not that big anymore.
@SniperSister @Fedik It seems I have fixed my PR so the big performance improvement is back without running into the max join size problem. Could you check?
@richard67 Please also note that for some reason all workflow tables have SIGNED integers, while i.e. #__content.id
is UNSIGNED INT, same for asset_id
.
Not a huge problem (I don't see any changes in query explanation), but inconsistent with other core tables where id
, asset_id
and related xxx_id
columns are UNSIGNED.
Would be nice to add migrations and convert all required INTs to UNSIGNED in all workflows tables.
A long time ago we had, what sounds like, a very similar issue and in that case although the indexing improved the performance on a large site it had a negative experience impact on sites with a more typical number of items. I am not saying that is the case here just that it needs to be checked