ORDER BY a.hits DESC
If _content table contains many articles the result is a very slow query which is not acceptable for such a basic module
Just having an index on hits column in _content table would solve the problem
Labels |
Added:
?
|
If its just an index then there is no need for this to wait for a specific version or even to be in any version at all. @Notebit can simply add the index directly to the db can't they?
In general yes, but in this case I think it could make sense in the core because the most viewed articles module seems to be a common use case for me.
Are you sure. It's not my area of expertise but I recall that adding an index on a small table can actually be slower which is why everything isnt indexec. See https://www.sqlservercentral.com/forums/topic/indexing-small-tables-what-is-small-enough-to-avoid-indexes-altogether
Im not convinced.
Can you provide the EXPLAIN of the SQL in full with and without the index?
@brianteeman
I added the index to hits column on my demo site, but phpmyadmin required to change the default values of all the DATETIME columns in _content to NULL
I would prefer to have a more systematic approach to performances
btw it's not a big issue
In a database of 29469 articles...
The query run to get 5 articles for the module is
SELECT `a`.`id`,`a`.`title`,`a`.`alias`,`a`.`introtext`,`a`.`fulltext`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`modified_by`,CASE WHEN `a`.`publish_up` IS NULL THEN `a`.`created` ELSE `a`.`publish_up` END AS `publish_up`,`a`.`publish_down`,`a`.`images`,`a`.`urls`,`a`.`attribs`,`a`.`metadata`,`a`.`metakey`,`a`.`metadesc`,`a`.`access`,`a`.`hits`,`a`.`featured`,`a`.`language`,LENGTH(`a`.`fulltext`) AS `readmore`,`a`.`ordering`,`fp`.`featured_up`,`fp`.`featured_down`,CASE WHEN `c`.`published` = 2 AND `a`.`state` > 0 THEN 2 WHEN `c`.`published` != 1 THEN 0 ELSE `a`.`state` END AS `state`,`c`.`title` AS `category_title`,`c`.`path` AS `category_route`,`c`.`access` AS `category_access`,`c`.`alias` AS `category_alias`,`c`.`language` AS `category_language`,`c`.`published`,`c`.`published` AS `parents_published`,`c`.`lft`,CASE WHEN `a`.`created_by_alias` > ' ' THEN `a`.`created_by_alias` ELSE `ua`.`name` END AS `author`,`ua`.`email` AS `author_email`,`uam`.`name` AS `modified_by_name`,`parent`.`title` AS `parent_title`,`parent`.`id` AS `parent_id`,`parent`.`path` AS `parent_route`,`parent`.`alias` AS `parent_alias`,`parent`.`language` AS `parent_language`,`fp`.`ordering`
FROM `jos_content` AS `a`
LEFT JOIN `jos_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `jos_users` AS `ua` ON `ua`.`id` = `a`.`created_by`
LEFT JOIN `jos_users` AS `uam` ON `uam`.`id` = `a`.`modified_by`
LEFT JOIN `jos_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `jos_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
WHERE `a`.`access` IN (1,1,5) AND `c`.`access` IN (1,1,5) AND `c`.`published` = 1 AND `a`.`state` = 1 AND (`a`.`publish_up` IS NULL OR `a`.`publish_up` <= '2021-04-20 20:09:12') AND (`a`.`publish_down` IS NULL OR `a`.`publish_down` >= '2021-04-20 20:09:12')
ORDER BY a.hits DESC LIMIT 5
and this takes (on my local machine) 146ms consistently +/- 1ms.
I then run:
ALTER TABLE `jos_content` ADD INDEX (`hits`);
and run the same query again, and get 146ms consistently +/- 1-2ms.
So no improvement on an already quick query.
IIRC indexes are not used in ordering, but are used in the conditions (the WHERE part).
... but phpmyadmin required to change the default values of all the DATETIME columns in _content to NULL
This can be avoided by putting a line SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
before the SQL statement for adding the index. This would make phpMyAdmin (or any other SQL client) ignore the issue with the default values for the datetime columns.
This can be avoided by putting
By correctly configuring your mysql server :)
IIRC indexes are not used in ordering, but are used in the conditions (the WHERE part).
https://stackoverflow.com/questions/6858844/does-mysql-use-index-for-sorting seems to say something different, that indexes are also used for sorting.
But you are right, it has to be checked, and maybe an index will not be useful here.
By correctly configuring your mysql server :)
@PhilETaylor The server IS correctly configured, that's why it needs the above statement to change that for the current session, like Joomla 3 does it because it uses the deprecated old zero dates, and that was the reason why we had to change that in J4 to use real null dates.
https://stackoverflow.com/questions/6858844/does-mysql-use-index-for-sorting says:
Yes, MySQL uses your index to sort the information when the order is by the sorted column.
Let me rephrase that:
Yes, MySQL uses your index to sort the information when the order is ALSO one of the columns you are selecting in the WHERE.
So
SELECT * from TABLE where id > 3 ORDER BY id
If there were an index on id
then it would be used.
Yes, that's right.
@PhilETaylor The server IS correctly configured,
By correctly configuring your mysql server "for the type of application you are running, E.g Joomla 3" which was designed before the version of mysql you are running and its default options" :)
I guess that's the reason why my hosting provider for shared hosting is still on MySQL 5.7.
I used my huge db (u: admin p: adminadminadmin) from my other PR - and proved that with over 26,000 articles this index addition doesn't gain anything, therefore this is not an issue.
I had similar performances, around 111ms then I added some indexes to _content including hits, now it runs in 14-17ms
but this is strange, I tried to remove the index on hits with phpmyadmin, and i still get 14ms, very strange, maybe some cache somewhere?
btw this is the EXPLAIN i get now that I run it in 14ms
`
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | NULL | range | idx_access,idx_state,idx_catid,created | created | 6 | NULL | 514 | 2.17 | Using index condition; Using where; Using temporary; Using filesort |
1 | SIMPLE | c | NULL | ALL | PRIMARY,idx_access | NON È POSSIBILE UTILIZZARE ALCUN INDICE | NULL | NULL | 14 | 7.14 | Using where; Using join buffer (Block Nested Loop) |
1 | SIMPLE | ua | NULL | eq_ref | PRIMARY | PRIMARY | 4 | db_joomla.a.created_by | 1 | 100.00 | Using where |
1 | SIMPLE | uam | NULL | eq_ref | PRIMARY | PRIMARY | 4 | db_joomla.a.modified_by | 1 | 100.00 | Using where |
1 | SIMPLE | parent | NULL | eq_ref | PRIMARY | PRIMARY | 4 | db_joomla.c.parent_id | 1 | 100.00 | Using where |
`
I guess that's the reason why my hosting provider for shared hosting is still on MySQL 5.7.
/Facepalm :) dont get me started on web hosts :) Support for Mysql 5.7 ended in October 2020
Your EXPLAIN only shows 5 rows, mine has 6...
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | ref | idx_access,idx_state,idx_catid,state | idx_state | 1 | const | 13006 | Using where; Using temporary; Using filesort |
1 | SIMPLE | ua | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using where; Using join buffer (Block Nested Loop) |
1 | SIMPLE | uam | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using where; Using join buffer (Block Nested Loop) |
1 | SIMPLE | c | eq_ref | PRIMARY,idx_access | PRIMARY | 4 | joomla4.a.catid | 1 | Using where |
1 | SIMPLE | parent | eq_ref | PRIMARY | PRIMARY | 4 | joomla4.c.parent_id | 1 | Using where |
1 | SIMPLE | fp | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using where; Using join buffer (Block Nested Loop) |
I have a _content table of more than 30k articles
Adding an index on the hits
column will not achieve anything. Even your EXPLAIN output has no reference to any hits
index.
maybe because I limit by publishing time
AND (a.publish_up = '0000-00-00 00:00:00' OR a.publish_up <= '2021-04-20 20:11:58')
AND (a.publish_down = '0000-00-00 00:00:00' OR a.publish_down >= '2021-04-20 20:11:58')
AND a.created >= DATE_ADD('2021-04-20 20:11:58', INTERVAL -200 DAY)
otherwise it would be too huge. btw playing with indexes in _content and adding defaut NULL to datetime value somehow allowed me to reduce the query load to 14ms from 111ms. I removed the indexes I added but i still have default NULL and I still execute the query in 14ms
well duh... if you are limiting the number of rows, then the query will be faster to sort the number of rows ...
If I run the Joomla Core SQL Query, it runs like I said 146ms consistently +/- 1-2ms.
If I then (on my 26,000 article db) further restrict the number of rows it has to seek through and order, using your additional query condition:
AND a.created >= DATE_ADD('2021-04-20 20:11:58', INTERVAL -200 DAY)
Then I too then get times in the 51ms area... but that's nothing to do with indexing, its because with your additional condition I then only have 1610 rows that I need to order by hits, instead of over 16,000...
I would say, based on my testing and what you have said, that your manipulation of the SQL, to further restrict the rows it returns, before ordering, is the reason for your speed increase. Nothing to do with dates, nulls or indexing.
no I got 14ms from 111ms on the same query (always with the 200 days interval), just messing up with_content table.
btw you are right, it requires some further investigation
please ignore the PR
Maybe time moved on... and your condition now returns less rows because the date in the extra condition returns less rows as maybe you have less articles in the last 200 days than you did say the previous period of 200 days...
no i checked before and after i modified _content table today
@richard67 I am totally convinced it was too rushed by me and I need to check how really I got that huge performance improvement cause i was totally sure it was the index on hits i added, but now I am not that sure anymore
Status | New | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2021-04-20 20:44:55 |
Closed_By | ⇒ | Notebit |
@richard67 @PhilETaylor so after double checking I found that an index on hits had no effect as Phil already highlighted, that huge performance improvement I got came from adding an index on created column
I ran the same query with the 200 days limit similar to this AND a.created >= DATE_ADD('2021-04-20 20:11:58', INTERVAL -200 DAY)
with and without created index and I get 110-150ms without and 14-15ms with created index. it's a 1/10 of the original time
I guess the advantage comes only if you run the most viewed articles module with the created date interval, if you run the module without the date filter I guess there is no advantage in setting an index on created column
the most viewed articles module still generates a slow query if the _content table has many items in it, so improving its performances with a more generic and broad approach can benefit most joomla users
if you have time to double check this and if you think it can be useful to others, maybe we can open a new feature request for it
I guess the advantage comes only if you run the most viewed articles module with the created date interval, if you run the module without the date filter I guess there is no advantage in setting an index on created column
That makes much more sense
But I still stand by my comment #33205 (comment) that this should be applied on a per site basis and not globally
Indexes makes things only slower when you insert or update content, when you run select queries they haven't any negative impact.
But it's not simply adding an index for one column you have to know what select queries you run and which column combination makes sense for an index. It also could lead to a bad performance if you have the wrong indexes and the query optimizer selects the wrong index (mysql can only use one index per query/table). In this case you can give the query optimizer a hint which index to use.
But that's something for an "expert" ;-)
Yes, and I am definitely not an expert for optimization ... when I had to work with databases in my job, they already were optimized by someone else ;-)
@brianteeman adding the index on per site basis is totally safe for future joomla updates and joomla database table structure check? I don't want to make any custom change which can give me troubles with standard joomla updates
@Notebit Adding indexes is safe with following exception: If your Joomla is still using an old MySQL database which doesn't support utf8mb4, and you make an index for a varchar column, and then update your database server or migrate to a newer server which does support utf8mb4, then the utf8mb4 conversion of Joomla might fail due to index length.
But this is a very rare case, and here we are talking about indexes for numerical columns or datetime columns mainly, and with these such a problem will for sure never happen.
And I assume your database is already using utf8mb4, and in this case you can also do what you want with indexes for varchar columns.
P.S.: The database structure check only checks indexes which appear in SQL update scripts of the core, or in case of Joomla 4 also of extensions. I.e. it will complain if you modify an existing index coming from the core. It doesn't care about any other index you add.
@richard67 clear, thanks
I don't want to make any custom change which can give me troubles with standard joomla updates
You will have issues as you openly admit you have modified core PHP code to add additional conditions on the sql query. These will be wiped out on upgrade!
@PhilETaylor it's not a custom modification, joomla standard most viewed articles module supports relative date settings for filtering by date, so the query becomes like the one i use
oh... well I learned something new again today - thanks :) I never knew that.
Would be ok for me to add such an index. @alikon What do you think?
And of course the big question will be: Is it something for 3.x or has it to go into 4.0? @HLeithner @zero-24 What's your opinion on that?