?
avatar Notebit
Notebit
20 Apr 2021

Most viewed articles module performs a query on _content trying to get the most viewed articles

ORDER BY a.hits DESC

If _content table contains many articles the result is a very slow query which is not acceptable for such a basic module

Describe the solution you'd like

Just having an index on hits column in _content table would solve the problem

Additional context

avatar Notebit Notebit - open - 20 Apr 2021
avatar joomla-cms-bot joomla-cms-bot - change - 20 Apr 2021
Labels Added: ?
avatar joomla-cms-bot joomla-cms-bot - labeled - 20 Apr 2021
avatar richard67
richard67 - comment - 20 Apr 2021

Would be ok for me to add such an index. @alikon What do you think?

And of course the big question will be: Is it something for 3.x or has it to go into 4.0? @HLeithner @zero-24 What's your opinion on that?

avatar brianteeman
brianteeman - comment - 20 Apr 2021

If its just an index then there is no need for this to wait for a specific version or even to be in any version at all. @Notebit can simply add the index directly to the db can't they?

avatar richard67
richard67 - comment - 20 Apr 2021

If its just an index then there is no need for this to wait for a specific version or even to be in any version at all. @Notebit can simply add the index directly to the db can't they?

In general yes, but in this case I think it could make sense in the core because the most viewed articles module seems to be a common use case for me.

avatar brianteeman
brianteeman - comment - 20 Apr 2021

Are you sure. It's not my area of expertise but I recall that adding an index on a small table can actually be slower which is why everything isnt indexec. See https://www.sqlservercentral.com/forums/topic/indexing-small-tables-what-is-small-enough-to-avoid-indexes-altogether

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

Im not convinced.

Can you provide the EXPLAIN of the SQL in full with and without the index?

avatar Notebit
Notebit - comment - 20 Apr 2021

@brianteeman
I added the index to hits column on my demo site, but phpmyadmin required to change the default values of all the DATETIME columns in _content to NULL
I would prefer to have a more systematic approach to performances
btw it's not a big issue

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

In a database of 29469 articles...

The query run to get 5 articles for the module is

SELECT `a`.`id`,`a`.`title`,`a`.`alias`,`a`.`introtext`,`a`.`fulltext`,`a`.`checked_out`,`a`.`checked_out_time`,`a`.`catid`,`a`.`created`,`a`.`created_by`,`a`.`created_by_alias`,`a`.`modified`,`a`.`modified_by`,CASE WHEN `a`.`publish_up` IS NULL THEN `a`.`created` ELSE `a`.`publish_up` END AS `publish_up`,`a`.`publish_down`,`a`.`images`,`a`.`urls`,`a`.`attribs`,`a`.`metadata`,`a`.`metakey`,`a`.`metadesc`,`a`.`access`,`a`.`hits`,`a`.`featured`,`a`.`language`,LENGTH(`a`.`fulltext`) AS `readmore`,`a`.`ordering`,`fp`.`featured_up`,`fp`.`featured_down`,CASE WHEN `c`.`published` = 2 AND `a`.`state` > 0 THEN 2 WHEN `c`.`published` != 1 THEN 0 ELSE `a`.`state` END AS `state`,`c`.`title` AS `category_title`,`c`.`path` AS `category_route`,`c`.`access` AS `category_access`,`c`.`alias` AS `category_alias`,`c`.`language` AS `category_language`,`c`.`published`,`c`.`published` AS `parents_published`,`c`.`lft`,CASE WHEN `a`.`created_by_alias` > ' ' THEN `a`.`created_by_alias` ELSE `ua`.`name` END AS `author`,`ua`.`email` AS `author_email`,`uam`.`name` AS `modified_by_name`,`parent`.`title` AS `parent_title`,`parent`.`id` AS `parent_id`,`parent`.`path` AS `parent_route`,`parent`.`alias` AS `parent_alias`,`parent`.`language` AS `parent_language`,`fp`.`ordering`
FROM `jos_content` AS `a`
LEFT JOIN `jos_categories` AS `c` ON `c`.`id` = `a`.`catid`
LEFT JOIN `jos_users` AS `ua` ON `ua`.`id` = `a`.`created_by`
LEFT JOIN `jos_users` AS `uam` ON `uam`.`id` = `a`.`modified_by`
LEFT JOIN `jos_categories` AS `parent` ON `parent`.`id` = `c`.`parent_id`
LEFT JOIN `jos_content_frontpage` AS `fp` ON `fp`.`content_id` = `a`.`id`
WHERE `a`.`access` IN (1,1,5) AND `c`.`access` IN (1,1,5) AND `c`.`published` = 1 AND `a`.`state` = 1 AND (`a`.`publish_up` IS NULL OR `a`.`publish_up` <= '2021-04-20 20:09:12') AND (`a`.`publish_down` IS NULL OR `a`.`publish_down` >= '2021-04-20 20:09:12')
ORDER BY a.hits DESC LIMIT 5

and this takes (on my local machine) 146ms consistently +/- 1ms.

I then run:

ALTER TABLE `jos_content` ADD INDEX (`hits`);

and run the same query again, and get 146ms consistently +/- 1-2ms.

So no improvement on an already quick query.

IIRC indexes are not used in ordering, but are used in the conditions (the WHERE part).

avatar richard67
richard67 - comment - 20 Apr 2021

... but phpmyadmin required to change the default values of all the DATETIME columns in _content to NULL

This can be avoided by putting a line SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO"; before the SQL statement for adding the index. This would make phpMyAdmin (or any other SQL client) ignore the issue with the default values for the datetime columns.

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

This can be avoided by putting

By correctly configuring your mysql server :)

avatar richard67
richard67 - comment - 20 Apr 2021

IIRC indexes are not used in ordering, but are used in the conditions (the WHERE part).

https://stackoverflow.com/questions/6858844/does-mysql-use-index-for-sorting seems to say something different, that indexes are also used for sorting.

But you are right, it has to be checked, and maybe an index will not be useful here.

avatar richard67
richard67 - comment - 20 Apr 2021

By correctly configuring your mysql server :)

@PhilETaylor The server IS correctly configured, that's why it needs the above statement to change that for the current session, like Joomla 3 does it because it uses the deprecated old zero dates, and that was the reason why we had to change that in J4 to use real null dates.

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

https://stackoverflow.com/questions/6858844/does-mysql-use-index-for-sorting says:

Yes, MySQL uses your index to sort the information when the order is by the sorted column.

Let me rephrase that:

Yes, MySQL uses your index to sort the information when the order is ALSO one of the columns you are selecting in the WHERE.

So

SELECT * from TABLE where id > 3 ORDER BY id

If there were an index on id then it would be used.

avatar richard67
richard67 - comment - 20 Apr 2021

Yes, that's right.

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

@PhilETaylor The server IS correctly configured,

By correctly configuring your mysql server "for the type of application you are running, E.g Joomla 3" which was designed before the version of mysql you are running and its default options" :)

avatar richard67
richard67 - comment - 20 Apr 2021

I guess that's the reason why my hosting provider for shared hosting is still on MySQL 5.7.

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

I used my huge db (u: admin p: adminadminadmin) from my other PR - and proved that with over 26,000 articles this index addition doesn't gain anything, therefore this is not an issue.

avatar Notebit
Notebit - comment - 20 Apr 2021

I had similar performances, around 111ms then I added some indexes to _content including hits, now it runs in 14-17ms
but this is strange, I tried to remove the index on hits with phpmyadmin, and i still get 14ms, very strange, maybe some cache somewhere?
btw this is the EXPLAIN i get now that I run it in 14ms

`

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a NULL range idx_access,idx_state,idx_catid,created created 6 NULL 514 2.17 Using index condition; Using where; Using temporary; Using filesort
1 SIMPLE c NULL ALL PRIMARY,idx_access NON È POSSIBILE UTILIZZARE ALCUN INDICE NULL NULL 14 7.14 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE ua NULL eq_ref PRIMARY PRIMARY 4 db_joomla.a.created_by 1 100.00 Using where
1 SIMPLE uam NULL eq_ref PRIMARY PRIMARY 4 db_joomla.a.modified_by 1 100.00 Using where
1 SIMPLE parent NULL eq_ref PRIMARY PRIMARY 4 db_joomla.c.parent_id 1 100.00 Using where

`

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

I guess that's the reason why my hosting provider for shared hosting is still on MySQL 5.7.

/Facepalm :) dont get me started on web hosts :) Support for Mysql 5.7 ended in October 2020

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

@Notebit

Your EXPLAIN only shows 5 rows, mine has 6...

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE a ref idx_access,idx_state,idx_catid,state idx_state 1 const 13006 Using where; Using temporary; Using filesort
1 SIMPLE ua ALL PRIMARY NULL NULL NULL 1 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE uam ALL PRIMARY NULL NULL NULL 1 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE c eq_ref PRIMARY,idx_access PRIMARY 4 joomla4.a.catid 1 Using where
1 SIMPLE parent eq_ref PRIMARY PRIMARY 4 joomla4.c.parent_id 1 Using where
1 SIMPLE fp ALL PRIMARY NULL NULL NULL 1 Using where; Using join buffer (Block Nested Loop)
avatar Notebit
Notebit - comment - 20 Apr 2021

I have a _content table of more than 30k articles

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

Adding an index on the hits column will not achieve anything. Even your EXPLAIN output has no reference to any hits index.

avatar Notebit
Notebit - comment - 20 Apr 2021

maybe because I limit by publishing time
AND (a.publish_up = '0000-00-00 00:00:00' OR a.publish_up <= '2021-04-20 20:11:58')
AND (a.publish_down = '0000-00-00 00:00:00' OR a.publish_down >= '2021-04-20 20:11:58')
AND a.created >= DATE_ADD('2021-04-20 20:11:58', INTERVAL -200 DAY)

otherwise it would be too huge. btw playing with indexes in _content and adding defaut NULL to datetime value somehow allowed me to reduce the query load to 14ms from 111ms. I removed the indexes I added but i still have default NULL and I still execute the query in 14ms

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

well duh... if you are limiting the number of rows, then the query will be faster to sort the number of rows ...

If I run the Joomla Core SQL Query, it runs like I said 146ms consistently +/- 1-2ms.

If I then (on my 26,000 article db) further restrict the number of rows it has to seek through and order, using your additional query condition:

AND a.created >= DATE_ADD('2021-04-20 20:11:58', INTERVAL -200 DAY)

Then I too then get times in the 51ms area... but that's nothing to do with indexing, its because with your additional condition I then only have 1610 rows that I need to order by hits, instead of over 16,000...

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

I would say, based on my testing and what you have said, that your manipulation of the SQL, to further restrict the rows it returns, before ordering, is the reason for your speed increase. Nothing to do with dates, nulls or indexing.

avatar Notebit
Notebit - comment - 20 Apr 2021

no I got 14ms from 111ms on the same query (always with the 200 days interval), just messing up with_content table.
btw you are right, it requires some further investigation
please ignore the PR

avatar PhilETaylor
PhilETaylor - comment - 20 Apr 2021

Maybe time moved on... and your condition now returns less rows because the date in the extra condition returns less rows as maybe you have less articles in the last 200 days than you did say the previous period of 200 days...

avatar richard67
richard67 - comment - 20 Apr 2021

@Notebit If you are convinced you can close this issue here (it's not a PR) yourself, or I can close it if you are ok with that.

avatar Notebit
Notebit - comment - 20 Apr 2021

no i checked before and after i modified _content table today

avatar Notebit
Notebit - comment - 20 Apr 2021

@richard67 I am totally convinced it was too rushed by me and I need to check how really I got that huge performance improvement cause i was totally sure it was the index on hits i added, but now I am not that sure anymore

avatar Notebit Notebit - change - 20 Apr 2021
Status New Closed
Closed_Date 0000-00-00 00:00:00 2021-04-20 20:44:55
Closed_By Notebit
avatar Notebit Notebit - close - 20 Apr 2021
avatar Notebit
Notebit - comment - 21 Apr 2021

@richard67 @PhilETaylor so after double checking I found that an index on hits had no effect as Phil already highlighted, that huge performance improvement I got came from adding an index on created column
I ran the same query with the 200 days limit similar to this AND a.created >= DATE_ADD('2021-04-20 20:11:58', INTERVAL -200 DAY)
with and without created index and I get 110-150ms without and 14-15ms with created index. it's a 1/10 of the original time

I guess the advantage comes only if you run the most viewed articles module with the created date interval, if you run the module without the date filter I guess there is no advantage in setting an index on created column

the most viewed articles module still generates a slow query if the _content table has many items in it, so improving its performances with a more generic and broad approach can benefit most joomla users

if you have time to double check this and if you think it can be useful to others, maybe we can open a new feature request for it

avatar brianteeman
brianteeman - comment - 21 Apr 2021

I guess the advantage comes only if you run the most viewed articles module with the created date interval, if you run the module without the date filter I guess there is no advantage in setting an index on created column

That makes much more sense

But I still stand by my comment #33205 (comment) that this should be applied on a per site basis and not globally

avatar HLeithner
HLeithner - comment - 21 Apr 2021

Indexes makes things only slower when you insert or update content, when you run select queries they haven't any negative impact.

But it's not simply adding an index for one column you have to know what select queries you run and which column combination makes sense for an index. It also could lead to a bad performance if you have the wrong indexes and the query optimizer selects the wrong index (mysql can only use one index per query/table). In this case you can give the query optimizer a hint which index to use.

But that's something for an "expert" ;-)

avatar richard67
richard67 - comment - 21 Apr 2021

Yes, and I am definitely not an expert for optimization ... when I had to work with databases in my job, they already were optimized by someone else ;-)

avatar Notebit
Notebit - comment - 21 Apr 2021

@brianteeman adding the index on per site basis is totally safe for future joomla updates and joomla database table structure check? I don't want to make any custom change which can give me troubles with standard joomla updates

avatar richard67
richard67 - comment - 21 Apr 2021

@Notebit Adding indexes is safe with following exception: If your Joomla is still using an old MySQL database which doesn't support utf8mb4, and you make an index for a varchar column, and then update your database server or migrate to a newer server which does support utf8mb4, then the utf8mb4 conversion of Joomla might fail due to index length.

But this is a very rare case, and here we are talking about indexes for numerical columns or datetime columns mainly, and with these such a problem will for sure never happen.

And I assume your database is already using utf8mb4, and in this case you can also do what you want with indexes for varchar columns.

avatar richard67
richard67 - comment - 21 Apr 2021

P.S.: The database structure check only checks indexes which appear in SQL update scripts of the core, or in case of Joomla 4 also of extensions. I.e. it will complain if you modify an existing index coming from the core. It doesn't care about any other index you add.

avatar Notebit
Notebit - comment - 21 Apr 2021

@richard67 clear, thanks

avatar PhilETaylor
PhilETaylor - comment - 21 Apr 2021

I don't want to make any custom change which can give me troubles with standard joomla updates

You will have issues as you openly admit you have modified core PHP code to add additional conditions on the sql query. These will be wiped out on upgrade!

avatar Notebit
Notebit - comment - 21 Apr 2021

@PhilETaylor it's not a custom modification, joomla standard most viewed articles module supports relative date settings for filtering by date, so the query becomes like the one i use

avatar PhilETaylor
PhilETaylor - comment - 21 Apr 2021

oh... well I learned something new again today - thanks :) I never knew that.

Add a Comment

Login with GitHub to post a comment