? Success

User tests: Successful: Unsuccessful:

avatar pierinz
pierinz
28 Dec 2014

As the number of articles grows, the site becomes slower. Modules like mod_articles_popular are painful to load on sites with 5000+ articles.
I found that the main problem is the handling of "bad categories" (unpublished categories and their child categories), which requires too many costly joins. I split the query in two: we prefetch the bad categories' id, then we exclude them in the real query. The speed improvement is huge on big sites, and the impact on small sites is negligible.
In addition the model by default fetches all the fields, even if they are useless; I think we should fetch only the required fields or at least prevent the useless joins. I included the "ignore.*" state to fix that (see changes in the module). This is still a draft, but the result is very interesting.

Ok, these are the numbers:

Test site with test data, "Park Blog" page:
Shows that the impact on small sites is negligible

Unpatched, with badcats:
1- Query Time: 2.26 ms Query memory: 0.069 MB Rows returned: 2

Patched, with badcats:
1- Query Time: 0.76 ms Query memory: 0.053 MB Rows returned: 1
2- Query Time: 1.21 ms Query memory: 0.060 MB Rows returned: 2

Unpatched, no badcats:
1- Query Time: 2.00 ms Query memory: 0.069 MB Rows returned: 2

Patched, no badcats:
1- Query Time: 0.75 ms Query memory: 0.053 MB Rows returned: 0
2- Query Time: 1.31 ms Query memory: 0.060 MB Rows returned: 2

Test site with test data, "Most read" page:
Thanks to the "ignore" state we can crush some ms even on small sites

Unpatched:
1- Query Time: 2.40 ms Query memory: 0.083 MB Rows returned: 5

Patched:
1- Query Time: 0.81 ms Query memory: 0.067 MB Rows returned: 1
2- Query Time: 1.47 ms Query memory: 0.081 MB Rows returned: 5

See here for further details and other tests.

avatar pierinz pierinz - open - 28 Dec 2014
avatar jissues-bot jissues-bot - change - 28 Dec 2014
Labels Added: ?
avatar brianteeman brianteeman - change - 28 Dec 2014
Category Components SQL
avatar pierinz
pierinz - comment - 29 Dec 2014

You're right, i should replace "parameters" with "states".

avatar alikon
alikon - comment - 30 Dec 2014

test condition: MySQLi 5.6.16, PHP 5.5.9, OS Windows, Articles 6793
tested frontpage queries with mod_articles_popular published (with mysqlslap -c 10 -i 10)

before patch:
Benchmark
Average number of seconds to run all queries: 28.227 seconds
Minimum number of seconds to run all queries: 23.559 seconds
Maximum number of seconds to run all queries: 34.753 seconds
Number of clients running queries: 10
Average number of queries per client: 35

patch applyed

Benchmark
Average number of seconds to run all queries: 51.800 seconds
Minimum number of seconds to run all queries: 42.263 seconds
Maximum number of seconds to run all queries: 59.858 seconds
Number of clients running queries: 10
Average number of queries per client: 37

@test failed


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/5544.

avatar alikon alikon - test_item - 30 Dec 2014 - Tested unsuccessfully
avatar pierinz
pierinz - comment - 30 Dec 2014

@alikon: can you post the info about the two queries in debug mode? In addition, I'd like to see the benchmark with and without unpublished categories. Thanks.

I have about the same number of articles, but my tests were on PHP 5.4 and MySQL on Linux. This evening I'll test again and provide full details.

avatar alikon
alikon - comment - 30 Dec 2014

@pierinz my test is without unpublished categories and I've tested all the queries (35/37) executed in the frontpage with the module published


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/5544.

avatar pierinz
pierinz - comment - 30 Dec 2014

Test environment:
Database version: 5.5.5-10.0.15-MariaDB-log
PHP Version: 5.5.19-pl0-gentoo
Apache + php-fpm on Gentoo Linux (i7-3610QM, 8GB RAM)
Articles: 5936
4 unpublished categories

Unpatched:
44 Queries Logged 602.1 ms

Most read module:
Query time: 398.93 ms

Patched
44 Queries Logged 489.6 ms

Most read module:
Query time: 0.87 ms + 282.97 ms

I'll try mysqlslap later, I've never used it before.

avatar pierinz
pierinz - comment - 30 Dec 2014

Unpatched

mysqlslap --user=mysqlslap --password=mysqlslap --query=Desktop/unpatched.sql --delimiter=";" --concurrency 1 --iterations 1
Benchmark
Average number of seconds to run all queries: 0.184 seconds
Minimum number of seconds to run all queries: 0.184 seconds
Maximum number of seconds to run all queries: 0.184 seconds
Number of clients running queries: 1
Average number of queries per client: 42

mysqlslap --user=mysqlslap --password=mysqlslap --query=Desktop/unpatched.sql --delimiter=";" --concurrency 10 --iterations 10
Benchmark
Average number of seconds to run all queries: 0.569 seconds
Minimum number of seconds to run all queries: 0.540 seconds
Maximum number of seconds to run all queries: 0.608 seconds
Number of clients running queries: 10
Average number of queries per client: 42

Patched
mysqlslap --user=mysqlslap --password=mysqlslap --query=Desktop/patched.sql --delimiter=";" --concurrency 1 --iterations 1

Benchmark
Average number of seconds to run all queries: 0.466 seconds
Minimum number of seconds to run all queries: 0.466 seconds
Maximum number of seconds to run all queries: 0.466 seconds
Number of clients running queries: 1
Average number of queries per client: 45

mysqlslap --user=mysqlslap --password=mysqlslap --query=Desktop/patched.sql --delimiter=";" --concurrency 10 --iterations 10
Benchmark
Average number of seconds to run all queries: 1.521 seconds
Minimum number of seconds to run all queries: 1.490 seconds
Maximum number of seconds to run all queries: 1.580 seconds
Number of clients running queries: 10
Average number of queries per client: 48

@alikon: mysqlslap tells you're right, but I can't understand why the joomla internal profiler shows different results. What about your joomla profiler?

avatar alikon
alikon - comment - 31 Dec 2014

Usually I don't look too much at joomla profile, but next year I'll give it a look ;) , cause I prefer to simulate a real use case, running more than one client at time is more close to the reality. I would have preferred that I was wrong


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/5544.

avatar pierinz
pierinz - comment - 1 Jan 2015

Finally, the answer.
Since we are using MySql on a test environment, after the first query the tables are in the mysql, ram and disk cache, and they will stay there for the whole test (as we recall the same data).
The patched and unpatched version yelds identical results when this data is cached. But the patched version use two queries (the unpatched has only 1) to perform the same operation for each model call, that's why it's slower (we can fix this).
The profiler shows different results because PHP loads lots of files to display the page, which may change the content of the disk cache.

On Linux there are some useful commands to erase ALL caches:
sync
echo 3 > /proc/sys/vm/drop_caches
blockdev --flushbufs /dev/sda
hdparm -F /dev/sda
then restart mysql service.

Results:
DbForge Studio Profiler
Unpatched: 2.702s
Patched: 0,088s + 2,219s

Mysqlslap
As we increase the iteration number, more data get cached and the time decreases.
Unpatched: mysqlslap --user=mysqlslap --password=mysqlslap --query=Desktop/unpatched.sql --delimiter=";" --concurrency 10 --iterations 10
Benchmark
Average number of seconds to run all queries: 0.669 seconds
Minimum number of seconds to run all queries: 0.214 seconds
Maximum number of seconds to run all queries: 4.634 seconds
Number of clients running queries: 10
Average number of queries per client: 42

Patched: mysqlslap --user=mysqlslap --password=mysqlslap --query=Desktop/patched.sql --delimiter=";" --concurrency 10 --iterations 10
Benchmark
Average number of seconds to run all queries: 0.628 seconds
Minimum number of seconds to run all queries: 0.186 seconds
Maximum number of seconds to run all queries: 4.502 seconds
Number of clients running queries: 10
Average number of queries per client: 48

Conclusion: on a shared hosting with many users we should get a huge improvement, with a dedicated server it should be almost the same. I should cache the result of the first query for all model calls in that request to decrease the queries.

avatar photodude
photodude - comment - 15 Dec 2015

~~???? ~~

avatar Kubik-Rubik
Kubik-Rubik - comment - 8 May 2016

Thank you for creating this, @pierinz. It’s been some time since you created this and there are now some merge conflicts that prevent a direct merge. Can you please update your PR?

avatar pierinz
pierinz - comment - 21 May 2016

I can, but in this time I wondered if this is the right approach to solve the slowness.
Caching the badcategories and ignoring useless fields are good things, but the big deal is sorting the content table.

I thought we should have a flag in the module settings to enable "really big site mode".
In this mode the module should use an additional table "most_read_table" with "article id" and "hits" fields, which should contain only a small number of rows (the module's count parameter, for example).
When any "hits" field of the content table is updated, a mysql trigger (or some hook in the article) should check if the new value is higher than any of the "most_read_table" rows, then insert a new row with the "article id" and remove the row with the lowest hits.
I'm sure this will have a negative impact on the performances, but still should be faster than sorting the articles table.

Anyone have any thoughts on this?

avatar photodude
photodude - comment - 21 May 2016

@mbabker @wilsonge do either of you have thoughts on this?

avatar mbabker
mbabker - comment - 21 May 2016

My honest opinion is that counter tables seem counter-intuitive to a system's design.

Also, database level triggers cannot be relied upon for a mass distributed application such as Joomla so any changes to implement this are going to have to be at the PHP layer, which is just going to add more processing time to article operations (especially page loads when hit counting is turned on as it's going to update both the content table and this "most_read_table").

avatar Bakual
Bakual - comment - 21 May 2016

Imho, the solution to solve slow modules is to cache the module output itself. If you only have to fetch the module data and render the output once every few hours, then the whole thing speeds up a lot ???? The good thing is, that feature is already there and can be turned on.
I certainly wouldn't add an own table for that module. That would be quite unique for a module.

avatar mbabker
mbabker - comment - 22 May 2016

Cache is only a partial solution. It helps speed up display of data, but for the request that builds the cache data it still needs to be optimized for the best performance we can give it. If that means making it possible for the module to execute a more performant query with the articles model then it's a win for that module and anyone using the model to do lookups.

We threw a cache layer in front of the stats API because the base request for all data takes several seconds (it takes a lot of resources to process 375K rows of data) but once that cache is built for the next 15 minutes it's a pretty instantaneous return. We also made performance tweaks like smaller queries (tradeoff being more had to be executed), shifting from returning stdClass objects to arrays (that did help a bit with the memory consumption, overall it allowed an additional iteration of our loop before hitting the server's memory limits), and ultimately using PHP 5.5's Generators (unfortunately that's not exactly an option here). Same logic needs to apply here in as B/C a manner as practical.

avatar Bakual
Bakual - comment - 22 May 2016

Yes sure, if the query can be optimised it's a no brainer to do it.

avatar PhilETaylor
PhilETaylor - comment - 7 Sep 2016

Yes sure, if the query can be optimised it's a no brainer to do it.

We need to be removing the SECONDS from the publish_up/down times in sql queries... been saying that for years, because else the query changes every second and this is then not cached by mysql caching layers :( :(

avatar brianteeman
brianteeman - comment - 7 Sep 2016

been saying that for years,

So submit a pull request

On 7 September 2016 at 21:22, Phil Taylor notifications@github.com wrote:

Yes sure, if the query can be optimised it's a no brainer to do it.

We need to be removing the SECONDS from the publish_up/down times in sql
queries... been saying that for years, because else the query changes
every second and this is then not cached by mysql caching layers :( :(


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#5544 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABPH8dpwxWzqAU3qBNXALRjJIOXWm8Odks5qnx0hgaJpZM4DMsXk
.

Brian Teeman
Co-founder Joomla! and OpenSourceMatters Inc.
http://brian.teeman.net/

avatar hans2103
hans2103 - comment - 4 Nov 2016

Wow! interesting PR. What are we going to do with this?

avatar csthomas
csthomas - comment - 10 Mar 2017

Please check competitive solution in PR #14439

avatar photodude
photodude - comment - 22 Mar 2017

Can this be closed in favor of PR #14439 ?

avatar pierinz
pierinz - comment - 22 Mar 2017

Yes, I like it.

avatar pierinz pierinz - change - 22 Mar 2017
Status Pending Closed
Closed_Date 0000-00-00 00:00:00 2017-03-22 21:33:49
Closed_By pierinz
avatar pierinz pierinz - close - 22 Mar 2017
avatar joomla-cms-bot joomla-cms-bot - change - 22 Mar 2017
Category Components SQL Front End com_content Modules Components

Add a Comment

Login with GitHub to post a comment