? Pending

User tests: Successful: Unsuccessful:

avatar regularlabs
regularlabs
21 Apr 2016

The COUNT(*) used in database queries is slow because it cannot use any indexes on the table.

This PR replaces these with a COUNT on the primary key where possible.

There are still some other occurrences of COUNT(*) in core, but these are in functions where the table is variable. So no (fast) way of knowing the primary key.

The changes in this PR are all pretty straightforward and can be reviewed by looking over the code changes.

avatar regularlabs regularlabs - open - 21 Apr 2016
avatar regularlabs regularlabs - change - 21 Apr 2016
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 21 Apr 2016
Labels Added: ?
avatar brianteeman
brianteeman - comment - 21 Apr 2016

Havent checked to make sure all those fields are a primary key but in principle this makes perfect sense - thanks

avatar brianteeman brianteeman - change - 21 Apr 2016
Category Code style SQL
avatar Bakual
Bakual - comment - 21 Apr 2016

I once read somewhere that COUNT(1) can be used instead of COUNT(*) to make it faster. Never tested that one but maybe it works for tables where there is no primary key?

avatar Fedik
Fedik - comment - 21 Apr 2016

The COUNT(*) used in database queries is slow because ...

I want to see the prove :smiley:

avatar richard67 richard67 - test_item - 21 Apr 2016 - Tested unsuccessfully
avatar richard67
richard67 - comment - 21 Apr 2016

I have tested this item :red_circle: unsuccessfully on 782bd2b

@regularlabs I found several mistakes, see my code comments in your PR.


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/10024.

avatar richard67
richard67 - comment - 21 Apr 2016

I started to research a bit what others say about count(*) and count(column) performance.

The first thing I found speaks against this PR: https://www.percona.com/blog/2007/04/10/count-vs-countcol/.

It tells exactly the opposite as this PR assumes:

It tells that "count(*)" is better because mysql is clever enough to include indexes and avoid full table scans.

So from my point of view this PR is wrong even if the mistakes I found will be corrected.


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/10024.

avatar smz
smz - comment - 21 Apr 2016

@richard67 Hi Richard! I see the article you cited is quite old and (it seems to me...) does cover MYISAM tables only. Is it possible that with INNODB tables the scenario could be different?

avatar smz
smz - comment - 21 Apr 2016

It seems it is the case: read down, in the comments:

Q:

In InnoDB, does a simple count() do anything evil with the database cache? I ask because some database engine’s I’ve seen don’t do caching carefully and tablescans (or count() that does a tablescan) can end up pulling the whole table into the cache (and pushing more useful pages out). Smarter db engines will just reuse the same buffer page to do table scans (after making sure it isn’t in-cache already), but many don’t.

A:

Right. Innodb does not have cached row count. so count(*) without where clause is slow with Innodb tables.

Edit: and more interesting stuff further down...

avatar richard67
richard67 - comment - 21 Apr 2016

@smz You are right, was for MyIsam. But I found another one which says count(*) is on InnoDB only bad if no where clause or bad indexes: http://stackoverflow.com/a/22628798.

I also found this one: http://kingori.co/minutae/2013/05/mysql-count-innodb/.

It also tells the main problem is if you have no where clause, but it recoommends changes similar to this PR here.

So maybe it is not a bad thing, ok.

But the statements corrected in this PR do have where clauses everywhere, and we have for those tables indexes on not-null columns, so this PR should not have any effect. But maybe @regularlabs can proof the opposite.

So or so the errors I commented on code in the PR on GitHub have to be solved, otherwise the count will be wrong for some cases.


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/10024.

avatar regularlabs
regularlabs - comment - 21 Apr 2016

@richard67 It doesn't matter much if the key is the actual primary key, as long as it is in the select clause.

If it is the primary key or an indexed key doesn't matter much. As long as MySQL can use the index.

This was meant to be a simple and quick PR.
The way it looks now it is turning into a complete discussion.
I have no interest/time in that, so someone else can pick this up if they want.

avatar regularlabs regularlabs - change - 21 Apr 2016
Status Pending Closed
Closed_Date 0000-00-00 00:00:00 2016-04-21 12:24:05
Closed_By regularlabs
avatar regularlabs regularlabs - close - 21 Apr 2016
avatar regularlabs regularlabs - close - 21 Apr 2016
avatar richard67
richard67 - comment - 21 Apr 2016

@regularlabs

It doesn't matter much if the key is the actual primary key, as long as it is in the select clause.
If it is the primary key or an indexed key doesn't matter much. As long as MySQL can use the index.

Correct, no matter which index, but in the cases I commented in the code, you definitely counted the wrong columns. This would have to be corrected if this PR should have been correct.

Was not my intention to discourage you with the other discussion. Sorry for that.

avatar regularlabs
regularlabs - comment - 21 Apr 2016

Thanks for kicking me anyway, as it seems I was wrong to assume that count(*) was always bad in the first place.
So this is one for the trashcan :)

avatar regularlabs regularlabs - head_ref_deleted - 21 Apr 2016

Add a Comment

Login with GitHub to post a comment