User tests: Successful: Unsuccessful:
The COUNT(*)
used in database queries is slow because it cannot use any indexes on the table.
This PR replaces these with a COUNT
on the primary key where possible.
There are still some other occurrences of COUNT(*)
in core, but these are in functions where the table is variable. So no (fast) way of knowing the primary key.
The changes in this PR are all pretty straightforward and can be reviewed by looking over the code changes.
Status | New | ⇒ | Pending |
Labels |
Added:
?
|
Category | ⇒ | Code style SQL |
I once read somewhere that COUNT(1)
can be used instead of COUNT(*)
to make it faster. Never tested that one but maybe it works for tables where there is no primary key?
The COUNT(*) used in database queries is slow because ...
I want to see the prove
I have tested this item unsuccessfully on 782bd2b
@regularlabs I found several mistakes, see my code comments in your PR.
I started to research a bit what others say about count(*) and count(column) performance.
The first thing I found speaks against this PR: https://www.percona.com/blog/2007/04/10/count-vs-countcol/.
It tells exactly the opposite as this PR assumes:
It tells that "count(*)" is better because mysql is clever enough to include indexes and avoid full table scans.
So from my point of view this PR is wrong even if the mistakes I found will be corrected.
@richard67 Hi Richard! I see the article you cited is quite old and (it seems to me...) does cover MYISAM tables only. Is it possible that with INNODB tables the scenario could be different?
It seems it is the case: read down, in the comments:
Q:
In InnoDB, does a simple count() do anything evil with the database cache? I ask because some database engine’s I’ve seen don’t do caching carefully and tablescans (or count() that does a tablescan) can end up pulling the whole table into the cache (and pushing more useful pages out). Smarter db engines will just reuse the same buffer page to do table scans (after making sure it isn’t in-cache already), but many don’t.
A:
Right. Innodb does not have cached row count. so count(*) without where clause is slow with Innodb tables.
Edit: and more interesting stuff further down...
@smz You are right, was for MyIsam. But I found another one which says count(*) is on InnoDB only bad if no where clause or bad indexes: http://stackoverflow.com/a/22628798.
I also found this one: http://kingori.co/minutae/2013/05/mysql-count-innodb/.
It also tells the main problem is if you have no where clause, but it recoommends changes similar to this PR here.
So maybe it is not a bad thing, ok.
But the statements corrected in this PR do have where clauses everywhere, and we have for those tables indexes on not-null columns, so this PR should not have any effect. But maybe @regularlabs can proof the opposite.
So or so the errors I commented on code in the PR on GitHub have to be solved, otherwise the count will be wrong for some cases.
@richard67 It doesn't matter much if the key is the actual primary key, as long as it is in the select clause.
If it is the primary key or an indexed key doesn't matter much. As long as MySQL can use the index.
This was meant to be a simple and quick PR.
The way it looks now it is turning into a complete discussion.
I have no interest/time in that, so someone else can pick this up if they want.
Status | Pending | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2016-04-21 12:24:05 |
Closed_By | ⇒ | regularlabs |
It doesn't matter much if the key is the actual primary key, as long as it is in the select clause.
If it is the primary key or an indexed key doesn't matter much. As long as MySQL can use the index.
Correct, no matter which index, but in the cases I commented in the code, you definitely counted the wrong columns. This would have to be corrected if this PR should have been correct.
Was not my intention to discourage you with the other discussion. Sorry for that.
Thanks for kicking me anyway, as it seems I was wrong to assume that count(*) was always bad in the first place.
So this is one for the trashcan :)
Havent checked to make sure all those fields are a primary key but in principle this makes perfect sense - thanks