No Code Attached Yet bug
avatar AndySDH
AndySDH
11 Oct 2021

What needs to be fixed

In the fields_values table, for some reason, the item_id is of type VARCHAR instead of being an INT.

The explanation given in Table Structure comment is:
"Allow references to items which have strings as ids, eg. none db systems."

I tried to trace back to the reason for this, and I couldn't find anything, it seems this traces all the way back to the original Custom Fields PR by @laoneo: #11833, and wasn't ever caught/noticed by anyone.

The explanation is ambiguous (not sure what it means), and I can't seem to find a reason why this column should ever contain strings. It always contains just the ID of the component item (eg. article id, user id, contact id, etc) which is always a numeric value.

Why this should be fixed

An important reason why this should be fixed is that even core queries are built treating item_id as an INT (without adding $db->quote around it), while it's actually a VARCHAR In the database.

This causes performance issues: trying to compare an integer value against a varchar column, means that the index won't be used on the query. See this for more info: https://shatteredsilicon.net/blog/2021/07/11/mariadb-mysql-performance-tuning-optimization-data-types/

For example, let's look at this file in Joomla 4:
administrator\components\com_fields\src\Model\FieldModel.php
(could be just one of many files)

the WHERE clause on item_id wrongly expects it to be an INT (without quotes around the value), so the index is not used on the query, making it unnecessarily slower.

How would you fix it

There are really two ways to go about it. (not a fan of option "a")

a) keeping the item_id column as it is, making sure that all queries on item_id are done with $db->quote around it so that is treated as a string.

b) change the column type of item_id to an INT, and have the queries treat item_id as an INT as well.

My vote is for option "b"), because again, I can't quite find any possible reason for when you would ever need non-numeric values as item_id. It's always a number.

Side Effects expected

Not many. Maybe extensions developers that treated item_id as a string in the queries might want to now treat it as an INT to take advantage of the improved performance, but otherwise, nothing will break.

We would have to do this for new installations AND for updates.

We can do this on Joomla 4 and leave Joomla 3 behind, or we can also do it in both Joomla 3 and Joomla 4 (no harm really).

avatar AndySDH AndySDH - open - 11 Oct 2021
avatar joomla-cms-bot joomla-cms-bot - change - 11 Oct 2021
Labels Added: No Code Attached Yet
avatar joomla-cms-bot joomla-cms-bot - labeled - 11 Oct 2021
avatar AndySDH AndySDH - change - 11 Oct 2021
The description was changed
avatar AndySDH AndySDH - edited - 11 Oct 2021
avatar AndySDH AndySDH - change - 11 Oct 2021
The description was changed
avatar AndySDH AndySDH - edited - 11 Oct 2021
avatar AndySDH AndySDH - change - 11 Oct 2021
The description was changed
avatar AndySDH AndySDH - edited - 11 Oct 2021
avatar AndySDH AndySDH - change - 11 Oct 2021
The description was changed
avatar AndySDH AndySDH - edited - 11 Oct 2021
avatar AndySDH AndySDH - change - 11 Oct 2021
The description was changed
avatar AndySDH AndySDH - edited - 11 Oct 2021
avatar laoneo
laoneo - comment - 11 Oct 2021

I understand the performance point, but when you reference entities of external systems, then the ID there is often a string. Making it an integer now can badly break 3rd party extensions.

avatar AndySDH
AndySDH - comment - 11 Oct 2021

Thanks @laoneo, can you make an example of custom fields being used for external system by 3rd party extensions where the ID is a string? In my mind I can't think of any, so I'm curious to know what you refer to.

avatar laoneo
laoneo - comment - 11 Oct 2021

For example DPCalendare integrates google calendar events where the id is a string https://developers.google.com/calendar/api/v3/reference/events#resource

avatar laoneo
laoneo - comment - 11 Oct 2021

Or also MS Exchange/Microsoft365 events.

avatar AndySDH
AndySDH - comment - 12 Oct 2021

So you made it this way for the only purpose of supporting your own extension... =/

avatar laoneo
laoneo - comment - 13 Oct 2021

No, just for every developer. You wanted an example, I gave you one as I can only speak for myself.

avatar AndySDH
AndySDH - comment - 13 Oct 2021

Yeah but an ID is usually a numeric auto-increment value in a component.

In DPCalendar you could have probably had an auto-increment id column as the primary id of an entity, and a secondary 'external-id' that pointed to the google-calendar string.

Same would go for other developers as well.

So that in the fields_values table, item_id would reference the primary (numeric) id and could universally be an INT.

Honestly, a non-numeric primary ID of an entity is something extremely rare or bad design. At least in Joomla because it's Joomla we're talking about. It's something that I've never seen for any Joomla extension I've ever used in 15 years

avatar laoneo
laoneo - comment - 15 Oct 2021

You really have to let it go that web sites are monolythic in itself closed systems. The web is connecting each other.

avatar AndySDH
AndySDH - comment - 15 Oct 2021

Well, whatever. If we can't change item_id to be an INT (solution "B"), we still have a problem to fix.

So solution "A" in my opening post.

We have to fix the queries that use item_id, and make them treat item_id as a string. Otherwise, the index is not used in the queries, making them unnecessarily much slower, because they're comparing against the wrong data type.

avatar laoneo
laoneo - comment - 21 Oct 2021

Fixing the queries would be a welcome addition 👍

avatar salitygen
salitygen - comment - 16 Feb 2022

I want to supplement!

It is very difficult to work with the #__fields_values table, I am currently developing a material filter in which this table slows down the filter. The problem is that the category_id columns are simply needed in the table, This is necessary in order to make a selection already when you are in the parent category, a situation arises when you need to filter the materials of child categories, here a problem arises, you need to make a join from the table #__fields_categories, which may NOT contain child categories, but only the parent. Or Make a selection of materials by category, and then join #__fields_values to this selection by linking the material id and item_id fields from #__fields_values Such selections consume a lot of resources! Since we have to list all the IDs of materials from child categories.

There may be a conflict with the category IDs of other components, so the context field is also needed the same as in the table #__fields

The category id column should contain the actual category of the material, not the selected category in the field settings.
If you do, I will give joomla users a real, fast filter!

avatar brianteeman
brianteeman - comment - 16 Feb 2022

Don't forget that you will need to convert all the existing data

avatar salitygen
salitygen - comment - 16 Feb 2022

Don't forget that you will need to convert all the existing data

Maybe just a new table with a transfer from the old one? ))))

avatar brianteeman
brianteeman - comment - 16 Feb 2022

It would still need to be done however you do it

avatar laoneo
laoneo - comment - 17 Feb 2022

Or you can create an aggregated view, no need to duplicate the data.

avatar Hackwar Hackwar - change - 22 Feb 2023
Labels Added: bug
avatar Hackwar Hackwar - labeled - 22 Feb 2023
avatar AndySDH
AndySDH - comment - 17 Jun 2023

Hello, any update on an improvement with this?

If we can't change item_id to be an INT (solution "B"), we still have a problem to fix.

So solution "A" in my opening post.

We have to fix the queries that use item_id, and make them treat item_id as a string. Otherwise, the index is not used in the queries, making them unnecessarily much slower, because they're comparing against the wrong data type.

avatar Scrabble96
Scrabble96 - comment - 6 Feb 2024

In addition to the above, the fields_values table does not have a primary ID. Shouldn't all tables have one?

avatar Scrabble96
Scrabble96 - comment - 6 Feb 2024

ALSO, I have tried to join the fields_values to the content table by converting the content.id to a string, using:

INNER JOIN m2f0c_fields_values ON
CAST(m2f0c_content.id AS CHAR(5)) = m2f0c_fields_values.item_id

but get a red message telling me

"Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation '=' "

which implies that the database is a mixture of general and unicode types.
The database I'm using was new for a reference copy of Joomla 5.

@AndySDH's option a) above might work within the Joomla environment, but I need to do some filtering in the database first, as it's not possible - as far as I know - to sort by custom field value in Joomla.

avatar salitygen
salitygen - comment - 7 Feb 2024

In addition to the above, the fields_values table does not have a primary ID. Shouldn't all tables have one?

in this case, a unique identifier is unnecessary and even harmful! since the unique identifier has the property of ending

avatar Fedik
Fedik - comment - 14 Feb 2024

There are really two ways to go about it.

Not really, I add option c:
c) Add a new content_id column, and use it instead of item_id. And drop item_id somewehre in future.

avatar Hackwar
Hackwar - comment - 21 Feb 2024

I encountered the same issue with a customer a few days ago. I have around 2 million rows in that table and running my queries on the table took more than 60 seconds for just that one query. I first converted the column to integer and that reduced the time to a little bit more than 30 seconds. However, when I added a multi-column-key across item_id and field_id, it reduced this to around 12 seconds. Maybe that key could help at least a little bit as well.

The code would be

ALTER TABLE `d58fc_fields_values`
	ADD INDEX `idx_field_id_item_id` (`field_id`, `item_id`);
avatar richard67
richard67 - comment - 2 Mar 2024

There are really two ways to go about it. (not a fan of option "a")

a) keeping the item_id column as it is, making sure that all queries on item_id are done with $db->quote around it so that is treated as a string.

@AndySDH The core uses as far as I can see prepared statements and binds the item_id already as a string and handles it also as a string in PHP. $db->quote should only be used with not-prepared statements, of which I haven't found any in the core for that column. The only place I could find where the core handles it as an integer is in the Blog Sample Data plugin when it inserts that author field which is used in that sample data.

So for b/c reasons I would go with solution a.

Regarding the indexes: In J5 index lengths in MySQL/MariaDB are not limited anymore to 191 characters with utf8mb4, so we can change the existing index on that column to use the full length, and we can add the combined index over the field_id and index_id columns.

avatar richard67
richard67 - comment - 2 Mar 2024

I encountered the same issue with a customer a few days ago. I have around 2 million rows in that table ...

@Hackwar Is that site J4? Or J5? If J4: Could you update a copy of that site to J5 (assuming the database requirement is fulfilled)?
And then, when J5:
What happens on your test site with the 2 million field values if you revert the change of the item_id column to integer so it's again a varchar(255), modify the existing index so that it uses the full length of that column and not only 191 characters and add a combined index like you suggested, all in following way because the database checker would complain if you use the same name for the index which you modify:

ALTER TABLE `#__fields_values` DROP INDEX `idx_item_id`;
ALTER TABLE `#__fields_values` DROP INDEX `idx_field_id_item_id`;
ALTER TABLE `#__fields_values` MODIFY `item_id` varchar(255) NOT NULL;
ALTER TABLE `#__fields_values` ADD INDEX `idx_item_id_2` (`item_id`);
ALTER TABLE `#__fields_values` ADD INDEX `idx_field_id_item_id` (`field_id`, `item_id`);

Is the performance of the CMS core similar as with your modifications? Or at least better than the unmodified core?

avatar Hackwar
Hackwar - comment - 3 Mar 2024

#__fields_values table contains 2 million rows and my testing query contains 5 joins on that table to get different field values, among them a group_concat to get a multi-value field.

Running that query without any modifications ran for 14 minutes and then I aborted it.

Running that query with column changed to int takes 33.37s.

Running that query with index added over field_id and item_id ran for 6 minutes and I also aborted.

Running that query with column changed to int and index over field_id and item_id takes 16.78s.

Result set each time is supposed to be 100k rows.

I think the result is pretty clear. We will have to change the type of the column.

Add a Comment

Login with GitHub to post a comment