No Code Attached Yet bug
avatar ceford
ceford
18 Jul 2023

Steps to reproduce the issue

Inspect the code: components/com_banners/src/Models/BannersModel.php Line 196
$regexp = '[[:<:]]' . $keyword . '[[:>:]]';
For MySQL 8.x and MariDB 10.x it needs to be '\\b' . $keyword . '\\b'

Edited: neglected to quote code.

Expected result

Actual result

System information (as much as possible)

Additional comments

avatar ceford ceford - open - 18 Jul 2023
avatar ceford ceford - change - 18 Jul 2023
Labels Removed: ?
avatar joomla-cms-bot joomla-cms-bot - change - 18 Jul 2023
Labels Added: No Code Attached Yet
avatar joomla-cms-bot joomla-cms-bot - labeled - 18 Jul 2023
avatar richard67
richard67 - comment - 18 Jul 2023

For MySQL 8.x and MariDB 10.x it needs to be '\b' . $keyword . '\b'

@ceford How do you know that? Is there some documentation about that? If so, could you link to it? Or did you have some issue which is solved by the suggested change? If so: Could you provide some instructions on how to reproduce it? Thanks in advance.

avatar richard67 richard67 - change - 18 Jul 2023
Labels Added: Information Required
avatar richard67 richard67 - labeled - 18 Jul 2023
avatar richard67
richard67 - comment - 18 Jul 2023

According to this document it should still work with MariaDB: https://mariadb.com/kb/en/regular-expressions-overview/

Word boundaries

The [:<:] and [:>:] patterns match the beginning and the end of a word respectively.

avatar brianteeman
brianteeman - comment - 18 Jul 2023

Prior to MySQL 8.0.4, MySQL used the Henry Spencer regular expression library to support regular expression operations, rather than International Components for Unicode (ICU).”
The [[:<:]] and [[:>:]] word boundary markers are only supported by the Spencer library, for ICU it would be \b instead.

https://dev.mysql.com/doc/refman/8.0/en/regexp.html

avatar richard67
richard67 - comment - 18 Jul 2023

What I found for MariaDB on the page linked in my previous comment is:

MariaDB starting with 10.0.5

Until MariaDB 10.0.5, MariaDB used the POSIX 1003.2 compliant regular expression library. The new PCRE library is mostly backwards compatible with what is described below - see the PCRE Regular Expressions article for the enhancements made in 10.0.5.

So it seems they now support \b, too.

avatar richard67
richard67 - comment - 18 Jul 2023

The thing is that in J4 we still have to support MySQL versions older than 8. And PostgreSQL seems to use \b for the backspace character and not for word boundaries.

avatar richard67 richard67 - change - 18 Jul 2023
Labels Removed: Information Required
avatar richard67 richard67 - unlabeled - 18 Jul 2023
avatar ceford
ceford - comment - 18 Jul 2023

A simple way to test whether a sequence of numbers contains a specific number (corrected typo):

SELECT '1,2,3,11,22,33,111,222,333' REGEXP CONCAT('[[:<:]]', '11', '[[:>:]]'),
'1,2,3,11,22,33,111,222,333' REGEXP CONCAT('[[:<:]]', '1111', '[[:>:]]')

MySQL said:
3685 - Illegal argument to a regular expression.

SELECT '1,2,3,11,22,33,111,222,333' REGEXP CONCAT('\\b', '11', '\\b'),
'1,2,3,11,22,33,111,222,333' REGEXP CONCAT('\\b', '1111', '\\b')

Returns 1 and 0

Documentation: https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-compatibility

avatar ceford
ceford - comment - 18 Jul 2023

With MariaDB 10.3.38 the first test returns 0 and 0. The second test returns 1 and 0. Joomla 5 requires MariaDB 10.4 which I have yet to install.

avatar richard67
richard67 - comment - 22 Jul 2023

If we want to avoid database type and version specific code in the CMS, we could extend the database query classes in the framework by new methods to return the right regexp pattern like e.g. for word boundaries (and whatever else might be used in the CMS with the regexp method of the query), so we could use e.g. $query->regexpPatternWordStart() and $query->regexpPatternWordEnd() (any ideas for shorter names are welcome), or we could use something like $query->regexpPatternWord($keyword) to prepend the word start and append the word end pattern, and similar for other use cases.

What do you all think?

Update: As far as I can see, only the banners model uses the regexp method of the query object.

avatar ceford ceford - change - 22 Jul 2023
The description was changed
avatar ceford ceford - edited - 22 Jul 2023
avatar richard67
richard67 - comment - 22 Jul 2023

A simple way to test whether a sequence of numbers contains a specific number:

SELECT '1,2,3,11,22,33,111,222,333' REGEXP CONCAT('[[:>:]]', '11', '[[:>:]]'),
'1,2,3,11,22,33,111,222,333' REGEXP CONCAT('[[:>:]]', '1111', '[[:>:]]')

With MariaDB 10.3.38 the first test returns 0 and 0. The second test returns 1 and 0. Joomla 5 requires MariaDB 10.4 which I have yet to install.

@ceford Shouldn't it use '[[:<:]]' for the word start, so it is as follows?

 SELECT '1,2,3,11,22,33,111,222,333' REGEXP CONCAT('[[:<:]]', '11', '[[:>:]]'),
'1,2,3,11,22,33,111,222,333' REGEXP CONCAT('[[:<:]]', '1111', '[[:>:]]')

Please test again with the right query.

avatar richard67
richard67 - comment - 22 Jul 2023

A possible fix in the CMS (4.3-dev branch, but should be merged all the way up) could look as follows:

j4-j5-issue-41183_possible-fix-1

See 4.3-dev...richard67:4.3-dev-fix-banners-model-regexp-query .

Code for that can be found here: https://github.com/richard67/joomla-cms/blob/321c5f2846136cc3147e46b0671bcd5640a48f82/components/com_banners/src/Model/BannersModel.php

As far as I can see we don't have to do such stuff anywhere else in the CMS up to now.

But a solution in the database framework would maybe be more elegant, so we could use something like this:
$regexp = $query->regexpWord($keyword);

avatar ceford
ceford - comment - 23 Jul 2023

Please test again with the right query.

Yes: that was a typo - I had updated my localhost db. Test on live site using MySQL 5.x:

SELECT '1,2,3,11,22,33,111,222,333' REGEXP CONCAT('[[:<:]]', '11', '[[:>:]]'),
'1,2,3,11,22,33,111,222,333' REGEXP CONCAT('[[:<:]]', '1111', '[[:>:]]');

Returns 1 and 0.

I don't have a test for the Banners instance. I don't use banners. I came across the problem in a custom component. I have about 8 instances there so your proposed solution would involve a lot of code repetition. A utility function would be better.

avatar richard67
richard67 - comment - 23 Jul 2023

Yes: that was a typo

@ceford Was that a typo just here in the cited code, and your above tests for MariaDB were made without the typo? Or were they made with the typo?

avatar richard67
richard67 - comment - 23 Jul 2023

On MariaDB 10.6 both ways '[[:<:]]'...'[[:>:]]' and '\\b'...'\\b'seem to work and give the expected result (1 and 0).

On MySQL 8.0.33 I get the same result as you reported, SQL error "Illegal argument to a regular expression" with '[[:<:]]'...'[[:>:]]', and expected result with '\\b'...'\\b'.

avatar ceford
ceford - comment - 23 Jul 2023

Apologies for my clumsiness - seems I tested MariaDB with my typo. New test:

Server version: 10.3.38-MariaDB-0ubuntu0.20.04.1 - Ubuntu 20.04

SELECT '1,2,3,11,22,33,111,222,333' REGEXP CONCAT('[[:<:]]', '11', '[[:>:]]'),
'1,2,3,11,22,33,111,222,333' REGEXP CONCAT('[[:<]]', '1111', '[[:>:]]');

MariaDB returns 1 and 0

The //b word boundary marker also returns 1 and 0.

avatar Hackwar Hackwar - change - 24 Aug 2023
Labels Added: bug
avatar Hackwar Hackwar - labeled - 24 Aug 2023

Add a Comment

Login with GitHub to post a comment