? ? Pending

User tests: Successful: Unsuccessful:

avatar csthomas
csthomas
2 Jul 2018

Summary of Changes

Better/faster regular expression for plugin "Content - Email Cloaking". No other changes.

Take a look at #11353 for more information.

Testing Instructions

Check if the email address between the tags is properly masked/cloaked.

E.g. <span title="xxx@example.com"><strong>xxx@example.com</strong></span>

Only email between tags should be masked as before PR.

Test time improvement

Execute a few php commands:

$a = array('<p><img src="data:image/png;base64,', '" alt="image" /></p>');
$b = implode(bin2hex(openssl_random_pseudo_bytes(50000)), $a);
$searchEmail = '([\w\.\'\-\+]+\@(?:[a-z0-9\.\-]+\.)+(?:[a-zA-Z0-9\-]{2,10}))';

$pattern_old = '~(?![^<>]*>)' . $searchEmail . '~i';
$pattern_new = '~<[^<]*>(*SKIP)(*F)|' . $searchEmail . '~i';

$t=microtime(1);preg_match($pattern_old, $b, $regs, PREG_OFFSET_CAPTURE);printf("%.5F", microtime(1)-$t);

$t=microtime(1);preg_match($pattern_new, $b, $regs, PREG_OFFSET_CAPTURE);printf("%.5F", microtime(1)-$t);

Expected result

No changes, only time improvement.

Actual result

preg_match on line 491 runs slowly.

/*
* Search for plain text email addresses, such as email@example.org but not within HTML tags:
* <img src="..." title="email@example.org"> or <input type="text" placeholder="email@example.org">
* The negative lookahead '(?![^<]*>)' is used to exclude this kind of occurrences
*/
$pattern = '~(?![^<>]*>)' . $searchEmail . '~i';
while (preg_match($pattern, $text, $regs, PREG_OFFSET_CAPTURE))

Documentation Changes Required

No

avatar csthomas csthomas - open - 2 Jul 2018
avatar csthomas csthomas - change - 2 Jul 2018
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 2 Jul 2018
Category Front End Plugins
avatar ggppdk ggppdk - test_item - 3 Jul 2018 - Tested successfully
avatar ggppdk
ggppdk - comment - 3 Jul 2018

I have tested this item successfully on 324f140

Thanks !!!
wanted to take time to look into this but neglected it

Current performance of this regular expression is just terrible, O(N*N) for N length of HTML body
Running in pages with a lot of HTML is terribly slow, and can even result in timeouts

After PR it is O(N)


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20956.

avatar csthomas csthomas - change - 4 Jul 2018
Labels Added: ?
avatar csthomas csthomas - change - 19 Jul 2018
The description was changed
avatar csthomas csthomas - edited - 19 Jul 2018
avatar andrepereiradasilva
andrepereiradasilva - comment - 19 Jul 2018

hi nice performance improvement!

in theory similiar improvement could be used in several places with negative lookahead to improve performance.

example the ampReplace method (used in many uri generation - https://github.com/joomla/joomla-cms/search?q=ampReplace) method (https://github.com/joomla-framework/filter/blob/master/src/OutputFilter.php#L160)
in that case maybe (didn't test) this regex would work '/&[&#](*SKIP)(*F)|&[\w]+;(*SKIP)(*F)|&/'

the sef plugin (https://github.com/joomla/joomla-cms/blob/staging/plugins/system/sef/sef.php#L126) also has several negative lookahead regex

and surely many other places in the joomla core

avatar alikon alikon - test_item - 20 Jul 2018 - Tested successfully
avatar alikon
alikon - comment - 20 Jul 2018

i've Tested successfully this PR but unable to report the test result on github even it's ok on the issue tracker
https://issues.joomla.org/tracker/joomla-cms/20956#event-374308

avatar csthomas
csthomas - comment - 20 Jul 2018

Thanks all for comments and tests. It is difficult to find testers these days :)

@ggppdk Can you mark the test as a success after the last code change (only comment)?

avatar alikon
alikon - comment - 20 Jul 2018

testers are like "pandas" ... in extinction ?

avatar ggppdk ggppdk - test_item - 20 Jul 2018 - Tested successfully
avatar ggppdk
ggppdk - comment - 20 Jul 2018

I have tested this item successfully on 1344295


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20956.

avatar Quy Quy - change - 20 Jul 2018
Status Pending Ready to Commit
avatar Quy
Quy - comment - 20 Jul 2018

RTC


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/20956.

avatar mbabker mbabker - change - 21 Jul 2018
Status Ready to Commit Fixed in Code Base
Closed_Date 0000-00-00 00:00:00 2018-07-21 01:03:27
Closed_By mbabker
Labels Added: ?
avatar mbabker mbabker - close - 21 Jul 2018
avatar mbabker mbabker - merge - 21 Jul 2018

Add a Comment

Login with GitHub to post a comment