Joomla! Issue Tracker | Joomla! CMS #18273 - Fix 2 issues with srcset handling in SEF plugin, finding multiple image URLs and also handling of URLs that contain commas

Closed
13 Oct 2017
Medium
Build: staging
# 18273
Diff
ggppdk:patch-50

Pending

User tests: Successful: Unsuccessful:

ggppdk
7 Oct 2017

Pull Request for Issue #18269

Summary of Changes

Fix 2 problems described in the above issue

Comma is a legitimate character in URLs it should be handled
Restoring multiple image URLs after handling should be comma + whitespace for browser to identify them

Testing Instructions

See cases in issue #18269

Expected result

Actual result

Documentation Changes Required

Votes

# of Users Experiencing Issue

1/1

Average Importance Score

5.00

983cb0d 7 Oct 2017

Fix URLs with commas in srcset being broken

5657e0b 7 Oct 2017

Also add the needed whitespace when restoring srcset value for multiple images

joomla-cms-bot - change - 7 Oct 2017

Category

⇒

Front End Plugins

ggppdk - open - 7 Oct 2017

ggppdk - change - 7 Oct 2017

Status

New

⇒

Pending

0eaf34a 7 Oct 2017

Update sef.php

ggppdk - change - 7 Oct 2017

Labels

Added: ?

ggppdk - comment - 7 Oct 2017

// A whitespace space character must follow a comma when defining multiple images inside 'srcset'

Can someone confirm the above ?

OlegKi - comment - 8 Oct 2017

@ggppdk Thank you for the pull request! The main problem, which I reported is solved by changing tow lines included in your PR. On the other side another line of the same code fragment can still produce wrong results. I described the problem in the comment including the pattern which can be easy used to reproduce the bug.

franz-wohlkoenig - comment - 8 Oct 2017

@OlegKi please mark your Test as successfully.

Please open Issue Tracker
Login with your github-Account
Click on blue "Test this"-Button above Authors-Picture
mark your Test as successfully
hit "submit test result"

OlegKi - test_item - 8 Oct 2017 - Tested successfully

OlegKi - comment - 8 Oct 2017

I have tested this item ✅ successfully on 0eaf34a

Thank you for the pull request! The main problem, which I reported is solved by changing tow lines included in your PR. On the other side another line of the same code fragment can still produce wrong results. I described the problem in the comment including the pattern which can be easy used to reproduce the bug.

_{This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/18273.}

OlegKi - comment - 8 Oct 2017

@ggppdk I reread the description of srcset in the HTML specification and I would say that the usage of

preg_split("/,\s+/u", $match[1])

would be incorrect for the common case. The problem is that allowed elements of srcset could be URL without a width descriptor (like 123w) or a pixel density descriptor (like 1.5x or 2x). In the case the HTML specification allows to use no whitespace before the URL. See

its value must consist of one or more image candidate strings, each separated from the next by a U+002C COMMA character (,). If an image candidate string contains no descriptors and no ASCII whitespace after the URL, the following image candidate string, if there is one, must begin with one or more ASCII whitespace.

In other words, the examples like

src="url1 1.5x,url2, url3 2x"

would be allowed value, but \s+, which you use, requires at least one whitespace.

Because URLs (and so the image candidate strings too) could not contain whitespaces (see here) then it would be probably better to split the value of srcset by whitespaces instead of commas (something like preg_split("/\s+/u", $match[1])). One will get the URLs, width descriptors (like 123w) or pixel density descriptors (like 1.5x or 2x) appended optional comma or URLs prepended with a descriptor (like 1.5x,url2). The descriptors are easy to parse and one could detect the special case 1.5x,url2. After all one will get pure image candidate strings (URLs) and pure descriptors. The full parsing could be a little longer, but the main idea is: start with spiting by whitespaces instead of splitting by commas.

Now one can use parse_url to parse URLs. One should apply the changes from the pull request #15300 only for URLs without scheme (http:, https:, ...) and other first parts of URL and only probably if the path part of URL isn't started with slash (/).

One more option would be to use RegEx ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? from Appendix B of rfc3986, which will be suggeseted to parse URLs. One can easy modify the RegEx to parse full srcset. The only problem is that the Appendix B are removed from later redaction of rfc3986 and I don't know the exact reason of that.

P.S.: To tell the trust I don't full understand all cases addressed by the pull request #15300. Currently I just commented the block of code and srcset will by displayed correctly without any modifications. I never use related paths in srcset and thus I haven't any need any modifications of srcset. I think that it would be better to have no modifications of srcset as to have wrong modifications.

ggppdk - comment - 10 Oct 2017

Maybe when i get some time
i will try to replace the code and use regular expression
or someone else can do this work

ryandemmer - comment - 13 Oct 2017

This might work:

if (strpos($buffer, 'srcset=') !== false) 
{
	$regex = '#\s+srcset="([^"]+)"#m';
			
	$buffer = preg_replace_callback(
		$regex,
		function ($match) use ($base, $protocols)
		{
			$data = array();
										
			preg_match_all('#((?:[^\s]+)\s*([\d\.]+[wx])?,*)#m', $match[1], $matches);
					
			foreach($matches[1] as $src) 
			{
				// trim spaces
				$src = trim($src);
					
				// remove comma seperator
				$src = trim($src, ",");
						
				$url = $src;
				$descriptor = '';
						
				// assign url and descriptor from match
				if (strpos(' ', $src) !== false) 
				{
					list($url, $descriptor) = explode(' ', $src);
				}

				// process url
				$url = preg_replace('#^(?!/|' . $protocols . '|\#|\')(.+)#', $base . '$1', trim($url));
						
				// join url and descriptor by space and trim
				$data[] = trim(implode(' ', array($url, $descriptor)));
			}

			return ' srcset="' . implode(", ", $data) . '"';
		},
		$buffer
	);	

	$this->checkBuffer($buffer);
}

ggppdk - comment - 13 Oct 2017

@ryandemmer

thanks , can you make a PR ?
you have spent more time studing this than me
i will happily close this PR in favour of the new one

ggppdk - comment - 13 Oct 2017

@ryandemmer

thanks , closing

ggppdk - change - 13 Oct 2017

Status	Pending	⇒	Closed
Closed_Date	0000-00-00 00:00:00	⇒	2017-10-13 11:44:36
Closed_By		⇒	ggppdk

ggppdk - close - 13 Oct 2017

Add a Comment

Older
Newer

Joomla! Issue Tracker - CMS

[#18273] - Fix 2 issues with srcset handling in SEF plugin, finding multiple image URLs and also handling of URLs that contain commas

Summary of Changes

Testing Instructions

Expected result

Actual result

Documentation Changes Required

Votes

Add a Comment