brianteeman
26 Jun 2022

While a domain name is not case sensitive, the rest of the URL is. Joomla itself only creates lowercase URLs. But what happens if you manually type a URL with uppercase characters? In Joomla 3 the correct lowercase URL is rendered, but in J4 you get a 404. This is a b/c break with J3 that does not need to exist. You might have uppercase URLs in your content because you typed them manually, or you might have external sites linking to you with uppercase links.

Summary of Changes

Convert a URL to lowercase before parsing.
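As a rough illustration of the idea (not the actual diff in this PR), lowercasing only the path part of a URL could look like the sketch below. The helper name `lowercasePath` is hypothetical, and leaving the query string untouched is an assumption made here, not something stated in the summary.

```php
<?php

/**
 * Lowercase only the path of a URL.
 * Hypothetical helper for illustration; not the code changed in this PR.
 */
function lowercasePath(string $url): string
{
    // Everything before the first "?" or "#" is treated as the path.
    $pos  = strcspn($url, '?#');
    $path = substr($url, 0, $pos);
    $rest = (string) substr($url, $pos);

    // Prefer the multibyte-safe function when the mbstring extension is loaded.
    $lower = function_exists('mb_strtolower')
        ? mb_strtolower($path, 'UTF-8')
        : strtolower($path);

    // Assumption: query string and fragment keep their original case.
    return $lower . $rest;
}

$urls = [
    '/sample-layouts/category-list/your-template',
    '/sample-layouts/category-list/YOUR-TEMPLATE',
    '/sample-layouts/CATEGORY-LIST/your-template',
    '/SAMPLE-LAYOUTS/category-list/your-template',
];

foreach ($urls as $url) {
    echo lowercasePath($url), PHP_EOL;
}
```

All four test URLs from this PR map to the same lowercase path, while any query parameters keep their case because only the text before the first `?` or `#` is changed.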

Testing Instructions

Install sample data
Visit the following urls
/sample-layouts/category-list/your-template
/sample-layouts/category-list/YOUR-TEMPLATE
/sample-layouts/CATEGORY-LIST/your-template
/SAMPLE-LAYOUTS/category-list/your-template

Actual result BEFORE applying this Pull Request

Only the first url works - the rest produce a 404

Expected result AFTER applying this Pull Request

All urls work

Notes

  1. The imposter syndrome in me says that this fix is too simple and I must have missed something
  2. There should be unit tests for this but I have no idea how.

cc @Hackwar could you take a look as you probably know this code better than anyone

Votes

# of Users Experiencing Issue
1/2
Average Importance Score
5.00

brianteeman - open - 26 Jun 2022
brianteeman - change - 26 Jun 2022
Status: New -> Pending
joomla-cms-bot - change - 26 Jun 2022
Category: Libraries
Webdongle - test_item - 26 Jun 2022 - Tested successfully
Webdongle - comment - 26 Jun 2022

I have tested this item successfully on eec53b0

Full success


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/38145.

Abernyte-Git - test_item - 26 Jun 2022 - Tested successfully
Abernyte-Git - comment - 26 Jun 2022

I have tested this item successfully on eec53b0


samlatif - test_item - 26 Jun 2022 - Tested successfully
samlatif - comment - 26 Jun 2022

I have tested this item successfully on eec53b0

Full success


Twincarb - comment - 26 Jun 2022

I haven't tested, but how is this going to affect URLs similar to this one?

domain.com/members/jobs/view-open-jobs?conn=view-open-jobs&uid=ghg89GD3GDD45GKK89gkkfd


brianteeman - comment - 26 Jun 2022

I suggest that you test it.

Hackwar - comment - 26 Jun 2022

This is wrong. A URL with differing capitalization is not the same URL.

brianteeman - comment - 26 Jun 2022

@Hackwar I guess you didn't actually read the description of this PR.

Hackwar - comment - 26 Jun 2022

I did, and with 4.0 we solved a lot of situations where different URLs returned the same content, this one being among them. This is deliberate and I can guarantee you that we will get more people complaining about "duplicate content" because of this PR than we will have for the current behavior.

brianteeman - comment - 26 Jun 2022

It is deliberately wrong, then.

brianteeman - comment - 26 Jun 2022

> This is deliberate and I can guarantee you that we will get more people complaining about "duplicate content" because of this PR than we will have for the current behavior.

I can count the number of times I have seen a complaint about the j3 behaviour on one finger.

I can count the number of times I have seen a complaint about an undocumented and unnecessary break in existing behaviour on the fingers and toes of the entire production department and then will probably still run out of digits to count on.

Fedik - comment - 27 Jun 2022

I agree with @Hackwar
It is better not to do it, even if it is simple 😉

In theory, devs can create a custom plugin for this kind of behavior if someone needs it.

brianteeman - comment - 27 Jun 2022

Except it has been the behaviour of Joomla for 10+ years to be case insensitive and this is a b/c break that is just not necessary. PS check what other CMS do.

Fedik - comment - 27 Jun 2022

Unfortunately it is not that easy.
To support this behavior correctly, there are 2 ways:

  1. The page must provide a rel=canonical tag with the correct link (in addition to the changes in this PR)
  2. It should be a 301 redirect (in theory this can also be done in htaccess)

Fedik - comment - 27 Jun 2022

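Here is a quick plugin that implements the redirect:

// At the top of the plugin file:
use Joomla\CMS\Factory;
use Joomla\CMS\Uri\Uri;
use Joomla\String\StringHelper;

// In the plugin class:
public function onAfterInitialise()
{
	$uri   = Uri::getInstance();
	$lPath = StringHelper::strtolower($uri->getPath());

	// Redirect permanently when the requested path contains uppercase characters.
	if ($lPath !== $uri->getPath())
	{
		$uri->setPath($lPath);
		Factory::getApplication()->redirect((string) $uri, 301);
	}
}

brianteeman - comment - 27 Jun 2022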

I really don't care. I was asked to create this to restore previous behaviour.

Fedik - comment - 27 Jun 2022

You can share this piece of plugin if someone asks you again :)

Webdongle - comment - 27 Jun 2022

This is bs. Even @brianteeman is now experiencing the resistance by devs to use common sense.

laoneo - comment - 27 Jun 2022

Just wondering, can an extension not create urls with upper case?

Fedik - comment - 27 Jun 2022

> can an extension not create urls with upper case?

Yes they can, in their own router: UPPERCASE, camelCase, snake_case, any case ;)

Update: I just noticed that such a change would be a b/c break for extensions that use one of these,
but I have not seen any such in the wild.

joomla-bot - comment - 27 Jun 2022

This pull request has been automatically converted to the PSR-12 coding standard.

HLeithner - change - 27 Jun 2022
Labels Added: ? ? ?
simbus82 - comment - 30 Jun 2022

> This is wrong. A URL with differing capitalization is not the same URL.

Technically true, but it is not what we want Joomla to do in a website.
No one wants this: not webmasters, marketers, agencies or website users.

Using lowercase is a worldwide, mainstream standard that avoids confusing users, search engines and agencies.

It's right that these urls must point to the same content/page:

  • /sample-layouts/category-list/your-template
  • /sample-layouts/category-list/YOUR-TEMPLATE
  • /sample-layouts/CATEGORY-LIST/your-template
  • /SAMPLE-LAYOUTS/category-list/your-template

Obviously with the correct canonical and redirect management described by @Fedik.
But it must be default core URL handling, not a plugin or a choice.

We have an opportunity to improve one of Joomla's biggest flaws (canonicalization), let's do it!

Surely the first step is to prevent the 404s. We must remember that these things cannot change when upgrading from J3 to J4: any admin assumes that the behavior is at least the same, if not better.

PS: The URL parameters are another matter, and they must remain untouched.

brianteeman - comment - 30 Jun 2022

looking at wp they work the same as in j3
urls are case insensitive
no 302 redirects

Fedik - comment - 30 Jun 2022

They probably add "canonical" link tag

brianteeman - comment - 30 Jun 2022

> They probably add "canonical" link tag

They do that on everything. I doubt it is specific to this case-insensitive stuff.

@weeblr your advice would be good here. I'm just trying to ensure that links which worked in 3 still work in 4

Fedik - comment - 30 Jun 2022

> They do that on everything. I doubt it is specific to this case-insensitive stuff.

yeah, can be

Fedik - comment - 30 Jun 2022

Maybe we can add that redirect code to our System - SEF plugin.
That is an easier solution, unless I missed something.

simbus82 - comment - 30 Jun 2022

> > They probably add "canonical" link tag
>
> They do that on everything. I doubt it is specific to this case-insensitive stuff.
>
> @weeblr your advice would be good here. I'm just trying to ensure that links which worked in 3 still work in 4

Exactly, WP generates a canonical tag for all pages so they can solve a lot of problems with category pages, tag list pages, paging issues, etc. and be "good" for search engines.

We can do it better in Joomla ;-)

A canonical, added to the J3 behavior, is a good first step for the "case sensitive" problem, because a 301 redirect may be unwanted and can give an erroneous signal to the search engine.

brianteeman - comment - 30 Jun 2022

> Maybe we can add that redirect code in to our System - SEF plugin,

It's not a redirect.

Fedik - comment - 30 Jun 2022

> its not a redirect

Yes, I know.
I mean that of the two possible solutions, the redirect will be easier to implement for Joomla.

weeblr - comment - 30 Jun 2022

@brianteeman: @Hackwar is right in that these URLs are different and Google (not Bing) does consider them separate. It's always been an issue in Joomla that it treats them the same (just like it does for something and something/, which are also 2 different URLs and pages).

BUT:

I do think it's more important to keep these links working in one way or the other:

Links typed-in by users in the browser

No SEO impact, but we want users to see a result. I don't think many people type in URLs though, so not a major factor.

Internal links

Potential issue here: if authors have embedded wrongly cased links, they were working in J3 and are now broken in J4. Bad visitor experience. Possible SEO impact if SEO work has been done on internal linking (content hub, "silos"). These should be kept working.

Backlinks (found on other sites)

Links found by Google on external sites (backlinks) are those with an immediate large SEO impact.

If you start returning 404s for them, they'll be discounted. It's more important, SEO-wise, that they keep sending authority to the page, even if that page is considered a duplicate of another one, than losing the backlink entirely.

B/C matters here and I agree these different URLs should be kept working on J4. Ideally, we should automatically inject the correct canonical. (Like others, I'd advise against a redirect, which is too strong a signal and would have bad SEO consequences if the redirect target is wrong.)

The difficulty here is that the lowercased version of the page is sometimes going to be the correct canonical, and sometimes not.

For instance, the canonical for

/tags/books/SCI-FY/some-book

is not going to be

/tags/books/sci-fy/some-book

but instead:

/books/novels/some-book

That's why the canonical injected by the SEF plugin is often wrong and most SEO extensions disable or remove it.

Not sure what's the best move here. I'd advise against just blindly adding the lowercase version of the requested URL as a canonical.

At the very least, make it go through a hook so that extensions can modify it before it's injected.

@simbus82

> Exactly, WP generates a canonical tag for all pages so they can solve a lot of problems with category pages, tag list pages, paging issues, etc. and be "good" for search engines.
>
> We can do it better in Joomla ;-)

Actually WordPress has a much easier job here because each and every piece of content has a unique URL from the start, so finding the canonical is trivial: you just read it from the database. Joomla routing is different and does not yield a valid canonical so easily in the real world.

Yannick

brianteeman - comment - 30 Jun 2022

Thanks @weeblr, I appreciate your time and expertise commenting here.

> Links typed-in by users in the browser
>
> I don't think many people type in URLs though, so not a major factor.

Actually this is the one that hit me. A PDF magazine contained clickable links, and because it looked nicer they had displayed the URLs in uppercase, and the PDF generator automatically turned them into uppercase links.

weeblr - comment - 30 Jun 2022

Still likely not the most common use case here, but it all points in the same direction. I agree in all cases, we should show the proper content, and canonicalize to the proper URL. It's just that finding the canonical is not always that easy.

Webdongle - comment - 30 Jun 2022

As an ordinary user this poses an interesting dilemma
According to the World Wide Web Consortium ... TLDs are converted to lower case but everything after the TLD should be case sensitive.
According to Google ... URLs are treated as case sensitive.

Questions
Has Joomla been doing it wrong all this time?
If so then is the major version change (J3 to J4) the correct time to correct it?
So what then about backward compatibility?

From my point of view as an ordinary user
Joomla should conform to World Wide Web Consortium standards. If users have hard coded links with CamelCase then perhaps an option (or plugin) could give them the choice of backward compatibility.

weeblr - comment - 30 Jun 2022

> Has Joomla been doing it wrong all this time?

No. Standards saying that servers, browsers and other systems must interpret URL paths as case sensitive are a different thing from users (i.e. Joomla) deciding how they want visitors to access pages.
Same for search engines: Google decided to consider them different pages based on case, while Bing considers them the same.

  • WordPress and Joomla both create URLs that can be either all lower or mixed cases.
  • WordPress and Joomla both decided they accept a URL request without regards for the case. WordPress adds a canonical pointing at the "official" URL, Joomla does not.

Accepting different-case URLs was dropped in J4, creating a B/C break.

> Joomla should conform to World Wide Web Consortium standards.

What Joomla does is not defined by the standards. It's up to software applications to decide how they want to respond to requests, so that's not a deciding factor here.

> If users have hard coded links with CamelCase then perhaps an option (or plugin) could give them the choice of backward compatibility

Hardcoded backlinks are a minor part of this. Backlinks are the most important ones for any site, and backlinks are not under the control of the site owners. So I think Joomla should accept requests regardless of the path case.

What should be discussed is the addition of a canonical, which would help SEO significantly but is hard to do in practice for the general case. So that (canonical injection) I think should be opt-out, and compatible with extensions adding their own canonicals.
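A rough sketch of what an opt-out canonical that extensions can still override might look like is given below. Everything in it is hypothetical: the function names, the filter callbacks and the `$enabled` switch are assumptions for illustration and are not Joomla APIs. The example URLs are the ones from the tags example earlier in this thread.

```php
<?php

/**
 * Pick the canonical URL for a request. Every name here is hypothetical.
 *
 * @param string     $requestedPath Path as requested, in any case.
 * @param callable[] $filters       Callbacks: fn(string $candidate, string $requestedPath): string.
 * @param bool       $enabled       Site-level switch so the injection can be turned off.
 */
function resolveCanonical(string $requestedPath, array $filters = [], bool $enabled = true): ?string
{
    if (!$enabled) {
        return null; // Opted out: inject nothing.
    }

    // Default guess: the lowercased request path (ASCII only in this sketch).
    // This guess is often wrong, which is why filters run afterwards.
    $candidate = strtolower($requestedPath);

    // Let each registered extension callback correct the candidate in turn.
    foreach ($filters as $filter) {
        $candidate = $filter($candidate, $requestedPath);
    }

    return $candidate;
}

/** Build the link tag, or an empty string when no canonical should be injected. */
function canonicalLinkTag(?string $canonical): string
{
    return $canonical === null
        ? ''
        : '<link rel="canonical" href="' . htmlspecialchars($canonical, ENT_QUOTES, 'UTF-8') . '">';
}

// A stand-in for an SEO extension that knows where the tagged article really lives.
$seoFilter = function (string $candidate, string $requestedPath): string {
    return $candidate === '/tags/books/sci-fy/some-book'
        ? '/books/novels/some-book'
        : $candidate;
};

echo canonicalLinkTag(resolveCanonical('/tags/books/SCI-FY/some-book', [$seoFilter])), PHP_EOL;
```

With no filters registered the lowercased path is used as-is, and with `$enabled` set to `false` nothing is injected at all, which leaves room for extensions that add their own canonicals.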

simbus82 - comment - 1 Jul 2022

> As an ordinary user this poses an interesting dilemma
> According to the World Wide Web Consortium ... TLDs are converted to lower case but everything after the TLD should be case sensitive.
> According to Google ... URLs are treated as case sensitive.
>
> Questions
> Has Joomla been doing it wrong all this time?
> If so then is the major version change (J3 to J4) the correct time to correct it?
> So what then about backward compatibility?
>
> From my point of view as an ordinary user
> Joomla should conform to World Wide Web Consortium standards. If users have hard coded links with CamelCase then perhaps an option (or plugin) could give them the choice of backward compatibility.

Do not confuse the technical characteristics of the communication "medium" with what is expected of a website in 2022.

URLs (the slug/path part) are technically case-sensitive because it is ALLOWED (if you want) to identify different resources simply by differentiating upper case from lower case.

But this is not the case for a CMS, which has to manage content that must be usable, indexable and user friendly.

The fact that on a website /mywebpage and /myWebpage need to point to the same resource (with the appropriate signals for search engines, such as a canonical or in extremis a 301 redirect) is widely confirmed by SEO and UX.

simbus82 - comment - 8 Jul 2022

I did a small survey with some Italian SEO colleagues.
The question, following some premises on the casing of parameters and query strings, was:

Should a CMS treat URL path/slug in a case-sensitive or case-insensitive way?
So, will /abc and /ABC show the same content?

71% replied that they need to show the same content with a 301 redirect.
29% replied that they must show the same content, with a canonical (lower case version of course), but without redirects.

No one replied that the two URLs should be treated as different content and no one replied that it is better to let the user decide.

Fedik - comment - 10 Jul 2022

Please check the alternative version, #38249

brianteeman - comment - 10 Jul 2022

closing as the alternative is a better solution

brianteeman - close - 10 Jul 2022
brianteeman - change - 10 Jul 2022
Status: Pending -> Closed
Closed_Date: 0000-00-00 00:00:00 -> 2022-07-10 09:27:10
Closed_By: brianteeman
weeblr - comment - 11 Jul 2022

Alternative is not a valid solution. Will post there.
