While a domain name is not case sensitive, the rest of the URL is. Joomla itself only creates lowercase URLs. But what happens if you manually type a URL with uppercase characters? In Joomla 3 the correct lowercase URL is rendered, but in J4 you get a 404. This is a b/c break with J3 that does not need to exist. You might have uppercase URLs in your content because you manually typed them, or you might have external sites linking to you with uppercase links.
Convert the URL to lowercase before parsing.
Install sample data
Visit the following urls
/sample-layouts/category-list/your-template
/sample-layouts/category-list/YOUR-TEMPLATE
/sample-layouts/CATEGORY-LIST/your-template
/SAMPLE-LAYOUTS/category-list/your-template
Before this PR: only the first URL works; the rest produce a 404.
After this PR: all URLs work.
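The idea in the summary, lowercasing the path before it reaches the router, can be sketched in plain PHP. This is only an illustration and not the code in this pull request: `normalisePath()` is a hypothetical helper, and it falls back to `strtolower()` when the mbstring extension is not loaded.

```php
<?php

/**
 * Hypothetical helper (not part of Joomla): lowercase a request path
 * so that differently cased requests resolve to the same route.
 */
function normalisePath(string $path): string
{
    // Prefer the multibyte-aware function so non-ASCII aliases are handled too.
    return function_exists('mb_strtolower')
        ? mb_strtolower($path, 'UTF-8')
        : strtolower($path);
}

// The four URLs from the testing instructions above.
$urls = [
    '/sample-layouts/category-list/your-template',
    '/sample-layouts/category-list/YOUR-TEMPLATE',
    '/sample-layouts/CATEGORY-LIST/your-template',
    '/SAMPLE-LAYOUTS/category-list/your-template',
];

// After normalisation all four collapse to a single path.
$normalised = array_unique(array_map('normalisePath', $urls));
echo count($normalised), PHP_EOL; // prints 1
```

With this kind of normalisation in front of the router, every variant is parsed as the first, lowercase URL.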
cc @Hackwar could you take a look as you probably know this code better than anyone
Status | New | ⇒ | Pending |
Category | ⇒ | Libraries |
I have tested this item
I have tested this item
Full success
I haven't tested but how is this going to affect urls that are similar to
domain.com/members/jobs/view-open-jobs?conn=view-open-jobs&uid=ghg89GD3GDD45GKK89gkkfd
I suggest that you test it
This is wrong. A URL with differing capitalization is not the same URL.
I did and with 4.0 we solved a lot of situations where different URLs returned the same content, this one being among them. This is deliberate and I can guarantee you, that we will get more people complaining about "duplicate content" because of this PR than we will have for the current behavior.
It is deliberately wrong then
This is deliberate and I can guarantee you, that we will get more people complaining about "duplicate content" because of this PR than we will have for the current behavior.
I can count the number of times I have seen a complaint about the j3 behaviour on one finger.
I can count the number of times I have seen a complaint about an undocumented and unnecessary break in existing behaviour on the fingers and toes of the entire production department and then will probably still run out of digits to count on.
Except it has been the behaviour of Joomla for 10+ years to be case insensitive and this is a b/c break that is just not necessary. PS check what other CMS do.
Unfortunately it's not that easy.
To support this behavior correctly, there are 2 ways: redirect to the lowercase URL, or add a canonical link tag.
Here is a quick plugin that implements the redirect:
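// Imports needed at the top of the plugin file.
use Joomla\CMS\Factory;
use Joomla\CMS\Uri\Uri;
use Joomla\String\StringHelper;

// Inside the plugin class: 301-redirect any path containing uppercase characters.
public function onAfterInitialise()
{
    $uri   = Uri::getInstance();
    $lPath = StringHelper::strtolower($uri->getPath());

    if ($lPath !== $uri->getPath())
    {
        // Only the path changes; the query string is kept as requested.
        $uri->setPath($lPath);
        Factory::getApplication()->redirect($uri->toString(), 301);
    }
}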
I really don't care. I was asked to create this to restore the previous behaviour.
You can share this piece of plugin if someone asks you again :)
This is bs. Even @brianteeman is now experiencing the resistance by devs to use common sense.
Just wondering, can an extension not create urls with upper case?
can an extension not create urls with upper case?
yes they can, in their own router: UPPERCASE, camelCase, snake_case, any case ;)
Update: I just noticed that such a change would be a b/c break for extensions which use one of these,
but I have not seen any such in the wild
This pull request has been automatically converted to the PSR-12 coding standard.
This is wrong. A URL with differing capitalization is not the same URL.
Technically true, but it is not what we want Joomla to do on a website.
No one wants this: not webmasters, marketers, agencies or website users.
Lowercase is a worldwide de facto standard, and it avoids confusing users, search engines and agencies.
It's right that differently cased URLs must point to the same content/page, obviously with the correct canonical and redirect management described by @Fedik.
But it must be the default core URL treatment, not a plugin or a choice.
We have an opportunity to improve one of Joomla's biggest flaws (canonicalization), let's do it!
Surely the first step is to prevent the 404s. We must remember that these things cannot change when upgrading from J3 to J4: any admin assumes that the behavior is at least the same, if not better.
PS: The url parameters are another thing, and they must actually remain untouched.
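To make the point about parameters concrete, here is a minimal sketch that lowercases only the path and leaves the query string and fragment exactly as typed. `lowercasePathOnly()` is a hypothetical helper, not Joomla API. It assumes the input is a request URI (path plus optional query), and the example URI is adapted from the one quoted earlier in this thread, with uppercase added to the path for illustration.

```php
<?php

/**
 * Hypothetical helper: lowercase the path of a request URI while
 * keeping everything from the first "?" or "#" onwards untouched.
 */
function lowercasePathOnly(string $requestUri): string
{
    // Length of the leading segment that contains neither "?" nor "#".
    $pathLength = strcspn($requestUri, '?#');

    $path = substr($requestUri, 0, $pathLength);
    $rest = substr($requestUri, $pathLength); // query string and/or fragment, if any

    // ASCII lowercasing is enough for this illustration.
    return strtolower($path) . $rest;
}

echo lowercasePathOnly(
    '/Members/Jobs/View-Open-Jobs?conn=view-open-jobs&uid=ghg89GD3GDD45GKK89gkkfd'
), PHP_EOL;
// prints /members/jobs/view-open-jobs?conn=view-open-jobs&uid=ghg89GD3GDD45GKK89gkkfd
```

The mixed-case `uid` value survives unchanged, which is what case-sensitive tokens in a query string need.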
looking at wp they work the same as in j3
urls are case insensitive
no 302 redirects
They probably add "canonical" link tag
they do that on everything - I doubt it is specific to this case-insensitive stuff.
yeah, can be
Maybe we can add that redirect code into our System - SEF plugin.
That is the easier solution, unless I missed something
@weeblr your advice would be good here. I'm just trying to ensure that links which worked in 3 still work in 4
Exactly, WP generates a canonical tag for all pages, so it can solve a lot of problems with category pages, tag list pages, paging issues, etc. and be "good" for search engines.
We can do it better in Joomla ;-)
A canonical, added to the J3 behavior, is a good first step for the "case sensitive" problem, because a 301 redirect may be unwanted and can give an erroneous signal to search engines.
Maybe we can add that redirect code in to our System - SEF plugin,
its not a redirect
its not a redirect
yes, I know,
I mean that of the 2 possible solutions, a redirect will be easier for Joomla to implement
@brianteeman: @Hackwar is right in that these URLs are different and Google (not Bing) does consider them separate. It's always been an issue in Joomla that it treats them the same (just like it does for something and something/, which are also 2 different URLs and pages).
BUT:
I do think it's more important to keep these links working in one way or the other:
Links typed-in by users in the browser: no SEO impact, but we want users to see a result. I don't think many people type in URLs though, so not a major factor.
Links embedded in content by authors: potential issue here. If authors have embedded wrongly cased links, they were working in J3 and are now broken in J4. Bad visitor experience. Possible SEO impact if SEO work has been done on internal linking (content hub, "silos"). These should be kept working.
Links found by Google on external sites (backlinks): these are the ones with an immediate, large SEO impact.
If you start returning 404s for them, they'll be discounted. It's more important, SEO-wise, that they keep sending authority to the page, even if that page is considered a duplicate of another one, than to lose the backlink entirely.
B/C matters here and I agree these different URLs should be kept working on J4. Ideally, we should automatically inject the correct canonical (Like others, I'd advise against a redirect, which is too strong a signal and would have bad SEO consequences if the redirect target is wrong)
The difficulty here is that the lowercased version of the page is sometimes going to be the correct canonical, and sometimes not.
For instance, the canonical for
/tags/books/SCI-FY/some-book
is not going to be
/tags/books/sci-fy/some-book
but instead:
/books/novels/some-book
That's why the canonical injected by the SEF plugin is often wrong and most SEO extensions disable or remove it.
Not sure what's the best move here. I'd advise against just blindly adding the lowercase version of the requested URL as a canonical.
At the very least, make it go through a hook so that extensions can modify it before it's injected.
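A rough sketch of what "go through a hook" could look like, in plain PHP rather than Joomla's event system. Everything here is hypothetical: `resolveCanonical()`, the filter signature, and the hard-coded mapping from `/tags/books/` to `/books/novels/`, which only mirrors the example above. The lowercased request path is merely the default candidate, and each registered callback (standing in for an extension) may replace it.

```php
<?php

/**
 * Hypothetical sketch: choose a canonical URL for a request.
 *
 * @param string     $requestPath Path as requested, in any case.
 * @param callable[] $filters     Each receives (candidate, requestPath) and returns a URL.
 */
function resolveCanonical(string $requestPath, array $filters = []): string
{
    // Default candidate: the lowercased request path.
    $candidate = strtolower($requestPath);

    // Give every registered filter the chance to override the candidate.
    foreach ($filters as $filter) {
        $candidate = $filter($candidate, $requestPath);
    }

    return $candidate;
}

// Stand-in for an SEO extension that knows where tagged books really live.
$tagFilter = function (string $candidate, string $requestPath): string {
    if (strpos($candidate, '/tags/books/') === 0) {
        return '/books/novels/' . basename($candidate);
    }

    return $candidate;
};

echo resolveCanonical('/tags/books/SCI-FY/some-book', [$tagFilter]), PHP_EOL;
// prints /books/novels/some-book
echo resolveCanonical('/SAMPLE-LAYOUTS/category-list', [$tagFilter]), PHP_EOL;
// prints /sample-layouts/category-list
```

Without any filter the result is just the lowercase path, which is exactly the case that is sometimes right and sometimes wrong.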
Exactly, WP generates a canonical tag for all pages, so it can solve a lot of problems with category pages, tag list pages, paging issues, etc. and be "good" for search engines.
We can do it better in Joomla ;-)
Actually WordPress has a much easier job here because each and every piece of content has a unique URL from the start, so finding the canonical is trivial, you just read it from the database. Joomla routing is different and does not yield a valid canonical so easily in the real world.
Yannick
Thanks @weeblr appreciate your time and expertise commenting here.
Links typed-in by users in the browser
I don't think many people type in URLs though, so not a major factor.
Actually this is the one that hit me. A PDF magazine contained clickable links, and because it looked nicer they had displayed the URLs in uppercase and the PDF generator automatically turned them into uppercase links.
Still likely not the most common use case here, but it all points in the same direction. I agree in all cases, we should show the proper content, and canonicalize to the proper URL. It's just that finding the canonical is not always that easy.
As an ordinary user this poses an interesting dilemma
According to the World Wide Web Consortium ... TLDs are converted to lower case but everything after the TLD should be case sensitive.
According to Google ... URLs are treated as case sensitive.
Questions
Has Joomla been doing it wrong all this time?
If so then is the major version change (J3 to J4) the correct time to correct it?
So what then about backward compatibility?
From my point of view as an ordinary user
Joomla should conform to World Wide Web Consortium standards. If users have hard coded links with CamelCase then perhaps an option (or plugin) could give them the choice of backward compatibility.
Has Joomla been doing it wrong all this time?
No. The standards saying that servers, browsers and other systems must interpret URL paths as case sensitive are a different thing from users (i.e. Joomla) deciding how they want visitors to access pages.
Same for search engines: Google decided to consider pages different based on case, while Bing considers them the same.
Accepting different-case URLs was dropped in J4, creating a B/C break.
Joomla should conform to World Wide Web Consortium standards.
What Joomla does is not defined by the standards. It's up to software applications to decide how they want to respond to requests, so that's not a deciding factor here.
If users have hard coded links with CamelCase then perhaps an option (or plugin) could give them the choice of backward compatibility
Hardcoded links are a minor part of this. Backlinks are the most important ones for any site, and backlinks are not under the control of the site owners. So I think Joomla should accept requests regardless of the path case.
What should be discussed is the addition of a canonical, which would help SEO significantly but is hard to do in practice for the general case. So that (canonical injection) I think should be opt-out, and compatible with extensions adding their own canonicals.
As an ordinary user this poses an interesting dilemma According to the World Wide Web Consortium ... TLDs are converted to lower case but everything after the TLD should be case sensitive. According to Google ... URLs are treated as case sensitive.
Questions Has Joomla been doing it wrong all this time? If so then is the major version change (J3 to J4) the correct time to correct it? So what then about backward compatibility?
From my point of view as an ordinary user Joomla should conform to World Wide Web Consortium standards. If users have hard coded links with CamelCase then perhaps an option (or plugin) could give them the choice of backward compatibility.
Do not confuse the technical characteristics of the communication "medium" with what is expected of a website in 2022.
URLs (the slug/path part) are technically case-sensitive because it is ALLOWED (if you want) to identify different resources simply by differentiating upper case from lower case.
But this is not the case for a CMS that has to manage content that must be usable, indexable and user friendly.
The fact that on a website /mywebpage and /myWebpage need to point to the same resource (with the appropriate signals for search engines, such as a canonical or in extremis a 301 redirect) is widely confirmed by SEO and UX practice.
I did a small survey with some Italian SEO colleagues.
The question, following some premises on the casing of parameters and query strings, was:
Should a CMS treat URL path/slug in a case-sensitive or case-insensitive way?
So, will /abc and /ABC show the same content?
71% replied that they need to show the same content with a 301 redirect.
29% replied that they must show the same content, with a canonical (lower case version of course), but without redirects.
No one replied that the two URLs should be treated as different content and no one replied that it is better to let the user decide.
closing as the alternative is a better solution
Status | Pending | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2022-07-10 09:27:10 |
Closed_By | ⇒ | brianteeman |
Alternative is not a valid solution. Will post there.
I have tested this item ✅ successfully on eec53b0
Full success
This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/38145.