isidrobaq
20 Jun 2016

It looks like Joomla treats uppercase and lowercase letters in URLs as the same. For example, all of the following URLs load the same page:

misite.com/blog
misite.com/Blog
misite.com/BLOG
misite.com/bLog
...

That would be fine for SEO if Joomla generated the canonical URL correctly, so that all of the above "pointed" to a single version, for example the lowercase one, misite.com/blog. Unfortunately that's not the case: each of the variants above produces a different canonical URL.
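To make the expected behaviour concrete, here is a minimal Python sketch (not Joomla code) of deriving one canonical URL from any case variant. The function names `canonical_url` and `canonical_link_tag` are made up for illustration. Lowercasing only the scheme, host and path, while leaving the query string alone, is an assumption about what "the lowercased version" should mean.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Lowercase scheme, host and path; keep the query string as it is.

    Assumption: query values may be case-sensitive, so they are not touched.
    The fragment is dropped because it is never sent to the server.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        "",
    ))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the given request URL."""
    href = escape(canonical_url(url), quote=True)
    return '<link href="{}" rel="canonical" />'.format(href)


# Every case variant of the same address yields the same tag.
for variant in ("https://misite.com/blog",
                "https://misite.com/Blog",
                "https://misite.com/BLOG"):
    print(canonical_link_tag(variant))
```

With something like this, search engines would see a single canonical address regardless of which variant was requested.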

Steps to reproduce the issue

Pick any Joomla site and try to access any page with different combinations of uppercase and lowercase letters of the same URL.

Expected result

Ideally, I would expect different URLs to point to different pages, so a wrongly cased URL would either return a 404 or redirect automatically to the correct one; uppercase and lowercase letters should be treated as different. Still, as I suspect this might be related to the server, from an SEO perspective it would be enough if all upper/lowercase combinations of a given URL "defaulted" to a single one, set as canonical.

Actual result

The same page is loaded if you change any lowercase letter in the URL to uppercase, or vice versa. Even worse, the canonical URL keeps that change too, instead of pointing to only one of the possible combinations (i.e., the full lowercase one).

isidrobaq - open - 20 Jun 2016
infograf768 - comment - 20 Jun 2016

Which canonical URLs are you talking about? The only ones Joomla can generate come from the SEF plugin, and they ONLY deal with a different domain with the same structure.

brianteeman - change - 20 Jun 2016
Status: New → Information Required
ggppdk - comment - 20 Jun 2016
  • I see the effect in a popular component (it adds rel canonical URLs such as the ones you described)
  • in Joomla component views the rel canonical is apparently not added (which means the effective canonical is the current URL, so the result is similar to the above)
  • I don't see the effect in my own component

So I think this depends on the component

  • which is expected, because the rel canonical is handled by the component

As of J3.5.0, Joomla does not force a rel canonical and lets the component handle it

  • I think a simple fix would be to make the views of the (built-in) Joomla components always add one? Or maybe to add a check of the current URL?

But this is not a real issue unless the component (or extension) is itself generating such URLs, that is, URLs with different casing. Do you have steps to follow to see such an effect?

The only way for Google to see such URLs is for someone to manually copy a URL, modify the casing, and then add it to a web page.

isidrobaq - comment - 21 Jun 2016

Afaik, the core Joomla SEF plugin adds a rel=canonical to every page generated by the CMS AS LONG as the canonical domain is set up in the plugin settings (actually, it's the only parameter available in the plugin). Also, as ggppdk pointed out, when the domain is set up the plugin checks whether the component being loaded has already created its own canonical URL. Everything's OK up to this point.

I can see 2 issues here:

1) The Joomla router sees no difference at all between upper/lowercase letters in URLs. It will route a URL written with any combination of upper/lowercase letters to the very same content, which is definitely wrong from an SEO perspective. Yes, these URLs would have to be created manually, and the impact of this issue should be small, but that doesn't mean it's not an issue for SEO. Imagine that someone manually enters a wrong URL in an advert, with one misplaced capital letter; it will still route to the "right content", but it will effectively create 2 different URLs leading to the same content, aka duplicate content for Google... not good.

2) The second issue is that the one above wouldn't be a real issue IF all possible case combinations of a URL "pointed" to a single canonical one. The problem is that the canonical URLs generated by the Joomla core SEF plugin don't take this into account. There's a perfect example on the community site. Check these 2 URLs:

https://community.joomla.org/Translations.html
https://community.joomla.org/translations.html

Both show exactly the same content. Now inspect the source and look for the <link ... rel="canonical"> tag in the head; you'll see it is different for each URL. These are generated by Joomla core. Now, on the same site, check these 2 URLs:

https://community.joomla.org/user-groups.html
https://community.joomla.org/User-Groups.html

In this case, you can see that the <link ... rel="canonical"> tag ALWAYS points to the "full lowercase" version. Here it's generated by the component, which does it the right way and avoids generating duplicate content.
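The manual check described above (open two case variants, compare their canonical tags) could be scripted roughly like this. It is only a sketch using Python's standard `html.parser`. The two HTML snippets are hypothetical stand-ins on example.org, not the real markup of the community site, and `find_canonical` is an illustrative name.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> encountered."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical(html_text):
    """Return the canonical href found in html_text, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical page sources for two case variants of the same address.
page_upper = '<head><link href="https://example.org/Translations.html" rel="canonical" /></head>'
page_lower = '<head><link href="https://example.org/translations.html" rel="canonical" /></head>'

# False here means the two variants advertise different canonical URLs.
print(find_canonical(page_upper) == find_canonical(page_lower))
```

In a real check the two strings would be the downloaded sources of the URL variants being compared.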

I hope I was able to explain why I believe this is an issue now ;).

Hackwar - comment - 4 Mar 2017

I understand your issue, but I disagree that this should be changed in core. Joomla by default only generates lowercase URLs, and while you can cheat the system here and get mixed-case URLs, you already know what you are doing and should also know the consequences. At the same time, you are talking about duplicate content. Somehow people have started to believe that duplicate content means any content that is displayed under 2 somehow different URLs and that Google would penalize that. That is not the case. You cannot have duplicate content on the same domain. https://community.joomla.org/user-groups.html and https://community.joomla.org/User-Groups.html are on the same domain, and even though they have the same content, they will not make Google penalize joomla.org. Otherwise you would expect Google to also understand which part of the site is the actual page-specific content, and the difference between blog lists and the actual article, for example. (Yes, you can do that, but it is pointless to implement all those heuristics, which will also fail in way too many cases, when the base issue, duplicate content, is not present.)

Last but not least, if someone were to manually create a URL with mixed case, he or she can already enter that URL in the redirect plugin and point it to the lowercase version.

If this is still an issue for you, then I would suggest creating a plugin that checks for mixed case URLs and then redirects to the lower case version. I would suggest closing this one.
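The core decision such a plugin would make can be sketched in a few lines. This is Python rather than an actual Joomla plugin, `lowercase_redirect` is a hypothetical name, and using a permanent (301) redirect is an assumption rather than anything decided in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(request_url):
    """Return (301, target_url) if the path contains uppercase letters, else None.

    Only the path is lowercased; the query string is passed through unchanged
    because its values may be case-sensitive (an assumption of this sketch).
    """
    parts = urlsplit(request_url)
    lowered_path = parts.path.lower()
    if lowered_path == parts.path:
        return None  # already lowercase, serve the page normally
    target = urlunsplit((parts.scheme, parts.netloc, lowered_path,
                         parts.query, parts.fragment))
    return 301, target


print(lowercase_redirect("https://community.joomla.org/User-Groups.html"))
print(lowercase_redirect("https://community.joomla.org/user-groups.html"))
```

Because the target is always fully lowercase, a redirected request is never redirected a second time.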

franz-wohlkoenig - change - 27 May 2017
The description was changed
Status: Information Required → Closed - No Reply
Closed_Date: 0000-00-00 00:00:00 → 2017-05-27 09:57:00
Closed_By: franz-wohlkoenig
joomla-cms-bot - change - 27 May 2017
The description was changed
joomla-cms-bot - edited - 27 May 2017
joomla-cms-bot - close - 27 May 2017
joomla-cms-bot - comment - 27 May 2017
franz-wohlkoenig - comment - 27 May 2017

This has been closed due to lack of response to the requests above – it can always be reopened in the future if it is updated.
