It looks like Joomla treats uppercase letters in URLs the same as lowercase ones. For example, all of the following URLs load the same page:
misite.com/blog
misite.com/Blog
misite.com/BLOG
misite.com/bLog
...
That would be fine for SEO if Joomla generated the canonical URL correctly, so that all of the above "pointed" to a single version, for example the lowercase one, misite.com/blog. Unfortunately, that's not the case: each of the variants above produces a different canonical URL.
Pick any Joomla site and try to access any page with different combinations of uppercase and lowercase letters of the same URL.
Ideally, I would expect different URLs to point to different pages, with a wrong URL either returning a 404 or automatically redirecting to the correct one, and uppercase and lowercase letters should be treated as different. Still, as I suspect this might be related to the server, from an SEO perspective it would be enough if all upper/lowercase combinations of a given URL "defaulted" to a single one, set as canonical.
The same page is loaded if you change any lowercase letter in the URL to uppercase, or vice versa. Even worse, the canonical URL keeps that change instead of pointing to only one of the possible combinations (i.e., the all-lowercase one).
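To make the expected behaviour concrete, here is a minimal sketch in Python (not Joomla code) of how every case variant could be collapsed to one canonical URL. The function name `canonical_url` is made up for illustration, and lowercasing the whole path is an assumption that only holds for sites whose real URLs are all lowercase.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return one canonical form for all case variants of a URL.

    Hypothetical helper: assumes the site's real URLs are all lowercase.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),  # the part that currently keeps the visitor's casing
        parts.query,         # left untouched: parameter values may be case-sensitive
        "",                  # fragments are not part of a canonical URL
    ))


variants = [
    "https://misite.com/blog",
    "https://misite.com/Blog",
    "https://misite.com/BLOG",
    "https://misite.com/bLog",
]
# Every variant collapses to the same canonical URL.
unique = {canonical_url(u) for u in variants}
```

With this, `unique` contains a single entry, which is what the rel="canonical" tag would ideally point to for all four URLs.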
Status | New | ⇒ | Information Required |
So I think this depends on the component.
As of J3.5.0, Joomla does not force a rel canonical and lets the component handle it.
But this is not a real issue unless the component (or extension) is generating such URLs,
that is, generating URLs with different casing. Do you have steps to follow to reproduce such an effect?
The only way for Google to see such URLs is for someone to manually copy a URL, modify the casing, and then add it to a web page.
AFAIK, in core Joomla the SEF plugin will add a rel=canonical to every page generated by the CMS AS LONG as the canonical domain is set up in the plugin settings (actually, it's the only parameter available in the plugin). Also, as ggppdk pointed out, when the domain is set up the plugin will check for a component-specific canonical URL, in case one was already created by the component being loaded. Everything's OK up to this point.
I can see 2 issues here:
1) The Joomla router sees no difference at all between uppercase and lowercase letters in URLs. It will route a URL written with any combination of upper/lowercase letters to the very same content, which is definitely wrong from an SEO perspective. Yes, such URLs would have to be created manually, and the impact should be small, but that doesn't mean it isn't an SEO issue. Imagine that someone manually enters a wrong URL in an advert, with one miscapitalised letter; it will still route to the "right content", but it will effectively create 2 different URLs leading to the same content, aka duplicate content for Google... not good.
2) The second issue is that the one above wouldn't be a real problem IF all possible combinations of a URL "pointed" to a single canonical one. But the canonical URLs generated by the Joomla core SEF plugin don't take this into account. There's a perfect example on the community site. Check these 2 URLs:
https://community.joomla.org/Translations.html
https://community.joomla.org/translations.html
Both show exactly the same content. Now inspect the source and look for the <link ... rel="canonical"> tag in the head; you'll see it's different for each URL. These are generated by Joomla core. Now, on the same site, check these 2 URLs:
https://community.joomla.org/user-groups.html
https://community.joomla.org/User-Groups.html
In this case, you can check that the <link ... rel="canonical"> tag ALWAYS points to the "full lowercase" version. Here it's generated by the component, which does it the right way and avoids generating duplicate content.
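For anyone who wants to repeat the check without reading the page source by hand, a rough Python sketch like the following pulls the canonical href out of an HTML string. The class and function names are made up, and the two sample documents are invented stand-ins on example.org, not the real markup of the community site.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical href found in an HTML document, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Invented sample pages standing in for two case variants of one URL.
page_a = '<html><head><link href="https://example.org/Translations.html" rel="canonical" /></head></html>'
page_b = '<html><head><link href="https://example.org/translations.html" rel="canonical" /></head></html>'

# True when the two variants advertise different canonical URLs.
differs = find_canonical(page_a) != find_canonical(page_b)
```

Feeding it the HTML of two case variants of the same page and comparing the results shows whether they share one canonical URL or each advertise their own.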
I hope I was able to explain why I believe this is an issue now ;).
I understand your issue, but I disagree that this should be changed in core. Joomla by default only generates lowercase URLs, and while you can cheat the system here and get mixed-case URLs, you already know what you are doing and should also know the consequences. At the same time, you are talking about duplicate content. Somehow people have started to believe that duplicate content means any content that is displayed under 2 somehow different URLs, and that Google would penalize that. That is not the case. You cannot have duplicate content on the same domain. https://community.joomla.org/user-groups.html and https://community.joomla.org/User-Groups.html are on the same domain, and even though they have the same content, they will not make Google penalize joomla.org. Otherwise you would expect Google to also understand which part of the site is the actual page-specific content, and the difference between blog lists and the actual article, for example. (Yes, you can do that, but it is pointless to implement all those heuristics, which would also fail in way too many cases, when the underlying issue, duplicate content, is not present.)
Last but not least, if someone were to manually create a URL with mixed case, he or she can already enter that URL in the redirect plugin and let it point to the lowercase version.
If this is still an issue for you, then I would suggest creating a plugin that checks for mixed-case URLs and then redirects to the lowercase version. I would suggest closing this one.
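The logic such a plugin would need is small. Below is a language-neutral sketch in Python, not an actual Joomla plugin (which would be PHP and hook into the CMS routing events). The function name `lowercase_redirect` and the choice of a 301 status are assumptions for illustration.

```python
from typing import Optional, Tuple


def lowercase_redirect(path: str, query: str = "") -> Optional[Tuple[int, str]]:
    """Decide whether a request should be redirected to its lowercase form.

    Returns (status, target) when the path contains uppercase letters,
    or None when the path is already lowercase. Hypothetical helper.
    """
    lowered = path.lower()
    if lowered == path:
        return None  # already lowercase: serve the page normally
    # Keep the query string as-is; only the path is normalised.
    target = lowered + ("?" + query if query else "")
    return 301, target  # permanent redirect, so crawlers keep only one URL


decision = lowercase_redirect("/User-Groups.html")
```

Here `decision` is `(301, "/user-groups.html")`, while a request for `/user-groups.html` itself yields `None` and is served as usual.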
Status | Information Required | ⇒ | Closed - No Reply |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2017-05-27 09:57:00 |
Closed_By | ⇒ | franz-wohlkoenig |
Set to "closed" on behalf of @franz-wohlkoenig by The JTracker Application at issues.joomla.org/joomla-cms/10877
This has been closed due to lack of response to the requests above – it can always be reopened in the future if it is updated.
Which canonical URLs are you speaking of? The only ones that Joomla can generate come from the SEF plugin, and those ONLY deal with a different domain with the same structure.