The Joomla SEF plugin has a canonical URL feature that seems to be both poorly documented (it doesn't even say the word "canonical" in the Joomla backend) and implemented in a way that will create more SEO problems than it solves.
My suggestions below will not make this perfect for all websites, but they should dramatically improve SEO for the majority of Joomla websites that use the built-in SEF URL functionality. Websites that use third-party components for ecommerce or publishing will likely be using a more powerful URL management system anyway (SH404SEF, etc.).
This feature request is simply to fix a pretty big SEO issue that the Joomla core code creates.
Add three new options to the SEF Plugin to give website owners more control over their canonical URLs. Screenshot attached.
Simpler solution:
More complex solution:
If a Joomla user has SEF URLs and rewrites turned off, they do not need this option.
But with mod_rewrite on and SEF URLs enabled, most query parameters on a standard Joomla installation are, to my knowledge, for analytics or sorting/filtering.
More details on removing query parameters
Canonical URLs are instructions to search engines to collapse their index of duplicate or very similar URLs into one. This can have HUGE impacts on SEO rankings and website traffic. But under the current Joomla implementation, the following URLs would all canonicalise to themselves, telling search engines they are all unique, despite the content on the page being identical:
https://www.example.com/
https://www.example.com/?utm_source=organic
https://www.example.com/?fbclid=XXXXX
https://www.example.com/?view=remind
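To illustrate the query-parameter idea, here is a minimal, language-neutral sketch in Python (not Joomla code). It keeps only parameters on an allow-list and drops everything else when building the canonical URL. The function name `canonical_url` and the allow-list contents are assumptions made for this example; a real site would choose its own list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allow-list: parameters assumed to change the page content.
# Anything not listed (utm_*, fbclid, ...) is dropped from the canonical URL.
CONTENT_PARAMS = frozenset({"start", "limitstart"})


def canonical_url(url: str, keep: frozenset = CONTENT_PARAMS) -> str:
    """Return the URL with every query parameter outside `keep` removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in keep
    ]
    # The fragment is always dropped; an empty path is normalised to "/".
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path or "/", urlencode(kept), "")
    )
```

With this allow-list, all four URLs listed above would share the canonical `https://www.example.com/`. An allow-list is used here rather than a block-list because new tracking parameters appear all the time; a block-list option would work the same way with the condition inverted.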
More details on canonicalising index.php
Currently, Joomla can render these URLs with SEF turned on, and the content is identical. They should be canonicalised to the homepage URL.
https://www.example.com/
https://www.example.com/index.php
This also has some considerations on multi-language sites, where you get URLs like this:
https://www.example.com/index.php
https://www.example.com/es/index.php
These would need to canonicalise to:
https://www.example.com/
https://www.example.com/es/
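A rough sketch of the index.php rule, again in Python as a language-neutral illustration rather than Joomla code. The helper name `strip_index_php` is made up for this example. It only touches paths that end in `/index.php`, so URLs of the form `/index.php/category/my-article` (SEF on, rewrites off) are left alone.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_index_php(url: str) -> str:
    """Map .../index.php to its directory URL; leave other URLs unchanged."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/index.php"):
        # Keep the trailing slash of the directory: "/es/index.php" -> "/es/"
        path = path[: -len("index.php")]
    return urlunsplit((parts.scheme, parts.netloc, path or "/", parts.query, ""))
```

Because the language prefix is part of the path, the multi-language case falls out of the same rule without special handling.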
More details on the option to enforce a trailing slash
Currently, Joomla can render these URLs with SEF turned on, and the content is identical. A website should adhere to a standard pattern and 301 redirect to the desired pattern to minimise duplicate content:
https://www.example.com/category/my-article
https://www.example.com/category/my-article/
If the option is enabled, then the first URL is 301 redirected to the second URL.
Please consider these enhancements. URL management is tricky, but it is also critical for success in SEO and driving free traffic.
Labels | Added: ? |
Labels | Added: J4 Issue |
Status | New | ⇒ | Information Required |
This all makes sense to me. I think this would be a great addition.
Status | Information Required | ⇒ | Discussion |
Joomla! core does not use trailing slash.
You can access URLs with trailing slashes but Joomla! core (except Language Switcher on homepage, but that seems to be unintentional) generates URLs without trailing slashes. If anything was to be enforced, it should be URLs without slashes. Unless we actually want to change URL structure to include a trailing slash for some reason.
The present so-called "canonical" setting in Joomla SEF has nothing to do with the real use of a canonical. In short: it does not deal at all with duplicate URLs in the same domain/site and was never designed for that. It just redirects to another domain.
The tooltip is clear:
PLG_SEF_DOMAIN_DESCRIPTION="If your site can be accessed through more than one domain enter the preferred (sometimes referred to as canonical) domain here. <br /><strong>Note:</strong> https://example.com and https://www.example.com are different domains."
> (except Language Switcher on homepage, but that seems to be unintentional)
I think it is intentional, as mysite.com/en is not at all equal to mysite.com/en/ because en is the lang sef prefix (for en-GB) and it is the only way to differentiate in code the language used for the Home page.
It loads the same page though. Also, hreflang does not contain a trailing slash:
<link href="http://localhost/index.php/de" rel="alternate" hreflang="de-DE">
<link href="http://localhost/index.php/en" rel="alternate" hreflang="en-GB">
<link href="http://localhost/index.php/en" rel="alternate" hreflang="x-default">
It does in 3.x. Note that here "Remove URL Language Code" is enabled, with en-GB as the site default language:
<link href="http://localhost:8888/installmulti/trunkgitnew/fr/" rel="alternate" hreflang="fr-FR" />
<link href="http://localhost:8888/installmulti/trunkgitnew/" rel="alternate" hreflang="en-GB" />
<link href="http://localhost:8888/installmulti/trunkgitnew/it/" rel="alternate" hreflang="it-IT" />
So there were some changes in 4.0 and I think this could be an issue.
Needs further testing with cookie, default lang browser, etc.
RE: Joomla does not use a trailing slash: agreed, this enforcement could instead remove the trailing slash. The SEO advantage of a trailing slash is that it signals a clear directory-type URL structure, but managing this becomes more difficult if users have enabled a URL suffix, like ".html". Generally, you wouldn't want a URL to end with "alias.html/"; it's odd formatting. Some people are very particular about their URL structure, so maybe giving people options is best: do nothing, enforce trailing slash, remove trailing slash. So long as there is a 301 redirect in place, it solves the duplicate content issues that can arise when URLs get malformed by bugs, plugins, user error, etc.
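As a rough sketch of what the three options could look like (Python used as a language-neutral illustration, not Joomla code), the function below returns the 301 target for a request, or `None` when no redirect is needed. The names `SlashPolicy` and `redirect_target` are hypothetical, and treating any last path segment containing a dot as "has a suffix" is a simplifying assumption.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


class SlashPolicy(Enum):
    NONE = "none"        # do nothing
    ENFORCE = "enforce"  # always end with "/"
    REMOVE = "remove"    # never end with "/"


def redirect_target(url: str, policy: SlashPolicy) -> Optional[str]:
    """Return the URL to 301 redirect to, or None if `url` already conforms."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if policy is SlashPolicy.NONE or path == "/":
        return None  # the site root is never rewritten
    trimmed = path.rstrip("/") or "/"
    last_segment = trimmed.rsplit("/", 1)[-1]
    # Assumption: a dot in the last segment means a suffix such as ".html",
    # and such URLs never get a trailing slash, even under ENFORCE.
    if policy is SlashPolicy.REMOVE or "." in last_segment:
        wanted = trimmed
    else:
        wanted = trimmed + "/"
    if wanted == path:
        return None
    return urlunsplit((parts.scheme, parts.netloc, wanted, parts.query, ""))
```

Under `ENFORCE`, `/category/my-article` would redirect to `/category/my-article/`, while `/category/my-article.html` is left alone and `/category/my-article.html/` is redirected to the version without the slash.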
RE: the current canonical implementation being intended for cross-domain canonicalization. If that's the case, it still falls very short of solving duplicate content problems. When you say "Redirects to another domain" what do you mean? Does it actually trigger a 301 redirect under some scenarios?
RE: Trailing slashes in HREFLANG tags on the homepage. I see a trailing slash in HREFLANG on my personal website when accessing the homepage. At least on my install, the configuration appears correct, but you might have found an issue under a different config. And if I navigate to the index.php page, the HREFLANGS reference the correct value while the canonical does not.
BROWSER REQUESTS: https://www.example.com/index.php
CANONICAL: https://www.example.com/index.php
HREFLANG EN: https://www.example.com/
HREFLANG TH: https://www.example.com/th/
> RE: the current canonical implementation being intended for cross-domain canonicalization. If that's the case, it still falls very short of solving duplicate content problems. When you say "Redirects to another domain" what do you mean? Does it actually trigger a 301 redirect under some scenarios?
I expressed myself badly.
It is not a redirect per se. It just adds a canonical URL where the domain is the domain entered in the field, telling search engines that the "real" domain to crawl is the "canonical" one.
It expects both domains to have exactly the same structure.
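To make the described behaviour concrete, here is a small model of it in Python. This is an illustration of the behaviour as described in this thread, not Joomla's actual code, and the function name `cross_domain_canonical` is made up for the example. Only the scheme and host are swapped; path and query string pass through untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def cross_domain_canonical(request_url: str, preferred_domain: str) -> str:
    """Swap scheme and host for the preferred domain; keep path and query."""
    request = urlsplit(request_url)
    preferred = urlsplit(preferred_domain)
    return urlunsplit(
        (preferred.scheme, preferred.netloc, request.path, request.query, "")
    )
```

In this model a request for `https://example.org/category/my-article?utm_source=organic` with the preferred domain `https://www.example.com` yields `https://www.example.com/category/my-article?utm_source=organic`, so duplicates within one domain are not collapsed.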
Ok, very interesting that this feature was initially added to solve cross-domain canonical issues. But I think the fact remains that this implementation is still problematic: it only solves a very narrow issue (cross-domain duplicate content), while creating new duplicate content issues by excluding critical options in the logic (ability to remove/whitelist/blacklist query parameters). Given the broad use of marketing and analytic tracking parameters in URLs, this is a pretty important concept to deal with and would be impactful for users trying to configure SEO options.
> I see a trailing slash in HREFLANG on my personal website when accessing the homepage.
Yes, because you use 3.x and not 4.0
> And if I navigate to the index.php page, the HREFLANGS reference the correct value while the canonical does not.
It does here, even in multilang (3.x)
<link href="http://anotherdomain.org/installmulti/trunkgitnew/fr/" rel="canonical" />
So yes, Joomla has no real code to deal with a true canonical.
Normally, with the new routing in 4.0, we should get far fewer duplicates, if any at all (not sure).
@infograf768 I'm not familiar with the new routing, but if there is no canonical URL with proper logic to remove query parameters from SEF URLs, then the duplicate content problem still exists and would be a huge thing to address for the platform.
I suggest you test 4.0 (PHP 7.2 minimum): https://developer.joomla.org/nightly-builds.html
@rhotog we can add a canonical tag. At least to some pages like articles.
> we can add a canonical tag. At least to some pages like articles.
Not that I know of with default core. Can you explain?
To get one I had to do this hack for articles (to be sure I did not enter anything in the sef domain field)
diff --git a/administrator/components/com_content/models/forms/article.xml b/administrator/components/com_content/models/forms/article.xml
index 422206b..ead0f98 100644
--- a/administrator/components/com_content/models/forms/article.xml
+++ b/administrator/components/com_content/models/forms/article.xml
@@ -602,4 +602,19 @@
size="25"
/>
+
+ <field
+ name="spacer3"
+ type="spacer"
+ hr="true"
+ />
+
+ <field
+ name="canonical"
+ type="url"
+ label="JCANONICAL"
+ validate="url"
+ filter="url"
+ relative="false"
+ />
</fieldset>
diff --git a/administrator/language/en-GB/en-GB.ini b/administrator/language/en-GB/en-GB.ini
index f9d8e7f..985afa4 100644
--- a/administrator/language/en-GB/en-GB.ini
+++ b/administrator/language/en-GB/en-GB.ini
@@ -57,4 +57,5 @@
JASSOCIATIONS_DESC="Associations descending"
JCANCEL="Cancel"
+JCANONICAL="Canonical URL"
JCATEGORIES="Categories"
JCATEGORY="Category"
diff --git a/components/com_content/views/article/tmpl/default.php b/components/com_content/views/article/tmpl/default.php
index ad21b37..9b0c420 100644
--- a/components/com_content/views/article/tmpl/default.php
+++ b/components/com_content/views/article/tmpl/default.php
@@ -169,5 +169,10 @@
?>
<?php endif; ?>
- <?php // Content is generated by content plugin event "onContentAfterDisplay" ?>
+ <?php
+ if (!empty($params->get('canonical')))
+ {
+ JFactory::getApplication()->getDocument()->addHeadLink(htmlspecialchars($params->get('canonical')), 'canonical');
+ }
+ // Content is generated by content plugin event "onContentAfterDisplay" ?>
<?php echo $this->item->event->afterDisplayContent; ?>
</div>
It is a hack, as I just entered the resulting SEF link picked from the frontend.
I guess it should be done with a JRoute and the non-SEF link in the field.
No, we can not set a canonical in our code.
First of all about the canonical link: This is NOT supposed to be on every page. The canonical URL is the URL that a page should be accessed by and if the page is rendered under a different URL, that different URL should have a canonical tag that links to the correct URL. But a page should NOT contain a canonical that just points to itself.
Regarding our canonical implementation: We, as Joomla core developers, have no way to know if the current page is supposed to be the canonical URL. Thus we can't set that canonical URL correctly. A site integrator would know the right URLs and could thus code something for the site he is working on, but for us it is not possible. A component itself could implement a canonical link behavior, but again, this is not something that I see us as Joomla core doing and instead would point to custom/third party solutions.
1.5 years later and I still stand by my previous comment. Can we please close this?
Status | Discussion | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2022-02-02 23:31:57 |
Closed_By | ⇒ | Quy | |
Labels | Added: No Code Attached Yet | Removed: ? |
> Regarding our canonical implementation: We, as Joomla core developers, have no way to know if the current page is supposed to be the canonical URL. Thus we can't set that canonical URL correctly. A site integrator would know the right URLs and could thus code something for the site he is working on, but for us it is not possible. A component itself could implement a canonical link behavior, but again, this is not something that I see us as Joomla core doing and instead would point to custom/third party solutions.
@Hackwar why not make this an option for all menu items, to give users the possibility to enter a canonical link manually?
This manually entered canonical link would then be rendered on every page that is rendered under a different (duplicate) URL.
I think menu-level assignment would be a good way to manage this.
> First of all about the canonical link: This is NOT supposed to be on every page. The canonical URL is the URL that a page should be accessed by and if the page is rendered under a different URL, that different URL should have a canonical tag that links to the correct URL. But a page should NOT contain a canonical that just points to itself.
This is not correct. Canonical tags in the HTML are used to consolidate SEO metrics and hint to search engines what the preferred URL is to rank. And when implemented correctly, they are available on every page, even if they self reference... URL A can have a canonical tag that points to URL A. This is perfectly valid and a best practice.
> This is not correct. Canonical tags in the HTML are used to consolidate SEO metrics and hint to search engines what the preferred URL is to rank. And when implemented correctly, they are available on every page, even if they self reference... URL A can have a canonical tag that points to URL A. This is perfectly valid and a best practice.
Correct. Three URLs with different SEF paths that resolve to the same site/content should have one unique canonical. That's all.
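A tiny sketch of that idea in Python (language-neutral illustration, not Joomla code): every URL a page can be reached under maps to one preferred URL, and the same tag is emitted everywhere, including on the preferred URL itself. The mapping `CANONICAL_FOR`, its example paths, and the function name are all hypothetical; in practice the mapping might come from a manually entered per-menu-item field as suggested above.

```python
from html import escape

# Hypothetical mapping: every URL the content is reachable under -> preferred URL.
PREFERRED = "https://www.example.com/category/my-article"
CANONICAL_FOR = {
    "https://www.example.com/category/my-article": PREFERRED,
    "https://www.example.com/other-menu/my-article": PREFERRED,
    "https://www.example.com/component/content/article/my-article": PREFERRED,
}


def canonical_link_tag(request_url: str) -> str:
    """Build the <link rel="canonical"> tag for the requested URL."""
    # Unknown URLs fall back to a self-referencing canonical.
    target = CANONICAL_FOR.get(request_url, request_url)
    return f'<link href="{escape(target, quote=True)}" rel="canonical" />'
```

All three URLs in the mapping produce an identical tag, which is the "one unique canonical" outcome described here.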
Could a Bug Squad team member please answer, as this issue was opened more than 12 hours ago?