No Code Attached Yet J4 Issue
kartzke
6 Jul 2019

Is your feature request related to a problem? Please describe.

The Joomla SEF plugin has a canonical URL feature that seems to be both poorly documented (it doesn't even say the word "canonical" in the Joomla backend) and it is implemented in a way that will create more SEO problems than it solves.

My suggestions below will not make this perfect for all websites, but it should dramatically improve SEO for a majority of Joomla websites that are using the built-in SEF URL functionality. And for websites that use 3rd party components for ecommerce or publishing, those users will likely be using a more powerful URL management system anyway (SH404SEF, etc.).

This feature request is to simply fix a pretty big SEO issue that the Joomla Core code creates.

Describe the solution you'd like

Add three new options to the SEF Plugin to give website owners more control over their canonical URLs. Screenshot attached.

Simpler solution:

  1. Remove query parameters from Canonical URLs
  2. Canonicalise index.php to home
  3. Enforce a trailing slash on SEF URLs

More complex solution:

  1. Remove query parameters from canonical URLs, but give users the ability to whitelist / blacklist certain query parameters. More flexible, but more difficult to write and probably more error prone for users.
  2. Canonicalise index.php to home
  3. Enforce a trailing slash on SEF URLs
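A rough PHP sketch of the allowlist/blocklist variant, offered only as an illustration. The function name filterCanonicalQuery() and the parameter names in the example (page, utm_source, fbclid) are hypothetical. The sketch assumes an absolute URL, and it sorts the surviving parameters so that the same parameters in a different order map to one canonical.

```php
<?php
// Hypothetical helper: filter the query string of a canonical URL.
// If $allow is non-empty, only those parameters are kept (allowlist mode);
// otherwise every parameter except those in $block is kept (blocklist mode).
function filterCanonicalQuery(string $url, array $allow = [], array $block = []): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // not an absolute URL: leave it untouched
    }

    $params = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
    }

    $kept = array_filter(
        $params,
        function ($name) use ($allow, $block): bool {
            $name = (string) $name; // parse_str() turns numeric names into int keys
            if ($allow !== []) {
                return in_array($name, $allow, true);
            }
            return !in_array($name, $block, true);
        },
        ARRAY_FILTER_USE_KEY
    );
    ksort($kept); // ?a=1&b=2 and ?b=2&a=1 should yield the same canonical

    // Rebuild scheme://host[:port]/path, dropping the original query and fragment.
    $canonical = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '/');

    return $kept === [] ? $canonical : $canonical . '?' . http_build_query($kept);
}

// Allowlist: keep pagination, drop tracking.
echo filterCanonicalQuery('https://www.example.com/blog?utm_source=organic&page=2', ['page']), PHP_EOL;
// https://www.example.com/blog?page=2

// Blocklist: drop known tracking parameters, keep everything else.
echo filterCanonicalQuery('https://www.example.com/blog?fbclid=XXXXX&page=2', [], ['fbclid', 'utm_source']), PHP_EOL;
// https://www.example.com/blog?page=2
```

Note that parse_str() rewrites dots and spaces in parameter names to underscores. A production version might parse the query string by hand instead.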

Additional context

If a Joomla user has SEF URLs and rewrites turned off, then they do not need this option.

But with mod_rewrite on and SEF URLs enabled, most query parameters on a standard Joomla installation are for analytics or sorting/filtering (to my knowledge).

More details on removing query parameters
Canonical URLs are instructions to search engines to collapse their index of duplicate or very similar URLs into one. This can have HUGE impacts on SEO rankings and website traffic. But under the current Joomla implementation, the following URLs would all canonicalise to themselves, telling search engines they are all unique, despite the content on page being identical:

https://www.example.com/
https://www.example.com/?utm_source=organic
https://www.example.com/?fbclid=XXXXX
https://www.example.com/?view=remind
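A minimal PHP sketch of the "simpler solution", assuming the canonical should simply be the requested absolute URL with its query string (and fragment) removed. The function name canonicalWithoutQuery() is hypothetical.

```php
<?php
// Hypothetical helper for the "simpler solution": the canonical is the
// requested URL minus its query string (and fragment).
function canonicalWithoutQuery(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // not an absolute URL: leave it untouched
    }

    return $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '/'); // "https://www.example.com" has no path component
}

$urls = [
    'https://www.example.com/',
    'https://www.example.com/?utm_source=organic',
    'https://www.example.com/?fbclid=XXXXX',
    'https://www.example.com/?view=remind',
];
foreach ($urls as $url) {
    echo canonicalWithoutQuery($url), PHP_EOL; // https://www.example.com/ each time
}
```

A blanket strip like this would also collapse any parameter that genuinely changes what is rendered. The allowlist/blocklist variant is meant to handle that case.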

More details on canonicalising index.php
Currently, Joomla can render these URLs with SEF turned on, and the content is identical. They should be canonicalised to the homepage URL.

https://www.example.com/
https://www.example.com/index.php

This also has some considerations on multi-language sites, where you get URLs like this:

https://www.example.com/index.php
https://www.example.com/es/index.php

These would need to canonicalise to:

https://www.example.com/
https://www.example.com/es/
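A PHP sketch of the index.php rule. It assumes the site is installed at the web root rather than in a subdirectory, and that the caller supplies the list of language URL prefixes (such as es). How a plugin would obtain that list is not shown. The sketch deliberately leaves index.php URLs that carry a query string alone, because index.php?option=... is Joomla's non-SEF form of a specific page, not the home page. The function name is hypothetical.

```php
<?php
// Hypothetical helper: map "/index.php" and "/<lang>/index.php" to the
// corresponding home URL with a trailing slash; anything else is returned as-is.
function canonicaliseIndexPhp(string $url, array $langPrefixes = []): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host']) || isset($parts['query'])) {
        return $url; // relative URL, or a non-SEF "index.php?option=..." link
    }

    $base = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path = $parts['path'] ?? '/';

    if ($path === '/index.php') {
        return $base . '/';
    }

    // "/es/index.php" -> "/es/", but only when "es" is a known language prefix.
    if (preg_match('#^/([^/]+)/index\.php$#', $path, $m) === 1
        && in_array($m[1], $langPrefixes, true)) {
        return $base . '/' . $m[1] . '/';
    }

    return $url;
}

echo canonicaliseIndexPhp('https://www.example.com/index.php'), PHP_EOL;            // https://www.example.com/
echo canonicaliseIndexPhp('https://www.example.com/es/index.php', ['es']), PHP_EOL; // https://www.example.com/es/
```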

More details on the option to enforce a trailing slash
Currently, Joomla can render these URLs with SEF turned on, and the content is identical. A website should adhere to a standard pattern and 301 redirect to the desired pattern to minimise duplicate content.

https://www.example.com/category/my-article
https://www.example.com/category/my-article/

If the option is enabled, then the first URL is 301 redirected to the second URL.
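A minimal PHP sketch of the redirect decision for the "enforce a trailing slash" option. The function name is hypothetical. It works on the request URI (path plus query) and keeps the query string in the redirect target. As an added assumption, it skips file-like paths such as /index.php or /images/logo.png.

```php
<?php
// Hypothetical helper: return the 301 target for a path that lacks a trailing
// slash, or null when no redirect is needed.
function trailingSlashRedirect(string $requestUri): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path) || $path === '' || substr($path, -1) === '/') {
        return null; // already ends in "/" (or there is no path to fix)
    }

    if (strpos(basename($path), '.') !== false) {
        return null; // looks like a file, e.g. "/index.php" or "/images/logo.png"
    }

    $query = parse_url($requestUri, PHP_URL_QUERY);

    return $path . '/' . (is_string($query) && $query !== '' ? '?' . $query : '');
}

$target = trailingSlashRedirect('/category/my-article');
if ($target !== null) {
    // In a live request this would be: header('Location: ' . $target, true, 301); exit;
    echo $target, PHP_EOL; // /category/my-article/
}
```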

Please consider these enhancements. URL management is tricky, but it is also critical for success in SEO and driving free traffic.

Screen Shot 2019-07-06 at 18 05 44

rhotog - open - 6 Jul 2019
joomla-cms-bot - change - 6 Jul 2019
Labels Added: ?
joomla-cms-bot - labeled - 6 Jul 2019
franz-wohlkoenig - change - 6 Jul 2019
Labels Added: J4 Issue
franz-wohlkoenig - labeled - 6 Jul 2019
franz-wohlkoenig - change - 6 Jul 2019
Title changed from "Dramatically Improve SEO -> New SEF canonical URL options" to "[4.0] Dramatically Improve SEO -> New SEF canonical URL options"
franz-wohlkoenig - edited - 6 Jul 2019
ghost - comment - 7 Jul 2019

Can a Bug Squad team member please answer? This issue was opened more than 12 hours ago.

franz-wohlkoenig - change - 7 Jul 2019
Status changed from New to Information Required
roland-d - comment - 7 Jul 2019

This all makes sense to me. I think this would be a great addition.

franz-wohlkoenig - change - 7 Jul 2019
Status changed from Information Required to Discussion
SharkyKZ - comment - 7 Jul 2019

Joomla! core does not use trailing slashes.

roland-d - comment - 7 Jul 2019

@SharkyKZ How do you mean?

I can access https://www.joomla.org/announcements.html/ the same as https://www.joomla.org/announcements.html and they both display the same page.

On a client site I fixed this exact issue the other day.

SharkyKZ - comment - 7 Jul 2019

You can access URLs with trailing slashes but Joomla! core (except Language Switcher on homepage, but that seems to be unintentional) generates URLs without trailing slashes. If anything was to be enforced, it should be URLs without slashes. Unless we actually want to change URL structure to include a trailing slash for some reason.

infograf768 - comment - 7 Jul 2019

The present so-called "canonical" setting in Joomla SEF has nothing to do with the real use of a canonical. In short: it does not deal at all with duplicate URLs within the same domain/site and was never designed for that. It just redirects to another domain.
The tooltip is clear:
PLG_SEF_DOMAIN_DESCRIPTION="If your site can be accessed through more than one domain enter the preferred (sometimes referred to as canonical) domain here. <br /><strong>Note:</strong> https://example.com and https://www.example.com are different domains."

infograf768 - comment - 7 Jul 2019

(except Language Switcher on homepage, but that seems to be unintentional)

I think it is intentional, as mysite.com/en is not at all equal to mysite.com/en/, because en is the language SEF prefix (for en-GB) and it is the only way to differentiate in code the language used for the Home page.

SharkyKZ - comment - 7 Jul 2019

It loads the same page though. Also hreflang does not contain a trailing slash:

<link href="http://localhost/index.php/de" rel="alternate" hreflang="de-DE">
<link href="http://localhost/index.php/en" rel="alternate" hreflang="en-GB">
<link href="http://localhost/index.php/en" rel="alternate" hreflang="x-default">
infograf768 - comment - 7 Jul 2019

It does in 3.x. Note that here "Remove URL Language Code" is enabled, with en-GB as the site default language:

<link href="http://localhost:8888/installmulti/trunkgitnew/fr/" rel="alternate" hreflang="fr-FR" />
<link href="http://localhost:8888/installmulti/trunkgitnew/" rel="alternate" hreflang="en-GB" />
<link href="http://localhost:8888/installmulti/trunkgitnew/it/" rel="alternate" hreflang="it-IT" />
infograf768 - comment - 7 Jul 2019

So there were some changes in 4.0, and I think this could be an issue.
It needs further testing with the language cookie, the browser's default language, etc.

kartzke - comment - 7 Jul 2019

RE: Joomla does not use a trailing slash: agreed, the enforcement could instead be to remove the trailing slash. The SEO advantage of a trailing slash is that it signals a clear directory-type URL structure, but managing this becomes more difficult if users have enabled a URL suffix such as ".html". Generally you wouldn't want a URL to end with "alias.html/"; it's odd formatting. Some people are very particular about their URL structure, so giving people options may be best: do nothing, enforce a trailing slash, or remove the trailing slash. As long as a 301 redirect is in place, this solves the duplicate content issues that arise when URLs get malformed by bugs, plugins, user error, etc.
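Here is a PHP sketch of the three-way setting suggested above, offered only as an illustration. The mode names none, add and remove are hypothetical, as is the function name. The function only computes the normalised path. A caller would send a 301 when the result differs from the requested path. The sketch assumes that a URL suffix shows up as a dot in the last path segment.

```php
<?php
// Hypothetical helper: normalise a request path according to a trailing-slash mode.
function normaliseTrailingSlash(string $path, string $mode): string
{
    if (!in_array($mode, ['none', 'add', 'remove'], true)) {
        throw new InvalidArgumentException("Unknown trailing-slash mode: $mode");
    }

    if ($mode === 'none') {
        return $path; // "do nothing"
    }

    $trimmed = rtrim($path, '/');
    if ($trimmed === '') {
        return '/'; // the site root always keeps a single slash
    }

    if ($mode === 'remove') {
        return $trimmed;
    }

    // "add": never produce "alias.html/" when a URL suffix such as .html is in use.
    return strpos(basename($trimmed), '.') !== false ? $trimmed : $trimmed . '/';
}

$requested = '/announcements.html/';
$normalised = normaliseTrailingSlash($requested, 'remove');
if ($normalised !== $requested) {
    echo "301 to $normalised", PHP_EOL; // 301 to /announcements.html
}
```

Language home URLs such as /en/ would likely need a special case, since the thread above treats their trailing slash as intentional.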

RE: the current canonical implementation being intended for cross-domain canonicalization. If that's the case, it still falls very short of solving duplicate content problems. When you say "Redirects to another domain" what do you mean? Does it actually trigger a 301 redirect under some scenarios?

RE: Trailing slashes in HREFLANG tags on the homepage. I see a trailing slash in HREFLANG on my personal website when accessing the homepage. At least on my install, the configuration appears correct, but you might have found an issue under a different config. And if I navigate to the index.php page, the HREFLANGS reference the correct value while the canonical does not.

BROWSER REQUESTS: https://www.example.com/index.php
CANONICAL: https://www.example.com/index.php
HREFLANG EN: https://www.example.com/
HREFLANG TH: https://www.example.com/th/

infograf768 - comment - 7 Jul 2019

RE: the current canonical implementation being intended for cross-domain canonicalization. If that's the case, it still falls very short of solving duplicate content problems. When you say "Redirects to another domain" what do you mean? Does it actually trigger a 301 redirect under some scenarios?

I expressed myself badly.
It is not a redirect per se. It just adds a canonical URL whose domain is the one entered in the field, telling search engines that the "real" domain to crawl is the "canonical" one.
It expects both domains to have exactly the same structure.
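A PHP sketch of the behaviour described above. It assumes the tag keeps the requested path and query and swaps only the domain for the configured one. The function name is hypothetical, and this is not the plugin's actual code. It illustrates why tracking parameters survive, so two URLs still produce two different canonicals.

```php
<?php
// Hypothetical helper: rebuild the current URL on the preferred domain,
// keeping path and query exactly as requested (no duplicate collapsing).
function crossDomainCanonical(string $currentUrl, string $preferredDomain): string
{
    $parts = parse_url($currentUrl);
    if ($parts === false) {
        return $currentUrl;
    }

    $path = $parts['path'] ?? '/';
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';

    return rtrim($preferredDomain, '/') . $path . $query;
}

echo crossDomainCanonical('https://example.com/blog', 'https://www.example.com'), PHP_EOL;
// https://www.example.com/blog
echo crossDomainCanonical('https://example.com/blog?utm_source=organic', 'https://www.example.com'), PHP_EOL;
// https://www.example.com/blog?utm_source=organic  (a second, distinct canonical)
```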

kartzke - comment - 7 Jul 2019

Ok, very interesting that this feature was initially added to solve cross-domain canonical issues. But I think the fact remains that this implementation is still problematic: it only solves a very narrow issue (cross-domain duplicate content), while creating new duplicate content issues by excluding critical options in the logic (ability to remove/whitelist/blacklist query parameters). Given the broad use of marketing and analytic tracking parameters in URLs, this is a pretty important concept to deal with and would be impactful for users trying to configure SEO options.

infograf768 - comment - 7 Jul 2019

I see a trailing slash in HREFLANG on my personal website when accessing the homepage.

Yes, because you use 3.x and not 4.0

And if I navigate to the index.php page, the HREFLANGS reference the correct value while the canonical does not.

It does here, even in multilang (3.x):
<link href="http://anotherdomain.org/installmulti/trunkgitnew/fr/" rel="canonical" />

So yes, Joomla has no real code to deal with true canonicals.
That said, with the new routing in 4.0 we should get far fewer duplicates, if not none (not sure).

infograf768 - comment - 7 Jul 2019

@SharkyKZ
Looks like I misunderstood your post about the switcher.

kartzke - comment - 12 Jul 2019

@infograf768 I'm not familiar with the new routing, but if there is no canonical URL with proper logic to remove query parameters from SEF URLs, then the duplicate content problem still exists and would be a huge thing to address for the platform.

infograf768 - comment - 12 Jul 2019

I suggest you test 4.0 (PHP 7.2 minimum): https://developer.joomla.org/nightly-builds.html

SharkyKZ - comment - 12 Jul 2019

@rhotog we can add a canonical tag. At least to some pages like articles.

infograf768 - comment - 12 Jul 2019

@SharkyKZ

we can add a canonical tag. At least to some pages like articles.

Not that I know of with default core. Can you explain?

To get one, I had to make this hack for articles (making sure I did not enter anything in the SEF plugin's domain field):

diff --git a/administrator/components/com_content/models/forms/article.xml b/administrator/components/com_content/models/forms/article.xml
index 422206b..ead0f98 100644
--- a/administrator/components/com_content/models/forms/article.xml
+++ b/administrator/components/com_content/models/forms/article.xml
@@ -602,4 +602,19 @@
 				size="25" 
 			/>
+
+			<field
+				name="spacer3"
+				type="spacer"
+				hr="true"
+			/>
+
+			<field
+				name="canonical"
+				type="url"
+				label="JCANONICAL"
+				validate="url"
+				filter="url"
+				relative="false"
+			/>
 		</fieldset>
 
diff --git a/administrator/language/en-GB/en-GB.ini b/administrator/language/en-GB/en-GB.ini
index f9d8e7f..985afa4 100644
--- a/administrator/language/en-GB/en-GB.ini
+++ b/administrator/language/en-GB/en-GB.ini
@@ -57,4 +57,5 @@
 JASSOCIATIONS_DESC="Associations descending"
 JCANCEL="Cancel"
+JCANONICAL="Canonical URL"
 JCATEGORIES="Categories"
 JCATEGORY="Category"
diff --git a/components/com_content/views/article/tmpl/default.php b/components/com_content/views/article/tmpl/default.php
index ad21b37..9b0c420 100644
--- a/components/com_content/views/article/tmpl/default.php
+++ b/components/com_content/views/article/tmpl/default.php
@@ -169,5 +169,10 @@
 	?>
 	<?php endif; ?>
-	<?php // Content is generated by content plugin event "onContentAfterDisplay" ?>
+	<?php
+	if (!empty($params->get('canonical')))
+	{
+		JFactory::getApplication()->getDocument()->addHeadLink(htmlspecialchars($params->get('canonical')), 'canonical');
+	}
+	 // Content is generated by content plugin event "onContentAfterDisplay" ?>
 	<?php echo $this->item->event->afterDisplayContent; ?>
 </div>

It is a hack, as I just entered the resulting SEF link copied from the frontend:

Screen Shot 2019-07-12 at 17 25 21

I guess it should be done with JRoute and the non-SEF link in the field.

SharkyKZ - comment - 12 Jul 2019

We can use whatever we normally use to generate item links to get canonical URLs. Once such a link is inserted, it stays the same even if the page is accessed from different URLs. See #18341 for example.
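A PHP sketch of that idea, offered only as an illustration. It derives the canonical from the item's own data (here a hypothetical category path and alias) rather than from the requested URL, so every URL that reaches the article emits the same tag. The helper names are hypothetical. articleCanonical() is a simple stand-in for Joomla's router, which is not reproduced here.

```php
<?php
// Hypothetical stand-in for the router: build an article URL from its data only.
function articleCanonical(string $siteRoot, array $article): string
{
    $segments = array_filter(
        [trim($article['category_path'], '/'), $article['alias']],
        function (string $s): bool { return $s !== ''; } // skip an empty category path
    );

    return rtrim($siteRoot, '/') . '/' . implode('/', $segments);
}

// Escape the href before emitting the tag, as the hack above does with htmlspecialchars().
function canonicalLinkTag(string $href): string
{
    return '<link href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '" rel="canonical">';
}

$article = ['id' => 42, 'category_path' => 'category', 'alias' => 'my-article'];

// Whether the visitor requested /category/my-article, /category/my-article/ or
// /index.php?option=com_content&view=article&id=42, the request URL is never
// consulted, so the output is the same:
echo canonicalLinkTag(articleCanonical('https://www.example.com', $article)), PHP_EOL;
// <link href="https://www.example.com/category/my-article" rel="canonical">
```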

Hackwar - comment - 6 Jun 2020

No, we can not set a canonical in our code.

First of all about the canonical link: This is NOT supposed to be on every page. The canonical URL is the URL that a page should be accessed by and if the page is rendered under a different URL, that different URL should have a canonical tag that links to the correct URL. But a page should NOT contain a canonical that just points to itself.

Regarding our canonical implementation: We, as Joomla core developers, have no way to know if the current page is supposed to be the canonical URL. Thus we can't set that canonical URL correctly. A site integrator would know the right URLs and could thus code something for the site he is working on, but for us it is not possible. A component itself could implement a canonical link behavior, but again, this is not something that I see us as Joomla core doing and instead would point to custom/third party solutions.

Hackwar - comment - 2 Feb 2022

1.5 years later, I still stand by my previous comment. Can we please close this?

Quy - change - 2 Feb 2022
Status changed from Discussion to Closed
Closed_Date changed from 0000-00-00 00:00:00 to 2022-02-02 23:31:57
Closed_By: Quy
Labels Added: No Code Attached Yet
Labels Removed: ?
Quy - close - 2 Feb 2022
jokorntheuer - comment - 13 Apr 2023

Regarding our canonical implementation: We, as Joomla core developers, have no way to know if the current page is supposed to be the canonical URL. Thus we can't set that canonical URL correctly. A site integrator would know the right URLs and could thus code something for the site he is working on, but for us it is not possible. A component itself could implement a canonical link behavior, but again, this is not something that I see us as Joomla core doing and instead would point to custom/third party solutions.

@Hackwar why not make this an option on all menu items, giving users the possibility to enter a canonical link manually?
This manually entered canonical link would then be rendered on every page that is rendered under a different (duplicate) URL.
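A PHP sketch of the per-menu-item override. It assumes a hypothetical canonical field stored in the menu item's parameters. A valid manual value wins. Otherwise the automatically computed canonical is used. The function name is hypothetical as well.

```php
<?php
// Hypothetical resolver: prefer a manually entered canonical from the menu
// item's params; fall back to the computed one if it is empty or not a URL.
function resolveCanonical(array $menuItemParams, string $computedCanonical): string
{
    $manual = trim((string) ($menuItemParams['canonical'] ?? ''));

    if ($manual !== '' && filter_var($manual, FILTER_VALIDATE_URL) !== false) {
        return $manual;
    }

    return $computedCanonical;
}

echo resolveCanonical(['canonical' => 'https://www.example.com/shop'], 'https://www.example.com/shop?sort=price'), PHP_EOL;
// https://www.example.com/shop
echo resolveCanonical([], 'https://www.example.com/blog'), PHP_EOL;
// https://www.example.com/blog
```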

kartzke - comment - 13 Apr 2023

I think menu-level assignment would be a good way to manage this.

First of all about the canonical link: This is NOT supposed to be on every page. The canonical URL is the URL that a page should be accessed by and if the page is rendered under a different URL, that different URL should have a canonical tag that links to the correct URL. But a page should NOT contain a canonical that just points to itself.

This is not correct. Canonical tags in the HTML are used to consolidate SEO metrics and hint to search engines what the preferred URL is to rank. And when implemented correctly, they are available on every page, even if they self-reference... URL A can have a canonical tag that points to URL A. This is perfectly valid and a best practice.

jokorntheuer - comment - 14 Apr 2023

This is not correct. Canonical tags in the HTML are used to consolidate SEO metrics and hint to search engines what the preferred URL is to rank. And when implemented correctly, they are available on every page, even if they self-reference... URL A can have a canonical tag that points to URL A. This is perfectly valid and a best practice.

Correct. Three URLs with different SEF paths that resolve to the same page/content should have one unique canonical. That's all.
