pAnd0rASBG
5 Dec 2018

Steps to reproduce the issue

Add a URL in the SEF plugin to enable canonicals, then open any URL of the site and check the generated canonical.

Expected result

As per Google's canonical definition, all variations of a URL that serve identical content should point to ONE URL to identify what should be indexed and prevent the bot from penalizing for duplicate content.

Actual result

(used a category blog called promotions for testing)
https://www.mysite.com/promotions -> generated canonical: "https://www.mysite.com/promotions"
https://www.mysite.com/promotions/ -> generated canonical: "https://www.mysite.com/promotions/"
https://www.mysite.com/index.php?option=com_content&view=category&layout=blog&id=11 -> generated canonical: "https://www.mysite.com/index.php?option=com_content&view=category&layout=blog&id=11"
https://www.mysite.com/promotions/?say=what -> generated canonical: "https://www.mysite.com/promotions/?say=what"

Paginated content, such as https://www.mysite.com/promotions?start=15 will give you "https://www.mysite.com/promotions" as canonical.

By definition, the sole purpose of rel="canonical" is to point crawlers to the proper version of a page that is reachable under multiple URLs, so that the expected URL is indexed and the multiple URL versions are not seen as duplicate content.
Applied to the example above, all generated canonicals should point to https://www.mysite.com/promotions

Not only is this not happening: by emitting self-referencing canonicals on each variation, Joomla even tells Google et al. that all those variations are distinct pages and not URL variations of the same page, which makes it even more likely to get penalized for duplicate content.
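
A minimal sketch of the expected behaviour, in Python for illustration only (Joomla itself is PHP, and canonical_url is a hypothetical helper, not a Joomla API). It assumes the preferred form has no trailing slash and no query string. It does not cover the non-SEF index.php?option=... variant, which only the router can map back to its SEF URL.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Collapse trailing-slash and query-string variants onto one URL.

    Assumption (not taken from Joomla): the canonical form has no
    trailing slash and no query string. Non-SEF URLs are out of scope.
    """
    parts = urlsplit(url)
    # Strip trailing slashes, but keep "/" for the site root.
    path = parts.path.rstrip("/") or "/"
    # Drop query and fragment so every variant yields the same string.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))


variants = [
    "https://www.mysite.com/promotions",
    "https://www.mysite.com/promotions/",
    "https://www.mysite.com/promotions/?say=what",
]
for url in variants:
    # Each variant maps to https://www.mysite.com/promotions
    print(url, "->", canonical_url(url))
```

Dropping the whole query string is a deliberate simplification; a real implementation would need to know which parameters change the content.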

Furthermore, paginated content should not have a canonical at all, but rather use rel="next" and rel="prev".
E.g. with 5 articles per page, https://www.mysite.com/promotions?start=15 should contain no canonical, but rel="prev" and rel="next" links to the neighbouring pages
(first page should only have rel="next", last page only rel="prev")
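
The rel="prev"/rel="next" markup for such a page can be sketched as follows, assuming the start offset and page size seen in the URLs above. pagination_links and its total parameter (the number of articles in the list) are hypothetical names for illustration, not part of Joomla.

```python
from typing import List


def pagination_links(base_url: str, start: int, limit: int, total: int) -> List[str]:
    """Build rel="prev"/rel="next" link tags for one page of a list.

    `start` is the offset of the first item on the current page,
    `limit` the page size, `total` the number of items overall.
    """
    def page_url(offset: int) -> str:
        # Assumption: the first page carries no ?start= parameter.
        return base_url if offset <= 0 else f"{base_url}?start={offset}"

    links = []
    if start > 0:  # every page except the first has a predecessor
        links.append(f'<link rel="prev" href="{page_url(start - limit)}" />')
    if start + limit < total:  # every page except the last has a successor
        links.append(f'<link rel="next" href="{page_url(start + limit)}" />')
    return links


# Page at ?start=15 with 5 articles per page and 30 articles in total.
print("\n".join(pagination_links("https://www.mysite.com/promotions", 15, 5, 30)))
```

With these numbers the page at ?start=15 gets a prev link to ?start=10 and a next link to ?start=20.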

Additional comments

Some references:
https://support.google.com/webmasters/answer/139066
https://webmasters.googleblog.com/2009/02/canonical-link-element-presentation.html
https://webmasters.googleblog.com/2009/02/specify-your-canonical.html
https://webmasters.googleblog.com/2013/04/5-common-mistakes-with-relcanonical.html
https://moz.com/learn/seo/canonicalization
https://moz.com/learn/seo/duplicate-content
https://webmasters.googleblog.com/2011/09/pagination-with-relnext-and-relprev.html

Votes

# of Users Experiencing Issue
1/1
Average Importance Score
5.00

pAnd0rASBG - open - 5 Dec 2018
joomla-cms-bot - labeled - 5 Dec 2018
brianteeman - comment - 5 Dec 2018

As stated in the tooltip, this is for a canonical domain, NOT a canonical URL.

image

pAnd0rASBG - comment - 5 Dec 2018

So this means, proper canonical URLs are basically unsupported?


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/23232.

Bakual - comment - 5 Dec 2018

There is canonical URL support, but it doesn't work that well. It should work better with the new router.

"and prevent the bot from penalizing for duplicate content."

On the other hand, that statement of yours is wrong anyway. There is no penalty for this, as Google doesn't count it as duplicate content. Google is smart enough to recognize it as the same page with different URLs. It will just choose a "main" URL itself and list it only once. Canonical only makes sure that this "main" URL is the one you prefer, and not the one Google decides on.
Duplicate content that gets a penalty is when the same content is on different sites (different domains), e.g. when someone just copies the content from another site.

pAnd0rASBG - comment - 5 Dec 2018

OK, let me rephrase and be more concrete: it's true that Google basically recognizes and chooses what it thinks is the best URL, but
a) don't count on that 100%
b) you'll still get penalized, just in a different way. Two things are really important to Google: quality of index and budget. In their somewhat more flowery words, they want to deliver the best possible results to the user (quote: "We don't want to be a search engine, we want to be a find engine"), but the server farms that run the crawlers/indexers/analyzers/renderers etc. also cost a lot of money. So they assign you a crawling budget, which is essentially the maximum amount of computational time Google will spend on your stuff. This budget is dynamic and may grow or shrink depending on a number of factors, such as relevance, quality, link juice, page speed etc.
So if, for example, Google crawls 80 of your pages within its current crawling budget and all of them come in 4 identical variations, that boils down to 20/80 relevant pages, which will downrank your site's quality and shrink your crawling budget. Even if you add 1500 pages of really cool blog posts, Google might then take a loooooong time to index them because of your small crawling budget, and will only slowly consider raising it again.
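
Taken at face value, the arithmetic in this example works out as below. The snippet only restates the numbers from the comment. It is not a model of how Google actually allocates crawl budget, and unique_pages_crawled is a hypothetical helper.

```python
def unique_pages_crawled(crawled_urls: int, variants_per_page: int) -> int:
    """Distinct pages covered when every page is reachable under
    `variants_per_page` different URLs and all variants get crawled."""
    return crawled_urls // variants_per_page


crawled = 80
unique = unique_pages_crawled(crawled, 4)
print(f"{unique}/{crawled} crawled URLs are distinct pages")  # 20/80
```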

At least, that's what "Google People" such as John Mueller and Martin Splitt said on recent SEO Conferences I've been to.

Finally, c) as omnipresent as it seems these days, Google is not the only one. Other search engines, such as Bing, Yandex etc., are usually nowhere near as "clever" when it comes to such things, but they are definitely not irrelevant.


Bakual - comment - 6 Dec 2018

a) and c) I've never seen duplicate URLs for the same Joomla content in Google or Bing (never used Yandex). So I think it's at least pretty close to 100%.

b) If you really have 1500 really cool blog posts, I'm almost 100% sure Google will index all of them. Also, the crawler very likely will not find four identical URLs for the same content on its first attempt. It will find one "tree" and follow it, and within that tree there will be only one type of URL.
So even if true, this is a theoretical argument, not a practical one.

Keep in mind, I'm not saying canonical URLs shouldn't be improved. They should, and are. I'm just saying the issue isn't as critical as you make it sound.
As said, with the new router they should work better since there should only be one URL for each content. You shouldn't get multiple URLs pointing to the same content (except of course non-SEF and SEF URLs). You can already try that out if you enable modern routing in the options.

simbus82 - comment - 6 Dec 2018

Sorry, I have been doing SEO for many years, and I'm reading some wrong things here.
The trailing slash IS a problem for canonicalization.
It is true that trailing slashes on the root/hostname don't matter.

But it IS a problem when we have a trailing slash on a "file/folder" path.

The problem is simple: example.com/page and example.com/page/ can show different things.

File names / folders with and without a trailing forward slash can be seen as duplicates.
Thus if your web page can be reached by both example.com/page and example.com/page/, you have a duplicate content issue.
If the real URL is /page/ then your server should be redirecting /page to /page/, or the opposite.

This is a Google guideline.
image

So, a rel=canonical tag MUST point to only one URL, with or without trailing slash. If there are two canonical URLs that differ only by the trailing slash, Google counts them as two different pieces of content. Maybe Google will not penalize anyone for duplicate content caused by the slash, but for the search engine they are still two different pieces of content!

PS. The same "hard" logic applies to URL aliases when we have something like example.com/keyword1_keyword2 and example.com/keyword1-keyword2.
Google, as good software developed by technicians, treats the underscore as joining concepts and the hyphen/dash as separating them. You can try it by doing a search ...
If you use an underscore, Google will tend to treat the two keywords in the URL above as a single term rather than separate them.
So, when you have to compete in a highly competitive SERP, these things must be strongly considered. These are the things that make the difference in keyword positioning, and a good SEO needs to know them.

Bakual - comment - 6 Dec 2018

"The problem is simple: example.com/page and example.com/page/ can show different things."

I'm aware of that Tweet (I saw it as well) and I agree with the statement. However again it's not the duplicate content issue, it's multiple URLs for the same content on the same site.
And again, please try with the "modern" router where this issue should be fixed to my knowledge. If it happens there as well, we can look into it.

franz-wohlkoenig - change - 4 Mar 2019
Status: New -> Information Required
franz-wohlkoenig - comment - 4 Mar 2019

Any Comment @simbus82 or can this Issue be closed?


simbus82 - comment - 4 Mar 2019

I have some comments @franz-wohlkoenig
@Bakual one of the big problems is the crawl budget: in your example above, Google will take months to index "only" 1500 URLs if they are not "really clean and unique" (some news projects can have 150,000 URLs...).
When we put a project online, we can't wait that long, so we need a perfect URL structure to get indexed and ranked.

I'm working on a "vanilla" Joomla 3.9.3 website with Protostar, and the canonical tag doesn't appear. It appears only if I switch to a template with "canonical url support", and even then the canonicals don't refer to a unique URL.
For example:
https://www.mysite.it/blog/leggi-e-normative
https://www.mysite.it/blog/leggi-e-normative/
are two URLs for the same page (blog view of a category).
They do not share one unique canonical (e.g. https://www.mysite.it/blog/leggi-e-normative).

For now with Joomla 3.9.3 (modern router ON and URL id removed), the canonical URL for
https://www.mysite.it/blog/leggi-e-normative is https://www.mysite.it/blog/leggi-e-normative
and the canonical URL for https://www.mysite.it/blog/leggi-e-normative/ is https://www.mysite.it/blog/leggi-e-normative/.
Simply wrong from an SEO perspective.

I can solve this only with this external plugin:
https://extensions.joomla.org/extension/canonical-links-all-in-one/

image

These settings could be a really good starting point for "URL canonical optimization": they can solve lots of the SEO problems that Joomla has and at the same time give a real boost in SEO efficiency.

  • Canonical links are all without trailing slash
  • Joomla can redirect (301) all URLs with a trailing slash to the correct canonical URL (i.e. the one without trailing slash)
  • URLs with parameters are not redirected to the canonical (because they can be used for other things...)

Surely the last two points should be adjusted according to the website strategy and site structure, while the first point could be "hardcoded".
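
The three rules above can be sketched as a single decision function. This is an illustrative Python sketch, not the plugin's actual code: redirect_target is a hypothetical name, and the choice to leave any URL with a query string untouched simply mirrors the third bullet.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect applies."""
    parts = urlsplit(url)
    if parts.query:
        # Rule 3: URLs with parameters are not redirected.
        return None
    if parts.path != "/" and parts.path.endswith("/"):
        # Rules 1 and 2: the canonical form has no trailing slash,
        # so the slash variant is redirected to it.
        path = parts.path.rstrip("/") or "/"
        return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    # Already canonical (or the site root): nothing to do.
    return None


for url in (
    "https://www.mysite.it/blog/leggi-e-normative/",
    "https://www.mysite.it/blog/leggi-e-normative",
    "https://www.mysite.it/blog?start=5",
):
    print(url, "->", redirect_target(url))
```

Only the first URL yields a redirect target. The other two are left alone.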

The canonical URL of paginated pages, by contrast, should work like this: for a URL like https://www.mysite.it/blog?start=5 (second page of a list of articles) the canonical tag should be https://www.mysite.it/blog?start=5. Easy.

But the interesting thing, if we want to give the CMS a strong SEO footprint (which I hope), would be the introduction of rel="prev" / rel="next".
On the page "https://www.mysite.it/blog?start=5" we should have these two HTML tags in the head:

<link rel="prev" href="https://www.mysite.it/blog" />
<link rel="next" href="https://www.mysite.it/blog?start=10" />

Some good advice can be found here:
https://www.stonetemple.com/pagination-canonicalization-seo-your-technical-guide/

I'm so sorry that I can't help with code development :-(
But if there is a "SEO team" for Joomla in the future, I would like to help as much as I can!

brianteeman - comment - 4 Mar 2019

It is not a canonical URL plugin.
It is only for the canonical domain.

simbus82 - comment - 4 Mar 2019

Sorry, I thought the discussion had evolved on the issue of canonicals.

infograf768 - comment - 4 Mar 2019

Closing this.
As explained, proper canonical URLs as defined by various posters here (and the posted plugin) are not provided by Joomla 3.9.x.
I am sure that for 4.0 any specific proposal (plugin or otherwise) would be welcome.

infograf768 - close - 4 Mar 2019
infograf768 - change - 4 Mar 2019
Status: Information Required -> Closed
Closed_Date: 0000-00-00 00:00:00 -> 2019-03-04 16:37:56
Closed_By: infograf768
