Kostelano - open - 16 Feb 2015
joomla-cms-bot - change - 16 Feb 2015
Labels Added: ?
brianteeman - comment - 16 Feb 2015

Not index content that is linked to categories "uncategorised". This page is a duplicate.

How is this automatically duplicate content?


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/6098.
bertmert - comment - 17 Feb 2015

All these new lines are NOT robots.txt standard.

http://www.robotstxt.org/robotstxt.html
See the remark there: "Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines...."

And I've never seen a plus sign in robots.txt standards. What is it good for?
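
To make the quoted remark concrete: under the original prefix-only matching, a `*` inside a Disallow value is just a literal character. The following is a minimal Python sketch of that behaviour, not the code of any real crawler; `prefix_disallows` is a hypothetical helper introduced only for illustration.

```python
def prefix_disallows(rule_path: str, url_path: str) -> bool:
    """Original robots.txt semantics: a Disallow value is a plain path prefix."""
    # An empty Disallow value means "nothing is disallowed".
    if not rule_path:
        return False
    return url_path.startswith(rule_path)


# The '*' is compared literally, so this rule does not cover the real URL.
print(prefix_disallows("/*-uncategorised", "/content_1/2-uncategorised/"))  # False

# A plain prefix rule behaves as expected.
print(prefix_disallows("/content_1/2-uncategorised", "/content_1/2-uncategorised/"))  # True
```

A crawler that follows only the original standard would therefore silently ignore the wildcard lines rather than apply them.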

Bakual - comment - 17 Feb 2015

@Kostelano Can you elaborate and answer the raised question?
To me, what you propose here doesn't make much sense.

Kostelano - comment - 17 Feb 2015
  • About "uncategorised": the "Uncategorised" category is almost never renamed. Any article that is assigned to this category and also gets a link from a menu (for example) has a duplicate:

http://SITE.COM/content_1/
http://SITE.COM/content_1/2-uncategorised/

This almost always takes place.

  • Sorry, the "+" symbol was added accidentally when copying. Of course, it should not be there!

After listening to your opinions, I suggest more specific rules, so that they do not cause problems with menus (for example), as someone noted above.
For example:

Disallow: /*?tmpl=component&print=*
Disallow: /*?start=*
Disallow: /*?limitstart=* # duplicates the first page of a category that has page navigation
Disallow: /*-uncategorised

The "?start=" line can be left out, because such pages are not duplicates. They are just not interesting to users.
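
As a rough illustration of how a wildcard-aware crawler might read the four rules above, here is a minimal Python sketch. It assumes the commonly described extension semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL); `wildcard_to_regex` and `blocked_by` are hypothetical helpers, and every sample URL except the two `/content_1/` paths is invented for the example.

```python
import re

# The rules proposed above, without the trailing comment.
PROPOSED_RULES = [
    "/*?tmpl=component&print=*",
    "/*?start=*",
    "/*?limitstart=*",
    "/*-uncategorised",
]


def wildcard_to_regex(rule_path):
    """Translate a wildcard Disallow value into a compiled regular expression."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def blocked_by(url):
    """Return the proposed rules that would disallow the given path and query."""
    return [rule for rule in PROPOSED_RULES if wildcard_to_regex(rule).match(url)]


for url in [
    "/content_1/",
    "/content_1/2-uncategorised/",
    "/blog?start=10",                     # hypothetical URL
    "/blog?limitstart=0",                 # hypothetical URL
    "/index.php?tmpl=component&print=1",  # hypothetical URL
]:
    print(url, "->", blocked_by(url) or "allowed")
```

Under these assumptions only the menu URL `/content_1/` stays crawlable, while each of the other sample URLs is caught by exactly one rule.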

  • Perhaps Google copes well with two identical pages (for Bakual), but why should we have them? It is logical to have only one such page. Moreover, by adding the exclusion line for /component/, we get rid of at least five more unnecessary pages in the search engine indexes.

After all, even with the login module disabled, search engine spiders still find links to registration, login, password recovery, etc.

Thank you for your attention.

mbabker - comment - 17 Feb 2015

Perhaps Google copes well with two identical pages (for Bakual), but why should we have them? It is logical to have only one such page. Moreover, by adding the exclusion line for /component/, we get rid of at least five more unnecessary pages in the search engine indexes.

As I noted, your change with that line would effectively block search engines from indexing ALL routes that use the default Joomla SEF implementation (where routes are named /component/<component_name>/<view_name>), and I don't think that is a change we should make to our default robots.txt file. For what you want to accomplish, I really think you need more specific rules that won't cause potential issues with other valid routes.
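
The difference between a broad `/component/` rule and a more specific one can be sketched in Python as follows. This is only an illustration with plain prefix matching; the helper `is_blocked` and the sample routes are hypothetical and merely follow the /component/<component_name>/<view_name> pattern described above.

```python
def is_blocked(disallow_prefixes, url_path):
    """Plain prefix matching: a path is blocked if it starts with any Disallow value."""
    return any(url_path.startswith(prefix) for prefix in disallow_prefixes)


# Hypothetical routes, invented for the example.
SAMPLE_ROUTES = [
    "/component/users/login",
    "/component/users/registration",
    "/component/content/article",
    "/about-us",
]

BROAD_RULES = ["/component/"]           # the kind of line under discussion
SPECIFIC_RULES = ["/component/users/"]  # a narrower, hypothetical alternative

for route in SAMPLE_ROUTES:
    print(
        route,
        "| broad:", is_blocked(BROAD_RULES, route),
        "| specific:", is_blocked(SPECIFIC_RULES, route),
    )
```

With the broad rule every `/component/...` route is blocked, including the content route, whereas the narrower rule only affects the user-related routes.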

richard67 - comment - 17 Feb 2015

As stated above by @bertmert (but ignored during the further discussion), the globbing in the Disallow lines is NOT part of the robots.txt standard. It is a Google extension which is not standard and not supported by most other search engines.


Bakual - comment - 17 Feb 2015

Based on the overall feedback so far, I'm closing this PR.

Feel free to create a new one with the feedback included.

Bakual - change - 17 Feb 2015
Status: Pending -> Closed
Closed_Date: 0000-00-00 00:00:00 -> 2015-02-17 19:05:54
Bakual - close - 17 Feb 2015
