Joomla! Issue Tracker | Joomla! CMS #6098

Status: Closed (17 Feb 2015)
Priority: Medium
Build: staging
PR: #6098
Kostelano:patch-4

The Travis CI build passed.

Kostelano
16 Feb 2015

Do not index the links for login, registration, and username or password recovery:

http://SITE.COM/component/users/?view=login
http://SITE.COM/component/users/?view=registration
http://SITE.COM/component/users/?view=remind
http://SITE.COM/component/users/?view=reset

Do not index the "send the article to a friend" link:
http://SITE.COM/component/mailto/?tmpl=component&template=protostar&link=6ea8583f082e256dc6b945104b150650da776e58

Do not index content assigned to the "uncategorised" category. These pages are duplicates.

Do not index these links:

http://SITE.COM/YOUR_CATEGORY?limitstart=0
http://SITE.COM/YOUR_CATEGORY?start=5
http://SITE.COM/YOUR_CATEGORY?start=10

The information on http://tool.motoricerca.info/robots-checker.phtml is out of date: the * symbol can be used.
http://savepic.su/5070990.png

560ecce 16 Feb 2015

Update robots.txt.dist

Kostelano - open - 16 Feb 2015

brianteeman - comment - 16 Feb 2015

Do not index content assigned to the "uncategorised" category. These pages are duplicates.

How is this automatically duplicate content?

This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/6098.

bertmert - comment - 17 Feb 2015

None of these new lines conform to the robots.txt standard.

http://www.robotstxt.org/robotstxt.html
See the remark there: "Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines...."

And I've never seen a plus sign in robots.txt standards. What is it good for?

Bakual - comment - 17 Feb 2015

@Kostelano Can you elaborate and answer the questions raised?
What you propose here doesn't make much sense to me.

Kostelano - comment - 17 Feb 2015

About "uncategorised": the "Uncategorised" category is almost never renamed. Any article assigned to this category that also gets a link from a menu (for example) ends up with a duplicate URL:

http://SITE.COM/content_1/
http://SITE.COM/content_1/2-uncategorised/

This happens almost every time.

Sorry, the "+" symbol was added accidentally when copying. Of course it should not be there!

Having heard your opinions, I suggest making the rules more specific so that they do not cause problems with menus (for example), as someone noted above.
For example:

Disallow: /*?tmpl=component&print=*
Disallow: /*?start=*
Disallow: /*?limitstart=* # duplicates the first page of a category that has page navigation
Disallow: /*-uncategorised

The "?start=" rule can be left out, because those pages are not duplicates. They are just not interesting to users.

Perhaps Google handles two identical pages well (for Bakual), but why should we have them? Logically, there should be only one such page. In addition, by adding the exclusion rule for /component/, we get rid of at least five more unnecessary pages in the search engine index.

After all, even with the login module disabled, search engine spiders still find the links to registration, login, password recovery, etc.

Thank you for your attention.

mbabker - comment - 17 Feb 2015

Perhaps Google handles two identical pages well (for Bakual), but why should we have them? Logically, there should be only one such page. In addition, by adding the exclusion rule for /component/, we get rid of at least five more unnecessary pages in the search engine index.

Like I noted, your change with that line would effectively block search engines from indexing ALL routes using the default Joomla SEF implementation (where routes are named /component/<component_name>/<view_name>), and I don't think that is a change we should make to our default robots.txt file. For what you're trying to accomplish, I really think you need more specific rules that aren't going to cause potential issues with other valid routes.
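
To illustrate the concern, here is a minimal Python sketch. It assumes the contested rule is `Disallow: /component/` (the exact line from the diff is not shown in this thread) and that crawlers apply a Disallow value as a literal path prefix. The `users` and `mailto` routes come from the PR description; the `/component/content/` and `/component/search/` routes are hypothetical examples of other valid routes a site might expose.

```python
# Sketch: which routes a literal-prefix Disallow rule would block.
# ASSUMPTION: the rule under discussion is "Disallow: /component/".
RULE = "/component/"

ROUTES = [
    "/component/users/?view=login",         # from the PR description
    "/component/users/?view=registration",  # from the PR description
    "/component/mailto/?tmpl=component",    # shortened from the PR description
    "/component/content/article/1",         # hypothetical valid route
    "/component/search/",                   # hypothetical valid route
    "/content_1/",                          # menu-based route from this thread
]


def is_blocked(rule: str, path: str) -> bool:
    """Return True if the path starts with the Disallow value."""
    return path.startswith(rule)


blocked = [route for route in ROUTES if is_blocked(RULE, route)]
allowed = [route for route in ROUTES if not is_blocked(RULE, route)]

print("blocked:", blocked)
print("allowed:", allowed)
```

Under these assumptions, every route that falls back to the `/component/...` form is blocked together with the login and registration links, and only the menu-based route stays crawlable.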

richard67 - comment - 17 Feb 2015

As stated above by @bertmert (but ignored during further discussions), the globbing in the disallow lines is NOT robots.txt standard. It is a Google extension which is not standard and not supported by most other search engines.
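
The difference between the two interpretations can be sketched in Python. This is an illustration only, not how any particular crawler is implemented: `matches_prefix` treats the Disallow value as a literal prefix, as described on robotstxt.org, while `matches_wildcard` is a hypothetical implementation of the `*` extension. Both function names are made up for this example; the rules and URLs are taken from earlier in this thread.

```python
import re


def matches_prefix(rule: str, path: str) -> bool:
    """Literal semantics: the rule is a plain path prefix, '*' has no special meaning."""
    return path.startswith(rule)


def matches_wildcard(rule: str, path: str) -> bool:
    """Extension semantics (assumed): '*' matches any run of characters."""
    # Escape each literal piece, then join the pieces with ".*" for every '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    # re.match anchors at the start of the path, like a prefix test.
    return re.match(pattern, path) is not None


CASES = [
    ("/*?start=*", "/YOUR_CATEGORY?start=5"),
    ("/*-uncategorised", "/content_1/2-uncategorised/"),
]

for rule, path in CASES:
    print(rule, path,
          "prefix:", matches_prefix(rule, path),
          "wildcard:", matches_wildcard(rule, path))
```

A crawler that only does literal prefix matching would never block these URLs, because no real path begins with the characters `/*`. The proposed rules therefore only have an effect on crawlers that implement the wildcard extension.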

Bakual - comment - 17 Feb 2015

Based on the overall feedback so far, I'm closing this PR.

Feel free to create a new one with the feedback included.

Bakual - change - 17 Feb 2015

Status	Pending	⇒	Closed
Closed_Date	0000-00-00 00:00:00	⇒	2015-02-17 19:05:54

Bakual - close - 17 Feb 2015
