Do not index the links for login, registration, password recovery, etc.:
http://SITE.COM/component/users/?view=login
http://SITE.COM/component/users/?view=registration
http://SITE.COM/component/users/?view=remind
http://SITE.COM/component/users/?view=reset
Do not index the "send the article to a friend" link:
http://SITE.COM/component/mailto/?tmpl=component&template=protostar&link=6ea8583f082e256dc6b945104b150650da776e58
Do not index content that is assigned to the "uncategorised" category. Such a page is a duplicate.
Do not index these links:
http://SITE.COM/YOUR_CATEGORY?limitstart=0
http://SITE.COM/YOUR_CATEGORY?start=5
http://SITE.COM/YOUR_CATEGORY?start=10
The information on the page http://tool.motoricerca.info/robots-checker.phtml is outdated; the * symbol can be used.
http://savepic.su/5070990.png
Labels | Added: ? |
All these new lines are NOT robots.txt standard.
http://www.robotstxt.org/robotstxt.html
See the remark there: "Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines...."
And I've never seen a plus sign in robots.txt standards. What is it good for?
@Kostelano Can you elaborate and answer the raised question?
What you propose here doesn't make much sense to me.
http://SITE.COM/content_1/
http://SITE.COM/content_1/2-uncategorised/
This almost always takes place.
After listening to your opinions, I suggest using more specific rules so that they do not create problems with, for example, the menus, as someone wrote above.
For example:
Disallow: /*?tmpl=component&print=*
Disallow: /*?start=*
Disallow: /*?limitstart=* # it duplicates the first page of a category with page navigation
Disallow: /*-uncategorised
The "?start=" rule can be left out, because those pages are not duplicates. They are just not interesting to users.
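To make it easier to see what the rules above would catch, here is a minimal Python sketch. It assumes the wildcard semantics that Google documents for its crawler (`*` matches any run of characters, a trailing `$` anchors the end, and otherwise a rule matches as a prefix). The helper names `rule_to_regex` and `is_blocked` are made up for this illustration, and other crawlers may interpret the same lines differently.

```python
import re

# The four rules proposed above, without the "Disallow:" prefix.
RULES = [
    "/*?tmpl=component&print=*",
    "/*?start=*",
    "/*?limitstart=*",
    "/*-uncategorised",
]

def rule_to_regex(rule):
    """Translate one wildcard rule into a compiled regex (assumed Google-style semantics)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal parts and let each "*" match any run of characters.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_blocked(path, rules=RULES):
    """Return True if any rule matches the path from its beginning."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

for path in (
    "/YOUR_CATEGORY?limitstart=0",
    "/YOUR_CATEGORY?start=5",
    "/content_1/2-uncategorised/",
    "/content_1/",
):
    print(path, is_blocked(path))
```

Under these assumptions the first three paths are reported as blocked and `/content_1/` is not, which is the intended effect of the proposal.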
After all, even with the authorization module turned off, search engine spiders still find the links to registration, login, password recovery, etc.
Thank you for your attention.
Perhaps Google handles two identical pages well (for Bakual), but why should we have them? It is logical to have only one such page. Besides, by adding an exclusion line for /component/, we get rid of at least five more unnecessary pages in the search engine indexes.
Like I noted, your change with that line would effectively block search engines from indexing ALL routes using the default Joomla SEF implementation (where routes are named /component/<component_name>/<view_name>), and I don't think that is a change that we should make to our default robots.txt file. For what you're suggesting to accomplish, I really think you need more specific rules that aren't going to cause potential issues with other valid routes.
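The concern about the /component/ prefix can be checked with Python's standard-library `urllib.robotparser`. This is only a sketch: it assumes the proposed line is `Disallow: /component/`, the helper name `blocked_urls` is invented here, and the article URL is a hypothetical example of a default SEF route rather than one taken from this thread.

```python
from urllib.robotparser import RobotFileParser

# Assumed form of the proposed rule.
ROBOTS = [
    "User-agent: *",
    "Disallow: /component/",
]

URLS = [
    "http://SITE.COM/component/users/?view=login",
    "http://SITE.COM/component/content/article/1",  # hypothetical default SEF route
    "http://SITE.COM/YOUR_CATEGORY",
]

def blocked_urls(robots_lines, urls, agent="*"):
    """Return the URLs that the given robots.txt lines forbid for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [url for url in urls if not parser.can_fetch(agent, url)]

for url in blocked_urls(ROBOTS, URLS):
    print("blocked:", url)
```

Both /component/ URLs come back as blocked, not just the login link, because the rule is matched as a path prefix.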
As stated above by @bertmert (but ignored during further discussions), the globbing in the disallow lines is NOT robots.txt standard. It is a Google extension which is not standard and not supported by most other search engines.
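As one illustration of a parser that does not expand wildcards, the sketch below feeds a globbing rule to Python's standard-library `urllib.robotparser`, which as far as I can tell only does plain prefix matching on the path. The helper name `is_allowed` is made up for the example, and this says nothing about how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_lines, url, agent="*"):
    """Ask the standard-library parser whether the agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)

GLOB_RULE = ["User-agent: *", "Disallow: /*?start=*"]
PREFIX_RULE = ["User-agent: *", "Disallow: /YOUR_CATEGORY"]

url = "http://SITE.COM/YOUR_CATEGORY?start=5"

# The "*" is treated as a literal character, so the glob rule never matches.
print("glob rule allows fetch:", is_allowed(GLOB_RULE, url))
# A plain prefix rule is understood and blocks the same URL.
print("prefix rule allows fetch:", is_allowed(PREFIX_RULE, url))
```

With this parser the globbing rule leaves the paginated URL fetchable, while the plain prefix rule blocks it.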
Based on the overall feedback so far, I'm closing this PR.
Feel free to create a new one with the feedback included.
Status | Pending | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2015-02-17 19:05:54 |
How is this automatically duplicate content?
This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/6098.