Hey there, as the title says, I have a question.
I was looking in Google Webmaster Tools at the rendered page from a "Fetch as Google and render" request, and was bugged by the "partial" notification on it. Since robots.txt was blocking the template folder, all CSS and JS files were unreachable, and so the rendered page was totally broken.
I've searched around a bit and found some interesting points on this. It looks like this could be quite harmful from an SEO perspective.
http://www.seoblog.com/2014/07/blocking-css-harmful-panda-4-0/
http://www.freshegg.com/blog/blocking-css-javascript-google-authorship
What is your take on this? Why is the template folder blocked in robots.txt?
Thanks for any info!
Agreed. I heard the same at an SEO talk at JDay Paris 2014.
In that case, the same goes for images, as blocking that folder prevents indexing by images.google.com. I would nevertheless add a comment section at the top of the robots.txt file and keep those removed lines commented out in there, with an instruction to uncomment them in case images should not be indexed.
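A sketch of what such a comment section might look like (hypothetical wording; the exact text would be up to whoever writes the file):

```
# If you do NOT want your images indexed by images.google.com,
# uncomment the following line:
# Disallow: /images/
```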
The images folder isn't excluded anymore in our default robots.txt.dist file, see https://github.com/joomla/joomla-cms/blob/staging/robots.txt.dist. That was changed earlier this year.
Sorry, I've been out all day and haven't had time to read the email notifications. I will change the file ASAP. Still, I'm wondering why those two folders were filtered in the first place. Is there any particular reason?
If there is, we could add a sort of exception rule for those two folders, allowing access only to CSS and JS files with something like this:
Allow: /templates/*.css$
Allow: /templates/*.js$
Allow: /media/*.css$
Allow: /media/*.js$
What do you say?
More info here, at the bottom of the page, with examples of the matching rules:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
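To illustrate how those matching rules behave, here is a small Python sketch (my own hypothetical helper, not part of any Joomla code) that mimics Google's documented matching: `*` matches any run of characters, a trailing `$` anchors the end of the path, plain patterns match as prefixes, and the longest matching rule wins, with Allow winning ties:

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a Google-style robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Patterns without '$' match as path prefixes.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        regex += "$"
    return re.compile(regex)


def path_allowed(path: str, allow_rules, disallow_rules) -> bool:
    """Return True if crawling `path` is allowed.

    The longest matching rule wins; on a length tie, Allow wins
    (per Google's documented precedence). No matching rule = allowed.
    """
    best_len = -1
    allowed = True
    for pat in disallow_rules:
        if robots_pattern_to_regex(pat).match(path) and len(pat) > best_len:
            best_len, allowed = len(pat), False
    for pat in allow_rules:
        if robots_pattern_to_regex(pat).match(path) and len(pat) >= best_len:
            best_len, allowed = len(pat), True
    return allowed
```

With `Disallow: /templates/` plus the `Allow: /templates/*.css$` exception above, a template stylesheet would be crawlable while the rest of the folder stays blocked.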
I've been tinkering around and I guess it's not worth adding exceptions for files in those folders; I'll open a pull request with those folders simply commented out. Any info on why those folders were included in the first place?
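For reference, the change could look something like this in robots.txt.dist (just a sketch; the actual file has more Disallow lines, as in the GitHub link earlier in the thread):

```
User-agent: *
Disallow: /administrator/
# Commented out so crawlers can fetch template and extension CSS/JS:
# Disallow: /templates/
# Disallow: /media/
```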
I don't think there was a particular reason. Maybe at the time it was written it was indeed best practice, because Google didn't care much about this stuff anyway. Today that seems to be different, so it makes sense to change it.
Status | New | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2014-07-25 14:46:33 |
Wouldn't this also apply to modules and plugins that load their own JS/CSS files?
They should be loading them from the /media folder; that is what it is for. If they are not, then yes, you would need to update your robots.txt file for those specific badly written extensions.
Fair enough.
Then the same should apply to the media folder I think, as ideally all the CSS and JS is stored there. Can you make a Pull Request to remove that from the robots.txt.dist file?
See http://docs.joomla.org/Using_the_Github_UI_to_Make_Pull_Requests for how to do that.