UberGruber - open - 22 Jul 2014

Hey there, as per title I have this question.
I was looking at the rendered page of a "Fetch as Google and render" request in Google Webmaster Tools, and was bugged by the "partial" notification on it. Since robots.txt was blocking the template folder, all CSS and JS files were unreachable, so the rendered page was totally broken.

I've searched around a bit and found some interesting points on this. It looks like this could be quite harmful from an SEO perspective.

http://www.seoblog.com/2014/07/blocking-css-harmful-panda-4-0/

http://www.freshegg.com/blog/blocking-css-javascript-google-authorship

What is your take on this? Why is the template folder blocked in robots.txt?

Thanks for any info!

Bakual - comment - 22 Jul 2014

Then the same should apply to the media folder, I think, since ideally that's where all the CSS and JS is stored.

Can you make a Pull Request to remove that from the robots.txt.dist file?
See http://docs.joomla.org/Using_the_Github_UI_to_Make_Pull_Requests for how to do that.

beat - comment - 22 Jul 2014

Agreed. I heard the same at an SEO talk at JDay Paris 2014.

In that case, images too, as blocking them prevents images.google.com from indexing. I would nevertheless add a comment section at the top of the robots.txt file and keep those removed lines there, commented out, with instructions to uncomment them in case images should not be indexed.

Bakual - comment - 22 Jul 2014

The images folder isn't excluded anymore in our default robots.txt.dist file. See https://github.com/joomla/joomla-cms/blob/staging/robots.txt.dist
That was changed earlier this year.

UberGruber - comment - 22 Jul 2014

Sorry, I've been out all day and haven't had time to read the email notifications. I will change the file ASAP, but I'm still wondering why those two folders were filtered in the first place. Is there any particular reason?
If there is, we could add a sort of exception rule for those two folders, allowing access only to CSS and JS files, with something like this:

Allow: /templates/*.css$
Allow: /templates/*.js$
Allow: /media/*.css$
Allow: /media/*.js$

What do you say?
More info on matching-rule examples at the bottom of this page:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
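For illustration, here is a sketch of what such an exception-based robots.txt could look like. The paths are examples, not the actual Joomla defaults; note that the directive is `Allow:` (not `Allowed:`), and that wildcard `*` and end-of-path `$` matching are Google-style extensions rather than part of the original robots exclusion standard, so not every crawler honors them:

```
User-agent: *
Disallow: /templates/
Disallow: /media/
# Google-style exceptions so the renderer can still fetch assets
# even though the folders themselves stay blocked:
Allow: /templates/*.css$
Allow: /templates/*.js$
Allow: /media/*.css$
Allow: /media/*.js$
```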

UberGruber - comment - 25 Jul 2014

I've been tinkering around and I don't think it's worth adding exceptions for files in those folders, so I'll submit a pull request that simply removes those folders. Any info on why those folders were included in the first place?

Bakual - comment - 25 Jul 2014

I don't think there was a particular reason. When it was written it may well have been best practice, because Google didn't care much about this stuff back then. Today that seems to be different, so it makes sense to change it.
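As a quick sanity check of the effect of removing those lines, crawlability can be tested with Python's standard `urllib.robotparser` (it handles plain-path `Disallow`/`Allow` rules but not Google's wildcard extensions). The rule sets and URL below are simplified examples, not the exact contents of robots.txt.dist:

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_lines, url, agent="Googlebot"):
    """Parse robots.txt given as a list of lines and check one URL."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(agent, url)

# Simplified old rules: template and media folders blocked outright.
old_rules = ["User-agent: *", "Disallow: /templates/", "Disallow: /media/"]
# Simplified new rules: those folders no longer blocked.
new_rules = ["User-agent: *", "Disallow: /administrator/"]

# Hypothetical stylesheet path used only for this check.
css = "https://example.org/templates/protostar/css/template.css"

print(can_fetch(old_rules, css))  # False: the renderer cannot fetch the CSS
print(can_fetch(new_rules, css))  # True: the blocked-resource problem is gone
```

With the old rules the stylesheet is uncrawlable, which is exactly why "Fetch as Google and render" reported a partial, broken render.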

zero-24 - close - 25 Jul 2014
Bakual - comment - 25 Jul 2014

Closing this in favor of the actual PR #3965 so comments are in one place. Thanks!

Bakual - change - 25 Jul 2014
Status: New → Closed
Closed_Date: 2014-07-25 14:46:33
Bakual - close - 25 Jul 2014
robwent - comment - 15 Oct 2014

Wouldn't this also apply to modules and plugins that load their own JS/CSS files?

brianteeman - comment - 15 Oct 2014

They should be loading them from the /media folder; that is what it is for. If they are not, then yes, you would need to update your robots.txt file for those specific badly written extensions.

robwent - comment - 15 Oct 2014

Fair enough.

zero-24 - change - 7 Jul 2015
