User tests: Successful: Unsuccessful:
In July 2015 a lot of people got "Googlebot cannot access CSS and JS files" notifications from Google.
This PR implements the fix described here:
http://upcity.com/blog/how-to-fix-googlebot-cannot-access-css-and-js-files-error-in-google-search-console/
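For context, a minimal sketch of the kind of rule the linked post describes, assuming the two Allow lines quoted later in this thread by the robots.txt checker and a Googlebot-specific User-agent block as discussed below:

    # Sketch only: the Allow values match the lines the checker quotes
    # further down (robots.txt lines 31/32); the Googlebot User-agent
    # block is assumed from the Googlebot-specific discussion.
    User-agent: Googlebot
    Allow: .js
    Allow: .css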
Status | New | ⇒ | Pending |
Labels | Added: ? |
Easy | No | ⇒ | Yes |
Category | ⇒ | Front End |
We might just remove the Disallow rule for plugins, components and modules. All of these might contain .js and .css files.
I would also remove cache, since some plugins put cached images there (e.g. ImageSizer).
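To make that alternative concrete, a minimal sketch of what it would mean in robots.txt.dist; only the four folders mentioned above are shown, and the surrounding file contents are assumed:

    User-agent: *
    # The suggestion above is to drop these Disallow lines so crawlers
    # can reach .js/.css (and cached image) files placed in them
    Disallow: /plugins/
    Disallow: /components/
    Disallow: /modules/
    Disallow: /cache/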
We might just remove the Disallow rule for plugins, components and modules. All of these might contain .js and .css files.
No. Those folders should not contain any js or css files to begin with (if the extension is properly developed), and Google should not index any of the other files in there.
Same for cache, there are files in there which definitely should not be indexed by Google.
If the proposed code here works, then that would be an acceptable solution imho.
No. Those folders should not contain any js or css files to begin with (if the extension is properly developed), and Google should not index any of the other files in there.
Then why do we need this patch at all?
The key words in that post are "properly developed". Extensions which don't follow best practices by placing web assets in the media folder prevent you from doing things like overriding the media with template level overrides, and they leave the files blocked from being indexed by bots unless you give explicit permissions.
So, the patch is really only needed if you are using extensions which don't use the images and media folders for assets that should be publicly accessible.
Of course, the other option is to just stop shipping a robots.txt file. Based on feedback in the forums, it seems that file is a major source of confusion and misunderstanding.
Also keep in mind that any change we do is only made to the robots.txt.dist file. The real robots.txt doesn't get changed, so users who face this issue need to change it manually anyway.
We can of course add yet another postinstall message saying that we updated the robots.txt.dist file, to test how many users actually read those.
Joomla 3.4.4 test was as expected.
I have tested this item successfully on 04aba94
works for me
Status | Pending | ⇒ | Ready to Commit |
Labels | Added: ? |
Still wondering: why only allow this for Googlebot? If this is a valid solution, why not allow all "bots" to index .js and .css files?
Milestone | Added: |
I experienced a problem with the Google bot and therefore I implemented it only for Google bot.
Do other search engines check the js & css as well?
I don't know if other search engines do or not. But I'd say if they don't do it yet today, they probably will sooner or later. And I don't want a PR every time a search engine adds that feature.
Also is there a reason why they shouldn't index it when Google is allowed? Probably not.
From my understanding I would just remove the Google bot limitation and allow it for anyone. But then, I would first throw out those stupidly built extensions anyway
Ok, good point!
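For reference, the generic variant being agreed on here would presumably drop the Googlebot restriction; this is a sketch, not the exact committed change:

    # Assumed generic form: same Allow lines, applied to every crawler.
    # Note (see the checker output below) that Allow is a non-standard
    # extension, so strict Robots Exclusion Standard parsers reject it.
    User-agent: *
    Allow: .js
    Allow: .css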
This PR has received new commits.
CC: @coolman01, @joomlamarco
Can you guys please test so we can merge it into 3.5? Thanks.
The checker that we link to in the robots.txt file itself says that this is invalid:

Line 31: Allow: .js
Unknown command. Acceptable commands are "User-agent" and "Disallow". A robots.txt file doesn't say what files/directories you can allow, only what you can disallow. Please refer to the Robots Exclusion Standard page for more information.

Line 32: Allow: .css
Unknown command. Acceptable commands are "User-agent" and "Disallow". A robots.txt file doesn't say what files/directories you can allow, only what you can disallow. Please refer to the Robots Exclusion Standard page for more information.
Google's robots.txt specification expands on the actual robots.txt standard to add support for non-standard commands. So *IF* we want to add the Allow statements, it has to be limited to bots from crawlers that use an extended standard. I've said it before and I'll say it again: I still think it's a bad idea to keep adding Google and/or crap-extension-specific workarounds to core.
I do agree with @mbabker here: if this is a Google-specific change, we shouldn't allow it. The next person will come wanting a Baidu exception; where do we stop then?
I wasn't aware at first that it was a robot specific setting. If people want it, they can add it to their own robots file. If it were generic, we could add it.
Status | Ready to Commit | ⇒ | Pending |
Labels | Removed: ? |
Milestone | Removed: |
Closing this issue as the solution is basically in place: we should use the media folder for media. Thanks everybody for your contributions.
Status | Pending | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2015-11-02 12:44:26 |
Closed_By | ⇒ | roland-d |
I'll just leave this here: #6361 #6702 #6839 #6098