matrikular - 7 Jul 2017

Pull Request for Issue #16998.

Summary of Changes

Prevent the API URL from being indexed by search engines via the X-Robots-Tag header, which is acknowledged by at least Google and Bing.
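The mechanism is a single extra response header set by the server. Joomla itself is PHP and the actual patch is not reproduced here, so the following is only a framework-neutral sketch in Python, assuming a WSGI application; `RobotsTagMiddleware` and `json_app` are hypothetical names, and only the header name and directive values come from this PR.

```python
from typing import Callable, Iterable

# A WSGI application: (environ, start_response) -> iterable of bytes.
WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


class RobotsTagMiddleware:
    """Hypothetical middleware that adds an X-Robots-Tag header to every response."""

    def __init__(self, app: WSGIApp, directives: str = "noindex") -> None:
        self.app = app
        self.directives = directives

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start_with_robots(status, headers, exc_info=None):
            # Drop any existing X-Robots-Tag so the header is not duplicated.
            headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", self.directives))
            return start_response(status, headers, exc_info)

        return self.app(environ, start_with_robots)


def json_app(environ, start_response):
    """Hypothetical stand-in for the com_ajax JSON endpoint."""
    start_response("200 OK", [("Content-Type", "application/json; charset=utf-8")])
    return [b'{"success":true}']
```

Wrapping the app as `RobotsTagMiddleware(json_app, "noindex, nofollow")` and serving it with the standard library's `wsgiref.simple_server.make_server` should yield responses carrying the header. Replacing an existing X-Robots-Tag instead of appending a second one is a design choice of this sketch, not something the PR specifies.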

Testing Instructions

  1. Code review
  2. Call http(s)://domain.tld/index.php?option=com_ajax&format=json in your browser (or programmatically)
  3. Check the response headers for X-Robots-Tag with the value "noindex"
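Steps 2 and 3 can also be done programmatically. This is a minimal sketch using only Python's standard library, assuming the site is reachable at the placeholder URL from step 2; `robots_directives` and `fetch_headers` are hypothetical helper names. The parser simply splits on commas and does not handle bot-specific prefixes such as `googlebot: noindex`.

```python
import urllib.request
from email.message import Message


def robots_directives(headers: Message) -> set[str]:
    """Collect every directive from all X-Robots-Tag headers, lowercased."""
    directives = set()
    for value in headers.get_all("X-Robots-Tag") or []:
        for part in value.split(","):
            part = part.strip().lower()
            if part:
                directives.add(part)
    return directives


def fetch_headers(url: str) -> Message:
    """Request the URL and return its response headers (an HTTPMessage)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.headers
```

For example, `"noindex" in robots_directives(fetch_headers("https://domain.tld/index.php?option=com_ajax&format=json"))` should evaluate to `True` once the patch is applied.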

Expected result

The X-Robots-Tag should be present in the response header.

Actual result

The X-Robots-Tag is not yet present in the response header.

matrikular - open - 7 Jul 2017
matrikular - change - 7 Jul 2017
Status: New → Pending
joomla-cms-bot - change - 7 Jul 2017
Category: Front End, com_ajax
matrikular - change - 7 Jul 2017
The description was changed
matrikular - edited - 7 Jul 2017
franz-wohlkoenig - comment - 7 Jul 2017

@joo7 please test whether this PR solves your reported issue #16998.

matrikular - change - 7 Jul 2017
Labels Added: PR-staging
bembelimen - test_item - 7 Jul 2017 - Tested successfully
bembelimen - comment - 7 Jul 2017

I have tested this item successfully on b7d1585


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/17012.

andrepereiradasilva - comment - 8 Jul 2017

Does Google index pages with a JSON content type?
Couldn't the real problem simply be that the page is sent with an HTML content type instead of the JSON content type it should have?

matrikular - comment - 8 Jul 2017

@andrepereiradasilva I wasn't sure, so I addressed that in this PR as well. The X-Robots-Tag would still be needed for non-JSON content.

andrepereiradasilva - comment - 8 Jul 2017

OK, then I'm not sure either.

matrikular - comment - 9 Jul 2017

After speaking with Christopher from the SEO Team, we decided to add the nofollow directive to the header as well. A response from com_ajax could contain links that, with the noindex directive alone, would still be followed and, unless declared otherwise, indexed (recursion, crawl budget).
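The difference between `noindex` alone and `noindex, nofollow` can be modelled as two independent permissions. This is a simplified, hypothetical model (`crawler_policy` and `CrawlerPolicy` are not taken from any crawler's code); it treats `none` as shorthand for `noindex, nofollow`, as Google documents, and ignores bot-specific prefixes and all other directives.

```python
from typing import NamedTuple


class CrawlerPolicy(NamedTuple):
    may_index: bool   # may the response itself appear in search results?
    may_follow: bool  # may links found in the response be crawled?


def crawler_policy(tag_value: str) -> CrawlerPolicy:
    """Simplified reading of an X-Robots-Tag value; unknown directives are ignored."""
    directives = {d.strip().lower() for d in tag_value.split(",") if d.strip()}
    if "none" in directives:
        # "none" is shorthand for "noindex, nofollow".
        directives |= {"noindex", "nofollow"}
    return CrawlerPolicy(
        may_index="noindex" not in directives,
        may_follow="nofollow" not in directives,
    )
```

With `noindex` alone, `may_follow` stays `True`, which is exactly the gap described in the comment: links inside a com_ajax response could still be crawled.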

Meanwhile, we're looking for an answer as to how those URLs got into the index.

andrepereiradasilva - comment - 9 Jul 2017

> Meanwhile we're looking for an answer as to how those urls got into the index.

My best guess: the keepalive behaviour (which is added, for instance, to login forms), combined with the fact that the JSON page is served (before this PR) with a text/html content type.

The keepalive behaviour, which keeps the session from expiring, periodically makes AJAX calls through com_ajax.
See:

matrikular - comment - 9 Jul 2017

@andrepereiradasilva without further digging, this could be the reason for it, yes. Thank you.

roland-d - test_item - 9 Jul 2017 - Tested successfully
roland-d - comment - 9 Jul 2017

I have tested this item successfully on 16d8c59

Before the patch, the headers are:

Date: Sun, 09 Jul 2017 19:56:00 GMT
Server: Apache
X-Powered-By: PHP/7.1.1
Set-Cookie: 17f10dcca6df4577e8dbae863c989bf6=79d6d48c837b471a2678414e0e3342e6; path=/; HttpOnly
Content-Disposition: attachment; filename="joomla.json"
Expires: Wed, 17 Aug 2005 00:00:00 GMT
Last-Modified: Sun, 09 Jul 2017 19:56:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: application/json; charset=utf-8

After the patch:

Date: Sun, 09 Jul 2017 19:56:28 GMT
Server: Apache
X-Powered-By: PHP/7.1.1
Set-Cookie: 17f10dcca6df4577e8dbae863c989bf6=eab98dfce97b0f4b911011670f7167e5; path=/; HttpOnly
X-Robots-Tag: noindex, nofollow
Content-Disposition: attachment; filename="joomla.json"
Expires: Wed, 17 Aug 2005 00:00:00 GMT
Last-Modified: Sun, 09 Jul 2017 19:56:28 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: application/json; charset=utf-8

The X-Robots-Tag is present.
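To compare two raw header dumps like these without eyeballing them, a small hypothetical helper can parse the `Name: value` lines and report which header names were added or removed. It compares names only, so values that legitimately change between requests (Date, Set-Cookie, Last-Modified) do not show up as differences.

```python
def parse_header_dump(dump: str) -> dict[str, str]:
    """Parse 'Name: value' lines into a dict keyed by lowercased header name.

    Repeated headers are joined with ', ' (a simplification; Set-Cookie
    should not normally be joined this way).
    """
    headers: dict[str, str] = {}
    for line in dump.splitlines():
        # Split on the first colon only, so values like dates keep their colons.
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            continue  # skip blank or malformed lines
        key = name.strip().lower()
        value = value.strip()
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return headers


def added_and_removed(before: str, after: str) -> tuple[set[str], set[str]]:
    """Return (header names only in `after`, header names only in `before`)."""
    b, a = parse_header_dump(before), parse_header_dump(after)
    return set(a) - set(b), set(b) - set(a)
```

Fed the two dumps above, `added_and_removed` should report only `x-robots-tag` as added.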



joo7 - comment - 9 Jul 2017

The ``` (three backticks) after "Content-Type: application/json; charset=utf-8" at the end of the last line looks wrong?

roland-d - comment - 9 Jul 2017

@joo7 Where do you see that?

joo7 - comment - 9 Jul 2017

Now it is gone and looks OK. Sorry, I guess GitHub and/or the J!Tracker Application added it...
Please ignore that comment.

mbabker - comment - 9 Jul 2017

If you're talking about the triple tickmark thing, the issue tracker sometimes doesn't handle appending its comment footer to a user's input well (especially when the comment ends with some kind of Markdown symbol like the tickmark). It was just badly formatted output, not an actual indication of what his browser was returning.

joo7 - comment - 9 Jul 2017

Ah yes, the triple tickmark is gone now. Thank you for the clarification.

joo7 - comment - 12 Jul 2017

"Remove mime type set to json so it can be fixed in library"
Now that the mime type is removed, is it necessary to create a new issue to fix it in /libraries/joomla/document/json.php?

matrikular - comment - 12 Jul 2017

@joo7 Yes, I'm working on that.

franz-wohlkoenig - comment - 26 Oct 2017

@matrikular any update?



wilsonge - change - 1 Apr 2020
Status: Pending → Fixed in Code Base
Closed_Date: 0000-00-00 00:00:00 → 2020-04-01 18:25:17
Closed_By: wilsonge
Labels Added: ?
wilsonge - close - 1 Apr 2020
wilsonge - merge - 1 Apr 2020
wilsonge - comment - 1 Apr 2020

Merging this on review with the one good test
