For relevant information and confirmation that this is an issue with the host (Rochen), please see curl/curl#2836. Also pinging @wilsonge on this.
Yesterday I observed that when I tried fetching https://downloads.joomla.org/latest with cURL 7.58.0 (Ubuntu 18.04), the response was a GZip stream instead of the raw HTML I was expecting. If I pipe it through gunzip it returns the HTML I expected. This was not an issue with earlier versions of cURL such as 7.47.0 (on Ubuntu 16.04).
After a substantial amount of debugging I found out that the problem is with the way Rochen's servers behave in the absence of an HTTP Accept-Encoding header. They return a GZip stream with the correct Content-Encoding header. However, newer versions of cURL do not act on that header and return the raw GZip-encoded stream unless the --compressed flag is set. As you can see, the cURL maintainers have confirmed this is a server issue, not an issue with cURL itself.
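To illustrate what the --compressed flag changes, here is a minimal Python sketch of the decoding step a client has to perform; `decode_body` is a hypothetical helper written for this illustration, not curl code:

```python
import gzip

def decode_body(body: bytes, headers: dict) -> bytes:
    """Mimic what `curl --compressed` does: honour the Content-Encoding
    response header. Plain curl 7.58+ skips this step when it never sent
    Accept-Encoding, so the caller sees the raw GZip stream instead."""
    encoding = headers.get("Content-Encoding", "identity").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError("unsupported Content-Encoding: " + encoding)

# Simulate the misbehaving server: it compresses the body even though
# the client never advertised gzip support.
html = b"<html>latest</html>"
print(decode_body(gzip.compress(html), {"Content-Encoding": "gzip"}))  # b'<html>latest</html>'
```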
The most immediate implication is that using cURL to retrieve the latest Joomla! version information from the Downloads site no longer works out of the box. Since Ubuntu 18.04 is the current LTS, I can foresee a lot of issues with people trying to automate installing Joomla! with it.
I also checked the CDN, where the actual update information comes from. Right now it looks like it responds correctly: an uncompressed stream when no Accept-Encoding is set. It also seems that it can only serve uncompressed streams. If in the future you get random update failures, check whether Rochen has changed the CDN to serve compressed streams even when it's not asked to (Joomla's cURL code would fail).
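That check boils down to comparing the request and response headers. A small sketch, in Python for illustration only (`unexpectedly_compressed` is a hypothetical helper operating on plain header dicts):

```python
def unexpectedly_compressed(request_headers: dict, response_headers: dict) -> bool:
    """True when the server applied a content-coding the client never
    advertised via Accept-Encoding -- the failure mode described above."""
    asked = request_headers.get("Accept-Encoding", "").lower()
    sent = response_headers.get("Content-Encoding", "identity").lower()
    return sent != "identity" and sent not in asked

# No Accept-Encoding sent, gzip received: the broken case.
print(unexpectedly_compressed({}, {"Content-Encoding": "gzip"}))  # True
# Client asked for gzip and got gzip: fine.
print(unexpectedly_compressed({"Accept-Encoding": "gzip"},
                              {"Content-Encoding": "gzip"}))      # False
```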
This takes me to the second issue. If Joomla at any point tries to fetch information off non-CDN project properties hosted on Rochen, you will get impossible-to-trace bugs. This would happen if URL fopen wrappers are unavailable and Joomla falls back to using cURL as the transport layer. As far as I can tell, there is no equivalent in libcurl for the CLI curl's --compressed flag. As a result, any server using libcurl 7.58.0 or later (e.g. an Ubuntu Server 18.04 LTS using the default PHP 7.2 distributed with it) will get the compressed stream back, causing core code to bug out.
As I have mentioned in the cURL issue, other servers do not do that. Moreover, the cURL maintainers have confirmed that the downloads.joomla.org server should not send a compressed stream when Accept-Encoding is not set.
Finally, I have filed a ticket with Rochen themselves about this. However, I can only ask them with regards to my own site, not Joomla's infrastructure. You will have to ask them to fix it. Or not, but then you might get the impossible-to-debug issues and lose a Sunday evening down a rabbit hole like yours truly :)
In the meantime, to overcome this issue, I am using wget in my script which automatically installs the latest Joomla! version. I just wanted to let you know.
Labels | Added: ? |
Category | ⇒ | Administration |
Status | New | ⇒ | Discussion |
@mbabker should this be moved to the https://github.com/joomla/joomla-websites list ?
Yeah. Unless there’s some asinine bug in the Joomla application classes
deciding to flip on gzip compression even though it’s turned off in global
config.
This is a Rochen issue. I have confirmed it. They do see that it happens only with HTTP/2. However, they don't seem to want to address it.
Per the HTTP/2 specification, the protocol's Semantics and Content are governed by RFC 7231. The wording which says so:
HTTP/2 is intended to be as compatible as possible with current uses of HTTP. This means that, from the application perspective, the features of the protocol are largely unchanged. To achieve this, all request and response semantics are preserved, although the syntax of conveying those semantics has changed.
Thus, the specification and requirements of HTTP/1.1 Semantics and Content [RFC7231], Conditional Requests [RFC7232], Range Requests [RFC7233], Caching [RFC7234], and Authentication [RFC7235] are applicable to HTTP/2. Selected portions of HTTP/1.1 Message Syntax and Routing [RFC7230], such as the HTTP and HTTPS URI schemes, are also applicable in HTTP/2, but the expression of those semantics for this protocol are defined in the sections below.
A request without an Accept-Encoding header field implies that the user agent has no preferences regarding content-codings. Although this allows the server to use any content-coding in a response, it does not imply that the user agent will be able to correctly process all encodings.
Nginx and Apache 2.4 will return the identity (uncompressed, non-encoded) stream in this case. So does CloudFlare. Basically, everyone does that except LiteSpeed, the web server Rochen is using and the one server which always takes a very liberal and incompatible stance in the way it implements things (like their caching module ignoring Expires headers, but that's another story for another time).
While in theory this behavior might be borderline correct, the end result is that clients making a request without an Accept-Encoding header receive unexpected compressed data they do not know how to process. Even worse, libcurl (used by the PHP cURL module) does not have an option to send an Accept-Encoding: identity header automatically; you need to do that yourself.
This means that right now we have a Joomla issue. When you are using Joomla\CMS\Http\Transport\CurlTransport against a Rochen server using HTTP/2 you will get an unexpected GZipped response. The only way to work around that as a developer is sending the Accept-Encoding: identity header in the $headers array when making a request. This is something most developers won't know to do because, frankly, it's both unexpected and an esoteric issue to boot.
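The workaround amounts to making the header explicit on every request. A sketch of the same idea in Python (urllib stands in for the PHP cURL transport here; the URL is the one from the report above, and no request is actually sent, we only show the header landing on the request object):

```python
import urllib.request

# Build the request with an explicit Accept-Encoding so the server's
# HTTP/2 content-coding guesswork never comes into play.
req = urllib.request.Request(
    "https://downloads.joomla.org/latest",
    headers={"Accept-Encoding": "identity"},
)

# urllib stores header names capitalised ("Accept-encoding").
print(req.get_header("Accept-encoding"))  # identity
```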
It looks to me that we have to modify Joomla\CMS\Http\Http::__construct(). After $this->options is set, we need to get headers, recursively merge it with array('Accept-Encoding' => 'identity'), and set it back into the $this->options Registry object to ensure that an Accept-Encoding header is set at all times. Either that, or modify CurlTransport::request() to apply a default Accept-Encoding header if none is set and $options[CURLOPT_ENCODING] has not been set yet. Even that is a band-aid covering only core code. Anyone using their own code will be unpleasantly surprised when they make a request to a Rochen server, like I was.
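The merge logic being proposed can be sketched like this (Python for illustration only; the real fix would live in PHP inside Http::__construct(), and `with_default_encoding` plus the sample header values are hypothetical names made up for this sketch):

```python
def with_default_encoding(headers):
    """Sketch of the proposed merge: start from a default Accept-Encoding
    and lay the caller's headers on top, so the header is always present
    but a caller-supplied value (in any capitalisation) wins."""
    merged = {"Accept-Encoding": "identity"}
    for name, value in (headers or {}).items():
        if name.lower() == "accept-encoding":
            merged.pop("Accept-Encoding", None)  # caller overrides the default
        merged[name] = value
    return merged

print(with_default_encoding({"User-Agent": "MyApp/1.0"}))
# {'Accept-Encoding': 'identity', 'User-Agent': 'MyApp/1.0'}
```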
Finally, I'd like to comment that, seeing that the problem is with LiteSpeed, I don't think Rochen will (or can) fix it in the foreseeable future. They'll probably just argue that it's part of the HTTP/2 specification (it's not) or see nothing wrong with it, like they did in my support ticket with them. So... good luck!
While in theory this behavior might be borderline
Only commenting here to say that maybe this behavior is legitimate,
(or maybe i misunderstood the issue and in HTTP/2 it should not be like in the RFC for HTTP/1.1)
If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding
Try searching for the above inside here:
https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
Try googling the same phrase, e.g.:
https://bitbucket.org/atlassianlabs/httplease/pull-requests/25/automatically-decompress-gzip-and-deflate/diff
@ggppdk Really, did you just try to mansplain this issue to me? Dude, I know how to search the Internet FFS. Actually I did search these things long before you did, I actually read the search results, I actually understood them, I even asked a world expert to make sure I was not missing anything and only then did I post here. It would seem that not only did you do none of the above, you didn't even try to read my post and follow the links I gave you.
I linked you to the HTTP/2 specification which tells you which RFC applies (RFC7231). I also linked you to the RFC7231. At the very top it tells you "Obsoletes: 2616". So what did you do? You are quoting RFC2616, the one obsoleted by RFC7231.
You also didn't read the HTTP/2 specification which tells you that they are using the same RFCs as HTTP/1.1 when possible with a few explicitly stated exceptions (none are applicable for RFC7231).
Moreover, I have already quoted you the part of RFC7231 section 5.3.4 which says that "Although this allows the server to use any content-coding in a response, it does not imply that the user agent will be able to correctly process all encodings."
If you read further ahead (about half a page) you will see that the proposed server algorithm does say that the server MAY return whatever encoding it wants.
HOWEVER, in practice, no other server sends arbitrary encodings not explicitly requested because of the previous paragraph I quoted: the client is not guaranteed to understand the arbitrary encoding. This is exactly what happens with cURL and that's the crux of the issue. If you had paid attention you'd have seen that coming. It's not like I didn't explicitly state it...
Again, the only web server I have encountered which sends compressed encoding is LiteSpeed and it only does it for HTTP/2 (even though HTTP/1.1 is also governed by the same RFC). If nothing else, that's an inconsistency in LiteSpeed. For me, that's a bug in the server.
I already told you that the correct mitigation in PHP code is to send an Accept-Encoding header, NOT to try automatically decompressing the incoming stream. The reason is not obvious: HTTP/2 also allows for header compression (see HPACK), meaning that a misbehaving server's response may be completely unreadable by libcurl, long before you have the chance to handle it in PHP code.
I'd like to end on a personal note. I have seen you doing the same things on many a Joomla issue: you skim the conversation, apparently make a 5' Google search, write a comment that makes no sense to anyone even remotely familiar with the subject and has been addressed above, people have to stop to explain why you're wrong and at this point the discussion is derailed. Can you please refrain from doing so? The world doesn't need yet another armchair pundit. There are things you do know and you should really comment on these. Just please don't waste everyone's time commenting on what you have no idea about. Thanks!
i have a good answer for what you wrote, but i have a feeling you are not in the mood to read it,
i am sorry that i have annoyed you
trying to spend time here (i mean this tracker, i do not mean this issue) to attempt to contribute something is a waste of time; you get insulted for very small reasons, 2 silly reasons in this case
again i am sorry for annoying you and for wasting your time (and my time)
please no more insults, thanks for understanding
As long as the reply does not reference an obsolete RFC, does not repeat what I have already said and does not imply that you never read my posts, I am always in the mood to read it.
Before you do, however, I have some more information. Rochen contacted me again on the ticket I had opened with them about this issue, after it was escalated to engineering. They consider that the issue is indeed a bug in how LiteSpeed implements HTTP/2 and are filing a bug report with LiteSpeed's developers.
To be perfectly clear, RFC7231 applies to both HTTP/1.1 and HTTP/2. It says two things about what to do in the absence of an Accept-Encoding header. One, the server MAY send the content encoded. Two, the client MAY NOT understand it. The words "MAY", "SHOULD" and "MUST" have special meaning in RFCs. Therefore, even though the server is allowed to send the content encoded however it wants, it should not expect the client to understand it. So, technically, sending GZipped data is allowed.
However, everyone but LiteSpeed has agreed that sending content encoded in a format the client is not guaranteed to understand is useless. At best it will result in the client asking for the content again in a supported encoding. At worst, the client will have no idea what's going on. That's why literally every other server does not send encoded content and that's why even Rochen engineers consider this a bug in LiteSpeed. More so when LiteSpeed does not exhibit this behavior for HTTP/1.1, which is governed by the same RFC with regards to content encoding.
For the Joomla! project this has two immediate effects.
First, people trying to use cURL to contact its servers will get unexpected results unless they use the --compressed option. This is non-obvious and it's likely to make people think that Joomla's servers are broken or that it's the fault of Joomla (the CMS), driving them away. That's how I found out about this issue. I was updating my Vagrant box to use Ubuntu 18.04, which ships an affected cURL version. My script to automatically install Joomla was failing, whereas my much simpler script to install WordPress wasn't. You see the point, yes?
Second, and independent of the previous issue, Joomla! comes with (the class formerly known as) JHttp which supports different transports. One of the transports is cURL. If a server does not support URL fopen wrappers, JHttp falls back to CurlTransport. These servers are all too common because hosts can't figure out that they should disable allow_url_include (a security threat!), not allow_url_fopen (a fairly innocuous method to contact third party servers). If a component using JHttp on this kind of server, with an affected version of cURL, tries to contact a LiteSpeed server over HTTPS, the content might be GZipped (or deflated, which is a different encoding!) and the script will bug out. This will be a common theme since Ubuntu Server 18.04 is affected and will remain in service until April 2023 at the earliest. For the developer debugging this issue it will appear that Joomla is broken. Hence my suggestion for a fix, which is really easy.
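As an aside, the gzip/deflate distinction mentioned above is real: both wrap the same DEFLATE algorithm, but with different framing, so a decoder for one rejects the other. A quick Python illustration (assuming, per common HTTP practice, that "deflate" means a zlib-wrapped stream):

```python
import gzip
import zlib

data = b"hello"
gz = gzip.compress(data)   # gzip framing: magic bytes, header, CRC trailer
raw = zlib.compress(data)  # zlib framing, what HTTP calls "deflate"

# Same payload, different and mutually incompatible containers:
print(gz != raw)  # True
print(gzip.decompress(gz) == data, zlib.decompress(raw) == data)  # True True

# A gzip decoder cannot read a deflate stream (OSError covers
# gzip.BadGzipFile on older Python versions too):
try:
    gzip.decompress(raw)
except OSError:
    print("gzip decoder rejects deflate data")
```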
Regarding the second point, I did see the solution you linked to, but as you can most definitely see it's declined (rejected). If you pay close attention, it's the same thinking as mine: it cannot possibly be the default since you cannot guarantee whether the server will send a compressed or uncompressed stream (or even what kind of compression it will use; gzip and deflate are two different algorithms). The correct solution is to have our client send an Accept-Encoding HTTP header.
I also did see that this is where you got the link to the obsolete RFC. It would have saved you a lot of time if you had read my previous reply, since I linked you to the correct section of the correct RFC (and even told you why) where you could have read the updated wording.
At the end of the day, I do not dispute that the RFC allows encoded data to be sent in this case. I dispute that it's the advisable thing to do and I warn that it will cause trouble. I mean, I actually said as much. I said in my reply that in theory the behavior is borderline correct but in practice it's not. So what was your point? Ignoring everything I wrote, linking me to an obsolete RFC and making the same point? Do you understand why I called your comment useless? It's because it does not serve any purpose.
If you have something interesting to add, please do. If you don't have something interesting to add, don't argue for the sake of arguing. And for crying out loud, don't tell people to google something when they have linked you to the updated RFC you failed to read and have already made your point. I mean, you do understand that this is insulting?
Apparently it was a bug in LiteSpeed and it’s fixed following my report to Rochen. I will quote their final reply to my ticket with them:
We have heard back from the LiteSpeed team and they reported that in the past some browsers would not send the Accept-Encoding header but expected Gzip compression over HTTP/2, so they built in this logic to address those cases. They've done some additional testing and said that this is no longer the case, so they've adjusted LiteSpeed's behavior to require the Accept-Encoding header to be advertised from a given client before compression is enabled.
The updated LiteSpeed build (5.2.8 build 6) has now been pushed to our servers and I've tested your site confirming the issue no longer persists.
The proposed change in CurlTransport could probably save some poor soul dealing with an out-of-date server, but since the upstream provider (LiteSpeed) fixed the issue in their latest release it's no longer a pressing matter. I am closing this issue as resolved.
Have a great night, y’all!
Status | Discussion | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2018-08-07 19:58:39 |
Closed_By | ⇒ | nikosdion |
Great stuff @nikosdion
Thanks for reporting this @nikosdion - much appreciated!
You're welcome, guys!
There's clearly a hosting level issue here. We don't have Gzip enabled in Joomla on the site, nor any compression related rules defined in .htaccess, and even better, when you load https://downloads.joomla.org/api-docs/ (which is a static HTML document) it has a content-encoding: br response header, which seems to imply either something in Joomla is enabling encoding or there's a server level setting in place that I'm unaware of. Lovely...