
steffans - 17 Sep 2021

This PR adds support for MD5 hashes for update file checksums. This allows easier integration with cloud-hosted files, which already provide different hashes as metadata.

For example, Google Cloud Storage; see: https://cloud.google.com/storage/docs/hashes-etags

steffans - open - 17 Sep 2021
steffans - change - 17 Sep 2021
Status: New → Pending
joomla-cms-bot - change - 17 Sep 2021
Category: Libraries
PhilETaylor - comment - 17 Sep 2021

MD5 is not as good as the other types of hashes. What good reason is there for MD5 over and above the better hashes? @joomla/security

zero-24 - comment - 17 Sep 2021

When this feature was first introduced, the decision was taken to only go with SHA-256, SHA-384, and SHA-512.

Please take a look at that comment and the following discussion:
#17619 (comment)

So I don't think we should start adding weaker hashing algorithms now.

PhilETaylor - comment - 17 Sep 2021

After reading the whole of #17619 I cannot see this PR being merged, especially as the de facto crypto expert Scott has commented in that thread.

After reading up on Google's use of the hashes, I still cannot see what justification there is for MD5 validation within the Joomla installer, even for Google-hosted files.

steffans - comment - 17 Sep 2021

The current implementation of isChecksumValid() actually checks all available hashes before returning the self::HASH_VALIDATED result; I am not sure if this is intended. Shouldn't one hash comparison be enough, since hashing takes some time for larger files?

As mentioned, sometimes only a few hashes are available when using cloud-hosted or CDN files, so it would be good to support more hash options. I don't see this kind of limitation mentioned in the linked hash discussion.

I would also sort them from strong to weak so that the client picks the strongest hash first if multiple hashes are provided: array('sha3-512', 'sha3-384', 'sha3-256', 'sha512', 'sha384', 'sha256', 'sha1', 'md5'). They can also be compared against the hashes supported by the PHP installation using hash_algos().
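
Something like this sketch, for illustration (hypothetical function names, not the actual Joomla implementation; md5/sha1 are left out of the list here given the concerns raised above):

```php
<?php
// Illustrative sketch only: pick the strongest hash that both the update
// XML provides and this PHP build supports, then validate just that one.
// $provided maps algorithm name => expected checksum from the update XML.
function pickStrongestHash(array $provided): ?string
{
    // Strongest first; the sha3 entries are the suggested additions.
    $preferred = ['sha3-512', 'sha3-384', 'sha3-256', 'sha512', 'sha384', 'sha256'];
    $supported = hash_algos();

    foreach ($preferred as $algo) {
        if (isset($provided[$algo]) && in_array($algo, $supported, true)) {
            return $algo;
        }
    }

    return null;
}

function isChecksumValidSketch(string $packageFile, array $provided): bool
{
    $algo = pickStrongestHash($provided);

    if ($algo === null) {
        return false; // No usable hash offered.
    }

    // hash_file() streams the file, so a large package is read only once.
    $actual = hash_file($algo, $packageFile);

    return $actual !== false && hash_equals(strtolower($provided[$algo]), $actual);
}
```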

PhilETaylor - comment - 17 Sep 2021

As mentioned sometimes only a few hashes are available when using cloud hosted or cdn files

Forgive me, as I've not been too close to this code for a while, but surely the hashes that the installer gets from the XML file are manually input into the content of the update XML file, and not generated by Google?

steffans - comment - 17 Sep 2021

Forgive me, as I've not been too close to this code for a while, but surely the hashes that the installer gets from the XML file are manually input into the content of the update XML file, and not generated by Google?

All our template/extension update XML files are auto-generated from Google Cloud Storage files (using cloud storage metadata like file sizes, modified dates, ...).

PhilETaylor - comment - 17 Sep 2021

All our template/extension update XML files are auto-generated from Google Cloud Storage files (using cloud storage metadata like file sizes, modified dates, ...).

So if your Google account gets hacked and the files are modified, then the hashes are automatically regenerated from the modified files, and therefore the hash is not providing any security whatsoever... #facepalm

brianteeman - comment - 17 Sep 2021

Thank you for your first pull request. Good to see YooTheme here.

zero-24 - comment - 17 Sep 2021

The current implementation of isChecksumValid() actually checks all available hashes before returning the self::HASH_VALIDATED result; I am not sure if this is intended. Shouldn't one hash comparison be enough, since hashing takes some time for larger files?

You are not required to provide all three of them; one would be OK too. But when you provide all three, all three should be checked, in my opinion.

I would also sort them from strong to weak so that the client picks the strongest hash first if multiple hashes are provided: array('sha3-512', 'sha3-384', 'sha3-256', 'sha512', 'sha384', 'sha256', 'sha1', 'md5'). They can also be compared against the hashes supported by the PHP installation using hash_algos().

Fine for me to re-sort (strongest first), and I'm not against adding more secure hashes, although I don't see a major benefit yet. But I still disagree with adding weaker hashing algorithms than we have right now.

nikosdion - comment - 18 Sep 2021

Can we explain why the checksums are added and how 3PDs should use them to offer security without shooting themselves in the foot? I'll go first, since I've been there and done that, and I'd really like to see YooTheme ramping up security to the benefit of many Joomla users :)

It's possible that an attacker might replace the update ZIP package with a malicious one. This doesn't require a MITM. It could be that the developer's download repository was hacked and the update ZIP files silently replaced. I remember a well-known, high-profile Joomla extension suffering from that about a decade or so ago (no, it wasn't any of my extensions).

The checksum guards against that as long as it's computationally prohibitive for the attacker to create a malicious ZIP file with the same checksum. Remember that ZIP files are especially vulnerable to manipulation without affecting their ability to be extracted. All you need to do is take a malicious ZIP package file, add the ZIP comment header, and then start appending specially crafted data to make the whole file report the checksum you want.
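
To illustrate that property, here is a minimal sketch (hypothetical file name; the actually hard part of a real attack, choosing padding bytes that hit a target checksum, is of course not shown):

```php
<?php
// A ZIP archive tolerates arbitrary data in its trailing comment field
// without affecting extraction. In a real collision attack against a
// weak hash, these bytes would be chosen so the whole file matches a
// target checksum; here they are just placeholder padding.
$zip = new ZipArchive();
$zip->open('package.zip');                      // hypothetical package
$zip->setArchiveComment(str_repeat('A', 1024)); // arbitrary trailing data
$zip->close();

// The archive still extracts normally despite the appended data.
$zip = new ZipArchive();
$zip->open('package.zip');
$zip->extractTo('/tmp/extracted');
$zip->close();
```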

Trying to do that with brute force is computationally expensive. However, MD5 (for a very long time) and SHA-1 (for nearly a decade now) have both been broken mathematically, which means that figuring out the data you need to pad the malicious ZIP file with, to make it consistent with the valid one, is computationally cheap and within the realm of possibility for not very sophisticated attackers.

The checksum algorithms currently supported by Joomla are those which have not been mathematically broken and are VERY computationally expensive to brute force.

As a 3PD you should offer SHA-256 if you aim for maximum compatibility across servers. While it is possible to offer any number of hashes, or even all of them, Joomla will only use the strongest one the server supports. However, including many algorithms greatly increases the size of the update XML file, which means higher bandwidth usage; in my case it was a 30% increase in my monthly CDN costs (I have hundreds of millions of hits on the update streams every month), so I'm only offering SHA-256 for now.

PhilETaylor - comment - 18 Sep 2021

The checksum guards against that

As an aside: what the OP is saying is that the MD5 sum in the Google metadata of the ZIP file is being used in their XML. So regardless of any change to the ZIP file, the MD5 will always validate, because it is generated by the cloud host as metadata of the actual ZIP file. Change the ZIP file and the MD5 automatically changes, as it is maintained as metadata by Google hosting the files.

So basically that hash, in the XML, is totally useless for their specific use case and doesn't convey any additional security to Joomla. It is only conveying data integrity (that the whole file downloaded and is available to Joomla), not assurance that the ZIP has not changed since the release was made.

Basically the way they are implementing their XML hashes is completely wrong as a 3PD.

The hashes and the ZIP file should not have any automatic relationship. The hashes should be manually generated, and if the ZIP file changes then the hashes will not match. This is what Joomla is more interested in: the security of the file not being modified since release, rather than using the hash for data integrity (that the whole file was downloaded).
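
A sketch of that manual, release-time step (hypothetical file name, and assuming the three hash tags Joomla's update XML currently supports):

```php
<?php
// Hypothetical release-time step: compute the checksums exactly once,
// from the ZIP as it exists at release time, and paste the output into
// the update XML. If the hosted ZIP is tampered with later, these
// recorded hashes will no longer match it.
$package = 'ext-1.2.3.zip'; // hypothetical release package

foreach (['sha256', 'sha384', 'sha512'] as $algo) {
    printf("<%s>%s</%s>\n", $algo, hash_file($algo, $package), $algo);
}
```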

when you provide all three, all three should be checked, in my opinion.

A total waste of resources and PHP runtime. The strongest hash THAT THE WEBHOSTING SERVER CAN CHECK should be checked first - we agree on that - but if that passes there is no point checking the rest, unless you can provide a test case to back up your point.

PhilETaylor - comment - 18 Sep 2021

The checksum algorithms currently supported by Joomla are those which have not been mathematically broken and are VERY computationally expensive to brute force.

And for this reason alone, this PR should be rejected.

nikosdion - comment - 18 Sep 2021

The hashes and the ZIP file should not have any automatic relationship. The hashes should be manually generated, and if the ZIP file changes then the hashes will not match. This is what Joomla is more interested in: the security of the file not being modified since release, rather than using the hash for data integrity (that the whole file was downloaded).

I agree. On my site the hashes are calculated exactly once, when the file is first uploaded and I create a new database entry for it in Akeeba Release System running on my web server. The update XML file is then downloaded from my web server and uploaded to the CDN. It would take hacking two different things (web server and CDN) within a very short time window to create a security issue. I also have other safeguards in place, but I won't discuss them because they are not observable from the outside and I'd rather not show my hand.

A total waste of resources and PHP runtime. The strongest hash THAT THE WEBHOSTING SERVER CAN CHECK should be checked first - we agree on that - but if that passes there is no point checking the rest, unless you can provide a test case to back up your point.

Well, there is a theoretical point here that crafting a malicious file which spoofs multiple hashing algorithms is a far harder problem, mathematically and computationally speaking, than spoofing a single hashing algorithm.

On the flip side, this increases the computational time required to validate the hashes, which can create a real-world problem. The more hashes a developer provides, the more likely it becomes that their updates will fail on slower hosts. This acts as a “perverse incentive” for developers to only ever provide the lowest common denominator. This is already true to some extent when you take into account the amount of time SHA-3-512 takes compared to SHA-256 on larger extensions, and it already provided me with the “perverse incentive” to only offer SHA-256 instead.
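
A rough way to measure that cost difference on a given host (a sketch; the file name is hypothetical, and results vary by hardware, PHP build, and package size):

```php
<?php
// Rough timing sketch for the cost argument above.
$file = 'ext-1.2.3.zip'; // hypothetical large package

foreach (['sha256', 'sha512', 'sha3-512'] as $algo) {
    $start = hrtime(true);
    hash_file($algo, $file);
    printf("%-8s %.1f ms\n", $algo, (hrtime(true) - $start) / 1e6);
}
```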

And for this reason alone, this PR should be rejected.

Correct. I am against this PR. MD5 and SHA-1 must NOT be used in production outside of HMAC algorithms (and that's only because HMAC goes into thousands of rounds of hashing, which creates adequate mathematical complexity to make a mathematical solution more computationally expensive than brute-forcing).

bembelimen - comment - 22 Sep 2021

Hello @steffans, thank you for your first-time contribution, I really appreciate it.
While I see your point, and it's really a valid one, the concerns raised regarding security are also very important to consider. As this feature is not only there to validate the file but is also a security measure, I agree with not going for MD5/SHA-1, as it would really weaken that check.

bembelimen - change - 22 Sep 2021
Status: Pending → Closed
Closed_Date: 0000-00-00 00:00:00 → 2021-09-22 05:52:41
Closed_By: bembelimen
Labels Added: ?
bembelimen - close - 22 Sep 2021
