? Pending

User tests: Successful: Unsuccessful:

avatar PhilETaylor
PhilETaylor
23 Apr 2021

Summary of Changes

Following up on forked from #32879 about #32490 and closes #32880

This PR closes the last security issue where a user provided slug/alias in a URL is not validated.

It covers TWO situations.

  1. When SEF is disabled a url is generated like https://example.com/?view=article&id=3:my-article&catid=9 which as 3:my-article and loads that article correctly. Before this PR, you could change my-article to be i-hacked-you and the article would still load. After this PR, you will get a 404 Article Not Found if the 3 and the my-article alias/slug are not the same as the Articles ID and Alias in the db.

  2. When SEF is enabled, but you have gone to Articles -> Options -> Integration and disabled "Remove IDs from URLs" - this will generate URLS like: http://example.com/index.php/bottomc-menu-item/3-my-article where 3 is the article id and my-article is the slug/alias of the article. Before this PR, you could change my-article to be i-hacked-you and the article would still load. After this PR, you will get a 404 Article Not Found if the 3 and the my-article alias/slug are not the same as the Articles ID and Alias in the db.

Testing Instructions

As above.

Actual result BEFORE applying this Pull Request

you could manually manipulate the URL segments, but Joomla ignored them and loaded the URL anyway. This allowed anyone to put anything (malicious) in your URL and Joomla would ignore it.

Expected result AFTER applying this Pull Request

Article aliases provided in URLs are now validated once the article has been loaded from the db using the article id.

If the alias doesn't match the raw user input, then a 404 is thrown.

Documentation Changes Required

None.

Q&A

I saw "Careful, unfiltered user input" in your PR - isn't this insecure.

No. The Joomla Input API doesn't allow you to collect "my-article-title" as a hyphen delimited string with any of its methods.

We could call getString or getWord but these return the string without the hythens and therefore it cannot be directly compared to the alias from the db.

We are not "processing" the user supplied string at all. It is used in a single comparison using !== against the string from the articles alias in the db. We do not use it for any other purpose, we do not store or render it.

@ReLater "Just a question: Is there a reason that the alias is included in id value? Does the router code need it somewhere?"

No, someone in the past decided that appending the article alias to the id in the url was "good for seo". It serves no purpose, has never been used by Joomla, has never been filtered or validated before today.

@Hackwar "Are you planning to just check this during the parseing? Because I would be hesitant to get the aliases each time from the database for building the URLs. "

Joomla ALREADY appends the alias to the id. This PR doesn't change that. We do not validate it at the time Joomla creates (builds) the urls. There is no overhead caused by this PR at build time.

This PR only checks the url provided by the browser. There is no performance penalty because we are already loading the complete article form the db - there is basically only one more if statement before a 404.

That is what I described above. If you have a page with a hundred URLs, you get at least a 100 queries additionally to check the alias, which is why I'm asking to not do this in build, but in parse. We are not validating the ID during parseing of the URL. That is something that the component has to do later on. So it would be an additional query. But one additional query shouldn't really worry us. I'm just trying to bring up all the things that we have to keep in mind.

Correct. The PR doesn't actually validate in the parsing of the url, but after the article is actually loaded from the db, because to validate in the router or before then would mean additional SQL queries for no good reason when the article is loaded by a primary key lookup of an integer anyway.

// cc @Hackwar @MartinK63 @Ruud68 @Bakual

avatar PhilETaylor PhilETaylor - open - 23 Apr 2021
avatar PhilETaylor PhilETaylor - change - 23 Apr 2021
Status New Pending
avatar joomla-cms-bot joomla-cms-bot - change - 23 Apr 2021
Category Front End com_content Libraries
avatar PhilETaylor PhilETaylor - change - 23 Apr 2021
The description was changed
avatar PhilETaylor PhilETaylor - edited - 23 Apr 2021
avatar Hackwar
Hackwar - comment - 23 Apr 2021

Thank you for this PR. I don't think the scenario 1 is a valid issue, since even with your validation you could add &element=OMG-HACKED to any URL and it would still not throw an error. Scenario 2 is a valid issue, but I'm wondering why you are checking this in the model. This is the task of the router and the get<something>Id() methods in the routers would be the right place in my opinion. I don't have an issue with a simple query that checks this string against the database here. When you simply return false in those methods, the system will automatically throw a 404. In any case we would need this for all core content components. Otherwise the same issue still exists for contacts, newsfeeds, etc.

Since we are nitpicking already: You extract the alias part of the segment and call that a slug. Historically the slug has been the combination of ID and alias, so this wouldn't be the slug anymore, but just the alias.

avatar PhilETaylor
PhilETaylor - comment - 23 Apr 2021

As for 1. For as long as joomla is creating an id with an alias appended, it should be validated

Or maybe it's better to remove this useless alias from the id Param completely - it serves no point in joomla and is just fake seo!

I don't have an issue with a simple query that checks this string against the database here.

Yet another useless sql query :-( every query slows joomla down a bit... some pages in joomla are already 20+ queries per page load!this is why I did it after the article had loaded, as the ID provided is valid and the alias serves no purpose in the router up until this point historically, therefore it was simpler and more performance to check where I do.

avatar PhilETaylor
PhilETaylor - comment - 23 Apr 2021

This is the task of the router and the getId() methods in the routers would be the right place in my opinion

The router currently does ZERO validation on the Article ID - infact it does a rather nasty casting of "3-my-article-title" to "3" with (int) - that is the most it does.

Thank you for this PR. I don't think the scenario 1 is a valid issue, since even with your validation you could add &element=OMG-HACKED to any URL and it would still not throw an error.

Correct - but we are not talking about appending more params - we are interested in the param id that Joomla actually reads and the full value of that param in scenario one is 3:my-article - not as you assume just 3. Therefore the full user provided value for id should be validated and not just exploded and abandoned the part after :

However I vote for removing this "Fake SEO" completely - there is no need for it, Joomla doesn't use it, doesn't validate it (before this PR) and doesn't need it to work in scenario one

avatar Hackwar
Hackwar - comment - 24 Apr 2021

The added alias in non-SEF URL mode could indeed simply be dropped. For historical purposes I would keep it, just to not add another change for change sake. However we do use this alias in SEF URL mode, which prevents us from having to load the right alias for each and every URL again and again from the database. So right now, removing this from the URL is more work than keeping it and since the number of non-SEF URL mode sites out there should be fairly low, I don't see a reason to change this.

Still, I'd like the alias validated in the router instead of the model and in any case we would need this for all components and not just com_content.

avatar Ruud68
Ruud68 - comment - 24 Apr 2021

Still, I'd like the alias validated in the router instead of the model and in any case we would need this for all components and not just com_content.

@Hackwar that is exactly what I did in proposed PR #32500 : changing the content router to parse loose (as it is now) or strict (also take into account the alias). The PR itself is a bit messy as I only got requests for code styling (even for restyling coe I copied from some lines above) so at one point trying to please everybody another PR slipped in somewhere. Could you have a look at that PR, what is does and how it does it?
Sorry @PhilETaylor not trying to hijack your PR here, I really think this needs to be fixed one way or another.

avatar PhilETaylor
PhilETaylor - comment - 24 Apr 2021

yes well I cannot work on this today, as I now have to go back to working on the FTP Layer it seems. 69 Github emails overnight :)

avatar Ruud68
Ruud68 - comment - 24 Apr 2021

yes well I cannot work on this today, as I now have to go back to working on the FTP Layer it seems. 69 Github emails overnight :)

Big thumbs up for the energy you put in here and another one for the 'stamina' you have for hanging in here :)

avatar Hackwar
Hackwar - comment - 24 Apr 2021

Still, I'd like the alias validated in the router instead of the model and in any case we would need this for all components and not just com_content.

@Hackwar that is exactly what I did in proposed PR #32500 : changing the content router to parse loose (as it is now) or strict (also take into account the alias). The PR itself is a bit messy as I only got requests for code styling (even for restyling coe I copied from some lines above) so at one point trying to please everybody another PR slipped in somewhere. Could you have a look at that PR, what is does and how it does it?

If we change this here and now, we wont make this optional. Otherwise the code in getArticleId() looks more or less good. I would have loaded the alias from the DB and then compared that against the alias we have right now in PHP, then either returned false if the entry didn't exist or if it isn't equal, the ID otherwise.

avatar PhilETaylor
PhilETaylor - comment - 24 Apr 2021

This PR "as is" is dead and so Im closing it, and will return to implement it the way you have asked for later in the weekend.

That being, a check in the get<something>Id() methods, even though doing so adds an additional SQL query to the page load. Further, will have a look at other URLs in other components that might benefit from url validation at a router level.

avatar PhilETaylor PhilETaylor - change - 24 Apr 2021
Status Pending Closed
Closed_Date 0000-00-00 00:00:00 2021-04-24 12:58:30
Closed_By PhilETaylor
Labels Added: ?
avatar PhilETaylor PhilETaylor - close - 24 Apr 2021
avatar Ruud68
Ruud68 - comment - 24 Apr 2021

Still, I'd like the alias validated in the router instead of the model and in any case we would need this for all components and not just com_content.

@Hackwar that is exactly what I did in proposed PR #32500 : changing the content router to parse loose (as it is now) or strict (also take into account the alias). The PR itself is a bit messy as I only got requests for code styling (even for restyling coe I copied from some lines above) so at one point trying to please everybody another PR slipped in somewhere. Could you have a look at that PR, what is does and how it does it?

If we change this here and now, we wont make this optional. Otherwise the code in getArticleId() looks more or less good. I would have loaded the alias from the DB and then compared that against the alias we have right now in PHP, then either returned false if the entry didn't exist or if it isn't equal, the ID otherwise.

Thank you @Hackwar
The reason I made it optional is because J3 with legacy router URLs are NOT 'compatible' with J4 modern router URLs (with article ID). That would mean that when migrating a site from 3.9 > 3.10 > 4.0 a lot of URLs could (and if the can they will) break.
In those cases you need the 'loose' mode to facilitate these 'non-compatible' migrated URLs.

That said @Hackwar should I reopen / create #32500 again and try to see it through this time?

avatar richard67
richard67 - comment - 24 Apr 2021

Thanks @PhilETaylor for all you are doing, and thanks all others for the discussion. Looking forward to the new implementation.

avatar richard67
richard67 - comment - 24 Apr 2021

This is why I love Joomla, because people work together on things, from all over the world.

avatar Hackwar
Hackwar - comment - 24 Apr 2021

@Ruud68 sure.

avatar Ruud68
Ruud68 - comment - 24 Apr 2021

There you have it, not only article alias but also category alias check. #33282

avatar PhilETaylor
PhilETaylor - comment - 24 Apr 2021

Thanks @PhilETaylor for all you are doing, and thanks all others for the discussion. Looking forward to the new implementation.

The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type any given text, such as the complete works of William Shakespeare

If you throw enough mud at the wall, eventually some of it will stick.

Add a Comment

Login with GitHub to post a comment