forked from #32879 about #32490
There is one more case I wanted to address but ran out of time for: with the new MODERN routing it's possible to generate a URL like:
https://example.com/?view=article&id=3:my-article&catid=9
where my-article is the alias of the My Article (id:3) article. This can also be manipulated like:
https://example.com/?view=article&id=3:HAHA&catid=9
and that url will still work correctly. This still needs addressing.
Expected result: https://example.com/?view=article&id=3:HAHA&catid=9 is a 404, as HAHA has no part to play here.
Or the alias part could simply be removed from the id:
https://example.com/?view=article&id=3&catid=9
Actual result: https://example.com/?view=article&id=3:HAHA&catid=9 is a valid URL that loads the article with id 3. HAHA is not checked to see if it's the same as the article alias (it's not).
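To make the expected check concrete, here is a minimal Python sketch (not Joomla code) that splits the `id=3:alias` parameter and compares the alias with a stored one. The `ARTICLE_ALIASES` dict and both function names are hypothetical stand-ins for whatever the component would actually look up.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical alias store standing in for the content table: article id -> alias.
ARTICLE_ALIASES = {3: "my-article"}


def split_id_alias(value):
    """Split an 'id:alias' value into (int id, alias or None)."""
    raw_id, _, alias = value.partition(":")
    return int(raw_id), (alias or None)


def alias_matches(url):
    """Return True when the alias in the URL is absent or equals the stored alias."""
    query = parse_qs(urlsplit(url).query)
    article_id, alias = split_id_alias(query["id"][0])
    stored = ARTICLE_ALIASES.get(article_id)
    if stored is None:
        return False  # unknown article id
    return alias is None or alias == stored
```

With this sketch, `id=3:my-article` and plain `id=3` pass, while `id=3:HAHA` is rejected instead of being silently ignored.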
Checking Joomla 4 with "Remove IDs from URLs" turned off, the same bug that #32887 aimed to fix in Joomla 3 also exists with SEF URLs:
A url of http://127.0.0.1:4444/bottom-most/1-my-article is generated, and can be manipulated like http://127.0.0.1:4444/bottom-most/1-HAHAHAHAHAHAH without a redirect/404 being generated :-(
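The SEF case can be illustrated the same way. The sketch below is plain Python, not the Joomla router, and `split_sef_segment` is a hypothetical helper. It shows that both URLs above carry the same id and differ only in a trailing alias that nothing verifies.

```python
from urllib.parse import urlsplit


def split_sef_segment(url):
    """Split the last path segment of the form '<id>-<alias>' into (int id, alias)."""
    segment = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
    raw_id, _, alias = segment.partition("-")
    return int(raw_id), alias


generated = split_sef_segment("http://127.0.0.1:4444/bottom-most/1-my-article")
manipulated = split_sef_segment("http://127.0.0.1:4444/bottom-most/1-HAHAHAHAHAHAH")

# Both resolve to article id 1; only the unchecked alias differs.
same_article = generated[0] == manipulated[0]
```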
It's not needed but for as long as it's provided it should be validated as user input
You don't understand the SEO effect, you prove this by thinking you can stop it, when you actually have no control over how people link to your site.
Some really nasty things can be inserted into the alias and linked to your site from other sites. Google will eventually associate that new fake url with the nasty stuff. Just like the other PR about the legacy router, where I provided a concrete example using joomla.org URLs.
No other system would allow hackers to manipulate urls in this way.
Google will eventually associate that new fake url with the nasty stuff.
Is there any proof that this will happen? Because I actually think Google is smart enough to not do that. After all, it's a single link from an external site and all internal links to the same content have a different URL. For Google that is a mild form of duplicate content, and they most likely give priority to the same-site links when deciding which URL is correct.
Correct forming of canonical links would actually solve the issue as well as it tells Google the correct URL for that content.
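For reference, a canonical link is just a `<link rel="canonical">` element in the page head that points at the one correct URL. A minimal Python sketch of building one, using the example.com URL shape from this issue (the function name and the `base` parameter are hypothetical):

```python
from html import escape


def canonical_link(article_id, alias, catid, base="https://example.com/"):
    """Build a <link rel="canonical"> tag for the article's one correct URL."""
    href = f"{base}?view=article&id={article_id}:{alias}&catid={catid}"
    # escape() turns '&' into '&amp;' so the attribute value is valid HTML.
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


tag = canonical_link(3, "my-article", 9)
```

The tag would be the same no matter which manipulated alias was used to request the page, because it is built from the stored alias rather than from the incoming URL.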
Is there any proof that this will happen?
Yes, according to #32490
Whatever. At the end of the day it's unsanitised user input. It should either be removed or be validated - it should not be ignored. No one else should be able to craft valid URLs for your site. Period.
I just checked some major news sites here in Switzerland (20min.ch) and Germany (bild.de and spiegel.de). You can manipulate the URL of any of their articles as well, but they then redirect (301) to the correct page. Which is what I would expect as a user and site owner. As a site owner, I don't like presenting a 404 to my users, even if they misspelled the URL.
So imho the best solution would be to check the incoming URL and if it's not correct, automatically redirect to the correct one.
Plainly showing a 404 just because the alias doesn't match is not a good solution.
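The two behaviours being debated (redirect to the correct URL versus a plain 404) differ only in what happens on an alias mismatch. A minimal Python sketch under assumed names: `ARTICLES`, `respond` and the `on_mismatch` switch are hypothetical and only illustrate the decision, not how Joomla handles responses.

```python
# Hypothetical store: article id -> (alias, category id).
ARTICLES = {3: ("my-article", 9)}


def respond(article_id, alias, on_mismatch="redirect"):
    """Return (status, location) for an incoming id/alias pair."""
    record = ARTICLES.get(article_id)
    if record is None:
        return 404, None  # no such article at all
    stored_alias, catid = record
    if alias == stored_alias:
        return 200, None  # URL is already the correct one
    if on_mismatch == "redirect":
        correct = f"https://example.com/?view=article&id={article_id}:{stored_alias}&catid={catid}"
        return 301, correct  # send the visitor to the correct URL
    return 404, None  # strict rejection of the manipulated URL
```

Either branch stops the manipulated URL from answering with a 200, which is the current behaviour both sides object to.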
So basically you are saying that all the work in #32879 is a completely wrong approach. Got it. I disagree. This is why Joomla makes no progress.
I don't like presenting a 404 to my users, even if they misspelled the URL... Plainly showing a 404 just because the alias doesn't match is not a good solution.
A 404 is literally the correct response to an invalid url!
Status | New | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2021-03-27 14:27:41 |
Closed_By | ⇒ | PhilETaylor |
A 404 is literally the correct response to an invalid url!
Technically yes. But most sites actually like it when users find the correct page even if they misspelled the URL. So a redirect 301 to the correct URL is also an absolutely correct response.
I agree that the current state is not the desired behavior. However, I think it's still better than showing a 404. And yes, I think the other PR is not a correct solution.
But then, I'm just one single guy. I have no permissions to merge or reject a PR (I used to have, but no longer wanted for a long time). I am not speaking for the whole project at all.
even if they misspelled the URL
I cannot for the life of me remember the last time I typed a full url longer than just its domain name and its TLD extension.. and maybe a slash and one word
Google will eventually associate that new fake url with the nasty stuff.
Is there any proof that this will happen? Because I actually think Google is smart enough to not do that. After all, it's a single link from an external site and all internal links to the same content have a different URL. For Google that is a mild form of duplicate content, and they most likely give priority to the same-site links when deciding which URL is correct.
Correct forming of canonical links would actually solve the issue as well as it tells Google the correct URL for that content.
@Bakual here you go: DM me if you need the domain name so you can see what Google is actually showing in the index. #32490 (comment)
No, this has not been changed in 4.0. When the URL is non-SEF, the router generally doesn't really change anything in it.
so, this is still a bug to fix in Joomla 4 then. Got it. Thanks for confirming.
Status | Closed | ⇒ | New |
Closed_Date | 2021-03-27 14:27:41 | ⇒ | |
Closed_By | PhilETaylor | ⇒ |
Are you planning to just check this during parsing? Because I would be hesitant to get the aliases from the database each time when building the URLs. In the worst case you add a few hundred queries for a single page that way.
I have no plans to work on this.
If the alias is in the url for absolutely no reason then it can simply be removed. It serves no purpose other than "fake seo"
There are not a "few hundred" aliases in the url. There is one in the url. Only one.
https://example.com/?view=article&id=3:my-article&catid=9
I'm only talking about validating the incoming URL vars... something is ALREADY making the db queries to generate these URLs. That's out of scope for this issue (unless the solution is to remove the "fake seo" alias from the URL).
my-article should be a valid alias of the article with id 3. If it's not, then it's not valid. At the moment it can be anything.
Are you planning to just check this during parsing? Because I would be hesitant to get the aliases from the database each time when building the URLs. In the worst case you add a few hundred queries for a single page that way.
This is only for parsing, not for building. So it would involve only one changed query, not an additional one: you are already querying for the id, and that query would be extended to check both the id and the alias.
That is what I described above. If you have a page with a hundred URLs, you get at least 100 additional queries to check the aliases, which is why I'm asking to do this in parse, not in build.
This is only for parsing, not for building. So it would involve only one changed query, not an additional one: you are already querying for the id, and that query would be extended to check both the id and the alias.
We are not validating the ID during parsing of the URL. That is something that the component has to do later on. So it would be an additional query. But one additional query shouldn't really worry us. I'm just trying to bring up all the things that we have to keep in mind.
Generally: neither I nor anyone on the production team is your enemy. Quite the opposite. We are very grateful for your work. To me, you come across as if you think you have to fight against us. We have common goals here and we are all trying our best to reach them. We currently only differ on certain rules that we have put in place.
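A rough sketch of what "one query at parse time" could look like, written in Python with an in-memory SQLite database. The `content` table with `id` and `alias` columns is a simplified, hypothetical stand-in for the real schema, and `alias_is_valid` is not an existing Joomla function.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table standing in for the article table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, alias TEXT NOT NULL)")
    conn.execute("INSERT INTO content (id, alias) VALUES (1, 'my-article')")
    return conn


def alias_is_valid(conn, article_id, alias):
    """One lookup per incoming request: fetch the stored alias for the id and compare."""
    row = conn.execute(
        "SELECT alias FROM content WHERE id = ?", (article_id,)
    ).fetchone()
    return row is not None and row[0] == alias
```

Because this runs once for the single URL being parsed, the cost is one lookup per request. The "hundreds of queries" concern only applies if the same check were done for every link while building a page.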
I totally disagree with the production team - this issue is a major flaw. No one these days is going to type in a URL... everyone clicks on a link - a link that can be manipulated and will show up in Google search results. Google will eventually change the link if it's a 301, but Google will surely delete the false link if it's a 404.
So this has to be fixed - in every version of joomla!
I totally disagree with the production team - this issue is a major flaw. No one these days is going to type in a URL... everyone clicks on a link - a link that can be manipulated and will show up in Google search results. Google will eventually change the link if it's a 301, but Google will surely delete the false link if it's a 404.
So this has to be fixed - in every version of joomla!
Agreed, it is only a matter of time (but maybe it is already happening) before Google categorizes / labels sites that in its eyes serve p*rn links. So when you are running a legit business, you will no longer show up on page 1 when somebody searches for your household equipment, because you are categorized as running a completely different business.
We don't disagree that this is a major flaw. However we disagree that this can be fixed in a backwards compatible way. Fixing this in Joomla 3 will break thousands of websites and thus we can't fix this in the 3.x major version. Instead, we already have fixed it in 4.0.
Just FYI: I've been trying to fix this in Joomla 1.6 already and then pushed for the last big changes to the routing which at least partially fixed this for 6 years and yet another year to fix this in 4.0.
I cannot comment on the b/c topic, which is surely very important, but this issue is a problem that can badly affect Joomla's standing as a reliable CMS. Even if the system was not hacked, a hacker can make it look like it was, at least in the URL. Some people won't notice the technical difference between a URL and the content; they think the URL is coming from the site. So I can only urge fixing this for 99% of the Joomla sites (at the moment).
You mean that something that has been like this for 15 years needs to be fixed now, definitely breaking thousands of websites and requiring development work from them? At the same time breaking the SemVer promises we made? I can guarantee you that if we change this now, the Joomla project would lose half of its userbase. Not everyone would even be directly affected by this, but the break in trust would be devastating.
I can guarantee you, that no one in charge in the production part of the project will support this change in the 3.x branch of Joomla, especially since you can partially fix this by using modern routing without IDs and additional fixes have already also been deployed to Joomla 4.0.
We had such a change in another area in 2014, where someone thought it would be necessary to change the hashing of passwords and we still get people complaining about how unreliable we are because of that release 7 years ago.
@Hackwar The same logic applies to security issues that have been in core for 15+ years: you fix them when you find out about them. Ignoring them (alongside security by obfuscation) is no security at all. We (the Joomla community) trust that these matters will be dealt with when they arise.
When looking at the usage stats you see that only 30% of the sites make the move from an older version, so it is already losing 70% of its user base. Even more sites will (imo) not make the move to 4.0; that is what I hear and see.
So for me this should be fixed in 4.0, and if that means that it breaks b/c with 3.10 then so be it. Provide people with a migration path / instructions and that soothes the pain.
People need to invest in the upgrade, investing in URL changes can also be added to the list.
I did a PR addressing (part of) this issue as a proof of concept for J4. It adds a toggle [loose|strict], where loose parses a URL as it does now and strict checks not only the id but also the alias of the article. That at least gives people a choice in the matter instead of making the choice for them.
The PR stranded due to a lack (none) of interest; the only comments I got were the (very motivating) code styling changes. Maybe you can have a look at it and see if this is a route to pursue?
#32500
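The idea of a [loose|strict] toggle can be sketched as follows. This is an illustration in Python, not the code from the PR: `parse_article`, the `mode` parameter and the `ARTICLE_ALIASES` dict are hypothetical names.

```python
# Hypothetical alias store: article id -> alias.
ARTICLE_ALIASES = {3: "my-article"}


def parse_article(article_id, alias, mode="loose"):
    """Return the article id if the request is accepted, else None.

    loose:  only the id has to exist (the behaviour described in this issue).
    strict: the alias must also equal the stored alias.
    """
    if mode not in ("loose", "strict"):
        raise ValueError(f"unknown mode: {mode!r}")
    stored = ARTICLE_ALIASES.get(article_id)
    if stored is None:
        return None
    if mode == "strict" and alias != stored:
        return None
    return article_id
```

Keeping loose as the default is what would make such a toggle opt-in: existing sites behave as before until the owner switches to strict.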
When looking at the usage stats you see that only 30% of the sites make the move from an older version, so it is already losing 70% of its user base
The stats are useless. If you set the "send once" option then you only know about the first version installed.
You can always add a ? or a & to the URL and append whatever text you want.
would also work as
https://example.com/?HAHA=mytext+and+it+still+works&view=article&id=3:my-article&catid=9
nothing we can do against this. even for
In addition, Google has been trying for years to remove the complete URL, and with a monopolist's market share it won't take much longer.
just my 2 cents
Google (and the world) understand that what is after a question mark is NOT the url, but request query params.
There is a huge difference between request params, and the url.
The whole point of SEARCH ENGINE FRIENDLINESS is getting rid of query params in favour of keyword stuffed urls.
The same is true for fragments (what is after a hash sign #)
Google can hide the URL in a browser, but it is still a huge part of SEO and their indexing. They cannot remove URLs; that is literally how the internet works.
The fact is this issue can and should be fixed. No one should be able to manipulate the correct urls of your site.
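The distinction drawn above between the path, the query params and the fragment is visible when a URL is split with Python's standard library. The URL is the one from the earlier comment, with a `#top` fragment added purely for illustration.

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://example.com/?HAHA=mytext+and+it+still+works"
       "&view=article&id=3:my-article&catid=9#top")

parts = urlsplit(url)

path = parts.path              # the part that identifies the resource
params = parse_qs(parts.query) # request parameters after the '?'
fragment = parts.fragment      # client-side only, never sent to the server
```

Here `path` is just `/`, the injected `HAHA` text lands in `params` alongside `view`, `id` and `catid`, and `fragment` is `top`. The alias in `id=3:my-article` is also a query param in the non-SEF form, whereas in the SEF form it becomes part of the path.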
also, if this is "not really a problem" how come that the URL I reported (which was perfectly valid, returning a 200 OK and loading the Joomla 3.9.25 release news), has now been "fixed" to show a 404...
https://www.joomla.org/announcements/i/HACK/HACKED/YOUR/SITE/YOU/SUCK/5834-LALALALALALALLALALALALALA
which used to work, and is exploiting this, has now been "fixed" on joomla.org to return a 404...
HAHAHA... literally one rule for joomla and one rule for the rest of the world.
also, if this is "not really a problem" how come that the URL I reported (which was perfectly valid, returning a 200 OK and loading the Joomla 3.9.25 release news), has now been "fixed" to show a 404...
looking forward to the PR for this, maybe one of the maintainers can share it here.
The fake URL in my blog informing my customers about this issue is still resolving okay though, so 'I have tested this PR unsuccessful' #lol
Fixing this in Joomla 3 will break thousands of websites and thus we can't fix this in the 3.x major version.
So it's a won't fix.
Then, despite being implemented in a b/c, opt-in way, it was declared a new feature and not serious enough to be classed as a security fix, and was therefore rejected for Joomla 3, despite the PR and the work done to ensure its b/c, opt-in nature. It was declared dead here: #32887 (comment)
we already have fixed it in 4.0.
Partially.
Only if "Remove IDs from URLs" is enabled. Otherwise, with "Remove IDs from URLs" disabled, you can still manipulate parts of the URL and they are not validated/checked.
Therefore this issue remains open.
This one still works
https://www.joomla.org/announcements/5834-phil-this-hack-still-works-though
Just out of curiosity, what is the real disadvantage of such URLs?
I get it that Google indexes it, and it gets found if you search for the URL. But Google is smart enough to not show these fake URLs in regular searches (with search terms, not URLs). Google knows the correct address and will not show the fake one. (I've tested that with the site Ruud mentioned).
So imho it's more that it scares the owner of the site when he looks at the Analytics, but it doesn't affect customers.
Or do I miss something?
It definitely impacts users, as it was the customers who brought this to the attention of the site owner. It's a (business) vulnerability. Just like a security issue, there is no issue until you get hacked... Or in this case, your business gets linked by your customers to, for example, anti-semitism or other nasty stuff.
Still wondering how the site visitors were impacted. How did they get to see those fake URLs?
I couldn't get them to show in the Google search results using search terms. I always got the correct URL in the search results, not the fake ones.
I'm not saying this shouldn't be fixed, don't get me wrong. I'm just wondering what the severity is.
According to SEO experts, Google takes working URLs (non-404) from other websites (backlinks) into consideration when ranking a website. So if keywords in the URL do not appear in the content, this could lead to downgrading - especially in a highly contested segment.
SEO and Expert are two words that should never be used in the same sentence
@Ruud68 Without knowing what search words you used, that doesn't mean much. Did you search for an URL or for a keyword?
As I said I know that if you search for the URL, you get the results. But if you search eg for "Scorpio Gold Reports", you don't get that URL.
That's why I'm curious.
I think that Ruud fixed this for his client some time ago (with his own patch) - the website shows 404 or 410 now, so Google removed the URL. IMHO it doesn't matter how it was found - it was in the search results, so 1) people could see it, maybe while searching for porn, and 2) Google may have been downgrading the original website because of the fake URL.
@Ruud68 Without knowing what search words you used, that doesn't mean much. Did you search for an URL or for a keyword?
As I said I know that if you search for the URL, you get the results. But if you search eg for "Scorpio Gold Reports", you don't get that URL.
That's why I'm curious.
I don't know what they type in Google, they will not tell me.
I have worked as a hired project manager for a large bank. Before giving external (but also internal) contractors a contract, they are required by law to do a background check on the person they are hiring. How and what they check is classified, but I feel confident that if they 'stumbled' across references on Google linking my Joomla site to god knows what illegal activities / content, I would not get hired. Again, a business vulnerability.
And again I think, just like a security fix or the Google FLoC PR, this should be fixed before it hits you.
And as I've said before, we are happy to fix this, but not in Joomla 3. It has been like this for 16 years now and it is a rather well known issue. It is not something that we can properly fix in a backwards compatible way and thus we will not fix it in Joomla 3. You are welcome to provide a PR for Joomla 4 to fix this.
It is not something that we can properly fix in a backwards compatible way and thus we will not fix it in Joomla 3
Factually incorrect. I provided a backward compatible, opt-in fix and it was still rejected.
Ok, let me rephrase: Fixing this in a backwards compatible way would require yet another option in the GUI and I would consider that as a new feature. I'm very much against adding yet more options unless absolutely necessary. In addition, new features can only be added in a minor release and that could only be Joomla 3.10. We decided quite some time ago (and communicated that as well) that Joomla 3.10 will only be a compatibility release to ease the migration to Joomla 4 and will not contain any additional new features. Thus this will not be fixed in Joomla 3.
I would really prefer if instead of arguing about this here, we could concentrate on Joomla 4, fix this there and finally get this release out the door.
Fixing this in a backwards compatible way would require yet another option in the GUI and I would consider that as a new feature.
Also YOU:
we added a switch in the global configuration to remove this header. #33212
One rule for us and one rule for you.... Cough Cough....
Teaches me to ever volunteer to execute a decision by the PLT.
What about writing the option directly into configuration.php, without needing a GUI option? For me it is still fixing a bug and not a new feature.
I would really prefer if instead of arguing about this here, we could concentrate on Joomla 4, fix this there and finally get this release out the door.
Wouldn't we all. Now if you only replied to comments specifically addressed to you we might make some progress.
Status | New | ⇒ | Closed |
Closed_Date | 0000-00-00 00:00:00 | ⇒ | 2021-04-23 19:20:34 |
Closed_By | ⇒ | PhilETaylor |
Just a question: Is there a reason that the alias is included in the id value? Does the router code need it somewhere?
Generally speaking: I personally am happy that the alias is not checked and that it doesn't matter if there is one or not or the wrong one. (I also understand all the discussions about SEO problems but know how to avoid them.)