No Code Attached Yet J4 Issue
Bakual
21 Mar 2017

Steps to reproduce the issue

  • Install Joomla with testing data. Leave "Modern/Experimental" Router disabled.
  • Open the "Article Category List" menu item (/article-category-list.html). Leave that page open.
  • In a second tab go to the com_content Options in backend and set the "URL Routing" parameter (in "Integration" tab) to "Experimental".
  • Go back to the first tab with the open "Article Category List" and try some links there (open them in new tabs). Some of those "legacy" links will work, some (eg "Getting Started") will give you a 404.

Expected result

All links should still be parsed

Actual result

Only the "correct" links according to new router are parsed, the others are discarded.

System information (as much as possible)

Staging from 2017-03-21

Additional comments

The example article link "Getting Started" is currently generated by the legacy router as /getting-started/19-sample-data-articles/joomla/22-getting-started.html. This link is actually wrong and should be /getting-started.html, since it is a direct match with a menu item. Our current code gets that wrong and the new router would get it right.
However, our current code is able to parse both the wrong and the correct URL just fine and give the expected page.
The new router will break those "wrong" links, which imho is quite a major B/C break with a big impact on search engines and incoming links in general.
You could argue that it's an option the admin has to enable, but that is only half of the truth. With 4.0, there will be no such option anymore and we will face a lot of broken links after either enabling the router option or upgrading to 4.0. Both of which I think are unacceptable. There is also no real migration path.

Now what I would expect is the following:

  • New Router is always enabled
  • If a URL can't be parsed by the new router, there is some fallback code which tries with the legacy parsing rules
  • If legacy can parse the URL successfully, the wrong URL will be added to com_redirect with the new correct URL as target and a 301 redirect will be executed. This way, we don't lose any visitors and search engines will update their links.
  • If legacy can't parse it as well, a 404 is issued of course.
  • With 4.0, we can drop the fallback code if needed (personally I would give more time) or make it optional. Admins can then choose which "old" URLs they still want to be working by simply checking in com_redirect the existing redirects.
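
A minimal sketch of the fallback flow in the list above, written in Python rather than Joomla's PHP. The names resolve, toy_new_parse, toy_legacy_parse and redirect_table are hypothetical; the dictionary only stands in for a com_redirect entry, and nothing here is Joomla's actual router API.

```python
from typing import Callable, Dict, Optional, Tuple

# A parser takes a URL path and returns the canonical path, or None if it cannot parse it.
Parser = Callable[[str], Optional[str]]


def resolve(path: str, new_parse: Parser, legacy_parse: Parser,
            redirects: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, target path) for an incoming path."""
    # A redirect recorded on an earlier request is served without parsing again.
    if path in redirects:
        return 301, redirects[path]
    # 1. The new router always gets the first try.
    target = new_parse(path)
    if target is not None:
        return 200, target
    # 2. Fallback: try the legacy parsing rules.
    target = legacy_parse(path)
    if target is not None:
        redirects[path] = target  # stands in for adding a com_redirect entry
        return 301, target
    # 3. Neither router understands the URL.
    return 404, None


# Toy parsers for the "Getting Started" example (assumptions, not real routers).
def toy_new_parse(path: str) -> Optional[str]:
    return path if path == "/getting-started.html" else None


def toy_legacy_parse(path: str) -> Optional[str]:
    return "/getting-started.html" if path.endswith("22-getting-started.html") else None


OLD = "/getting-started/19-sample-data-articles/joomla/22-getting-started.html"
redirect_table: Dict[str, str] = {}
first = resolve(OLD, toy_new_parse, toy_legacy_parse, redirect_table)
unknown = resolve("/nope.html", toy_new_parse, toy_legacy_parse, redirect_table)
```

Dropping the fallback later would amount to removing step 2; the entries already collected in the table would keep redirecting.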

This way, we would have 100% B/C plus an easy migration path without losing any incoming traffic.

Without that, I think we will face our next big Joomla drama when site owners realise the fancy new router will break some of their external links and Google Webmaster Tools starts listing a lot of 404s.

Bakual - open - 21 Mar 2017
joomla-cms-bot - change - 21 Mar 2017
Labels Added: ?
joomla-cms-bot - labeled - 21 Mar 2017
Bakual - edited - 21 Mar 2017
brianteeman - comment - 21 Mar 2017

Isn't that why the new router is not the default and has a warning message? I would say that this is expected behaviour?

Bakual - comment - 22 Mar 2017

It's a known issue and apparently expected behavior by the dev. But not by the user. And it certainly doesn't mean it is the correct thing to do.

Also, with 4.0 there will be no option and no warning anymore.
Thus the admin has no real choice. He will have to break the links sooner or later and we don't help him find out which links that may be.

Seriously? That's our plan and expected behaviour? We can do better than that for sure.

chrisdavenport - comment - 22 Mar 2017

For reference, from https://developer.joomla.org/development-strategy.html#backward_compatibility

6.1.8 URLs

Any change to a URL that will give a 404 (or some other error) where it previously gave a 200 is a break in backwards compatibility. However, if the change results in a redirect to a new URL (which gives a 200) then that is acceptable.

In general, if a URL is changed then provided the new URL delivers the exact same resource rendered in the same way then that is not considered to be a break in backwards compatibility. For example, changing the order of the arguments in the query part of a URL is not considered to be a break.

dgrammatiko - comment - 22 Mar 2017

Any change to a URL that will give a 404 (or some other error) where it previously gave a 200 is a break in backwards compatibility

Not if the old URL was falsely 200, e.g. @Bakual 's example in the description. That's not a valid URL; the current router returns something valid which IS wrong! This wrong behaviour CANNOT be supported, that was one of the goals of the new router: to be a lot stricter than the loose one we currently have!
My 2c

Bakual - comment - 22 Mar 2017

Not if the old URL was falsely 200, e.g. @Bakual 's example in the description. That's not a valid URL; the current router returns something valid which IS wrong!

That statement isn't true. It wasn't falsely a 200. It was a valid URL generated by our current router and is correctly parsed and gives the expected result. So it is not the URL it should have generated but it is a valid URL.
The current router doesn't return "something". It returns the correct and expected page.

This wrong behaviour CANNOT be supported, that was one of the goals of the new router: to be a lot stricter than the loose one we currently have!

I can live with that as the end goal (although I think it's stupid since site owners prefer visitors and not 404s), but I don't agree with doing that without any possibility for site owners to mitigate the effects of it.

dgrammatiko - comment - 22 Mar 2017

Educate people and then they will be fine. Tell them to create a sitemap of the old site, create another one when they upgrade to the new system. Then explain to them how to connect the dots (map the old links to the new)
The tools are widely available...

Similar to this problem is the UX improvement task of the back end. If we really want to improve (and not change some colours or some paddings) then we will end up with different workflows (that end users can't even imagine, therefore user surveys are useless).
But then again I might be wrong on both, time will tell...

Bakual - comment - 22 Mar 2017

Educate people and then they will be fine. Tell them to create a sitemap of the old site, create another one when they upgrade to the new system. Then explain to them how to connect the dots (map the old links to the new)
The tools are widely available...

Seriously??!! That's the recommended solution? Wow...

Backend is another topic. Changing workflows is fine if it is an improvement. That's not similar at all.

brianteeman - comment - 22 Mar 2017

Have to agree that your suggestion is not a solution at all - it might be just about ok on a site with just a few pages (although that site probably won't be affected anyway) but it's completely impractical to suggest doing that on a site with even a few hundred pages - never mind one with thousands

dgrammatiko - comment - 22 Mar 2017

@brianteeman I'm guessing here that anyone that wants to move to the new router (is not forced to do so) understands the impact of that change.

Bakual - comment - 22 Mar 2017

(is not forced to do so)

We will be forcing it with 4.0. It's not optional at all.

mbabker - comment - 22 Mar 2017

Any plan which mandates that the current broken URLs accepted by the routing system keep working is in my eyes not a valid plan. By that logic I can craft the URL of https://www.joomla.org/announcements/6-joomla-leadership-team.html which results in a 200 response, gives me exactly the body content that I'm looking for (even if it is now wrapped in the incorrect category/menu configuration), and therefore by your argument must continue to work or automagically redirect. Even funnier is that this isn't a URL that will ever get generated within the Joomla application, but if you know anything about how wonky the current router is you know exactly how to craft URLs in such a way to get mixed pages like this which just work.

Sooner or later we have to cut the technical debt and we have to address some of the underlying issues users have with the routing system. One of the most frequent groans is people manage to get "duplicate content" because there are a plethora of URLs you can use to get to a page if you know what you're doing (https://www.joomla.org/component/content/category/6-joomla-leadership-team.html is another perfectly valid mutation of the leadership page but again wrapped with the wrong menu data). We need to stop having a system that allows you to mutate the URL structure and land on a valid page, this system moves in that direction.

Yes, it does mean that users will require additional education and additional work to validate their links. Yes, I get this is not optimal user experience. But short of always supporting routing what are very obviously FUBAR URLs to the right content within our code, there is no fix for that.

brianteeman - comment - 22 Mar 2017

for the record i have absolutely no issue with making urls that "work" today but cannot be "generated" by Joomla no longer work

Bakual - comment - 22 Mar 2017

@mbabker Michael, I'm not saying to keep the old URLs working forever. I just want to have a way site admins realistically can redirect the old URLs to the new ones without having to manually add all of them.

One of the most frequent groans is people manage to get "duplicate content" because there are a plethora of URLs you can use to get to a page

That's actually a misunderstanding from the people about what "duplicate content" is. Google has no issue with multiple links pointing to the same content as long as it's on the same domain.
But as said, it's fine for me to get there where only one valid URL exists for a given page. I just don't agree on the path which is currently taken to get there (because there is no path).

But short of always supporting routing what are very obviously FUBAR URLs to the right content within our code, there is no fix for that.

There is a way to temporarily keep supporting the "FUBAR URLs", collect them and leave it to the admin to decide which to drop and which to keep after the legacy support has been dropped (e.g. in 4.0).

peteruoi - comment - 22 Mar 2017

As i see it there is no problem with joomla 3.7 as the new router is optional.
Can an official joomla link migrator be developed for joomla 4? I'm certainly no expert to say if it is possible with our current router mistakes, but if it is possible, could a link migrator automatically create 301 redirects with our redirect component???

mbabker - comment - 22 Mar 2017

They could not be collected and an admin be told there are URLs not valid with the new system. That's not how it works and trying to do that WOULD be a B/C break. To use com_redirect in that way would require throwing a 404 on what is currently a URL responding with a 200. Or you are suggesting to just automagically dump all valid legacy URLs into com_redirect with zero notification to anyone (which would be a massive change in behavior and user expectation because right now the component only collects 404 URLs or has items that are manually input).

Bakual - comment - 22 Mar 2017

but if it is possible, could a link migrator automatically create 301 redirects with our component redirect???

That's what I suggested in the initial issue description. But done in 3.7 and not in 4.0.

which would be a massive change in behavior and user expectation because right now the component only collects 404 URLs or has items that are manually input

Yes, it would be a change in behaviour since we collect the 404s before they happen, at a time where we actually still could say what the correct target is.
If you see an issue with that, make it optional. I don't see that as an issue.

mbabker - comment - 22 Mar 2017

The link migrator can't be done. Because there isn't a master list of all the URLs a site is accepting anywhere thanks to the glorious FUBAR behaviors of the current router, which as demonstrated allows you to mutate URLs (or in some cases will build them itself because of the oh so glorious FUBAR routing system) which results in "expected" content being displayed incorrectly. So a collection of bad URLs can only be compiled at runtime. Which by system behavior means that the URLs must 404 before they will automatically be collected into the redirect component or we will be introducing new black magic behaviors into a component and no notification to site owners about this.

Bakual - comment - 22 Mar 2017

sigh...

rdeutz - comment - 22 Mar 2017

What you need to do is to check if the URL could be created by the system. If someone fooled the system and created a URL that works because of a bug/simplification then it is ok when this URL doesn't work in the new system. This will only be a low number, but we need a solution for the majority of old URLs for a period of time.

franzpeter - comment - 22 Mar 2017

Sorry, I am not a coding specialist when it comes to Joomla, but could something like this help to solve the problem: a crawler which automatically crawls all pages to get even those false correct pages, takes the results and gives the correct rewrites?

franzpeter - comment - 22 Mar 2017

The only problem would be how to detect those false correct pages, it would need to crawl twice. First with the standard router, give the experimental router the results to try to route and if 404, detect that it needs a rewrite.
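
The second pass of that two-crawl idea can be sketched as follows, in Python for illustration. It assumes the first crawl (with the standard router) already produced a list of paths; find_urls_needing_rewrite and the toy parser are hypothetical names, and the parser simply returns None where the experimental router would answer 404.

```python
from typing import Callable, Iterable, List, Optional


def find_urls_needing_rewrite(crawled_paths: Iterable[str],
                              new_parse: Callable[[str], Optional[str]]) -> List[str]:
    """Feed paths crawled under the standard router to the new router.

    Every path the new router cannot parse (None, i.e. a 404) needs a rewrite.
    """
    return [path for path in crawled_paths if new_parse(path) is None]


# Toy data: the experimental router only knows the menu-item URL (assumption).
KNOWN_BY_NEW_ROUTER = {"/getting-started.html"}


def toy_new_router(path: str) -> Optional[str]:
    return path if path in KNOWN_BY_NEW_ROUTER else None


first_crawl = [
    "/getting-started.html",
    "/getting-started/19-sample-data-articles/joomla/22-getting-started.html",
]
needs_rewrite = find_urls_needing_rewrite(first_crawl, toy_new_router)
```

This only detects the broken paths; finding the correct target for each one still needs a mapping from the old path to the content it displayed.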

wilsonge - comment - 22 Mar 2017

https://github.com/wilsonge/joomla-cms/tree/com-router-legacy-rule This rule will parse legacy URLs with the new structure. However, it does not validate intermediate segments: this means that /getting-started/19-sample-data-articles/joomla/22-getting-started.html parses, but so does /getting-started/19-sample-data-articles/lalalalalalalal/joomla/22-getting-started.html, which from my discussions with the SEO team was one of Joomla's biggest routing issues from an SEO perspective.

It's only com_content by example and doesn't do the redirect logic - but I'm sure you guys can figure out how to do the redirect logic and whilst each router treats this kind of link specially you can figure out how to make it work :)

Bakual - comment - 23 Mar 2017

Last night I was thinking about an approach where we would add a temporary argument $forceLegacy to JRouter::parse() which would then override the legacy/experimental parameter setting.
With that, we could put code into the redirect plugin which in case of a 404 would try to parse the route again with that $forceLegacy enabled. If that parse results in a valid URL, it would do the redirect and add the entry to the com_redirect table. Next time that URL is called, the regular redirect function would take care of it.
We can of course add a new parameter to the plugin to control that behaviour.

This way, the code would be in a central place and no coupling of the component routers to com_redirect.
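
A rough sketch of that idea, in Python rather than PHP. ToyRouter, handle_404 and the item ids are hypothetical stand-ins: force_legacy plays the role of the proposed $forceLegacy argument, the dictionary stands in for the com_redirect table, and enabled stands in for the new plugin parameter. None of this is the real JRouter or redirect plugin code.

```python
from typing import Dict, Optional


class ToyRouter:
    """Stand-in for a router with a legacy/modern setting (not Joomla's JRouter)."""

    def __init__(self, modern: Dict[str, int], legacy: Dict[str, int], mode: str = "modern"):
        self.modern, self.legacy, self.mode = modern, legacy, mode

    def parse(self, path: str, force_legacy: bool = False) -> Optional[int]:
        # force_legacy overrides the configured mode for this one call.
        use_legacy = force_legacy or self.mode == "legacy"
        table = self.legacy if use_legacy else self.modern
        return table.get(path)

    def build(self, item_id: int) -> Optional[str]:
        # Build the URL the modern rules would generate for an item.
        for path, known_id in self.modern.items():
            if known_id == item_id:
                return path
        return None


def handle_404(path: str, router: ToyRouter, redirects: Dict[str, str],
               enabled: bool = True) -> Optional[str]:
    """Return a 301 target for a path the normal parse rejected, or None to keep the 404."""
    if not enabled:
        return None
    item_id = router.parse(path, force_legacy=True)
    if item_id is None:
        return None
    target = router.build(item_id)
    if target is not None:
        redirects[path] = target  # the regular redirect handling takes over next time
    return target


OLD_PATH = "/getting-started/19-sample-data-articles/joomla/22-getting-started.html"
router = ToyRouter(modern={"/getting-started.html": 22},
                   legacy={"/getting-started.html": 22, OLD_PATH: 22})
stored: Dict[str, str] = {}
target = handle_404(OLD_PATH, router, stored)
```

Keeping the retry in one handler is what avoids coupling each component router to the redirect storage.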

wilsonge - comment - 23 Mar 2017

It's cleaner but you can't do it as a temporary measure and keep the interface

Bakual - comment - 23 Mar 2017

I probably don't understand the sentence. By temporary I mean we could add that argument with 3.7 and deprecate it right away again for 4.0 when the legacy routing is removed (the argument is at least useless at that point).

wilsonge - comment - 23 Mar 2017

https://github.com/joomla/joomla-cms/blob/staging/libraries/cms/component/router/interface.php As in you'd need to break this interface. Which would mean extensions couldn't have an implementation that supports J3 and J4 at the same time

Bakual - comment - 23 Mar 2017

You can't add it to the interface, that's true.
But as far as I know the component routers could have that additional optional parameter both in J3 and J4. It would still satisfy the interface. In J4 it will just be a useless argument which will never be called.

wilsonge - comment - 23 Mar 2017

Ahh I didn't think you could. But we're doing that in JTable so I'm wrong. That could work

chriswagner0815 - comment - 23 Mar 2017

Dear colleagues -
thanks for raising all these issues. At the code sprint on Monday, the SEO team also met in Amsterdam and discussed the router issues raised with developers. We have heard you all and we are reading what you write.

We are currently in the process of creating a document and a video with a project example (which we hope will be done by mid to end next week). We are also going to address how it needs fixing, why it needs fixing and what additional router features we would like to see from a technical SEO perspective.

We hope that everyone can see the good in the community starting this process, and we understand that guidance and information are required from us.

Please give us the time to provide you with what we feel is needed. Let's all help in moving this forward.

And again: doing SEO for a living, I cannot even begin to tell you how glad I am that we are starting to work on these issues!

Kind regards
Christopher Wagner
Team Lead Joomla Optimization Team

brianteeman - change - 23 Mar 2017
The description was changed
Status: New -> Closed
Closed_Date: 0000-00-00 00:00:00 -> 2017-03-23 20:07:54
Closed_By: brianteeman
brianteeman - close - 23 Mar 2017
wilsonge - change - 23 Mar 2017
Status: Closed -> New
Closed_Date: 2017-03-23 20:07:54
Closed_By: brianteeman
wilsonge - reopen - 23 Mar 2017
wilsonge - comment - 23 Mar 2017

This is an issue that's going to need to be solved for the 3.8 router - we should leave this open as it's a genuine issue that the 3.8 team need to solve

piotr-cz - comment - 24 Mar 2017

I know that this is not an all-case workaround, but today I've written these .htaccess rules which could be helpful to someone

# J3.7 Advanced router - `Remove IDs from URLs` set to `Yes`
RewriteCond %{REQUEST_URI} ^(.*)/(\d+)-([^/]+)$
RewriteRule ^ %1/%3 [L,QSA,R=301]
dgrammatiko - comment - 24 Mar 2017

@piotr-cz can you elaborate on what that code does? (I'm not familiar with Apache)

piotr-cz - comment - 24 Mar 2017

@dgt41 It redirects a URL in the format [menualias]/[ID]-[articlealias] to the URL without the ID, [menualias]/[articlealias], and sets the response status to 301 (Moved Permanently).

I've put together that code when preparing a site for J3.7 migration so google doesn't freak out crawling the site and visitors don't get 404 pages until Google re-crawls the site.

However, this is a band-aid rather than a proper fix
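
To make the effect of the rule concrete, here is the same pattern applied with Python's re module. This is only an illustration of the regular expression, not of Apache itself; strip_article_id is a hypothetical helper name.

```python
import re

# Same pattern as the RewriteCond: (prefix)/(digits)-(alias), alias without a slash.
ID_SEGMENT = re.compile(r"^(.*)/(\d+)-([^/]+)$")


def strip_article_id(request_uri: str):
    """Return the path the rule would redirect to, or None when it would not fire."""
    match = ID_SEGMENT.match(request_uri)
    if match is None:
        return None
    # %1/%3 in the RewriteRule: keep the prefix and the alias, drop the numeric ID.
    return f"{match.group(1)}/{match.group(3)}"


with_id = strip_article_id("/community/lifecycle/49-marriage")
without_id = strip_article_id("/community/lifecycle/marriage")
nested = strip_article_id(
    "/getting-started/19-sample-data-articles/joomla/22-getting-started.html")
```

Two things follow from the pattern: only an ID in the last segment is removed (IDs in earlier segments stay), and a last segment whose alias legitimately starts with digits and a hyphen, such as /blog/2017-review, would be rewritten as well.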

franz-wohlkoenig - change - 30 Mar 2017
Category: Router / SEF
chriswagner0815 - comment - 2 Apr 2017

Dear colleagues,
we in the SEO team are still working on the router project internally and do not have everything together yet because non-voluntary work needed our attention last week.

Once we know when we can continue, we will provide you with a new deadline.

We apologize for the inconvenience - we are on it!
Chris

franz-wohlkoenig - change - 2 Apr 2017
Priority: Medium -> Urgent
Status: New -> Discussion
Ruud68 - comment - 14 Sep 2017

Hi,
just adding to the discussion.

I agree that when people select [experimental] they should know that there are bound to be issues. So when there is no benefit for the user for switching to the experimental router they will not.

Problem is: there IS a benefit: the 'Remove IDs' function: that is what people have been waiting for for a long time :)

The main issue is that when somebody visits your site from the Google SERP / Tweet / FB Share from a link created with the stable router, they will get a 404 Page not found. That is logical because the URL has changed (no article ID in the new url). You can use a tool to handle that.

The 'lesser' issue that you CANNOT handle is what I call 'menuless articles': these are articles which are in categories that have no menu entry, or articles that have no menu item for themselves.
Ever seen a 'strange' url pop up in Google webmaster tools like: https://[your-domain]/8-site/algemeen/46-register?

This url is an example of a menuless article and believe me: they are present in the Google index and after switching to the 'Experimental' router they will produce a 404.
Problem is: there is no correct / working SEF url for these articles with the experimental router! The links created with the 'Experimental' router are NON-SEF.

In my example: https://[your-domain]/about/?view=article&id=46:register is the built link to the article in the 'about' page (with experimental router enabled).

So what the user should do BEFORE updating to the new Experimental router to benefit from the Remove IDs feature is:

  1. add the following lines to your site's robots.txt telling Google (et al.) NOT to grab urls starting with a number, this will remove the OLD non-routable URLs from the Google index:
    Disallow: /0
    Disallow: /1
    Disallow: /2
    Disallow: /3
    Disallow: /4
    Disallow: /5
    Disallow: /6
    Disallow: /7
    Disallow: /8
    Disallow: /9
  2. create menu type category blog for ALL the categories that have NO (hidden) menu item set: This way, the experimental router CAN create a UNIQUE SEF url for the articles in these categories
  3. create redirects for all stable urls to the Experimental urls (with a tool)

or there should be a change in the experimental router to also handle 'menuless' articles with a SEF url.
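
As a quick check of what the Disallow lines in step 1 match, they can be run through Python's standard urllib.robotparser. The "User-agent: *" line and the example.com host are assumptions added for the sketch (the list above shows only the Disallow lines), and crawl_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# "User-agent: *" is assumed; the ten Disallow lines are the ones from step 1.
rules = ["User-agent: *"] + [f"Disallow: /{digit}" for digit in range(10)]

parser = RobotFileParser()
parser.parse(rules)


def crawl_allowed(path: str) -> bool:
    """True if a crawler obeying these rules may fetch the given path."""
    return parser.can_fetch("*", "https://example.com" + path)


menuless_blocked = not crawl_allowed("/8-site/algemeen/46-register")
menu_url_allowed = crawl_allowed("/about/register")
```

The rules are prefix matches on the path, so only URLs whose first segment starts with a digit are affected; a path like /about/46-register is still crawlable, while any top-level alias that legitimately starts with a digit is blocked too.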

I think that we have to address this issue either in the experimental router or with the above described workaround.
Better now or with version 3.9, as this will for sure block (at least for me and my customers' sites) upgrading to 4.0

Just my input and thoughts :)
regards,
Ruud.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/14848.

brianteeman - comment - 3 Oct 2017

@mbabker Is this still an issue?
#14848 (comment)

mbabker - comment - 3 Oct 2017

Nobody's touched it so yes, still something valid to look at. Though personally I'd still stick with the stance of "when migrating to a newer option, crawl your site in a dev environment pre- and post- change and handle the redirects yourself versus relying on the CMS to get it right".

rfmjoe - comment - 4 Oct 2017

Hello,
As an SEO-expert working 99% with joomla cms for my clients I totally agree with the approach @Bakual has suggested.
I am new to this, so I might miss some new approaches on this. But currently these issues need to be addressed:
Joomla 3.8:

  • No matter if the new router (experimental) or old router is enabled - links that cannot be found anymore or have changed should always give back a 404 status code to the browser/Google. Currently Joomla just displays a blank page. This has changed from Joomla 3.7.5 to Joomla 3.8. Why? With Joomla 3.7.5, 404 error codes were properly returned.

  • Besides that, I was really disappointed when I installed Joomla 3.8, activated the new experimental router and found out that all the old URLs are not redirected to the new ones?! What the heck? You can't really leave that issue to the user side and tell them to handle all the redirects by themselves??
    This can crush a site and wipe it from the Google index completely. Why is it a problem to implement an additional option when the experimental router is activated that says: Automatically redirect old SEF URLs to the new URLs? (Option YES / NO) - then the user can decide if he wants to manually correct the URLs or have Joomla do the job.

these are my thoughts on this,
also, I'd like to know what's the current approach to this for the next 3.8.1 update, if there is any?

greetings,
joe

rfmjoe - comment - 4 Oct 2017

also, i think that this approach Bakual mentioned is the best workaround seo-wise until a REAL solution is found:
#14848 (comment)

chriswagner0815 - comment - 4 Oct 2017

Hello rfmjoe -
I agree with you.

Chris

brianteeman - comment - 4 Oct 2017

@rfmjoe @chriswagner0815 Such a shame that neither of you "experts" tested this during the very extensive pre-release period

Re point 1. Please can you give a reproducible example of a url that now returns a blank page
Re point 2. Please can you give a reproducible example of a url that is not "redirected"

chriswagner0815 - comment - 4 Oct 2017

Hey Brian,
we have tested the new routing with an existing project. What it does is remove the IDs of the URLs without 301 redirecting to the new URLs. I have been told multiple times, that 301 redirects are not something that will be put in because:

a) webmasters should do that themselves
b) it is too hard to do codewise

It's not like I haven't said that to those involved - in particular also Michael. We had created an advisory like this for the Magazine but I don't know if it is online already; there is also one issue that Rowan wanted to think through:

If you have category menu items for your categories you can simplify the htaccess redirects like this
RewriteRule ^CATEGORY/[0-9]{1,6}-(.*)$ DOMAIN/CATEGORY/$1 [L,R=301]
(For NGINX use: rewrite ^/CATEGORY/[0-9]{1,6}-(.*)$ /DOMAIN/CATEGORY/$1 permanent;)

Where CATEGORY is your category menu item alias and DOMAIN is the full path to your domain i.e. https://www.example.com

So for example if you have a category alias path of blogs and your domain is https://www.example.com the line would look like this
RewriteRule ^blogs/[0-9]{1,6}-(.*)$ https://www.example.com/category/blogs/$1 [L,R=301]
Or if you have a category alias path of /blogs/penguins
RewriteRule ^blogs/penguins/[0-9]{1,6}-(.*)$ https://www.example.com/category/blogs/penguins/$1 [L,R=301]
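
For a site with many category menu items, lines like these could be generated rather than typed. The sketch below is Python and follows the generic DOMAIN/CATEGORY/$1 template; it assumes the capture group is meant to match the whole article alias, i.e. (.*), and that category aliases contain no regex metacharacters. rewrite_rule is a hypothetical helper name.

```python
from typing import List


def rewrite_rule(category_path: str, domain: str) -> str:
    """Build one Apache RewriteRule line for a category menu item alias path."""
    category = category_path.strip("/")  # accept "blogs" as well as "/blogs/penguins"
    return (f"RewriteRule ^{category}/[0-9]{{1,6}}-(.*)$ "
            f"{domain}/{category}/$1 [L,R=301]")


def rewrite_rules(category_paths: List[str], domain: str) -> List[str]:
    # Longer paths first, so a nested category is listed before its parent.
    ordered = sorted(category_paths, key=len, reverse=True)
    return [rewrite_rule(path, domain) for path in ordered]


lines = rewrite_rules(["blogs", "/blogs/penguins"], "https://www.example.com")
```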

I have specifically mentioned 301 redirects in all guidelines I have created and in everything I have said because another "migration catastrophe" could be dangerous for us. This is and has been known to the involved.

It would be nice, if the "experts" as you say it, would be heard when they talk about whole sites possibly losing their rankings.

Have a nice day
Chris

brianteeman - comment - 4 Oct 2017

Re point 1. Please can you give a reproducible example of a url that now returns a blank page
Re point 2. Please can you give a reproducible example of a url that is not "redirected"

chriswagner0815 - comment - 4 Oct 2017

Hey Brian,
since this is enabled on a test domain, we would have to chat in private. Since you told me last time that you won't, I really don't see a way to help you out on that one.

Should you reconsider, please let me know!
Big hug
Chris

brianteeman - comment - 4 Oct 2017

The reason for insisting that this is in public is so that everyone can see the issue and comment.

If it is such a massive problem then surely you would want everyone to be aware of it so that everyone can help to fix it. That's what open source is all about.

If it is such a massive problem then it must be easy to replicate on any domain or test site.

chriswagner0815 - comment - 4 Oct 2017

Hey Brian,
Rowan just told me that on CJO, the issue with the redirects is fixed with the solution above.

Regarding everything else, I could only provide excel sheets and since we have no 301 redirects by default (look at the code) you see the issue.

Hope that helps
Chris

brianteeman - comment - 4 Oct 2017

Nope, doesn't help at all in seeing either of the two issues you mention.

chriswagner0815 - comment - 4 Oct 2017

Hello Brian,
in that case, I really don't know how to help you :( You could simply have a look at the code!
Chris :)

rfmjoe - comment - 4 Oct 2017

Hello brian,
ok, will do that for you. i use one of my sites as example.
https://www.seo-webdesign.wien (Joomla 3.8, experimental router enabled, remove IDs enabled)
NEW SEF-URL with new router:
https://www.seo-webdesign.wien/aktuelles/joomla-version-3-8-ist-da-und-bringt-neues-sef-routing-fuer-suchmaschinenfreundlichere-urls

if i`d enable the STABLE router, this SEF-URL would be rendered as:
https://www.seo-webdesign.wien/aktuelles/43-joomla-version-3-8-ist-da-und-bringt-neues-sef-routing-fuer-suchmaschinenfreundlichere-urls

now, assume google has indexed the old SEF-URL (Joomla 3.7.5 and before). now i come and activate the new router. Guess what happens now if a user enters this url? - blank page. no redirect. no 404 status code to browser. try it:
https://www.seo-webdesign.wien/aktuelles/43-joomla-version-3-8-ist-da-und-bringt-neues-sef-routing-fuer-suchmaschinenfreundlichere-urls

solution:

  • automatically redirect (301) old urls to the new urls (htaccess rewrite for example). otherwise, google penalty incoming for larger sites who switch from old router to new router with joomla 3.8+.

letting users redirect those old urls by hand is not an option. if you ignore this issue, i see big problems coming to joomla sites in the future.

to be clear:
it is NOT a problem that the NEW router can't parse the old URLs! The problem is the combination of:

  • old urls
  • new router + IDs removed
    result: old urls RETURN a blank page (point 1) and are not redirected to the new URL (point 2).

there is no problem (at least i haven't found any) if you activate the new router and you disable the remove ID from URL option.

hope this helps,
joe

mbabker - comment - 4 Oct 2017

An old URL returning a blank page is indicative of a PHP error. Any time you get a blank page, there is a hidden PHP error. Turn on your error reporting to find out what is going on there. Either way, the issue you are seeing with that is NOT related to the fact that old URLs cannot be parsed.

There is only so much that Joomla can be expected to do as it relates to redirect management. Given the way the API is written, it is not a simple solution to just try to parse a URL with every possible routing configuration and redirect if something matches. The only reliable way to handle redirects is to have a master list of the existing valid URLs prior to making a configuration change, making the configuration change, and reviewing the URL list afterwards to find discrepancies. URL management (including migrations) is not a task that a site owner should leave 100% to the platform they are working with, they should be actively involved in reviewing the sitemap and managing redirects as well as using the tools the platform offers to assist with that.
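
The "master list before, master list after" comparison could be sketched like this, in Python for illustration. It assumes both lists map each URL path to some key identifying the content it shows (for example "article:49"); how those lists are produced is left open, and plan_redirects and the keys are hypothetical names.

```python
from typing import Dict, List, Tuple


def plan_redirects(before: Dict[str, str],
                   after: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Compare URL lists taken before and after a routing configuration change.

    Both arguments map URL path -> content key. Returns (redirects, unresolved):
    redirects maps old paths to the new path for the same content, unresolved
    lists old paths whose content has no URL in the new list.
    """
    new_url_for: Dict[str, str] = {}
    for path, key in after.items():
        new_url_for.setdefault(key, path)  # first URL seen for a content key wins

    redirects: Dict[str, str] = {}
    unresolved: List[str] = []
    for old_path, key in before.items():
        if old_path in after:
            continue  # URL survived the change, nothing to do
        if key in new_url_for:
            redirects[old_path] = new_url_for[key]
        else:
            unresolved.append(old_path)  # needs a manual decision
    return redirects, unresolved


before = {
    "/community/lifecycle/49-marriage": "article:49",
    "/8-site/algemeen/46-register": "article:46",
    "/about": "menu:about",
}
after = {
    "/community/lifecycle/marriage": "article:49",
    "/about": "menu:about",
}
redirects, unresolved = plan_redirects(before, after)
```

The unresolved list is the part that still needs a site owner's review.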

Nobody has downplayed this issue, contrary to what some might want to say here. But the fact of the matter is nobody has put any real effort into addressing the problem, people just come here and say the problem exists and that we must automatically redirect some URLs based on some set of parameters. As a 100% volunteer open source project, we need people helping to work on a solution and not just continuing to post comments saying the problem exists, otherwise there will be no solution.

brianteeman - comment - 4 Oct 2017

@rfmjoe Thank you for supplying the link

As @mbabker says a blank page is an error message in disguise. If you set error reporting to Development in global configuration and then try again you should get an error message either on the screen or in your logs. If you can do that and post back the results then we can help.

@chriswagner0815 I fail to understand why you refuse to help

My own test results on a live site (now reverted)

Old router url

https://example.com/community/lifecycle/49-marriage

New router with id url

https://example.com/community/lifecycle/49-marriage

New router with no id url

https://example.com/community/lifecycle/marriage

Result of using the old url: 404

rfmjoe - comment - 4 Oct 2017

hey there brian,
thanks, i enabled maximum error reporting and also tried another template. it seems the missing 404 error code is template-related. when i enabled the beez3 template, a 404 error was correctly displayed.

these are the errors from my active template that prevent a correct 404 error:
Warning: require_once(/usr/www/users/rauschi/seo-webdesign/libraries/joomla/document/html/renderer/head.php): failed to open stream: No such file or directory in /usr/www/users/rauschi/seo-webdesign/templates/cloudbase3/error.php on line 78

Fatal error: require_once(): Failed opening required '/usr/www/users/rauschi/seo-webdesign/libraries/joomla/document/html/renderer/head.php' (include_path='.:/usr/local/lib/php/') in /usr/www/users/rauschi/seo-webdesign/templates/cloudbase3/error.php on line 78

thanks,
joe

csthomas - comment - 4 Oct 2017

@brianteeman
Your example shows the case where old router url == new router with id url.

So the issue is between the new routing with id and without it.
Do you think that the new routing with the id removed should return the correct page for: https://www.sinaileeds.uk/community/lifecycle/49-marriage ?

brianteeman - comment - 4 Oct 2017

@rfmjoe now you can see why I asked the questions that I did. It wasn't to be awkward but to show that the issue you had with the blank page had nothing to do with the router

@csthomas Personally yes I do but what do I know about routing ;)

rfmjoe - comment - 4 Oct 2017

@brianteeman yes, blank page issue is clear for me now.
the other issue (redirect old urls to new ones) is still an issue as described in the original post by Bakual.

csthomas - comment - 4 Oct 2017

This is a simple fix for Brian's example:

diff --git a/components/com_content/router.php b/components/com_content/router.php
index 4957dc0170..4b06e943b1 100644
--- a/components/com_content/router.php
+++ b/components/com_content/router.php
@@ -219,7 +219,24 @@ class ContentRouter extends JComponentRouterView
                                ->where('catid = ' . $dbquery->q($query['id']));
                        $db->setQuery($dbquery);
 
-                       return (int) $db->loadResult();
+                       $id = (int) $db->loadResult();
+
+                       if ($id === 0)
+                       {
+                               $alias = explode('-', $segment, 2);
+
+                               if (isset($alias[1]))
+                               {
+                                       $dbquery->clear('where')
+                                               ->where('alias = ' . $dbquery->q($alias[1]))
+                                               ->where('catid = ' . $dbquery->q($query['id']));
+                                       $db->setQuery($dbquery);
+
+                                       $id = (int) $db->loadResult();
+                               }
+                       }
+
+                       return $id;
                }
 
                return (int) $segment;

The same could be done for getCategoryId.

But the question is, do you want such an improvement? Optionally?

mbabker - comment - 4 Oct 2017

The fix is not that simple. That may make the legacy URL parsable, but how does the system know that URL should be a 404 or 301 with a different set of routing configurations enabled? Without that part of the equation (George wrote a parser rule too months ago), this can't go anywhere. And no, API changes like changing method signatures are not an option, especially when there is an interface involved (because if you can't rely on the interface why bother with it?).

csthomas - comment - 4 Oct 2017

I only suggest that URLs from the new routing with the url id could (optionally) be parsable by the new routing without the url id.

The decision about 404 or 301 can be made later in a plugin.

Hannes gave an example of a plugin, plgSystemSeoRedirect, at https://groups.google.com/forum/#!topic/joomla-dev-cms/RWya-5Gcvlg

Personally I made a similar plugin for com_content and com_tags. It fixes all weird/doubled urls in J3.7 and uses a 301 redirect or adds a canonical link.

Joomla 3.x could have an option to parse all older versions of links.
If the administrator checks all options (support old routing, support url with id on routing without id), then after URL parsing, the plugin makes a decision.

It will slow down the system but gives us time until J4 is released without support for the old routing.

mbabker - comment - 4 Oct 2017

For things to work right, what has to happen is that the router tries to parse things like normal and throws the 404, and then a plugin decides whether it should attempt to reparse the URL with another configuration. If so, the plugin should handle issuing the required 301; otherwise it should fall back to whatever 404 handling is in place.

So, the router may need to be able to parse the old URLs, but it absolutely cannot be enabled by default otherwise that parsing will cause URLs that should 404 based on the configuration change to be a 200 and that's just going to make things even worse than they are now.
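
The flow described in this comment can be sketched roughly as follows. This is only an illustrative Python model, not Joomla code: the names resolve, parse_current, parse_legacy, build_current and the ARTICLES table are all hypothetical, and the data reuses the example.com URLs posted earlier in this thread. The legacy fallback is off by default, matching the point that it must not be enabled by default.

```python
from typing import Callable, Optional, Tuple

# Hypothetical toy data: article ID -> (menu path, alias).
ARTICLES = {49: ("community/lifecycle", "marriage")}


def parse_current(path: str) -> Optional[int]:
    """Toy 'new router without IDs': only /menu/alias is valid."""
    for article_id, (menu, alias) in ARTICLES.items():
        if path.strip("/") == f"{menu}/{alias}":
            return article_id
    return None


def parse_legacy(path: str) -> Optional[int]:
    """Toy 'old router': only /menu/id-alias is valid."""
    for article_id, (menu, alias) in ARTICLES.items():
        if path.strip("/") == f"{menu}/{article_id}-{alias}":
            return article_id
    return None


def build_current(article_id: int) -> str:
    """Build the URL the current configuration would generate."""
    menu, alias = ARTICLES[article_id]
    return f"/{menu}/{alias}"


def resolve(
    path: str,
    current: Callable[[str], Optional[int]] = parse_current,
    legacy: Callable[[str], Optional[int]] = parse_legacy,
    legacy_fallback_enabled: bool = False,
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect location or None)."""
    if current(path) is not None:
        return 200, None
    # Normal parsing failed. Only an explicitly enabled fallback may reparse.
    if legacy_fallback_enabled:
        article_id = legacy(path)
        if article_id is not None:
            return 301, build_current(article_id)
    return 404, None
```

With the fallback disabled, resolve("/community/lifecycle/49-marriage") stays a 404; with legacy_fallback_enabled=True it becomes a 301 to /community/lifecycle/marriage, and the old URL never answers with a 200.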

JimJGitHub - comment - 21 Nov 2017

#18771

Home page wrong url when home is a featured articles menu

Ruud68 - comment - 22 Nov 2017

I have been looking at the redirect of old to new URLs from different perspectives.
I concluded (for myself) that this is a one time 'issue': when 'migrating' from the old URL to the new URL.
With my 'developer hat' on, I want to automate things as much as possible but for me the cost for creating that code / testing it and maintaining it would only make sense if it was NOT a one time issue for a site.

So I have followed the following approach for my and my customers' sites: I have added a function to my toolbox that will create an overview of the old URL and the new URL. This overview you can copy and paste into the built-in com_redirect component: problem solved, both for search engines and for back-links.

What it does under the hood is switch on the stable router (with id), create URLs for all articles with JRoute, switch on the experimental router with ID turned off, and create all the URLs again. It also handles URLs for multiple languages.
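
The pairing step of such a tool might look something like the sketch below. It is not the actual component: it assumes the two URL lists have already been generated (one per router configuration) and are keyed by article ID, and the function name redirect_pairs and the sample data are made up for illustration.

```python
from typing import Dict, List, Tuple


def redirect_pairs(
    old_urls: Dict[int, str], new_urls: Dict[int, str]
) -> List[Tuple[str, str]]:
    """Pair each article's old URL with its new URL, keeping only changed ones."""
    pairs = []
    for article_id, old in sorted(old_urls.items()):
        new = new_urls.get(article_id)
        # Skip articles that vanished or whose URL did not change.
        if new is not None and new != old:
            pairs.append((old, new))
    return pairs


# Hypothetical snapshots taken before and after switching the router option.
old = {49: "/community/lifecycle/49-marriage", 7: "/about"}
new = {49: "/community/lifecycle/marriage", 7: "/about"}

for source, target in redirect_pairs(old, new):
    print(source, "->", target)
```

Only the article whose URL actually changed ends up in the list, so the number of redirect rules stays as small as possible.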


infograf768 - comment - 22 Nov 2017

Could we test this?

Ruud68 - comment - 23 Nov 2017

Sure @infograf768 if you send me a mail (in my profile), I will send you a download link

infograf768 - comment - 23 Nov 2017

It looks like it is working.
Note: One still has to create hidden menus of the type All Categories for each language and each component, especially when using featured menu items.

@Ruud68
Do you mind if I share this component with the Maintainers group?

Ruud68 - comment - 23 Nov 2017

@infograf768 sure, be my guest :)

brianteeman - comment - 2 Jan 2018

Dear colleagues,
we are still working on the router project seo team internally and do not have everything together because non-voluntary work needed our attention the last week.
Once we know when we can continue, we will provide you with a new deadline.
Please apologize the inconvenience - we are on it!
Chris

Was there ever an update on this from the router project seo team?

brianteeman - change - 25 Mar 2018
Labels Added: J3 Issue
brianteeman - labeled - 25 Mar 2018

danielmreck - comment - 1 Oct 2018

Hello all,

As I have started working on preparing sites to be migrated toward version 4.0, the URL router breaking legitimate legacy links has been a real concern. Requiring site administrators to convert hundreds or thousands of URLs without an easy tool integrated into Joomla 4.0 is an invitation for them to move away from Joomla as a platform.

I agree with the router dev team that we should return 400-series errors on malformed URLs that would work in the legacy router:

https://example.com/my-made-up-garbage/123-real-content

should not resolve to "Real Content" with article ID 123, but should instead return a 400-series error.

 
However, we really should be taking into account legit URLs generated by the legacy router and reroute them to their new destinations:

https://example.com/my-real-menu-item/123-real-content

should receive a 301 redirect to the URL generated by the modern router, such as:

https://example.com/my-real-menu-item/real-content

 
It would seem to me that the best place to accomplish this is within the modern router, which would allow us to weed out other fake legacy URLs like:

https://example.com/my-real-menu-item/000-real-content

The modern router would look up the true ID for "Real Content" and know that 000 is incorrect (in addition to being invalid anyway).
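
The three cases above can be captured in a small sketch. It is an assumption-laden Python model rather than anything from the Joomla router: MENU_ITEMS, ARTICLE_ID_BY_ALIAS and redirect_target are hypothetical names, and the lookups stand in for whatever the modern router would query.

```python
import re
from typing import Optional

# Hypothetical site data standing in for the menu and content tables.
MENU_ITEMS = {"my-real-menu-item"}
ARTICLE_ID_BY_ALIAS = {"real-content": 123}

# A legacy last segment looks like "<id>-<alias>".
LEGACY_SEGMENT = re.compile(r"^(\d+)-(.+)$")


def redirect_target(path: str) -> Optional[str]:
    """Return the modern URL for a well-formed legacy URL, else None (4xx)."""
    menu_path, _, segment = path.strip("/").rpartition("/")
    match = LEGACY_SEGMENT.match(segment)
    if menu_path not in MENU_ITEMS or match is None:
        return None  # made-up path or not a legacy segment at all
    claimed_id, alias = int(match.group(1)), match.group(2)
    if ARTICLE_ID_BY_ALIAS.get(alias) != claimed_id:
        return None  # the ID does not belong to this alias
    return f"/{menu_path}/{alias}"
```

Here redirect_target("/my-real-menu-item/123-real-content") yields /my-real-menu-item/real-content for the 301, while both /my-made-up-garbage/123-real-content and /my-real-menu-item/000-real-content return None and would fall through to the error handling.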
 

If this is not going to be super-easy in Joomla 4.0, then our user base will either migrate to another platform or turn to inadequate .htaccess rules such as this:

RewriteCond %{HTTP_HOST} ^example.com$
RewriteRule ^(.*)\/[0-9]{1,6}-(.*)$ "https\:\/\/example\.com\/$1\/$2" [L,R=301]

# Based on @chriswagner0815's reply on October 4, 2017

This will correctly handle rewriting URLs that contain an ID after the last slash, but will choke on IDs appearing earlier in the URL, such as categories that are not assigned to menu items. For sites with a lot of legacy links pointing at them, this could generate a significant load on the server that could be eliminated if the modern router could just accept the correctly-formed legacy links.
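
To make the limitation concrete, here is a rough Python equivalent of the pattern in the RewriteRule above, applied once to a path. The helper name strip_last_id is made up, and the second sample path is taken from the "Getting Started" link in the original post with the .html suffix dropped for simplicity.

```python
import re

# Same pattern as the RewriteRule: greedy prefix, "/<1-6 digits>-", remainder.
LEGACY_ID = re.compile(r"^(.*)/[0-9]{1,6}-(.*)$")


def strip_last_id(path: str) -> str:
    """Apply the rewrite once; paths that do not match are returned unchanged."""
    return LEGACY_ID.sub(r"\1/\2", path)


print(strip_last_id("my-real-menu-item/123-real-content"))
print(strip_last_id("getting-started/19-sample-data-articles/joomla/22-getting-started"))
```

Because the first group is greedy, a single pass removes only the last ID: the article ID 22 is stripped, but the category ID 19 earlier in the path is still there afterwards.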

 
Thanks again to everyone who has put in so much work on this!

Ruud68 - comment - 3 Oct 2018

Hi, I once wrote a routine to handle this via the built-in com_redirect component (#14848 (comment))

Currently working on a site with 60K articles that need to drop the ID from the url, so now I am looking at whether com_redirect is a viable option (performance-wise). It would have to import 60K redirect rules :s Not sure what this will do to site performance.

Doing it with the redirect rules is only a temp solution, because once all the search engines have visited they get the 301 and update their indexes. The rules then only come into effect when an old link (via e.g. facebook / twitter / email / etc.) is followed. So the performance hit should not be that big.

codeacade - comment - 4 Sep 2019

I agree with the router dev team that we should return 400-series errors on malformed URLs that would work in the legacy router:

https://example.com/my-made-up-garbage/123-real-content

should not resolve to "Real Content" with article ID 123, but should instead return a 400-series error.

It is more vicious than you think - it will resolve even with any random content after article ID:

https://example.com/my-made-up-garbage/123-q

I suffer from the same problem with my Joomla 3.4 and I can't see any working solution for it. I used to have a redirection plugin running but got into trouble with it (hacked); now I am waiting for a Joomla version with the problem solved.

jwaisner - change - 18 Mar 2020
Priority: Urgent -> Medium
Status: Discussion -> Confirmed

brianteeman - comment - 23 Aug 2022

#32923 (comment)

Thank you for raising this issue.

Joomla 3 is now in security only mode with no further bug fixes or new features.

As this issue doesn't relate to Joomla 4 it will now be closed.

If we are mistaken and this does apply to Joomla 4 please open a new issue (and reference this one if you wish) with updated details for testing in Joomla 4.
cc @zero-24

zero-24 - change - 23 Aug 2022
Status: Confirmed -> Closed
Closed_Date: 0000-00-00 00:00:00 -> 2022-08-23 13:50:52
Closed_By: zero-24
Labels Added: No Code Attached Yet
Removed: ?
zero-24 - close - 23 Aug 2022

jwaisner - comment - 12 Sep 2022

Reopening this as it has been replicated by other users in J4. Relabeling and having the JBS check into it.

jwaisner - change - 12 Sep 2022
Status: Closed -> New
Closed_Date: 2022-08-23 13:50:52
Closed_By: zero-24
jwaisner - reopen - 12 Sep 2022
jwaisner - change - 12 Sep 2022
Labels Added: J4 Issue ?
Removed: J3 Issue
jwaisner - labeled - 12 Sep 2022
jwaisner - labeled - 12 Sep 2022
jwaisner - unlabeled - 12 Sep 2022

brianteeman - comment - 12 Sep 2022

If we are mistaken and this does apply to Joomla 4 please open a new issue (and reference this one if you wish) with updated details for testing in Joomla 4.

!!

jwaisner - change - 10 Oct 2022
Labels Removed: ?
jwaisner - unlabeled - 10 Oct 2022
Hackwar - change - 18 Feb 2023
Labels Added: ?
Hackwar - labeled - 18 Feb 2023
