xtech86
13 Oct 2016

Steps to reproduce the issue

Find the ID of a published article and note it (in this example, mine is 4). On the front end of your site, enable SEF with these settings:

Search Engine Friendly URLs - Yes
Use URL Rewriting - Yes
Adds Suffix to URL - Yes
Unicode Aliases - No

Visit any menu item on your frontend. For example, I have a menu item, Article 1, with the SEF URL:

http://www.mydomain.com/article-1.html

Great, it works! Now take the article ID you noted earlier and add it directly after the / that follows your domain, so:

http://www.mydomain.com/4article-1.html

Expected result

Error 404, since the page doesn't exist as a menu item and we shouldn't be automatically finding articles.

Actual result

Loads the article if an article with that ID exists.

System information (as much as possible)

CentOS 6, apache 2.4, php5.6/7

Additional comments

This is causing a major headache for us from an SEO perspective: the client was adding reference IDs to the start of their SEF URLs, and now we have URLs ranking which shouldn't exist.

Video of the issue:

https://www.dropbox.com/s/a0ic7nkcwd12p68/joomla_sef_bug.mov?dl=0

I have also tested this on J3.6.3.RC3. It's caused by the com_content router, which only looks for the ID; it should also verify the alias of the article/category.
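
To illustrate the kind of check meant here, the following is a minimal standalone sketch, not the actual com_content router code. The `$articles` table and the `resolveSegment()` helper are hypothetical names invented for this example. It assumes the segment has the form `ID-alias` and shows how comparing the alias as well as the ID would reject a URL such as `4article-1`.

```php
<?php
// Hypothetical article table: ID => alias. Not Joomla's real data model.
$articles = array(4 => 'article-1', 7 => 'about-us');

/**
 * Resolve a URL segment such as "4-article-1" to an article ID.
 * Returns null when the ID is unknown or the alias does not match.
 */
function resolveSegment($segment, array $articles)
{
    // Split a leading numeric ID from the rest of the segment.
    if (!preg_match('/^(\d+)-(.+)$/', $segment, $m)) {
        return null;
    }

    $id    = (int) $m[1];
    $alias = $m[2];

    // Checking the ID alone would accept any suffix; compare the alias too.
    if (!isset($articles[$id]) || $articles[$id] !== $alias) {
        return null;
    }

    return $id;
}

var_dump(resolveSegment('4-article-1', $articles));   // int(4)
var_dump(resolveSegment('4article-1', $articles));    // NULL
var_dump(resolveSegment('4-wrong-alias', $articles)); // NULL
```

A caller would treat a `null` result as "no such page" and respond with a 404 instead of loading the article.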

avatar xtech86 xtech86 - open - 13 Oct 2016
avatar xtech86 xtech86 - change - 13 Oct 2016
The description was changed
avatar xtech86 xtech86 - edited - 13 Oct 2016
avatar xtech86 xtech86 - edited - 13 Oct 2016
avatar zero-24
zero-24 - comment - 13 Oct 2016

@xtech86 This is a known issue that should be fixed in 3.7. Please try the latest nightly build of 3.7 from here: https://developer.joomla.org/nightly-builds.html

and enable the modern routing and remove ID options for com_content (you can find them in the options of com_content).

avatar zero-24 zero-24 - change - 13 Oct 2016
Labels Added: ?
avatar xtech86
xtech86 - comment - 13 Oct 2016

Thanks @zero-24, I'm already testing 3.7 out :-) Just couldn't see anything relating to J3.6 when searching previous issues.

--Update--
However, tests with J3.7 are not great, they still resolve them all to the homepage and should instead produce a 404.

avatar zero-24
zero-24 - comment - 13 Oct 2016

To be sure did you enable the option in com_content to use modern routing and remove the ID?

avatar brianteeman
brianteeman - comment - 13 Oct 2016

However, tests with J3.7 are not great, they still resolve them all to the homepage and should instead produce a 404.

That sounds like an issue with either htaccess or the template - please test with the defaults

avatar xtech86
xtech86 - comment - 13 Oct 2016

@zero-24 yes exactly

@brianteeman I'm afraid not, disabling the new router indeed returns a Joomla! 404 as per normal.

avatar xtech86
xtech86 - comment - 19 Oct 2016

@zero-24 @brianteeman Issue is present on my install tested again just now. See below video:

https://www.dropbox.com/s/4by0jqu9qcm1rat/Joomla%213.7_sef_error.mov?dl=0

avatar zero-24
zero-24 - comment - 19 Oct 2016

@hackwar can you take a look here?

avatar Hackwar
Hackwar - comment - 19 Oct 2016

There is no code in the new router right now that throws a 404 for such URLs. Throwing a 404 is not as easy as it sounds, since you would have to decide when a 404 should be thrown and where. Should we throw it in the component router? But what about plugins that process the URL after the component router? Should we throw it in the application router, right at the end? (You could write a small plugin that adds its own router rule and does that.) Should it throw a 404 only when there are path elements left, or also when there are unknown query elements? The latter is another issue, since there is no way for us to know which query elements would be valid or not.

avatar xtech86
xtech86 - comment - 19 Oct 2016

@Hackwar But then you are back at square one in terms of the original issue I was having. You are still sending the user to one page without changing the URL. This becomes an SEO nightmare again, since as far as search engines are concerned you have duplicate content on multiple pages, only this is much worse because you are saying the homepage exists for all these URLs.

If we can route to the homepage, surely we can issue a 404?

I'm not familiar with the code structure of the SEF router, but I think this needs to be addressed.

avatar Hackwar
Hackwar - comment - 19 Oct 2016

We are not routing to the homepage. We are falling back on the default menu item (=homepage). As I wrote, you can easily create a plugin that implements the behavior that you are looking for, but since we need to be backwards compatible, we simply cannot roll that out to all sites out there. Developers write crap and such a change would break lots and lots of sites. The code would look something like this:

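class plgSystemThrow404 extends JPlugin
{
    // Register the rule once the application is initialised (front end only).
    public function onAfterInitialise()
    {
        $app = JFactory::getApplication();

        if (!$app->isSite())
        {
            return;
        }

        // Run after the component router and the other parse rules have finished.
        $app->getRouter()->attachParseRule(array($this, 'throw404'), JRouter::PROCESS_AFTER);
    }

    public function throw404(&$router, &$uri)
    {
        // A non-empty path at this stage means no rule consumed it.
        if ($uri->getPath() != '' && $uri->getPath() != 'index.php')
        {
            // Exception takes the message first, then the code.
            throw new Exception('Not found', 404);
        }

        // Return an empty array of query variables for the router to merge.
        return array();
    }
}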

This is just off the top of my head and not tested.

avatar xtech86
xtech86 - comment - 19 Oct 2016

Thanks @Hackwar, I fully appreciate that. But with such a version jump, it seems crazy to me to still allow this in the new structure, which needs to be switched on anyway. If it falls back to the homepage, I don't understand why we can't fall back to a 404 instead?

avatar Hackwar
Hackwar - comment - 19 Oct 2016

The routing works kinda like this:
1. Take the default menu item and set it as the current request
2. Run the URL through the different steps, and whenever we parse data from the URL, we store it in the current request, overwriting the data from the default menu item one by one.

Now, proper routing code would parse something from the path part of the URL, remove it, and add the parsed data as query elements. But as I wrote, developers are stupid: lots and lots and lots of people don't do that, and thus in the end we might have a perfectly valid URL that still has a populated path. If we threw a 404 at that point, we would prevent people from reaching those parts of the site. It is simply something that has to be done on a per-site basis.
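
The two steps and the leftover-path problem can be sketched in a few lines of standalone PHP. This is not Joomla's router: `parseRequest()`, the `$menu` table, and the shape of the returned array are assumptions made up for the illustration.

```php
<?php
// Hypothetical menu table: path => query variables. Not Joomla's real structure.
$menu = array(
    'article-1' => array('option' => 'com_content', 'view' => 'article', 'id' => 1),
);

// The default menu item (the homepage).
$default = array('option' => 'com_content', 'view' => 'featured');

function parseRequest($path, array $menu, array $default)
{
    // Step 1: start from the default menu item.
    $request = $default;

    // Step 2: whatever can be parsed from the path overwrites the defaults
    // and is removed from the path.
    if (isset($menu[$path])) {
        $request = array_merge($request, $menu[$path]);
        $path    = '';
    }

    // Anything still left in $path was not understood by any rule.
    return array('request' => $request, 'leftover' => $path);
}

$known   = parseRequest('article-1', $menu, $default);
$unknown = parseRequest('4article-1', $menu, $default);

echo $known['request']['view'], "\n";   // article
echo $unknown['request']['view'], "\n"; // featured (homepage fallback)
echo $unknown['leftover'], "\n";        // 4article-1
```

In this sketch, the unknown URL keeps the default menu item because nothing overwrote it, and the unparsed path is the only signal left. A per-site rule like the plugin above would turn a non-empty `leftover` into a 404.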

avatar brianteeman brianteeman - change - 23 Oct 2016
Category Router / SEF
avatar Hackwar
Hackwar - comment - 4 Mar 2017

Since the discussion seems to have run its course and the solution has been stated here and in other issues, can we close this one?

avatar brianteeman
brianteeman - comment - 10 Mar 2017

@zero-24 this can probably be closed, as we have another, bigger issue regarding the inability of the new router to create a 404

avatar tonypartridge tonypartridge - change - 10 Mar 2017
The description was changed
Status: New → Closed
Closed_Date: 0000-00-00 00:00:00 → 2017-03-10 13:59:27
Closed_By: tonypartridge
avatar tonypartridge tonypartridge - close - 10 Mar 2017
