? ? Success

User tests: Successful: Unsuccessful:

avatar okonomiyaki3000 okonomiyaki3000 - open - 30 Jan 2014
avatar chrisdavenport
chrisdavenport - comment - 30 Jan 2014

There is already a PR for this issue: #2735. Please review, test and improve. Thanks.

avatar mbabker
mbabker - comment - 30 Jan 2014

So I tested both patches and the current staging branch by running the Smart Search indexer and doing a search on the front end since that generates a UNION query. Here's my results:

PR #2735 Query:

SELECT t.term, t.term_id
FROM #__finder_terms AS t
WHERE t.term = 'joomla' AND t.phrase = 0
UNION (
SELECT t.term, t.term_id
FROM #__finder_terms AS t
WHERE t.stem = 'joomla' AND t.phrase = 0)

PR #2986 Query:

(
SELECT t.term, t.term_id
FROM #__finder_terms AS t
WHERE t.stem = 'joomla' AND t.phrase = 0)

Current Staging:

SELECT t.term, t.term_id
FROM #__finder_terms AS t
WHERE t.term = 'joomla' AND t.phrase = 0

Judging by this, @chrisdavenport patch looks to be the correct fix.

avatar okonomiyaki3000
okonomiyaki3000 - comment - 31 Jan 2014

It depends on how you would want union() (and unionAll()) to work.

JDatabaseQuery has the concept of query 'type'. So, SELECT is a type, INSERT is a type, etc. Calling one of those functions will set the type which affects how the query will be rendered.

There are two possible approaches to UNION. One is to treat it as a type which is what my patch does. The other is to treat it as a clause of the SELECT type which is what #2735 does. The upatched code fails because the union functions treat UNION as a clause but __toString treats it as a type.

There may be pros and cons to both approaches. I suggest considering carefully how you would like union() and unionAll() to work. In any case, unionDistinct() should be deprecated because UNION DISTINCT makes no sense.

avatar mbabker
mbabker - comment - 31 Jan 2014

Personally, I'm impartial to how it should work. With that said, I don't know enough about UNION queries to know how it is supposed to work, but the query I got with Chris' patch applied looks the most correct to me.

avatar okonomiyaki3000
okonomiyaki3000 - comment - 31 Jan 2014

The reason is that the way to use the function is different in my patch. His is like:

$query->select('*')->from('foo')->union('select * from bar');

But mine is like:

$query1->select('*')->from('foo');
$query2->select('*')->from('bar');
$query3->union($query1)->union($query2);

or the last line could be:

$query3->union(array($query1, $query2));
avatar okonomiyaki3000
okonomiyaki3000 - comment - 31 Jan 2014

So the difference is that, in his, you start with a select and append one or more selects to it with union(). In mine, you just pass any number of selects to union() and it wraps them up as a separate query.

avatar mbabker
mbabker - comment - 31 Jan 2014

So I'd change wherever $query->union() is implemented to use the code as expected too that way there's a good way to test the change. From my test, it's obvious the current code is broken, but without fixing at least that query, your patch doesn't fair much better at the moment (not saying it's wrong, just at a glance, that's the appearance).

avatar okonomiyaki3000
okonomiyaki3000 - comment - 31 Jan 2014

But how is the code expected to work? The unpatched code is written halfway to function like my patch and halfway to function like his so it doesn't work at all. So we have both fixed it but with different approaches to how union() should be used.

Since it is currently useless, there is no reason to think that either way should cause a bc issue. We should consider how is the most useful and sensible way for the function to be used in practice and implement it that way. Maybe this is a topic that should be discussed on the google groups forum and get some additional opinions. Actually, I'm not sure which approach is better myself. I'd like to think more about what are the benefits and drawbacks of each.

avatar Bakual
Bakual - comment - 31 Jan 2014

Imho
$query->select('*')->from('foo')->union('select * from bar');
is much more intuitive than
$query3->union(array($query1, $query2));

avatar wilsonge
wilsonge - comment - 31 Jan 2014

Actually i think that having both instances as JDatabaseQuery elements makes more sense than having one being a written SQL clause

avatar betweenbrain
betweenbrain - comment - 31 Jan 2014

Imho
$query->select('*')->from('foo')->union('select * from bar');
is much more intuitive than

+1

avatar Bakual
Bakual - comment - 31 Jan 2014

Actually i think that having both instances as JDatabaseQuery elements makes more sense than having one being a written SQL clause

You can of course pass the second query as JDatabaseQuery object as well. The example was a bit misleading

$query1->select('*')
    ->from('foo');
$query2->select('*')
    ->from('bar');
$query1->union($query2);
avatar wilsonge
wilsonge - comment - 31 Jan 2014

Oh ok then :+1: to that

avatar AmyStephen
AmyStephen - comment - 31 Jan 2014

It's not clear to me, so, this is a question.

Would both generate SQL like the following, with the following results (given sample data)?

UNION EXAMPLE:

SELECT id, `title`
FROM `jos_content`
UNION
SELECT id, `username` as `title`
FROM `jos_users`
ORDER BY `title`
LIMIT 0, 3

RESULTS:

859, 'admin'
1, 'Administrator Components'
2, 'Archive Module'

avatar chrisdavenport
chrisdavenport - comment - 31 Jan 2014

Don't know about Elijah's code, but with my proposed fix:

$q1->select('id, title')->from('#__content');
$q2->select('id, username as title')->from('#__users');
$q1->union($q2)->order('title');

produces:

SELECT id, title
FROM #__content
UNION (
SELECT id, username as title
FROM #__users)
ORDER BY title

If you were to put an order on the second SELECT, like this:

$q1->select('id, title')->from('#__content');
$q2->select('id, username as title')->from('#__users')->order('username');
$q1->union($q2)->order('title');

then you'd get this:

SELECT id, title
FROM #__content
UNION (
SELECT id, username as title
FROM #__users
ORDER BY username)
ORDER BY title

which I think is syntactically correct although if memory serves the "ORDER BY username" will be ignored (by MySQL at least) without a LIMIT clause in the same SELECT clause, which is not possible with the current implementation of JDatabaseQuery (but could be added quite easily).

avatar AmyStephen
AmyStephen - comment - 31 Jan 2014

Thanks @chrisdavenport -- that's what is needed. +1

avatar okonomiyaki3000
okonomiyaki3000 - comment - 1 Feb 2014

@chrisdavenport

As I understand it...

SELECT id, title
FROM #__content
UNION (
SELECT id, username as title
FROM #__users
ORDER BY username)
ORDER BY title

means that you get a the items from #__content in the default order, get the items from #__users ordered by username (which will be aliased as title), then take the full set of both results and order it by title. In this example ORDER BY username is redundant but in the end everything works out the way you want. I think you might have problems in more complex examples.

How about this:

SELECT a, b, c
FROM foo
UNION
SELECT a, b, c
FROM bar

If I want the results from foo to be ordered by b and the results from bar to be ordered by c, is this possible?

Obviously this is kind of an edge case.

avatar chrisdavenport
chrisdavenport - comment - 1 Feb 2014

In MySQL the "ORDER BY username" without an associated LIMIT clause is optimised out (ie. ignored), so I could have written "ORDER BY id" or "ORDER BY somethingelse" and it would make no difference. See: http://dev.mysql.com/doc/refman/5.0/en/union.html. Currently there is no mechanism in JDatabaseQuery to get a LIMIT on that SELECT because the LIMIT is applied when you do $db->setQuery($query, $offset, $limit).

For your "edge case", I think this should work:

$q1->select('a, b, c, b AS sort_col')->from('foo');
$q2->select('a, b, c, c AS sort_col')->from('bar');
$q1->union($q2)->order('sort_col');

which produces:

SELECT a, b, c, b AS sort_col
FROM foo
UNION (
SELECT a, b, c, c AS sort_col
FROM bar)
ORDER BY sort_col
avatar okonomiyaki3000
okonomiyaki3000 - comment - 1 Feb 2014

@chrisdavenport Ah, you are quite right. It doesn't work quite like I expected (http://dev.mysql.com/doc/refman/5.0/en/union.html) but you bring up a good point about the use of ORDER BY with LIMIT inside of a UNIONed query. Since the query passed to union() will not have a LIMIT, ORDER BY is pointless. But what if we could give it a LIMIT? What if we had something like:

public function union($query, $offset = 0, $limit = 0)

Since $distinct and $glue serve no purpose, get rid of them, replace them with something useful.

avatar Bakual
Bakual - comment - 1 Feb 2014

Since $distinct and $glue serve no purpose, get rid of them, replace them with something useful.

Just a note: It's not easy to just replace an argument. Even if it currently doesn't work and serves no purpoase, it may still be used. If you're going to replace it, you will have to make sure that you can distinguish between the old use (as distinct and glue -> could be anything, including int) and the new one (as offset and limit -> both integers). This could be hard in this case.

avatar okonomiyaki3000
okonomiyaki3000 - comment - 1 Feb 2014

It's not just the arguments that serve no purpose though, it's the function itself. It currently does not work at all. Who could be using it?

avatar okonomiyaki3000
okonomiyaki3000 - comment - 1 Feb 2014

Then Smart Finder is also broken. How long has this gone unnoticed, I wonder?

avatar chrisdavenport
chrisdavenport - comment - 1 Feb 2014

The offset and limit are not conceptually associated with the union; they're associated with the select, so I would argue that adding them as parameters to the union() method would be wrong in my opinion.

However, it looks like I was wrong about limits not being available in JDatabaseQuery as it looks like the capability has already been added! There is a JDatabaseQueryLimitable interface and those database drivers that support it can do $query->setLimit($limit, $offset).

So, this

$q1->select('a, b, c, b AS sort_col')->from('foo')->setLimit(200, 25);
$q2->select('a, b, c, c AS sort_col')->from('bar')->setLimit(100, 50);
$q1->union($q2)->order('sort_col');

will produce this

SELECT a, b, c, b AS sort_col
FROM foo
UNION (
SELECT a, b, c, c AS sort_col
FROM bar LIMIT 50, 100)
ORDER BY sort_col LIMIT 25, 200

And this would produce the same result

$q1->select('a, b, c, b AS sort_col')->from('foo');
$q2->select('a, b, c, c AS sort_col')->from('bar')->setLimit(100, 50);
$q1->union($q2)->order('sort_col');
$db->setQuery($q1, 200, 25);

It doesn't seem to be testable with the current unit tests because the mock database driver does not have the JDatabaseQueryLimitable interface. As a test newbie I have no idea how to fix that.

avatar chrisdavenport
chrisdavenport - comment - 1 Feb 2014

A better example would be:

$q1->select('a, b, c')->from('foo');
$q2->select('a, b, c')->from('bar')->order('c')->setLimit(100, 50);
$q1->union($q2)->order('b');

which produces:

SELECT a, b, c
FROM foo
UNION (
SELECT a, b, c
FROM bar
ORDER BY c LIMIT 50, 100)
ORDER BY b
avatar okonomiyaki3000
okonomiyaki3000 - comment - 1 Feb 2014

Oh good, then it's already doable. As for the current unit tests, I think they should all (the ones related to union) be rewritten anyway since they obviously do nothing of value.

Another edge case comes to mind. How would you put a limit (and possibly order) on the first query as opposed to the result of the UNION?

avatar chrisdavenport
chrisdavenport - comment - 2 Feb 2014

PR #2735 already includes revised and several new unit tests. Could you suggest any additional ones?

For your edge case, I think you'd have to use a dummy query, like this:

$q0->select('a, b, c')->from('foo')->where('1=0');
$q1->select('a, b, c')->from('foo')->order('b')->setLimit(200, 25);
$q2->select('a, b, c')->from('bar')->order('c')->setLimit(100, 50);
$q0->union(array($q1, $q2))->setLimit(10, 5);

which produces:

SELECT a, b, c
FROM foo
WHERE 1=0
UNION (
SELECT a, b, c
FROM foo
ORDER BY b LIMIT 25, 200)
UNION (
SELECT a, b, c
FROM bar
ORDER BY c LIMIT 50, 100) LIMIT 5, 10
avatar okonomiyaki3000
okonomiyaki3000 - comment - 2 Feb 2014

I suppose that works. It's a bit funny but this is a rare case anyway.

I'll have to have a look at the new tests. The thing I didn't like about the old ones was that they only tested some internal representation and not the output that would actually be used. Which is why code that didn't actually work was able to pass them. I make no claim of any expertise in TDD at all but I think there must be a lot of tests that need closer attention. I wonder what can be done to maintain a high standard of quality for tests? Anyway, that's another topic.

Well, I'm basically satisfied with your patch. Philosophically, I prefer my approach but it seems I'm in the minority.

avatar mbabker mbabker - change - 11 Mar 2014
Status New Closed
Closed_Date 0000-00-00 00:00:00 2014-03-11 22:56:10
Labels
avatar mbabker mbabker - close - 11 Mar 2014
avatar mbabker
mbabker - comment - 11 Mar 2014

Closed in favor of #2735

avatar mbabker mbabker - close - 11 Mar 2014
avatar okonomiyaki3000 okonomiyaki3000 - head_ref_deleted - 24 Mar 2014

Add a Comment

Login with GitHub to post a comment