HACKER Q&A
📣 hghsv

Why doesn't “select *” in grouped SQL queries yield all grouped columns?


SQL Server complains when trying to "select *" in a grouped query (unless the extreme mostly very useless case where all columns are being grouped by). It makes sense if "select *" has to mean all columns in the table (or joined tables etc) "pre grouping" (in lack of a better word). I don't know if my SQL queries are too trivial, but it seems like one would be able to merely select grouped columns or aggregate calculations on any column when grouping -- so why not let "*" refer to all the selectable columns, ie those grouped by. I believe it would improve typing speed surely, but perhaps even readability.

I know there is some dialect(s?) that let's you use \1 or similar for the first column grouped by and so forth, which I guess if a half-way solution.

Is there some reason that "select *" in grouped queries could not/should not select the columns grouped by?


  👤 stareatgoats Accepted Answer ✓
There is I guess no reason except that it would be confusing. "select *" normally means select all the columns in the table, which is ok for ad hoc eyeballing smaller tables, where it saves you the typing involved of specifying possibly hundreds of columns - that syntax is not really ok for anything else.

But you are suggesting that the same syntax should be extended (by DBMS vendors presumably) to aggregate "group by" queries, where there is hardly any such efficiency gain involved; you absolutely need to specify the "group by" columns anyway, and you can simply copy that list over into the select statement. So you would achieve more confusion, and achieve almost no efficiency.

The only situation where your suggestion would make some sense would be if you routinely work with databases with tables that have many (hundreds?) of columns where a majority of these fields are included in "group by" statements. But these would be extreme edge cases in my experience. And the benefit would still be debatable.


👤 hans_castorp
Because it makes no sense.

If you group by one column, there might be multiple rows for a single value of the group column. And multiple rows means there might be multiple, different values to choose from.

And because SQL doesn't "guess" which of those you want, you are forced to tell the engine which one you want. Either by specifying an aggregate that picks exactly one (min, max, ..) or an aggregate that works on all values (avg, sum, ..)