HACKER Q&A
📣 fny

Does anyone else think SQL needs help?


I have been writing a decent amount of SQL for three years now, and while the paradigm is extremely powerful, the language lends itself to unmaintainable spaghetti code quickly.

Fundamentally, SQL lacks meaningful abstraction, and there's no sane way to package very common operations.

Say you want to find the other columns of the row that holds the maximum value in each group (grouped by date, say, or by salesperson as below). Today, you'd need to write something like this EVERY SINGLE TIME:

```
SELECT sd1.sale_person_id, sd1.sale_person_name, sd1.no_products_sold, sd1.commission_percentage, sd1.sales_department FROM sales_department_details sd1 INNER JOIN (SELECT sale_person_id, MAX(no_products_sold) AS no_products_sold FROM sales_department_details GROUP BY sale_person_id) sd2 ON sd1.sale_person_id = sd2.sale_person_id AND sd1.no_products_sold = sd2.no_products_sold;
```

Wouldn't something like this be nicer?

```
SELECT sale_person_id, max(no_products_sold) as max_sold, link(sale_person_name, max_sold) FROM sales_department_details
```

Frankly, it seems like some sort of macro system is needed. Perhaps the SQL compiles into the above?


  👤 dasil003 Accepted Answer ✓
Agreed that SQL lends itself to spaghetti. However, after working with it for 25 years, both as a developer and in collaboration with analysts and data scientists, I have to say I appreciate its basic declarative nature and its remarkable portability as it has expanded into distributed databases.

For me, the killer app is being able to walk up to an unfamiliar database of widely varying tech and start getting value out of it immediately. Macros or other DB-specific extensions would be useful at times, but I’m not sure how much they would help solve the messiness problem in a generic way that reshapes the paradigm more broadly. My instinct is that the messiness is a consequence of the direct utility, and trying to make it nicer from a programmerly aesthetics point of view might be a hard sell. It’s not like SQL doesn’t have affordances for aesthetics already (e.g. WITH clauses).


👤 throwyawayyyy
The story of the past 10 years at my FAANG has been one of making our key-value databases look as much like relational databases as possible, complete with SQL query engines (at least two of them, because what would a FAANG be without multiple slightly incompatible ways of doing things?). Honestly it's been a huge boon. Should we have abandoned SQL entirely and come up with a new language? Maybe. But given that the primary initial motivation for this work was to allow analysts to create millions of reports without bothering eng, it had to be a language that analysts understand. That's SQL.

👤 pdntspa
I can't believe that at time of this writing, only a few other comments are mentioning VIEWs! Like that is this dude's exact complaint!

smh...


👤 forinti
Frankly, the biggest problem with SQL is that not enough people know it well (and a lot of people think they know it well).

👤 simonw
I find CTEs (the WITH statement) greatly improved my ability to run more complex queries because they offer an abstraction I can use to name and compose queries together.

SQL views are useful for that kind of thing too.
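
To make the CTE suggestion concrete, here is a minimal sketch of OP's query rewritten with a named `WITH` clause. It uses Python's built-in `sqlite3` module only so the example runs without a database server. The sample rows and the CTE name `max_sold` are made up for illustration.

```python
import sqlite3

# Made-up sample rows shaped like OP's sales_department_details table.
ROWS = [
    (1, "Ann", 10, 5.0, "East"),
    (1, "Ann", 25, 6.0, "East"),
    (2, "Bob", 7, 4.0, "West"),
    (2, "Bob", 3, 3.5, "West"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_department_details (sale_person_id INTEGER, sale_person_name TEXT,"
    " no_products_sold INTEGER, commission_percentage REAL, sales_department TEXT)"
)
conn.executemany("INSERT INTO sales_department_details VALUES (?, ?, ?, ?, ?)", ROWS)

# The CTE gives the "max per salesperson" subquery a name, so the main query
# reads as a join against a named relation instead of an inline subquery.
QUERY = """
WITH max_sold AS (
    SELECT sale_person_id, MAX(no_products_sold) AS no_products_sold
    FROM sales_department_details
    GROUP BY sale_person_id
)
SELECT d.sale_person_id, d.sale_person_name, d.no_products_sold,
       d.commission_percentage, d.sales_department
FROM sales_department_details AS d
JOIN max_sold AS m
  ON d.sale_person_id = m.sale_person_id
 AND d.no_products_sold = m.no_products_sold
ORDER BY d.sale_person_id
"""

top_rows = conn.execute(QUERY).fetchall()
print(top_rows)  # one row per salesperson: the row with their highest no_products_sold
```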


👤 croes
Wouldn't

    SELECT * FROM
        (SELECT DENSE_RANK() OVER (PARTITION BY sd1.sale_person_id ORDER BY sd1.no_products_sold DESC) r,
                sd1.sale_person_id,
                sd1.sale_person_name,
                sd1.no_products_sold,
                sd1.commission_percentage,
                sd1.sales_department
         FROM sales_department_details sd1) sd
    WHERE sd.r = 1

do the same without the inner join?
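
Here is a runnable sketch of the window-function approach, again in SQLite via Python's `sqlite3`. Window functions need SQLite 3.25 or newer. The data is invented and includes a deliberate tie, which shows that `DENSE_RANK` keeps ties the same way the original self-join does.

```python
import sqlite3

# Invented rows; Bob has two rows tied at his maximum of 7.
ROWS = [
    (1, "Ann", 10, 5.0, "East"),
    (1, "Ann", 25, 6.0, "East"),
    (2, "Bob", 7, 4.0, "West"),
    (2, "Bob", 3, 3.5, "West"),
    (2, "Bob", 7, 4.5, "West"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_department_details (sale_person_id INTEGER, sale_person_name TEXT,"
    " no_products_sold INTEGER, commission_percentage REAL, sales_department TEXT)"
)
conn.executemany("INSERT INTO sales_department_details VALUES (?, ?, ?, ?, ?)", ROWS)

# DENSE_RANK numbers each salesperson's rows from highest sales down;
# keeping r = 1 keeps the top row (and any ties) without a self-join.
QUERY = """
SELECT sale_person_id, sale_person_name, no_products_sold,
       commission_percentage, sales_department
FROM (
    SELECT DENSE_RANK() OVER (
               PARTITION BY sale_person_id ORDER BY no_products_sold DESC
           ) AS r,
           sale_person_id, sale_person_name, no_products_sold,
           commission_percentage, sales_department
    FROM sales_department_details
) AS sd
WHERE sd.r = 1
ORDER BY sale_person_id, commission_percentage
"""

ranked_top = conn.execute(QUERY).fetchall()
print(ranked_top)
```

If you swap `DENSE_RANK` for `ROW_NUMBER`, the query keeps exactly one row per salesperson. Ties are then broken arbitrarily unless the window's `ORDER BY` is extended.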


👤 CodesInChaos
You could take a look at EdgeDB. It's built on Postgres, but uses its own query language and data model. Its data model has built-in support for links and polymorphism. The query language, EdgeQL, makes it easy to follow links and output embedded arrays.

👤 trollied
>Frankly, it seems like some sort of macro system is needed

Views, materialized views, CTEs.


👤 carlineng
I've written about this topic a fair amount; first examining some of the criticisms of SQL that have been present since the language's inception [1], then looking at a new project called Malloy that I'm quite excited about, and think has a lot of potential to address some of the problems of SQL as it relates to data analysis [2].

[1]: https://carlineng.com/?postid=sql-critique#blog [2]: https://carlineng.com/?postid=malloy-intro#blog


👤 richbhanover
What do you think of PRQL ("prequel")?

https://github.com/prql/prql


👤 hotdamnson
This is some very poorly written SQL. Try some modern syntax:

    select DISTINCT sale_person_id, sale_person_name,
        max(no_products_sold) over (partition by sale_person_id) AS max_products_sold,
        commission_percentage, sales_department
    from sales_department_details;

👤 tacosbane
I think every dialect has an idiomatic way to do what you're asking. E.g., in Snowflake it's `select * from sales_department_details qualify no_products_sold = max(no_products_sold) over (partition by sale_person_id)`, in PostgreSQL `select distinct on (sale_person_id) * from sales_department_details order by sale_person_id, no_products_sold desc`, ...

👤 bawolff
It sounds like you just reinvented VIEWs?

👤 cultofmetatron
Elixir's Ecto library basically lets you write SQL in Elixir while abstracting away all the stuff OP is talking about.

You can simply assign a subquery to a variable and reuse it.

PS: a lot of you are saying to use views. That's dangerous. In most cases, that's a nice way to end up with ballooning autovacuum processes as your database gets bigger. Materialized views are fine as long as your use case doesn't need real-time accuracy.


👤 benjiweber
    create view max_products_sold as
        select sale_person_id, max(no_products_sold) AS no_products_sold
        from sales_department_details
        group by sale_person_id;

    select sale_person_id, sale_person_name, no_products_sold, commission_percentage, sales_department
    from sales_department_details natural join max_products_sold;
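
Here is a minimal check that the view plus `NATURAL JOIN` version returns the same rows as OP's query. It is sketched in SQLite through Python's `sqlite3`, with made-up data.

```python
import sqlite3

# Made-up sample rows shaped like OP's table.
ROWS = [
    (1, "Ann", 10, 5.0, "East"),
    (1, "Ann", 25, 6.0, "East"),
    (2, "Bob", 7, 4.0, "West"),
    (2, "Bob", 3, 3.5, "West"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_department_details (sale_person_id INTEGER, sale_person_name TEXT,"
    " no_products_sold INTEGER, commission_percentage REAL, sales_department TEXT)"
)
conn.executemany("INSERT INTO sales_department_details VALUES (?, ?, ?, ?, ?)", ROWS)

# The per-salesperson maximum, defined once and reusable like a table.
conn.execute("""
CREATE VIEW max_products_sold AS
SELECT sale_person_id, MAX(no_products_sold) AS no_products_sold
FROM sales_department_details
GROUP BY sale_person_id
""")

# NATURAL JOIN matches on every column name the two relations share,
# here sale_person_id and no_products_sold.
view_rows = conn.execute("""
SELECT sale_person_id, sale_person_name, no_products_sold,
       commission_percentage, sales_department
FROM sales_department_details NATURAL JOIN max_products_sold
ORDER BY sale_person_id
""").fetchall()
print(view_rows)
```

One design caveat: `NATURAL JOIN` matches on every shared column name. If a same-named column is later added to either side, the join silently changes. `USING (...)` or an explicit `ON` spells the keys out.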


👤 PaulHoule
I am using jOOQ now at work to write SQL statements in a Java DSL. Java’s type system mostly works as intended and is helpful, particularly in conjunction with an IDE. Many kinds of metaprogramming are possible with jOOQ.

👤 pgt
Learn Datalog Today: http://www.learndatalogtoday.org/

👤 TekMol
Put your example on https://www.db-fiddle.com/ or http://sqlfiddle.com/ or some other sql fiddle then we can all experiment with it and come up with leaner solutions.

Just asking for mooar built-in stuff leads to bloated software.


👤 scubbo
Absolutely.

https://www.scattered-thoughts.net/writing/against-sql (found from HN a week or so ago)


👤 sshine
Pine [1] and jq [2] are combinator query languages.

Every expression represents a filter of some kind.

In SQL (and in relational algebra), every expression represents a dataset.

That means composing SQL expressions is composing data, not composing operations on data.

Since the barrier to entry for building a backwards-compatible combinator language on top of SQL is fairly low (Pine is an example), time will tell whether trading the steeper learning curve of combinators for better composability is a good idea.

[1]: https://github.com/pine-lang/pine [2]: https://stedolan.github.io/jq/


👤 SakeOfBrevity
Check out dbt (not dbt Cloud), a great open-source SQL tool; a macro system is included in its ever-growing set of features. It helps a lot with managing SQL.

👤 wodenokoto
Nicer formatting of OP's woes:

    SELECT 
        sd1.sale_person_id,
        sd1.sale_person_name, 
        sd1.no_products_sold, 
        sd1.commission_percentage, 
        sd1.sales_department 
    FROM 
        sales_department_details sd1 
    INNER JOIN (
        SELECT 
            sale_person_id, 
            MAX(no_products_sold) AS no_products_sold 
        FROM sales_department_details 
        GROUP BY sale_person_id
    ) sd2 
    ON 
            sd1.sale_person_id = sd2.sale_person_id
        AND sd1.no_products_sold = sd2.no_products_sold;

👤 rawgabbit
What you described is already solved by window functions.

👤 SnowHill9902
That’s because you are 1) writing naive SQL and 2) forcing naive SQL on an arbitrary data structure. You may use VIEWs, CTEs, GROUP BY CUBE/ROLLUP for 1) and rethink your structures for 2).

👤 kristov
I think it would be neat if columns of a resultset could contain other resultsets (relvars?) as values. This would be the natural outcome of a join operation, and give greater flexibility on how you want to reduce those sub-relvars down. It also makes recursive queries more natural, yielding a "tree" of results. You could even build a reducer for a recursive query that concatenated rows to a string to serialize into whatever: json, XML, etc.

👤 gerdesj
I'm not an expert, but whenever I need to use SQL it seems to be exactly as complicated as I need at the time to do the job. When the job becomes more complicated, so does SQL.

I think it is a remarkably well designed language. I often get decent results for minimal effort. It isn't something I use day to day but more often month to month and I generally forget what I learned a couple of years ago.

For example, I wanted to clear down some records for a defunct client from our backup results database. I started by using SELECT to reliably find the customer (not as simple as it might sound). Then I construct an INNER JOIN (in this case most joins are the same) to link to the jobs. A SELECT COUNT(*) experiment looks about right. I snapshot the VM and change my experiment to a DELETE.
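
The count-first, delete-second workflow described above can be sketched so that both statements share one `FROM ... WHERE` tail. The schema (`customers`, `backup_jobs`), the client names, and the `purge_client` helper are all hypothetical. SQLite is used for portability. It does not support `DELETE ... JOIN`, so an `IN` subquery stands in for the inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema standing in for a backup results database.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE backup_jobs (job_id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
INSERT INTO customers VALUES (1, 'Defunct Ltd'), (2, 'Active Inc');
INSERT INTO backup_jobs VALUES (1, 1, 'ok'), (2, 1, 'failed'), (3, 2, 'ok');
""")

# One shared FROM/WHERE tail, so the dry-run COUNT and the DELETE cannot drift apart.
TARGET = """
FROM backup_jobs
WHERE customer_id IN (SELECT customer_id FROM customers WHERE name = ?)
"""

def purge_client(conn, name, expected):
    # Dry run: refuse to delete if the count is not what we expected.
    (found,) = conn.execute("SELECT COUNT(*) " + TARGET, (name,)).fetchone()
    if found != expected:
        raise ValueError(f"expected {expected} rows, found {found}; not deleting")
    with conn:  # commits if the block succeeds, rolls back if it raises
        deleted = conn.execute("DELETE " + TARGET, (name,)).rowcount
    return deleted

deleted = purge_client(conn, "Defunct Ltd", expected=2)
remaining = conn.execute("SELECT COUNT(*) FROM backup_jobs").fetchone()[0]
print(deleted, remaining)
```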

SQL is a bit strange but it lends itself to experimentation ... to results, way better than any other language I know.

Then you get into functions and views and the like. SQL has just enough complication for whatever job you have at hand and no more.

I recently ran up a query in Icinga2 (Director) against a MariaDB and it seemed to write itself. I get far fewer syntax errors in SQL than anything else that I try to abuse!


👤 thangalin
From https://bitbucket.org/djarvis/rxm/src/master/ :

    root               > people,           # "root" keyword starts the document
    person             > person,           # maps table context to a node
    .age               > @age,             # @ maps a column to an attribute node 
    .first_name        > name/first,       # maps a column to a node
    .last_name         > .../last,         # ... reuses the previous node's path
    account.person_id +> person.person_id, # +> performs an INNER JOIN
    account            > account,          # context is now "account" node
    .id                > @id,              # account id attribute
    ^,                                     # pop stack to previous table context
    address            > address,          # switch context to "address" node
    .*,                                    # glob remaining columns
    ;                                      # Denotes optional WHERE clause
The query produces:

    Charles
    Goldfarb
    456 123 Query Lane San Francisco
It wouldn't take much to make that a JSON document.

👤 AtlasBarfed
1) Templating constructs or env vars? Programming languages can provide those too, but prepared statements and variable-based substitution were a huge improvement to SQL code from programming languages (at least in Java and Ruby). Why wasn't that part of the SQL standard; why did it have to evolve organically? If it had been in the specification, so many SQL injection vulns would have been avoided.

2) SQL statement syntax should start with the table specification so you can get autocompletion help when writing the columns and other expressions. The INSERT statement has it right, while SELECTs suck because the columns are specified before the FROM clause, so there is no context. At least provide an alternate syntax like SELECT FROM <table> COLUMNS <columns> WHERE <condition>; it would be minimal effort for the DB vendors to support.

3) SQL standards have been too nice to vendors, especially Oracle. The lack of portable SQL between databases has hamstrung the industry.

4) Stored procedure standards should have been formalized, as should standard APIs. JDBC/ODBC were OK, but why are programming languages doing the major enhancements and de facto standard setting?
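
On point 1, this is roughly what driver-level parameter binding looks like, sketched with Python's `sqlite3` and its `?` placeholders. The table and the `ids_for_name` helper are hypothetical. The value is bound separately from the SQL text, so the classic injection string matches nothing instead of rewriting the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_department_details (sale_person_id INTEGER, sale_person_name TEXT)")
conn.executemany("INSERT INTO sales_department_details VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

def ids_for_name(conn, name):
    # The ? placeholder is bound by the driver: the value travels separately
    # from the SQL text and is never parsed as SQL.
    rows = conn.execute(
        "SELECT sale_person_id FROM sales_department_details WHERE sale_person_name = ?",
        (name,),
    ).fetchall()
    return [r[0] for r in rows]

print(ids_for_name(conn, "Ann"))           # [1]
print(ids_for_name(conn, "x' OR '1'='1"))  # [] because it is compared as a literal string
```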


👤 PaulHoule
In the semantic web, RDF can put every table in the universe in a unique namespace and OWL can bind together multiple tables and do inference at a high level, then you can query it with SPARQL which is quite similar to SQL at heart.

The Department of Defense was asked by Congress to answer “Where does the money go?” and tried to use OWL and SPARQL queries across all the relational databases they owned but they couldn’t get it to work.

I can’t help but think something along those lines could present as a ‘low code’ data tool.

You can accomplish something similar with views, foreign tables, stored procedures, user defined functions, and other mechanisms which are common in databases like PostgreSQL, Microsoft SQL, etc.

I find PostgreSQL pretty handy because it supports embedded data structures such as lists in columns, supports queries over JSON documents in columns, etc.

My favorite DB for side projects is ArangoDB which uses collections of JSON objects like tables and the obvious algebra over them which lets you query and update like the best relational dbs but you don’t have the programmability of views, stored procedures, etc.


👤 RedShift1
If you give the joining keys the same name, you can use USING(col_foo) instead of table1.col_foo = table2.col_foo. It's one of the reasons I always use tablename_id as the primary key name and the foreign key name. It doesn't always work (for example, two foreign keys linking to the same table that represent different things, like created_by and modified_by).
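
Here is a small sketch of both cases in SQLite, with hypothetical `person`, `account`, and `document` tables. `USING` works when the key names match. The `created_by`/`modified_by` situation falls back to aliases plus explicit `ON` clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema following the tablename_id naming convention.
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE account (account_id INTEGER PRIMARY KEY, person_id INTEGER, balance REAL);
CREATE TABLE document (document_id INTEGER PRIMARY KEY, title TEXT,
                       created_by INTEGER, modified_by INTEGER);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO account VALUES (10, 1, 99.5), (11, 2, 12.0);
INSERT INTO document VALUES (100, 'Report', 1, 2);
""")

# Same key name on both sides, so USING (person_id) replaces the ON clause.
balances = conn.execute(
    "SELECT name, balance FROM person JOIN account USING (person_id) ORDER BY account_id"
).fetchall()

# Two foreign keys into the same table: USING cannot express this,
# so each role gets its own alias and an explicit ON.
authors = conn.execute("""
SELECT d.title, c.name AS created_by_name, m.name AS modified_by_name
FROM document AS d
JOIN person AS c ON c.person_id = d.created_by
JOIN person AS m ON m.person_id = d.modified_by
""").fetchall()
print(balances, authors)
```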

👤 greggyb
There are a number of query languages to address this type of reuse and composability for analytical query workloads.

- MDX: created by Microsoft to provide a dimensional query language. The language is incredibly powerful, but depends on your understanding of dimensional modeling (reading Kimball is the best starting point for learning MDX). There are several tools, both commercial and open source, which implement an MDX interface to data.

- DAX: Microsoft's attempt to make MDX more approachable. A relational, functional language built on top of an in-memory columnstore relational storage engine, and used in several Microsoft products (Power BI, Analysis Services Tabular mode).

- Qlik has its own expression language whose name I am not sure of.

- Looker has LookML

There are a lot of BI tools out there. Not all have an expression language to support reusable measures/calculations, but many do. You may want to look into one.


👤 jasfi
Procedural SQL is how you package reusable/common operations in the DB world. In terms of programming languages, you'd use an ORM.

SQL isn't as broken as you think, it does what it's supposed to do very well. There are many ways to work with SQL such as procedural SQL and ORMs that build on it.


👤 mamcx
Check my project:

https://tablam.org

---

There are two major ways to solve this: you write a transpiler (as most ORMs actually are) or you go deeper and fix it from the base.

The second could yield much better results. It will sound weird at first, but nothing prevents a "SQL" language from being used to build the full app, with UI and all that.

SQL as-is is just too limited, in some unfortunate ways even for the use case of having a restricted language for ad-hoc queries, but that is purely incidental.

The relational model is much more expressive than the array or functional models (because, well, it can supersede/integrate both), and with some extra adjustments you can get something like Python/ML that could become super-productive.


👤 Liron
I agree 100%. This post from the EdgeDB team makes a great case why SQL is outdated [1]

[1] https://www.edgedb.com/blog/we-can-do-better-than-sql


👤 alrlroipsp
> Frankly, it seems like some sort of macro system is needed.

Check out stored procedures and views.

Here's a great summary: https://stackoverflow.com/a/5195020


👤 jhoelzel
From my own experience, SQL is solid. The problem that I see the most is the one we are taught in programming 101: atomic data.

The information you seek above is computed and therefore justifies a view instead of a "real" table, because you really are doing something "complex". After that you can always select yourdata from yourview and you should be fine.

Relational data can only be really efficient if it's relational. Sometimes computed tables are also nice to have, but even better, sometimes a program needs to aggregate the data periodically and put it into the database relationally.


👤 wodenokoto

    SELECT 
        sale_person_id, 
        max(no_products_sold) as max_sold, 
        link(sale_person_name, max_sold)
    FROM sales_department_details 
I'm not sure what `link` is supposed to be, but wouldn't the dream syntax modification to SQL simply be

    SELECT 
        sale_person_id, 
        no_products_sold as max_sold, 
        sale_person_name, commission_percentage, sales_department
    FROM sales_department_details
    GROUP BY sale_person_id
    WHERE no_products_sold = max(no_products_sold)
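
A close existing relative of this dream syntax is SQLite-specific. SQLite documents that in an aggregate query with a single `min()` or `max()`, bare (non-aggregated) columns take their values from the row holding that minimum or maximum. A sketch with invented data follows. PostgreSQL and most other engines reject this query, and when there are ties SQLite picks one of the tied rows.

```python
import sqlite3

# Invented rows; commission differs per row so we can see which row was used.
ROWS = [
    (1, "Ann", 10, 5.0, "East"),
    (1, "Ann", 25, 6.0, "East"),
    (2, "Bob", 7, 4.0, "West"),
    (2, "Bob", 3, 3.5, "West"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_department_details (sale_person_id INTEGER, sale_person_name TEXT,"
    " no_products_sold INTEGER, commission_percentage REAL, sales_department TEXT)"
)
conn.executemany("INSERT INTO sales_department_details VALUES (?, ?, ?, ?, ?)", ROWS)

# SQLite-only: with a single MAX() in an aggregate query, the bare columns
# (name, commission, department) come from the row that produced the max.
QUERY = """
SELECT sale_person_id, sale_person_name,
       MAX(no_products_sold) AS max_sold,
       commission_percentage, sales_department
FROM sales_department_details
GROUP BY sale_person_id
ORDER BY sale_person_id
"""

bare_rows = conn.execute(QUERY).fetchall()
print(bare_rows)
```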

👤 anon84873628
You sound like the exact target for Malloy, which was posted to HN recently:

https://github.com/looker-open-source/malloy


👤 onetom
SQL is great, but you do have to dig deeper and use more of its features, such as WITH and VIEWs, as many other comments mention.

On the other hand, maybe the time is ripe for you to "graduate" to using Datalog via other kinds of databases, like Datomic, XTDB (fka Crux) and the like (DataScript, Datalevin, etc.).

Of course, knowing Datalog won't help you with your SQL issues, but you can look into changing jobs if you like what you see, or maybe run an ETL job to spill some data over into the mentioned DBs and see whether querying it with Datalog is any better.


👤 sph
The problem with SQL is that it's all about transforming data, and that would better be modelled as a pipeline.

https://prql-lang.org/


👤 avereveard
Use composition. Views are great for that, and modern databases will optimize through views as well, so it's not like the days of old when a view with a join was an all-consuming monster.

We already had the best minds in the world try to replace SQL, but because SQL is rooted in relational algebra, and math is hard to budge, they all eventually had to go back and implement the GROUP BYs and HAVINGs and whatnot that complicate the SQL syntax, except tacked on and with their own proprietary syntax.


👤 wilde
> Fundamentally, SQL lacks meaningful abstraction, and there's no sane way to package very common operations.

Tables are the unit of abstraction.

You’re on the right track with your instinct that you should be able to pull repeated work into a reusable unit. In many DB systems you can just register these sub queries as views and treat them as real tables. In the ETL world that I’m typically working in, you go one step further and just write pipelines to make useful intermediate tables.

Reuse tables, not subqueries.


👤 PaulHoule
The one that drives me nuts is

    select this, count(*) from that group by this

which is the most common data exploration query of them all, and it makes you type ‘this’ twice.
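
Here are two common ways to avoid typing the grouping expression twice, sketched in SQLite with made-up data: a positional `GROUP BY 1`, and grouping by a select-list alias. Neither is standard SQL, and support varies by engine, so treat both as dialect-dependent.

```python
import sqlite3

# Made-up sample rows.
ROWS = [
    (1, "Ann", 10, 5.0, "East"),
    (1, "Ann", 25, 6.0, "East"),
    (2, "Bob", 7, 4.0, "West"),
    (2, "Bob", 3, 3.5, "West"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_department_details (sale_person_id INTEGER, sale_person_name TEXT,"
    " no_products_sold INTEGER, commission_percentage REAL, sales_department TEXT)"
)
conn.executemany("INSERT INTO sales_department_details VALUES (?, ?, ?, ?, ?)", ROWS)

# Positional GROUP BY: "1" refers to the first select-list item,
# so the grouped expression is written only once.
by_position = conn.execute(
    "SELECT sales_department, COUNT(*) FROM sales_department_details GROUP BY 1 ORDER BY 1"
).fetchall()

# Grouping by a select-list alias also avoids repeating a longer expression.
by_alias = conn.execute(
    "SELECT lower(sales_department) AS dept, COUNT(*) FROM sales_department_details"
    " GROUP BY dept ORDER BY dept"
).fetchall()
print(by_position, by_alias)
```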

👤 default-kramer
Yes, I can't believe that the awesome power of relational DBs is still hindered by the major flaws of SQL. Here is my attempt at something better: https://docs.racket-lang.org/plisqin/Read_Me_First.html The feature I am most pleased with is that joins are values (or expressions, if you prefer) which can be then refactored using normal techniques.

👤 heavyset_go
I agree that additional abstraction and convenience features would be nice to have in SQL.

I disagree that it needs "help", which suggests that there is something fundamentally wrong with SQL that needs to change, possibly with help from outside of the SQL community.

On the contrary, I think SQL is great, but it could always be better.


👤 narrator
Generally, I find that if you do anything too complicated with SQL it won't be scalable because the optimizer will start making bad decisions and you should just do it in the app tier where you can tune the algorithm more easily.

Thus, if you need abstraction, you probably are doing it wrong already.


👤 qwerty456127
Isn't SQL a nice target for a no-code visual language to compile to?

I also don't understand how "SQL lacks meaningful abstraction". Most serious RDBMSes have user-defined functions and stored procedures; it's just the ORM enthusiasts who are unwilling to use them.


👤 x-shadowban
Really would like "select * except ..." and a non-dynamic-sql way to remap column names

👤 spullara
I translated this into "I don't really know SQL but I think I do".

👤 truth_seeker
SQL is a fourth-generation language and Turing complete. It is the developer who needs help: knowing more of SQL's built-in features and keywords, and how to compose them effectively.

👤 db48x
People build composable and reusable abstractions on top of sql all the time.

👤 bandushrew
You can use stored procs and/or views to hide the details of queries if you want.

The specifics of an inner join vs. outer join vs. left join etc. are crucial, though. It makes sense to be able to specify them.


👤 michael_j_x
Feel you. I've long moved to jupyter notebooks as my de-facto database tool. I use python functions that generate the SQL, which I then execute in jupyter and load the results into pandas.
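
Here is a minimal sketch of that generate-the-SQL approach, which is also roughly the macro OP asked for. A hypothetical `greatest_per_group` helper expands into OP's self-join. Identifiers cannot be passed as bound parameters, so the sketch quotes them with standard double-quote escaping. SQLite is used only so the example runs standalone.

```python
import sqlite3
from typing import Sequence

def quote_ident(name: str) -> str:
    # Identifiers cannot be bound as ? parameters, so quote them the standard
    # SQL way: wrap in double quotes and double any embedded quote.
    return '"' + name.replace('"', '""') + '"'

def greatest_per_group(table: str, group_col: str, value_col: str, columns: Sequence[str]) -> str:
    """Hypothetical helper: expands to OP's 'row with the max value per group' self-join."""
    t, g, v = (quote_ident(x) for x in (table, group_col, value_col))
    select_list = ", ".join(f"d.{quote_ident(c)}" for c in columns)
    return (
        f"SELECT {select_list} FROM {t} AS d "
        f"JOIN (SELECT {g}, MAX({v}) AS max_value FROM {t} GROUP BY {g}) AS m "
        f"ON d.{g} = m.{g} AND d.{v} = m.max_value"
    )

# Made-up sample data to run the generated SQL against.
ROWS = [
    (1, "Ann", 10, 5.0, "East"),
    (1, "Ann", 25, 6.0, "East"),
    (2, "Bob", 7, 4.0, "West"),
    (2, "Bob", 3, 3.5, "West"),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_department_details (sale_person_id INTEGER, sale_person_name TEXT,"
    " no_products_sold INTEGER, commission_percentage REAL, sales_department TEXT)"
)
conn.executemany("INSERT INTO sales_department_details VALUES (?, ?, ?, ?, ?)", ROWS)

sql = greatest_per_group(
    "sales_department_details", "sale_person_id", "no_products_sold",
    ["sale_person_id", "sale_person_name", "no_products_sold"],
)
macro_rows = sorted(conn.execute(sql).fetchall())
print(macro_rows)
```

Because the helper just returns a string, its output could equally be handed to something like `pandas.read_sql` in a notebook.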

👤 temp_account_32
SQL does seem antiquated. I know a lot of people who have worked with it for ages are in the 'don't fix it if it ain't broken' camp, but consider even completely basic things that you cannot do, such as getting autocomplete for what you want to select.

Because you are forced to specify what you want to SELECT before the FROM, you are left guessing the column names.

In contrast, in a programming language or ORM, you'd do something like (pseudocode) Entity.Select( ... ) and can get suggestions of the fields, but with SQL you are forced to do it backwards.


👤 george_ciobanu
Check out https://human.software, a visual language replacement (no code)

👤 CodeWriter23
No wonder you think it needs help if you write it like that in one line with no indentation and insufficient capitalization.

👤 carabiner
There needs to be a Pandas to SQL translator.

👤 SQL2219
With dynamic sql you can write your own. I know you don't have time, just throwing it out there.

👤 mr_toad
> Frankly, it seems like some sort of macro system is needed.

Like dynamic SQL? That has its own problems.

Rather than trying to extend SQL I think that people should stop trying to use SQL/RDBMSs for every task.


👤 jonnypotty
I wish programming was easier sometimes too.

👤 nknealk
There’s a LAST_VALUE window function for this kind of thing that most databases support. Partition by salesperson and order by the number of products sold.
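
One caveat is worth making concrete. When the window has an `ORDER BY` but no explicit frame, the default frame ends at the current row, so `LAST_VALUE` returns the current row's value (or that of its last tie) rather than the partition's last value. The sketch below, in SQLite with invented data, shows both frames side by side.

```python
import sqlite3

# Invented rows.
ROWS = [
    (1, "Ann", 10, 5.0, "East"),
    (1, "Ann", 25, 6.0, "East"),
    (2, "Bob", 7, 4.0, "West"),
    (2, "Bob", 3, 3.5, "West"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_department_details (sale_person_id INTEGER, sale_person_name TEXT,"
    " no_products_sold INTEGER, commission_percentage REAL, sales_department TEXT)"
)
conn.executemany("INSERT INTO sales_department_details VALUES (?, ?, ?, ?, ?)", ROWS)

QUERY = """
SELECT sale_person_id, no_products_sold,
       -- Default frame ends at the current row (plus its ties).
       LAST_VALUE(no_products_sold) OVER (
           PARTITION BY sale_person_id ORDER BY no_products_sold
       ) AS default_frame,
       -- Widening the frame to the whole partition yields the per-salesperson max.
       LAST_VALUE(no_products_sold) OVER (
           PARTITION BY sale_person_id ORDER BY no_products_sold
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS full_frame
FROM sales_department_details
ORDER BY sale_person_id, no_products_sold
"""

frames = conn.execute(QUERY).fetchall()
print(frames)
```

Filtering the wider-frame version on `no_products_sold = full_frame` in an outer query yields OP's rows. Alternatively, `FIRST_VALUE` with `ORDER BY no_products_sold DESC` avoids the issue, because the default frame already starts at the partition's first row.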

👤 rzwitserloot
This question breaks down along three completely different lines.

But first, SQL is a descriptive language: You describe what you want, you don't describe how you want it.

Instead, you rely on extremely complicated optimisers and a lot of using `EXPLAIN` to see what's happening under the hood. This is fundamental in the very _point_ of SQL.

Keeping that in mind, what do you mean:

[1] You feel the general notion of doing relational queries using a descriptive language is just broken. In which case, the fact that so many 'simpler' systems decided to add SQL support instead of just relying on the fact that folks will program their way into efficient queries is telling. People like being able to do this.

[2] You think the general notion is great, just, SQL specifically is badly designed. This then breaks down into two explanations.

[2a] You haven't actually read the spec, specifically your DB engine's spec. There's a ton DB engines can do: you can make your own procedures and your own aggregate functions, add triggers, and use VIEWs to combine it all (and VIEWs can act entirely as tables, including letting you INSERT or UPDATE them). For example, I'm pretty sure you can do precisely what you want, or at least get quite close, by adding only a few extra definitions in PostgreSQL. Either you just don't know enough (I'm just guessing here, based on very little info, please don't take it personally), or you do, but you find all that 'needlessly complicated'. In which case I think you're misjudging how varied (and therefore complicated) the average DB user's needs are. Add the ability to do everything you want, then do the same for every other user like you, and the result is so complicated that you yourself would say 'yeah, but not that complicated', and we're right back where we started.

[2b] No you really do know your way around SQL, including all the various exotic features that DB engines, particularly really good ones like psql, offer. And it's not good enough and you think you have a good grasp on how to make SQL specifically better. In which case: That has been done. Multiple times. The XKCD with the 14 standards comes to mind. Here is a REALLY long list of query languages: https://en.wikipedia.org/wiki/Query_language - pick your poison.

But first, think about the fact you didn't know about any of these. Whatever you're planning to 'replace' or 'fix' SQL, isn't that just doomed to be the 19th entry on that wikipedia list, just as forgotten as all the others?

In which case, [A] many, many db engines have extra syntax you might want to look at. You can make recursive queries, windowing functions, define your own procedures, and more - and use VIEWs to let you do precisely what you want. I'm pretty sure you can do precisely what you desire with postgres, using only a fairly simplistic amount of