HACKER Q&A
📣 jaxk

Are graph databases used in the wild?


I am a front-end guy reading up on databases for a backend task. It seems like graph databases try to manufacture a problem rather than address existing ones. The examples are always fraud detection or some similarly niche use case. That makes me wonder whether businesses actually use or need them, or whether it's just a glorified academic project. And yet there are so many products competing in the space! What's going on?


  👤 malux85 Accepted Answer ✓
Graph databases are used in many, many places: examining bank transactions, finding the shortest path through state machines, molecular reaction modelling, transactional arbitrage/trading, circuit modelling, fact modelling in AI, threat modelling in cyber security, network routing and large-scale analysis, virus and sickness modelling, and many more.

These systems usually have specialised on-disk data structures and algorithms (e.g. path finding) that can be multiple orders of magnitude faster than the equivalent queries over data stored "flat" in a column database. That difference in time and space efficiency isn't "some glorified academic project"; it is the concrete difference between practical and impractical.
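
To make the path-finding point concrete, here is a minimal Python sketch. It is not how any particular graph database is implemented. It only shows the general idea: once edges are indexed by their source node, each hop is a lookup instead of a scan over a flat edge table. The account names and the helpers `build_adjacency` and `shortest_path` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical "flat" edge table: one (source, target) row per transfer.
EDGES = [
    ("acct_a", "acct_b"),
    ("acct_b", "acct_c"),
    ("acct_c", "acct_d"),
    ("acct_a", "acct_e"),
    ("acct_e", "acct_d"),
]

def build_adjacency(edges):
    """Index the rows by source node so each hop is a dictionary lookup."""
    adjacency = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)
    return adjacency

def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    previous = {start: None}          # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

adjacency = build_adjacency(EDGES)
print(shortest_path(adjacency, "acct_a", "acct_d"))
# ['acct_a', 'acct_e', 'acct_d']
```

A real graph database keeps a comparable adjacency structure on disk, so it does not have to be rebuilt for every query.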


👤 mattcdrake
Graph databases are used by the IRS for (at minimum) detecting "patterns of abusive tax transactions"[0] and tax fraud[1].

The IRS also recently contracted Brillient to "define and prototype a graph database for the individual taxpayer"[2]. This is supposed to "enable IRS researchers to visualize complex relationships to improve compliance and enforcement."

[0] https://www.irs.gov/pub/irs-soi/09rescongraphquery.pdf

[1] https://www.aaai.org/Papers/Workshops/2005/WS-05-07/WS05-07-...

[2] https://www.brillient.net/news/brillient-awarded-new-task-or...


👤 randomopining
I believe graph DBs are mainly useful when your data, and the value you can get from it, rely heavily on relationships. That lets you model the data as a graph, and the database software implements the field of CS/math that has grown up around graph structures, exposing the useful algorithms as common functions.

A graph implies lots of nodes, which means lots of data, and only certain businesses have that much data. Most small and medium businesses won't need one. That's why graph databases seem to be mostly in use at big tech companies and the like.


👤 unknown_error
Whether or not you use a graph DB under the hood, having access to a GraphQL API (https://graphql.org/) is much more elegant than having to join together a bunch of separate REST calls or SQL queries. It automatically traverses relationships for you based on the fields you need, and returns only the fields you ask for. So you get exactly what you need, no more and no less. Most GraphQL servers have a GUI explorer query builder too so you just click on the fields you need and it'll construct the query for you.
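
As a rough illustration of "returns only the fields you ask for", here is a toy Python sketch. It is not GraphQL and uses no GraphQL library. It only mimics the idea of a nested field selection that follows a relationship on demand. The data, the `RELATIONS` table and the `select` helper are all hypothetical.

```python
# Hypothetical in-memory data: books reference authors by id.
AUTHORS = {1: {"id": 1, "name": "Example Author", "country": "Exampleland"}}
BOOKS = {
    10: {"id": 10, "title": "Example Novel", "pages": 320, "author": 1},
}

# Which fields are relationships, and which collection they point into.
RELATIONS = {"author": AUTHORS}

def select(record, fields):
    """Return only the requested fields, following relationships on demand.

    `fields` maps a field name to True (plain value) or to a nested
    mapping (a relationship whose sub-fields should be selected).
    """
    result = {}
    for name, sub in fields.items():
        value = record[name]
        if isinstance(sub, dict):
            # Relationship: look up the referenced record and recurse.
            result[name] = select(RELATIONS[name][value], sub)
        else:
            result[name] = value
    return result

# Roughly analogous to asking for: book { title author { name } }
query = {"title": True, "author": {"name": True}}
print(select(BOOKS[10], query))
# {'title': 'Example Novel', 'author': {'name': 'Example Author'}}
```

The caller never sees `pages`, `country` or the raw author id, which is the "no more and no less" part.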

GraphQL can also run on traditional SQL databases, but I think at some point they hit complexity limits and performance issues because deeply nested JOINs get expensive, especially if the columns aren't indexed. In a proper graph database, relationships are first-class parts of any data model and there are no "tables" to speak of, just nodes and the arbitrary, complex relationships between them. That makes data modeling more intuitive (if your data is naturally a graph) and can make relationship-heavy queries more performant. Here's one take on it: https://developers.mews.com/intro-to-graph-databases/

At the end of the day there is no magic cure-all for data storage. It depends on the data you're storing and the way you need to read and write from it. And business wise, it may not be worth it to rewrite 10 years of SQL databases to improve performance by a few percent. But if I were starting a new project from scratch, one that involves layers of interconnected data (say, a bookstore with connections between books, images, authors, customers, reviewers, reviews, inventories, third-party merchants, Goodreads entries, different versions for audiobooks and ereaders, multiple books in a series, etc.) it's the kind of thing that would lend itself well to a graph database in modeling, as long as the stack can also be performant and scalable enough for end users.
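
For a sense of what that bookstore might look like as nodes and relationships, here is a small in-memory Python sketch. It is not a real graph database or its query language. The `Graph` class, the edge type names (`WROTE`, `ABOUT`, etc.) and the sample records are all made up for illustration.

```python
from collections import defaultdict

class Graph:
    """Tiny in-memory property graph: nodes with properties, typed directed edges."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.out_edges = defaultdict(list)  # source id -> [(edge type, target id)]
        self.in_edges = defaultdict(list)   # target id -> [(edge type, source id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, edge_type, target):
        self.out_edges[source].append((edge_type, target))
        self.in_edges[target].append((edge_type, source))

    def out(self, node_id, edge_type):
        """Targets reached by following one outgoing edge of the given type."""
        return [t for kind, t in self.out_edges[node_id] if kind == edge_type]

    def into(self, node_id, edge_type):
        """Sources that point at node_id with an edge of the given type."""
        return [s for kind, s in self.in_edges[node_id] if kind == edge_type]

g = Graph()
g.add_node("book:1", label="Book", title="Example Novel")
g.add_node("book:2", label="Book", title="Example Sequel")
g.add_node("author:1", label="Author", name="Example Author")
g.add_node("customer:1", label="Customer", name="Example Customer")
g.add_node("review:1", label="Review", stars=5)

g.add_edge("author:1", "WROTE", "book:1")
g.add_edge("author:1", "WROTE", "book:2")
g.add_edge("book:2", "SEQUEL_OF", "book:1")
g.add_edge("customer:1", "WROTE_REVIEW", "review:1")
g.add_edge("review:1", "ABOUT", "book:1")

def recommend(graph, customer_id):
    """Other books by the authors of books this customer has reviewed."""
    reviewed = {
        book
        for review in graph.out(customer_id, "WROTE_REVIEW")
        for book in graph.out(review, "ABOUT")
    }
    authors = {a for book in reviewed for a in graph.into(book, "WROTE")}
    candidates = {b for a in authors for b in graph.out(a, "WROTE")}
    return sorted(candidates - reviewed)

print(recommend(g, "customer:1"))
# ['book:2']
```

The recommendation is a four-hop walk (customer, review, book, author, other books). In a relational schema the same question would typically take several joins across link tables.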

Speaking as a web dev "in the wild", we've offloaded the scaling problem to a vendor by choosing to use a headless CMS (GraphCMS, DatoCMS, Contentful, Prismic, etc.; there are many). Some of those use graph databases while others don't, but at the end of the day we don't really care as long as we don't hit their complexity limits. They play DB admin, we get GraphQL or traditional REST APIs, and we can build a frontend on top of that.


👤 ecesena
Lyft open-sourced Cartography [1], which I think is a really useful and concrete example of using a graph DB for security analysis.

[1] https://github.com/lyft/cartography