HACKER Q&A
📣 mattrighetti

What DBMS would you recommend for read-intensive geospatial queries?


I want to develop a game that will let user pin objects to the world map with a timeout value after which that object will not be visible anymore (user can also set an infinite timeout that will leave the object visible on the map forever). Each user that is playing will see these objects coming and going, particularly they will see objects that are located on the visible map.

Right now the only query that I would like to serve FAST is (I don't care about super precise results): "Given a rectangle of the user's map visible area, return all the objects that are inside it"

OR something similar: "Given a user location, return all the objects that are within a certain radius specified by the user"

It's the first time I am dealing with geospatial data and geospatial indexes. I have seen that Elasticsearch is a good starting point for giant data and little queries. I have also taken a look at PostGIS which I liked a lot and seemed to be a good DBMS for the geospatial task above. What I am concerned about right now is scaling this service (which probably is not the priority right now but I want to be prepared in case this is going to work).

I've never tried to benchmark these kind of queries but this seems to be a very read-intensive task, I imagine that 500 queries per seconds is going to be very hard to keep up with, but this are just my unproven thoughts. Also keep in mind that I will probably need to remove rows that have their timeout value expired (maybe every 10-5 minutes?) so triggers will need to be fired over the giant database.

Again, I have no experience with this and this is probably a very bad implementation of the game, but I am here to ask you for advice on this.

What DBMS would you recommend for this task? What kind of optimizations would you apply to this problem?


  👤 toast0 Accepted Answer ✓
By all means, consider PostGIS. If it works, great. If not, consider it as the source of truth and then an export to Elastic/some other search engine for searching.

One strategy is to make the DB the source of truth, and only put geo info and an id and maybe expiration time into the search engine. Partitioning into several indexes will help with your expiration issue. Have one index for 'permanent' objects, then one index for objects that expire at X:00, another at X:01, X:02, etc. Query multiple indexes and join on the client, then query the database. After the expiration window, you can just drop the index (plus whatever on the source of truth). If you can partition similarly on the source of truth db, that helps there too.

Alternatively, you may want to consider just using in memory structures, plus something to persist permanent objects.

Break up the map into reasonable sized rectangles, that can have small lists in them. Take the user's viewable box and determine which of your storage rectangles match, then go through all the lists and refilter.

If your boxes are all the same size, it's easy to find the matching ones, but popular boxes may have too many objects for performance. OTOH, resizing boxes sounds complex.


👤 flashm
PostGIS would get you up and running, but is not particularly quick unless you have a budget and the knowledge to tune it. It is however probably the most mature, and will be able to do anything you need.

If you need read speed, check out Clickhouse, or something in memory, like Tile38. Clickhouse in my experience is at least 50-100x faster for a point in polygon than Postgis out of the box.


👤 vgeek
I've used Postgis for read only data with indexes on the spatial data, maybe 25mm rows with multiple joins and on cheap hardware it can easily return results in under 100ms. Nowhere near 500qps, though. Depending on your application, st_within vs st_intersects may have drastically different performance, too.

👤 pwg
Postgresql is the database behind OpenStreetMap.

Unless you plan to scale beyond the load that OpenStreetMap handles today, going with Postgresql seems a good choice. And from what I understand (never used this component myself) Postgresql's geospatial support is excellent.


👤 karterk
Typesense offers fast, in-memory geosearch: https://github.com/typesense/typesense