HACKER Q&A
📣 QueensGambit

How developer friendly are APIs with cursor-based pagination?


I see companies like Shopify deprecating their existing pagination in favor of cursor-based pagination. As a developer, I have always hated cursor-based pagination because it restricts me to moving serially from one page to the next. Is this developer friendly? Why do companies migrate to cursor-based pagination?


  👤 kamikaz1k Accepted Answer ✓
My company is actually going through the transition to cursor/keyset pagination. As a SaaS company starts having to support larger customers, it will likely have to go through this exercise. The reason is that pagination is usually implemented with LIMIT/OFFSET in a SQL query. OFFSET becomes very expensive for deep pages because the database has to scan and discard every row that comes before the desired offset.

Imagine a table with 1 million rows, and you return pages in 100-item chunks. To get page 500 using the OFFSET method, the database would have to scan through 100*499 = 49,900 records before it even gets to the first record you care about.
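To make the page-500 example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `items` table, its columns, and the helper names are made up for illustration, and the table is smaller than 1 million rows to keep it quick. Both queries return the same rows; the difference is that the keyset version can seek straight to the starting row through the primary key instead of stepping over the skipped rows.

```python
import sqlite3

PAGE_SIZE = 100


def make_db(n_rows):
    """Build an in-memory table with ids 1..n_rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        ((i, f"item-{i}") for i in range(1, n_rows + 1)),
    )
    return conn


def page_by_offset(conn, page_number):
    """LIMIT/OFFSET pagination: the database steps past every skipped row."""
    offset = (page_number - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


def page_after(conn, last_seen_id):
    """Keyset pagination: start right after the last id the client saw."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()


conn = make_db(100_000)
offset_page = page_by_offset(conn, 500)   # skips 499 * 100 rows first
keyset_page = page_after(conn, 49_900)    # seeks directly to id 49,901
```

The catch is visible in the signatures: `page_after` needs the last id from the previous page, which is exactly why the client can no longer jump to an arbitrary page number.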

This can lead to all sorts of cascading problems, because slow queries cause back pressure as other queries queue up behind them. So in order to scale, services push the complexity out to the client. Conceivably, you can cache the indices (the cursors) on the client side and build a page-number-based paginator on top of them.
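A rough sketch of that client-side idea, assuming an API that returns a page of items plus an opaque cursor for the next page. `PageNumberPaginator` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real HTTP call. The paginator remembers the cursor that starts each page it has seen, so revisiting a page costs one request; only the first visit to a far page requires walking forward.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[int], Optional[str]]  # (items, next_cursor)


class PageNumberPaginator:
    """Page-number access layered on top of a cursor-only API."""

    def __init__(self, fetch: Callable[[Optional[str]], Page]):
        self._fetch = fetch
        # _cursors[i] is the cursor that starts page i + 1; page 1 starts at None.
        self._cursors: List[Optional[str]] = [None]

    def get_page(self, page_number: int) -> List[int]:
        if page_number < 1:
            raise ValueError("page numbers start at 1")
        # Walk forward from the last known page until the start cursor is known.
        while len(self._cursors) < page_number:
            _, next_cursor = self._fetch(self._cursors[-1])
            if next_cursor is None:
                raise IndexError("page out of range")
            self._cursors.append(next_cursor)
        items, next_cursor = self._fetch(self._cursors[page_number - 1])
        # Remember where the following page starts, if we just learned it.
        if next_cursor is not None and len(self._cursors) == page_number:
            self._cursors.append(next_cursor)
        return items


DATA = list(range(1, 251))  # stand-in for 250 server-side records
calls = []                  # records every request, to show the caching


def fake_fetch(cursor: Optional[str]) -> Page:
    """Pretend API: the cursor is the position to resume from, 100 items per page."""
    calls.append(cursor)
    start = 0 if cursor is None else int(cursor)
    items = DATA[start:start + 100]
    end = start + len(items)
    return items, (str(end) if end < len(DATA) else None)


paginator = PageNumberPaginator(fake_fetch)
third_page = paginator.get_page(3)   # first visit: walks pages 1 and 2 first
second_page = paginator.get_page(2)  # cursor already cached: one request
```

One caveat with this approach: cached cursors describe the data as it was when they were issued, so if the API expires cursors or the underlying data shifts, the cache has to be rebuilt.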

Here's an article talking through the issue: https://www.moesif.com/blog/technical/api-design/REST-API-De...


👤 slap_shot
I run a company that integrates data from hundreds of sources, including Shopify. There are only about 5 simple things that most data extraction APIs get wrong, and this is one of them.

Very few APIs that implement pagination work optimally. If I query for all orders whose updated_at is greater than 2019-10-23 07:00:00 and paginate through the results, there's a good chance that any record updated before my pagination completes will be missed by the paginated queries. If I "checkpoint" the greatest updated_at retrieved in the most recent query, I will likely miss the records updated after my query started but before it completed. That leaves me using the time at which I began retrieving data as my new checkpoint.
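The start-time checkpoint can be sketched roughly like this. Everything here is hypothetical: `sync_since`, the `orders` dict, and the injected `clock` stand in for a real extraction job and API. The point is only that the next checkpoint is captured before the first page is requested, so a record updated mid-run is fetched again on the next run rather than skipped.

```python
from datetime import datetime, timedelta


def sync_since(fetch_updated_since, checkpoint, clock):
    """Fetch everything updated after `checkpoint`; return it with the next checkpoint."""
    started_at = clock()  # capture before the first page is requested
    records = list(fetch_updated_since(checkpoint))
    # started_at, not max(updated_at), becomes the checkpoint, so anything
    # updated while this run was in flight is picked up by the next run.
    return records, started_at


t0 = datetime(2019, 10, 23, 7, 0, 0)
# Stand-in for the remote service: order id -> updated_at
orders = {1: t0 + timedelta(minutes=1), 2: t0 + timedelta(minutes=2)}


def fetch_updated_since(since):
    """Pretend API call: ids of orders with updated_at greater than `since`."""
    return sorted(oid for oid, ts in orders.items() if ts > since)


# First run starts at 07:05 and sees both orders.
first_run, checkpoint = sync_since(
    fetch_updated_since, t0, lambda: t0 + timedelta(minutes=5)
)

# Order 1 is updated at 07:06, while the first run is still in flight.
orders[1] = t0 + timedelta(minutes=6)

# Second run resumes from 07:05 and picks up the late update.
second_run, _ = sync_since(
    fetch_updated_since, checkpoint, lambda: t0 + timedelta(minutes=10)
)
```

The trade-off is that some records get fetched twice, so whatever consumes them needs to upsert instead of blindly inserting.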

With a cursor-based pagination system, there is at least a chance that the service I'm calling is dynamically adjusting its underlying query to account for this scenario.

Out of curiosity, what are the scenarios where you want to jump between pages (i.e. not iterate over the pages in a serial fashion)?


👤 fiedzia
Because the abstraction of pages is too expensive to maintain for larger datasets, and very confusing for data that is dynamically generated.

👤 nyuszika7h
I think cursors make sense because they ensure the data doesn't change unexpectedly under you while you're paginating.