HACKER Q&A
📣 herodoturtle

Good SQL statements to benchmark RDBMS?


Hi folks,

We’re building a benchmark tool in C to compare the speed of DML statements in assorted RDBMS vanilla installations (Postgres, MySQL, etc.).

We’re not doing anything fancy / scientific - it’s purely for internal learning purposes.

Are there any particular DML statements that we should include in our benchmark?

We’d welcome any tips or insights on this topic.

(As an aside, we plan to use clock_t to track CPU time, with long-running loops that span several hundred seconds. But of course we welcome advice on this too!)

Thanks ^_^


  👤 goh-chunlin Accepted Answer ✓
clock_t measures CPU time (the time the processor spent on your specific process). In database benchmarking, the CPU is often idle while waiting on disk I/O or network latency.

If you use clock_t, your results will typically show the database as "blazing fast", because they don't count the time spent waiting for the disk to actually persist the data.

You may want to look into CLOCK_MONOTONIC instead: clock_gettime(CLOCK_MONOTONIC, ...) measures wall-clock time.


👤 SRMohitkr
This is really a good task. For now I am going to think about it, and I will comment afterwards.

👤 swashq
A few patterns that stress different parts of the engine:

Window functions (tests optimizer): SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY date DESC) FROM large_table;

Large aggregations (tests hash tables): SELECT category, COUNT(*), AVG(value), PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) FROM large_table GROUP BY category;

Self-joins (tests join algorithms): SELECT a.*, b.* FROM table a JOIN table b ON a.foreign_key = b.id WHERE a.date BETWEEN '2024-01-01' AND '2024-12-31';

String operations (often overlooked): SELECT * FROM table WHERE column LIKE '%pattern%';

Also test with varying data sizes (1K, 100K, 10M rows). Some engines optimize differently at scale.
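For the varying data sizes, one quick way to build the test tables is Postgres's generate_series (this sketch assumes a Postgres target and the hypothetical large_table/category/value/date names from the queries above; MySQL would need a numbers table or a recursive CTE instead):

```sql
-- Build a 100K-row test table; change the upper bound
-- for the 1K and 10M variants.
CREATE TABLE large_table AS
SELECT
    i                             AS id,
    'cat_' || (i % 20)            AS category,
    random() * 1000               AS value,
    DATE '2024-01-01' + (i % 365) AS date
FROM generate_series(1, 100000) AS s(i);

-- Optional: an index so you can compare indexed vs. seq-scan plans.
CREATE INDEX ON large_table (category, date);
```

Running each pattern against the 1K, 100K, and 10M variants of the same schema makes the scale-dependent plan changes easy to spot (EXPLAIN ANALYZE will show where an engine switches strategies).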