We’re building a benchmark tool in C to compare the speed of DML statements in assorted RDBMS vanilla installations (Postgres, MySQL, etc.).
We’re not doing anything fancy or scientific; it’s purely for internal learning purposes.
Are there any particular DML statements that we should include in our benchmark?
We’d welcome any tips or insights on this topic.
(As an aside, we plan to use clock_t for tracking CPU time, with long-running loops that span several hundred seconds - but of course we welcome advice on this too!)
Thanks ^_^
If you use clock_t (i.e. clock()), the results will usually make the database look "blazing fast", because clock() measures CPU time only - it does not count the time spent waiting for the hard drive to actually persist the data.
For benchmarking you want wall-clock time instead: look into clock_gettime(CLOCK_MONOTONIC, ...), which measures elapsed ("wall clock") time and is not affected by adjustments of the system clock. (Also note that where clock_t is 32 bits, clock() wraps after roughly 36-72 minutes, so very long runs can overflow it.)
Window functions (tests the optimizer): SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY date DESC) FROM large_table;
Large aggregations (tests hash tables): SELECT category, COUNT(*), AVG(value), PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) FROM large_table GROUP BY category;
Self-joins (tests join algorithms): SELECT a.*, b.* FROM large_table a JOIN large_table b ON a.foreign_key = b.id WHERE a.date BETWEEN '2024-01-01' AND '2024-12-31';
String operations (often overlooked): SELECT * FROM large_table WHERE some_column LIKE '%pattern%'; - the leading % defeats B-tree indexes, so this stresses full scans.
Also test with varying data sizes (1K, 100K, 10M rows). Some engines optimize differently at scale.