HACKER Q&A
📣 long_time_gone

How often do you create fake data sets?


Trying to understand how often folks have to create fake or dummy data for product development, demos, proofs-of-concept, consulting, or other purposes.

What tools do you use?


  👤 nanis Accepted Answer ✓
If it were easier to do the thing I want, I'd do it all the time. What I want is to be able to point to either a database with empty tables or an SQL file and fill in the tables correctly with bogus data while respecting all the constraints. I have not found a tool that does that.

It is trivial to write something where a tool takes a spec written in a way the tool understands (but possibly nothing else) and then spews out data without having to worry about, e.g., foreign key relationships.
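To make the wish above concrete, here is a minimal sketch in Python of filling empty tables while respecting foreign keys. It is only an illustration under stated assumptions, not a tool anyone in this thread uses: it targets SQLite, assumes every foreign key names its parent column explicitly, assumes there are no self-referencing or cyclic foreign keys, and ignores UNIQUE and CHECK constraints. The function name `fill_with_fake_rows` and the `authors`/`books` schema are made up for the example.

```python
import random
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+


def fill_with_fake_rows(conn, rows_per_table=5, seed=0):
    """Insert bogus rows into every table, parents before children."""
    rng = random.Random(seed)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]

    # fks[table][child_column] = (parent_table, parent_column)
    fks = {}
    for table in tables:
        rows = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        fks[table] = {r[3]: (r[2], r[4]) for r in rows}

    # A table depends on every table it references (assumed acyclic).
    deps = {t: {parent for parent, _ in fks[t].values()} for t in tables}

    for table in TopologicalSorter(deps).static_order():
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _ in range(rows_per_table):
            values = {}
            for _cid, name, ctype, _notnull, _default, is_pk in columns:
                if name in fks[table]:
                    # Pick an existing parent key so the constraint holds.
                    parent, parent_col = fks[table][name]
                    keys = [r[0] for r in conn.execute(
                        f'SELECT "{parent_col}" FROM "{parent}"')]
                    values[name] = rng.choice(keys)
                elif is_pk and "INT" in ctype.upper():
                    continue  # let SQLite assign the integer primary key
                elif "INT" in ctype.upper():
                    values[name] = rng.randint(1, 1000)
                else:
                    values[name] = f"{name}_{rng.randint(1, 10**6)}"
            if values:
                names = ", ".join(f'"{n}"' for n in values)
                marks = ", ".join("?" for _ in values)
                conn.execute(
                    f'INSERT INTO "{table}" ({names}) VALUES ({marks})',
                    list(values.values()))
            else:
                conn.execute(f'INSERT INTO "{table}" DEFAULT VALUES')
    conn.commit()


# Hypothetical two-table schema, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
""")
fill_with_fake_rows(conn, rows_per_table=3)
```

The hard part of the real problem is everything this sketch skips: composite keys, cycles, uniqueness, CHECK constraints, and values that look plausible rather than merely valid.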

If I have a choice, I wire together something in Perl that does what I want.

Sometimes I do not have a choice and have to use something like https://www.mockaroo.com/ or https://generatedata.com/, but then I have to fiddle with stuff.

I am fond of David Golden's https://metacpan.org/pod/Data::Fake


👤 Jugurtha
Not often. The last time I had to do it was four years ago for two projects: financial data derived from real data with consent of the owner for one project for demos (consulting), and fitness tracker data for testing (wrote a sort of simulator that generated data from several mock trackers with different timestamps and payloads).
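For anyone curious what such a simulator might look like, here is a rough Python sketch. It is not the code from that project; the function name `simulate_trackers`, the field names (`tracker_id`, `steps`, `heart_rate`), the start date, and the value ranges are all invented for illustration.

```python
import random
from datetime import datetime, timedelta, timezone


def simulate_trackers(n_trackers=3, readings_each=4, seed=0):
    """Return fake readings from several mock trackers, ordered by time."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary start time
    events = []
    for i in range(n_trackers):
        # Each tracker starts at its own offset so timestamps differ.
        ts = start + timedelta(seconds=rng.randint(0, 59))
        steps = 0
        for _ in range(readings_each):
            ts += timedelta(seconds=rng.randint(30, 90))
            steps += rng.randint(0, 120)  # cumulative step counter
            payload = {"steps": steps}
            if i % 2 == 0:
                # Only some trackers report heart rate, so payloads vary.
                payload["heart_rate"] = rng.randint(55, 160)
            events.append({
                "tracker_id": f"tracker-{i}",
                "timestamp": ts.isoformat(),
                "payload": payload,
            })
    # ISO 8601 strings in one time zone sort chronologically.
    events.sort(key=lambda e: e["timestamp"])
    return events


events = simulate_trackers()
```

Seeding the generator keeps the output reproducible, which helps when the fake data feeds automated tests.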

👤 beardyw
For discrete tests it's best to keep it small and do it by hand. Bulk data, in my experience, is problematic: if multiple tests rely on it, it quickly becomes frozen because any change risks breaking one of them. And then it needs updating as the product changes.