HACKER Q&A
📣 markus_zhang

How do you manage dev db environment?


Hi db experts,

I'm curious how do you manage dev environments? Here is some background:

- We use Vertica for prod/dev. Data team members share the same dev database.

- We need to import data from prod to dev for testing frequently. At least a few times per Sprint per developer.

- We don't control the databases. DBAs from HQ control them and the response time is from a few days to infinity.

Here are the pain points:

- If two or more developers are developing two different features for the same table, we have to make sure that neither breaks the table.

- It's slow to import data from prod to dev. We tried vsql (the command line tool) and pandas and neither works great.


  👤 PaulHoule Accepted Answer ✓
Alternatives are hard but it is not a good practice for devs to have a copy of the prod database. It is also bad for devs to share the test database because they will always be trashing it for each other or be afraid of trashing it.

For instance you don't want personal or commercially valuable information to walk out the door.

The cost in terms of storage, CPU and devs waiting for tasks to finish against it is vast.

Really devs should have a small test database to work with that exercises everything important. If a dev personally can use 10 copies of this database they can afford to have it and they can have it now.

The full database is nice to have if you want to debug specific problems in the full system, debug performance problems at scale, do a dress rehearsal of an upgrade, etc. You still need it for some things but if every dev uses it every day you are in trouble.

If you don't know how to make a small database that works then don't know how your system works and making any kind of change to the system is risky and expensive.


👤 softwaredoug
We have a sandbox dev cloud environment and a way to sample a subset of prod data into this dev sandbox. The sandbox dbs are namespaced by the devs username and their branch, so you don’t have conflicts. Devs control how they sample prod to recreate the needed behavior. And we can test on each other scenarios as needed.

The key here is “sample” from prod to dev, otherwise it’s far too large to be workable.


👤 jaredsohn
Have a centralized staging database with fake data that engineers use but don't break. Make a local copy of that database and do what you want with it. If you break things, grab another copy from staging.

You can replicate many production issues via tests.


👤 markus_zhang
Sorry forgot to mention that the whole db is a DWH, not a OLTP thing.