I'm curious how do you manage dev environments? Here is some background:
- We use Vertica for prod/dev. Data team members share the same dev database.
- We need to import data from prod to dev for testing frequently. At least a few times per Sprint per developer.
- We don't control the databases. DBAs from HQ control them and the response time is from a few days to infinity.
Here are the pain points:
- If two or more developers are developing two different features for the same table, we have to make sure that neither breaks the table.
- It's slow to import data from prod to dev. We tried vsql (the command line tool) and pandas and neither works great.
For instance you don't want personal or commercially valuable information to walk out the door.
The cost in terms of storage, CPU and devs waiting for tasks to finish against it is vast.
Really devs should have a small test database to work with that exercises everything important. If a dev personally can use 10 copies of this database they can afford to have it and they can have it now.
The full database is nice to have if you want to debug specific problems in the full system, debug performance problems at scale, do a dress rehearsal of an upgrade, etc. You still need it for some things but if every dev uses it every day you are in trouble.
If you don't know how to make a small database that works then don't know how your system works and making any kind of change to the system is risky and expensive.
The key here is “sample” from prod to dev, otherwise it’s far too large to be workable.
You can replicate many production issues via tests.