HACKER Q&A
📣 oldPete

What is a good configuration format nowadays?


First, some background: I created an alert definition generator script in bash that has a variable defined like:

## TEAM DEV_threshold STAGING_threshold DEMO_threshold PROD_threshold ALERT_TYPE DB_USER
## ALERT_TYPE=< warning | alert | custom > (based on team notification routing configuration)
read -r -d '' ALERT_PARAMS << EOM
team-1 1 2 3 4 warning db-user1
team-2 1 2 3 4 warning db-user2
# comment lines
EOM

which I then use in the same script to populate a YAML file (for Alertmanager):

echo "$ALERT_PARAMS" \
  | grep -v '^#' \
  | while read -r vars... ; do generate_definition; done > alerts.yml
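
Spelled out a little further, the loop might look like the sketch below. The field names are taken from the header comment above; the body of generate_definition and the YAML keys it prints are hypothetical placeholders, not the real Alertmanager schema. Blank lines are skipped as well as comment lines, since grep -v '^#' alone lets them through.

```bash
#!/usr/bin/env bash
# Sketch only: field names follow the header comment in the question;
# the YAML keys printed below are hypothetical, not a real schema.

generate_definition() {
  local team=$1 dev=$2 staging=$3 demo=$4 prod=$5 alert_type=$6 db_user=$7
  printf -- '- team: %s\n' "$team"
  printf '  db_user: %s\n' "$db_user"
  printf '  alert_type: %s\n' "$alert_type"
  printf '  thresholds: {dev: %s, staging: %s, demo: %s, prod: %s}\n' \
    "$dev" "$staging" "$demo" "$prod"
}

generate_all() {
  # Reads records on stdin, dropping comment lines and blank lines.
  local team dev staging demo prod alert_type db_user
  grep -Ev '^[[:space:]]*(#|$)' |
    while read -r team dev staging demo prod alert_type db_user; do
      generate_definition "$team" "$dev" "$staging" "$demo" "$prod" \
        "$alert_type" "$db_user"
    done
}
```

With these two functions defined, the pipeline becomes: echo "$ALERT_PARAMS" | generate_all > alerts.yml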

Development teams are expected to add, update or delete lines for ALERT_PARAMS, and nothing else.

I chose the "team-1 1 2 3 4 warning db-user1" format because the structure seems easy for (multi-year experienced) Java, Go and Python developers to understand and update in vim, atom, git{hub,lab} or whichever fancy editors developers use nowadays. The format also means I can perform validations and sanity checks before, inside or after the generate_definition step with only the bash shell inside a CI pipeline, without adding another tool dependency or writing a relatively complex program. I value simplicity.
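
As an illustration of the kind of bash-only check meant here, a minimal sketch follows. The rules it enforces (exactly seven fields, integer thresholds, an ALERT_TYPE of warning, alert or custom) are assumptions read off the header comment, and the function names validate_line and validate_params are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch only: the rules below are assumptions drawn from the header
# comment (7 fields, integer thresholds, known ALERT_TYPE values).

validate_line() {
  local line=$1 i
  local -a f
  read -r -a f <<< "$line"          # split the record on whitespace
  if (( ${#f[@]} != 7 )); then
    echo "expected 7 fields, got ${#f[@]}: $line" >&2
    return 1
  fi
  for i in 1 2 3 4; do              # columns 2-5 hold the thresholds
    if [[ ! ${f[i]} =~ ^[0-9]+$ ]]; then
      echo "threshold in column $((i + 1)) is not an integer: $line" >&2
      return 1
    fi
  done
  case ${f[5]} in
    warning|alert|custom) ;;
    *) echo "unknown ALERT_TYPE '${f[5]}': $line" >&2; return 1 ;;
  esac
}

validate_params() {
  # Reads records on stdin; returns non-zero if any record is invalid.
  local line status=0
  local skip_re='^[[:space:]]*(#|$)'   # comment lines and blank lines
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $skip_re ]]; then continue; fi
    validate_line "$line" || status=1
  done
  return "$status"
}
```

In a CI job this could run as echo "$ALERT_PARAMS" | validate_params before the generation step, failing the pipeline on the first bad record set.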

One engineer, with whom I checked first, believes (rather strongly) this format is not easy to understand and should be replaced with JSON or YAML because:

- developers don't use editors like Vim.

- developers don't have more than 5 seconds to understand the format, and chances are high that devs won't understand it or will make mistakes in updates.

This feedback seemed a bit odd to me, though it felt very genuine, so here I am seeking your input on:

- Whether and how JSON and YAML (or similar formats) are better suited to simple use cases like this?

- Why might a developer not want to spend more than 5 seconds understanding how an application's database-related alerts are generated?

To be clear, I don't mind using whatever format the majority uses, as long as everyone uses the same format. I just want to understand what makes the above format harder to comprehend than YAML or JSON, and why spending more than 5 seconds learning about databases is such a big ask (coincidentally, after a two-week-long database-related incident).

How would you move forward if you were in my shoes?


  👤 thristian Accepted Answer ✓
"One line per record, whitespace-delimited fields" is about the physically simplest file-format one can imagine (see: /etc/crontab, /etc/fstab), and that simplicity will keep implementation cost and complexity low for the generation script you write.

However, this generation script isn't a complete tool in and of itself; from your description it generates a YAML document which presumably is fed into some downstream system, while upstream you need to accept team names and DB user names among other things. I don't know your tech stack, but I can imagine that there are probably developers who work both upstream and downstream of this file format, with tools in different languages, different transports, different data stores. While it would be possible for every contact point between different tools and technologies to use the simplest possible format for that interaction, that would generate an explosion of different micro-formats: some without quoting, some with URL encoding or base64 or backslash-escaping or JSON or YAML or ProtoBuf, and so on.

Instead, to reduce overall system complexity, it can be prudent to stick to one file-format everywhere. Sure, it'll be overkill in some places and nearly-inadequate in others, but the effort spent can pay for itself when everybody touching a file, a parser, or a serialiser knows what to expect and can jump between layers of the tech stack without having to re-learn the quirks of each layer every time. You even leave the door open for automated tools that scan all the config files from all the parts of the system and sanity-check them against each other.

There's an upper-limit too, of course. We don't use JSON encoding for web-pages and images and videos and databases and filesystems, because the benefits of a single standard encoding are outweighed by the drawbacks of JSON for those specific use-cases. But there's a wide span between "JSON is overkill" and "JSON is too primitive" and you can get a lot done in that space.


👤 ThePhysicist
I mostly use YAML as it's easy to read, write and parse. For Golang applications I validate and parse it into a typed data structure, so that no further conversions or type checks are required at runtime. It works really, really well. YAML seems to be becoming the standard for configuration files in many areas because it is so easy to read and write.

👤 yuppie_scum
From a cloud-native/12-factor point of view, I'd encourage you to consider a format you can manage with a Kubernetes ConfigMap and override with environment variables.

I agree that a key-value map (one config parameter and one value per line) is very easily grokked. It should also fit into the above cloud-native paradigm.


👤 zufallsheld
Key-value pairs could be easier to read than your definition:

Team=foo Dev_threshold=1 Stage_threshold=2

And so on. Should be equally easy to parse but with the added benefit that one always knows what number is used for which threshold.
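
To back up the "equally easy to parse" point, here is a small bash sketch that looks up one key in such a record. The helper name get_field is made up for this example, and it assumes values contain no whitespace, as in the line above.

```bash
#!/usr/bin/env bash
# Sketch only: looks up one key in a "Key=value Key=value ..." record.
# Assumes values contain no whitespace.

get_field() {
  # Usage: get_field "<record line>" <key>
  # Prints the value on stdout, or returns non-zero if the key is absent.
  local line=$1 key=$2 pair
  local -a pairs
  read -r -a pairs <<< "$line"      # split the record on whitespace
  for pair in "${pairs[@]}"; do
    if [[ $pair == "$key="* ]]; then
      printf '%s\n' "${pair#*=}"    # strip everything up to the first '='
      return 0
    fi
  done
  echo "missing key '$key' in: $line" >&2
  return 1
}
```

For example, get_field 'Team=foo Dev_threshold=1 Stage_threshold=2' Dev_threshold prints 1, and field order on the line no longer matters.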



👤 elchief
For me, a good format has

1. Variable substitution

2. Easy overriding

3. Split between project, user, global config

4. Works across languages

So, for now, the best is dotenv.
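
A rough bash sketch of points 1 to 3, assuming the env files stick to plain KEY=value lines that are valid shell syntax (dotenv libraries in other languages each have their own parsing rules). The function name load_env_files and the file names in the usage note are hypothetical. Because the files are sourced, they must be trusted: anything in them runs as shell code.

```bash
#!/usr/bin/env bash
# Sketch only: layered env files, sourced in order so later files win.
# Files must be trusted, since sourcing executes them as shell code.

load_env_files() {
  # Usage: load_env_files <file>...  (missing files are skipped)
  local file
  set -a                  # auto-export every variable assigned below
  for file in "$@"; do
    if [[ -f $file ]]; then
      # shellcheck disable=SC1090
      . "$file"
    fi
  done
  set +a
}
```

Called as load_env_files global.env "$HOME/.myapp.env" project.env, a value set in project.env overrides the same key from the earlier files, and a line such as DB_URL="db://${DB_USER}@host" is expanded using whatever DB_USER holds at that point.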