What should I keep in mind when working with bitemporal systems?
They are not SQL but Datomic[1] and Crux[2] are first class bitemporal databases, look there for design guidelines and commentary.
There is some other project crux that has a lot of python and this pollutes the Google searches.
I think many projects try to re-implement in a ad-hoc way, maybe without knowing it, so if you can start with a bitemporal database it gives a big head start.
As noted above, the SQL:2011 standard supports bitemporality; here's a good summary from SIGMOD Record: https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfea...
This is a good summary of vendor support for temporal tables: https://www2.cs.arizona.edu/~rts/sql3.html
Oracle and DB2 both support bitemporal tables, SQL Server only supports system-versioned tables. There is an extension for Postgres that supports system-versioned tables, but it hasn't been updated in 2 years: https://github.com/arkhipov/temporal_tables
In the past, I've manually implemented system-versioning for audit purposes in SQL Server using triggers, T4 templates and the CONTEXT_INFO variable to store user IDs. It worked OK, but schema changes became much more difficult.
Also - start thinking about archiving now - n lg n is fast at the start - two years later, not so much.
Good luck
- changing dimensions - handling changing types e.g. A column of strings becoming a compound of floats - efficiently handling appends - efficiently handling corrections of previous data - how to cleanup, remove old versions
The version store storage engine in Arctic provides fast and efficient versioning for numeric data. The code is quite readable so it might be worth having a look at its implementation too.
The ORM has built-in support for various time-series data concepts including bitemporality.
If you just need to get data at a point in time, you only need to store (so called) application-time. If you need to handle both a “created date” and “effective date” type scenario, then you’ll need bitemporality.
I've come up with a solution specifically for pandas dataframes. Not sure how similar a solution would be for your use case.
What are some good resources?
A spreadsheet is a bitemporal database of sorts right?