also, have you considered denormalisation? you could cut some of the duplication by storing only history tables in the CSV files in the repo and expanding them into full version tables when the data is loaded into a database.
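a rough sketch of the load-time expansion idea. this is just an illustration under assumptions, not anyone's actual setup: the `version,key,value` CSV layout, the "empty value means deleted" convention, the `expand_history` helper and the `full_versions` table are all hypothetical, and an in-memory sqlite database stands in for whatever database the data really goes into.

```python
import csv
import io
import sqlite3

# Hypothetical history table as it might sit in the repo: each row records one
# change to one key at one version. An empty value marks a deletion.
HISTORY_CSV = """version,key,value
1,alpha,10
1,beta,20
2,beta,25
3,gamma,30
3,alpha,
"""


def expand_history(history_rows):
    """Expand change-only history rows into full (version, key, value) snapshot rows."""
    by_version = {}
    for row in history_rows:
        by_version.setdefault(int(row["version"]), []).append(row)

    state = {}      # latest value of every key that currently exists
    full_rows = []  # one row per key that exists at each version
    for version in sorted(by_version):
        for row in by_version[version]:
            if row["value"] == "":
                state.pop(row["key"], None)  # key deleted at this version
            else:
                state[row["key"]] = row["value"]
        for key in sorted(state):
            full_rows.append((version, key, state[key]))
    return full_rows


rows = list(csv.DictReader(io.StringIO(HISTORY_CSV)))
full = expand_history(rows)

# Load the expanded (denormalised) rows into a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE full_versions (version INTEGER, key TEXT, value TEXT)")
conn.executemany("INSERT INTO full_versions VALUES (?, ?, ?)", full)
conn.commit()

for r in full:
    print(r)
```

the repo only holds the small diff-style file; the duplicated per-version rows exist only in the database after loading.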
i think storing the data in the repo is a different problem yeah. not sure what approach i’m gonna take there yet.