HACKER Q&A
📣 edweis

How do you do slow migration with zero downtime?


Hi HN, We have non-relational databases (Elasticsearch & DynamoDB) that require data migrations ranging from 1h to 20h, roughly once every ~10 days.

In the past we used migration scripts such as db-migrate (https://www.npmjs.com/package/db-migrate) that would run newly created scripts in CI. However, it became clear that when a migration is long, the cost is huge.

We transitioned to a dedicated server for migrations. However, not having migration errors surface in CI is troublesome: there is no longer a single place to check deployment status, which reduces developers' ownership of the migration. We had more migration errors, and we almost hired a DevOps engineer just to monitor data migrations.

We moved to a 4-step migration process that takes the database from state OLD to state NEW:

1. Create the NEW schema (= a new field, index, or partition key) that co-exists with the OLD schema, and start writing to both the NEW and OLD schemas.

2. Run the data migration to populate the NEW schema (this is the step that can take from 1h to 20h).

3. Move reads from OLD to NEW (this can be done gradually, from 1% of users to 100% if necessary, and reverting is easy).

4. (optional) Remove the OLD schema to clean up the database.
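For illustration, a minimal TypeScript sketch of steps 1 and 3 (dual-write plus gradual read switch); the store client, table names, and rollout-flag source are hypothetical placeholders, not a real API:

// Hypothetical store client and schema mappers, for illustration only.
interface User { id: string /* ... */ }
declare const db: {
  get(table: string, id: string): Promise<any>;
  put(table: string, item: any): Promise<void>;
};
declare function toOldSchema(u: User): any;
declare function toNewSchema(u: User): any;
declare function fromOldSchema(r: any): User;
declare function fromNewSchema(r: any): User;
declare function rolloutPercent(): Promise<number>; // 0..100, e.g. from a feature flag

// Step 1: dual-write — OLD stays authoritative, NEW is populated on every write.
async function writeUser(user: User): Promise<void> {
  await db.put('users_old', toOldSchema(user));
  await db.put('users_new', toNewSchema(user));
}

// Step 3: deterministically route a percentage of reads to NEW; reverting is
// just lowering the percentage back to 0.
function hashToPercent(id: string): number {
  let h = 0;
  for (const c of id) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % 100;
}

async function readUser(id: string): Promise<User> {
  const pct = await rolloutPercent(); // raise 1 -> 100 as confidence grows
  return hashToPercent(id) < pct
    ? fromNewSchema(await db.get('users_new', id))
    : fromOldSchema(await db.get('users_old', id));
}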

The 4-step migration works well: developers can do steps 1, 3, and 4 with PRs, and step 2 is run on our migration server. However, it is still a process that needs to be taught, learnt, and applied. There may be better practices, and better third parties, for doing this.

How do you manage data migration in your organization?



👤 viraptor
It feels like, with that amount of change, doing a full migration every time is a bit heavy. You'll of course have to do that for new keys, but for new fields have you thought of alternative approaches?

For example, using a schema version field in each record, so that you can read a v5 record, initialise "new_field" with its default value in the model, and write it back as v6 if needed? (With an occasional backfill of everything so you can clean up the code.)
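A minimal sketch of that upgrade-on-read idea in TypeScript, assuming a hypothetical store and a v5 -> v6 bump that adds "new_field":

// Hypothetical: v5 records lack new_field, v6 records have it.
interface RecordV6 { schemaVersion: number; id: string; new_field: string }

declare const store: {
  get(id: string): Promise<any>;
  put(rec: RecordV6): Promise<void>;
};

// Old records are migrated lazily, the first time they are read.
async function readRecord(id: string): Promise<RecordV6> {
  const raw = await store.get(id);
  if (raw.schemaVersion < 6) {
    const upgraded: RecordV6 = { ...raw, schemaVersion: 6, new_field: 'default' };
    await store.put(upgraded); // write back so the next read takes the fast path
    return upgraded;
  }
  return raw as RecordV6;
}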

Or maybe create new tables for the new data and do a periodic merge as needed?

Hard to say without knowing more, but my first instinct here would be to ask if there's any other approach possible which doesn't require the migrations in the first place.


👤 brudgers
What aspects of the problem domain require changing the schema so frequently?

👤 andrewfromx
Can you break up your migrations into smaller chunks that can be run incrementally?
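For DynamoDB, a minimal sketch of such an incremental, resumable backfill with the AWS SDK for JavaScript v3; the table name, attribute, and checkpoint store are made-up placeholders:

import { DynamoDBClient, ScanCommand, UpdateItemCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

// Hypothetical checkpoint store so the job can stop and resume at any point.
declare const checkpoints: {
  load(): Promise<Record<string, any> | undefined>;
  save(key: Record<string, any> | undefined): Promise<void>;
};

// Migrate one chunk; returns false once the whole table has been processed.
async function migrateChunk(limit = 500): Promise<boolean> {
  const page = await client.send(new ScanCommand({
    TableName: 'users',
    Limit: limit,
    ExclusiveStartKey: await checkpoints.load(), // resume where we left off
  }));

  for (const item of page.Items ?? []) {
    await client.send(new UpdateItemCommand({
      TableName: 'users',
      Key: { id: item.id },
      UpdateExpression: 'SET new_field = if_not_exists(new_field, :d)',
      ExpressionAttributeValues: { ':d': { S: 'default' } },
    }));
  }

  await checkpoints.save(page.LastEvaluatedKey);
  return page.LastEvaluatedKey !== undefined;
}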

There is also:

AWS Database Migration Service https://aws.amazon.com/dms/

Azure Database Migration Service https://azure.microsoft.com/en-au/products/database-migratio...

Liquibase CI/CD for Databases https://resources.liquibase.com/ci-cd-database

In PostgreSQL, you can use the CONCURRENTLY keyword with CREATE INDEX (and DROP INDEX) to build or remove an index without taking locks that block writes, so no downtime is required:

CREATE INDEX CONCURRENTLY index_name ON table_name (column_name);

DynamoDB supports online indexing: you can create or delete a global secondary index (GSI) on an existing table without downtime. The new index is backfilled in the background while the table keeps serving reads and writes.

aws dynamodb update-table \
  --table-name table_name \
  --attribute-definitions AttributeName=attribute_name,AttributeType=S \
  --global-secondary-index-updates '[{"Create":{"IndexName":"index_name","KeySchema":[{"AttributeName":"attribute_name","KeyType":"HASH"}],"Projection":{"ProjectionType":"ALL"},"ProvisionedThroughput":{"ReadCapacityUnits":5,"WriteCapacityUnits":5}}}]'

aws dynamodb update-table \
  --table-name table_name \
  --global-secondary-index-updates '[{"Delete":{"IndexName":"index_name"}}]'

Elasticsearch supports zero-downtime reindexing through index aliases: clients search via an alias, so you can create and populate a new index, then switch the alias over without interrupting search traffic.

To do this, first create the new index. Don't attach the alias yet, so searches keep hitting the old index while the new one is populated:

PUT /new_index
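One common way to populate it is the _reindex API, which copies documents from the old index in the background:

POST /_reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}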

After the data is in the new index, switch the alias over. Both actions run atomically in one request, so searches never see a gap:

POST /_aliases
{
  "actions": [
    { "add": { "index": "new_index", "alias": "my_search_alias" } },
    { "remove": { "index": "old_index", "alias": "my_search_alias" } }
  ]
}
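Once the new index is confirmed healthy, the old one can be deleted, mirroring the optional cleanup step 4 above:

DELETE /old_index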