Strategies for deploying database migrations
Modifying a database's structure is often known as "migrating" to a new schema. While the operations used to modify the structure itself are usually relatively straightforward, care and planning is important to ensure that the data being managed remains accessible, consistent, and semantically correct.
In this guide, we'll cover some of the strategies that teams can use to update their database schemas and related codebases and discuss how well each option addresses potential problems. We'll look at some general purpose application deployment patterns as well as a few options specifically designed to address database-specific scenarios.
There are a number of potential issues that can arise when deploying changes to environments with databases. Some of these problems relate to alignment between the client codebase and the database structure while others arise from the impact of trying to update existing data.
When changing the actual structure of the database, existing data frequently needs to be modified to conform to the new schema. In some scenarios, this is relatively simple. For instance, if you need to add a new table that isn't related to the existing data, nothing stored in the database will need to be modified.
In other cases, like splitting or combining columns, you must define custom data transformations to specify how the existing data should be altered to populate the new context. For structures that handle very large amounts of data, this migration process may take a significant amount of time.
Depending on your deployment strategy, making changes reversible can be difficult as well. Once data structures are updated and being populated by live code, reverting to a previous version can potentially lead to data loss.
This makes "rolling forward" in case of a problem much more attractive than "rolling back" since the data can be accounted for in the updated code even if the behavior itself is reverted.
Another challenge involved with database changes is testing. It can be difficult to understand the best way to test schema changes in a way that captures edge cases and ensures the validity of the new data format. Real world data often pushes against the boundaries of constraints in unexpected ways, so it is important to have a good handle on the entire range of potential values to write tests that account for them.
Schema changes can also affect other parts of the database that you'll want to test, like stored procedures, triggers, and other components. These need to be tested with each schema change to ensure that they still work as intended.
Lastly, schema changes can be difficult because they can have huge impacts on performance and availability. This category of problems can be difficult to emulate in test environments where the dataset, request load, and access patterns may not mirror production values.
Changes that may be technically valid may impose an unacceptable performance cost that might be difficult to deduce in staging environments. If your deployment process has the chance to affect availability, that is an even larger concern.
With the issues mentioned above in mind, how do you decide on the best way to migrate to new schemas? There are a number of approaches to consider depending on your requirements, priorities, and application environment. In some cases, using a combination of strategies can help protect you against a wider range of potential problems.
One of the oldest and least complex strategies for implementing schema migrations is to simply take your database offline for the duration of your migration and consider the downtime an acceptable cost. For perhaps obvious reasons, this approach is unsuitable for many use cases, as uptime and availability are commonly among the highest priority goals of many organizations.
Still, performing schema changes and data mutations offline is a valid strategy that can be useful in some cases. There are many advantages to this approach if it is an option:
- Schema changes can be implemented in coordination with client code changes in a single step.
- Changes to schemas and stored data can be inspected and tested without regard to their performance impact on running processes.
- There is no "transition" period where conditional code paths or multiple variants of the same data structures lead to temporary surges in complexity.
- Minimal infrastructure and systems are required to implement this approach.
- Large changes may be easier to incorporate than in some other systems.
The disadvantages, however, cannot be ignored:
- Loss of availability can have a huge impact on SLAs, revenue, reputation, and other important measures.
- Unforeseen problems during deployment are more impactful if working within a constrained "maintenance" window as they have a direct impact on the time until availability is reestablished.
- Downstream services will also be disrupted, leading to cascading downtime for any dependent software.
- Planned maintenance tends to be an "all hands on deck" event, which can be challenging, especially if it is your primary method of deploying changes.
The process of deploying database schema changes during downtime is rather straightforward in practice. The application or component generating user-facing responses should ideally warn about any scheduled downtime well in advance of the maintenance block. This can help users and downstream services make decisions about their own needs.
Prior to the maintenance window, a plan or checklist should be designed to define the exact operations that must be performed. The deployment itself should be scripted and automated as much as possible to reduce human error and perform the required actions as quickly as possible. All required assets and personnel should be on hand before the service is brought down.
During the maintenance window, the application should update its responses to indicate that planned maintenance is occurring as well as any estimated time frame for the service to be available again. The tested change procedures should be applied to the production environment and afterwards, the new code and data schemas should be inspected and tested.
When the deployment is complete, the application can be restarted and begin serving requests using the new code and schema.
Blue / green deployments is another strategy that is often used to deploy new code in application contexts. It can be used for database schema changes to a certain extent, but does have some notable shortcomings.
Blue / green deployments is an approach that involves setting up two identical sets of infrastructure for your database clients, together representing double the resources required to run production traffic.
One set of infrastructure serves the current production traffic. The other set of infrastructure is used to set up the next release. When everything is ready, the load balancer or other traffic director routing client requests switches traffic from the first set to the second set to introduce the new changes. If a problem occurs, the traffic can be switched back to the original infrastructure. If everything goes well, the original, now unused infrastructure becomes the target for staging the next deployment.
Blue / green deployments are attractive because they allow you to deploy changes to production-ready infrastructure without impacting the current production environment. By decoupling the deployment procedure from the "release" of the change, developers can test their changes on the infrastructure where it will actually run without downtime. Releasing new code and changes through a switch mechanism lets you revert changes easily.
While blue / green deployments are helpful in many scenarios, applying them with schema changes can be challenging. When changing only application code (without data-related changes), reverting problematic code is as simple as directing traffic back to the original set of infrastructure. However, when data schemas change, incompatibility is possible. Reverting to the previous infrastructure may result in data loss as schema structures are removed, etc.
When using blue / green deployments with schema changes, one way to avoid these issues is to structure your deployments using the expand and contract pattern (discussed later).
Feature flags are design pattern in software development that allow developers to modify the control flow of an application during runtime based on values set outside of the application. The application checks the current value of a well-known external location when it comes to a certain code path. The value tells the application whether or not to execute a certain code path or which path to choose among many.
Rather than being a deployment strategy itself, feature flags are a technique that can make implementing other strategies easier by allowing you to decouple the deployment of new functionality from the activation of that functionality. The application can continue to run the same as it originally had until it sees a different value when it checks for the flag. It can then immediately switch over to use the new functionality.
In terms of introducing schema changes, feature flags are particularly valuable because your application can be designed to interact with multiple iterations of the same schema. The feature flag can be set up to indicate the version of the schema currently deployed and thus select the code that has been designed to interact with it.
The disadvantages to using feature flags are often small, but should be considered. Additional infrastructure may be required to store your flag values if an appropriate key / value store is not already available. Additionally, it's possible for feature flags to increase the complexity of your code for the duration of their use. Cleaning up and simplifying code paths once the feature flag is obsolete can help keep the extra conditional logic at a minimum.
Another strategy that can be used in combination with other approaches is canary releases. Canary releasing is a deployment strategy that simply means that changes are introduced on a single or a small number of clients first before deploying to the rest of your infrastructure.
This allows you to catch problems early that you did not see during your previous testing. The subset of clients that are running the new code work as an indicator for how well the code will function on the remaining systems and allow you to gradually roll over additional systems as you gain confidence in the stability and functionality of the changes.
In terms of database schema changes, canary releases allow you validate that schema changes and their related client code are production appropriate by reducing the risk of introducing changes. Rather than affecting all clients, a small portion is used to evaluate the change. This gives you an opportunity to roll back early if the change has unforeseen consequences and to view performance using a portion of your real world production traffic. Canary releases help you minimize the impact of your code changes, which can help your schema changes go more smoothly.
Perhaps the best method of introducing schema changes is the expand and contract pattern. The expand and contract pattern allows you to introduce schema changes alongside the original schema, migrate old data to the new structure, and gradually move production traffic to the new structure in a series of planned stages. It can be used in conjunction with many of the strategies and techniques we discussed earlier to introduce changes with multiple layers of safety in case problems occur.
The expand and contract pattern can be implemented by walking through the following steps:
- Design and deploy the desired schema alongside the original schema.
- Modify the client code to write changes to both schemas simultaneously.
- Migrate existing data from the original schema to the new schema, modifying it as necessary to conform to the new structure.
- Test the new schema to ensure that it is functionally correct and that the data has been transferred correctly.
- Modify the client code to begin reading data from the new schema.
- Modify the client code to stop writing to the original schema.
- Remove the original schema.
By walking through the stages outlined above, your schema changes are spread out over a longer period of time. This, however, allows you to gradually change over your application code in production to deal with the changes in the schema.
One of the major advantages of this strategy is the decoupling of reading from and writing to the new data schema. This approach means that the application uses the new schema well before the new schema impacts any client-facing responses and the client code actively takes part in writing new data to the schema while old data can be backported in the background.
You can read about this approach in much greater detail in the guide on using the expand and contract pattern for schema changes.
Often, the best approach to deploying schema changes is to combine multiple techniques. To walk through an example that relates to schema changes, imagine that you are attempting to introduce a schema change that modifies a
names table to combine the original
last_name columns into a single
You've deployed your new database schema alongside your existing schema as outlined in the first stage of the expand and contract pattern. Now, you have two tables with largely the same structure:
names, which is the original structure containing all of the current data, and
new_names, the new, empty table representing the desired structure.
Next, you want to modify your application code to write to your new structure as well as your old structure. To accomplish this, you introduce a new piece of logic to your application that checks the value of
DATABASE_NAMES_TABLE_WRITE in your organization's Redis instance. The value is a list of tables to write to when modifying the
You also know that eventually you'll want to transition your reads from the old schema to the new schema. To address this, you also include a gate that checks Redis for the value of the
DATABASE_NAMES_TABLE_READ variable to determine which structure to read from.
You set the values in your Redis instance to use your original data schema:
rpush DATABASE_NAMES_TABLE_WRITE namesset DATABASE_NAMES_TABLE_READ names
Next, you deploy the new code, that includes the Redis check as a gate for determining where to read and write, to your client.
When you want to make your client write to both data structures (as you do in step 2 of the expand and contract pattern), you can update the
DATABASE_NAMES_TABLE_WRITE list in Redis to include the new table name:
rpush DATABASE_NAMES_TABLE_WRITE new_names
DATABASE_NAMES_TABLE_WRITE list will now have two values:
lrange DATABASE_NAMES_TABLE_WRITE 0 -1
1) names2) new_names
If your feature flag code uses these values to determine where to write, it will now write to both tables.
Now that your application is writing to both locations, you begin to migrate data in the background. Because this is a rather straightforward change, you can fill in the new schema by reading the
last_name values from the
names table, joining them with a space, and writing the resulting string to the
full_name column in the
Now the new table is populated and has all of the current data. You can perform any further tests at this point to ensure that it operates correctly and can substitute for the original table.
When you are ready to transition the reads from the old structure to the new structure, you can overwrite
DATABASE_NAMES_TABLE_READ variable with the new table name:
set DATABASE_NAMES_TABLE_READ new_names
The next time the client application checks the value before executing a read operation, it will receive the new value and read from the new structure.
Once you've confirmed that everything is working correctly, you can then update the
DATABASE_NAMES_TABLE_WRITE list to remove the name of the original table:
lrem DATABASE_NAMES_TABLE_WRITE 0 "names"
The next write operation in the client will trigger a lookup and receive the new value containing only
You can now safely delete the original
names table and remove the feature flag scaffolding that requires clients to check on where to read from and write to. During this process, you will probably want to rename the
new_names table to
names again to complete the schema change.
While there are many deployment and migration strategies you can use to implement schema changes, the simplest approaches often suffer from some notable drawbacks. By understanding the effects of different migration strategies and being aware of your own organizational requirements and expertise, you can develop a migration process that will minimize downtime and allow you to safely test changes.