
Introduction
Database migrations are rarely straightforward, especially when moving large, business-critical workloads to the cloud. We recently supported one of our customers with the heterogeneous migration of several on-premises Oracle databases to AWS Aurora PostgreSQL. More details about the overall migration project are available in this case study. For the data transfer, we relied on AWS Database Migration Service (DMS). Some of the source databases were close to 1 TB in size, and several contained very large tables with tens or even hundreds of millions of rows, some of them with CLOB and BLOB columns.
A straightforward, full-dump migration of such databases would have resulted in excessive downtime – potentially lasting several days – which was unacceptable for production systems. AWS DMS offers Change Data Capture (CDC) to replicate ongoing changes from the source database and thereby minimize downtime. However, CDC requires elevated privileges on the on-premises Oracle instance, and the database owners did not allow this due to security policies. As a result, we needed to explore alternative strategies to reduce the migration duration. The strategies we applied are outlined below.
Parallel Processing
One of the most effective ways to shorten migration time is to parallelize the workload. AWS DMS provides several mechanisms to split and distribute the migration effort, each suitable for different scenarios.
Multi-Table Tasks
The most straightforward approach is to configure a single DMS task to process multiple tables at once. This is relatively easy to set up: each subtask can migrate one or more tables, and wildcards or filters can be used when defining table selection rules. Within a subtask, the tables are processed sequentially, but because several subtasks can run simultaneously under the same task, this configuration allows broader parallelism across tables.
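To make this concrete, here is a minimal sketch of how such a task could be created with boto3. The endpoint and replication instance ARNs, schema names, and table names are placeholders, and the MaxFullLoadSubTasks setting shown is the DMS task setting that caps how many tables are loaded in parallel within a single task.

```python
import json
import boto3

dms = boto3.client("dms")

# Selection rules: include every table of one schema via a wildcard,
# plus one explicitly named table from another schema.
# Schema and table names are placeholders.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-app-schema",
            "object-locator": {"schema-name": "APP_SCHEMA", "table-name": "%"},
            "rule-action": "include",
        },
        {
            "rule-type": "selection",
            "rule-id": "2",
            "rule-name": "include-audit-log",
            "object-locator": {"schema-name": "AUDIT", "table-name": "AUDIT_LOG"},
            "rule-action": "include",
        },
    ]
}

# MaxFullLoadSubTasks controls how many tables the task loads in parallel.
task_settings = {"FullLoadSettings": {"MaxFullLoadSubTasks": 8}}

dms.create_replication_task(
    ReplicationTaskIdentifier="full-load-multi-table",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",    # placeholder ARN
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",    # placeholder ARN
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",  # placeholder ARN
    MigrationType="full-load",
    TableMappings=json.dumps(table_mappings),
    ReplicationTaskSettings=json.dumps(task_settings),
)
```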
Parallel-Load for Large Tables
Large tables often pose the greatest challenge in terms of migration duration. Frequently, the overall migration time is dictated by the “slowest” table. If a single table takes excessively long to migrate, splitting it into smaller parts can be a decisive improvement. If the source tables are not partitioned, the dataset can be divided manually into non-overlapping subsets. Ensuring that the subsets do not overlap is essential – failure to do so can easily cause data integrity issues. Each subset can then be assigned to a separate subtask, allowing them to be migrated simultaneously. This approach can also be combined with multi-table tasks: some subtasks may handle distinct tables, while others focus on migrating different portions of a single large table.
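One way to express such a split in the DMS table-mapping JSON is a table-settings rule with a ranges-type parallel-load; the schema, table, column, and boundary values in this sketch are illustrative, and each resulting segment is loaded by its own subtask within the task.

```python
# Table-settings rule with a "ranges" parallel-load: the boundaries split a
# hypothetical BIG_ORDERS table on its ORDER_ID column into three
# non-overlapping segments, each loaded by its own subtask.
parallel_load_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-big-orders",
            "object-locator": {"schema-name": "APP_SCHEMA", "table-name": "BIG_ORDERS"},
            "rule-action": "include",
        },
        {
            "rule-type": "table-settings",
            "rule-id": "2",
            "rule-name": "segment-big-orders",
            "object-locator": {"schema-name": "APP_SCHEMA", "table-name": "BIG_ORDERS"},
            "parallel-load": {
                "type": "ranges",
                "columns": ["ORDER_ID"],
                # Two boundaries produce three segments:
                # <= 50M, 50M to 100M, and everything above 100M.
                "boundaries": [["50000000"], ["100000000"]],
            },
        },
    ]
}
# Serialized with json.dumps() and passed as TableMappings, as in the previous sketch.
```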
Multiple DMS Tasks and Replication Instances
The previous approaches are based on a single DMS task and replication instance. However, it is also possible to create multiple tasks, each supported by its own dedicated replication instance. This becomes appropriate when multi-table tasks or parallel-load configurations push a single replication instance to its performance limits. By distributing the workload across multiple instances, overall throughput can be increased. At the same time, this method introduces the risk of overloading the source or target database – an aspect we will revisit in the next section.
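The following boto3 sketch shows the general idea: provision an additional replication instance and attach a separate task to it. The identifier, instance class, storage size, ARNs, and table mappings are placeholders.

```python
import boto3

dms = boto3.client("dms")

# Provision a dedicated replication instance for one additional task.
# Identifier, instance class, and storage size are illustrative values.
instance = dms.create_replication_instance(
    ReplicationInstanceIdentifier="migration-worker-2",
    ReplicationInstanceClass="dms.r5.2xlarge",
    AllocatedStorage=200,
)["ReplicationInstance"]

# Wait until the new instance is available before creating the task.
dms.get_waiter("replication_instance_available").wait(
    Filters=[{"Name": "replication-instance-id", "Values": ["migration-worker-2"]}]
)

table_mappings_json = "..."  # table-mapping JSON, e.g. the parallel-load rules shown earlier

# Attach a second task to the new instance so it runs independently
# of the task on the first replication instance.
dms.create_replication_task(
    ReplicationTaskIdentifier="full-load-big-orders",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",  # placeholder ARN
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",  # placeholder ARN
    ReplicationInstanceArn=instance["ReplicationInstanceArn"],
    MigrationType="full-load",
    TableMappings=table_mappings_json,
)
```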
Sizing Considerations
The performance of a migration is also heavily influenced by the sizing of both the replication and target instances. It is worth noting that scaling the source Oracle databases was not an option for us. Fortunately, this limitation did not become a bottleneck in our migrations.
Scaling DMS Replication Instances
Both vertical and horizontal scaling of DMS replication instances can be effective. In some cases, simply increasing the size of the replication instance provided sufficient performance improvement. In more demanding scenarios, we combined vertical scaling (larger instance classes) with horizontal scaling (multiple instances working in parallel). This hybrid approach proved essential in the most extreme cases, where neither vertical nor horizontal scaling alone would have been enough.
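With boto3, vertical scaling amounts to a single call to modify_replication_instance; the ARN and target instance class below are placeholders. Horizontal scaling follows the pattern sketched in the previous section.

```python
import boto3

dms = boto3.client("dms")

# Scale an existing replication instance up for the migration window.
# The ARN and instance class are placeholders.
dms.modify_replication_instance(
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",
    ReplicationInstanceClass="dms.r5.4xlarge",
    ApplyImmediately=True,
)
```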
Scaling the Target RDS Instances
Running parallel DMS tasks across multiple replication instances increases the load not only on the replication layer but also on the target database. In several cases, this required us to provision larger RDS instances for the duration of the migration. For some databases, the increased instance size was only temporary: once the migration was completed, we scaled the RDS instance back down before starting the production workload.
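A possible way to script this temporary scale-up and scale-back with boto3 is shown below; the instance identifier and instance classes are placeholders.

```python
import boto3

rds = boto3.client("rds")

# Temporarily scale the target Aurora writer instance up for the migration window.
rds.modify_db_instance(
    DBInstanceIdentifier="aurora-pg-writer-1",  # placeholder identifier
    DBInstanceClass="db.r5.4xlarge",
    ApplyImmediately=True,
)

# ... run the DMS tasks ...

# Scale back down before releasing the database to the production workload.
rds.modify_db_instance(
    DBInstanceIdentifier="aurora-pg-writer-1",
    DBInstanceClass="db.r5.xlarge",
    ApplyImmediately=True,
)
```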
Data Pruning before Migration
In addition to optimizing the migration process itself, reducing the amount of data to be transferred can significantly shorten migration time. Two approaches proved particularly effective in our case.
Deleting Unnecessary Data
Before starting the migration, we reviewed the source databases and removed obsolete or otherwise unnecessary data. This step reduced the total data volume, thereby lowering the amount of information that needed to be processed by AWS DMS and written to the target database. Although this may sound obvious, its impact on overall migration performance can be substantial.
Migrating Static Data Upfront
For some databases, a large portion of the content consisted of historical or otherwise static data that was no longer subject to updates. We migrated such data well before the actual cutover, leaving only the smaller subset of dynamic data for the final migration phase. This significantly reduced the downtime window required for production cutover. To identify and extract the static portions of the data, we applied the same filtering techniques as described for multi-table tasks, making the configuration both consistent and straightforward.
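For example, a selection rule with a source filter on a last-modified column can restrict the upfront task to historical rows; the schema, table, column name, and cutoff date below are placeholders, and the final cutover task would use the complementary filter on the same column.

```python
# Selection rule migrating only rows last modified on or before a cutoff date.
# Schema, table, and column names as well as the cutoff date are placeholders.
static_data_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "historical-invoices",
            "object-locator": {"schema-name": "APP_SCHEMA", "table-name": "INVOICES"},
            "rule-action": "include",
            "filters": [
                {
                    "filter-type": "source",
                    "column-name": "LAST_MODIFIED",
                    "filter-conditions": [
                        # "ste" = smaller than or equal; the cutover task would use "gte".
                        {"filter-operator": "ste", "value": "2023-12-31"}
                    ],
                }
            ],
        }
    ]
}
```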
Lessons Learned and Best Practices
Beyond the specific strategies described above, several general practices proved crucial in ensuring smooth migrations.
Preparing the Target Database
It is advisable to drop indexes, foreign key constraints, and other non-trivial constraints from the target database before starting the migration. Foreign key constraints can lead to referential integrity errors during data loading. Indexes and complex constraints tend to slow down the process considerably. Once the migration is complete, the constraints and indexes can be recreated on the target system.
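The sketch below illustrates one way to automate this on the PostgreSQL target using psycopg2: it records the DDL of foreign keys and standalone indexes in a placeholder schema, drops them before the load, and replays the captured definitions afterwards. Primary-key and unique constraints, and the indexes backing them, are deliberately left in place; connection details and the schema name are placeholders.

```python
import psycopg2

# Connection parameters are placeholders.
conn = psycopg2.connect(host="aurora-pg.example.com", dbname="appdb",
                        user="admin", password="...")
conn.autocommit = True
cur = conn.cursor()

# Capture the DDL of foreign keys in the target schema so they can be replayed later.
cur.execute("""
    SELECT conrelid::regclass AS table_name,
           conname,
           pg_get_constraintdef(oid) AS definition
    FROM pg_constraint
    WHERE contype = 'f'
      AND connamespace = 'app_schema'::regnamespace
""")
foreign_keys = cur.fetchall()

# Capture standalone indexes only; indexes backing constraints (PK/unique) are skipped.
cur.execute("""
    SELECT i.schemaname, i.indexname, i.indexdef
    FROM pg_indexes i
    WHERE i.schemaname = 'app_schema'
      AND NOT EXISTS (
          SELECT 1 FROM pg_constraint c
          WHERE c.conindid = format('%I.%I', i.schemaname, i.indexname)::regclass
      )
""")
indexes = cur.fetchall()

# Drop foreign keys first, then the standalone indexes, before starting the DMS full load.
for table_name, conname, _ in foreign_keys:
    cur.execute(f'ALTER TABLE {table_name} DROP CONSTRAINT "{conname}"')
for schemaname, indexname, _ in indexes:
    cur.execute(f'DROP INDEX IF EXISTS "{schemaname}"."{indexname}"')

# ... run the migration ...

# Recreate the indexes and foreign keys from the captured definitions.
for _, _, indexdef in indexes:
    cur.execute(indexdef)
for table_name, conname, definition in foreign_keys:
    cur.execute(f'ALTER TABLE {table_name} ADD CONSTRAINT "{conname}" {definition}')
```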
To ensure that no objects were lost during this process, we leveraged our Database Comparison Tool. In addition to comparing data, the tool can also validate schema differences between the source and target databases. This allowed us to confirm that all indexes and constraints were reintroduced correctly after the migration.
Scheduling the Cutover
Since production databases could only tolerate downtime outside of business hours, migrations were typically scheduled overnight or during weekends. This approach minimized the impact on end users while still allowing for the extended migration windows that some databases required.
Importance of Test Runs
For very large databases, it was essential to perform several test migrations in advance. These dry runs allowed us to refine the configuration, tune instance sizing, and validate that the chosen strategies would complete within the acceptable downtime window. Without such preparation, achieving a successful migration during a single night – or even across a single weekend – would not have been feasible.
Conclusion
Migrating large, business-critical Oracle databases to AWS Aurora PostgreSQL without the use of Change Data Capture (CDC) posed significant challenges, particularly in minimizing downtime. By applying a combination of parallel processing techniques, careful instance sizing, and data pruning strategies, we were able to keep migration windows within acceptable limits. Additional best practices – such as dropping and later restoring constraints, scheduling cutovers outside of business hours, and performing multiple test runs – proved equally important in ensuring both efficiency and reliability.

