    Heterogeneous Database Migrations to AWS – Case Study

    Introduction

    As part of a larger cloud migration project, we supported the migration of many on-prem Oracle databases to Amazon Aurora PostgreSQL. The migrated databases were owned by application teams, which could choose between two forms of support:

    • Coaching – In this setup, the application team was responsible for the implementation; we acted as coaches providing guidance and support.
    • Full migration – In this setup, we were responsible for the entire implementation.

    Roughly, the migration of one application included the following activities:

    • Provisioning of the Aurora/PostgreSQL cluster (target database) in the given AWS landing zone
    • Schema conversion
    • Data migration
    • Adjustments to the application configuration, such as the JDBC URL and the JDBC driver used (see the example after this list)
    • Adjustments to the application code where necessary
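
    For illustration, the configuration change typically looks like the sketch below. The host names, ports and database names are placeholders, not values from the project; the settings are written as a Python dictionary purely for readability.

        # Illustrative only: typical datasource settings before and after migration.
        # Host names, ports and database names are placeholders.
        oracle_datasource = {
            "driver": "oracle.jdbc.OracleDriver",
            "url": "jdbc:oracle:thin:@//onprem-db-host:1521/APPDB",
        }
        aurora_datasource = {
            "driver": "org.postgresql.Driver",
            "url": "jdbc:postgresql://app-cluster.cluster-xxxx.eu-central-1.rds.amazonaws.com:5432/appdb",
        }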

    For a single application, we typically migrated two or more databases – one for the production environment and one or more for non-production environments. The applications were mostly Java applications using an object-relational mapping framework such as Hibernate, but there were also a few legacy applications using plain JDBC.

    The migrated databases varied in schema complexity as well as in data volume. The most complex schemas involved hundreds of tables and/or large amounts of PL/SQL code. The largest databases held up to 1 TB of data.

    We contributed to the migration of approximately 70 databases within 10 months.

    Solution and Tools

    As this was a heterogeneous migration from one database engine (Oracle) to another (Aurora PostgreSQL), and the target databases were running on the AWS platform, we settled on the following toolset:

    • AWS Database Migration Service (DMS)
    • AWS Schema Conversion Tool (SCT)
    • Terraform to provision the DMS resources and the target database cluster

    AWS Schema Conversion Tool

    During an initial phase, we used the AWS Schema Conversion Tool to assess the complexity and size of the individual databases. The assessment output includes, among other things, the number of schema objects that require human intervention. Constructs like PL/SQL code, database links, scheduled database jobs, etc. are of special interest, as they require deeper analysis and manual migration.

    Img. 01 Example of AWS Schema Conversion Tool output

    We also used the AWS Schema Conversion Tool to generate the skeleton of the target schema DDL. It does an amazing job of converting common data types.

    The schema conversion tool provides a list of warnings where manual conversion is required.
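
    Where the conversion is straightforward – typically the common data types mentioned above – the mapping looks roughly like the sketch below. This is an illustrative subset, not the project's full mapping; the exact target type chosen by SCT depends on precision, scale and the tool settings.

        # Illustrative subset of common Oracle -> PostgreSQL type mappings.
        # The exact target type chosen by SCT depends on precision/scale and tool settings.
        ORACLE_TO_POSTGRES_TYPES = {
            "NUMBER(p,s)": "numeric(p,s)",
            "NUMBER(p)":   "smallint / integer / bigint, depending on p",
            "VARCHAR2(n)": "character varying(n)",
            "DATE":        "timestamp(0)",   # Oracle DATE carries a time component
            "TIMESTAMP":   "timestamp",
            "CLOB":        "text",
            "BLOB":        "bytea",
            "RAW(n)":      "bytea",
        }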

    AWS Database Migration Service

    For the data transfer itself, we chose the AWS Database Migration Service. As a managed service, it is easy to set up, and it supports migrations for many database engines, including heterogeneous migrations where the target database engine differs from the source. To a certain extent, it also supports transformations of the migrated data. The following diagram illustrates the main DMS components.

    Img. 02 Main DMS components
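
    In our project these resources were provisioned with Terraform. Purely to make the components in the diagram concrete, here is a minimal boto3 sketch of the same building blocks; the region, identifiers, host names and credentials are placeholders.

        import boto3

        dms = boto3.client("dms", region_name="eu-central-1")   # region is a placeholder

        # Replication instance: the managed compute that executes the migration tasks.
        dms.create_replication_instance(
            ReplicationInstanceIdentifier="app-migration-instance",
            ReplicationInstanceClass="dms.r5.large",
            AllocatedStorage=100,
        )

        # Source endpoint: the on-prem Oracle database.
        dms.create_endpoint(
            EndpointIdentifier="app-oracle-source",
            EndpointType="source",
            EngineName="oracle",
            ServerName="onprem-db-host", Port=1521, DatabaseName="APPDB",
            Username="dms_user", Password="***",
        )

        # Target endpoint: the Aurora PostgreSQL cluster.
        dms.create_endpoint(
            EndpointIdentifier="app-aurora-target",
            EndpointType="target",
            EngineName="aurora-postgresql",
            ServerName="app-cluster.cluster-xxxx.eu-central-1.rds.amazonaws.com",
            Port=5432, DatabaseName="appdb",
            Username="dms_user", Password="***",
        )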

    Ongoing Replication vs Full Load

    In AWS Database Migration Service (DMS), the two main approaches for migrating databases are Full Load and Ongoing Replication. Here’s a breakdown of each:

    Feature             | Full Load              | Ongoing Replication
    Timing              | One-time               | Continuous after initial load
    Use case            | Small, static DBs      | Large, dynamic DBs
    Setup complexity    | Simple                 | Moderate to high
    Downtime            | Required during load   | Near zero
    Data consistency    | Single point in time   | Continuous synchronisation

    Our decision was to use a full load approach. Here are the reasons:

    • Simple setup
    • Clear data verification point
    • Simple rollback

    Furthermore, the ongoing replication approach requires access to the Oracle redo logs, which was not allowed in our situation.

    The drawback of this decision was the downtime of the source application during the migration, which put additional emphasis on improving migration speed.
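
    In practice, the full load boils down to a single DMS replication task with the migration type set to full-load. Below is a minimal boto3 sketch, assuming the replication instance and both endpoints from the previous section already exist; the ARNs and the schema name are placeholders.

        import json
        import boto3

        dms = boto3.client("dms", region_name="eu-central-1")

        # Migrate every table of the APP schema; MigrationType "full-load" means a
        # one-time copy with no ongoing change data capture.
        table_mappings = {
            "rules": [{
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-app-schema",
                "object-locator": {"schema-name": "APP", "table-name": "%"},
                "rule-action": "include",
            }]
        }

        dms.create_replication_task(
            ReplicationTaskIdentifier="app-full-load",
            SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",     # placeholder
            TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",     # placeholder
            ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",   # placeholder
            MigrationType="full-load",
            TableMappings=json.dumps(table_mappings),
        )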

    Schema and Data Consistency Verification

    An important part of the whole migration is confirming that the data has been migrated correctly. We developed our own database comparison tool providing schema consistency verification as well as extended data consistency verification. The reports generated by the tool were retained as evidence of successful verification.
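
    The comparison tool itself is internal, but the basic idea can be sketched with a simple per-table row-count check; the extended data consistency verification mentioned above goes beyond row counts. The table names, connection details and drivers (oracledb, psycopg2) below are assumptions made for the sketch.

        import oracledb     # Oracle driver
        import psycopg2     # PostgreSQL driver

        TABLES = ["CUSTOMERS", "ORDERS", "ORDER_ITEMS"]   # illustrative table names

        ora = oracledb.connect(user="app", password="***", dsn="onprem-db-host:1521/APPDB")
        pg = psycopg2.connect(host="app-cluster.cluster-xxxx.eu-central-1.rds.amazonaws.com",
                              dbname="appdb", user="app", password="***")

        def row_count(cursor, query):
            cursor.execute(query)
            return cursor.fetchone()[0]

        with ora.cursor() as oc, pg.cursor() as pc:
            for table in TABLES:
                source_rows = row_count(oc, f"SELECT COUNT(*) FROM app.{table}")
                target_rows = row_count(pc, f"SELECT COUNT(*) FROM app.{table.lower()}")
                status = "OK" if source_rows == target_rows else "MISMATCH"
                print(f"{table}: source={source_rows} target={target_rows} {status}")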

    Challenges

    The devil is in the details, and there were plenty of them to solve. Here we will mention only the major ones.

    Downtime & Performance

    A naive approach to migrating large databases leads to excessive migration duration and thus to unacceptable production downtime. For such databases, optimization and performance testing are necessary. Tables with BLOB and/or CLOB columns are the usual candidates for optimization, but there were also tables without these data types that still required 6+ hours after optimization.

    To provide a ballpark figure, we were able to migrate a 1 TB database in about 10 hours. The duration of the migration does not depend solely on the data volume – other factors, such as the bandwidth of the network connection between the on-prem data center and the AWS infrastructure, also play a significant role.
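
    To give an idea of where such tuning happens, LOB handling and full-load parallelism can be adjusted via the DMS task settings. The values below are illustrative only, not the settings used in the project; the concrete numbers have to come out of performance testing.

        import json

        # Illustrative DMS task settings for speeding up a full load.
        task_settings = {
            "TargetMetadata": {
                "SupportLobs": True,
                "LimitedSizeLobMode": True,   # truncate LOBs above LobMaxSize instead of full LOB mode
                "LobMaxSize": 32,             # KB; must cover the largest LOB that has to survive
                "ParallelLoadThreads": 8,     # per-table load parallelism (target-dependent)
            },
            "FullLoadSettings": {
                "MaxFullLoadSubTasks": 16,    # number of tables loaded in parallel
                "CommitRate": 50000,          # rows per commit batch during full load
            },
        }

        # Passed to create_replication_task(..., ReplicationTaskSettings=json.dumps(task_settings))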

    Complexity

    AWS provides extension packs that emulate source database functions not supported in the target database. In our case, we used the extension pack for Oracle sources: aws_oracle_ext.

    This helps a lot with migrating platform-specific constructs and even simplifies the migration of PL/SQL code.

    Application Adjustments

    Legacy applications that do not use the Java Persistence API can be time-consuming to adapt. We developed a simple source code analyzer in Python that helped us identify all the places in the application source code that might require adjustments. The analyzer looks for Oracle-specific SQL constructs, JPA annotations and schema object names in the source code.
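
    A simplified sketch of such an analyzer is shown below; the patterns are examples, not the complete list used in the project.

        import re
        from pathlib import Path

        # Oracle-specific constructs and JPA annotations that usually deserve a closer look.
        PATTERNS = {
            "Oracle SQL":     re.compile(r"\b(NVL|DECODE|TO_DATE)\s*\(|\bROWNUM\b|\bSYSDATE\b|\(\+\)", re.IGNORECASE),
            "Sequence usage": re.compile(r"\.NEXTVAL\b|\.CURRVAL\b", re.IGNORECASE),
            "JPA annotation": re.compile(r"@(Table|Column|SequenceGenerator)\b"),
        }

        def scan(source_root: str) -> None:
            """Print every line that matches one of the suspicious patterns."""
            for path in Path(source_root).rglob("*.java"):
                for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                    for label, pattern in PATTERNS.items():
                        if pattern.search(line):
                            print(f"{path}:{lineno}: [{label}] {line.strip()}")

        scan("src/main/java")   # placeholder path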

    Garbage Data

    We encountered garbage data in the source databases, even in production environments. For instance, invalid (non-UTF-8) characters in a VARCHAR column caused DMS task failures, and an extremely high numeric value could not be migrated properly by DMS (it was replaced by zero in the target database).
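
    Problems like these can be caught with a pre-migration data quality check on the source database. The sketch below illustrates the idea only; the table and column names, the numeric threshold and the assumption that the column should contain UTF-8 data are all illustrative.

        import oracledb   # Oracle driver

        conn = oracledb.connect(user="app", password="***", dsn="onprem-db-host:1521/APPDB")

        with conn.cursor() as cur:
            # Fetch the raw stored bytes so they can be validated as UTF-8 on the client.
            cur.execute("SELECT id, UTL_RAW.CAST_TO_RAW(description) FROM app.products")
            for row_id, raw in cur:
                if raw is None:
                    continue
                try:
                    raw.decode("utf-8")
                except UnicodeDecodeError:
                    print(f"invalid characters in products.description, id={row_id}")

            # Numeric values outside the range the target column is expected to hold.
            cur.execute("SELECT id, amount FROM app.invoices WHERE ABS(amount) > 1E30")
            for row_id, amount in cur:
                print(f"suspicious numeric value in invoices.amount, id={row_id}: {amount}")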


    Conclusion

    Heterogeneous migration of databases into the cloud is a non-trivial task.

    In our project with around 70 databases, we encountered various challenges, not only with schema complexity but also with overall migration performance for the larger databases.

    Endurance, patience, and a systematic approach allowed us to finish the project in around 10 months.


    Contact us


    If you are interested in this topic, contact us.

    Gabriel Olekšák

    DevOps and Cloud Lead
    gabriel.oleksak@aardwark.com