Both Aurora and Azure support PostgreSQL- and MySQL-compatible databases. When moving data between Aurora and Azure, the following considerations apply.
1. Question - Are the source and target database servers the same type (e.g. PostgreSQL to PostgreSQL), or are they different (e.g. Azure SQL to Aurora PostgreSQL)?
1. If the source and target database servers are different, you need to take column type differences into account and create the proper target database tables and columns (a minimal mapping sketch follows this list).
2. Additionally, you have to account for the different maximum sizes of columns such as varchar and binary. For example, SQL Server supports varchar(MAX), while MySQL limits VARCHAR to 65,535 bytes. Data may need to be truncated depending on the size of your actual data in varchar and binary columns.
3. Certain column types available in one database server may not exist in the other.
4. Precision and scale - Certain types like date or decimal have different maximum and minimum values in PostgreSQL compared with SQL Server or MySQL. Depending on your source and target, moving data may result in precision loss.
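As a rough illustration of the column mapping involved, the sketch below shows one way to translate SQL Server column types to PostgreSQL equivalents when generating target DDL. The mapping table, the map_column helper, and the example table are illustrative assumptions, not part of any particular tool.

```python
# Minimal sketch: mapping SQL Server column types to PostgreSQL equivalents
# when generating target DDL. The mapping below is illustrative, not exhaustive.

SQLSERVER_TO_POSTGRES = {
    "bit": "boolean",
    "tinyint": "smallint",        # PostgreSQL has no 1-byte integer type
    "datetime": "timestamp",      # note the different min/max date ranges
    "datetime2": "timestamp",
    "money": "numeric(19,4)",
    "uniqueidentifier": "uuid",
    "varbinary(max)": "bytea",
    "varchar(max)": "text",       # no direct varchar(MAX) equivalent
    "nvarchar(max)": "text",
}

def map_column(sql_server_type: str) -> str:
    """Return a PostgreSQL type for a SQL Server type, passing through common types."""
    t = sql_server_type.strip().lower()
    return SQLSERVER_TO_POSTGRES.get(t, t)  # int, bigint, etc. map as-is

# Example: build a CREATE TABLE statement for a hypothetical target table
columns = [("id", "uniqueidentifier"), ("created", "datetime2"), ("notes", "nvarchar(max)")]
ddl = "CREATE TABLE orders (\n  " + ",\n  ".join(
    f"{name} {map_column(col_type)}" for name, col_type in columns
) + "\n);"
print(ddl)
```

In practice a tool would also carry over lengths, precision, and nullability, and flag any source column whose data cannot fit the target type.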
2. Question - Which tool should I use?
- SSIS and Azure Data Factory are useful in certain situations: a) the source and target servers are the same type, b) you will manually create the tables and columns in the target, and c) you are comfortable with the batch/offline nature of these tools, that is, you submit a job which may take hours or days depending on the size of the database.
- Windocks data mover - If your databases are under 4 or 5 TB, tools like Windocks automatically create and map the column types for you and move the data on demand (i.e. the data is moved while you wait, in a few hours, using the full capacity of the system or as much capacity as you configure). This type of tool is suitable when a) you need the data quickly, on demand, b) the source and target database types differ and you don't want to deal with the column type mapping, and c) you need the data copied or moved on an ongoing basis.
3. Question - What sort of system resources do I need?
That depends on the tool you are using, the size of your database, the number of tables, and the distribution of row counts among those tables. Generally, the idea is to match the throughputs of network, disk reads, and disk writes. A tool like Windocks data mover that focuses on speed allows you to configure the parallelism count to optimize for your tables. Begin by optimizing memory, since that allows larger batches to be read from the source and written to the target. Use a memory performance tool to track memory usage, and increase the parallelism until memory usage reaches 90 to 95%. Next, try increasing network bandwidth and see whether disk writes per second improve. Finally, upgrade the disk to see if a specific table is copied faster. Repeat the process until you get the desired performance.
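To make the parallelism knob concrete, here is a minimal sketch of copying a table in parallel batches with a configurable worker count. read_batch and write_batch are hypothetical placeholders for your actual source and target drivers, and BATCH_SIZE and PARALLELISM are the values you would tune while watching memory, network, and disk throughput.

```python
# Minimal sketch of parallel batch copying with a configurable worker count.
# read_batch / write_batch are placeholders for real source/target drivers.
from concurrent.futures import ThreadPoolExecutor
import time

BATCH_SIZE = 50_000   # rows per batch; larger batches use more memory
PARALLELISM = 8       # raise gradually while watching memory and disk/network throughput

def read_batch(table: str, offset: int, limit: int) -> list[tuple]:
    """Placeholder: fetch `limit` rows from the source starting at `offset`."""
    return []  # e.g. SELECT ... ORDER BY pk OFFSET offset LIMIT limit

def write_batch(table: str, rows: list[tuple]) -> int:
    """Placeholder: bulk insert rows into the target; returns rows written."""
    return len(rows)

def copy_range(table: str, offset: int) -> int:
    rows = read_batch(table, offset, BATCH_SIZE)
    return write_batch(table, rows)

def copy_table(table: str, total_rows: int) -> None:
    start = time.time()
    offsets = range(0, total_rows, BATCH_SIZE)
    with ThreadPoolExecutor(max_workers=PARALLELISM) as pool:
        written = sum(pool.map(lambda o: copy_range(table, o), offsets))
    elapsed = time.time() - start
    print(f"{table}: {written} rows in {elapsed:.1f}s "
          f"({written / max(elapsed, 1e-9) * 60:.0f} rows/minute)")

copy_table("orders", total_rows=1_000_000)
```

Memory use grows roughly with BATCH_SIZE times the number of workers, which is why the memory check described above comes before tuning network and disk.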
Download a free data mover between Azure and Aurora that delivers a throughput of 20 million rows per minute.