Synthetic data with migration

Synthetic data

Windocks generates synthetic data for whole databases, subsets, and tables, and includes cross-platform and cloud migration. The synthetic data reflects the data distribution of the source. Subsetting is included because source databases often benefit from being "slimmed down."
 
Synthetic data that models a source database is preferred for data privacy, because masked or obfuscated data remains subject to linkage attacks. Synthetic data is also preferred for ML and AI initiatives, as synthetic data modeling enhances data accuracy and utility.
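
To make the linkage risk concrete, here is a minimal Python sketch. It assumes a masked table that still carries quasi-identifiers (ZIP code, birth date, sex) and a separate public record set. All rows, names, and column names are made up for illustration.

```python
# Hypothetical masked table: names are hidden, but quasi-identifiers remain.
masked_patients = [
    {"name": "XXXX", "zip": "98052", "birth_date": "1984-03-12", "sex": "F", "diagnosis": "asthma"},
    {"name": "XXXX", "zip": "98101", "birth_date": "1990-07-01", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public records that share the same quasi-identifiers.
public_records = [
    {"name": "Alice Example", "zip": "98052", "birth_date": "1984-03-12", "sex": "F"},
    {"name": "Bob Example", "zip": "98109", "birth_date": "1975-11-30", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(masked, public):
    """Re-identify masked rows by joining on the quasi-identifier columns."""
    index = {tuple(p[k] for k in QUASI_IDENTIFIERS): p["name"] for p in public}
    matches = []
    for row in masked:
        key = tuple(row[k] for k in QUASI_IDENTIFIERS)
        if key in index:
            matches.append((index[key], row["diagnosis"]))
    return matches


reidentified = link(masked_patients, public_records)
print(reidentified)
```

Masking the name column alone does not prevent the join. Synthetic rows that do not correspond to real individuals are intended to avoid this kind of match.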
 
Supported platforms include SQL Server, PostgreSQL, MySQL, Snowflake, Azure MI and Azure SQL, Aurora, AWS RDS, Oracle, and DB2.
 
Our customers have growing cross-platform and cloud data service needs, so Windocks Synthetic includes automated database migration between any of the supported platforms. You'll find Windocks database migration a breeze, as the data type mapping is automated, and we realize 3-5 GB/minute even on slow networks! Schema changes are also detected automatically, so your synthetic data is always of high quality.
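
As a rough illustration of what automated data type mapping involves, the Python sketch below translates a few SQL Server column types to common PostgreSQL equivalents and estimates transfer time at the quoted 3-5 GB/minute. The mapping table and both function names are assumptions made for this example, not the mapping or API that Windocks uses.

```python
# Illustrative SQL Server -> PostgreSQL type mapping (an assumption for this
# sketch, not the mapping Windocks applies).
SQLSERVER_TO_POSTGRES = {
    "int": "integer",
    "bigint": "bigint",
    "bit": "boolean",
    "nvarchar": "varchar",
    "datetime2": "timestamp",
    "uniqueidentifier": "uuid",
    "varbinary": "bytea",
}


def map_column(name, source_type, length=None):
    """Return a PostgreSQL column definition for a SQL Server column."""
    base = source_type.lower()
    if base not in SQLSERVER_TO_POSTGRES:
        raise ValueError(f"no mapping defined for type {source_type!r}")
    target = SQLSERVER_TO_POSTGRES[base]
    if target == "varchar" and length is not None:
        target = f"varchar({length})"
    return f"{name} {target}"


def estimate_minutes(size_gb, gb_per_minute):
    """Estimate transfer time in minutes from size and throughput."""
    return size_gb / gb_per_minute


print(map_column("customer_id", "UNIQUEIDENTIFIER"))
print(map_column("email", "NVARCHAR", length=255))
# A 300 GB database at the quoted 3-5 GB/minute range.
print(estimate_minutes(300, 5), "to", estimate_minutes(300, 3), "minutes")
```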
 
 
 
 
 

DevOps with Subset and Synthetic Data

Delivering data for a repeatable, agile dev/test process is challenging due to lengthy production database restores and growing concerns about personally identifiable information (PII).

With Windocks, data scientists, developers, and test teams can work with right-sized data sets, ranging from writable production database images to database subsets and tables, optionally populated with synthetic data.

Windocks automates the creation of high-fidelity subsets and synthetic data:

  • Simply provide the target size as a percentage of the source database, and Windocks delivers a subsetted database in minutes.
  • Model the source data distribution using built-in synthetic data generation from Windocks, or open source or in-house libraries.
  • Validate the data distribution and privacy with built-in graphical distributions and metrics, or your own custom code.
  • Deploy synthetic data sets to target database instances or Docker containers, giving dev and test teams immediate access and making experiments reproducible.
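
The "model" and "validate" steps above can be sketched with custom code. The Python example below is a minimal sketch that fits the empirical distribution of a single categorical column, samples synthetic values from it, and scores the result with total variation distance. The column values and function names are hypothetical, and this simple per-column approach stands in for, rather than reproduces, the built-in Windocks generation and metrics.

```python
import random
from collections import Counter


def fit_distribution(values):
    """Return the empirical probability of each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def sample_synthetic(distribution, n, seed=0):
    """Draw n synthetic values that follow the fitted distribution."""
    rng = random.Random(seed)
    categories = list(distribution)
    weights = [distribution[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)


def total_variation_distance(p, q):
    """0.0 means identical distributions, 1.0 means no overlap."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical source column: 60% WA, 30% OR, 10% ID.
source_column = ["WA"] * 60 + ["OR"] * 30 + ["ID"] * 10
source_dist = fit_distribution(source_column)

synthetic_column = sample_synthetic(source_dist, n=10_000)
tvd = total_variation_distance(source_dist, fit_distribution(synthetic_column))
print(f"total variation distance: {tvd:.4f}")
```

A value close to zero indicates that the synthetic column tracks the source distribution. Real tables also need cross-column and cross-table relationships checked, which this sketch does not attempt.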

Efficiency across disciplines

Windocks optimizes efficiency across disciplines of data governance, data science, dev, test, and DataOps.

  • Deliver synthetically populated "twins" of production in seconds, with a code-free configuration that includes Auth, Git, and other pipeline operations.
  • Work with your own Python libraries, or other custom code, using the Windocks pluggable architecture.
  • Automate DataOps with a code-free Docker API, as well as a web interface.
  • Integrate with source control (Git) and Continuous Integration systems (Azure DevOps, Jenkins, etc.), with off-the-shelf templates for Azure DevOps.
  • Windocks SQL Server containers are identical to conventionally installed SQL Server instances, and preserve compatibility with existing infrastructure and security policies.
  • Achieve a 10:1 or greater reduction in VMs supported, by running up to 50 database containers on a single VM.

Steps to orchestrate synthetic data generation

1. Install Windocks

After you have received your download link and license key, install Windocks Synthetic on a Windows or Linux machine, or in a Docker container, as instructed.

 

The Synthetic data service should have network access to a source database that is offline, or not otherwise being written to.  The source database can also be hosted locally.  

2. Connect to the source database

Provide credentials to connect to a source database.   Windocks connects to the source database, and presents the tables and column types.  

 

Identify tables that are to be "passed through" (copied without subsetting) or ignored (not included in the subset). Save the resulting target database configuration.

3. Define the subset & synthesize spec and "run"

Once the source database connection is made and the tables and columns are reviewed, specify the target subset and synthetic databases:

  • Choose to deliver a subset database, a synthetic database, or both, and assign appropriate database names.
  • Optionally, deliver data to a different platform.
  • Specify the size as a percentage of the source, save, and "run" the transform.
  • Windocks delivers the subset and synthetically populated databases in minutes, while maintaining the production relationships and data distribution.
  • Review the quality of the subset and synthetic databases with built-in distribution charts.   
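
To show what "maintaining the production relationships" means for a percentage subset, here is a minimal Python sketch. It samples a percentage of parent rows and keeps only the child rows whose foreign key points at a kept parent. The tables, column names, and function are hypothetical, and this is one simple approach rather than the algorithm Windocks uses.

```python
import random


def subset_with_relationships(parents, children, fk, percent, seed=0):
    """Sample a percentage of parent rows and keep only their child rows."""
    rng = random.Random(seed)
    keep = max(1, round(len(parents) * percent / 100))
    kept_parents = rng.sample(parents, keep)
    kept_ids = {parent["id"] for parent in kept_parents}
    # Keeping only children of kept parents avoids orphaned foreign keys.
    kept_children = [child for child in children if child[fk] in kept_ids]
    return kept_parents, kept_children


# Hypothetical data: 100 customers, each with exactly 5 orders.
customers = [{"id": i} for i in range(1, 101)]
orders = [{"id": n, "customer_id": (n % 100) + 1} for n in range(1, 501)]

sub_customers, sub_orders = subset_with_relationships(
    customers, orders, fk="customer_id", percent=10
)
print(len(sub_customers), "customers,", len(sub_orders), "orders")
```

Every order in the subset still references a customer that exists in the subset, so joins behave as they do in production.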

This workflow can be automated using RESTful APIs.
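
As a sketch of what such automation might look like from a pipeline script, the Python example below builds an HTTP request for a "run" step. The host, path, and JSON field names are placeholders invented for illustration. They are not the documented Windocks API, so consult the product documentation for the real endpoints and payloads.

```python
import json
import urllib.request

# Placeholder values (assumptions): not the real Windocks host, path, or schema.
BASE_URL = "https://windocks.example.internal"
RUN_PATH = "/hypothetical/transform/run"


def build_run_request(source_db, subset_name, synthetic_name, percent):
    """Build (but do not send) a POST request describing a transform run."""
    payload = {
        "source_database": source_db,
        "subset_database": subset_name,
        "synthetic_database": synthetic_name,
        "percent": percent,
    }
    return urllib.request.Request(
        BASE_URL + RUN_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("sales", "sales_subset", "sales_synthetic", 10)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it. The call is omitted here
# because the endpoint above is only a placeholder.
```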

Extend CI/CD to your data layer