Generating Synthetic Data

Windocks generates synthetic data for whole databases, subsets, and individual tables, and supports cross-platform and cloud migration. The generated data reflects the data distribution of the source. Subsetting is included because source databases often benefit from being "slimmed down."
 
Synthetic data that models a source database is preferred for data privacy, because masked or obfuscated data remains subject to linkage attacks. Synthetic data is also preferred for ML and AI initiatives, as synthetic data modeling enhances data accuracy and utility.
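To make the linkage-attack concern concrete, here is a minimal sketch in Python. All records, field names, and the choice of quasi-identifiers (zip, birth_year, sex) are made up for illustration. It shows how rows whose names are masked can still be re-identified by joining on the columns that were left intact.

```python
# Hypothetical data: names are masked, but quasi-identifiers remain in the clear.
masked = [
    {"name": "XXXX", "zip": "98052", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"name": "XXXX", "zip": "98101", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public list (for example, a voter roll) with the same quasi-identifiers.
public = [
    {"name": "Alice Example", "zip": "98052", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "zip": "98101", "birth_year": 1975, "sex": "M"},
    {"name": "Carol Example", "zip": "98052", "birth_year": 1991, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(masked_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for masked rows that match exactly one public row."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = []
    for row in masked_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means the row is re-identified
            matches.append((candidates[0], row["diagnosis"]))
    return matches


reidentified = link(masked, public)
```

In this toy case both masked rows are re-identified. Synthetic rows sampled from a model of the source are not copies of real individuals, which is the motivation for preferring them when privacy matters.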
 
Synthetic data can be generated across different database types. For example, generate synthetic data from a SQL Server database but deliver it into Snowflake. Support includes any combination of SQL Server, PostgreSQL, MySQL, Snowflake, Azure MI and Azure SQL, Aurora, AWS RDS, and Oracle.
Schema changes are also detected automatically, so your synthetic data stays high in quality and fidelity.
 
 

DevOps with Subset and Synthetic Data

Delivering data for a repeatable, agile dev/test process is challenging because of lengthy production database restores and growing concerns about personally identifiable information (PII).

With Windocks, data scientists, developers, and test teams can work with right-sized data sets, ranging from writable production database images to database subsets and tables, optionally populated with synthetic data.

Windocks automates the creation of high-fidelity subsets and synthetic data:

  • Provide a size as a percentage of the source database, and Windocks delivers a subsetted database in minutes.
  • Model the source data distribution using built-in synthetic data generation from Windocks, or open source or in-house libraries.
  • Validate the data distribution and privacy with built-in graphical distributions and metrics, or with your own custom code.
  • Deploy synthetic data sets to target database instances or Docker containers, giving dev and test teams immediate access and making experiments reproducible.
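As an illustration of the "model" and "validate" steps with custom code, the sketch below fits a simple categorical distribution to one column, samples synthetic values from it, and scores the result with total variation distance. It is not the Windocks generator. The column values, function names, and the choice of metric are assumptions made for this example.

```python
import random
from collections import Counter


def fit_categorical(values):
    """Estimate the relative frequency of each category in a column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def sample_synthetic(model, n, seed=0):
    """Draw n synthetic values from the fitted frequencies (seeded for repeatability)."""
    rng = random.Random(seed)
    categories = list(model)
    weights = [model[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)


def total_variation_distance(p, q):
    """0.0 means identical distributions, 1.0 means completely disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical source column: customer state codes.
source = ["WA"] * 600 + ["OR"] * 300 + ["ID"] * 100

model = fit_categorical(source)
synthetic = sample_synthetic(model, 5000)
tvd = total_variation_distance(model, fit_categorical(synthetic))
```

A small distance suggests the synthetic column tracks the source distribution. A real validation would also look at joint distributions across columns and at privacy metrics, which this single-column sketch does not cover.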


Steps to orchestrate synthetic data generation

1. Install Windocks

After you receive your download link and license key, install Windocks Synthetic on a Windows or Linux machine, or in a Docker container (see RESOURCES / Get started).

2. Connect to the source database and create a data source

Provide credentials for the source database so Windocks can connect to it. Then select your database from the connection and create a data source.

 

3. Define a transform and run it

  • Choose to deliver a subset database, a synthetic database, or both, and assign appropriate database names.
  • Optionally, deliver synthetic data to a different database type (such as SQL Server to Snowflake).
  • Specify filters (WHERE clauses) on the source to get the subset and synthetic data you want. You can also select specific rows to include in the subset or synthetic target.
  • Specify the size as a percentage of the source, save, and run the transform.
  • Windocks delivers the populated subset and synthetic databases in minutes, while maintaining the production relationships and data distribution.
  • Review the transform report, and check the quality of the subset and synthetic databases with the built-in distribution charts.
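The idea behind a filtered, percentage-sized subset that keeps relationships intact can be sketched with Python's built-in sqlite3 module. This is an illustration only, not how Windocks implements subsetting. The customers and orders tables, the subset() helper, and the use of ORDER BY with LIMIT in place of real sampling are all assumptions made for the example.

```python
import sqlite3


def build_source():
    """Create a small in-memory source database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
    """)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, "west" if i % 2 else "east") for i in range(1, 101)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 100) + 1, float(i)) for i in range(1, 501)],
    )
    return conn


def subset(conn, where_sql, percent):
    """Keep `percent` of the filtered parent rows, plus only their child rows."""
    # where_sql is trusted input here; do not build SQL from user text this way.
    total = conn.execute(
        f"SELECT COUNT(*) FROM customers WHERE {where_sql}"
    ).fetchone()[0]
    keep = max(1, total * percent // 100)

    # Empty copy of the parent table, then fill it with the chosen rows.
    conn.execute("CREATE TABLE sub_customers AS SELECT * FROM customers WHERE 0")
    conn.execute(
        f"INSERT INTO sub_customers SELECT * FROM customers "
        f"WHERE {where_sql} ORDER BY id LIMIT ?",
        (keep,),
    )
    # Child rows follow their parents, so no order points at a missing customer.
    conn.execute("""
        CREATE TABLE sub_orders AS
        SELECT o.* FROM orders AS o
        JOIN sub_customers AS c ON o.customer_id = c.id
    """)


conn = build_source()
subset(conn, "region = 'west'", percent=20)

n_customers = conn.execute("SELECT COUNT(*) FROM sub_customers").fetchone()[0]
n_orders = conn.execute("SELECT COUNT(*) FROM sub_orders").fetchone()[0]
orphans = conn.execute("""
    SELECT COUNT(*) FROM sub_orders
    WHERE customer_id NOT IN (SELECT id FROM sub_customers)
""").fetchone()[0]
```

Selecting parent rows first and pulling child rows through the join is what keeps the foreign-key relationships valid in the smaller database. Preserving the data distribution as well would require sampling rather than taking the first rows by id.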

This workflow can be automated using REST APIs.  
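A scripted run might look roughly like the sketch below, which only builds an HTTP request and does not send it. The base URL, the /transforms path, the bearer-token header, and every field name in the payload are hypothetical placeholders, not the Windocks REST API. Take the real endpoints and parameters from the Windocks documentation.

```python
import json
import urllib.request


def build_transform_request(base_url, token, source_id, percent,
                            where=None, target_type=None):
    """Build (but do not send) a POST request describing a transform.

    Every path and field name below is a placeholder for illustration.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")

    payload = {
        "dataSourceId": source_id,           # hypothetical field
        "sizePercent": percent,              # hypothetical field
        "deliver": ["subset", "synthetic"],  # hypothetical field
    }
    if where:
        payload["filters"] = [where]         # hypothetical field
    if target_type:
        payload["targetType"] = target_type  # hypothetical field

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/transforms",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",    # assumed auth scheme
        },
        method="POST",
    )


request = build_transform_request(
    "https://windocks.example/api",  # placeholder host
    token="YOUR_TOKEN",
    source_id="sales-db",
    percent=10,
    where="region = 'west'",
    target_type="snowflake",
)
# Sending it would be: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the script easy to check in a CI pipeline before it touches a real server.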

 

 
