CAPABILITIES

Masking with Synthetic Data

Windocks replaces sensitive PII with synthetic data that reflects the distribution of the source column.  Synthetic data can be applied to complete databases and built into database images, so that virtualized database clones are delivered safe for use.  
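To illustrate what "reflects the distribution of the source column" can mean, here is a minimal sketch that samples replacement values for a low-cardinality column in proportion to their frequency in the source. This is an illustration only, not the Windocks implementation; the function names and the sample column are assumptions. Note that for columns holding direct identifiers (names, emails) the generated values should be newly created rather than drawn from the source values.

```python
import random
from collections import Counter


def fit_categorical(values):
    """Return (categories, weights) describing the empirical distribution."""
    counts = Counter(values)
    categories = sorted(counts)
    weights = [counts[c] for c in categories]
    return categories, weights


def sample_synthetic(values, n, seed=0):
    """Draw n synthetic values with the same category frequencies as `values`."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    categories, weights = fit_categorical(values)
    return rng.choices(categories, weights=weights, k=n)


# Hypothetical source column: 60% CA, 30% WA, 10% OR.
source_states = ["CA"] * 60 + ["WA"] * 30 + ["OR"] * 10
synthetic_states = sample_synthetic(source_states, n=1000, seed=42)
```

The synthetic column has no row-level correspondence with the source, but its category frequencies should approximate those of the source as the sample size grows.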
 
Synthetic data that models a source database is preferred for data privacy, because masked or obfuscated data remains vulnerable to linkage attacks on secondary identifiers.  Synthetic data is also preferred for ML and AI initiatives, as synthetic data modeling enhances data accuracy and utility.  
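A linkage attack can be sketched in a few lines. In this toy example the masked release has had names removed but keeps secondary identifiers (ZIP code, birth year, sex), and an attacker joins it to a public list on those fields. All records and field names are made up for illustration.

```python
# Masked release: direct identifiers removed, secondary identifiers kept.
masked_records = [
    {"zip": "98101", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "98101", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "98052", "birth_year": 1984, "sex": "F", "diagnosis": "migraine"},
]

# Public data the attacker already holds (for example, a voter roll).
public_records = [
    {"name": "Alice Example", "zip": "98101", "birth_year": 1984, "sex": "F"},
    {"name": "Bob Example", "zip": "98101", "birth_year": 1990, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(masked, public, keys=QUASI_IDENTIFIERS):
    """Re-identify masked rows whose secondary identifiers match exactly one person."""
    index = {}
    for row in masked:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for person in public:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means re-identification
            matches.append((person["name"], candidates[0]["diagnosis"]))
    return matches


reidentified = link(masked_records, public_records)
```

Here both public individuals are matched to a diagnosis even though no name appears in the masked data, which is the risk that fully synthetic rows are meant to reduce.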
 
Windocks synthetic data also supports cross-platform movement.  For example, synthetic data applied to a SQL Server database can be moved to Snowflake.  Data movement includes any combination of SQL Server, PostgreSQL, MySQL, Snowflake, Azure MI and Azure SQL, Aurora, AWS RDS, and Oracle.  
 
Schema changes are detected automatically, to ensure delivered databases reflect any changes in source schema. 
 
 

DevOps with Subset and Synthetic Data

Delivering data for a repeatable, agile dev/test process is challenging because of lengthy production database restores and concerns about personally identifiable information (PII).   

With Windocks, data scientists, developers, and test teams can work with right-sized data sets, ranging from writable production database images to database subsets and tables, optionally populated with synthetic data.  

Windocks automates the creation of high-fidelity subsets and synthetic data:

  • Provide a size as a percentage of the source database, and Windocks delivers a subsetted database in minutes.
  • Model the source data distribution using built-in synthetic data generation from Windocks, or use open source or in-house libraries.
  • Validate the data distribution and privacy with built-in graphical distributions and metrics, or with your own custom code.
  • Deploy synthetic data sets to target database instances or Docker containers, giving dev and test teams immediate access and making experiments reproducible.
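As one example of the "custom code" validation mentioned above, the sketch below computes two simple checks with the standard library: the total variation distance between a source and a synthetic categorical column (0 means identical frequencies, 1 means no overlap), and the fraction of synthetic rows that exactly reproduce a source row. These metrics and names are assumptions chosen for illustration; they are not the built-in Windocks metrics.

```python
from collections import Counter


def total_variation_distance(source, synthetic):
    """Half the L1 distance between the two empirical category distributions."""
    if not source or not synthetic:
        raise ValueError("both columns must be non-empty")
    p, q = Counter(source), Counter(synthetic)
    categories = set(p) | set(q)
    return 0.5 * sum(
        abs(p[c] / len(source) - q[c] / len(synthetic)) for c in categories
    )


def exact_match_rate(source_rows, synthetic_rows):
    """Fraction of synthetic rows that are identical to some source row."""
    if not synthetic_rows:
        raise ValueError("synthetic_rows must be non-empty")
    seen = set(source_rows)
    return sum(row in seen for row in synthetic_rows) / len(synthetic_rows)


# Hypothetical columns for illustration.
source = ["CA"] * 60 + ["WA"] * 30 + ["OR"] * 10
close_synthetic = ["CA"] * 58 + ["WA"] * 31 + ["OR"] * 11
poor_synthetic = ["CA"] * 20 + ["WA"] * 20 + ["OR"] * 60

tvd_close = total_variation_distance(source, close_synthetic)
tvd_poor = total_variation_distance(source, poor_synthetic)

leak_rate = exact_match_rate(
    [("Alice", "98101"), ("Bob", "98052")],
    [("Carol", "98101"), ("Bob", "98052")],
)
```

A low distance suggests the synthetic column preserves the source frequencies, while a high exact-match rate is a warning sign that source rows are being copied through. Acceptable thresholds depend on the data set and are left to the reader.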

Steps to orchestrate synthetic data generation

1. Install Windocks

After you have received your download link and license key, install Windocks Synthetic on a Windows or Linux machine, or in a Docker container (see RESOURCES / Get started).

2. Connect to the source database and create a data source

Provide credentials for a source database, and Windocks connects to it. Select your database from the connection and create a data source.

 

3. Define a transform and run it

  • Choose to deliver a subset database, a synthetic database, or both, and assign appropriate database names.
  • Optionally, deliver synthetic data to a different database type (such as SQL Server to Snowflake).
  • Specify filters (WHERE clauses) on the source to get the subset and synthetic data you want. You may also select specific rows to include in the subset or synthetic target.
  • Specify the size as a percentage of the source, save, and run the transform.
  • Windocks delivers the subset and synthetic databases in minutes, while maintaining the production relationships and data distribution.
  • Review the transform report, and check the quality of the subset and synthetic databases with the built-in distribution charts.

This workflow can be automated using REST APIs.  
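As a rough sketch of what such automation might look like from a script, the example below builds (but does not send) an HTTP request using the Python standard library. The host, path, payload fields, and bearer-token authentication are all hypothetical placeholders, not the documented Windocks API; consult the Windocks API documentation for the actual endpoints and parameters.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with values from the Windocks API documentation.
BASE_URL = "https://windocks.example.internal"
TRANSFORM_PATH = "/api/transforms/run"


def build_run_request(transform_name, percent, token):
    """Build a POST request asking the server to run a saved transform."""
    # Hypothetical payload fields, chosen for illustration only.
    payload = {"transform": transform_name, "percent": percent}
    return urllib.request.Request(
        BASE_URL + TRANSFORM_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_run_request("nightly-subset", 10, token="REPLACE_ME")
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

Building the request separately from sending it makes the call easy to inspect and to schedule from a CI job or cron task once the real endpoint details are filled in.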

 

 
