Generating Synthetic Data
DevOps with Subset and Synthetic Data
Delivering data for a repeatable, agile dev/test process is challenging because production database restores are lengthy and concerns about personally identifiable information (PII) are growing.
With Windocks, data scientists, developers, and test teams can work with right-sized data sets, ranging from writable production database images to database subsets and tables, optionally populated with synthetic data.
Windocks automates the creation of high-fidelity subset and synthetic data:
- Provide the target size as a percentage of the source database, and Windocks delivers a subsetted database in minutes.
- Model the source data distribution using built-in synthetic data generation from Windocks, or open source or in-house libraries.
- Validate the data distribution and privacy with built-in graphical distributions and metrics, or with your own custom code.
- Deploy synthetic data sets to target database instances or Docker containers, giving dev and test teams immediate access and making experiments reproducible.
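As an illustration of the "custom code" validation option, the following is a minimal sketch of one way to compare a categorical column between a source and a synthetic data set, using the total variation distance. It is not the Windocks metric; the function name and the sample column values are hypothetical and chosen only for this example.

```python
from collections import Counter


def total_variation_distance(source_values, synthetic_values):
    """Return the total variation distance between two categorical samples.

    0.0 means the category frequencies match exactly;
    1.0 means the two samples share no categories at all.
    """
    if not source_values or not synthetic_values:
        raise ValueError("both samples must be non-empty")
    src = Counter(source_values)
    syn = Counter(synthetic_values)
    n_src, n_syn = len(source_values), len(synthetic_values)
    categories = set(src) | set(syn)
    # Counter returns 0 for categories missing from one side.
    return 0.5 * sum(abs(src[c] / n_src - syn[c] / n_syn) for c in categories)


# Hypothetical column values, for illustration only.
source_states = ["WA"] * 50 + ["CA"] * 30 + ["OR"] * 20
synthetic_states = ["WA"] * 48 + ["CA"] * 33 + ["OR"] * 19

tvd = total_variation_distance(source_states, synthetic_states)
print(f"total variation distance: {tvd:.3f}")
```

A value near zero suggests the synthetic column tracks the source frequencies closely. What threshold counts as acceptable is a judgment call that depends on the data and the use case.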
Steps to orchestrate synthetic data generation
1. Install Windocks
After you receive your download link and license key, install Windocks Synthetic on a Windows or Linux machine, or in a Docker container (see RESOURCES / Get started).
2. Connect to the source database and create a data source
Provide credentials to connect to the source database. Once Windocks connects, select your database from the connection and create a data source.
3. Define a transform and run it
- Choose to deliver a subset database, a synthetic database, or both, and assign appropriate database names.
- Optionally, deliver synthetic data to a different database type (such as SQL Server to Snowflake).
- Specify filters (WHERE clauses) on the source to get the subset and synthetic data you want. You can also select specific rows to include in the subset or synthetic target.
- Specify the size as a percentage of the source, save, and run the transform.
- Windocks delivers the populated subset and synthetic databases in minutes, while maintaining the production relationships and data distribution.
- Review the transform report, and check the quality of the subset and synthetic databases with the built-in distribution charts.
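To make the idea of a filtered, percentage-sized subset that keeps relationships intact more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It does not show how Windocks implements transforms. The `customers` and `orders` tables, the `build_subset` function, and the `subset_` table names are all hypothetical, and rows are picked in id order rather than sampled.

```python
import sqlite3


def build_subset(conn, percent, where="1=1"):
    """Copy about `percent` of the filtered customers, plus their orders.

    `where` is interpolated into the SQL, so it must come from a trusted source.
    Returns the number of customers copied.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = conn.execute(
        f"SELECT COUNT(*) FROM customers WHERE {where}"
    ).fetchone()[0]
    limit = max(1, int(total * percent / 100)) if total else 0
    conn.executescript("""
        DROP TABLE IF EXISTS subset_orders;
        DROP TABLE IF EXISTS subset_customers;
        CREATE TABLE subset_customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE subset_orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES subset_customers(id),
            amount REAL
        );
    """)
    # Parent rows first: apply the filter, then cap at the requested share.
    conn.execute(
        f"INSERT INTO subset_customers "
        f"SELECT * FROM customers WHERE {where} ORDER BY id LIMIT ?",
        (limit,),
    )
    # Child rows follow their parents, so no order is left without a customer.
    conn.execute(
        "INSERT INTO subset_orders SELECT * FROM orders "
        "WHERE customer_id IN (SELECT id FROM subset_customers)"
    )
    conn.commit()
    return limit


# Hypothetical source data: 100 customers, 3 orders each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, "west" if i % 2 else "east") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, (i % 100) + 1, 10.0 * i) for i in range(1, 301)],
)

n_customers = build_subset(conn, percent=20, where="region = 'west'")
n_orders = conn.execute("SELECT COUNT(*) FROM subset_orders").fetchone()[0]
orphans = conn.execute(
    "SELECT COUNT(*) FROM subset_orders "
    "WHERE customer_id NOT IN (SELECT id FROM subset_customers)"
).fetchone()[0]
print(n_customers, n_orders, orphans)
```

The sketch selects parent rows first and then pulls only the child rows that reference them, which is one simple way to keep foreign-key relationships valid in a subset. A real schema with many tables would need the same walk across every relationship.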