Synthetic data
DevOps with Subset and Synthetic Data
Delivering data for a repeatable, agile dev/test process is challenging because production database restores are lengthy and concerns about personally identifiable information (PII) are growing.
With Windocks, data scientists, developers, and test teams can work with right-sized data sets, ranging from writable production database images to database subsets and tables, optionally populated with synthetic data.
Windocks automates the creation of high-fidelity subset and synthetic data:
- Provide the target size as a percentage of the source database, and Windocks delivers a subsetted database in minutes.
- Model the source data distribution using the synthetic data generation built into Windocks, or open source or in-house libraries.
- Validate the data distribution and privacy with built-in graphical distributions and metrics, or with your own custom code.
- Deploy synthetic data sets to target database instances or Docker containers, giving dev and test teams immediate access and making experiments reproducible.
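As an illustration of the kind of custom validation code mentioned above, here is a minimal Python sketch. It is not the Windocks built-in metric set and does not use any Windocks API. The function names, the sample values, and the choice of total variation distance and exact-match rate as metrics are all assumptions made for this example.

```python
from collections import Counter


def total_variation_distance(source_values, synthetic_values):
    """Compare category frequencies of two samples.

    Returns 0.0 when the frequencies are identical and 1.0 when the
    samples share no categories at all.
    """
    if not source_values or not synthetic_values:
        raise ValueError("both samples must be non-empty")
    source_counts = Counter(source_values)
    synthetic_counts = Counter(synthetic_values)
    categories = set(source_counts) | set(synthetic_counts)
    return 0.5 * sum(
        abs(source_counts[c] / len(source_values)
            - synthetic_counts[c] / len(synthetic_values))
        for c in categories
    )


def exact_match_rate(source_rows, synthetic_rows):
    """Crude privacy check: share of synthetic rows that copy a source row."""
    if not synthetic_rows:
        raise ValueError("synthetic_rows must be non-empty")
    source_set = set(source_rows)
    return sum(row in source_set for row in synthetic_rows) / len(synthetic_rows)


# Hypothetical column values and rows, for illustration only.
source_states = ["WA", "WA", "OR", "CA"]
synthetic_states = ["WA", "OR", "OR", "CA"]
tvd = total_variation_distance(source_states, synthetic_states)

source_rows = [("Ann", "WA"), ("Bob", "OR")]
synthetic_rows = [("Cal", "WA"), ("Bob", "OR")]
leak_rate = exact_match_rate(source_rows, synthetic_rows)
```

A low distance suggests the synthetic column follows the source distribution, while a non-zero exact-match rate flags synthetic rows that reproduce real records. Acceptable thresholds depend on your own governance policy.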
Efficiency across disciplines
Windocks improves efficiency across data governance, data science, dev, test, and DataOps.
- Deliver synthetically populated "twins" of production in seconds, with a code-free configuration that includes Auth, Git, and other pipeline operations.
- Work with your own Python libraries, or other custom code, using the Windocks pluggable architecture.
- Automate DataOps with a code-free Docker API, as well as a web interface.
- Integrate with source control (Git) and Continuous Integration systems (Azure DevOps, Jenkins, etc.), with off-the-shelf templates for Azure DevOps.
- Windocks SQL Server containers are identical to conventionally installed SQL Server instances, and preserve compatibility with existing infrastructure and security policies.
- Achieve a 10:1 or greater reduction in the number of VMs to support by running up to 50 database containers on a single VM.
Steps to orchestrate synthetic data generation
1. Install Windocks
After you have received your download link and license key, install Windocks Synthetic on a Windows or Linux machine, or in a Docker container, as instructed.
The synthetic data service needs network access to a source database that is offline or otherwise not being written to. The source database can also be hosted locally.
2. Connect to the source database
Provide credentials to connect to a source database. Windocks connects to the source database, and presents the tables and column types.
Identify tables that are to be "passed through" (not subsetted) or ignored (not included in the subset). Save the resulting target database configuration.
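To make this step concrete, the sketch below lists tables and column types and then records which tables are passed through or ignored. It uses Python's sqlite3 module with an in-memory database as a stand-in for the source database. The table names and the configuration keys ("pass_through", "ignored", "subset") are hypothetical and do not represent the Windocks configuration format.

```python
import json
import sqlite3


def describe_tables(conn):
    """Map each user table name to a list of (column name, declared type)."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    schema = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        schema[name] = [(col[1], col[2])
                        for col in conn.execute(f"PRAGMA table_info({quoted})")]
    return schema


# Stand-in source database with hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE audit_log (id INTEGER PRIMARY KEY, message TEXT);
    CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT);
""")
schema = describe_tables(conn)

# Hypothetical target configuration: small lookup tables are copied whole,
# log tables are left out, and everything else is subsetted.
config = {"pass_through": ["countries"], "ignored": ["audit_log"]}
config["subset"] = sorted(
    set(schema) - set(config["pass_through"]) - set(config["ignored"]))
config_json = json.dumps(config, indent=2)
```

Keeping the table choices in a saved configuration, rather than repeating them by hand, is what makes later runs repeatable.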
3. Define the subset and synthesis spec and "run"
Once the source database connection is made and the tables and columns are reviewed, specify the target subset and synthetic databases:
- Choose to deliver a subset database, a synthetic database, or both, and assign appropriate database names.
- Optionally, deliver data to a different platform.
- Specify the size as a percentage of the source, save, and "run" the transform.
- Windocks delivers the subset and synthetically populated databases in minutes, while maintaining the production relationships and data distribution.
- Review the quality of the subset and synthetic databases with built-in distribution charts.
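The idea of subsetting by percentage while maintaining relationships can be sketched in a few lines of Python. This is a simplified illustration with one parent table and one child table, not the algorithm Windocks uses. The function name, the table data, and the fixed random seed are assumptions made for this example.

```python
import random


def subset_with_children(parents, children, percent,
                         parent_key, foreign_key, seed=0):
    """Sample a percentage of parent rows and keep only matching child rows.

    Every child row that is kept references a parent row that is kept,
    so the foreign-key relationship stays intact in the subset.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    keep_count = min(len(parents), max(1, round(len(parents) * percent / 100)))
    kept_parents = rng.sample(parents, keep_count)
    kept_ids = {row[parent_key] for row in kept_parents}
    kept_children = [row for row in children if row[foreign_key] in kept_ids]
    return kept_parents, kept_children


# Hypothetical data: 10 customers with 3 orders each.
customers = [{"id": i} for i in range(1, 11)]
orders = [{"order_id": n, "customer_id": (n % 10) + 1} for n in range(30)]

kept_customers, kept_orders = subset_with_children(
    customers, orders, percent=20, parent_key="id", foreign_key="customer_id")
```

With these inputs, a 20 percent subset keeps 2 customers and the 6 orders that belong to them. A real schema with many related tables needs the same rule applied along every foreign-key path.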