Introduction
Windocks Free and Standard editions deliver cross-platform database subsetting, synthetic data generation, and database movement or migration. Get started with synthetic data, subsets, and data moves.
Windocks Enterprise edition adds Windows SQL Server containers and database cloning (or virtualization), for delivery of complete writable database environments that can include subsets, synthetic data, and database migration.
Get Started with deployment and installation
Articles on Synthetic data, Subsets and data move
Useful articles and sample code for synthetic data, subsets, copying data
Includes articles such as creating subsets or synthetic data yourself, Python samples for Synthetic Data Vault (SDV), bulk inserting into Snowflake, moving data from Azure to SQL Server instances or from Azure to Aurora and more.
Best Practices
Troubleshooting
Configuration for large databases
Choosing the number of tables to process in parallel, and other parameters.
Release Notes
Docker containers
Editions and license limits
SQL Server to Snowflake microservice
Get Started
Subsetting, Synthetic data, and moving databases
Windocks is available for Windows, Linux, and Docker containers. It does not write to the source database, but it does require exclusive use of the source (i.e., no writes to the source while Windocks is using it).
Subsetting is based on primary/foreign keys set in the source database, and handles circular dependencies and composite keys. Schema changes are detected between Transforms, to allow for database configurations to be updated for masking or synthetic data.
Synthetic data is based on the fast Windocks synthetic engine along with the SDV open source libraries, applied to a database subset. Support is available for customer-provided Python libraries. Automated PII detection is not included in the initial release.
Windocks automates data type mapping between platforms, creating the target schema, with throughput up to 20 million rows per minute. Incremental change capture is not included in the initial release. Repeated runs of the migration transform will overwrite a previously written target database.
Choosing a Windocks deployment
Windocks involves a source database, a target database, and the Windocks service. All can reside on a single machine, on two, or on three separate machines. Working with larger databases (100 GB or more) benefits from a three-machine architecture, which maximizes the CPU and RAM available for data operations.
The minimum resources for Windocks are 4 vCPU cores and 8 GB of RAM. For large databases we recommend 8 or more vCPU cores and 16 GB or more of RAM. Resource configuration is discussed in detail in the large database configuration topic.
Installation
Windows
Download the installer zip file synth.zip and extract it to a directory such as D:\synth
cd D:\synth
.\windockssyntheticinstaller.exe
Windocks runs as a Windows service
Start using at http://localhost:5200 or http://<DnsOfServer>:5200
Change the port in windockssynthetic/config.json if needed
Set your administrator account name and password at
http://localhost:5200/firsttime
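If you need to change the port, the setting lives in windockssynthetic/config.json. As a purely illustrative sketch (the key name below is hypothetical; check the actual key in the config.json shipped with your installation):

```json
{
  "Port": 5201
}
```

Restart the WindocksSubsetSynthetic service after editing the file so the change takes effect.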
Linux
We recommend running Windocks in Docker, but it also runs natively on Linux
Download the installer zip file synth.zip and extract it to a directory such as ~/synth
sudo apt-get install unzip
unzip synth.zip -d ~/synth
cd ~/synth
./windockssyntheticinstaller
cd ../windockssynthetic
chmod 700 ./subsetsynthetic
# Run the application
./subsetsynthetic
# For background
nohup ./subsetsynthetic > outputlogfile &
Start using at http://localhost:5200 or http://<DnsOfServer>:5200
Change the port in windockssynthetic/config.json if needed
Set your administrator account name and password at
http://localhost:5200/firsttime
Docker
The image is available on Docker Hub
docker run -p 5200:5200 -d windocks/move_subset_synthetic_data
Start using at http://localhost:5200 or http://<DnsOfServer>:5200
Set your administrator account name and password at
http://localhost:5200/firsttime
This release does not support setting config values through environment variables; docker exec into the container and edit config/config.json instead
Kubernetes
In deployment yaml file, use the image windocks/move_subset_synthetic_data
After you deploy and run the service, set your administrator account name and password at
http://localhost:5200/firsttime
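A minimal deployment manifest might look like the following. Only the image name and port 5200 come from this guide; the resource names, labels, and replica count are illustrative assumptions, and you should add resource requests, storage, and ingress to suit your cluster.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: windocks-synthetic        # name is illustrative
spec:
  replicas: 1
  selector:
    matchLabels:
      app: windocks-synthetic
  template:
    metadata:
      labels:
        app: windocks-synthetic
    spec:
      containers:
      - name: windocks
        image: windocks/move_subset_synthetic_data
        ports:
        - containerPort: 5200     # Windocks listens on 5200
---
apiVersion: v1
kind: Service
metadata:
  name: windocks-synthetic
spec:
  selector:
    app: windocks-synthetic
  ports:
  - port: 5200
    targetPort: 5200
```

Note that connection and transform data is stored inside the container (see the Usage section), so deleting the pod loses that state unless you arrange persistent storage.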
Uninstall steps
Windows
Open a command prompt as administrator
sc delete WindocksSubsetSynthetic
Delete the windockssynthetic and windocksdonotdelete directories
Linux
End the application
ps -ef | grep subsetsynthetic
kill -9 <pid>
rm -rf windockssynthetic
Configuration
Check your configuration in config/config.json; the defaults are fine to begin with.
For system resources, we recommend a minimum of 12 GB of RAM and 2 cores. We also recommend running the Windocks process on a machine separate from the source and target database instances, though you can run them all on a single laptop if you wish.
If you are moving large databases, refer to our system configuration guide online (under Resources) for multi-threading settings for parallel processing.
Restart the WindocksSubsetSynthetic service on Windows or the subsetsynthetic process / container on Linux after config changes.
Email support@windocks.com if you have any configuration issues.
Usage
Windocks can move, subset, or generate synthetic data from these sources (SQL Server, Snowflake, PostgreSQL, MySQL, Azure SQL, Azure Managed Instance, Amazon RDS / Aurora) to these targets (SQL Server, Snowflake, PostgreSQL, MySQL, Azure SQL, Azure Managed Instance, Amazon RDS / Aurora). The target may be a different database type: for example, customers use Windocks to move data from SQL Server to Snowflake and back, from Azure to SQL Server instances, and from SQL Server to PostgreSQL.
- Start by creating a Connection; test it and save it.
- You may create one or more data sources per connection. Define a data source for your connection and save it. To generate synthetic data, configure the details of what type of sensitive data you need.
- Decide whether to target the same or a different connection for copy/subset/synthetic. (Note that Snowflake, for example, is a single-database connection; if that is your source, you must define a different target connection. A SQL Server instance connection, by contrast, can be used for both source and target.)
- If targeting a different connection, create another connection, test it, and save it.
- Find your source connection and its data sources, then create a transform for your data source. Specify whether you want a subset (a percentage of your source), a full copy, or synthetic data generated from the source database.
- Run the transform. A 1 TB database takes approximately 3-4 hours to copy completely; the same 1 TB database takes 2 to 3 minutes for a small subset. A 30 TB database (the Snowflake sample) takes 1 hour for a very small representative subset (0.0000001%) to be moved to SQL Server. Subsets and synthetic data preserve the data distribution of the source, making them easily generated representative databases for testing.
- The report of the transform run is available in the transform details view by selecting More in the list of transforms. You may also define WHERE clauses and primary key values to narrow your subset. Data distributions are available from the transform details view; config/config.json sets the maximum row count for which distributions are generated, which you can modify before restarting the Windocks process / service.
- If using Docker / Kubernetes, the details of your connections and transform configurations are written to a database in the container. If you delete the container, that data is lost. The Enterprise edition supports persistent storage on a volume you designate; contact support@windocks.com.
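The timing figures above can be turned into a back-of-envelope planning estimate using the guide's stated peak throughput of up to 20 million rows per minute. This is a rough sketch only: the function name is an illustration (not part of the Windocks product), and real throughput depends on hardware, network, and the parallelism settings in config/config.json.

```python
def estimate_copy_minutes(row_count: int, rows_per_minute: int = 20_000_000) -> float:
    """Rough copy-time estimate at the documented peak throughput.

    rows_per_minute defaults to the "up to 20 million rows per minute"
    figure from this guide; treat the result as a lower bound.
    """
    if rows_per_minute <= 0:
        raise ValueError("rows_per_minute must be positive")
    return row_count / rows_per_minute

# A 100-million-row database at peak throughput:
print(f"{estimate_copy_minutes(100_000_000):.0f} minutes")  # prints "5 minutes"
```

For a subset, apply the subset percentage to the row count first; as noted above, small subsets of even very large databases complete in minutes.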
Security, permissions
Windocks provides its own authentication to use the Windocks service and you may choose to require https only (see config/config.json).
Source databases require read-only permissions; target databases require create / write permissions.
Database authentication can be done at the operating system level or you may save passwords on the server running Windocks (the encryption of these saved passwords is done by Microsoft middleware).
You may generate synthetic data for all your sensitive columns so your target databases do not have any sensitive data.
Transforms across different database types
When transforming a database from one type of system to another (such as SQL Server to Snowflake), Windocks maps types and sizes for you automatically. However, there are a few things to check before running such a transform:
- Maximum value sizes per column: character and binary columns have different maximum sizes on different database platforms.
- Maximum column name lengths
- Maximum row sizes
If your target platform cannot support the sizes in your source, truncate the data in your source first.
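The checks above can be sketched as a small pre-flight script run against your source schema. The limits table is illustrative only (PostgreSQL truncates identifiers at 63 bytes, SQL Server allows 128-character identifiers, MySQL caps rows at 65,535 bytes); verify the exact limits for your platforms and versions, and note that `preflight` and its inputs are hypothetical names, not part of the Windocks product.

```python
# Illustrative per-platform limits; verify against your target's documentation.
TARGET_LIMITS = {
    "postgresql": {"max_column_name": 63,  "max_row_bytes": None},
    "sqlserver":  {"max_column_name": 128, "max_row_bytes": None},
    "mysql":      {"max_column_name": 64,  "max_row_bytes": 65_535},
}

def preflight(columns: dict, target: str) -> list:
    """Return a list of problems for the given target platform.

    columns maps column name -> declared byte size of the column.
    """
    limits = TARGET_LIMITS[target]
    problems = []
    for name in columns:
        if len(name) > limits["max_column_name"]:
            problems.append(f"column name too long for {target}: {name}")
    if limits["max_row_bytes"] is not None:
        row_bytes = sum(columns.values())
        if row_bytes > limits["max_row_bytes"]:
            problems.append(f"row size {row_bytes} exceeds {target} limit")
    return problems

# A 70,000-byte row is fine for most targets but too wide for MySQL:
print(preflight({"id": 8, "payload": 70_000}, "mysql"))
```

An empty result means the checked limits pose no problem; any findings should be fixed in the source before running the transform.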
Telemetry
Windocks collects the following data to enable customer technical support, and for product and performance improvements. No data is shared with third parties.
- User identity
- User email
- Browser type
- Database schema (tables and relationships, without data)
- Errors
- Database type
- Features used
- Application version
- Edition
- Language
- Operating system
Windocks uses this telemetry data to:
- Provide product and technical support
- Fix errors in the software
- Improve the product
Windocks does not share this data with anyone outside of Windocks.
Configuration for large databases
Windocks is a configurable, asynchronous, multi-threaded service. The Configuration Guide (available online under Resources) provides guidance on the number of tables to process in parallel, the number of batches to run in parallel on these tables, and the batch sizing.
Windocks configuration changes are made in \Windockssynthetic\config\config.json. Restart the Windocks service after changing the config.
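A back-of-envelope way to reason about these settings: the working set for batch buffers scales roughly with tables-in-parallel x batches-per-table x batch-size x average-row-size. The sketch below is illustrative; the parameter names are not the literal config.json keys, and actual memory use includes service overhead, so leave generous headroom.

```python
def estimated_working_set_mb(tables_in_parallel: int,
                             batches_per_table: int,
                             batch_rows: int,
                             avg_row_bytes: int) -> float:
    """Rough RAM estimate (MB) for the batch buffers alone.

    Parameter names are illustrative, not actual config.json keys;
    see the Configuration Guide for the exact settings.
    """
    total_bytes = tables_in_parallel * batches_per_table * batch_rows * avg_row_bytes
    return total_bytes / 1_048_576  # bytes -> MB

# 4 tables x 2 batches x 50,000 rows x 512 bytes/row:
print(f"{estimated_working_set_mb(4, 2, 50_000, 512):.0f} MB")  # prints "195 MB"
```

If the estimate approaches available RAM, reduce the number of parallel tables or batches, or shrink the batch size, before running the transform.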
Best Practices
Start small for subsets and synthetic data: initially target subsets of 100 MB or less, and expand the size as needed.
When working with subset Table filters and WHERE clauses, extraneous data can be minimized by selecting the minimum percentage (0.0000001).
For migrating large databases to Snowflake, use a Medium or larger warehouse.
When running the Windocks service on the source database instance, it is important to limit the RAM consumed by the source instance. A good rule of thumb is to divide RAM evenly between the source instance and the Windocks service.
Following a Transform, review the post-transform report, accessed via the "More" button on the Transform tab. The report will include any of the following errors:
- Tables with rows missing in the target
- Tables where primary keys were not created
- Tables where foreign keys were not created
Release Notes
Subsetting
Circular dependencies, composite keys, and self-referencing tables are handled automatically.
Subsetting is based on Primary and Foreign keys as set in the database.
Subsets of encrypted databases are delivered in unencrypted form, except for SQL Server where automated TDE encryption is available in the Transform.
History and Temporal tables are ignored.
External references and linked servers are ignored.
Indexes, stored procedures, views, and other objects are not included in the subset.
SQL Server Filestream data type is not supported.
Synthetic data
Automated PII data detection is not included in the initial release.
The fast Windocks synthetic data engine is configured by default and is capable of populating databases that are GBs in size. SDV and other Python engines are recommended for up to 3-5 tables of 50,000 rows or fewer.
Database Mover
The Database Mover is designed to move schemas and data within the same database platform (such as PostgreSQL to PostgreSQL), or between any of the Windocks supported platforms (SQL Server, PostgreSQL, MySQL, Snowflake, Aurora, RDS, Azure Managed Instance, and Azure SQL).
Primary keys in a source database may not map to the desired target. If a source primary key has no equivalent in the target database, the target column is set to nvarchar; as a result, the system may not be able to create a corresponding primary key. Any errors are reported in the post-transform report accessed via the "More" button.
Column name and row length limits: before performing a transform, ensure that column name lengths in the source database are less than or equal to the maximum column name length in the target database. For example, PostgreSQL's maximum column name length (63 bytes) is less than SQL Server's or Snowflake's.
SQL Server's maximum column name length (128 characters) is in turn less than Snowflake's (255).
MySQL's maximum row length is 65,535 bytes. If your target is MySQL and your source is not, make sure that no source table has rows exceeding MySQL's maximum row length.
Data types are mapped identically between source and target databases of the same type. If the types differ, the target data type is set to the closest available match.
Column lengths for varchar and binary types are handled based on the lengths supported by the target. Snowflake, for example, allows very large varchar values; when moving data from Snowflake to PostgreSQL, ensure the actual values will fit in the PostgreSQL target.
Database moves include the same limitations as listed above in Subsetting (schema and data only, no Filestream support, no TDE support initially).
MySQL
In the target MySQL instance, run:
set global local_infile=true;
Troubleshooting
Error reporting: errors are summarized in the post-transform report, accessed via the "More" button. The following errors are reported; the absence of these errors indicates a successful Transform.
- Tables with rows missing from the target
- Tables with Primary keys that were not successfully created
- Tables with foreign keys that were not successfully created
Logs are available at \Windockssynthetic\bin\windocks
Insufficient memory: configurations that exceed system resources are reported in the post-transform report. Review the configuration and reduce the number of tables processed in parallel, the number of parallel batches per table, and the batch size.
SQL Server to Snowflake Microservice
A microservice is supported in Standard and Enterprise editions to ensure high performance and reliable data movement. Contact support@windocks.com if this is needed.