Windocks User Guide

Additional resources

Tutorial videos

Get Started

Subsetting, Synthetic data, and moving databases

Windocks is available for Windows, Linux, and Docker containers. It does not write to the source database, but it does require exclusive use of the source (i.e., no writes to the source while it is being used by Windocks).

Subsetting is based on the primary and foreign keys defined in the source database, and handles circular dependencies and composite keys. Schema changes are detected between Transforms, allowing database configurations to be updated for masking or synthetic data.

Synthetic data is based on the Windocks fast synthetic engine along with SDV open source libraries, applied to a database subset. Support is available for customer-provided Python libraries. Automated PII detection is not included in the initial release.

Windocks automates data type mapping between platforms and creates the target schema, with throughput of up to 20 million rows per minute. Incremental change capture is not included in the initial release. Repeated runs of a migration transform will overwrite a previously written target database.

Choosing a Windocks deployment

Windocks involves a source database, a target database, and the Windocks service. All can reside on a single machine, on two, or on three separate machines. Working with larger databases (100 GB or more) benefits from a three-machine architecture, to maximize the CPU and RAM available for data operations.

The minimum resources for Windocks are 4 vCPU cores and 8 GB of RAM. For large databases we recommend 8 or more vCPU cores and 16 GB or more of RAM. Resource configuration is covered in detail in the large database configuration topic.

Installation

Windows

Download the installer zip file synth.zip and extract it, for example to D:\synth

cd D:\synth

.\windockssyntheticinstaller.exe

Windocks runs as a Windows service.

Start using at http://localhost:5200 or http://<DnsOfServer>:5200

Change the port in windockssynthetic/config.json if needed.

Set your administrator account name and password at

http://localhost:5200/firsttime
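
To confirm the service is installed and running, you can query it by name from an administrator command prompt (the service name below matches the one referenced in the uninstall steps):

sc query WindocksSubsetSynthetic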

 

Linux

Docker is recommended, but Windocks also runs natively on Linux.

Download the installer zip file synth.zip.

apt-get install unzip

Unzip the installer to a directory, for example ~/synth:

unzip synth.zip -d ~/synth

cd ~/synth

./windockssyntheticinstaller

cd ../windockssynthetic

chmod 700 ./subsetsynthetic

# Run the application

./subsetsynthetic

# For background

nohup ./subsetsynthetic > outputlogfile &

Start using at http://localhost:5200 or http://<DnsOfServer>:5200

Change the port in windockssynthetic/config.json if needed.

Set your administrator account name and password at

http://localhost:5200/firsttime

 

Docker

The image is available on Docker Hub.

docker run -p 5200:5200 -d windocks/move_subset_synthetic_data

Start using at http://localhost:5200 or http://<DnsOfServer>:5200

Set your administrator account name and password at

http://localhost:5200/firsttime

This release does not support passing configuration through environment variables; you must docker exec into the container and edit config/config.json.
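
For example, assuming the container was started as shown above, you can open a shell inside it and inspect the configuration (the exact path inside the container may differ):

docker ps
docker exec -it <container_id> /bin/sh
# inside the container, view or edit the config (path relative to the application directory)
cat config/config.json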

 

Kubernetes

In your deployment YAML file, use the image windocks/move_subset_synthetic_data.
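
A minimal deployment manifest might look like the sketch below; the names, replica count, and NodePort service type are illustrative only and should be adapted to your cluster:

# illustrative names and service type; adapt to your cluster
apiVersion: apps/v1
kind: Deployment
metadata:
  name: windocks-synthetic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: windocks-synthetic
  template:
    metadata:
      labels:
        app: windocks-synthetic
    spec:
      containers:
      - name: windocks-synthetic
        image: windocks/move_subset_synthetic_data
        ports:
        - containerPort: 5200
---
apiVersion: v1
kind: Service
metadata:
  name: windocks-synthetic
spec:
  type: NodePort
  selector:
    app: windocks-synthetic
  ports:
  - port: 5200
    targetPort: 5200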

After you deploy and run the service, set your administrator account name and password at

http://localhost:5200/firsttime

 

Uninstall steps

Windows

Open a command prompt as administrator

sc delete WindocksSubsetSynthetic

Delete the windockssynthetic and windocksdonotdelete directories.

 

Linux

End the application

ps -ef | grep subsetsynthetic

kill -9 <pid>

rm -rf windockssynthetic

 

Configuration

Check your configuration in config/config.json; you can use the defaults to begin with.

For system resources, we recommend a minimum of 12 GB of RAM and 2 cores. We also recommend running the Windocks process on a machine separate from the source and target database instances, though you can run them all on a single laptop if you wish.

If you are moving large databases, refer to our system configuration guide online (under Resources) for multi-threading settings for parallel processing.

Restart the WindocksSubsetSynthetic service on Windows, or the subsetsynthetic process / container on Linux, after config changes.
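
On Windows, for example, the service can be restarted from an administrator command prompt:

sc stop WindocksSubsetSynthetic
sc start WindocksSubsetSynthetic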

Email support@windocks.com if you have any issues with configuration.

 

Usage

Windocks can move, subset, or generate synthetic data from these sources (SQL Server, Snowflake, PostgreSQL, MySQL, Azure SQL, Azure Managed Instance, Amazon RDS / Aurora) to these targets (SQL Server, Snowflake, PostgreSQL, MySQL, Azure SQL, Azure Managed Instance, Amazon RDS / Aurora). You may use a different database type for the target; for example, customers use Windocks to move data from SQL Server to Snowflake and back, from Azure to SQL Server instances, and from SQL Server to PostgreSQL.

  1. Start by creating a Connection, then test and save it.
  2. You may create one or more data sources for a connection. Define a data source for your connection and save it. To generate synthetic data, configure the details of what type of sensitive data you need.
  3. Decide whether to target the same or a different connection for the copy/subset/synthetic data. (Note that Snowflake, for example, is a single-database connection; if that is your source, you must define a different target connection. A SQL Server instance connection, for example, can be used for both source and target.)
  4. If targeting a different connection, create another connection, test it, and save it.
  5. Find your source connection and its data sources, then create a transform for your data source. Specify whether you want a subset (a percentage of your source), a full copy, or synthetic data generated from the source database.
  6. Run the transform. A 1 TB database takes approximately 3-4 hours to copy completely. The same 1 TB database takes 2 to 3 minutes to produce a small subset. A 30 TB database (the Snowflake sample) takes 1 hour for a very small representative subset (0.0000001%) to be moved to SQL Server. Subsets and synthetic data have the same data distribution as the source and provide an easily generated representative database for testing.
  7. The report of the transform run is available in the transform details view by selecting More in the list of transforms. You may also define WHERE clauses and primary key values to narrow your subset. Data distributions are available from the transform details view. config/config.json specifies the maximum row count for which distributions are generated; you can modify it and restart the Windocks process / service.
  8. If you are using Docker / Kubernetes, the details of your connections and transform configurations are written to a database inside the container. If you delete the container, that data will be lost. The Enterprise edition supports persistent storage on a volume you designate; contact support@windocks.com.

Security, permissions

Windocks provides its own authentication for the Windocks service, and you may choose to require HTTPS only (see config/config.json).

Source databases require read-only permissions; target databases require create / write permissions.

Database authentication can be done at the operating system level, or you may save passwords on the server running Windocks (these saved passwords are encrypted using Microsoft middleware).

You may generate synthetic data for all of your sensitive columns so that your target databases contain no sensitive data.

 

Transforms across different database types

When transforming a database from one type of system to another (such as SQL Server to Snowflake), Windocks maps types and sizes automatically. However, there are a few things to check before running such a transform:

  1. Maximum value sizes per column – character and binary columns have different maximum sizes on different database platforms.
  2. Maximum column name lengths
  3. Maximum row sizes

If your target server cannot support the sizes in your source, you must truncate the values in your source.
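
For example, on a SQL Server source you could list column names that exceed a given target limit before running the transform (59 is the PostgreSQL limit cited in the release notes; adjust for your target):

-- SQL Server source: list column names longer than the target's limit (59 shown for a PostgreSQL target)
SELECT TABLE_NAME, COLUMN_NAME, LEN(COLUMN_NAME) AS name_length
FROM INFORMATION_SCHEMA.COLUMNS
WHERE LEN(COLUMN_NAME) > 59;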

Additional topics
 

Telemetry

Windocks collects the following data to enable customer technical support and for product and performance improvements. No data is shared with third parties.

  • User identity
  • User email 
  • Browser type
  • Database schema (tables and relationships, without data)
  • Errors
  • Database type
  • Features used
  • Application version
  • Edition
  • Language
  • Browser used to access the application
  • Operating system

Windocks uses this telemetry data to: 

  • Provide product and technical support
  • Fix errors in the software
  • Improve the product 

Windocks does not share this data with anyone outside of Windocks.

 

Configuration for large databases

Windocks is a configurable, asynchronous, multi-threaded service. The Configuration Guide (link below) provides guidance on the number of tables to process in parallel, the number of batches to run in parallel on these tables, and the batch sizing.

Changes to the Windocks configuration are made in \Windockssynthetic\config\config.json. Restart the Windocks service after changing the config.
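
As an illustration only, the parallelism settings described above might look like the sketch below. The key names here are hypothetical; check the keys actually present in your installed config.json rather than copying these values.

(hypothetical key names, for illustration only)
{
  "TablesInParallel": 4,
  "ParallelBatchesPerTable": 2,
  "BatchSize": 100000
}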

 

 

Encryption

Managing encryption policies for lower-level data environments is important. Windocks provides optional automated TDE encryption for SQL Server Transforms. Be sure to address encryption needs for other database types.

 

 

Licensing

Windocks Free and Standard editions are licensed according to the total number of source databases.

 

 

Best Practices

Start small for subsets and synthetic data: initially target subsets of 100 MB or less, and expand the size as needed.

When working with Subset table filters and WHERE clauses, extraneous data can be minimized by selecting a minimum percentage (0.0000001).

For migrating large databases to Snowflake, use a Medium or larger warehouse.

When running the Windocks service on the source database instance, it is important to limit the RAM consumed by the source instance. A good rule of thumb is to divide RAM evenly between the source instance and the Windocks service.
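
For a SQL Server source, for example, the instance's memory cap can be lowered with sp_configure; the 8192 MB value below assumes a 16 GB machine split evenly between SQL Server and the Windocks service:

-- Example for a SQL Server source: cap the instance at 8 GB on a 16 GB machine
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 8192;
RECONFIGURE;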

Following a Transform, review the post-transform report, accessed by the "More" button on the Transform tab. The report includes any of the following errors:

  • Tables with rows missing in the target
  • Tables where primary keys were not created
  • Tables where foreign keys were not created
 

Release Notes

Subsetting

Circular dependencies, composite keys, and self-referencing tables are handled automatically.

Subsetting is based on Primary and Foreign keys as set in the database.  

Subsets of encrypted databases are delivered in unencrypted form, except for SQL Server where automated TDE encryption is available in the Transform. 

History and Temporal tables are ignored. 

External references and linked servers are ignored.

Indexes, stored procedures, views, and other objects are not included in the subset. 

SQL Server Filestream data type is not supported. 

Synthetic data 

Automated PII data detection is not included in the initial release.

The fast Windocks synthetic data engine is configured by default and is capable of populating databases that are GBs in size. SDV and other Python engines are recommended for up to 3-5 tables of 50,000 rows or fewer.

Database Mover 

The Database Mover is designed to move schemas and data within the same database platform (such as PostgreSQL to PostgreSQL), or between any of the Windocks supported platforms (SQL Server, PostgreSQL, MySQL, Snowflake, Aurora, RDS, Azure Managed Instance, and Azure SQL).

Primary keys in a source database may not map to the desired database target. If a source primary key does not have an equivalent type in the target database, the target column is set to nvarchar. As a result, the system may not be able to create a corresponding primary key. Any errors are reported in the post-transform report accessed via the "More" button.

Column name and row length limits: before performing a transform, ensure that column name lengths in the source database are less than or equal to the maximum column name length in the target database. For example, the PostgreSQL maximum column name length is 59, which is less than that of SQL Server or Snowflake.

The SQL Server maximum column length is less than Snowflake's.

The MySQL maximum row length is 64 KB. If your target is MySQL and your source is not, make sure that no row in any source table exceeds the MySQL maximum row length.

Data types are mapped identically between source and target databases of the same type. If they are different, the target data type is set to the closest possible match.

Column lengths for varchar and binary types are handled based on the supported lengths available in the target. Snowflake, for example, has very large maximum varchar lengths. When moving data from Snowflake to PostgreSQL, you must ensure that the actual values will fit in the PostgreSQL target.
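
For example, before moving a wide Snowflake VARCHAR column to PostgreSQL, you could check the longest actual value; the table and column names here are hypothetical:

-- Run on the Snowflake source; table and column names are hypothetical
SELECT MAX(LENGTH(comments)) AS max_value_length
FROM orders;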

Database moves have the same limitations as listed above for Subsetting (schema and data only, no Filestream support, no TDE support initially).

MySql

In the target MySQL instance, please run 

set global local_infile=true;
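
You can confirm the setting took effect with:

SHOW GLOBAL VARIABLES LIKE 'local_infile';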

 

Troubleshooting

Error reporting: errors are summarized in the post-transform report, via the "More" button. The following errors are reported; the absence of these errors indicates a successful Transform.

  • Tables with rows missing from the target
  • Tables with Primary keys that were not successfully created
  • Tables with foreign keys that were not successfully created

Logs: available at \Windockssynthetic\bin\windocks

Insufficient memory: configurations that exceed system resources are reported in the post-Transform report. Review the configuration and reduce the number of tables processed in parallel, the number of parallel batches per table, and the batch size.

More on Docker containers

Encrypted connections and records of transforms are stored in the container.   Deleting the container will result in these details being lost. 

 

SQL Server to Snowflake Microservice

A microservice is supported in Standard and Enterprise editions to ensure high performance and reliable data movement.   Contact support@windocks.com if this is needed.