BigID API/Scan Data In Motion Tutorial

In this article, you'll learn:

The differences between data in motion scans and regular data source scans
How to enable data in motion data sources
How to test a connection to a data in motion data source
How to add a data source using the BigID API

Many organizations are receiving and processing data in real time. Where there's data, there's bound to be personal information. In this tutorial, we'll add an AWS Kinesis data source that BigID will scan in real time.

Unlike traditional BigID scans, which run on a weekly, monthly, or quarterly schedule, data in motion scans run continuously. As a result, while BigID is monitoring a data in motion data source, the scan details show the data in motion scan at 0% with a status of "in progress".

First, we should check if data in motion is enabled in your environment.

Discovering Data Source Options

You can see which data source connectors are installed in your environment through the BigID UI, but since we're focused on the API (and because every action in the UI can also be performed through the API), we'll use the API to retrieve them.

Press Send on the request below to get a listing of the data source connectors installed on our test BigID system.

You'll see that our test system has around 70 different data source connectors installed. Use CTRL+F (CMD+F on macOS) to search for the Kinesis connector.
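Outside the interactive page, the same lookup can be scripted. The following is a minimal Python sketch, not BigID's official client: the source does not name the connector-listing endpoint, so the URL is left to the caller, and the `Authorization` header and the helper names `fetch_json` and `matching_entries` are assumptions made for illustration. The search mimics CTRL+F by matching against each entry's JSON text, so it does not depend on the exact field names in the response.

```python
import json
import urllib.request


def fetch_json(url, token):
    """GET a URL and decode the JSON response.

    Assumption: the API accepts the token in an Authorization header.
    """
    request = urllib.request.Request(url, headers={"Authorization": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def matching_entries(entries, keyword):
    """Return the entries whose JSON text contains keyword, ignoring case."""
    needle = keyword.lower()
    return [entry for entry in entries if needle in json.dumps(entry).lower()]
```

Given the list of connectors returned by `fetch_json`, `matching_entries(connectors, "kinesis")` returns an empty list when the Kinesis connector is not installed.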

Do you see the connector? If not, the DIM_ENABLED environment variable isn't set in your sandbox system. Use the "Enable Data in Motion" sandbox action button below to set the flag in the sandbox environment. Instructions for setting the DIM_ENABLED environment variable through the UI in your own environment are located at https://www.docs.bigid.com/bigid/docs/kinesis

Sandbox Action - Enable Data in Motion

Click the button to the right to run the workflow "Enable Data in Motion" on the BigID developer sandbox system. This will not affect your personal or organization's BigID system.

Adding the Data in Motion data source

Now that we know the Kinesis data source connector is enabled, let's add our data source. Remember the three steps of adding a data source:

Populate the data source parameters.
Test the data source connection.
Save the data source.

Populating Data Source Parameters

Kinesis data sources look like the following JSON object:

{
  "owners": [
    ""
  ],
  "differential": false,
  "rdb_is_sample_data": false,
  "aws_key_id": "",
  "aws_key_secret": "",
  "isIamRoleAuth": true,
  "region": "us-east-1",
  "stream_name": "STREAMNAME",
  "name": "Kinesis",
  "type": "kinesis",
  "security_tier": "1",
  "ocr_languages": "eng",
  "scanner_strategy": "SCAN_ALL",
  "enabled": "yes",
  "keyDeserializer": "String",
  "valueDeserializer": "String"
}

Notice that you can connect to the data source with either an IAM role or an access key and secret. In our case, we'll use an IAM role already applied to the sandbox, which gives us access to a Kinesis stream called "StockTradeStream".

Our parameters should look like the following:

{
  "owners": [
    ""
  ],
  "differential": false,
  "rdb_is_sample_data": false,
  "aws_key_id": "",
  "aws_key_secret": "",
  "isIamRoleAuth": true,
  "region": "us-east-1",
  "stream_name": "StockTradeStream",
  "name": "Kinesis",
  "type": "kinesis",
  "security_tier": "1",
  "ocr_languages": "eng",
  "scanner_strategy": "SCAN_ALL",
  "enabled": "yes",
  "keyDeserializer": "String",
  "valueDeserializer": "String"
}
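If you are scripting this step, the parameters can be assembled in code. The sketch below is a hypothetical Python helper (`build_kinesis_params` is not part of any BigID library) that reproduces the fields shown above. Deriving `isIamRoleAuth` from whether an access key pair was supplied is a design choice made for this example, not documented BigID behavior.

```python
def build_kinesis_params(name, stream_name, region="us-east-1",
                         aws_key_id="", aws_key_secret=""):
    """Return Kinesis data source parameters shaped like the object above."""
    # Assumption: IAM role auth is used whenever no access key pair is given.
    use_iam_role = not (aws_key_id and aws_key_secret)
    return {
        "owners": [""],
        "differential": False,
        "rdb_is_sample_data": False,
        "aws_key_id": aws_key_id,
        "aws_key_secret": aws_key_secret,
        "isIamRoleAuth": use_iam_role,
        "region": region,
        "stream_name": stream_name,
        "name": name,
        "type": "kinesis",
        "security_tier": "1",
        "ocr_languages": "eng",
        "scanner_strategy": "SCAN_ALL",
        "enabled": "yes",
        "keyDeserializer": "String",
        "valueDeserializer": "String",
    }


# The sandbox example: IAM role auth against the StockTradeStream stream.
params = build_kinesis_params("Kinesis", "StockTradeStream")
```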

Test the data source connection

Let's make sure our BigID installation has the proper permissions and network capabilities to access our data source by performing a test connection. We can do this by sending our parameters to the /ds-connection-test endpoint.
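As a rough illustration, the test connection call could be prepared in Python as shown below. Several details are assumptions, not taken from this tutorial: the base URL, the `Authorization` header, and sending the parameters directly as the JSON body (your BigID version may expect them wrapped in an outer object). The helper names are hypothetical. Building the request separately from sending it lets you inspect it before anything goes over the network.

```python
import json
import urllib.request


def build_test_connection_request(base_url, token, params):
    """Build (without sending) a POST to the /ds-connection-test endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ds-connection-test",
        # Assumption: the parameters are sent as-is as the JSON body.
        data=json.dumps(params).encode("utf-8"),
        headers={
            "Authorization": token,  # assumed auth header
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `send(build_test_connection_request("https://bigid.example.com/api/v1", token, params))` would run the test against a hypothetical host.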

Save the data source connection

Now that we know what parameters to pass, let's create our data source. We just need to send a POST to the /ds_connections endpoint with our parameters.

Every data source in BigID also needs a unique name. For your data source, use RANDOMHERE as the name so you don't conflict with other users.
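One way to get a name that is unlikely to collide with other users' data sources is to append a random suffix. This is a small Python sketch using the standard `uuid` module. The helper `with_unique_name` and the `kinesis-` prefix are illustrative choices, not BigID conventions.

```python
import uuid


def with_unique_name(params, prefix="kinesis"):
    """Return a copy of params whose name carries a random 8-character suffix."""
    named = dict(params)  # copy so the caller's dict is left untouched
    named["name"] = f"{prefix}-{uuid.uuid4().hex[:8]}"
    return named
```

The returned dictionary is what you would then POST to the /ds_connections endpoint.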

If we retrieve our data sources, we should now see a new data source with the information we supplied above. Use CTRL+F (or CMD+F) in your browser to find the data source you created.

Scanning a Data in Motion Data Source

Creating a Scan Profile

Scans within BigID rely on scan profiles to know what data sources to scan, when to scan them, and what classifiers to apply. To scan our new data source we need to add a scan profile that targets it. The object for a scan profile looks like the following:

{
  "scanType": "dataInMotion",
  "allEnabledIdSor": true,
  "allEnabledDs": false,
  "skipIdScan": true,
  "isClassificationsAsPiiFindings": false,
  "labelFramework": {
    "id": "mip",
    "name": "Labels"
  },
  "dataSourceList": [
    "DSNAME"
  ],
  "name": "PROFILENAME",
  "owners": [],
  "isCustomScanProfile": true
}

Let's create one below. Be sure to use RANDOMHERE as your profile and data source name.
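For scripted use, the scan profile object above can be generated from a profile name and a list of data source names. The Python helper below is hypothetical and simply fills in the DSNAME and PROFILENAME placeholders. The remaining values are copied from the example object and may need adjusting for your environment.

```python
def build_dim_scan_profile(profile_name, data_source_names):
    """Return a data in motion scan profile targeting the given data sources."""
    return {
        "scanType": "dataInMotion",
        "allEnabledIdSor": True,
        "allEnabledDs": False,  # scan only the data sources listed below
        "skipIdScan": True,
        "isClassificationsAsPiiFindings": False,
        "labelFramework": {"id": "mip", "name": "Labels"},
        "dataSourceList": list(data_source_names),
        "name": profile_name,
        "owners": [],
        "isCustomScanProfile": True,
    }
```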

Starting a Scan

Now that we have our profile, we need to start our scan. Remember that this scan will not complete: data in motion scans run continuously to scan data as it comes in. The request to start a scan has no request body. Once it has been sent, the new scan appears when we request a list of our scans.

Code Sample