BigID API/Scan Data In Motion Tutorial

In this article, you'll learn:

The differences between data in motion scans and regular data source scans
How to enable data in motion data sources
How to test a connection to a data in motion data source
How to add a data source using the BigID API

Many organizations are receiving and processing data in real time. Where there's data, there's bound to be personal information. In this tutorial, we'll add an AWS Kinesis data source that BigID will scan in real time.

Unlike traditional BigID scans that run weekly, monthly or quarterly, data in motion scans run continuously. This means that under scan details you'll see your data in motion scan at 0% with a status of "in progress" while you're monitoring a data in motion data source.

First, we should check if data in motion is enabled in your environment.

Discovering Data Sources

You can see which data source connectors are installed in your environment through the BigID UI, but since we're focused on the API (and because every action in the UI can also be performed through the API), we are going to use the API to retrieve them.

Press Send on the request below to get a listing of the data source connectors installed on our test BigID system.

You'll see our test system has around 70 different data source connectors installed. Use CTRL+F (CMD+F on macOS) to search for the Kinesis connector.
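The same search can be done in code rather than with CTRL+F. The sketch below assumes the listing has already been parsed from JSON into Python objects. The exact shape of the response is not shown in this tutorial, so the helper `find_matches` (a hypothetical name) simply walks whatever structure it is given and collects every string value containing the search term. The sample listing is made up for illustration.

```python
def find_matches(node, term):
    """Recursively collect every string value in `node` that contains `term` (case-insensitive)."""
    term = term.lower()
    matches = []
    if isinstance(node, str):
        if term in node.lower():
            matches.append(node)
    elif isinstance(node, dict):
        # Only values are searched, not keys.
        for value in node.values():
            matches.extend(find_matches(value, term))
    elif isinstance(node, list):
        for item in node:
            matches.extend(find_matches(item, term))
    return matches


# Made-up sample; the field names in the real response may differ.
sample_listing = [
    {"type": "mysql", "displayName": "MySQL"},
    {"type": "kinesis", "displayName": "Kinesis"},
    {"type": "s3", "displayName": "S3"},
]

kinesis_hits = find_matches(sample_listing, "kinesis")
print("Kinesis connector found" if kinesis_hits else "Kinesis connector missing")
```

An empty result corresponds to the "connector not found" case described next.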

Do you see the connector? If not, the DIM_ENABLED environment variable isn't set inside your sandbox system. Click the "Enable Data in Motion" button to set the flag. Instructions for setting the DIM_ENABLED environment variable through the UI are located at https://www.docs.bigid.com/bigid/docs/kinesis

Adding the Data in Motion data source

Now that we know the Kinesis data source connector is enabled, let's add our data source. Remember the three steps of adding a data source:

Populate the data source parameters.
Test the data source connection.
Save the data source.
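The ordering of these three steps can be sketched in a few lines of Python. This is only an illustration of the control flow: `add_data_source`, `fake_test`, and `fake_save` are hypothetical names, and the two stand-in functions take the place of the real API requests, which are not modeled here.

```python
def add_data_source(params, test_connection, save):
    """Test the connection first and save only if the test succeeds."""
    if not test_connection(params):
        raise RuntimeError("connection test failed; data source not saved")
    return save(params)


# Stand-ins for the real test and save requests (assumptions for illustration).
saved = []

def fake_test(params):
    # Pretend the connection works whenever a stream name is present.
    return bool(params.get("stream_name"))

def fake_save(params):
    saved.append(params)
    return params["name"]


# Step 1: populate the parameters (trimmed to two fields for brevity).
params = {"name": "Kinesis", "stream_name": "StockTradeStream"}

# Steps 2 and 3: test, then save.
result = add_data_source(params, fake_test, fake_save)
print(result)
```

Passing the test and save steps in as functions keeps the ordering logic separate from any particular HTTP client.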

Populating Data Source Parameters

Kinesis data sources look like the following JSON object:

{
  "owners": [
    ""
  ],
  "differential": false,
  "rdb_is_sample_data": false,
  "aws_key_id": "",
  "aws_key_secret": "",
  "isIamRoleAuth": true,
  "region": "us-east-1",
  "stream_name": "STREAMNAME",
  "name": "Kinesis",
  "type": "kinesis",
  "security_tier": "1",
  "ocr_languages": "eng",
  "scanner_strategy": "SCAN_ALL",
  "enabled": "yes",
  "keyDeserializer": "String",
  "valueDeserializer": "String"
}

Notice how you have the option to either use IAM Roles or an access key and secret to connect to the data source. In our case we'll use an IAM Role already applied to the sandbox that gives us access to a Kinesis stream called "StockTradeStream".

Our parameters should look like the following:

{
  "owners": [
    ""
  ],
  "differential": false,
  "rdb_is_sample_data": false,
  "aws_key_id": "",
  "aws_key_secret": "",
  "isIamRoleAuth": true,
  "region": "us-east-1",
  "stream_name": "StockTradeStream",
  "name": "Kinesis",
  "type": "kinesis",
  "security_tier": "1",
  "ocr_languages": "eng",
  "scanner_strategy": "SCAN_ALL",
  "enabled": "yes",
  "keyDeserializer": "String",
  "valueDeserializer": "String"
}
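The filled-in parameters above can also be built programmatically. The following is a minimal sketch, not an official client: `kinesis_params` is a hypothetical helper, and setting `isIamRoleAuth` to false when an access key and secret are supplied is an assumption based on the two authentication options described earlier.

```python
import json


def kinesis_params(stream_name, region="us-east-1", aws_key_id="", aws_key_secret=""):
    """Build the Kinesis data source parameters shown in this tutorial."""
    use_iam_role = not (aws_key_id or aws_key_secret)
    if not use_iam_role and not (aws_key_id and aws_key_secret):
        raise ValueError("provide both aws_key_id and aws_key_secret, or neither")
    return {
        "owners": [""],
        "differential": False,
        "rdb_is_sample_data": False,
        "aws_key_id": aws_key_id,
        "aws_key_secret": aws_key_secret,
        # Assumption: false selects key/secret authentication.
        "isIamRoleAuth": use_iam_role,
        "region": region,
        "stream_name": stream_name,
        "name": "Kinesis",
        "type": "kinesis",
        "security_tier": "1",
        "ocr_languages": "eng",
        "scanner_strategy": "SCAN_ALL",
        "enabled": "yes",
        "keyDeserializer": "String",
        "valueDeserializer": "String",
    }


params = kinesis_params("StockTradeStream")
payload = json.dumps(params, indent=2)  # JSON text ready to use as a request body
print(payload)
```

Called with only a stream name, the helper reproduces the IAM role configuration used in this tutorial.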