BigID API/Scan Data In Motion Tutorial

From BigID Developer Portal

In this article, you'll learn:

  • The differences between data in motion scans and regular data source scans
  • How to enable data in motion data sources
  • How to test a connection to a data in motion data source
  • How to add a data source using the BigID API

Many organizations are receiving and processing data in real time. Where there's data, there's bound to be personal information. In this tutorial, we'll add an AWS Kinesis data source that BigID will scan in real time.

Unlike traditional BigID scans that run weekly, monthly or quarterly, data in motion scans run continuously. This means that under scan details you'll see your data in motion scan at 0% with a status of "in progress" while you're monitoring a data in motion data source.

First, we should check if data in motion is enabled in your environment.

Discovering Data Source Options

You can see which data source connectors are installed in your environment through the BigID UI, but since we're focused on the API (and because every action in the UI can also be performed through the API), we'll use the API to retrieve them.

Press Send on the request below to get a listing of the data source connectors installed on our test BigID system.

You'll see that our test system has around 70 different data source connectors installed. Use CTRL+F (CMD+F on macOS) to search for the Kinesis connector.
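If you are calling the API from a script instead of the browser, the CTRL+F step can be approximated in code. The sketch below assumes only that the response is JSON. The sample body and its field names are made up for illustration, and the real response shape may differ.

```python
import json


def find_matches(node, needle, path="$"):
    """Recursively collect (path, value) pairs for every string containing needle."""
    matches = []
    if isinstance(node, dict):
        for key, value in node.items():
            matches.extend(find_matches(value, needle, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            matches.extend(find_matches(value, needle, f"{path}[{index}]"))
    elif isinstance(node, str) and needle.lower() in node.lower():
        matches.append((path, node))
    return matches


# Hypothetical response body, for illustration only.
sample_body = '{"data": [{"type": "s3"}, {"type": "kinesis"}]}'
listing = json.loads(sample_body)

kinesis_hits = find_matches(listing, "kinesis")
print(kinesis_hits or "Kinesis connector not found; check DIM_ENABLED")
```

Because the search walks the whole structure, it does not depend on knowing which field holds the connector type.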

Do you see the connector? If not, the DIM_ENABLED environment variable isn't set in your sandbox system. Click the "Enable Data in Motion" button to set the flag. Instructions for setting the DIM_ENABLED environment variable through the UI are located at https://www.docs.bigid.com/bigid/docs/kinesis

Adding the Data in Motion Data Source

Now that we know the Kinesis data source connector is enabled, let's add our data source. Remember the three steps of adding a data source:

  • Populate the data source parameters.
  • Test the data source connection.
  • Save the data source.

Populating Data Source Parameters

Kinesis data sources look like the following JSON object:

{
  "owners": [
    ""
  ],
  "differential": false,
  "rdb_is_sample_data": false,
  "aws_key_id": "",
  "aws_key_secret": "",
  "isIamRoleAuth": true,
  "region": "us-east-1",
  "stream_name": "STREAMNAME",
  "name": "Kinesis",
  "type": "kinesis",
  "security_tier": "1",
  "ocr_languages": "eng",
  "scanner_strategy": "SCAN_ALL",
  "enabled": "yes",
  "keyDeserializer": "String",
  "valueDeserializer": "String"
}

Notice that you can connect to the data source with either an IAM role or an access key and secret. In our case we'll use an IAM role already applied to the sandbox that gives us access to a Kinesis stream called "StockTradeStream".

Our parameters should look like the following:

{
  "owners": [
    ""
  ],
  "differential": false,
  "rdb_is_sample_data": false,
  "aws_key_id": "",
  "aws_key_secret": "",
  "isIamRoleAuth": true,
  "region": "us-east-1",
  "stream_name": "StockTradeStream",
  "name": "Kinesis",
  "type": "kinesis",
  "security_tier": "1",
  "ocr_languages": "eng",
  "scanner_strategy": "SCAN_ALL",
  "enabled": "yes",
  "keyDeserializer": "String",
  "valueDeserializer": "String"
}
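When scripting this step, it can help to build the parameters with a small function so that only the values that vary are passed in. This is a minimal sketch: the helper name `kinesis_params` is hypothetical, and setting `isIamRoleAuth` to false when a key and secret are supplied is an assumption based on the two options described above.

```python
import json


def kinesis_params(name, stream_name, region="us-east-1",
                   aws_key_id="", aws_key_secret=""):
    """Build Kinesis data source parameters mirroring the object shown above."""
    # Assumption: fall back to IAM role auth unless both key and secret are given.
    use_iam_role = not (aws_key_id and aws_key_secret)
    return {
        "owners": [""],
        "differential": False,
        "rdb_is_sample_data": False,
        "aws_key_id": aws_key_id,
        "aws_key_secret": aws_key_secret,
        "isIamRoleAuth": use_iam_role,
        "region": region,
        "stream_name": stream_name,
        "name": name,
        "type": "kinesis",
        "security_tier": "1",
        "ocr_languages": "eng",
        "scanner_strategy": "SCAN_ALL",
        "enabled": "yes",
        "keyDeserializer": "String",
        "valueDeserializer": "String",
    }


# IAM role variant, as used in this tutorial.
iam_params = kinesis_params("Kinesis", "StockTradeStream")
# Access key variant; the credentials here are placeholders.
key_params = kinesis_params("Kinesis", "StockTradeStream",
                            aws_key_id="EXAMPLE_KEY_ID",
                            aws_key_secret="EXAMPLE_SECRET")

print(json.dumps(iam_params, indent=2))
```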

Testing the Data Source Connection

Let's make sure our BigID installation has the proper permissions and network capabilities to access our data source by performing a test connection. We can do this by sending our parameters to the /ds-connection-test endpoint.
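Outside the interactive page, the test connection call could be prepared roughly as follows. This is a sketch under stated assumptions: the base URL and token are placeholders, the `Authorization` header format is assumed, and the parameters are sent as the bare JSON body. Check the API reference for the exact envelope the endpoint expects. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://bigid.example.com/api/v1"  # hypothetical; use your own host
TOKEN = "YOUR_SESSION_TOKEN"                   # hypothetical placeholder


def build_test_request(params, base_url=BASE_URL, token=TOKEN):
    """Build (but do not send) a POST to the /ds-connection-test endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/ds-connection-test",
        data=json.dumps(params).encode("utf-8"),
        method="POST",
        # Assumption: the token is passed as-is in the Authorization header.
        headers={"Authorization": token, "Content-Type": "application/json"},
    )


# Abbreviated parameters; in practice pass the full object shown earlier.
test_request = build_test_request({
    "type": "kinesis",
    "region": "us-east-1",
    "stream_name": "StockTradeStream",
    "isIamRoleAuth": True,
})
print(test_request.get_method(), test_request.full_url)
# Sending is left to the reader: urllib.request.urlopen(test_request)
```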

Saving the Data Source Connection

Now that we know which parameters to pass, let's create our data source. We just need to send a POST to the /ds_connections endpoint with our parameters.

Every data source in BigID also needs a unique name. For your data source, use RANDOMHERE as the name so you don't conflict with other users.
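A scripted version of the save step might look like the sketch below. It copies the parameters, swaps in the unique name, and prepares a POST to /ds_connections. The base URL, token, header format, and the bare JSON body are assumptions, and the helper name `build_save_request` is hypothetical. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://bigid.example.com/api/v1"  # hypothetical; use your own host
TOKEN = "YOUR_SESSION_TOKEN"                   # hypothetical placeholder


def build_save_request(params, unique_name, base_url=BASE_URL, token=TOKEN):
    """Build (but do not send) a POST to /ds_connections with a unique name."""
    named = dict(params, name=unique_name)  # copy so the caller's dict is untouched
    return urllib.request.Request(
        url=f"{base_url}/ds_connections",
        data=json.dumps(named).encode("utf-8"),
        method="POST",
        # Assumption: the token is passed as-is in the Authorization header.
        headers={"Authorization": token, "Content-Type": "application/json"},
    )


# Abbreviated parameters; in practice pass the full object shown earlier.
template = {"name": "Kinesis", "type": "kinesis", "stream_name": "StockTradeStream"}
save_request = build_save_request(template, "RANDOMHERE")
print(save_request.get_method(), save_request.full_url)
# Sending is left to the reader: urllib.request.urlopen(save_request)
```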

If we now retrieve our data sources, we should see a new data source with the information we supplied above. Use CTRL+F (or CMD+F) in your browser to find the data source you created.

Scanning a Data in Motion Data Source

Creating a Scan Profile

Scans within BigID rely on scan profiles to know what data sources to scan, when to scan them, and what classifiers to apply. To scan our new data source we need to add a scan profile that targets it. The object for a scan profile looks like the following:

{
  "scanType": "dataInMotion",
  "allEnabledIdSor": true,
  "allEnabledDs": false,
  "skipIdScan": true,
  "isClassificationsAsPiiFindings": false,
  "labelFramework": {
    "id": "mip",
    "name": "Labels"
  },
  "dataSourceList": [
    "DSNAME"
  ],
  "name": "PROFILENAME",
  "owners": [],
  "isCustomScanProfile": true
}


Let's create one below. Be sure to use RANDOMHERE as both your profile name and your data source name.
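To fill in the PROFILENAME and DSNAME placeholders from a script, a small builder like the following could be used. It is a minimal sketch that reproduces the scan profile object shown above; the helper name `scan_profile` is hypothetical, and the endpoint for submitting the profile is not covered here.

```python
import json


def scan_profile(profile_name, data_source_names):
    """Build a data in motion scan profile targeting the given data sources."""
    return {
        "scanType": "dataInMotion",
        "allEnabledIdSor": True,
        "allEnabledDs": False,
        "skipIdScan": True,
        "isClassificationsAsPiiFindings": False,
        "labelFramework": {"id": "mip", "name": "Labels"},
        "dataSourceList": list(data_source_names),
        "name": profile_name,
        "owners": [],
        "isCustomScanProfile": True,
    }


# RANDOMHERE stands in for the unique name chosen when saving the data source.
profile = scan_profile("RANDOMHERE", ["RANDOMHERE"])
print(json.dumps(profile, indent=2))
```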

Starting a Scan

Now that we have our profile, we need to start our scan. Remember that this scan will not complete. Data in motion scans run continuously to scan data as it comes in.