BigID API/Scan Data In Motion Tutorial
In this tutorial, you'll learn:
- The differences between data in motion scans and regular data source scans
- How to enable data in motion data sources
- How to test a connection to a data in motion data source
- How to add a data source using the BigID API
Many organizations are receiving and processing data in real time. Where there's data, there's bound to be personal information. In this tutorial, we'll add an AWS Kinesis data source that BigID will scan in real time.
Unlike traditional BigID scans, which typically run on a weekly, monthly, or quarterly schedule, data in motion scans run continuously. As a result, for as long as you're monitoring a data in motion data source, its scan appears under scan details at 0% with a status of "in progress".
First, we should check if data in motion is enabled in your environment.
Discovering Data Sources
You can see which data source connectors are installed in your environment through the BigID UI, but since we're focused on the API (and because every action in the UI can also be performed through the API), we're going to use the API to retrieve them.
Send the connector listing request to get a list of the data source connectors installed on our test BigID system.
You'll see that our test system has around 70 different data source connectors installed. Use Ctrl+F (Cmd+F on macOS) to search the response for the Kinesis connector.
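If you'd rather search the listing in code than with Ctrl+F, a minimal Python sketch could look like the one below. The endpoint path, the `Authorization` header format, and the shape of the response are not shown in this tutorial, so `fetch_connectors` takes the path as an argument and `find_connectors` simply walks whatever JSON comes back. Both function names are hypothetical helpers, not part of BigID.

```python
import json
import urllib.request


def fetch_connectors(base_url, path, token):
    """GET the connector listing and return the parsed JSON.

    ASSUMPTION: `path` and the Authorization header format are placeholders;
    take the real values from your BigID API reference.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_connectors(payload, keyword):
    """Return every JSON object in `payload` that has a string value
    containing `keyword` (case-insensitive).

    The response layout is unknown here, so the whole structure is searched.
    """
    keyword = keyword.lower()
    matches = []

    def walk(node):
        if isinstance(node, dict):
            if any(isinstance(v, str) and keyword in v.lower() for v in node.values()):
                matches.append(node)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(payload)
    return matches


# Example usage (requires a reachable BigID system and a valid token):
# listing = fetch_connectors("https://bigid.example.com", "/api/v1/<connector-listing-path>", token)
# print(find_connectors(listing, "kinesis"))
```

An empty list from `find_connectors` would suggest the Kinesis connector is not installed, or that the listing uses a name that does not contain the keyword.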
Adding the Data in Motion Data Source
Now that we know the Kinesis data source connector is enabled, let's add our data source. Remember the three steps of adding a data source:
- Populate the data source parameters.
- Test the data source connection.
- Save the data source.
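The three steps above can be sketched as a small Python helper that refuses to save a data source whose connection test fails. This is only an outline of the ordering: `test_connection` and `save` are hypothetical callables that you would implement around the corresponding BigID API requests, which are not shown here.

```python
def add_data_source(params, test_connection, save):
    """Test the populated parameters, then save them only if the test passes.

    `params` is the populated data source dictionary (step 1).
    `test_connection(params)` should return True on success (step 2).
    `save(params)` persists the data source and returns its result (step 3).
    Both callables are hypothetical wrappers around the BigID API.
    """
    if not test_connection(params):
        name = params.get("name", "<unnamed>")
        raise RuntimeError(f"Connection test failed for data source {name!r}")
    return save(params)


# Example with stand-in callables instead of real API requests:
# add_data_source({"name": "Kinesis"}, lambda p: True, lambda p: "saved")
```

Keeping the API requests behind callables makes the ordering easy to exercise without a live BigID system.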
Populating Data Source Parameters
Kinesis data sources look like the following JSON object:
{
  "owners": [
    ""
  ],
  "differential": false,
  "rdb_is_sample_data": false,
  "aws_key_id": "",
  "aws_key_secret": "",
  "isIamRoleAuth": true,
  "region": "us-east-1",
  "stream_name": "STREAMNAME",
  "name": "Kinesis",
  "type": "kinesis",
  "security_tier": "1",
  "ocr_languages": "eng",
  "scanner_strategy": "SCAN_ALL",
  "enabled": "yes",
  "keyDeserializer": "String",
  "valueDeserializer": "String"
}
Notice that you can connect to the data source with either an IAM role or an access key and secret. In our case, we'll use an IAM role already applied to the sandbox that gives us access to a Kinesis stream called "StockTradeStream".
Our parameters should look like the following:
{
  "owners": [
    ""
  ],
  "differential": false,
  "rdb_is_sample_data": false,
  "aws_key_id": "",
  "aws_key_secret": "",
  "isIamRoleAuth": true,
  "region": "us-east-1",
  "stream_name": "StockTradeStream",
  "name": "Kinesis",
  "type": "kinesis",
  "security_tier": "1",
  "ocr_languages": "eng",
  "scanner_strategy": "SCAN_ALL",
  "enabled": "yes",
  "keyDeserializer": "String",
  "valueDeserializer": "String"
}
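If you populate these parameters from a script, a small builder can keep the two authentication options consistent. The sketch below reproduces the fields shown above; `kinesis_params` is a hypothetical helper, and setting `isIamRoleAuth` to `false` when an access key and secret are supplied is an inference from the field names, not something this tutorial confirms.

```python
import json


def kinesis_params(stream_name, region="us-east-1", name="Kinesis",
                   key_id="", key_secret=""):
    """Build the Kinesis data source parameters shown in this tutorial.

    With no key and secret, IAM role authentication is used.
    ASSUMPTION: supplying a key and secret means isIamRoleAuth should be False.
    """
    if bool(key_id) != bool(key_secret):
        raise ValueError("Provide both key_id and key_secret, or neither")
    return {
        "owners": [""],
        "differential": False,
        "rdb_is_sample_data": False,
        "aws_key_id": key_id,
        "aws_key_secret": key_secret,
        "isIamRoleAuth": not key_id,
        "region": region,
        "stream_name": stream_name,
        "name": name,
        "type": "kinesis",
        "security_tier": "1",
        "ocr_languages": "eng",
        "scanner_strategy": "SCAN_ALL",
        "enabled": "yes",
        "keyDeserializer": "String",
        "valueDeserializer": "String",
    }


# Serialize the IAM role variant used in this tutorial.
body = json.dumps(kinesis_params("StockTradeStream"), indent=2)
```

Calling `kinesis_params("StockTradeStream")` yields the same values as the object above, and passing only one of `key_id` or `key_secret` raises a `ValueError` rather than producing a half-configured data source.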