BigID API/Duplicate Data Tutorial

From BigID Developer Portal


The BigID API lets you perform programmatically all the actions you're used to performing in the BigID user interface. This is ideal for scenarios like the one in this exercise, where you need to perform the same operation on a schedule. To communicate with BigID over its API, we first need to authenticate.


Authenticating with BigID


There are two ways to authenticate with BigID:

  • Username and Password: This is the easiest way to authenticate with BigID. You send a username and password to the /sessions endpoint, and BigID returns a session token that is valid for 24 hours on any other API endpoint the user has permission to access.
  • User Token: A user token, generated by a System Administrator under Administration -> Access Management, can be exchanged for a session token at the /refresh endpoint. This means you don't have to store a username and password within your application, but a user token is valid for at most 999 days.
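For a job that runs on a schedule, the user token route avoids keeping a password in your code. Below is a minimal sketch in Python, using only the standard library, that builds (but does not send) the token exchange and checks whether a cached session token has passed its 24-hour lifetime. The host `bigid.example.com`, the `/api/v1` prefix, the use of GET, and passing the user token in the `Authorization` header are assumptions for illustration; the `/refresh` path follows this tutorial's shorthand, so confirm the exact path and method in the BigID API reference.

```python
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical host and API prefix; replace with your own BigID system.
BASE_URL = "https://bigid.example.com/api/v1"

# Session tokens are valid for 24 hours, as described in this tutorial.
SESSION_LIFETIME = timedelta(hours=24)


def build_refresh_request(user_token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) a request that exchanges a user token for a session token.

    ASSUMPTION: the /refresh endpoint takes a GET with the user token in the
    Authorization header. Check the BigID API reference before relying on this.
    """
    return urllib.request.Request(
        f"{base_url}/refresh",
        headers={"Authorization": user_token},
        method="GET",
    )


def session_expired(issued_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once a session token obtained at `issued_at` is 24 hours old or older."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - issued_at >= SESSION_LIFETIME
```

A scheduled job can store the session token together with the time it was issued and repeat the exchange only when `session_expired` returns True.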
In this tutorial, we'll authenticate with BigID using a username and password, then retrieve a list of data sources.

To authenticate, we send a POST request to the sessions endpoint of our BigID Sandbox system, with our username and password in the request body. Sending this request returns a session token.
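The same login can also be scripted. This minimal Python sketch builds the POST request to the sessions endpoint with the credentials as a JSON body. The host and `/api/v1` prefix are hypothetical, and the body field names `username` and `password` are an assumption; the request is only constructed here, and the commented lines show how you would send it with `urllib.request.urlopen`.

```python
import json
import urllib.request

# Hypothetical host and API prefix; replace with your own BigID system.
BASE_URL = "https://bigid.example.com/api/v1"


def build_session_request(username: str, password: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) a POST to the sessions endpoint.

    ASSUMPTION: the JSON body fields are named "username" and "password".
    """
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request needs network access and valid credentials:
# with urllib.request.urlopen(build_session_request("bigid", "<password>")) as resp:
#     print(resp.read().decode("utf-8"))
```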



Revision as of 21:22, 4 November 2021

In this article, you'll learn:

  • What the BigID data catalog can be used for
  • Retrieving object data from the catalog via API
  • Retrieving column data from the catalog via API


Scenario: You're seeing increasingly high storage costs in your cloud data sources. Judging by their names, these data sources don't seem to store anything particularly large, but you suspect they hold similar, duplicated data that is driving up your storage costs. Use the BigID catalog to get a list of duplicated data.

The BigID Catalog

<img src="https://resources.cdn.mybigid.com/images-animated/catalog-01.gif" />


The response contains a lot of information about the logged-in user. For our purposes, we only care about line 4, the auth_token. This is the token we'll use to authenticate with the other BigID APIs. A sample response is shown below. Copy the auth token from the response to the request you sent above; we'll need it in a moment.

{
    "success": true,
    "message": "Enjoy your token!",
    "auth_token": "eyJhbGciOiJ<don't copy me! I'm just an example!>...",
    "username": "bigid",
    "firstName": "BigID Admin",
    "permissions": [
        "admin",
        "permission.tasks.edit",
        "permission.tasks.read_task_list",
    ...
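Instead of copying the token by hand, a script can read it straight from the response body. A minimal sketch, assuming the response is JSON shaped like the sample above; the helper name `extract_auth_token` is hypothetical.

```python
import json


def extract_auth_token(response_text: str) -> str:
    """Return the auth_token field from a sessions response body."""
    payload = json.loads(response_text)
    if "auth_token" not in payload:
        # ASSUMPTION: a missing token means the login failed; surface any message the server sent.
        raise ValueError(f"No auth_token in response: {payload.get('message')}")
    return payload["auth_token"]


# Trimmed-down version of the sample response above (placeholder token).
sample = '{"success": true, "message": "Enjoy your token!", "auth_token": "eyJhbGciOiJ...", "username": "bigid"}'
print(extract_auth_token(sample))  # prints eyJhbGciOiJ...
```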

Calling an API

Now that you have a session token, you can call the BigID APIs directly. Documentation for these APIs is available at https://www.docs.bigid.com/bigid/reference/api-getting-started. Since we're only performing a simple task, we don't need the docs here; all we need to know is that GET /ds-connections is the endpoint that retrieves a list of data source connections.

Add a new header named "Authorization" to the GET /ds-connections request and paste in the session token you got from the previous request to authenticate yourself.
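The same call in code: a minimal sketch that builds the GET /ds-connections request with the session token as the raw value of the `Authorization` header, as described above. The host and `/api/v1` prefix are hypothetical placeholders.

```python
import urllib.request

# Hypothetical host and API prefix; replace with your own BigID system.
BASE_URL = "https://bigid.example.com/api/v1"


def build_ds_connections_request(session_token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) GET /ds-connections, authenticated with a session token."""
    return urllib.request.Request(
        f"{base_url}/ds-connections",
        # The session token goes in the Authorization header as-is.
        headers={"Authorization": session_token},
        method="GET",
    )


# Sending it needs network access and a valid session token:
# with urllib.request.urlopen(build_ds_connections_request("<session token>")) as resp:
#     body = resp.read().decode("utf-8")
```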

The response to the GET /ds-connections call contains a list of data sources, along with the full details of each one.

{
    "status": "success",
    "statusCode": 200,
    "data": {
        "ds_connections": [
            "<data source info here>"
        ]
    }
}
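To work with the result programmatically, pull the list out of `data.ds_connections`. This is a minimal sketch that relies only on the envelope shown in the sample above; the fields inside each data source entry are not shown here, so the sketch treats them as opaque objects, and the helper name `extract_ds_connections` is hypothetical.

```python
import json
from typing import Any, List


def extract_ds_connections(response_text: str) -> List[Any]:
    """Return the data source entries under data.ds_connections."""
    payload = json.loads(response_text)
    # ASSUMPTION: any status other than "success" indicates a failed request.
    if payload.get("status") != "success":
        raise ValueError(f"Request failed with statusCode {payload.get('statusCode')}")
    return payload["data"]["ds_connections"]


# Placeholder response mirroring the sample above, with two dummy entries.
sample = '{"status": "success", "statusCode": 200, "data": {"ds_connections": [{}, {}]}}'
print(len(extract_ds_connections(sample)))  # prints 2
```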