SDK Structure and Data Model

In this module, we'll explore the core components of the BigID Connector SDK and the data model that underpins connector development. Understanding both is essential for building connectors that interoperate cleanly with BigID.

Deep Dive into SDK Modules

The BigID Connector SDK consists of three modules, each serving a distinct purpose:

sdk-api: This module houses the interfaces that define the core functionalities of a connector. These interfaces act as contracts, specifying the methods that your connector must implement to interact with BigID and the data source.
sdk-data: This module provides the data model, which includes classes and objects that represent data source entities in a standardized format. This standardized representation ensures seamless communication between your connector and the BigID platform.
sdk-utils: This module offers a collection of utility classes designed to simplify common development tasks. These utilities can help with data object conversion, iterator management, and other functionalities, streamlining your development process.
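
To make the division of labor concrete, here is a minimal sketch of how an interface contract and a standardized data class fit together. Every name in it (`ContainerLister`, `ContainerInfo`, `InMemoryLister`) is a hypothetical stand-in invented for illustration; none of them is taken from the SDK itself.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT SDK types.
final class ModuleRolesSketch {

    // Role played by a data-model module: one standardized representation.
    static final class ContainerInfo {
        final String name;

        ContainerInfo(String name) {
            this.name = name;
        }
    }

    // Role played by an API module: a contract the connector must fulfil.
    interface ContainerLister {
        Iterator<ContainerInfo> listContainers();
    }

    // Connector code: implements the contract for one concrete data source.
    static final class InMemoryLister implements ContainerLister {
        private final List<String> names;

        InMemoryLister(List<String> names) {
            this.names = names;
        }

        @Override
        public Iterator<ContainerInfo> listContainers() {
            final Iterator<String> source = names.iterator();
            // Convert each native name into the standardized class lazily.
            return new Iterator<ContainerInfo>() {
                @Override
                public boolean hasNext() {
                    return source.hasNext();
                }

                @Override
                public ContainerInfo next() {
                    return new ContainerInfo(source.next());
                }
            };
        }
    }

    public static void main(String[] args) {
        ContainerLister lister = new InMemoryLister(Arrays.asList("sales_db", "hr_db"));
        Iterator<ContainerInfo> it = lister.listContainers();
        while (it.hasNext()) {
            System.out.println(it.next().name);
        }
    }
}
```

The caller depends only on the interface and the data class, so the same calling code would work with any data source that supplies its own implementation.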

Exploring the Data Object Hierarchy

The SDK's data model employs a hierarchical structure to represent data source entities. This hierarchy consists of three main levels:

Container: Represents the highest-level entry point in a data source. Examples include a database, a cloud storage bucket, a root folder, or a user account.
Subcontainer: Represents a secondary entry point within a container. Examples include a schema within a database, a folder within a bucket, or a workspace within a user account.
Leaf Object: Represents the lowest level in the hierarchy, containing the actual data. Examples include tables in a database, files in a folder, emails in a mailbox, or tickets in a system.

Understanding this hierarchy is crucial for effectively modeling data source entities and ensuring that your connector can accurately represent the structure of the data source to BigID.

    graph TD
        Container --> Subcontainer
        Subcontainer --> LeafObject

This diagram visually represents the relationship between the three levels of the data object hierarchy. The `Container` is the root, with `Subcontainer`s nested within it, and `LeafObject`s at the bottom, containing the actual data.
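
The three levels can also be sketched as plain Java classes. The classes below are hypothetical and exist only to illustrate the parent-child relationships; the SDK's own data model classes will have different names and members.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the SDK's classes.
final class HierarchySketch {

    /** Top-level entry point, e.g. a database or a bucket. */
    static final class Container {
        final String name;
        private final List<Subcontainer> subcontainers = new ArrayList<>();

        Container(String name) {
            this.name = name;
        }

        Subcontainer addSubcontainer(String subName) {
            Subcontainer sub = new Subcontainer(this, subName);
            subcontainers.add(sub);
            return sub;
        }

        List<Subcontainer> subcontainers() {
            return Collections.unmodifiableList(subcontainers);
        }
    }

    /** Secondary entry point, e.g. a schema or a folder. */
    static final class Subcontainer {
        final Container parent;
        final String name;
        private final List<LeafObject> leaves = new ArrayList<>();

        Subcontainer(Container parent, String name) {
            this.parent = parent;
            this.name = name;
        }

        LeafObject addLeaf(String leafName) {
            LeafObject leaf = new LeafObject(this, leafName);
            leaves.add(leaf);
            return leaf;
        }

        List<LeafObject> leaves() {
            return Collections.unmodifiableList(leaves);
        }
    }

    /** Lowest level, e.g. a table or a file. */
    static final class LeafObject {
        final Subcontainer parent;
        final String name;

        LeafObject(Subcontainer parent, String name) {
            this.parent = parent;
            this.name = name;
        }

        /** Walks up the hierarchy to build a fully qualified name. */
        String fullyQualifiedName() {
            return parent.parent.name + "." + parent.name + "." + name;
        }
    }

    public static void main(String[] args) {
        Container db = new Container("sales_db");
        Subcontainer schema = db.addSubcontainer("public");
        LeafObject table = schema.addLeaf("customers");
        System.out.println(table.fullyQualifiedName()); // sales_db.public.customers
    }
}
```

Here a database maps to the container, a schema to the subcontainer, and a table to the leaf object, matching the examples listed for each level.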

Understanding Data Object Types

The SDK defines four primary types of data objects, each tailored to a specific category of data source entities:

**Base:** Represents fundamental objects like containers and subcontainers. These objects typically describe the structure and organization of the data source.
**Structured:** Represents data objects with a defined schema, such as tables in relational databases or collections in NoSQL databases.
**Unstructured:** Represents data objects without a predefined schema, such as files in a file system or documents in cloud storage.
**App:** Represents data objects specific to applications, such as emails, messages, or tickets. These objects often contain a combination of structured and unstructured data.

Choosing the appropriate data object type for each entity in your data source is essential for accurate representation and efficient processing by BigID.
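
One way to keep this choice explicit in connector code is a single mapping from the kinds of entities your data source exposes to the four types. The sketch below is illustrative only: the enum names and the `typeFor` helper are assumptions, and the entity kinds are simply the examples mentioned in this module.

```java
// Hypothetical enums for illustration; the SDK's own type names may differ.
final class ObjectTypeSketch {

    /** The four data object types described in this module. */
    enum DataObjectType { BASE, STRUCTURED, UNSTRUCTURED, APP }

    /** Entity kinds taken from the examples in this module. */
    enum EntityKind { DATABASE, SCHEMA, FOLDER, TABLE, COLLECTION, FILE, DOCUMENT, EMAIL, TICKET }

    /** Maps an entity kind to the data object type that models it. */
    static DataObjectType typeFor(EntityKind kind) {
        switch (kind) {
            case DATABASE:
            case SCHEMA:
            case FOLDER:
                // Containers and subcontainers describe structure, not data.
                return DataObjectType.BASE;
            case TABLE:
            case COLLECTION:
                return DataObjectType.STRUCTURED;
            case FILE:
            case DOCUMENT:
                return DataObjectType.UNSTRUCTURED;
            case EMAIL:
            case TICKET:
                return DataObjectType.APP;
            default:
                throw new IllegalArgumentException("Unmapped entity kind: " + kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeFor(EntityKind.TABLE)); // STRUCTURED
        System.out.println(typeFor(EntityKind.EMAIL)); // APP
    }
}
```

Centralizing the mapping in one method means that a newly supported entity kind forces a deliberate decision about its type instead of silently falling into a default.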

Data Object Members

Data objects in the SDK have three categories of members:

**Mandatory Members:** These are essential fields that must be populated for the object to be processed correctly by BigID. They typically represent core attributes of the data source entity.
**Optional Members:** These fields provide additional information about the data source entity. Populate them whenever doing so does not incur significant performance overhead.
**Additional Parameters:** These are dynamic properties that you can add to the data object to include extra metadata specific to your data source.

Understanding these member categories ensures that you provide the necessary information to BigID while maintaining flexibility to include data source-specific details.
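
The three categories can be modeled roughly as follows. This is a sketch under stated assumptions: `TableObject` and its fields (`containerName`, `objectName`, `rowCount`) are hypothetical, and which members the SDK actually treats as mandatory is not specified here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical class for illustration only; the field names are assumptions,
// not the SDK's actual members.
final class MemberSketch {

    static final class TableObject {
        // Mandatory members: rejected at construction time if missing.
        private final String containerName;
        private final String objectName;
        // Optional member: null means "not collected".
        private Long rowCount;
        // Additional parameters: free-form, data-source-specific metadata.
        private final Map<String, String> additionalParameters = new LinkedHashMap<>();

        TableObject(String containerName, String objectName) {
            this.containerName = Objects.requireNonNull(containerName, "containerName is mandatory");
            this.objectName = Objects.requireNonNull(objectName, "objectName is mandatory");
        }

        TableObject withRowCount(long count) {
            this.rowCount = count;
            return this;
        }

        TableObject withParameter(String key, String value) {
            additionalParameters.put(key, value);
            return this;
        }

        String containerName() {
            return containerName;
        }

        String objectName() {
            return objectName;
        }

        Optional<Long> rowCount() {
            return Optional.ofNullable(rowCount);
        }

        Map<String, String> additionalParameters() {
            return Collections.unmodifiableMap(additionalParameters);
        }
    }

    public static void main(String[] args) {
        TableObject table = new TableObject("sales_db", "customers")
                .withParameter("storageEngine", "InnoDB");
        System.out.println(table.rowCount().isPresent()); // false
        System.out.println(table.additionalParameters()); // {storageEngine=InnoDB}
    }
}
```

Validating mandatory members in the constructor makes an incomplete object fail fast, while the optional member can be skipped when computing it (a row count, for instance) would be expensive.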

Importance of Excluding Sensitive Data

It's crucial to remember that `DataSourceObjects` and `DataLink` objects should only contain metadata or indexing information. **Never include sensitive data, such as passwords, personally identifiable information (PII), or other confidential details, in these objects.**

BigID provides mechanisms for handling sensitive data during the scanning process; including such data in metadata objects can pose security risks.
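
As a defensive habit, a connector can filter metadata before attaching it to an object. The guard below is a hypothetical helper, not an SDK mechanism, and the blocked key fragments are illustrative assumptions. Matching on key names catches only obvious mistakes, so it complements careful review of what the connector collects and does not replace it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; this is NOT part of the SDK.
final class MetadataGuardSketch {

    // Illustrative fragments; extend this list for your own data source.
    private static final Set<String> BLOCKED_KEY_FRAGMENTS =
            new HashSet<>(Arrays.asList("password", "secret", "token", "ssn"));

    /** Returns a copy of the metadata without entries whose key looks sensitive. */
    static Map<String, String> stripSensitive(Map<String, String> metadata) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            if (!looksSensitive(entry.getKey())) {
                safe.put(entry.getKey(), entry.getValue());
            }
        }
        return safe;
    }

    /** Case-insensitive check of the key against the blocked fragments. */
    private static boolean looksSensitive(String key) {
        String lowered = key.toLowerCase(Locale.ROOT);
        for (String fragment : BLOCKED_KEY_FRAGMENTS) {
            if (lowered.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("owner", "data-team");
        raw.put("dbPassword", "hunter2");
        raw.put("sizeBytes", "1024");
        System.out.println(stripSensitive(raw)); // {owner=data-team, sizeBytes=1024}
    }
}
```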

This concludes the module on SDK Structure and Data Model. You now have a deeper understanding of the SDK structure and the data model that forms the foundation of connector development. In the upcoming modules, we'll explore the various interfaces that enable your connector to interact with BigID and data sources.