Connectors/Java: Difference between revisions

From BigID Developer Portal

Revision as of 18:51, 20 February 2025

SDK Structure and Data Model

In this module, we'll explore the core components of the BigID Connector SDK and the data model that underpins connector development. Understanding both is essential for building effective, interoperable connectors.

Deep Dive into SDK Modules

The BigID Connector SDK comprises three primary modules, each serving a distinct purpose:

  • **sdk-api:** This module contains the interfaces that define a connector's core functionality. These interfaces act as contracts: they specify the methods your connector must implement to interact with BigID and the data source.
  • **sdk-data:** This module provides the data model: classes that represent data source entities in a standardized format, so that your connector and the BigID platform exchange data in a common form.
  • **sdk-utils:** This module offers utility classes that simplify common development tasks, such as data object conversion and iterator management.
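
To make the division of labor concrete, here is a minimal sketch of the three roles. Every name in it (`SketchEntity`, `SketchConnector`, `SketchIterators`, `InMemoryConnector`) is a hypothetical stand-in invented for illustration; none of them is taken from the actual SDK, whose interfaces and classes will differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for the sdk-data role: a standardized view of one data source entity.
// HYPOTHETICAL type, not a class from the real SDK.
final class SketchEntity {
    private final String name;

    SketchEntity(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Stand-in for the sdk-api role: a contract the connector must implement.
// HYPOTHETICAL interface; the real SDK defines its own method set.
interface SketchConnector {
    Iterator<SketchEntity> enumerate();
}

// Stand-in for the sdk-utils role: a helper for a common task (draining an iterator).
final class SketchIterators {
    private SketchIterators() {
    }

    static <T> List<T> toList(Iterator<T> iterator) {
        List<T> result = new ArrayList<>();
        while (iterator.hasNext()) {
            result.add(iterator.next());
        }
        return result;
    }
}

// Connector code implements the contract and returns standardized objects.
final class InMemoryConnector implements SketchConnector {
    @Override
    public Iterator<SketchEntity> enumerate() {
        return Arrays.asList(new SketchEntity("customers"), new SketchEntity("orders")).iterator();
    }
}
```

The point of the split is that the caller depends only on the contract and the standardized entity type, never on the connector's internals.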

Exploring the Data Object Hierarchy

The SDK's data model employs a hierarchical structure to represent data source entities. This hierarchy consists of three main levels:

  • **Container:** Represents the highest-level entry point in a data source. Examples include a database, a cloud storage bucket, a root folder, or a user account.
  • **Subcontainer:** Represents a secondary entry point within a container. Examples include a schema within a database, a folder within a bucket, or a workspace within a user account.
  • **Leaf Object:** Represents the lowest level in the hierarchy, containing the actual data. Examples include tables in a database, files in a folder, emails in a mailbox, or tickets in a system.

Model your data source entities against this hierarchy so that your connector accurately represents the structure of the data source to BigID.

graph TD
    Container --> Subcontainer
    Subcontainer --> LeafObject

This diagram shows how the three levels of the data object hierarchy relate: the `Container` is the root, `Subcontainer`s are nested within it, and `LeafObject`s sit at the bottom and contain the actual data.
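
The three levels can be sketched as plain Java classes. The types below (`DemoContainer`, `DemoSubcontainer`, `DemoLeafObject`) are simplified, hypothetical stand-ins and not the SDK's own classes; the slash-separated path format is likewise an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Lowest level: holds the actual data (a table, file, email, ticket).
final class DemoLeafObject {
    final String name;

    DemoLeafObject(String name) {
        this.name = name;
    }
}

// Secondary entry point inside a container (a schema, folder, workspace).
final class DemoSubcontainer {
    final String name;
    private final List<DemoLeafObject> leaves = new ArrayList<>();

    DemoSubcontainer(String name) {
        this.name = name;
    }

    DemoSubcontainer add(DemoLeafObject leaf) {
        leaves.add(leaf);
        return this;
    }

    List<DemoLeafObject> leaves() {
        return Collections.unmodifiableList(leaves);
    }
}

// Highest-level entry point (a database, bucket, root folder, user account).
final class DemoContainer {
    final String name;
    private final List<DemoSubcontainer> subcontainers = new ArrayList<>();

    DemoContainer(String name) {
        this.name = name;
    }

    DemoContainer add(DemoSubcontainer subcontainer) {
        subcontainers.add(subcontainer);
        return this;
    }

    // Walks the hierarchy top-down and returns one path per leaf object.
    List<String> leafPaths() {
        List<String> paths = new ArrayList<>();
        for (DemoSubcontainer sub : subcontainers) {
            for (DemoLeafObject leaf : sub.leaves()) {
                paths.add(name + "/" + sub.name + "/" + leaf.name);
            }
        }
        return paths;
    }
}
```

For a database `salesdb` with a schema `public` holding tables `customers` and `orders`, `leafPaths()` yields `salesdb/public/customers` and `salesdb/public/orders`.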


Understanding Data Object Types

The SDK defines four primary types of data objects, each tailored to a specific category of data source entities:

  • **Base:** Represents fundamental objects like containers and subcontainers. These objects typically describe the structure and organization of the data source.
  • **Structured:** Represents data objects with a defined schema, such as tables in relational databases or collections in NoSQL databases.
  • **Unstructured:** Represents data objects without a predefined schema, such as files in a file system or documents in cloud storage.
  • **App:** Represents data objects specific to applications, such as emails, messages, or tickets. These objects often contain a combination of structured and unstructured data.

Choosing the appropriate data object type for each entity in your data source is essential for accurate representation and efficient processing by BigID.
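
As a rough illustration of that choice, the sketch below maps the example entities from the list above onto the four types. The enum `DemoObjectType`, the class `DemoTypeChooser`, and the string labels for entity kinds are all hypothetical; the SDK's real type system is not shown here and may be organized differently.

```java
import java.util.Locale;

// HYPOTHETICAL enum mirroring the four categories described above.
enum DemoObjectType { BASE, STRUCTURED, UNSTRUCTURED, APP }

final class DemoTypeChooser {
    private DemoTypeChooser() {
    }

    // Maps an entity kind (an assumed free-text label) to a data object type.
    static DemoObjectType typeFor(String entityKind) {
        switch (entityKind.toLowerCase(Locale.ROOT)) {
            case "database":
            case "schema":
            case "bucket":
            case "folder":
                return DemoObjectType.BASE;          // structural objects
            case "table":
            case "collection":
                return DemoObjectType.STRUCTURED;    // defined schema
            case "file":
            case "document":
                return DemoObjectType.UNSTRUCTURED;  // no predefined schema
            case "email":
            case "message":
            case "ticket":
                return DemoObjectType.APP;           // application-specific
            default:
                throw new IllegalArgumentException("Unknown entity kind: " + entityKind);
        }
    }
}
```

Failing loudly on an unknown kind is a deliberate choice in this sketch: silently defaulting to one type would hide a modeling gap.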

Data Object Members

Data objects in the SDK have three categories of members:

  • **Mandatory Members:** These are essential fields that must be populated for the object to be processed correctly by BigID. They typically represent core attributes of the data source entity.
  • **Optional Members:** These fields provide additional information about the data source entity and should be populated whenever possible without incurring significant performance overhead.
  • **Additional Parameters:** These are dynamic properties that you can add to the data object to include extra metadata specific to your data source.

Knowing these member categories helps you provide the information BigID requires while keeping the flexibility to include data source-specific details.
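
One way to picture the three categories is a builder that enforces the mandatory fields, tolerates missing optional ones, and accepts free-form extras. This is a sketch under stated assumptions: `DemoDataObject` and its fields (`name`, `fullPath`, `sizeInBytes`) are hypothetical, and which fields the real SDK treats as mandatory is not specified in this module.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class DemoDataObject {
    private final String name;          // mandatory in this sketch
    private final String fullPath;      // mandatory in this sketch
    private final Long sizeInBytes;     // optional: null when not cheaply available
    private final Map<String, String> additionalParameters; // source-specific extras

    private DemoDataObject(Builder b) {
        // Mandatory members: fail fast if they were never set.
        this.name = Objects.requireNonNull(b.name, "name is mandatory");
        this.fullPath = Objects.requireNonNull(b.fullPath, "fullPath is mandatory");
        this.sizeInBytes = b.sizeInBytes;
        this.additionalParameters =
                Collections.unmodifiableMap(new LinkedHashMap<>(b.additionalParameters));
    }

    String getName() { return name; }
    String getFullPath() { return fullPath; }
    Long getSizeInBytes() { return sizeInBytes; }
    Map<String, String> getAdditionalParameters() { return additionalParameters; }

    static final class Builder {
        private String name;
        private String fullPath;
        private Long sizeInBytes;
        private final Map<String, String> additionalParameters = new LinkedHashMap<>();

        Builder name(String name) { this.name = name; return this; }
        Builder fullPath(String fullPath) { this.fullPath = fullPath; return this; }
        Builder sizeInBytes(long size) { this.sizeInBytes = size; return this; }

        Builder additionalParameter(String key, String value) {
            additionalParameters.put(key, value);
            return this;
        }

        DemoDataObject build() {
            return new DemoDataObject(this);
        }
    }
}
```

Here `build()` throws a `NullPointerException` when a mandatory member is missing, while an unset `sizeInBytes` simply stays `null`.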

Importance of Excluding Sensitive Data

`DataSourceObjects` and `DataLink` objects should contain only metadata or indexing information. **Never include sensitive data, such as passwords, personally identifiable information (PII), or other confidential details, in these objects.**

BigID provides mechanisms for handling sensitive data during the scanning process; including such data in the metadata objects can pose security risks.
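
A connector can add a coarse safeguard of its own before attaching extra metadata. The sketch below drops metadata entries whose key names look sensitive. The class `MetadataGuard` and its denylist are hypothetical and not part of the SDK, and a key-name check of this kind does not inspect values, so it cannot replace the rule above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class MetadataGuard {
    // HYPOTHETICAL denylist of key fragments; extend it for your data source.
    private static final List<String> SENSITIVE_MARKERS =
            Arrays.asList("password", "secret", "token", "ssn");

    private MetadataGuard() {
    }

    // True when the key name contains any denylisted fragment (case-insensitive).
    static boolean looksSensitive(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        for (String marker : SENSITIVE_MARKERS) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    // Returns a copy of the metadata without entries whose keys look sensitive.
    static Map<String, String> withoutSensitiveKeys(Map<String, String> metadata) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            if (!looksSensitive(entry.getKey())) {
                safe.put(entry.getKey(), entry.getValue());
            }
        }
        return safe;
    }
}
```

Returning a filtered copy instead of mutating the input keeps the original map available for logging which keys were dropped.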

This concludes the module on SDK Structure and Data Model, which covered the SDK's modules and the data model that forms the foundation of connector development. In the upcoming modules, we'll explore the interfaces that enable your connector to interact with BigID and data sources.