Connectors/Java

From BigID Developer Portal

Latest revision as of 19:22, 20 February 2025

SDK Structure and Data Model

In this module, we'll explore the core components of the BigID Connector SDK and delve into the data model that underpins connector development. Understanding these elements is crucial for building effective and interoperable connectors.

Deep Dive into SDK Modules

The BigID Connector SDK consists of three modules, each serving a distinct purpose:

  • sdk-api: This module houses the interfaces that define the core functionality of a connector. These interfaces act as contracts, specifying the methods that your connector must implement to interact with BigID and the data source.
  • sdk-data: This module provides the data model, which includes classes and objects that represent data source entities in a standardized format. This standardized representation ensures seamless communication between your connector and the BigID platform.
  • sdk-utils: This module offers a collection of utility classes designed to simplify common development tasks. These utilities can help with data object conversion, iterator management, and other functionalities, streamlining your development process.

Exploring the Data Object Hierarchy

The SDK's data model employs a hierarchical structure to represent data source entities. This hierarchy consists of three main levels:

  • Container: Represents the highest-level entry point in a data source. Examples include a database, a cloud storage bucket, a root folder, or a user account.
  • Subcontainer: Represents a secondary entry point within a container. Examples include a schema within a database, a folder within a bucket, or a workspace within a user account.
  • Leaf Object: Represents the lowest level in the hierarchy, containing the actual data. Examples include tables in a database, files in a folder, emails in a mailbox, or tickets in a system.

Understanding this hierarchy is crucial for effectively modeling data source entities and ensuring that your connector can accurately represent the structure of the data source to BigID.

[Diagram: Container --> Subcontainer --> LeafObject]

This diagram shows the relationship between the three levels of the data object hierarchy. The Container is the root, with Subcontainers nested within it, and LeafObjects at the bottom, containing the actual data.
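
The three levels can be sketched in plain Java as follows. This is only an illustration of the nesting described above: the record names and the `leafPaths` helper are hypothetical and are not the actual classes shipped in sdk-data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; these are NOT the sdk-data classes.
record LeafObject(String name) {}

record Subcontainer(String name, List<LeafObject> leaves) {}

record Container(String name, List<Subcontainer> subcontainers) {}

class HierarchyDemo {
    // Walk the hierarchy top-down and return one path per leaf object.
    static List<String> leafPaths(Container container) {
        List<String> paths = new ArrayList<>();
        for (Subcontainer sub : container.subcontainers()) {
            for (LeafObject leaf : sub.leaves()) {
                paths.add(container.name() + "/" + sub.name() + "/" + leaf.name());
            }
        }
        return paths;
    }

    // Mirrors the example in the text: database -> schema -> tables.
    static Container sampleDatabase() {
        Subcontainer sales = new Subcontainer("sales",
                List.of(new LeafObject("orders"), new LeafObject("customers")));
        return new Container("crm_db", List.of(sales));
    }

    public static void main(String[] args) {
        leafPaths(sampleDatabase()).forEach(System.out::println);
    }
}
```

Modeling each level as its own type keeps the mapping explicit: a database becomes a `Container`, each schema a `Subcontainer`, and each table a `LeafObject`.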

Understanding Data Object Types

The SDK distinguishes data sources by type. This module describes three:

  • Structured: Represents data objects with a defined schema, such as tables in relational databases or collections in NoSQL databases.
  • Unstructured: Represents data objects without a predefined schema, such as files in a file system or documents in cloud storage.
  • App: Represents data objects specific to applications, such as emails, messages, or tickets. These objects often contain a combination of structured and unstructured data.

The data source type determines which interfaces you need to implement.
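
As a rough sketch of this classification step, the example below maps the object kinds mentioned above to a type. The `DataSourceType` enum and the `typeFor` helper are assumptions made for illustration; the real SDK may name and represent these types differently.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical enum for illustration; not taken from the SDK.
enum DataSourceType { STRUCTURED, UNSTRUCTURED, APP }

class DataSourceTypes {
    // Object kinds drawn from the examples in the list above.
    private static final Map<String, DataSourceType> BY_KIND = Map.of(
            "table", DataSourceType.STRUCTURED,
            "collection", DataSourceType.STRUCTURED,
            "file", DataSourceType.UNSTRUCTURED,
            "document", DataSourceType.UNSTRUCTURED,
            "email", DataSourceType.APP,
            "message", DataSourceType.APP,
            "ticket", DataSourceType.APP);

    // Returns the type for a known object kind, or empty if the kind is unknown.
    static Optional<DataSourceType> typeFor(String objectKind) {
        return Optional.ofNullable(BY_KIND.get(objectKind.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(typeFor("table"));
        System.out.println(typeFor("ticket"));
    }
}
```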

Importance of Excluding Sensitive Data

`DataSourceObjects` and `DataLink` objects should contain only metadata or indexing information. Never include sensitive data, such as passwords, personally identifiable information (PII), or other confidential details, in these objects.

BigID provides mechanisms for handling sensitive data during the scanning process, and including it in the metadata objects can pose security risks.
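
One way to enforce this rule in connector code is an allowlist applied before any attributes are attached to a metadata object. The sketch below is a minimal illustration under that assumption: `MetadataFilter`, its key names, and the map-based representation are all hypothetical and do not reflect the SDK's own API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class MetadataFilter {
    // Hypothetical allowlist of attribute keys considered safe metadata.
    private static final Set<String> ALLOWED_KEYS =
            Set.of("name", "path", "sizeBytes", "lastModified");

    // Copies only allowlisted keys; anything else (e.g. credentials) is dropped.
    static Map<String, String> metadataOnly(Map<String, String> raw) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            if (ALLOWED_KEYS.contains(entry.getKey())) {
                safe.put(entry.getKey(), entry.getValue());
            }
        }
        return safe;
    }

    public static void main(String[] args) {
        Map<String, String> raw = Map.of(
                "name", "orders",
                "path", "crm_db/sales/orders",
                "password", "do-not-copy");
        System.out.println(metadataOnly(raw).keySet());
    }
}
```

An allowlist is used here rather than a denylist so that an attribute nobody anticipated is excluded by default instead of leaking into the metadata.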

This concludes the module on SDK Structure and Data Model. You now have a deeper understanding of the SDK structure and the data model that forms the foundation of connector development. In the upcoming modules, we'll explore the various interfaces that enable your connector to interact with BigID and data sources.