Apps/Role of BigID Apps

From BigID Developer Portal

The BigID Platform

BigID is implemented as a platform: the BigID core provides four services (Correlation, Classification, Clustering, and Catalog), and all other capabilities are implemented as apps that use the data produced by these four services.

Correlation

Correlation is the process of determining which individual a piece of information belongs to. For example, our DSAR (Data Subject Access Request) application uses correlation to report all of the information belonging to an individual. In our breach response application, a sample of breached data can be used to discover the individuals who need to be notified.
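As a rough illustration of the idea, here is a minimal sketch assuming a small table of known identities: index every known attribute value by the person who owns it, then look up values found in scanned data (or in a breach sample). The identity records, field names, and the `normalize`, `build_index`, and `correlate` helpers are hypothetical and are not BigID's API or its actual correlation algorithm; this exact-match lookup also ignores fuzzy and partial matches.

```python
import re

# Hypothetical identity records; IDs and field names are illustrative only.
IDENTITIES = {
    "id-001": {"email": "ana@example.com", "phone": "+19175550101", "name": "Ana Silva"},
    "id-002": {"email": "bo@example.com", "phone": "+19175550102", "name": "Bo Chen"},
}


def normalize(value: str) -> str:
    """Lower-case and drop spaces, dashes, parentheses, and '+', so that
    "+1 (917) 555-0101" and "+19175550101" compare equal."""
    return re.sub(r"[\s\-()+]", "", value.lower())


def build_index(identities: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each normalized attribute value to the set of identity IDs that own it."""
    index: dict[str, set[str]] = {}
    for person_id, attrs in identities.items():
        for value in attrs.values():
            index.setdefault(normalize(value), set()).add(person_id)
    return index


def correlate(found_values: list[str], index: dict[str, set[str]]) -> set[str]:
    """Return the IDs of every known individual that any found value belongs to."""
    owners: set[str] = set()
    for value in found_values:
        owners |= index.get(normalize(value), set())
    return owners
```

For example, `correlate(["ANA@example.com", "+1 (917) 555-0101"], build_index(IDENTITIES))` returns `{"id-001"}`, because both values normalize to attributes of the same person.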

Classification

Classification is the process of determining the type of a piece of information. For example, classification can tell us that the string "+1 (917) 555-5555" is a phone number. It can also tell us that an image looks like a receipt. BigID uses regular expressions, NLP, and NER (named entity recognition) classifiers to determine the types of data and files. You can read more about our classification methodology here: https://bigid.com/blog/what-is-data-classification/
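The regular-expression part of this approach can be sketched in a few lines. This toy covers only regex classifiers; NLP, NER, and image classification are not shown. The patterns, classifier names, and the `classify` helper are hypothetical and deliberately simplistic (the phone pattern only covers North American formats such as the example above); they are not BigID's built-in classifiers.

```python
import re

# Hypothetical, deliberately simple patterns; production classifiers are far more thorough.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")

# Each classifier is a (data type name, pattern) pair.
CLASSIFIERS = [("phone_number", PHONE_RE), ("email", EMAIL_RE)]


def classify(value: str) -> list[str]:
    """Return the name of every classifier whose pattern matches the whole value."""
    value = value.strip()
    return [name for name, pattern in CLASSIFIERS if pattern.fullmatch(value)]
```

With these patterns, `classify("+1 (917) 555-5555")` returns `["phone_number"]`, and a string that matches no pattern returns an empty list.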

Clustering

Clustering is a commonly used machine learning technique; Google has a good general overview at https://developers.google.com/machine-learning/clustering/overview. For BigID, clustering is the process of grouping similar files together so that it can generalize about the group. For example, if you have hundreds of files that look alike (similar types of data, layouts, text patterns, etc.), BigID can infer that similar files it encounters in the future will contain the same data elements.
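To make "generalize about the group" concrete, here is a toy sketch assuming each file has already been reduced to a set of feature strings (detected data types plus a layout hint). It uses single-pass "leader" clustering with Jaccard similarity. This is a simple stand-in technique chosen for illustration, not BigID's clustering algorithm, and the feature names, the 0.6 threshold, and the helper names are hypothetical.

```python
# Hypothetical feature sets for already-scanned files.
SAMPLE_FILES: dict[str, frozenset[str]] = {
    "receipt_001.pdf": frozenset({"layout:receipt", "name", "card_number", "date"}),
    "receipt_002.pdf": frozenset({"layout:receipt", "name", "card_number", "date", "address"}),
    "hr_form_17.docx": frozenset({"layout:form", "name", "ssn", "salary"}),
}


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Similarity of two feature sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(files: dict[str, frozenset[str]], threshold: float = 0.6) -> list[dict]:
    """Each file joins the first cluster whose leader (first member) is at least
    `threshold` similar; otherwise it starts a new cluster. 'common' keeps the
    features shared by every member of the cluster."""
    clusters: list[dict] = []
    for name, features in files.items():
        for cluster in clusters:
            if jaccard(features, cluster["leader"]) >= threshold:
                cluster["members"].append(name)
                cluster["common"] &= features
                break
        else:
            clusters.append({"leader": features, "members": [name], "common": set(features)})
    return clusters


def predict_elements(features: frozenset[str], clusters: list[dict], threshold: float = 0.6) -> set[str]:
    """Guess a new file's elements from its most similar cluster (empty if none is close enough)."""
    best = max(clusters, key=lambda c: jaccard(features, c["leader"]), default=None)
    if best is None or jaccard(features, best["leader"]) < threshold:
        return set()
    return set(best["common"])
```

Here the two receipts form one cluster, so a new file with features `{"layout:receipt", "name", "date"}` is predicted to also contain `card_number`, even though that element was not detected in it directly. Leader clustering is order-dependent; k-means or hierarchical clustering would be more typical choices on real data.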

Catalog

The BigID Catalog takes the results of the other three services and makes them accessible and searchable in one place. The catalog also indexes metadata about the scanned objects, such as access permissions and modification dates.
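The following sketch shows the general shape of such a catalog, assuming a hypothetical in-memory schema: each entry combines classification, correlation, and clustering results with scan metadata, and an inverted index maps each data type to the objects that contain it. The `CatalogEntry` fields, the `Catalog` class, and the sample paths are illustrative assumptions, not the BigID Catalog's real data model or API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class CatalogEntry:
    """One scanned object (hypothetical schema)."""
    path: str
    data_types: set[str]                             # from classification
    owners: set[str]                                 # identity IDs from correlation
    cluster_id: Optional[int] = None                 # from clustering
    readers: set[str] = field(default_factory=set)   # access permissions metadata
    modified: Optional[date] = None                  # modification date metadata


class Catalog:
    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}
        self._by_type: dict[str, set[str]] = {}      # inverted index: data type -> paths

    def add(self, entry: CatalogEntry) -> None:
        """Insert or replace an entry, keeping the inverted index consistent."""
        old = self._entries.get(entry.path)
        if old is not None:
            for t in old.data_types:
                self._by_type[t].discard(entry.path)
        self._entries[entry.path] = entry
        for t in entry.data_types:
            self._by_type.setdefault(t, set()).add(entry.path)

    def search(self, data_type=None, owner=None, readable_by=None, modified_before=None):
        """Return entries (sorted by path) matching every filter given; None means 'any'."""
        candidates = self._by_type.get(data_type, set()) if data_type is not None else self._entries.keys()
        results = []
        for path in sorted(candidates):
            e = self._entries[path]
            if owner is not None and owner not in e.owners:
                continue
            if readable_by is not None and readable_by not in e.readers:
                continue
            if modified_before is not None and (e.modified is None or e.modified >= modified_before):
                continue
            results.append(e)
        return results


catalog = Catalog()
catalog.add(CatalogEntry(path="/shares/hr/salaries.csv", data_types={"ssn", "salary"}, owners={"id-001"},
                         cluster_id=2, readers={"hr-team", "everyone"}, modified=date(2019, 3, 1)))
catalog.add(CatalogEntry(path="/shares/finance/receipt_001.pdf", data_types={"card_number", "name"},
                         owners={"id-002"}, cluster_id=1, readers={"finance"}, modified=date(2021, 6, 15)))
```

A query such as `catalog.search(data_type="ssn", readable_by="everyone")` then answers "which objects containing SSNs can everyone read?" in one call, returning only the salaries file.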

Where does my app come in?

All apps, including yours, have the same access to the four core services above. This means your app can implement any use case that requires knowing the type, metadata, or owner of a piece of data.
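As an illustration of such a use case, here is a hypothetical "stale sensitive data" app that combines all three kinds of knowledge: the type of data (classification), its owner (correlation), and its metadata (permissions and modification date). The record shape, the `SENSITIVE` set, the three-year cutoff, and `risky_objects` are assumptions made for this sketch; a real app would obtain its records through the BigID platform rather than from a hard-coded list.

```python
from datetime import date

# Hypothetical records, shaped like what an app might read from the catalog.
RECORDS = [
    {"path": "/shares/hr/salaries.csv", "data_types": {"ssn", "salary"},
     "owners": {"id-001"}, "readers": {"everyone"}, "modified": date(2016, 2, 1)},
    {"path": "/shares/finance/q3.xlsx", "data_types": {"revenue"},
     "owners": set(), "readers": {"everyone"}, "modified": date(2016, 5, 9)},
    {"path": "/shares/hr/new_hires.csv", "data_types": {"ssn", "name"},
     "owners": {"id-002"}, "readers": {"hr-team"}, "modified": date(2024, 1, 8)},
]

SENSITIVE = {"ssn", "card_number"}  # hypothetical policy


def risky_objects(records: list[dict], today: date, max_age_days: int = 3 * 365) -> list[str]:
    """Flag objects that hold a sensitive data type, belong to at least one person,
    are readable by everyone, and have not been modified within `max_age_days`."""
    flagged = []
    for r in records:
        stale = (today - r["modified"]).days > max_age_days
        if r["data_types"] & SENSITIVE and r["owners"] and "everyone" in r["readers"] and stale:
            flagged.append(r["path"])
    return flagged
```

Calling `risky_objects(RECORDS, today=date(2024, 6, 1))` flags only `/shares/hr/salaries.csv`: the finance file holds no sensitive type, and the new-hires file is restricted to the HR team.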