Personal Data Discovery: The What, Why and the How?

We live in a data-driven world where businesses can’t survive without information. Every enterprise in the modern data economy collects millions of data points to learn more about consumers: name, gender, family, contact details, likes and dislikes, health data, biometrics, job, travel, spending budget, income, savings and much more.

This personal data enables businesses to improve existing services or create new ones that are more meaningful and contextual to consumers. Enterprises can acquire this data directly or through third parties, which means individuals may not even know what personal data about them is being stored or for what purposes. Additionally, storing this much sensitive personal data in enterprise IT systems without proper security is like serving a meal to cyber criminals on a silver platter.

In today’s digital world, where almost everything is connected to the internet, data security and privacy are paramount. Gone are the days when you could prevent data-related threats simply by securing the IT perimeter. Sensitive data now resides in thousands of places: AWS, Azure, Slack, Salesforce, Zendesk, M365, Google Workspace and more. There is no longer a perimeter to defend. Data security is not a tool or a feature you can turn on and off. It’s an ongoing commitment that requires fresh thinking and constant monitoring to remain effective.

What is Personal Data?

While companies are looking at advanced methods to protect their customers’ data from breaches, it’s vital to first understand the data that needs protection. Under GDPR, personal data covers a broader range of information than PII (Personally Identifiable Information), the term commonly used in North America. Simply put,

All PII is personal data. But not all personal data is PII.

GDPR defines personal data as follows:

“‘Personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”

All organizations, private or government, that collect, use, or buy information have ever-evolving responsibilities to protect and categorize the data they store, use, report on, or sell.

What is PII?

“Personal data”, as defined by GDPR, is the term more widely used in Europe, while PII is the term commonly used in the US. Any information that can be used to identify an individual, directly or indirectly, is called PII (Personally Identifiable Information). PII is a key element in government regulations, data protection frameworks, privacy policies, and a wide range of tech crimes. The more PII is produced, the harder it becomes to keep it safe.

According to NIST (National Institute of Standards and Technology), PII can be explained as:

“Any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.”

However, since there is no single authoritative definition of PII, the best way to determine what does and doesn’t constitute PII is a thorough assessment of the laws and regulations that apply to the specific industry.

Individual rights under various regulations

Data protection acts and privacy regulations enforced in countries across the world grant individuals a number of data rights. These rights give individuals control and authority over their personal data. The most important of them are:

  1. The right to access data
  2. The right to change and/or rectify data
  3. The right to data portability
  4. The right to erasure and/or deletion of data
  5. The right to know who’s collecting data
  6. The right to know who has access to data
  7. The right to know the location and processing destination of data
  8. The right to know the purpose and time frame of processing data

How companies collect individual data

The ways companies collect customer data fall broadly into two buckets:

1. Collecting it themselves:

  • Asking for the data directly from the individuals
    • Customer Profiles/Accounts: Names, Email ID, Phone #, Marital Status, Employer, etc.
    • Personal data on enterprise SaaS like health data, financial data, insurance data, patients’ data, etc.
    • Survey forms and landing page forms
  • Indirectly tracking the information about the individuals
    • Website cookies and web beacons
    • IoT sensors
    • Gathering social information from the web

2. Collecting the data from 3rd parties:

  • Data sharing with 3rd parties through direct integrations like Facebook, LinkedIn, Google, etc.
  • Purchasing customer data from third parties who make their living by collecting, analyzing and selling customer data

Personal data challenges faced by InfoSec and Privacy teams

Regardless of industry, and whether the infrastructure is a private cloud, a public cloud, or a multicloud/hybrid setup, InfoSec and Privacy teams have to comply with several regulations to ensure data security and protection.

Here are some of the biggest challenges faced by these teams:

  1. Personal data exposure risk: Exposure to the external world, or even to insiders (employees, consultants, contractors), can put customers’ personal data at high risk. Examples include a CSV file of customer data sitting in Google Drive behind an open link, patients’ medical images in AWS S3 buckets with open access, financial data leaking into software logs that external software vendors can read, and Zendesk tickets containing personal data that customers supplied for identification.
  2. Data privacy risk: Does the storage, usage or sharing of personal data across the enterprise cloud (cloud IT and SaaS apps) violate any privacy compliance policy? Examples include storing EU citizens’ personal data on an EC2 instance running in an Indian region, sharing data with a third party without the customer’s consent, and storing personal data in plain text.
  3. Responding to DSARs (Data Subject Access Requests): Under GDPR’s right of access, customers have the right to know what information companies hold about them and what they do with it. Recital 63 of GDPR states:

    “A data subject should have the right of access to personal data which have been collected concerning him or her, and to exercise that right easily and at reasonable intervals, in order to be aware of, and verify, the lawfulness of the processing.”

    But finding the personal data of a specific data subject across thousands of data sources is a herculean task. Manual inspection of varied data sources is error-prone, costly and time-consuming.
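
The search step of a DSAR can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete DSAR workflow: the `sources` mapping, the `find_subject_records` function and the sample records are all hypothetical, and real data sources would be reached through each system’s own API rather than held in memory.

```python
import re

def normalize_phone(text: str) -> str:
    """Keep only digits so differently formatted phone numbers compare equal."""
    return re.sub(r"\D", "", text)

def find_subject_records(sources, email, phone=None):
    """Return (source_name, record) pairs that mention the data subject.

    `sources` is a hypothetical mapping of source name -> list of text records.
    Matching is naive: a case-insensitive email substring check, plus a
    digits-only phone comparison that can over-match in digit-heavy records.
    """
    email_lc = email.lower()
    phone_digits = normalize_phone(phone) if phone else None
    hits = []
    for name, records in sources.items():
        for record in records:
            if email_lc in record.lower():
                hits.append((name, record))
            elif phone_digits and phone_digits in normalize_phone(record):
                hits.append((name, record))
    return hits

# Made-up sample data standing in for real ticketing and logging systems.
sources = {
    "zendesk_tickets": [
        "Refund request from jane.doe@example.com",
        "Password reset for bob@example.com",
    ],
    "app_logs": [
        "login ok user=JANE.DOE@EXAMPLE.COM",
        "callback to +1 (555) 010-0199 scheduled",
    ],
}

matches = find_subject_records(sources, "jane.doe@example.com", phone="+1 555 010 0199")
for source, record in matches:
    print(source, "->", record)
```

Even this toy version shows why manual inspection does not scale: every new data source, identifier type and formatting variation adds another case to handle.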

Automated personal data visibility – the solution to challenges

Personal data visibility is the fundamental capability InfoSec and Privacy teams need to deal with the challenges described in the section above. You cannot protect what you don’t know about, and action without proper visibility is bound to fail.
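
As a rough illustration of what automated visibility involves at its simplest, the Python sketch below scans free text for a few personal data patterns. The regular expressions and the `scan_text` function are illustrative assumptions. They are deliberately simplistic, will produce false positives and misses, and are no substitute for a real discovery tool.

```python
import re

# Illustrative patterns only; real detectors need validation and context.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def scan_text(text):
    """Return a dict of pattern name -> list of matches found in `text`."""
    findings = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            findings[label] = found
    return findings

# Made-up sample text resembling a support ticket.
sample = "Ticket 4821: jane.doe@example.com called from 203.0.113.7, SSN 078-05-1120 on file."
findings = scan_text(sample)
for label, values in findings.items():
    print(label, values)
```

Pattern matching of this kind flags anything that merely looks like an identifier, which is one reason false positives are a recurring concern in personal data discovery.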

At Airavana, we enable organizations’ InfoSec and Privacy teams to gain unified visibility of their customers’ sensitive data sprawl across multiple cloud applications. Our first-principles, AI-powered approach drastically reduces false positives. This visibility further empowers enterprise teams to automate and enforce violation-monitoring policies across multiple cloud apps, and enables remediation to avoid data exposures (and breaches) and comply with privacy regulations like GDPR, CCPA, HIPAA, and more.
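
To make the idea of a violation-monitoring policy concrete, here is a minimal Python sketch. It is not a description of Airavana’s product: the `DataAsset` fields, the `find_violations` function, the rule set and the inventory entries are all hypothetical, every asset is assumed to hold personal data, and treating the `eu-` region prefix as “inside the EU” is a simplifying assumption.

```python
from dataclasses import dataclass

@dataclass
class DataAsset:
    """Hypothetical inventory entry for a store assumed to hold personal data."""
    name: str
    region: str
    publicly_accessible: bool
    encrypted: bool
    holds_eu_personal_data: bool

def find_violations(asset):
    """Check one asset against three example rules and return the violations."""
    violations = []
    if asset.publicly_accessible:
        violations.append("personal data is publicly accessible")
    if not asset.encrypted:
        violations.append("personal data is stored unencrypted")
    # Assumption: a region name starting with "eu-" means the data stays in the EU.
    if asset.holds_eu_personal_data and not asset.region.startswith("eu-"):
        violations.append("EU personal data stored outside an EU region")
    return violations

# Made-up inventory mirroring the examples from the challenges section.
inventory = [
    DataAsset("s3://medical-images", "eu-west-1", True, True, True),
    DataAsset("ec2://crm-db", "ap-south-1", False, False, True),
    DataAsset("s3://invoices", "eu-central-1", False, True, True),
]

report = {asset.name: find_violations(asset) for asset in inventory}
for name, violations in report.items():
    print(name, violations or "no violations")
```

Rules like these are only as good as the inventory behind them, which is why visibility has to come first.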

To know more, write to us at


Co-founder and CEO - Airavana Inc.
