Data Classification: Explaining the What, Why, and How [ + Free Template]

February 05, 2025

Author

Anna Fitzgerald

Senior Content Marketing Manager

Reviewer

Fortuna Gyeltsen

Senior Manager, Compliance and Product (Automation)

According to Statista, global data creation is projected to explode over the next decade, growing to more than 180 zettabytes by 2025 and over two thousand zettabytes by 2035. The rapid evolution of AI models is expected to drive this exponential growth in global data generation even more.

That means organizations are facing the challenge of storing and managing more data than ever before and safeguarding it against breaches. Data classification can help your organization address this challenge, enabling you to sort information according to risk level and set proper data security policies.

This guide will explain how to determine data classification levels, what methods you can use to classify data, what steps and best practices you need to follow to create a data classification policy, and more.

What is data classification?

Data classification is the process of identifying and categorizing data using consistent labels to help simplify the management, security, and storage of that data.

Data classification comes after the data discovery process. So first, you scan your environment to determine where structured and unstructured data resides. It will likely be spread across databases, cloud storage services, and files like PDFs and emails, among other sources. Then, within these discovered data sources, you identify different types of data and assign them labels based on characteristics like:

Sensitivity: Examples include high, medium, and low-sensitivity.
Type of data asset: Examples include vendor invoices, customer invoices, and employee records.
Source information: Examples include licensed, acquired, or internally created.
Geopolitical information: Examples include US person or EU entities.
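
As a rough illustration, the four characteristics above could be captured in one small record per data asset. This is a minimal sketch, not a prescribed schema: the class names, field names, and asset paths are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class DataLabel:
    """One set of classification labels attached to a discovered data asset."""
    sensitivity: Sensitivity
    asset_type: str    # e.g. "vendor invoice", "employee record"
    source: str        # e.g. "licensed", "acquired", "internal"
    jurisdiction: str  # e.g. "US person", "EU entity"


# Hypothetical asset locations mapped to their labels.
labels = {
    "s3://finance/invoices/2024-001.pdf": DataLabel(
        Sensitivity.MEDIUM, "vendor invoice", "internal", "US person"
    ),
    "hr-db.employees": DataLabel(
        Sensitivity.HIGH, "employee record", "internal", "EU entity"
    ),
}

# Consistent labels make it easy to pull out everything at a given level.
high_risk = [path for path, label in labels.items()
             if label.sensitivity is Sensitivity.HIGH]
```

Keeping the labels in a fixed structure like this is what makes later steps, such as filtering by sensitivity, straightforward.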

These labels are part of an organization’s data classification scheme. Now let’s take a closer look at why data classification is important.

Why data classification is important

The data your business creates includes everything from valuable intellectual property to calendar invites between coworkers and can pose a real challenge from a risk and data governance perspective. Your business needs a system for organizing both sensitive and low-priority data — that’s where data classification comes into play.

Data classification involves sorting information according to sensitivity level so you can set proper data security policies to mitigate the risk of that data being altered, stolen, or destroyed. But data classification offers benefits beyond mitigating risk.

Let’s take a closer look at the benefits of data classification so you understand why your organization should start classifying its data.

Risk management

Data classification policies should help you develop a sensible risk management strategy. Once you identify the value of your data, you can implement security measures to minimize the risk of that data being altered, stolen, or destroyed.

Data classification is therefore a key part of risk management and data loss prevention strategies.

Data security and retrieval

Data classification can also be useful for creating data security and retrieval processes by helping you to:

Organize data by importance
Safeguard high sensitivity data
Streamline data searches and retrieval

Doing so can help your organization reduce user access to sensitive data, install the right data protection technologies, optimize resource utilization for less critical data, and securely share data assets with partners, contractors, and other organizations.

Data deduplication

Data classification can not only help make data more searchable and trackable — it can also help eliminate duplicate data. This can help your employees speed up their search process and also reduce storage and backup costs for your organization as a whole.
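
One simple way to spot exact duplicates is to compare content hashes. The sketch below assumes file contents are already loaded into memory and uses made-up file names; it only catches byte-for-byte copies, not near-duplicates.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Keep only the groups that contain more than one file.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical sample data.
sample = {
    "q1_report.pdf": b"quarterly numbers",
    "q1_report (copy).pdf": b"quarterly numbers",
    "press_release.txt": b"hello world",
}
duplicates = find_duplicates(sample)
```

Each group returned is a set of files where all but one copy could be reviewed for removal.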

Organizational efficiency

Data classification policies can also help improve your organizational efficiency. For example, you can find and cut duplicate data to reduce storage and backup costs.

Regulatory compliance

Data classification can also help your organization comply with data privacy requirements and other rules and regulations by putting appropriate security controls in place and making data searchable and retrievable within specified timeframes.

Now that you know why data classification is worth the effort, we’ll walk you through how it’s accomplished.

Data sensitivity classification

Data classification requires you to assess the level of sensitivity of data across your organization. These levels typically range from high to medium to low and correlate to how damaging it would be if that data was lost, stolen, or compromised.

Classifying data in this way helps organizations understand where to focus their risk mitigation efforts. The more sensitive the data is, the more your organization needs to focus on protecting it.

Low sensitivity data

Low sensitivity data is data that would have little to no impact if compromised, lost, or destroyed (although an organization may still put security controls in place to protect against damages). Low sensitivity data is for public use and doesn’t require any confidentiality protections. It is commonly labeled unrestricted or public data, depending on the organization’s classification model.

Examples of low sensitivity data include:

Public information and web pages, such as job postings, blog posts, etc.
Press releases
Employee directory

Medium sensitivity data

Medium sensitivity data is data that would not have a catastrophic impact if compromised, lost, or destroyed but would result in some risk to an organization. This data should therefore only be accessible to internal personnel who were granted access and is commonly labeled internal or private.

Examples of medium sensitivity data include:

Internal emails or documents that don’t contain confidential data
Supplier contracts
IT service management or telecommunication information

High sensitivity data

High sensitivity data is data that if compromised, lost, or destroyed would have a catastrophic impact on an organization. Organizations must therefore place the strictest access controls on high sensitivity data. Because access is limited on a need-to-know basis, high sensitivity data is commonly labeled confidential or restricted data.

Examples of high sensitivity data include:

Financial records, such as credit card numbers
Medical and biometric data, including protected health information (PHI)
Employee records, including personally identifiable information (PII) like Social Security numbers
Authentication data, such as login credentials

[Figure: data sensitivity modeling table showing three levels of sensitivity with examples for each]

Data classification levels

Based on the sensitivity of data, among other factors, organizations can classify data into different levels. These may vary by organization.

Typically, an organization adopts a data classification scheme or framework to identify and categorize its data assets. This scheme is made up of three to five levels based on the criticality and sensitivity of data in order to help determine appropriate security controls. Levels are typically arranged from least to most sensitive.

Organizations should design their own data classification schemes based on their need to protect proprietary, business, and/or user data with varying levels of sensitivity and to meet compliance and regulatory requirements. However, they can use classification schemes developed by governments as well as private sector organizations as a starting point. Below are two examples.

Confidential data classification

For example, the U.S. government has three classification levels for data based on the potential impact to national security if it is disclosed. These are currently defined in Executive Order 13526, which superseded earlier orders including Executive Order 12356:

Confidential: Unauthorized disclosure of this information would likely cause damage to national security.
Secret: Unauthorized disclosure of this information would likely cause serious damage to national security.
Top Secret: Unauthorized disclosure of this information would likely cause exceptionally grave damage to national security.

NIST data classification

The National Institute of Standards and Technology (NIST) also has three levels for classifying federal information systems and information. This is defined in Federal Information Processing Standards (FIPS) 199. These three levels are based on the potential impact to not just confidentiality but also the integrity and availability of information and information systems applicable to an organization’s mission:

Low: The loss of confidentiality, integrity, or availability could be expected to have a limited adverse effect on organizational operations, organizational assets, or individuals.
Moderate: The loss of confidentiality, integrity, or availability could be expected to have a serious adverse effect on organizational operations, organizational assets, or individuals.
High: The loss of confidentiality, integrity, or availability could be expected to have a severe or catastrophic adverse effect on organizational operations, organizational assets, or individuals.
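
Because FIPS 199 rates confidentiality, integrity, and availability separately, you often need a way to collapse the three ratings into one level. A common convention, sometimes called the high-water mark, is to take the highest of the three. The sketch below assumes that convention; the type and function names are illustrative.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


def overall_impact(confidentiality: Impact,
                   integrity: Impact,
                   availability: Impact) -> Impact:
    """Return the highest of the three per-objective impact ratings."""
    return max(confidentiality, integrity, availability)


# Example: public web content where tampering would be more damaging
# than disclosure.
website_level = overall_impact(Impact.LOW, Impact.MODERATE, Impact.LOW)
```

In this example the overall level is driven by integrity rather than confidentiality, which is why the three objectives are worth rating separately.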

Organizations can use secondary labels within these levels to specify different data assets and handling procedures or compliance and regulatory requirements. For example, an organization that only collects financial records may classify that as “confidential data” but an organization that collects medical records may classify that more specifically as “protected health information” in order to indicate that HIPAA requirements apply to that data.

Data classification examples

While the NIST data classification scheme is widely recognized as an adequate classification scheme in sector-specific, national, and international certifications, organizations should develop their own classification schemes based on their unique organizational and risk management needs.

For inspiration, we’ll look at some examples of organizations and the classification policy they have implemented.

UW-Madison

UW-Madison classifies data into four categories, which are used to determine how to provision access to data to individuals. The categories are:

Public: The unauthorized disclosure, alteration or destruction of this data would result in little or no risk to the University and its affiliates. Any data displayed on websites or published without access restrictions should be classified as public.
Internal: The unauthorized disclosure, alteration or destruction of this data could result in some risk to the University and its affiliates. By default, any data that is not explicitly classified in the other three categories should be classified as internal.
Sensitive: The unauthorized disclosure, alteration, loss or destruction of this data could cause a moderate level of risk to the University, affiliates or research projects.
Restricted: The unauthorized disclosure, alteration, loss or destruction of that data could cause a significant level of risk to the University, affiliates or research projects. If protection of the data is required by law or regulation or UW-Madison is required to self-report to the government and/or provide notice to the individual if the data is inappropriately accessed, then it should be classified as restricted.

Harvard

Harvard classifies data into five levels:

L1: L1 refers to public information. The University intentionally provides this information to the public. Published research, course catalogs, regulatory and legal findings, published annual reports, released patents, and university-wide policies are all examples.
L2: L2 refers to low risk confidential information. The University chooses to keep this information private within the Harvard community, but its disclosure beyond the community would not cause material harm. Department policies and procedures, Harvard training materials, drafts of research papers, and patent and grant applications are all examples.
L3: L3 refers to medium risk confidential information. The University intends to share this information only with those with a “business need to know” and disclosure beyond the intended recipients might cause material harm to individuals or the University. Non-directory student information, non-published faculty and staff information, budget/financial transactions information, and information specified as confidential by vendor contracts and NDAs are all examples.
L4: L4 refers to high risk confidential information. The University has strict controls for this information and disclosure beyond specified recipients would likely cause serious harm to individuals or the University. Passwords and PINs, system credentials, and private encryption keys are all examples.
L5: L5 is reserved for research data only, as determined by IRB or Data Use Agreement. Data that, if disclosed, could place the subject at severe risk of harm or data with contractual requirements for exceptional security measures should be classified as L5.

Penn State

Penn State has four classification levels. In University Policy AD95, these four different information classification types are outlined as well as the security controls required for each of them:

Low (Level 1): Unauthorized access, use, disclosure, or loss is likely to have low or no risk to individuals, groups, or the University. Any data made freely available by public sources belongs to this level.
Moderate (Level 2): Unauthorized access, use, disclosure, or loss is likely to have adverse effects for individuals, groups, or the University, but will not have a significant impact on the University. Personnel records belong to this level.
High (Level 3): Unauthorized access, use, disclosure, or loss is likely to have significant and severe adverse effects for individuals, groups, or the University. HIPAA data belongs to this level.
Restricted (Level 4): Access and use are strictly controlled and restricted by laws, regulations, or contracts. PCI DSS data and data subject to FISMA moderate or high standards belong to this level.

University of Missouri

University of Missouri's data classification system is comprised of four data classification levels (DCLs), each with its own associated requirements:

DCL1 – Public: Public data is openly available and may be freely shared without risk to the University, individuals, or affiliates. Examples include press releases, job postings, published research, and training manuals.
DCL2 – Sensitive: Sensitive data is not intended for public access, and unauthorized disclosure could negatively impact the University, individuals, or affiliates, but the data is not specifically required to be protected by statute, regulation, or by department, division, or University policy. Examples include budget details, employee IDs, internal policies, and unpublished research.
DCL3 – Restricted: Restricted data is highly confidential business or personal information that is often protected by legal, regulatory, or contractual requirements. Unauthorized access could cause serious harm. Examples include non-public student records (which are protected under the Family Educational Rights & Privacy Act) and proprietary research.
DCL4 – Highly Restricted: Highly restricted data is subject to the strictest security requirements dictated by specific provisions in legal and regulatory mandates such as PCI DSS, HIPAA, and NIST 800-171. Unauthorized disclosure could severely impact the University, individuals, or affiliates. Examples include biometric data, e-commerce data, export controlled data, national security interest data, protected health information, social security numbers, and controlled unclassified information (CUI).

AWS

AWS recommends starting with a three-tiered data classification approach. Both public and commercial organizations that have adopted the AWS cloud have been able to sufficiently meet their data classification needs and requirements using the approach below.

Data classification tier	System security categorization	Cloud deployment model options
Unclassified	Low to High	Accredited public cloud
Official	Moderate to High	Accredited public cloud
Secret and above	Moderate to High	Accredited private/hybrid/community cloud/public cloud

Data classification methods

There are three primary ways in which your organization can perform data classification. Many organizations use some combination of all three.

Let’s take a brief look at how each method works in practice.

User-based classification

Under user-based classification, you manually decide how to classify files. You can flag sensitive documents when they’re created, after an edit, or before a document is released.

Content-based classification

Content-based classification involves reviewing files and documents for sensitive information before classifying them. A risk category is assigned based on what’s inside each file or document.
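
To make this concrete, here is a minimal sketch of a content-based check that flags text containing something shaped like a U.S. Social Security number or a payment card number. The patterns and the "high" and "unflagged" labels are simplifying assumptions for illustration; real scanners use far more robust detection and will still produce false positives and misses.

```python
import re

# Simplified patterns: an SSN written as 123-45-6789, and 13 to 19 digits
# optionally separated by spaces or hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for i, digit in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def classify_text(text: str) -> str:
    """Return "high" if the text appears to contain an SSN or card number."""
    if SSN_PATTERN.search(text):
        return "high"
    for match in CARD_PATTERN.finditer(text):
        if luhn_valid(match.group()):
            return "high"
    return "unflagged"
```

For example, classify_text("Card 4111 1111 1111 1111 expires soon") returns "high" because that well-known test number passes the Luhn check. An "unflagged" result only means these two patterns found nothing, so it should not be treated as proof that a file is low sensitivity.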

Context-based classification

Context-based classification uses metadata instead of content to find indicators of sensitive information.

Examples of metadata include:

The application that created the file (accounting, financial, or healthcare software)
The user who created the document (e.g., a member of the accounting department)
The location where a file was created (e.g., accounting department building)
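
A context-based rule can be sketched by counting how many of these metadata signals point toward sensitive content. Everything in this example is hypothetical, including the field names, the lists of sensitive applications and departments, and the thresholds.

```python
from dataclasses import dataclass


@dataclass
class FileMetadata:
    creating_app: str       # application that created the file
    author_department: str  # department of the user who created it
    location: str           # where the file was created


# Illustrative lists; a real deployment would maintain its own.
SENSITIVE_APPS = {"accounting", "financial", "healthcare"}
SENSITIVE_DEPARTMENTS = {"accounting", "hr", "legal"}


def classify_by_context(meta: FileMetadata) -> str:
    """Assign a level from metadata alone, without opening the file."""
    signals = 0
    if meta.creating_app.lower() in SENSITIVE_APPS:
        signals += 1
    if meta.author_department.lower() in SENSITIVE_DEPARTMENTS:
        signals += 1
    if "accounting" in meta.location.lower():
        signals += 1

    if signals >= 2:
        return "high"
    if signals == 1:
        return "medium"
    return "low"
```

Because the rule never reads file contents, it is fast but coarse, which is one reason to pair it with other methods.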

Both content and context-based classification are types of automated classification. While automated classification tends to be more efficient than user-based classification, you should still verify the results manually. That’s why organizations typically employ two or three of these methods.

Once you determine which classification system is right for your organization, you can kick off your data classification process.

Data classification process: How to do data classification

There are some key steps your organization should take during the data classification process.

1. Conduct a risk assessment

To start, you need to think strategically about your data. Where are you vulnerable? How can you optimize your protection?

Here are a few questions that can help you understand your data and what corporate, regulatory, and contractual privacy and confidentiality requirements apply to your organization:

Who creates or owns the information?
Who is responsible for the integrity and accuracy of the data?
Where is the information stored?
What sensitive data do we have?
Who has permission to access, change, archive, or delete the information?
How will it affect our business if the data is stolen, destroyed, or altered?
Is the information subject to any regulations or compliance/industry standards? If yes, what are the penalties for non-compliance?
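
If it helps, the answers to these questions can be recorded in a simple inventory so gaps are easy to spot. This is only a sketch with hypothetical field names and sample entries; it flags sensitive data sets that have no named owner.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InventoryEntry:
    name: str
    owner: Optional[str]            # who creates or owns the information
    location: str                   # where it is stored
    contains_sensitive_data: bool
    regulations: list = field(default_factory=list)  # e.g. ["HIPAA"]


def needs_follow_up(entries: list) -> list:
    """Return names of sensitive data sets that have no assigned owner."""
    return [e.name for e in entries
            if e.contains_sensitive_data and e.owner is None]


# Hypothetical inventory.
inventory = [
    InventoryEntry("patient_intake_forms", None, "shared drive", True, ["HIPAA"]),
    InventoryEntry("blog_drafts", "marketing", "CMS", False),
    InventoryEntry("payroll_exports", "finance", "HR system", True),
]
unowned = needs_follow_up(inventory)
```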

After answering these questions, you should understand your business’s data better. This will help you develop the best strategy for its management and protection.

2. Define your objectives and what you would like data categorization to achieve.

Next, clearly define your primary goals for data categorization. Do you want to inform regulatory compliance processes, increase employee productivity, or reduce data management and storage costs? All of the above? This step should involve stakeholders from security, compliance, and legal.

3. Determine the categories and criteria you will use to classify data.

Once you understand why you’re classifying your data, you can better determine how to do so. There are multiple ways you can organize data: using metadata, tags, file type, character units, and size of data packets are just a few examples.

You should also establish classification levels at this stage.

4. Formalize a data classification policy.

A data classification policy should clearly outline your organization’s objectives in putting a data classification process in place, the taxonomy that will be used to classify data, and the roles and responsibilities of data owners, including how they classify data and grant access to it.

A data classification policy is comprised of both the data classification scheme and the formal description of all data types within an organization. The purpose is to enable any affected parties, including external parties who share or receive data, to have a common understanding and identify different types of data.

5. Outline employees’ roles and responsibilities in following data classification protocols.

Employees should clearly understand that they’re responsible and accountable for how they use both sensitive and low-priority data. Document risk mitigation steps and automated policies so that employees know, for example, to move or archive PHI that has been unused for 180 days, and how to detect and report control failures or violations.
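
An automated policy like the 180-day PHI example could be expressed as a small rule. The sketch below assumes each record carries a classification label and a last-accessed date; the function and variable names are illustrative.

```python
from datetime import date, timedelta
from typing import Optional

# Retention windows per label. Only the PHI window comes from the
# example above; other labels would need their own documented values.
ARCHIVE_AFTER = {"PHI": timedelta(days=180)}


def should_archive(label: str, last_accessed: date,
                   today: Optional[date] = None) -> bool:
    """Return True when data with this label has sat unused long enough."""
    today = today or date.today()
    limit = ARCHIVE_AFTER.get(label)
    if limit is None:
        return False  # no archive rule documented for this label
    return (today - last_accessed) >= limit
```

For instance, should_archive("PHI", date(2024, 1, 1), date(2024, 7, 1)) returns True, since 182 days have passed.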

6. Develop security standards that align with data categories, tags, and compliance regulations.

Once data has been classified by category, tag, and/or compliance regulations, you can determine appropriate security controls for protecting it. For example, medical, credit card, and personally identifiable information (PII) must be handled appropriately for different regulations and therefore may require unique security standards.
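
One way to keep security standards aligned with classification levels is to define the controls each level adds and let higher levels inherit everything below them. The level names and controls in this sketch are placeholders, not a recommended baseline.

```python
# Levels ordered from least to most sensitive.
LEVELS = ["public", "internal", "confidential"]

# Controls introduced at each level (illustrative only).
CONTROLS_ADDED_AT = {
    "public": ["integrity monitoring"],
    "internal": ["authenticated access"],
    "confidential": ["encryption at rest", "need-to-know access reviews"],
}


def required_controls(level: str) -> list:
    """Return the controls for a level, including those of all lower levels."""
    index = LEVELS.index(level)  # raises ValueError for an unknown level
    controls = []
    for lower_level in LEVELS[: index + 1]:
        controls.extend(CONTROLS_ADDED_AT[lower_level])
    return controls
```

With this structure, tightening a lower level automatically tightens every level above it.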

7. Periodically re-evaluate your classification criteria and process.

Data classification is not a one-and-done process. You should periodically review your classification criteria and process as a whole to keep up with changing regulations and business objectives. This may be done on an annual basis or at whatever frequency is possible based on available resources.

Still unsure of what to include in your data classification policy? Use our template as a foundation to quickly create your own.

Download the free data classification policy template

Use this auditor-approved data classification policy template to better understand, manage, and protect your data.

Data classification policy examples

To aid in the development of your organization’s data classification policy, check out the examples below of data classification policies implemented at universities and other organizations.

1. The University of Kansas's Data Classification And Handling Policy

This policy outlines how data at the University of Kansas is classified and handled to ensure its confidentiality. It outlines its purpose, who it applies to, and defines three data classification levels based on sensitivity (confidential, sensitive, and public).

2. Boston University's Data Classification Policy

Boston University's Data Classification Policy provides a common vocabulary that individuals can use to describe University data and quantify the amount of protection required. This policy defines four categories into which all University Data can be divided:

Public
Internal
Confidential
Restricted Use

3. Fordham University’s Data Classification and Protection Policy

Fordham University’s Data Classification and Protection Policy establishes a framework for classifying institutional data based on its level of sensitivity, value, and criticality to the University. It defines three categories: Fordham Protected Data, Fordham Sensitive Data, and Public Data.

4. Macy's Data Classification Policy

Macy's policy classifies data into three levels:

Protected: Limited or non-personal information (PI) or non-business sensitive information (BSI) used for routine business or operations.
Sensitive: PI or BSI that is confidential to the company, including risk assessments, internal audits, purchasing information, primary account numbers (PANs) from Macy's credit cards, and vendor data.
Highly Sensitive: Legally, certificatory, or contract protected sensitive data including PANs for Macy’s co-brand credit cards, corporate strategic documents, sensitive customer and partner data, investigative reports, employee health data, and financial account data.

5. Stanford

Stanford's information security policy outlines its classification system. All University information is classified into one of four levels based on sensitivity and risk (Public, Confidential, Restricted, Prohibited). The classification level determines the security protections and access authorization mechanisms which must be used for the information. These are defined in the Stanford Minimum Security Guidelines.

Data classification best practices

Use these best practices to help build an effective data classification policy or improve an existing one.

Follow NIST guidelines: In 2023, NIST's National Cybersecurity Center of Excellence released the initial public draft of an internal report on data classification concepts and considerations for improving data protection. You should familiarize yourself with the basic terminology and fundamental concepts in data classification explained in NIST IR 8496.
Understand your data: You need to know what kind of data you have. Analyze your data and all regulations that your organization must follow. We’ll take a closer look at these regulations in the next section.
Create a data classification model: Next, you should build a data classification model. Start with a few basic classification levels. You can add more complex levels as needed.
Organize your data: Decide how to tag your data based on its level of sensitivity and potential impact. As the sensitivity increases from low to high, the classification level should also increase. Add more restrictions at each level.
Validate your results: All results, whether classified manually or automatically, should be reviewed and validated for accuracy. Create a process that clearly identifies who is involved and what steps are required to review and validate these results.
Figure out how your results can benefit your organization: Once you’ve validated your results, you can analyze them to determine their best use. Maybe they can be used to streamline workflows or enhance a data security policy that benefits your organization.
Change classification criteria as needed: Your classification criteria may need to be updated due to changes in business or new regulations. So you should establish a process not only for discovering and classifying new data but also for periodically reviewing your criteria.

Compliance frameworks for data classification

Compliance frameworks can be useful for building your data classification policies. There are several security frameworks that you should keep in mind when classifying data.

SOC 2

Systems and Organization Controls (SOC) 2 evaluates how a company’s security aligns with the Trust Services Criteria. These criteria include security, availability, confidentiality, processing integrity, and privacy.

This framework helps your organization manage customer data and third-party partner risk management.

While valuable, implementing SOC 2 can be complicated. Secureframe can help simplify your SOC 2 compliance.

HIPAA

The Health Insurance Portability and Accountability Act (HIPAA) created standards for safeguarding protected health information (PHI).

PHI is considered high-risk data. Healthcare organizations must follow strict cybersecurity practices to comply with HIPAA. You need procedures for classifying the data you collect, use, store, or transmit.

You can learn more about streamlining your HIPAA compliance here.

PCI DSS

The Payment Card Industry Data Security Standard (PCI DSS) requires businesses that handle credit card data to protect cardholders’ information.

Unlike government regulations, PCI DSS compliance is enforced by private payment card companies (Mastercard, Visa, etc.).

Learn how you can accelerate your PCI DSS compliance with Secureframe.

GDPR

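The General Data Protection Regulation (GDPR) protects the personal data of individuals in the European Union.

GDPR does not prescribe a specific classification scheme, but any organization that handles the personal data of individuals in the EU needs to know what personal data it holds and how sensitive it is in order to meet the regulation’s requirements. A data classification system, including a way of tagging data as public, proprietary, or confidential, supports this.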

You can get and maintain GDPR compliance securely with Secureframe — learn how.

How Secureframe can help simplify data classification

Secureframe can help you quickly set up a data classification policy that meets security and compliance requirements that apply to your organization and keep it up-to-date. Using the Secureframe platform, you can:

Start with a data classification policy template that’s been approved by former auditors or bring your own existing policy into the platform
Define data classification levels within this policy
Easily tailor this policy using Secureframe’s comprehensive policy editor with AI-powered text revisions
Once finalized, distribute this policy to employees and track acceptance
Use employee policy implementation and acceptance as evidence of adherence to corporate controls and framework requirements
Assign a policy owner and use version control to easily track changes and improve visibility

To learn more about how Secureframe can simplify data classification policy management and other aspects of compliance, request a demo.

FAQs

What is meant by data classification?

Data classification is the process of sorting data into different categories. This allows for easier data management, security, and storage.

What is data classification with example?

Data can be classified by sensitivity, from high to medium to low. High sensitivity data is data that if it were compromised, lost, or destroyed, would have a catastrophic impact on the organization. Examples of high sensitivity data include financial records, personally identifiable information (PII) or protected health information (PHI), authentication data, or proprietary data such as intellectual property.

What are the 3 main types of data classification?

The three main data types or categories that make up a data classification scheme are typically based on sensitivity:

Low sensitivity data may be classified as public.
Medium sensitivity data may be classified as internal or private.
High sensitivity data may be classified as confidential or restricted.

What is data classification in cyber security?

Data classification — the process of identifying and categorizing data based on its sensitivity, importance, and other predefined criteria — is foundational to cybersecurity. Once data is classified, organizations can apply the appropriate security and privacy requirements and controls to the data types in each classification level.

Anna Fitzgerald

Senior Content Marketing Manager

Anna Fitzgerald is a digital and product marketing professional with nearly a decade of experience delivering high-quality content across highly regulated and technical industries, including healthcare, web development, and cybersecurity compliance. At Secureframe, she specializes in translating complex regulatory frameworks—such as CMMC, FedRAMP, NIST, and SOC 2—into practical resources that help organizations of all sizes and maturity levels meet evolving compliance requirements and improve their overall risk management strategy.

Fortuna Gyeltsen

Senior Manager, Compliance and Product (Automation)

Fortuna Gyeltsen is a former auditor and security consultant with nearly fifteen years of experience in security, privacy, and compliance. As a consultant for Blue Canopy and Coalfire, she developed deep expertise in FISMA, ISO 27001, SOC 2, PCI DSS, BSI C5, and DoD IL 4 and 5. At Secureframe, she worked as a Senior Manager of Compliance and now of Product to help customers automate more of the compliance process so they can focus on big picture improvements rather than shallow work.
