Amazon Macie

A fully managed data security service that uses machine learning and pattern matching to discover and classify sensitive data in Amazon S3, the only data store it scans natively. Macie answers "where is my sensitive data, and is it exposed," not "is something attacking me" (GuardDuty) or "is there a known vulnerability" (Inspector).

Detective: data discovery | Data Protection: primary role | Compliance: GDPR/HIPAA support

1 GB: free sensitive-data scanning per account per region per month
30 days: free trial
100+: managed data identifiers
10,000: buckets covered by posture monitoring

How Macie Actually Works

The Core Mechanism — Two Distinct Jobs, Often Confused

Bucket-level posture evaluation — the moment Macie is enabled, it automatically and continuously inventories every S3 bucket in scope and evaluates whether each is public, unencrypted, or shared/replicated outside your AWS Organization. This is always-on and free — it happens regardless of whether you ever run a data-classification scan.
Sensitive data discovery is the separate, billable layer: Macie actually opens objects and inspects their contents for sensitive data using ML and pattern matching against managed data identifiers (PII, financial data, credentials) and any custom identifiers you define.
Automated sensitive data discovery — the default, cost-efficient mode. Macie intelligently samples a subset of objects (using resource clustering by bucket name, file type, and prefix to avoid redundant scanning) to build and continuously maintain a sensitivity score and interactive data map for every bucket, without needing to scan every object.
Targeted (classic) discovery jobs — an on-demand or scheduled job you explicitly configure to scan specific buckets, optionally in full rather than sampled, typically used to satisfy a specific compliance requirement (e.g. a full HIPAA-driven scan of a particular bucket).
Sample retrieval — once sensitive data is found, Macie can temporarily retrieve up to 10 KMS-encrypted examples so an analyst can validate the finding without manually digging through the object.

⚠️ The Recurring Exam Theme

Nearly every Macie question tests one of three things: (1) do you know Macie's sensitive-data scanning scope is S3 only — it does not natively scan RDS, DynamoDB, EBS, or EFS, (2) can you distinguish automated discovery (continuous, sampled, low-cost) from a targeted job (on-demand, can be full-scan, more expensive), and (3) can you separate Macie's job (classify sensitive data) from GuardDuty S3 Protection's job (detect anomalous access to that data).

Exam Domain Mapping

Domain	Where Macie Shows Up
Data Protection	The centerpiece — sensitive data classification, managed/custom identifiers, encryption-aware scanning, GDPR/HIPAA use cases
Security Logging & Monitoring	Findings to Security Hub/EventBridge, automated remediation of exposed buckets
Management & Security Governance	Organizations-wide enablement, per-account automated discovery control, exclusion lists
Infrastructure Security	Bucket-level posture findings (public access, encryption, external sharing)

Decision Tree — Mental Model

Threat

Unknown locations of sensitive data (PII, financial data, credentials) in S3; publicly exposed or unencrypted buckets containing that data; regulatory non-compliance (GDPR/HIPAA)

↓

Security Goal

Discover where sensitive data lives across S3, continuously and cost-efficiently, and evaluate the security posture of the buckets holding it

↓

AWS Service

Amazon Macie

Bucket posture evaluation (always-on, free)
Automated sensitive data discovery (sampled, continuous)
Targeted discovery jobs (on-demand, full-scan capable)
Sample retrieval (validate findings)

↓

Implementation

Enable via Organizations delegated administrator. Configure automated discovery scope per account; exclude non-sensitive buckets. Define custom identifiers and allow lists.

↓

Monitoring

Findings → EventBridge (routing) and Security Hub (aggregation). Sensitivity scores guide which buckets need a deeper targeted job.

↓

Remediation

EventBridge-triggered Lambda blocks public access, enables default encryption, or revokes external sharing on a flagged bucket.
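
The remediation step above could be sketched as a Lambda handler. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes the EventBridge event carries the Macie finding under `detail`, with the bucket name at `detail.resourcesAffected.s3Bucket.name`, and that the function's role is allowed to call `s3:PutBucketPublicAccessBlock`. The helper name `build_block_request` is made up for illustration.

```python
"""Sketch: Lambda that applies S3 Block Public Access to the bucket named in a Macie finding."""


def build_block_request(event):
    """Turn an EventBridge event into kwargs for s3.put_public_access_block.

    Returns None when the event does not name a bucket.
    """
    bucket = (
        event.get("detail", {})
        .get("resourcesAffected", {})
        .get("s3Bucket", {})
        .get("name")
    )
    if not bucket:
        return None
    return {
        "Bucket": bucket,
        "PublicAccessBlockConfiguration": {
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    }


def lambda_handler(event, context):
    request = build_block_request(event)
    if request is None:
        return {"remediated": False, "reason": "no bucket in event"}

    import boto3  # imported lazily so the helper above can be exercised offline

    boto3.client("s3").put_public_access_block(**request)
    return {"remediated": True, "bucket": request["Bucket"]}
```

Keeping the request builder separate from the AWS call makes the parsing logic easy to check locally before the function is wired to a rule.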

Final Summary

Must Memorize

Macie's sensitive-data scanning scope is S3 only
Bucket posture evaluation is always-on and free; sensitive-data scanning is billable
Automated discovery (sampled, continuous) vs targeted jobs (on-demand, can be full-scan)
Macie = data classification, not anomalous access detection (that's GuardDuty S3 Protection)

Must Understand

Why resource clustering/sampling makes automated discovery cost-efficient at scale
Managed vs custom data identifiers, and when allow lists matter
The EventBridge-driven remediation pattern for exposed buckets
The distinction triangle: Macie (what data exists) vs GuardDuty S3 Protection (who's accessing it) vs Inspector (vulnerabilities)

Can De-prioritize

Exact list of all 100+ managed data identifiers
Console UI navigation specifics
Precise regional pricing figures

Exam appearance probability: HIGH

Discovery Mechanics & Capabilities

Macie's two layers — posture evaluation and content discovery — operate independently. Understanding which layer a finding came from is half the exam battle.

2.1 Automatic Bucket-Level Posture Evaluation (frequently misunderstood)

Trigger: Automatic the moment Macie is enabled; no job configuration needed

Checks: Public accessibility, encryption status, sharing/replication outside the AWS Organization

Cost: Free; part of the always-on inventory, distinct from billable content scanning

This layer runs whether or not you ever enable sensitive data discovery — many candidates incorrectly assume all Macie functionality is billable.
Now scales to monitor up to 10,000 general purpose S3 buckets per account for this preventative posture monitoring.

2.2 Automated Sensitive Data Discovery (high exam relevance)

What: Continuous, intelligently sampled scanning that builds an interactive data map and sensitivity score per bucket

How: Resource clustering by bucket name, file type, and prefix minimizes redundant scanning across similar objects

Default mode for new Macie deployments — the cost-efficient way to get organization-wide visibility without scanning every object in every bucket.
A delegated Macie administrator can enable/disable automated discovery per individual account, and exclude specific S3 buckets from the analysis — granular scope control at the organization level.
Member accounts (non-administrator) have read access to their own automated-discovery statistics and inventory data.
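
To see why clustering cuts scan volume, here is a toy illustration. It is not Macie's actual algorithm; it only groups hypothetical object keys by (bucket, top-level prefix, file extension) and picks one representative per group. All bucket and key names are invented.

```python
"""Toy illustration of cluster-then-sample; not Macie's real algorithm."""
import os
from collections import defaultdict


def cluster_key(bucket, key):
    """Group objects by bucket, top-level prefix, and file extension."""
    prefix = key.split("/", 1)[0] if "/" in key else ""
    extension = os.path.splitext(key)[1].lower()
    return (bucket, prefix, extension)


def sample_one_per_cluster(objects):
    """Return the first object seen in each cluster."""
    clusters = defaultdict(list)
    for bucket, key in objects:
        clusters[cluster_key(bucket, key)].append((bucket, key))
    return [members[0] for members in clusters.values()]


objects = [
    ("hr-data", "payroll/2023-01.csv"),
    ("hr-data", "payroll/2023-02.csv"),
    ("hr-data", "payroll/2023-03.csv"),
    ("hr-data", "resumes/alice.pdf"),
    ("web-assets", "img/logo.png"),
    ("web-assets", "img/banner.png"),
]

sampled = sample_one_per_cluster(objects)
print(f"{len(sampled)} of {len(objects)} objects selected")  # 3 of 6 objects selected
```

The three payroll CSVs collapse into one cluster, so inspecting one of them already says a lot about the other two. That is the intuition behind sampling being cheaper than a full scan, and also why a sampled result cannot prove every object was inspected.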

2.3 Targeted (Classic) Sensitive Data Discovery Jobs (high exam relevance)

What: Explicitly configured, on-demand or scheduled job scanning specific buckets; can scan fully, not just sampled
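
A one-time, full-scan job of this kind could be requested through the boto3 `macie2` client's `create_classification_job` call. The sketch below only builds the request. The account ID, bucket name, and job name are placeholders, and `full_scan_job_request` and `submit` are hypothetical helper names.

```python
"""Sketch: parameters for a one-time, full-scan Macie classification job."""
import uuid


def full_scan_job_request(account_id, bucket_name, job_name):
    """Build kwargs for the boto3 macie2 client's create_classification_job call."""
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "jobType": "ONE_TIME",             # on-demand; "SCHEDULED" also needs scheduleFrequency
        "name": job_name,
        "samplingPercentage": 100,         # inspect every eligible object, no sampling
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": [bucket_name]}
            ]
        },
    }


def submit(request):
    """Submit the job; needs boto3, credentials, and Macie enabled in the region."""
    import boto3  # lazy import keeps the builder usable without AWS access

    return boto3.client("macie2").create_classification_job(**request)["jobId"]


# Placeholder values for illustration only.
request = full_scan_job_request("111122223333", "phi-archive-example", "hipaa-full-scan")
```

Setting `samplingPercentage` to 100 is what separates an audit-grade full scan from a sampled one in this request.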

Used when automated discovery's sampling isn't sufficient — e.g. a compliance mandate (HIPAA, GDPR) requiring a documented full scan of a specific bucket at a specific point in time.
"Run a full, on-demand scan of this specific bucket to satisfy an audit requirement" → targeted job, not automated discovery.

2.4 Managed & Custom Data Identifiers

Managed identifiers: 100+ built-in patterns: PII, financial data (credit cards), credentials (AWS keys, Stripe keys, Google Cloud keys), regional government IDs

Custom identifiers: Regex-based patterns you define for proprietary/organization-specific sensitive data formats

Macie also supports a configurable default set of managed identifiers specifically recommended for discovery jobs, or a custom subset you choose.
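
Because custom identifiers are regex-based, it can help to prototype the pattern locally first. The sketch below uses Python's `re` module on a hypothetical internal format (`EMP-` followed by six digits). Macie's regex dialect may differ from Python's, so treat this as a first check only.

```python
"""Sketch: prototyping a regex for a custom data identifier with Python's re module."""
import re

# Hypothetical internal format: "EMP-" followed by exactly six digits.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")


def find_employee_ids(text):
    """Return every substring that matches the hypothetical employee ID format."""
    return EMPLOYEE_ID.findall(text)


sample = "Ticket raised by EMP-004217; escalated to EMP-118830. Ref EMP-12 is invalid."
print(find_employee_ids(sample))  # ['EMP-004217', 'EMP-118830']
```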

2.5 Allow Lists

Purpose: Suppress known false positives, e.g. test/synthetic data that matches a sensitive-data pattern but isn't actually sensitive
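
Conceptually, an allow list filters known-safe values out of the raw pattern matches. The toy below illustrates that effect only. Macie applies allow lists itself during analysis, and the pattern, values, and function name here are invented for the example.

```python
"""Toy illustration of what an allow list does to raw pattern matches."""
import re

CARD_LIKE = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")

# Known synthetic values used in test fixtures (hypothetical examples).
ALLOW_LIST = {"4111-1111-1111-1111", "0000-0000-0000-0000"}


def reportable_matches(text, allow_list=ALLOW_LIST):
    """Return pattern matches that are not on the allow list."""
    return [m for m in CARD_LIKE.findall(text) if m not in allow_list]


text = "fixture card 4111-1111-1111-1111, customer card 5500-0055-5555-5559"
print(reportable_matches(text))  # ['5500-0055-5555-5559']
```

Contrast this with a custom identifier, which adds patterns to look for rather than removing known-safe hits.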

2.6 Sensitive Data Sample Retrieval

What: One-click, temporary retrieval of up to 10 examples of the sensitive data found in an object

Security: Encrypted with customer-managed KMS keys, viewable only temporarily within the console

Lets an analyst validate a finding (is this really a credit card number, or a false positive) without independently digging through the raw object.

2.7 Encryption-Aware Scanning

What: Supports analyzing objects encrypted with dual-layer server-side encryption using KMS keys (DSSE-KMS)

AWS Exam Thinking

Requirement → Keywords → Expected Answer → why every distractor fails.

Find where PII/financial data lives across S3

Keywords: PII, where is sensitive data, GDPR / HIPAA

Expected Answer

Amazon Macie

Distractor	Why it's wrong
`GuardDuty S3 Protection`	Detects anomalous access patterns, doesn't classify the data's content/sensitivity
`Inspector`	Finds software vulnerabilities, not data content
`AWS Config`	Evaluates configuration compliance, not object content

Detect a publicly exposed S3 bucket without running a data scan

Keywords: public bucket, no scan required, free

Expected Answer

Macie's automatic bucket-level posture evaluation (always-on, free)

Distractor	Why it's wrong
Run a targeted sensitive data discovery job	Unnecessary cost/complexity — posture evaluation already covers public/encryption/sharing status automatically
`AWS Config + S3 public access rule`	Valid alternative, but Macie's posture evaluation is purpose-built and free the moment Macie is enabled

Continuously, cost-efficiently discover sensitive data org-wide

Keywords: continuous, cost-efficient, organization-wide

Expected Answer

Macie automated sensitive data discovery

Distractor	Why it's wrong
Run a full targeted discovery job on every bucket every day	Massively more expensive — automated discovery's sampling/clustering is specifically built to avoid this cost

Satisfy an audit requirement for a documented full scan of a specific bucket

Keywords: audit, full scan, specific bucket

Expected Answer

Macie targeted (classic) sensitive data discovery job

Distractor	Why it's wrong
Rely on automated discovery's sampled results	Sampling, by design, doesn't guarantee every object was inspected — insufficient for a documented full-scan audit requirement

Suppress repeated false-positive findings on known test data

Keywords: false positive, test data

Expected Answer

Macie allow list

Distractor	Why it's wrong
Custom data identifier	Used to find MORE specific sensitive patterns, not to suppress known-safe ones

Discover sensitive data stored in RDS / DynamoDB / EBS

Keywords: RDS, DynamoDB, non-S3

Expected Answer

Not Macie — Macie's content scanning is S3-only

Distractor	Why it's wrong
Macie	This is the classic trap: Macie does NOT natively scan RDS, DynamoDB, EBS, or EFS for sensitive data. The data would first have to be exported to S3, or a different tool used entirely

Security Controls Mapping & Integrations

4 — Controls Mapping

Detective

Sensitive data discovery findings and bucket posture findings — e.g. "S3 object contains PII" or "bucket is publicly accessible"

Data Protection

Macie's core identity — classification of sensitive data and assessment of the controls (encryption, access) protecting it

Responsive (via integration, not native)

EventBridge → Lambda triggered by a finding — e.g. automatically apply S3 Block Public Access or enable default bucket encryption when an exposed-sensitive-bucket finding fires

Compliance

Supports GDPR/HIPAA-driven data mapping and audit requirements through targeted discovery jobs and continuous data maps

⚠️ Macie is NOT preventive and does NOT detect anomalous access

It classifies what data exists and evaluates static bucket posture. It does not block access, and it does not flag unusual GetObject patterns (that's GuardDuty S3 Protection's job).

5 — Integrations

EventBridge

What: All Macie findings are sent to EventBridge

Pattern: Finding → rule → Lambda (e.g. automatically block public access to a flagged bucket)
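
The rule half of this pattern could look like the event pattern below, written as a Python dict. It assumes Macie events arrive with `source` set to `aws.macie`, `detail-type` set to `Macie Finding`, and the finding type `Policy:IAMUser/S3BucketPublic` for a public bucket. The `matches` function is a simplified stand-in for EventBridge's matching that handles only exact-value lists, for local reasoning.

```python
"""Sketch: an EventBridge event pattern for Macie public-bucket findings, plus a toy matcher."""
import json

EVENT_PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
    "detail": {"type": ["Policy:IAMUser/S3BucketPublic"]},
}


def matches(pattern, event):
    """Simplified matching: every pattern field must list the event's value.

    Real EventBridge patterns support more operators (prefix, anything-but, ...).
    """
    for field, expected in pattern.items():
        actual = event.get(field)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


public_finding = {
    "source": "aws.macie",
    "detail-type": "Macie Finding",
    "detail": {"type": "Policy:IAMUser/S3BucketPublic"},
}
pii_finding = {
    "source": "aws.macie",
    "detail-type": "Macie Finding",
    "detail": {"type": "SensitiveData:S3Object/Personal"},
}

print(json.dumps(EVENT_PATTERN))  # JSON text for a rule's event pattern
print(matches(EVENT_PATTERN, public_finding), matches(EVENT_PATTERN, pii_finding))  # True False
```

Filtering on the finding type keeps the remediation Lambda from firing on sensitive-data findings that call for review rather than an automatic configuration change.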

Security Hub

What: Findings can be published into Security Hub for aggregation

Why: Cross-service correlation. Exposure findings can combine Macie's "sensitive data present" signal with GuardDuty/Inspector signals on the same resource

Organizations

What: Delegated administrator model with multi-account support

Capability: Per-account enable/disable of automated discovery, bucket exclusions, member account read access to their own stats

KMS

What: Encrypts retrieved sensitive data samples; supports analyzing DSSE-KMS encrypted objects

Athena & QuickSight

What: Macie discovery results can be queried/visualized via Athena and QuickSight for custom reporting

Costs, Limits & Quotas

Pricing Model

Bucket posture evaluation: Always free, regardless of other Macie usage

Sensitive data discovery: First 1 GB per account per region per month free; billed thereafter based on data evaluated

Trial: 30-day free trial including automated discovery and S3 bucket-level evaluation
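
The free-allowance arithmetic can be sketched as follows. The per-GB rate is a made-up placeholder, not an AWS price, and a single flat rate is a simplification, so the output is only useful for comparing a full scan against a sampled one.

```python
"""Sketch: billable volume after the monthly free allowance."""

FREE_GB_PER_MONTH = 1.0  # per account, per region, as stated above


def billable_gb(evaluated_gb, free_gb=FREE_GB_PER_MONTH):
    """Data volume charged after the free allowance is used up."""
    return max(0.0, evaluated_gb - free_gb)


def estimate_cost(evaluated_gb, rate_per_gb):
    """Estimated charge; rate_per_gb is a caller-supplied placeholder, not an AWS price."""
    return billable_gb(evaluated_gb) * rate_per_gb


HYPOTHETICAL_RATE = 1.00  # illustrative only; check current regional pricing

full_scan = estimate_cost(500.0, HYPOTHETICAL_RATE)  # inspect all 500 GB
sampled = estimate_cost(25.0, HYPOTHETICAL_RATE)     # assume only 25 GB of the 500 GB is sampled
print(full_scan, sampled)  # 499.0 24.0
```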

Common Cost Mistakes

Running full targeted jobs across every bucket when automated discovery's sampling would provide sufficient ongoing visibility at far lower cost
Not excluding clearly non-sensitive buckets (e.g. public website assets, build artifacts) from automated discovery scope
Forgetting that storage class and object count both factor into evaluation cost — very large or numerous-object buckets cost more to scan

Cost Optimization

Default to automated discovery; reserve targeted full-scan jobs for specific compliance-driven needs
Exclude non-sensitive buckets from automated discovery scope at the delegated administrator level
Use allow lists to avoid wasted analyst time on repeated known-false-positive findings (this saves operational time, not scan cost)

Limits & Quotas

Scope: Regional service; enable per region

Data source: S3 only for content/sensitive-data scanning; no native RDS/DynamoDB/EBS/EFS support

Posture monitoring scale: Up to 10,000 general purpose S3 buckets per account

Custom identifiers: Regex-based, account-defined, count-limited per account

⚠️ Exam trap

The S3-only scope is the single most tested limitation. A scenario asking to discover PII inside an RDS database or DynamoDB table is explicitly testing whether you know Macie cannot do this natively — data would need to be exported/extracted to S3 first, or a different tool used entirely.

Best Practices & Common Exam Traps

8 — Best Practices

Must Know

Macie's content scanning scope is S3 only
Bucket posture evaluation is always-on and free; content scanning is billable
Automated discovery (sampled, continuous, cheap) vs targeted jobs (on-demand, full-scan capable, audit-grade)
Macie classifies data; it does not detect anomalous access (GuardDuty's job) or block anything

Good Practice

Enable org-wide via delegated administrator
Exclude clearly non-sensitive buckets from automated discovery to control cost
Forward findings to EventBridge for automated remediation of exposed buckets
Use allow lists to cut down on repeated known-false-positive review time

Advanced Practice

Build custom data identifiers for organization-specific proprietary data formats
Use Athena/QuickSight to build custom compliance dashboards from discovery results
Schedule targeted jobs specifically aligned to compliance audit cycles (e.g. quarterly HIPAA scans of PHI buckets)

9 — Common Exam Traps

Misconception	Reality
"Macie scans RDS, DynamoDB, or EBS for sensitive data"	Macie's content discovery is S3-only — no native support for other data stores
"All Macie functionality costs money"	Bucket-level posture evaluation (public/encrypted/shared status) is always free and automatic
"Automated discovery scans every object in every bucket"	It intelligently samples using resource clustering — it does not guarantee every object was inspected, unlike a full targeted job
"Macie detects who is accessing sensitive data and when"	That's GuardDuty S3 Protection's job — Macie only classifies what sensitive data exists, not access patterns
"Macie blocks public access automatically"	Macie only generates a finding — actually blocking access requires a separate remediation action (e.g. S3 Block Public Access via EventBridge/Lambda)

Macie vs. The Lookalikes

Service	What it actually answers
vs GuardDuty S3 Protection	Macie = what sensitive data exists in this bucket (content classification). GuardDuty S3 Protection = is access to this bucket behaving anomalously (behavioral). Complementary — Macie tells you what's at risk, GuardDuty tells you if it's being attacked
vs Inspector	Macie = sensitive data content. Inspector = software vulnerabilities. Different problem spaces entirely, occasionally paired as distractors
vs AWS Config	Config evaluates resource CONFIGURATION against rules generally. Macie's posture evaluation is purpose-built specifically for S3 (public/encrypted/shared) plus the unique content-classification layer Config cannot do at all
vs Security Hub	Macie generates the underlying sensitive-data finding. Security Hub aggregates it alongside GuardDuty/Inspector/CSPM findings to build correlated exposure findings — Security Hub doesn't classify data itself
