Amazon Macie
A fully managed data security service that uses machine learning and pattern matching to discover and classify sensitive data in Amazon S3. Macie answers "where is my sensitive data, and is it exposed," not "is something attacking me" (GuardDuty) or "is there a known vulnerability" (Inspector).
Detective — data discovery
Data Protection — primary role
Compliance — GDPR/HIPAA support
1 GB: free sensitive-data discovery per account, per Region, per month
100+: managed data identifiers
10,000: buckets covered by posture monitoring per account
How Macie Actually Works
The Core Mechanism — Two Distinct Jobs, Often Confused
- Bucket-level posture evaluation: the moment Macie is enabled, it automatically and continuously inventories every S3 bucket in scope and evaluates whether each is public, unencrypted, or shared/replicated outside your AWS Organization. This is always-on and involves no content scanning: it runs regardless of whether you ever run a data-classification scan, and it is billed per bucket evaluated (free only during the 30-day trial) rather than per GB inspected.
- Sensitive data discovery is the separate layer, billed by the amount of data inspected: Macie actually opens objects and inspects their contents for sensitive data using ML and pattern matching against managed data identifiers (PII, financial data, credentials) and any custom identifiers you define.
- Automated sensitive data discovery — the default, cost-efficient mode. Macie intelligently samples a subset of objects (using resource clustering by bucket name, file type, and prefix to avoid redundant scanning) to build and continuously maintain a sensitivity score and interactive data map for every bucket, without needing to scan every object.
- Targeted (classic) discovery jobs — an on-demand or scheduled job you explicitly configure to scan specific buckets, optionally in full rather than sampled, typically used to satisfy a specific compliance requirement (e.g. a full HIPAA-driven scan of a particular bucket).
- Sample retrieval — once sensitive data is found, Macie can temporarily retrieve up to 10 KMS-encrypted examples so an analyst can validate the finding without manually digging through the object.
⚠️ The Recurring Exam Theme
Nearly every Macie question tests one of three things: (1) do you know Macie's sensitive-data scanning scope is S3 only — it does not natively scan RDS, DynamoDB, EBS, or EFS, (2) can you distinguish automated discovery (continuous, sampled, low-cost) from a targeted job (on-demand, can be full-scan, more expensive), and (3) can you separate Macie's job (classify sensitive data) from GuardDuty S3 Protection's job (detect anomalous access to that data).
Exam Domain Mapping
| Domain | Where Macie Shows Up |
|---|---|
| Data Protection | The centerpiece: sensitive data classification, managed/custom identifiers, encryption-aware scanning, GDPR/HIPAA use cases |
| Security Logging & Monitoring | Findings to Security Hub/EventBridge, automated remediation of exposed buckets |
| Management & Security Governance | Organizations-wide enablement, per-account automated discovery control, exclusion lists |
| Infrastructure Security | Bucket-level posture findings (public access, encryption, external sharing) |
Decision Tree — Mental Model
Threat
Unknown locations of sensitive data (PII, financial data, credentials) in S3; publicly exposed or unencrypted buckets containing that data; regulatory non-compliance (GDPR/HIPAA)
↓
Security Goal
Discover where sensitive data lives across S3, continuously and cost-efficiently, and evaluate the security posture of the buckets holding it
↓
AWS Service
Amazon Macie
Bucket posture evaluation (always-on, no content scan)
Automated sensitive data discovery (sampled, continuous)
Targeted discovery jobs (on-demand, full-scan capable)
Sample retrieval (validate findings)
↓
Implementation
Enable via Organizations delegated administrator. Configure automated discovery scope per account; exclude non-sensitive buckets. Define custom identifiers and allow lists.
↓
Monitoring
Findings → EventBridge (routing) and Security Hub (aggregation). Sensitivity scores guide which buckets need a deeper targeted job.
↓
Remediation
EventBridge-triggered Lambda blocks public access, enables default encryption, or revokes external sharing on a flagged bucket.
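The remediation step can be sketched as a small Lambda handler. This is a minimal illustration, not a production runbook: it assumes the finding arrives as an EventBridge event whose `detail.type` and `detail.resourcesAffected.s3Bucket.name` fields carry the finding type and bucket name, and it only handles the public-access case. The helper name `remediate` and the injected `s3_client` parameter are choices made here so the logic can be exercised without AWS credentials.

```python
from typing import Any, Dict, Optional

# Macie policy finding types that indicate a bucket is, or can become, public.
PUBLIC_EXPOSURE_TYPES = {
    "Policy:IAMUser/S3BucketPublic",
    "Policy:IAMUser/S3BlockPublicAccessDisabled",
}


def remediate(event: Dict[str, Any], s3_client: Any) -> Optional[str]:
    """Turn on S3 Block Public Access for the bucket named in a Macie finding.

    Returns the bucket name when remediation was applied, otherwise None.
    """
    detail = event.get("detail", {})
    if detail.get("type") not in PUBLIC_EXPOSURE_TYPES:
        return None  # not a public-exposure finding; leave the bucket alone

    bucket = detail["resourcesAffected"]["s3Bucket"]["name"]
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    return bucket


def lambda_handler(event, context):
    # boto3 is imported lazily so the module itself loads without AWS dependencies.
    import boto3

    return remediate(event, boto3.client("s3"))
```

Passing the client in keeps the decision logic testable with a stub; encryption or external-sharing remediation would follow the same shape with different S3 calls.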
Final Summary
Must Memorize
- Macie's sensitive-data scanning scope is S3 only
- Bucket posture evaluation is always-on and needs no scan job (billed per bucket evaluated after the 30-day trial); sensitive-data discovery is billed separately by data inspected
- Automated discovery (sampled, continuous) vs targeted jobs (on-demand, can be full-scan)
- Macie = data classification, not anomalous access detection (that's GuardDuty S3 Protection)
Must Understand
- Why resource clustering/sampling makes automated discovery cost-efficient at scale
- Managed vs custom data identifiers, and when allow lists matter
- The EventBridge-driven remediation pattern for exposed buckets
- The distinction triangle: Macie (what data exists) vs GuardDuty S3 Protection (who's accessing it) vs Inspector (vulnerabilities)
Can De-prioritize
- Exact list of all 100+ managed data identifiers
- Console UI navigation specifics
- Precise regional pricing figures
Exam appearance probability: HIGH
Discovery Mechanics & Capabilities
Macie's two layers — posture evaluation and content discovery — operate independently. Understanding which layer a finding came from is half the exam battle.
2.1 Automatic Bucket-Level Posture Evaluation (frequently misunderstood)
Trigger: Automatic the moment Macie is enabled; no job configuration needed
Checks: Public accessibility, encryption status, sharing/replication outside the AWS Organization
Cost: Billed per bucket evaluated each month (free during the 30-day trial), separate from the per-GB charge for content scanning
- This layer runs whether or not you ever enable sensitive data discovery; many candidates incorrectly assume every Macie capability requires a content scan.
- Scales to monitor up to 10,000 general purpose S3 buckets per account for this preventative posture monitoring.
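To make the two layers concrete, the sketch below sorts a finding into its layer by its type string. It assumes Macie's convention that posture findings have types beginning with `Policy:` and content findings have types beginning with `SensitiveData:`; the function name `finding_layer` is hypothetical.

```python
def finding_layer(finding_type: str) -> str:
    """Return which Macie layer produced a finding, based on its type prefix."""
    if finding_type.startswith("Policy:"):
        return "posture"   # bucket-level evaluation, no object was opened
    if finding_type.startswith("SensitiveData:"):
        return "content"   # sensitive data discovery inspected an object
    raise ValueError(f"unrecognized Macie finding type: {finding_type!r}")


for t in ("Policy:IAMUser/S3BucketPublic", "SensitiveData:S3Object/Financial"):
    print(t, "->", finding_layer(t))
```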
2.2 Automated Sensitive Data Discovery (high exam relevance)
What: Continuous, intelligently sampled scanning that builds an interactive data map and sensitivity score per bucket
How: Resource clustering by bucket name, file type, and prefix minimizes redundant scanning across similar objects
- Default mode for new Macie deployments — the cost-efficient way to get organization-wide visibility without scanning every object in every bucket.
- A delegated Macie administrator can enable/disable automated discovery per individual account, and exclude specific S3 buckets from the analysis — granular scope control at the organization level.
- Member accounts (non-administrator) have read access to their own automated-discovery statistics and inventory data.
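Why clustering cuts cost can be shown with a toy model. The sketch below is not Macie's actual selection algorithm, which the text above does not specify; it simply groups object keys by bucket, prefix, and file extension, then picks one representative per group. All names (`cluster_key`, `pick_representatives`) and the sample keys are hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple

S3Object = Tuple[str, str]  # (bucket, key)


def cluster_key(bucket: str, key: str) -> Tuple[str, str, str]:
    """Group objects that share a bucket, a prefix, and a file extension."""
    prefix = key.rsplit("/", 1)[0] if "/" in key else ""
    extension = os.path.splitext(key)[1].lower()
    return (bucket, prefix, extension)


def pick_representatives(objects: List[S3Object], per_cluster: int = 1) -> List[S3Object]:
    """Choose a few objects per cluster instead of inspecting every object."""
    clusters: Dict[Tuple[str, str, str], List[S3Object]] = defaultdict(list)
    for bucket, key in objects:
        clusters[cluster_key(bucket, key)].append((bucket, key))
    sample: List[S3Object] = []
    for members in clusters.values():
        sample.extend(sorted(members)[:per_cluster])
    return sorted(sample)


inventory = [
    ("logs", "2024/01/a.json"),
    ("logs", "2024/01/b.json"),
    ("logs", "2024/02/c.json"),
    ("hr", "exports/payroll.csv"),
]
print(pick_representatives(inventory))  # 3 of the 4 objects are selected
```

With thousands of similarly named log files under one prefix, the sample stays small while the number of objects grows, which is the intuition behind sampled discovery being cheaper than a full scan.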
2.3 Targeted (Classic) Sensitive Data Discovery Jobs (high exam relevance)
What: Explicitly configured, on-demand or scheduled job scanning specific buckets; can scan fully, not just sampled
- Used when automated discovery's sampling isn't sufficient — e.g. a compliance mandate (HIPAA, GDPR) requiring a documented full scan of a specific bucket at a specific point in time.
- "Run a full, on-demand scan of this specific bucket to satisfy an audit requirement" → targeted job, not automated discovery.
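A one-time, full-scan job for an audit might be requested roughly as follows. This is a sketch under assumptions: the parameter names follow the `macie2` `create_classification_job` API in boto3 as best understood here, the account ID, bucket, and job name are placeholders, and the helper `full_scan_job_request` is hypothetical. The function only builds the request so it can be reviewed before anything is submitted.

```python
import uuid
from typing import Any, Dict, List


def full_scan_job_request(account_id: str, buckets: List[str], name: str) -> Dict[str, Any]:
    """Build keyword arguments for a one-time Macie job that inspects every eligible object."""
    return {
        "clientToken": str(uuid.uuid4()),            # idempotency token
        "jobType": "ONE_TIME",                        # on-demand, not scheduled
        "name": name,
        "samplingPercentage": 100,                    # full scan rather than a sample
        "managedDataIdentifierSelector": "RECOMMENDED",
        "s3JobDefinition": {
            "bucketDefinitions": [{"accountId": account_id, "buckets": buckets}]
        },
    }


request = full_scan_job_request("111122223333", ["audit-bucket"], "hipaa-audit-scan")
# To submit (requires AWS credentials and Macie enabled in the Region):
#   import boto3
#   boto3.client("macie2").create_classification_job(**request)
```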
2.4 Managed & Custom Data Identifiers
Managed identifiers: 100+ built-in patterns covering PII, financial data (credit cards), credentials (AWS keys, Stripe keys, Google Cloud keys), and regional government IDs
Custom identifiers: Regex-based patterns you define for proprietary/organization-specific sensitive data formats
- A discovery job can use the set of managed identifiers that Macie recommends by default, or a custom subset you choose.
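A custom identifier is essentially a regex plus optional context keywords. The sketch below assumes a hypothetical internal employee ID format (`EMP-` followed by six digits) and builds a request shaped like the boto3 `macie2` `create_custom_data_identifier` call; it also tests the pattern locally with Python's `re` module, whose dialect may differ from the regex syntax Macie accepts.

```python
import re
from typing import Any, Dict, List

EMPLOYEE_ID_REGEX = r"EMP-\d{6}"  # hypothetical organization-specific format


def custom_identifier_request(name: str, regex: str, keywords: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for a regex-based custom data identifier."""
    re.compile(regex)  # fail fast on an invalid pattern (Python dialect only)
    return {
        "name": name,
        "regex": regex,
        "keywords": keywords,          # text that must appear near a match
        "maximumMatchDistance": 50,    # how near, in characters
    }


def local_matches(regex: str, text: str) -> List[str]:
    """Dry-run the pattern against sample text before registering it."""
    return re.findall(regex, text)


request = custom_identifier_request("employee-id", EMPLOYEE_ID_REGEX, ["employee id"])
print(local_matches(EMPLOYEE_ID_REGEX, "employee id EMP-004217 joined; ref EMP-12"))
```

Only `EMP-004217` satisfies the six-digit pattern; the short `EMP-12` reference does not.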
2.5 Allow Lists
Purpose: Suppress known false positives, e.g. test/synthetic data that matches a sensitive-data pattern but isn't actually sensitive
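The effect of an allow list can be illustrated locally. Macie applies allow lists itself during analysis; the sketch below only mimics the idea by dropping detections that fully match an allow-list regex. The function name `apply_allow_list` and the sample values are hypothetical; `4111111111111111` is a widely used test card number.

```python
import re
from typing import List


def apply_allow_list(detections: List[str], allow_regex: str) -> List[str]:
    """Keep only detections that the allow list does not mark as known-safe."""
    allowed = re.compile(allow_regex)
    return [d for d in detections if not allowed.fullmatch(d)]


detections = ["4111111111111111", "5500005555555559"]
remaining = apply_allow_list(detections, r"4111111111111111")
print(remaining)  # the known test number is suppressed
```

Contrast this with a custom identifier: the identifier widens what gets detected, while the allow list narrows what gets reported.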
2.6 Sensitive Data Sample Retrieval
What: One-click, temporary retrieval of up to 10 examples of the sensitive data found in an object
Security: Encrypted with customer-managed KMS keys, viewable only temporarily within the console
- Lets an analyst validate a finding (is this really a credit card number, or a false positive) without independently digging through the raw object.
2.7 Encryption-Aware Scanning
What: Supports analyzing objects encrypted with dual-layer server-side encryption using KMS keys (DSSE-KMS)
AWS Exam Thinking
Requirement → Keywords → Expected Answer → why every distractor fails.
Find where PII/financial data lives across S3
Keywords: PII, where is sensitive data, GDPR / HIPAA
Expected Answer: Amazon Macie
| Distractor | Why it's wrong |
|---|---|
| GuardDuty S3 Protection | Detects anomalous access patterns, doesn't classify the data's content/sensitivity |
| Inspector | Finds software vulnerabilities, not data content |
| AWS Config | Evaluates configuration compliance, not object content |
Detect a publicly exposed S3 bucket without running a data scan
Keywords: public bucket, no scan required, always-on
Expected Answer: Macie's automatic bucket-level posture evaluation (always-on, no scan job required)
| Distractor | Why it's wrong |
|---|---|
| Run a targeted sensitive data discovery job | Unnecessary cost/complexity: posture evaluation already covers public/encryption/sharing status automatically |
| AWS Config + S3 public access rule | Valid alternative, but Macie's posture evaluation is purpose-built and starts automatically the moment Macie is enabled |
Continuously, cost-efficiently discover sensitive data org-wide
Keywords: continuous, cost-efficient, organization-wide
Expected Answer: Macie automated sensitive data discovery
| Distractor | Why it's wrong |
|---|---|
| Run a full targeted discovery job on every bucket every day | Massively more expensive: automated discovery's sampling/clustering is specifically built to avoid this cost |
Satisfy an audit requirement for a documented full scan of a specific bucket
Keywords: audit, full scan, specific bucket
Expected Answer: Macie targeted (classic) sensitive data discovery job
| Distractor | Why it's wrong |
|---|---|
| Rely on automated discovery's sampled results | Sampling, by design, doesn't guarantee every object was inspected, so it is insufficient for a documented full-scan audit requirement |
Suppress repeated false-positive findings on known test data
Keywords: false positive, test data
Expected Answer: Macie allow list
| Distractor | Why it's wrong |
|---|---|
| Custom data identifier | Used to find MORE specific sensitive patterns, not to suppress known-safe ones |
Discover sensitive data stored in RDS / DynamoDB / EBS
Keywords: RDS, DynamoDB, non-S3
Expected Answer: Not Macie. Macie's content scanning is S3-only
| Distractor | Why it's wrong |
|---|---|
| Macie | This is the classic trap: Macie does NOT natively scan RDS, DynamoDB, EBS, or EFS for sensitive data. Exporting/extracting data to S3 first would be required, or a different approach entirely |
Security Controls Mapping & Integrations
4 — Controls Mapping
Detective
Sensitive data discovery findings and bucket posture findings — e.g. "S3 object contains PII" or "bucket is publicly accessible"
Data Protection
Macie's core identity — classification of sensitive data and assessment of the controls (encryption, access) protecting it
Responsive (via integration, not native)
EventBridge → Lambda triggered by a finding — e.g. automatically apply S3 Block Public Access or enable default bucket encryption when an exposed-sensitive-bucket finding fires
Compliance
Supports GDPR/HIPAA-driven data mapping and audit requirements through targeted discovery jobs and continuous data maps
⚠️ Macie is NOT preventive and does NOT detect anomalous access
It classifies what data exists and evaluates static bucket posture. It does not block access, and it does not flag unusual GetObject patterns (that's GuardDuty S3 Protection's job).
5 — Integrations
EventBridge
What: All Macie findings are sent to EventBridge
Pattern: Finding → rule → Lambda (e.g. automatically block public access to a flagged bucket)
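The "rule" in that pattern is an EventBridge event pattern. The sketch below shows one plausible pattern that selects only Macie's bucket-posture findings, assuming Macie events carry `source` `aws.macie` and `detail-type` `Macie Finding`. The small `matches` function is a simplified local stand-in for EventBridge's matcher (it handles exact values and `prefix` filters only) so the pattern can be sanity-checked offline.

```python
from typing import Any, Dict, List

MACIE_POLICY_FINDING_PATTERN: Dict[str, Any] = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
    "detail": {"type": [{"prefix": "Policy:"}]},  # posture findings only
}


def _value_matches(candidates: List[Any], value: Any) -> bool:
    """True if value equals any candidate or satisfies a {'prefix': ...} filter."""
    for candidate in candidates:
        if isinstance(candidate, dict) and "prefix" in candidate:
            if isinstance(value, str) and value.startswith(candidate["prefix"]):
                return True
        elif candidate == value:
            return True
    return False


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified event-pattern check: every key in the pattern must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif not _value_matches(expected, event[key]):
            return False
    return True


sample_event = {
    "source": "aws.macie",
    "detail-type": "Macie Finding",
    "detail": {"type": "Policy:IAMUser/S3BucketPublic"},
}
print(matches(MACIE_POLICY_FINDING_PATTERN, sample_event))
```

Serialized with `json.dumps`, the same dictionary could serve as the event pattern of a rule whose target is the remediation Lambda.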
Security Hub
What: Findings can be published into Security Hub for aggregation
Why: Cross-service correlation. Exposure findings can combine Macie's "sensitive data present" signal with GuardDuty/Inspector signals on the same resource
Organizations
What: Delegated administrator model with multi-account support
Capability: Per-account enable/disable of automated discovery, bucket exclusions, member account read access to their own stats
KMS
What: Encrypts retrieved sensitive data samples; supports analyzing DSSE-KMS encrypted objects
Athena & QuickSight
What: Macie discovery results can be queried/visualized via Athena and QuickSight for custom reporting
Costs, Limits & Quotas
Pricing Model
Bucket posture evaluation: Billed per bucket evaluated each month, with no per-GB charge; free during the 30-day trial
Sensitive data discovery: First 1 GB per account per Region per month free; billed thereafter based on data evaluated
Trial: 30-day free trial including automated discovery and S3 bucket-level evaluation
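The free-tier arithmetic for sensitive data discovery can be sketched as below. The per-GB price is deliberately a caller-supplied parameter because rates vary by Region and volume tier; the value used in the example is a placeholder, not a quoted AWS price, and the function names are hypothetical. Per-bucket and per-object charges are out of scope for this sketch.

```python
FREE_TIER_GB = 1.0  # first 1 GB per account, per Region, per month


def billable_gb(gb_inspected: float) -> float:
    """Data volume that is charged after the monthly free allowance."""
    return max(0.0, gb_inspected - FREE_TIER_GB)


def estimated_discovery_cost(gb_inspected: float, price_per_gb: float) -> float:
    """Rough monthly cost for one account in one Region at a flat per-GB rate."""
    return round(billable_gb(gb_inspected) * price_per_gb, 2)


# Placeholder rate of 1.00 per GB, for illustration only.
print(estimated_discovery_cost(251.0, price_per_gb=1.00))
```

Because the allowance applies per account and per Region, a multi-account estimate would sum this calculation over each account/Region pair rather than pooling the data first.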
Common Cost Mistakes
- Running full targeted jobs across every bucket when automated discovery's sampling would provide sufficient ongoing visibility at far lower cost
- Not excluding clearly non-sensitive buckets (e.g. public website assets, build artifacts) from automated discovery scope
- Forgetting that storage class and object count both factor into evaluation cost — very large or numerous-object buckets cost more to scan
Cost Optimization
- Default to automated discovery; reserve targeted full-scan jobs for specific compliance-driven needs
- Exclude non-sensitive buckets from automated discovery scope at the delegated administrator level
- Use allow lists to avoid wasted analyst time on repeated known-false-positive findings (this saves operational time, not scan cost)
Limits & Quotas
Scope: Regional service; enable per Region
Data source: S3 only for content/sensitive-data scanning; no native RDS/DynamoDB/EBS/EFS support
Posture monitoring scale: Up to 10,000 general purpose S3 buckets per account
Custom identifiers: Regex-based, account-defined, count-limited per account
⚠️ Exam trap
The S3-only scope is the single most tested limitation. A scenario asking to discover PII inside an RDS database or DynamoDB table is explicitly testing whether you know Macie cannot do this natively — data would need to be exported/extracted to S3 first, or a different tool used entirely.