Domain 1: FM Integration, Data Management, and Compliance (31%)
Exam Tip
This is the highest-weighted domain at 31%. Focus on FM selection trade-offs, chunking strategies for RAG, and compliance/data residency controls. The exam tests why you choose a specific FM or architecture, not just what they are. Know the RAG vs. fine-tuning decision cold.
1.1 Foundation Model Selection
Model Families on Amazon Bedrock
| Model Family | Vendor | Strengths | Best For |
|---|---|---|---|
| Claude | Anthropic | Reasoning, safety, large context (up to 200k tokens) | Long-document analysis, complex reasoning, summarization |
| Llama | Meta | Open-source, fine-tunable | Custom fine-tuning, cost-efficient inference |
| Mistral | Mistral AI | Efficient, high performance relative to size | Fast inference, resource-constrained scenarios |
| Titan | AWS | AWS-native, embeddings, summarization | Generating embeddings for RAG, AWS-optimized applications |
| Cohere | Cohere | Multilingual embeddings, retrieval | Multilingual RAG, semantic search |
Selection Criteria
Context Window:
- Needed for long-document RAG, multi-turn conversations, and large prompt contexts
- Claude: up to 200k tokens | Llama 3: 128k tokens | Titan: shorter context windows
- Choose Claude when the scenario requires processing very long documents
Latency:
- Use smaller/faster models (Claude Haiku, Mistral) for real-time chat interfaces
- Use larger models (Claude Sonnet/Opus) for reasoning-heavy tasks where latency is acceptable
Cost:
- On-demand: pay per input/output token
- Provisioned Throughput: fixed cost for guaranteed Model Units (suitable for predictable high-volume)
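The on-demand vs. Provisioned Throughput trade-off is ultimately a break-even calculation. The sketch below illustrates it; all prices are hypothetical placeholders, not real Bedrock rates:

```python
# Break-even sketch: on-demand per-token pricing vs. Provisioned Throughput.
# All rates below are HYPOTHETICAL placeholders, not actual Bedrock pricing.

def on_demand_cost(requests, in_tokens, out_tokens,
                   price_in_per_1k=0.003, price_out_per_1k=0.015):
    """Monthly on-demand cost: pay per input and output token."""
    return requests * (in_tokens / 1000 * price_in_per_1k
                       + out_tokens / 1000 * price_out_per_1k)

def provisioned_cost(model_units, hourly_rate=20.0, hours=730):
    """Provisioned Throughput: fixed hourly cost per Model Unit, usage-independent."""
    return model_units * hourly_rate * hours

# Low, spiky volume favors on-demand; high, predictable volume favors provisioned.
low = on_demand_cost(requests=10_000, in_tokens=500, out_tokens=200)      # 45.0
high = on_demand_cost(requests=5_000_000, in_tokens=500, out_tokens=200)  # 22500.0
fixed = provisioned_cost(model_units=1)                                   # 14600.0
```

With these placeholder rates, the fixed cost only beats on-demand at sustained high volume, which is exactly the exam heuristic: Provisioned Throughput for predictable high-volume workloads.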
Exam Trap
The exam frequently offers fine-tuning as a tempting option. Default to RAG when:
- The knowledge base changes frequently
- You need source attribution / traceability
- You want to avoid the cost and complexity of model training
Prefer fine-tuning when:
- You need the model to adopt a new tone, writing style, or domain-specific format
- The underlying data is stable and unlikely to change often
1.2 Prompt Engineering Strategies
Core Techniques
| Technique | Description | When to Use |
|---|---|---|
| Zero-shot | Provide task instructions without examples | Simple, well-defined tasks |
| Few-shot | Include 2–5 examples in the prompt | Tasks where examples clarify expected output format |
| Chain-of-Thought (CoT) | Ask the model to "think step by step" | Multi-step reasoning, math, logical deduction |
| System Prompt | Persistent instructions that frame the model's persona and rules | All production applications |
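The few-shot pattern from the table can be reduced to simple string assembly: instruction first, then worked input/output pairs, then the new query. A minimal sketch (helper name and example data are illustrative, not from any AWS SDK):

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, 2-5 worked examples, then the query."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines += [f"Input: {example_input}", f"Output: {example_output}", ""]
    # End with the unanswered query so the model completes the final "Output:".
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great battery life!", "positive"),
     ("Screen cracked in a week.", "negative")],
    "Fast shipping and works perfectly.",
)
```

The trailing bare `Output:` is the key design choice: it steers the model toward emitting only the label in the demonstrated format.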
Prompt Optimization Best Practices
- Be specific: Vague prompts produce vague outputs; always define the output format
- Separate instructions from data: Use XML tags or delimiters to clearly separate task instructions from user-provided content
- Control output length: Set `maxTokens` explicitly to cap responses and control cost
- Temperature control:
  - Low temperature (0.0–0.3): Deterministic, factual outputs; use for Q&A and summarization
  - High temperature (0.7–1.0): Creative, diverse outputs; use for brainstorming and creative writing
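Two of these practices can be sketched directly: picking inference parameters by task type, and fencing untrusted user content behind XML-style delimiters. The key names `temperature` and `maxTokens` match the Bedrock Converse API's `inferenceConfig`; the helper functions themselves are illustrative:

```python
def make_inference_config(task):
    """Choose inference parameters by task type (a conservative sketch)."""
    if task in ("qa", "summarization"):
        # Factual tasks: low temperature, modest output cap.
        return {"temperature": 0.2, "maxTokens": 512}
    # Creative tasks: higher temperature, larger output budget.
    return {"temperature": 0.8, "maxTokens": 1024}

def wrap_user_content(instructions, user_text):
    """Separate task instructions from user-provided data with XML-style tags."""
    return f"{instructions}\n<document>\n{user_text}\n</document>"

cfg = make_inference_config("qa")
wrapped = wrap_user_content(
    "Summarize the document below in one sentence.",
    "Quarterly revenue grew 12% driven by cloud services.",
)
```

Delimiting user content this way also reduces prompt-injection risk, since the model can be told to treat everything inside `<document>` as data rather than instructions.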
1.3 Data Management & RAG Pipelines
RAG Pipeline Architecture
S3 (raw documents: PDFs, TXT, HTML, Markdown)
  ↓ ingestion / pre-processing
Chunking (split into smaller text pieces)
  ↓
Embedding Model (Titan Text Embeddings / Cohere Embed)
  ↓ convert text to vectors
Vector Store (OpenSearch Serverless or Aurora pgvector)
  ↓ at inference time
Query → Embed Query → Vector Search → Retrieve Top-K Chunks → FM Prompt

Chunking Strategies
| Strategy | How It Works | Best For | Trade-off |
|---|---|---|---|
| Fixed-size | Split by token count (e.g., 300 tokens) | Uniform documents | May break mid-sentence or mid-context |
| Fixed-size + overlap | Tokens overlap between adjacent chunks | Preserving cross-boundary context | Higher storage and retrieval cost |
| Semantic | Split by meaning/topic using NLP | Long-form, varied content | Higher processing complexity |
| Hierarchical | Parent chunk + child chunk structure | Complex docs needing broad + fine retrieval | More complex retrieval logic |
TIP
Hierarchical chunking is best when you need both broad context and fine-grained retrieval. Fixed-size with overlap is the simplest option for preserving context across chunk boundaries.
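Fixed-size chunking with overlap is simple enough to sketch in a few lines. The function below operates on a pre-tokenized list (real pipelines would use the model's tokenizer; whitespace splitting stands in for it here):

```python
def chunk_tokens(tokens, size=300, overlap=50):
    """Fixed-size chunking with overlap: adjacent chunks share `overlap` tokens,
    so context that straddles a boundary appears in both neighbors."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # advance by less than `size` to create the overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks

words = ("lorem ipsum " * 500).split()  # 1000 pseudo-tokens
chunks = chunk_tokens(words, size=300, overlap=50)
```

The trade-off from the table is visible in the numbers: 1000 tokens become four chunks totaling 1150 stored tokens, i.e., the overlap buys boundary context at the price of extra storage and retrieval cost.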
Embedding Models
| Model | Vendor | Best For |
|---|---|---|
| Titan Text Embeddings v2 | AWS | General purpose, AWS-native RAG (configurable dimensions) |
| Cohere Embed | Cohere | Multilingual retrieval, semantic search |
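At inference time, the retrieval step ranks stored chunk embeddings by similarity to the query embedding. A minimal cosine-similarity top-k sketch, with tiny hand-made 3-dimensional vectors standing in for real embedding-model output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k chunk texts whose embeddings are closest to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# (text, embedding) pairs; a real store would hold high-dimensional vectors.
store = [
    ("chunk about billing",    [0.9, 0.1, 0.0]),
    ("chunk about networking", [0.1, 0.9, 0.0]),
    ("chunk about security",   [0.0, 0.2, 0.9]),
]
hits = top_k([0.8, 0.2, 0.1], store, k=2)
```

Production vector stores replace the exhaustive `sorted` scan with approximate nearest-neighbor indexes, but the ranking criterion is the same.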
1.4 Vector Stores
Primary Options
| Vector Store | Type | Best For | Key Characteristic |
|---|---|---|---|
| Amazon OpenSearch Serverless | Managed, serverless | Bedrock Knowledge Bases (default) | Scales to zero, no cluster management |
| Aurora PostgreSQL + pgvector | RDS extension | Existing PostgreSQL infrastructure | SQL + vector search in one database |
| Amazon Kendra | Managed enterprise search | Enterprise document retrieval | NLP-powered, not pure vector similarity |
Exam Trap
OpenSearch Serverless is the default for Bedrock Knowledge Bases, not standard OpenSearch managed clusters. The exam distinguishes between these. Choose OpenSearch Serverless when the question mentions "Knowledge Bases," "fully managed vector store," or "RAG with Amazon Bedrock."
OpenSearch Serverless: Key Facts
- Collection type must be set to Vector search at creation
- Supports up to 16,000 dimensions per vector
- Integrated with Bedrock Knowledge Bases via IAM service-linked role
- Does NOT support standard OpenSearch full-text features (custom analyzers, etc.)
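A vector index in such a collection is defined by a mapping that declares the embedding field and its dimension. The sketch below builds an index body following the OpenSearch k-NN mapping convention (`knn_vector` with a `dimension`); the field names and the 1024-dimension choice are assumptions for illustration, not a verified Knowledge Bases default:

```python
import json

# Assumed embedding dimension for illustration; must match the embedding
# model's output and stay within the 16,000-dimension limit.
EMBEDDING_DIM = 1024

index_body = {
    "settings": {"index": {"knn": True}},  # enable k-NN search on this index
    "mappings": {
        "properties": {
            "vector": {"type": "knn_vector", "dimension": EMBEDDING_DIM},
            "text": {"type": "text"},       # the chunk's raw text
            "metadata": {"type": "text"},   # source attribution for RAG answers
        }
    },
}
payload = json.dumps(index_body)
```

The dimension declared here must match the embedding model exactly; a mismatch is a common cause of ingestion failures.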
1.5 Compliance, Data Residency & Security
Key Controls
| Requirement | AWS Control | How |
|---|---|---|
| Data not used to train models | Amazon Bedrock default | Customer data is isolated; Bedrock never uses it to train base models |
| Data residency | AWS Region + VPC Endpoints | Choose the AWS region; use PrivateLink to keep traffic off public internet |
| Encryption at rest | AWS KMS | Enable KMS keys on S3 buckets and OpenSearch Serverless collections |
| Encryption in transit | TLS | All Bedrock API calls are TLS-encrypted by default |
| Network isolation | VPC Endpoints (PrivateLink) | Routes Bedrock traffic through the AWS backbone, bypassing the public internet |
| Access control | AWS IAM | Least-privilege resource policies scoped to specific model IDs |
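Least-privilege access control means scoping the policy resource to specific model ARNs rather than `*`. A sketch of such a policy (the region and model ID are placeholders; `bedrock:InvokeModel` is the real action name):

```python
import json

# Least-privilege IAM policy sketch: allow invoking ONE specific model.
# Region and model ID below are placeholder values for illustration.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            # Scoped to a single foundation-model ARN, not "Resource": "*"
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        }
    ],
}
rendered = json.dumps(policy, indent=2)
```

On the exam, an answer granting `bedrock:*` on `Resource: "*"` is the distractor; the scoped version above is the least-privilege choice.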
Key Fact
Amazon Bedrock does not use customer prompts, completions, or training data to train the underlying foundation models. This is a built-in data privacy guarantee; any answer suggesting otherwise is wrong.
Flashcards
Which FM on Bedrock has the largest context window?
Claude, with a context window of up to 200k tokens.