
Domain 1: FM Integration, Data Management, and Compliance (31%)

โ† Back to Overview ยท Next: Domain 2 โ†’

Exam Tip

This is the highest-weighted domain at 31%. Focus on FM selection trade-offs, chunking strategies for RAG, and compliance/data residency controls. The exam tests why you choose a specific FM or architecture, not just what they are. Know the RAG vs. fine-tuning decision cold.


1.1 Foundation Model Selection

Model Families on Amazon Bedrock

| Model Family | Vendor | Strengths | Best For |
| --- | --- | --- | --- |
| Claude | Anthropic | Reasoning, safety, large context (up to 200k tokens) | Long-document analysis, complex reasoning, summarization |
| Llama | Meta | Open-source, fine-tunable | Custom fine-tuning, cost-efficient inference |
| Mistral | Mistral AI | Efficient, high performance relative to size | Fast inference, resource-constrained scenarios |
| Titan | AWS | AWS-native, embeddings, summarization | Generating embeddings for RAG, AWS-optimized applications |
| Cohere | Cohere | Multilingual embeddings, retrieval | Multilingual RAG, semantic search |

Selection Criteria

Context Window:

  • Needed for long-document RAG, multi-turn conversations, and large prompt contexts
  • Claude: up to 200k tokens | Llama 3.1: 128k tokens | Titan: shorter context windows
  • Choose Claude when the scenario requires processing very long documents

Latency:

  • Use smaller/faster models (Claude Haiku, Mistral) for real-time chat interfaces
  • Use larger models (Claude Sonnet/Opus) for reasoning-heavy tasks where latency is acceptable

Cost:

  • On-demand: pay per input/output token
  • Provisioned Throughput: fixed cost for guaranteed Model Units (suitable for predictable high-volume)
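The on-demand pricing model above is simple token arithmetic. A minimal sketch, with the caveat that the per-1K-token prices used here are hypothetical placeholders, not actual Amazon Bedrock rates:

```python
# Estimate on-demand inference cost from token counts.
# The per-1K-token prices below are PLACEHOLDERS for illustration --
# always check current Amazon Bedrock pricing for your model and Region.

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """On-demand cost in USD: you pay separately per input and output token."""
    return ((input_tokens / 1000) * price_in_per_1k
            + (output_tokens / 1000) * price_out_per_1k)

# Example: 2,000 input tokens and 500 output tokens at placeholder rates.
cost = estimate_cost(2000, 500, price_in_per_1k=0.003, price_out_per_1k=0.015)
print(f"${cost:.4f}")
```

Provisioned Throughput inverts this model: you pay a fixed hourly rate per Model Unit regardless of token volume, which wins once sustained traffic is high enough.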

Exam Trap

The exam frequently offers fine-tuning as a tempting option. Default to RAG when:

  • The knowledge base changes frequently
  • You need source attribution / traceability
  • You want to avoid the cost and complexity of model training

Prefer fine-tuning when:

  • You need the model to adopt a new tone, writing style, or domain-specific format
  • The underlying data is stable and unlikely to change often

1.2 Prompt Engineering Strategies

Core Techniques

| Technique | Description | When to Use |
| --- | --- | --- |
| Zero-shot | Provide task instructions without examples | Simple, well-defined tasks |
| Few-shot | Include 2-5 examples in the prompt | Tasks where examples clarify expected output format |
| Chain-of-Thought (CoT) | Ask the model to "think step by step" | Multi-step reasoning, math, logical deduction |
| System Prompt | Persistent instructions that frame the model's persona and rules | All production applications |
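Few-shot prompting is mechanical enough to sketch: instructions first, then a handful of worked examples, then the new input. The sentiment-classification task below is an illustrative stand-in, not from the exam guide:

```python
# Assemble a few-shot prompt: task instructions plus 2-5 worked examples,
# ending with the new input so the model completes the final "Output:".

def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("I love this product!", "positive"),
     ("Terrible experience, would not recommend.", "negative")],
    "The support team resolved my issue quickly.",
)
print(prompt)
```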

Prompt Optimization Best Practices

  • Be specific: Vague prompts produce vague outputs; always define the output format
  • Separate instructions from data: Use XML tags or delimiters to clearly separate task instructions from user-provided content
  • Control output length: Set maxTokens explicitly to cap responses and control cost
  • Temperature control:
    • Low temperature (0.0-0.3): Deterministic, factual outputs; use for Q&A and summarization
    • High temperature (0.7-1.0): Creative, diverse outputs; use for brainstorming and creative writing
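These practices can all land in one Bedrock Converse API request. A sketch that builds the request as a plain dict (the model ID is illustrative; substitute one enabled in your account):

```python
# Build kwargs for a Bedrock Converse call, applying the practices above:
# a system prompt, XML-delimited user content, an explicit maxTokens cap,
# and a low temperature for factual Q&A.

def build_converse_request(system_prompt: str, document: str, question: str,
                           max_tokens: int = 512,
                           temperature: float = 0.2) -> dict:
    # XML tags separate task data from instructions (see best practices).
    user_text = (f"<document>\n{document}\n</document>\n"
                 f"<question>{question}</question>")
    return {
        "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # illustrative
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": max_tokens,
                            "temperature": temperature},
    }

request = build_converse_request(
    "You are a concise assistant. Answer only from the document.",
    "Amazon Bedrock offers on-demand and Provisioned Throughput pricing.",
    "What pricing modes does Bedrock offer?",
)
# To invoke: boto3.client("bedrock-runtime").converse(**request)
```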

1.3 Data Management & RAG Pipelines

RAG Pipeline Architecture

S3 (raw documents: PDFs, TXT, HTML, Markdown)
    ↓ ingestion / pre-processing
Chunking (split into smaller text pieces)
    ↓
Embedding Model (Titan Text Embeddings / Cohere Embed)
    ↓ convert text to vectors
Vector Store (OpenSearch Serverless or Aurora pgvector)
    ↓ at inference time
Query → Embed Query → Vector Search → Retrieve Top-K Chunks → FM Prompt
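The inference-time path can be sketched end to end. Toy 3-d vectors stand in for real embedding-model output (e.g. 1024-d Titan Text Embeddings v2), and the brute-force cosine scan stands in for the vector store's approximate search:

```python
import math

# Sketch of the retrieval step: score every stored chunk against the query
# vector by cosine similarity and return the top-k chunks for the FM prompt.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, index, k=2):
    """index: list of (chunk_text, vector). Returns the k most similar chunks."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in scored[:k]]

index = [
    ("Bedrock pricing has on-demand and provisioned modes.", [0.9, 0.1, 0.0]),
    ("Claude supports up to 200k tokens of context.",        [0.1, 0.9, 0.1]),
    ("OpenSearch Serverless backs Knowledge Bases.",         [0.0, 0.2, 0.9]),
]
query_vec = [0.85, 0.15, 0.05]  # pretend this came from the embedding model
print(retrieve_top_k(query_vec, index, k=2))
```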

Chunking Strategies

| Strategy | How It Works | Best For | Trade-off |
| --- | --- | --- | --- |
| Fixed-size | Split by token count (e.g., 300 tokens) | Uniform documents | May break mid-sentence or mid-context |
| Fixed-size + overlap | Tokens overlap between adjacent chunks | Preserving cross-boundary context | Higher storage and retrieval cost |
| Semantic | Split by meaning/topic using NLP | Long-form, varied content | Higher processing complexity |
| Hierarchical | Parent chunk + child chunk structure | Complex docs needing broad + fine retrieval | More complex retrieval logic |

TIP

Hierarchical chunking is best when you need both broad context and fine-grained retrieval. Fixed-size with overlap is the simplest option for preserving context across chunk boundaries.
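Fixed-size chunking with overlap is simple enough to sketch in a few lines. Whitespace "tokens" stand in here for real tokenizer counts, which a production pipeline would use instead:

```python
# Minimal fixed-size chunking with overlap: each chunk repeats the last
# `overlap` tokens of its predecessor so context spans chunk boundaries.

def chunk_with_overlap(text: str, chunk_size: int = 300,
                       overlap: int = 50) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last window already reached the end of the document
    return chunks

words = " ".join(f"w{i}" for i in range(10))
print(chunk_with_overlap(words, chunk_size=4, overlap=2))
```

Note the trade-off from the table: with a 2-token overlap, every boundary token is stored and embedded twice.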

Embedding Models

| Model | Vendor | Best For |
| --- | --- | --- |
| Titan Text Embeddings v2 | AWS | General purpose, AWS-native RAG (configurable dimensions) |
| Cohere Embed | Cohere | Multilingual retrieval, semantic search |

1.4 Vector Stores

Primary Options

| Vector Store | Type | Best For | Key Characteristic |
| --- | --- | --- | --- |
| Amazon OpenSearch Serverless | Managed, serverless | Bedrock Knowledge Bases (default) | Auto-scales capacity, no cluster management |
| Aurora PostgreSQL + pgvector | RDS extension | Existing PostgreSQL infrastructure | SQL + vector search in one database |
| Amazon Kendra | Managed enterprise search | Enterprise document retrieval | NLP-powered, not pure vector similarity |

Exam Trap

OpenSearch Serverless is the default for Bedrock Knowledge Bases, not standard OpenSearch managed clusters. The exam distinguishes between these. Choose OpenSearch Serverless when the question mentions "Knowledge Bases," "fully managed vector store," or "RAG with Amazon Bedrock."

OpenSearch Serverless: Key Facts

  • Collection type must be set to Vector search at creation
  • Supports up to 16,000 dimensions per vector
  • Integrated with Bedrock Knowledge Bases via IAM service-linked role
  • Does NOT support standard OpenSearch full-text features (custom analyzers, etc.)
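An index in a vector search collection might be mapped along these lines. This is an illustrative sketch: the field names (`chunk_text`, `embedding`), the 1024-dimension setting (matching Titan Text Embeddings v2's default), and the HNSW/Faiss method parameters are assumptions to adapt, not a canonical Bedrock configuration.

```json
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "chunk_text": { "type": "text" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": { "name": "hnsw", "engine": "faiss" }
      }
    }
  }
}
```

The `dimension` value must match the embedding model's output size exactly, or ingestion fails.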

1.5 Compliance, Data Residency & Security

Key Controls

| Requirement | AWS Control | How |
| --- | --- | --- |
| Data not used to train models | Amazon Bedrock default | Customer data is isolated; Bedrock never uses it to train base models |
| Data residency | AWS Region + VPC Endpoints | Choose the AWS Region; use PrivateLink to keep traffic off the public internet |
| Encryption at rest | AWS KMS | Enable KMS keys on S3 buckets and OpenSearch Serverless collections |
| Encryption in transit | TLS | All Bedrock API calls are TLS-encrypted by default |
| Network isolation | VPC Endpoints (PrivateLink) | Routes Bedrock traffic through the AWS backbone, bypassing the public internet |
| Access control | AWS IAM | Least-privilege resource policies scoped to specific model IDs |
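A least-privilege IAM policy scoped to one model ID might look like this. The Region and model ID in the ARN are illustrative placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
    }
  ]
}
```

Scoping `Resource` to a specific foundation-model ARN, rather than `*`, is exactly the least-privilege pattern the table describes.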

Key Fact

Amazon Bedrock does not use customer prompts, completions, or training data to train the underlying foundation models. This is a built-in data privacy guarantee; any answer suggesting otherwise is wrong.


Flashcards

Q: Which FM on Bedrock has the largest context window?

A: Claude (Anthropic): up to 200,000 tokens. Use Claude when the task requires processing long documents or maintaining extended conversation history.

โ† Back to Overview ยท Next: Domain 2 โ†’
