Domain 2: Implementation and Integration (26%)
Exam Tip
This domain tests your ability to implement GenAI applications end-to-end. Know when to use Bedrock Agents vs. a simple Knowledge Base call, understand the three Bedrock runtime APIs, and be able to explain chunking strategy trade-offs. Domains 1 and 2 together account for 57% of the exam.
2.1 Agentic AI & Amazon Bedrock Agents
What is an Agent?
Amazon Bedrock Agents enable multi-step, autonomous reasoning by orchestrating FM calls, tool invocations, and knowledge base retrievals to complete a user goal.
Core Components:
| Component | Role |
|---|---|
| Agent | The orchestrator – receives the user request and plans the steps |
| Action Groups | Lambda functions the Agent can invoke to interact with external systems |
| Knowledge Bases | RAG pipeline the Agent can query for document-based context |
| Orchestration Trace | Step-by-step log of the Agent's reasoning and tool calls |
Action Groups
Action Groups define what external actions an Agent can take:
- Each action group is backed by a Lambda function
- The Agent decides whether and when to call an action based on the user's request
- Actions are described with an OpenAPI schema – the Agent uses this to understand what parameters to pass (see the sketch after the examples below)
Examples:
- Look up a customer's order status in a database
- Send a confirmation email via an external API
- Write a record to an S3 bucket
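To make the schema contract concrete, here is a minimal, hypothetical OpenAPI fragment for the order-status example (the path, operationId, and field names are invented for illustration):

```json
{
  "openapi": "3.0.0",
  "info": { "title": "Order API", "version": "1.0.0" },
  "paths": {
    "/orders/{orderId}": {
      "get": {
        "operationId": "getOrderStatus",
        "description": "Look up the status of a customer order",
        "parameters": [
          {
            "name": "orderId",
            "in": "path",
            "required": true,
            "schema": { "type": "string" }
          }
        ],
        "responses": {
          "200": {
            "description": "Current order status",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": { "status": { "type": "string" } }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

The description fields matter: the Agent reads them to decide when the action is relevant and which parameters to extract from the conversation.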
Knowledge Base Integration
Agents can be connected to a Knowledge Base to retrieve document context:
- The Agent automatically decides when to query the Knowledge Base vs. call an Action Group
- Knowledge Bases use the RAG pipeline: S3 → chunking → embeddings → OpenSearch Serverless
Orchestration Trace
Enable with enableTrace: true in the InvokeAgent API call:
- Shows the Agent's step-by-step reasoning chain (which step, which tool, which decision)
- Critical for debugging unexpected Agent behavior
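A minimal boto3 sketch of enabling the trace (agent IDs and the session ID are placeholders):

```python
import boto3

# Runtime client for invoking deployed Bedrock Agents
client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="AGENT_ID",             # placeholder
    agentAliasId="AGENT_ALIAS_ID",  # placeholder
    sessionId="session-001",
    inputText="What is the status of order 12345?",
    enableTrace=True,               # emit the orchestration trace
)

# The response is an event stream; trace events arrive
# interleaved with the final completion chunks.
for event in response["completion"]:
    if "trace" in event:
        print(event["trace"])  # reasoning / tool-selection steps
    elif "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"))
```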
Exam Trap
Bedrock Agents vs. Knowledge Bases (RAG only):
- Use a Knowledge Base alone when: the task is document Q&A with no external action needed
- Use Bedrock Agents when: the task requires multi-step reasoning, calling external APIs, or taking actions beyond retrieval
The exam will present scenarios – choose Agents when action or orchestration is involved.
2.2 Knowledge Base Architecture
Amazon Bedrock Knowledge Bases is the fully managed RAG capability in Bedrock. It automates the core workflow of:
- ingesting data from a source such as S3
- chunking that data
- generating embeddings
- storing those embeddings in a vector database
- retrieving relevant context to augment the FM response
This reduces the amount of custom pipeline code you would otherwise need to build and maintain yourself.
End-to-End RAG Flow
1. Data Source: S3 bucket (PDFs, Word, HTML, Markdown, CSV)
2. Parser: Extract and clean text from documents
3. Chunker: Split text into smaller pieces (fixed, semantic, hierarchical)
4. Embedder: Titan Text Embeddings / Cohere Embed – convert text to vectors
5. Vector Store: Amazon OpenSearch Serverless (or Aurora pgvector)
6. At inference: embed query → vector search → top-K chunks → inject into FM prompt
Why Use Knowledge Bases
- Best answer when the question asks for managed RAG with minimal custom code
- Strong fit for internal manuals, FAQs, policy documents, and document-grounded assistants
- Can be used directly in an application or attached to a Bedrock Agent
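As a sketch of the "use directly in an application" path, a Knowledge Base can be queried with the RetrieveAndGenerate API; the knowledge base ID and model ARN below are placeholders:

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve_and_generate(
    input={"text": "What is our parental leave policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)

print(response["output"]["text"])     # grounded answer
print(response.get("citations", []))  # source chunks used
```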
Knowledge Base Backing Patterns
- Vector store: the standard Bedrock RAG pattern for chunked embeddings
- Structured data store: useful when data already lives in structured systems and semantic search is needed over that data
- Kendra GenAI Index: useful when the scenario emphasizes document understanding and Kendra-backed retrieval
Supported Data Sources
- Amazon S3 – primary ingestion source (most common exam scenario)
- Web Crawler – Bedrock Knowledge Bases can crawl websites
- Salesforce, Confluence, SharePoint – native connectors
Chunking Strategies
| Strategy | How It Works | Best For | Trade-off |
|---|---|---|---|
| Fixed-size | Split by token count (e.g., 300 tokens) | Uniform documents | May break mid-sentence or mid-context |
| Fixed-size + overlap | Tokens overlap between adjacent chunks | Preserving cross-boundary context | Higher storage and retrieval cost |
| Semantic | Split by meaning/topic using NLP | Long-form, varied content | Higher processing complexity |
| Hierarchical | Parent chunk + child chunk structure | Complex docs needing broad + fine retrieval | More complex retrieval logic |
TIP
Hierarchical chunking is best when you need both broad context and fine-grained retrieval.
Fixed-size with overlap is the simplest option for preserving context across chunk boundaries.
Sync and Ingestion
- After updating S3 content, you must trigger a sync to update the vector store
- The Knowledge Base does NOT auto-update when S3 changes
- Ingestion status is visible in the Bedrock console
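A sync can also be triggered programmatically with the StartIngestionJob API; a minimal boto3 sketch (both IDs are placeholders):

```python
import boto3

# Control-plane client for managing Knowledge Bases
client = boto3.client("bedrock-agent")

job = client.start_ingestion_job(
    knowledgeBaseId="KB_ID",        # placeholder
    dataSourceId="DATA_SOURCE_ID",  # placeholder
)

# Check the job until ingestion completes
status = client.get_ingestion_job(
    knowledgeBaseId="KB_ID",
    dataSourceId="DATA_SOURCE_ID",
    ingestionJobId=job["ingestionJob"]["ingestionJobId"],
)["ingestionJob"]["status"]
print(status)  # e.g., STARTING, IN_PROGRESS, COMPLETE
```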
2.3 API Integration Patterns
The Three Core Bedrock Runtime APIs
| API | Use Case | Response Model |
|---|---|---|
| InvokeModel | Synchronous – get a complete response in one call | Blocking, full response at once |
| InvokeModelWithResponseStream | Streaming – receive response token by token | Non-blocking, chunk-by-chunk delivery |
| InvokeAgent | Multi-step agentic workflow | Streaming trace + final response |
When to Use Each
```
User needs a complete answer in one shot?
└─ InvokeModel (synchronous, simple)

User interface needs a low-latency "typing" feel?
└─ InvokeModelWithResponseStream (streaming)

Task requires tool calls, external APIs, or multi-step planning?
└─ InvokeAgent
```
InvokeModel Request Body (Simplified)
```json
{
  "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
  "body": {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Summarize this document..." }
    ]
  }
}
```
InvokeModel uses a model-specific request body. The payload is JSON, but the expected JSON fields vary by model family. For example, Anthropic Claude models use fields such as anthropic_version, messages, and max_tokens.
Key parameters:
- modelId: The specific FM to invoke
- max_tokens: Cap on output tokens – controls cost and response length
- temperature: Controls creativity vs. determinism (0.0–1.0)
- topP: Narrows or widens the token probability pool used for sampling
- stopSequences: Explicitly tells the model to stop generating when a specified token or phrase appears
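A boto3 sketch of both invocation styles, assuming the Claude request shape shown above (prompt text and parameter values are illustrative):

```python
import json
import boto3

client = boto3.client("bedrock-runtime")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "temperature": 0.5,
    "messages": [{"role": "user", "content": "Summarize this document..."}],
})
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

# Synchronous: block until the full response is ready
response = client.invoke_model(modelId=model_id, body=body)
print(json.loads(response["body"].read())["content"][0]["text"])

# Streaming: consume the response chunk by chunk
stream = client.invoke_model_with_response_stream(modelId=model_id, body=body)
for event in stream["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk.get("type") == "content_block_delta":
        print(chunk["delta"].get("text", ""), end="")
```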
InvokeModel vs. Converse
| Feature | InvokeModel | Converse |
|---|---|---|
| Request shape | Model-specific JSON payload | More consistent message-based structure |
| Best use | Direct invocation when you know the target model and its schema | Easier cross-model portability |
| Exam takeaway | You may need to match model-specific fields | Bedrock's unified API for chat-style interaction |
Exam Framing
The exam is unlikely to ask you to memorize exact request bodies, byte encoding, or SDK syntax. What matters is knowing that InvokeModel is more model-specific, while Converse exists to reduce that variation.
Converse API
The Converse API is the general-purpose, message-based Bedrock API for foundation model interactions. Think of it as the Bedrock Swiss army knife for chat-style requests: it gives you a more consistent request shape across models, which makes it easier to swap models in your application without rewriting as much integration code.
At a minimum, Converse needs:
- A modelId
- A prompt expressed through messages

It can also include:
- inferenceConfig for maxTokens, temperature, topP, and stop sequences
- guardrailConfig to apply Guardrails within the same request
- toolConfig for tool definitions and tool choice in tool-calling / agentic patterns
- system instructions for top-level model behavior
- promptVariables when using a stored prompt from Bedrock Prompt Management
- additionalModelRequestFields for model-specific pass-through settings
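A minimal boto3 Converse sketch (the model ID, system prompt, and inference values are illustrative; the same request shape works across supported models):

```python
import boto3

client = boto3.client("bedrock-runtime")

response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Summarize this document..."}]}
    ],
    system=[{"text": "You are a concise technical assistant."}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
)

print(response["output"]["message"]["content"][0]["text"])
```

Swapping in a different model is largely a matter of changing modelId, which is the portability benefit the exam cares about.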
How to Think About Converse on the Exam
The exam will not ask you to write Converse request bodies from memory. What matters is understanding why Converse exists:
- a unified message-based API surface
- easier model portability across Bedrock-supported models
- one place to pass inference settings, tools, and guardrail configuration
Common Confusion
Do not confuse Converse with InvokeAgent:
- Converse = unified model interaction API
- InvokeAgent = managed Bedrock Agent orchestration with multi-step reasoning and trace support
2.4 Retrieval Configuration & Tuning
Top-K Retrieval
- At query time, the embedding of the user's question is compared against all vectors in the store
- The top-K most similar chunks are retrieved and injected into the FM prompt
- Higher K: More context but longer prompts (higher cost, risk of irrelevant chunks)
- Lower K: Focused context but may miss relevant information
Metadata Filtering
- Attach metadata to chunks during ingestion (e.g., department: "legal", year: 2024)
- At query time, filter retrieval to only return chunks matching specific metadata
- Allows scoped retrieval without maintaining separate vector stores per category
Hybrid Search
- Hybrid search combines semantic vector retrieval with keyword matching
- Use it when exact terms, product codes, error messages, or legal phrases matter in addition to overall semantic meaning
- This is often better than pure vector search for enterprise documents and mixed-structure datasets
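Top-K, metadata filtering, and hybrid search from the preceding subsections all map onto the Retrieve API's retrievalConfiguration; a hedged boto3 sketch (the knowledge base ID, query, and filter values are placeholders):

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve(
    knowledgeBaseId="KB_ID",  # placeholder
    retrievalQuery={"text": "error code E-1234 refund policy"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,            # top-K
            "overrideSearchType": "HYBRID",  # semantic + keyword matching
            "filter": {                      # metadata filter
                "equals": {"key": "department", "value": "legal"}
            },
        }
    },
)

for result in response["retrievalResults"]:
    print(result["score"], result["content"]["text"][:80])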
Reranking
- Retrieval is often a two-stage process
- Stage 1: fetch a broader candidate set quickly
- Stage 2: rerank those candidates with a stronger scoring model or rule set
- Reranking helps improve precision when the initial top-K results include loosely related chunks
Precision vs. Recall in RAG
| Problem | Likely Issue | Common Fix |
|---|---|---|
| Relevant chunks are missing | Low recall | Increase candidate set, improve chunking, use hybrid search |
| Too many irrelevant chunks are returned | Low precision | Add metadata filters, rerank results, reduce top-K |
| Good chunks exist but answer still drifts | Prompt/context issue | Tighten prompt instructions, reduce noise, improve citations |
Distance and Similarity Intuition
- Embedding search relies on a distance or similarity metric
- In practice, you may see references to cosine similarity or other vector-distance approaches
- You do not need to memorize math formulas for the exam, but you should understand that retrieval is driven by vector closeness
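For intuition only, cosine similarity between two embedding vectors looks like this (a toy numpy sketch, not how a production vector store computes it):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 = same direction (very similar), 0.0 = orthogonal (unrelated)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.9, 0.1, 0.3])
chunk_vec = np.array([0.8, 0.2, 0.4])
print(cosine_similarity(query_vec, chunk_vec))  # ~0.98: a close match
```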
2.5 Amazon SageMaker for GenAI Applications
What SageMaker Provides โ
Amazon SageMaker is the managed ML platform for training, fine-tuning, and deploying custom models. For the AIP-C01 exam, know when SageMaker is the right choice over Bedrock.
Key SageMaker Concepts
| Concept | Description |
|---|---|
| Training Jobs | Managed compute for training ML models – bring your own script and framework |
| SageMaker JumpStart | Pre-trained foundation models (Llama, Falcon, etc.) that can be fine-tuned and deployed from the SageMaker console |
| SageMaker Endpoints | Real-time inference endpoints for deployed models |
| Batch Transform | Asynchronous batch inference on large datasets (analogous to Bedrock Batch Inference) |
Bedrock vs. SageMaker Decision
```
Need a managed FM from a major provider (Claude, Titan, Cohere)?
└─ Amazon Bedrock (InvokeModel)

Need full training control, custom framework, or open-source LLM?
└─ Amazon SageMaker (Training Job + Endpoint)

Need a pre-trained open-source LLM deployed on your own infra?
└─ SageMaker JumpStart
```
TIP
The exam will not ask you to configure SageMaker training pipelines in detail. Know when to choose SageMaker vs. Bedrock and what SageMaker JumpStart provides.
SageMaker Inference Optimization Patterns
When a fine-tuned LLM is deployed on a SageMaker endpoint, the exam may test whether you can improve GPU utilization and cost efficiency without changing the model itself.
Important ideas:
- DJL Serving (Deep Java Library) supports advanced LLM serving patterns on SageMaker
- Continuous batching improves throughput by processing more requests together
- If real-world requests are much shorter than the configured maximum sequence length, lowering that limit can free memory
- Freed memory can often be used to support larger effective batch sizes and higher concurrency per instance
GPU parallelism trade-off:
- If model weights and activations fit within fewer GPUs than originally allocated, reduce the tensor parallel setting
- This can allow multiple model replicas per instance instead of one oversized replica spread across all GPUs
- The result is often better utilization and lower cost for the same traffic
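As a hedged illustration of these knobs, an LMI/DJL serving.properties for a SageMaker LLM endpoint might look like the following; the model location is hypothetical and exact option values depend on the model and instance type:

```properties
engine=MPI
option.model_id=s3://my-bucket/fine-tuned-llm/   # hypothetical model location
option.rolling_batch=lmi-dist                    # enable continuous batching
option.max_rolling_batch_size=64                 # larger batches = better GPU utilization
option.tensor_parallel_degree=2                  # shard across 2 GPUs instead of all 8
option.max_model_len=2048                        # lower max sequence length frees memory
```

Lowering tensor_parallel_degree when the model fits on fewer GPUs allows multiple replicas per instance, which is exactly the utilization pattern described above.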
TIP
For SageMaker LLM serving questions, watch for this pattern:
- Actual requests are much shorter than the configured max sequence length
- Instance concurrency is low
- The model can fit on fewer GPUs
That usually points to:
- reducing maximum sequence length
- adjusting tensor parallelism
2.6 Amazon Comprehend
What Comprehend Does
Amazon Comprehend is a managed NLP service – it analyzes text without requiring a foundation model invocation.
Key Capabilities
| Capability | Description | Exam Use Case |
|---|---|---|
| Entity Recognition | Detect people, places, organizations, dates in text | Pre-process documents before RAG ingestion |
| Sentiment Analysis | Positive / negative / neutral / mixed classification | Post-process FM output or classify user input |
| Key Phrase Extraction | Identify the most important phrases in a document | Summarization metadata, tagging |
| Language Detection | Identify the language of a text input | Route to the correct multilingual model |
| PII Detection | Find and optionally redact PII entities | Compliance pre-processing before sending to an FM |
Comprehend in a GenAI Pipeline
```
User Input → Comprehend (PII check / sentiment / language)
    ↓ only if safe/appropriate
Amazon Bedrock FM (InvokeModel)
    ↓
FM Output → Comprehend (sentiment / entity tagging)
    ↓
Application Response
```
Comprehend vs. Bedrock Guardrails PII Redaction
Both can detect PII. Choose:
- Comprehend when PII detection/redaction happens before reaching Bedrock, or as a standalone NLP step
- Bedrock Guardrails PII Redaction when you want PII handled within the Bedrock request/response cycle
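A boto3 sketch of the Comprehend pre-processing step (the redaction loop is an illustrative choice, not the only way to handle detected entities):

```python
import boto3

comprehend = boto3.client("comprehend")

text = "My name is Jane Doe and my card number is 4111-1111-1111-1111."

# Detect PII entities before the text ever reaches the FM
pii = comprehend.detect_pii_entities(Text=text, LanguageCode="en")

# Redact detected spans; iterate in reverse so earlier offsets stay valid
redacted = text
for entity in sorted(pii["Entities"], key=lambda e: e["BeginOffset"], reverse=True):
    redacted = (
        redacted[: entity["BeginOffset"]]
        + f"[{entity['Type']}]"
        + redacted[entity["EndOffset"]:]
    )

print(redacted)  # safe to forward to Bedrock
```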
2.7 Exposing Bedrock APIs: API Gateway & Container Deployments
Amazon API Gateway + Lambda + Bedrock
The standard pattern to expose a Bedrock-powered feature as an HTTP API:
```
External Client
    ↓ HTTPS
Amazon API Gateway (authentication, rate limiting, routing)
    ↓
AWS Lambda (business logic, prompt construction, Bedrock call)
    ↓
Amazon Bedrock (InvokeModel / InvokeAgent)
```
Why each layer:
- API Gateway: Handles auth (IAM, Cognito, API Keys), throttling, CORS, and request/response transformation
- Lambda: Stateless compute to build the prompt, call Bedrock, and format the response
- Bedrock: Foundation model inference
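A minimal Lambda handler sketch for the middle layer, assuming an API Gateway proxy integration event and the Converse API (the model ID and request shape are illustrative):

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    # API Gateway proxy integration delivers the HTTP body as a string
    user_input = json.loads(event["body"])["prompt"]

    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[{"role": "user", "content": [{"text": user_input}]}],
        inferenceConfig={"maxTokens": 512},
    )
    answer = response["output"]["message"]["content"][0]["text"]

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": answer}),
    }
```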
API Gateway WebSocket APIs for Real-Time GenAI
API Gateway WebSocket APIs are used when the application needs a persistent, bidirectional connection instead of simple request-response HTTP.
Why this matters:
- The client can send messages to the backend at any time
- The backend can push updates or model output back immediately
- The connection stays open, so the app avoids repeated reconnect overhead and client polling
Good fit:
- Real-time chat interfaces
- Collaborative applications
- Streaming GenAI responses where the user sends input and receives incremental output over an ongoing connection
How to think about it on the exam:
- HTTP API Gateway = standard request/response API exposure
- WebSocket API Gateway = persistent, full-duplex communication for interactive real-time applications
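On the backend, pushing output down an open WebSocket uses the API Gateway Management API; a hedged sketch where the endpoint URL is a placeholder for your deployed WebSocket stage and the connection ID comes from the route event:

```python
import boto3

# endpoint_url points at your deployed WebSocket API stage (placeholder)
apigw = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url="https://abc123.execute-api.us-east-1.amazonaws.com/prod",
)

def push_chunk(connection_id: str, text: str) -> None:
    # Server push: send an incremental model output chunk to the client
    apigw.post_to_connection(ConnectionId=connection_id, Data=text.encode("utf-8"))
```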
TIP
If the scenario says bidirectional streaming, server push, persistent connection, or real-time collaboration, think API Gateway WebSocket API rather than a basic HTTP API.
ECS / EKS for GenAI Application Backends
For applications that need persistent processes, long-running containers, or complex orchestration:
| Service | Use Case |
|---|---|
| Amazon ECS | Managed container orchestration – simpler than Kubernetes, suitable for most GenAI web backends |
| Amazon EKS | Kubernetes-based – use when the team already runs k8s or needs advanced scheduling |
Typical architecture:
```
Load Balancer → ECS/EKS (GenAI app container) → Bedrock API
```
Lambda vs. ECS/EKS
- Lambda: Stateless, event-driven, <15 min execution – best for per-request Bedrock calls
- ECS/EKS: Long-running services, streaming WebSocket connections, or containers with heavy dependencies
Hard Limits โ Lambda & API Gateway
| Service | Limit | Impact |
|---|---|---|
| Lambda | 15 min (900 s) max execution | FM inference or document processing exceeding this → use ECS or Bedrock Batch |
| Lambda | 6 MB synchronous response payload | Large FM responses → use streaming (InvokeModelWithResponseStream) |
| API Gateway | 29-second integration timeout (default) | Cannot proxy long synchronous FM calls; requires an async submit-and-poll pattern |
Async pattern for long FM calls behind API Gateway:
```
POST /invoke → Lambda submits job → returns job ID (fast)
    ↓ SQS / Step Functions processes async
GET /result/{id} → Lambda checks status → returns result when ready
```
2.8 Cross-Region Inference & Bedrock Flows
Cross-Region Inference Profiles
Cross-region inference routes model invocations across multiple AWS regions to improve availability and throughput.
- Bedrock creates a cross-region inference profile that you invoke like a regular model
- Bedrock automatically routes requests to the region with available capacity
- Use when: on-demand throughput in a single region is insufficient, or you need higher availability
- Billing is based on the calling (source) region, but requests may be processed in other regions – verify data residency and compliance requirements
How to invoke:
```
Instead of: modelId = "anthropic.claude-3-sonnet-..."
Use:        modelId = "us.anthropic.claude-3-sonnet-..."  (cross-region profile ID)
```
Amazon Bedrock Flows
Bedrock Flows is a visual, low-code workflow orchestration tool for chaining FM calls, knowledge base retrievals, and prompt logic.
| Feature | Description |
|---|---|
| Node types | FM prompt, Knowledge Base retrieval, Lambda function, Condition, Input/Output |
| Use case | Multi-step GenAI pipelines without writing orchestration code |
| vs. Bedrock Agents | Flows are deterministic, explicitly defined workflows; Agents reason autonomously |
Flows vs. Agents on the Exam
- Bedrock Agents: autonomous reasoning, decides at runtime which tools to call
- Bedrock Flows: predefined workflow graph, each step is explicit – no autonomous decision-making
Choose Flows when the workflow is known upfront and deterministic; choose Agents when the task requires dynamic planning.
Low-Ops Multimodal Analysis Pattern
When the workload needs to analyze images or video, extract insights, and present summary trends with minimal infrastructure management, think in terms of a fully managed AWS stack.
Typical pattern:
```
Photos / videos
    ↓
Amazon Bedrock multimodal FM (for visual analysis)
    ↓
AWS Step Functions (coordinate stages, branching, retries)
    ↓
Store extracted results
    ↓
Amazon QuickSight dashboard
```
Why this is low ops:
- Bedrock avoids custom model training and hosting
- Step Functions handles workflow state, retries, and orchestration without servers
- QuickSight provides managed dashboards without running BI infrastructure
TIP
If the scenario says analyze visual content, extract structured insights, and show trends in dashboards with the least operational overhead, a strong answer is: Bedrock multimodal model + Step Functions + QuickSight
2.9 Human-in-the-Loop with AWS Step Functions Task Token
The Problem
Some GenAI workflows require a human decision before proceeding โ content approval, sensitive action confirmation, compliance review. The challenge: how do you pause a workflow for hours or days without polling or consuming resources?
Step Functions Task Token Pattern
AWS Step Functions solves this with the task token callback pattern:
1. Step Functions starts a workflow task → generates a unique task token
2. Token is sent to an external system (email, dashboard, Slack, SQS)
3. Workflow PAUSES – no polling, no resource consumption
4. Human reviews and approves/rejects
5. External system calls SendTaskSuccess (or SendTaskFailure) with the token
6. Workflow resumes from where it paused

Key API calls:
- SendTaskSuccess(taskToken, output) – human approved, workflow continues
- SendTaskFailure(taskToken, error, cause) – human rejected, workflow handles failure
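A boto3 sketch of the callback side (assumes the task token was delivered to the reviewer system earlier in the workflow):

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

def handle_review_decision(task_token: str, approved: bool, notes: str) -> None:
    if approved:
        # Resume the paused workflow with the reviewer's output
        sfn.send_task_success(
            taskToken=task_token,
            output=json.dumps({"decision": "approved", "notes": notes}),
        )
    else:
        # Route the workflow to its failure/rejection branch
        sfn.send_task_failure(
            taskToken=task_token,
            error="ReviewRejected",
            cause=notes,
        )
```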
When to Use This Pattern
| Scenario | Pattern |
|---|---|
| Content generated by FM needs editor approval before publishing | Task Token – wait can be hours/days |
| Agent needs one-step human confirmation within a session | Bedrock Agents Return of Control (RoC) |
| Compliance review of AI-generated legal/medical text | Task Token – external reviewer, long wait |
| Real-time interactive confirmation in a chatbot | Bedrock Agents RoC |
Bedrock Agents RoC vs. Step Functions Task Token
| | Bedrock Agents Return of Control | Step Functions Task Token |
|---|---|---|
| Where | Inside Bedrock Agent orchestration | External Step Functions workflow wrapping Bedrock |
| Pause mechanism | Agent returns control to the caller | Workflow suspends, waits for callback |
| Wait duration | Short – within the same session | Any duration – minutes, hours, or days |
| Resource use while waiting | Session remains active | Zero – workflow is fully suspended |
| Best for | Single-step confirmation during an agent loop | Long-running approval workflows with external systems |
Exam Scenario
"A GenAI application generates content that requires human editorial review before publishing. The review may take up to 48 hours. How should the workflow be designed?" โ Step Functions task token pattern โ the workflow pauses without consuming resources until the reviewer sends the callback.
If the wait is short and within a single agent session โ Bedrock Agents Return of Control.
Standard vs. Express Workflow Limits
| | Standard Workflow | Express Workflow |
|---|---|---|
| Max duration | 1 year | 5 minutes |
| Execution model | At-least-once | At-most-once / at-least-once |
| Audit history | Full auditable history | CloudWatch Logs only |
| Best for | Human-in-the-loop, long approvals | High-throughput event pipelines |
Always use Standard for human review workflows – Express's 5-minute cap rules it out for any human wait.