
Domain 2: Implementation and Integration (26%)

โ† Domain 1 ยท Next: Domain 3 โ†’

Exam Tip

This domain tests your ability to implement GenAI applications end-to-end. Know when to use Bedrock Agents vs. a simple Knowledge Base call, understand the three Bedrock APIs, and be able to explain chunking strategy trade-offs. Agents + Domain 1 together = 57% of the exam.


2.1 Agentic AI & Amazon Bedrock Agents

What is an Agent?

Amazon Bedrock Agents enable multi-step, autonomous reasoning by orchestrating FM calls, tool invocations, and knowledge base retrievals to complete a user goal.

Core Components:

| Component | Role |
| --- | --- |
| Agent | The orchestrator: receives the user request and plans the steps |
| Action Groups | Lambda functions the Agent can invoke to interact with external systems |
| Knowledge Bases | RAG pipeline the Agent can query for document-based context |
| Orchestration Trace | Step-by-step log of the Agent's reasoning and tool calls |

Action Groups

Action Groups define what external actions an Agent can take:

  • Each action group is backed by a Lambda function
  • The Agent decides whether and when to call an action based on the user's request
  • Actions are described with an OpenAPI schema, which the Agent uses to understand what parameters to pass

Examples:

  • Look up a customer's order status in a database
  • Send a confirmation email via an external API
  • Write a record to an S3 bucket

Knowledge Base Integration

Agents can be connected to a Knowledge Base to retrieve document context:

  • The Agent automatically decides when to query the Knowledge Base vs. call an Action Group
  • Knowledge Bases use the RAG pipeline: S3 → chunking → embeddings → OpenSearch Serverless

Orchestration Trace

Enable with enableTrace: true in the InvokeAgent API call:

  • Shows the Agent's step-by-step reasoning chain (which step, which tool, which decision)
  • Critical for debugging unexpected Agent behavior

Exam Trap

Bedrock Agents vs. Knowledge Bases (RAG only):

  • Use a Knowledge Base alone when: the task is document Q&A with no external action needed
  • Use Bedrock Agents when: the task requires multi-step reasoning, calling external APIs, or taking actions beyond retrieval

The exam will present scenarios; choose Agents when action or orchestration is involved.


2.2 Knowledge Base Architecture

Amazon Bedrock Knowledge Bases is the fully managed RAG capability in Bedrock. It automates the core workflow of:

  • ingesting data from a source such as S3
  • chunking that data
  • generating embeddings
  • storing those embeddings in a vector database
  • retrieving relevant context to augment the FM response

This reduces the amount of custom pipeline code you would otherwise need to build and maintain yourself.

End-to-End RAG Flow

```text
1. Data Source: S3 bucket (PDFs, Word, HTML, Markdown, CSV)
2. Parser: Extract and clean text from documents
3. Chunker: Split text into smaller pieces (fixed, semantic, hierarchical)
4. Embedder: Titan Text Embeddings / Cohere Embed → convert text to vectors
5. Vector Store: Amazon OpenSearch Serverless (or Aurora pgvector)
6. At inference: embed query → vector search → top-K chunks → inject into FM prompt
```

Why Use Knowledge Bases

  • Best answer when the question asks for managed RAG with minimal custom code
  • Strong fit for internal manuals, FAQs, policy documents, and document-grounded assistants
  • Can be used directly in an application or attached to a Bedrock Agent

Knowledge Base Backing Patterns

  • Vector store: the standard Bedrock RAG pattern for chunked embeddings
  • Structured data store: useful when data already lives in structured systems and semantic search is needed over that data
  • Kendra GenAI Index: useful when the scenario emphasizes document understanding and Kendra-backed retrieval

Supported Data Sources

  • Amazon S3: primary ingestion source (most common exam scenario)
  • Web Crawler: Bedrock Knowledge Bases can crawl websites
  • Salesforce, Confluence, SharePoint: native connectors

Chunking Strategies

| Strategy | How It Works | Best For | Trade-off |
| --- | --- | --- | --- |
| Fixed-size | Split by token count (e.g., 300 tokens) | Uniform documents | May break mid-sentence or mid-context |
| Fixed-size + overlap | Tokens overlap between adjacent chunks | Preserving cross-boundary context | Higher storage and retrieval cost |
| Semantic | Split by meaning/topic using NLP | Long-form, varied content | Higher processing complexity |
| Hierarchical | Parent chunk + child chunk structure | Complex docs needing broad + fine retrieval | More complex retrieval logic |

TIP

Hierarchical chunking is best when you need both broad context and fine-grained retrieval.
Fixed-size with overlap is the simplest option for preserving context across chunk boundaries.
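
A minimal sketch of the fixed-size + overlap strategy (the chunk sizes and helper name here are illustrative, not part of any Bedrock API):

```python
def chunk_fixed(tokens, chunk_size=300, overlap=50):
    """Split a token list into fixed-size chunks with overlap.

    Overlapping tokens are repeated at the start of the next chunk, so
    context that straddles a boundary appears in both chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = list(range(10))  # stand-in for tokenized document text
print(chunk_fixed(tokens, chunk_size=4, overlap=1))
# → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Note how an overlap of 1 repeats each boundary element across chunks, which is exactly the extra storage and retrieval cost the table calls out.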

Sync and Ingestion

  • After updating S3 content, you must trigger a sync to update the vector store
  • The Knowledge Base does NOT auto-update when S3 changes
  • Ingestion status is visible in the Bedrock console

2.3 API Integration Patterns

The Three Core Bedrock Runtime APIs

| API | Use Case | Response Model |
| --- | --- | --- |
| InvokeModel | Synchronous: get a complete response in one call | Blocking, full response at once |
| InvokeModelWithResponseStream | Streaming: receive response token by token | Non-blocking, chunk-by-chunk delivery |
| InvokeAgent | Multi-step agentic workflow | Streaming trace + final response |

When to Use Each

```text
User needs a complete answer in one shot?
└─ InvokeModel (synchronous, simple)

User interface needs a low-latency "typing" feel?
└─ InvokeModelWithResponseStream (streaming)

Task requires tool calls, external APIs, or multi-step planning?
└─ InvokeAgent
```
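
A sketch of consuming a streamed response. The event shape (`chunk` → `bytes`) follows the Bedrock response stream, but the decoded payload fields vary by model family, and the stream below is a local stub rather than a real AWS call:

```python
import json

def stream_text(response_stream):
    """Yield text deltas from a Bedrock-style event stream.

    Each event carries a 'chunk' with raw bytes; we assume the decoded
    JSON payload has a 'text' field (exact fields are model-specific).
    """
    for event in response_stream:
        chunk = event.get("chunk")
        if chunk:
            payload = json.loads(chunk["bytes"])
            if "text" in payload:
                yield payload["text"]

# Local stub standing in for response["body"] from
# invoke_model_with_response_stream (no AWS call is made here).
fake_stream = [
    {"chunk": {"bytes": json.dumps({"text": "Hello"}).encode()}},
    {"chunk": {"bytes": json.dumps({"text": ", world"}).encode()}},
]
print("".join(stream_text(fake_stream)))  # → Hello, world
```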

InvokeModel Request Body (Simplified)

```json
{
  "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
  "body": {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Summarize this document..." }
    ]
  }
}
```

InvokeModel uses a model-specific request body. The payload is JSON, but the expected JSON fields vary by model family. For example, Anthropic Claude models use fields such as anthropic_version, messages, and max_tokens.

Key parameters:

  • modelId: The specific FM to invoke
  • max_tokens: Cap on output tokens; controls cost and response length
  • temperature: Controls creativity vs. determinism (0.0โ€“1.0)
  • topP: Narrows or widens the token probability pool used for sampling
  • stopSequences: Explicitly tells the model to stop generating when a specified token or phrase appears
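
A sketch of assembling that Anthropic-style body in Python; the helper name is ours, and the boto3 call is shown commented out because it needs AWS credentials and model access:

```python
import json

def build_claude_body(prompt, max_tokens=1024, temperature=0.5):
    """Build the model-specific JSON body for an Anthropic model on Bedrock."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,    # caps output length and cost
        "temperature": temperature,  # 0.0 = deterministic, 1.0 = creative
        "messages": [{"role": "user", "content": prompt}],
    })

body = build_claude_body("Summarize this document...")

# bedrock = boto3.client("bedrock-runtime")
# response = bedrock.invoke_model(
#     modelId="anthropic.claude-3-sonnet-20240229-v1:0", body=body)
```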

InvokeModel vs. Converse

| Feature | InvokeModel | Converse |
| --- | --- | --- |
| Request shape | Model-specific JSON payload | More consistent message-based structure |
| Best use | Direct invocation when you know the target model and its schema | Easier cross-model portability |
| Exam takeaway | You may need to match model-specific fields | Bedrock's unified API for chat-style interaction |

Exam Framing

The exam is unlikely to ask you to memorize exact request bodies, byte encoding, or SDK syntax. What matters is knowing that InvokeModel is more model-specific, while Converse exists to reduce that variation.

Converse API

The Converse API is the general-purpose, message-based Bedrock API for foundation model interactions. Think of it as the Bedrock Swiss army knife for chat-style requests: it gives you a more consistent request shape across models, which makes it easier to swap models in your application without rewriting as much integration code.

At a minimum, Converse needs:

  • A modelId
  • A prompt expressed through messages

It can also include:

  • inferenceConfig for maxTokens, temperature, topP, and stop sequences
  • guardrailConfig to apply Guardrails within the same request
  • toolConfig for tool definitions and tool choice in tool-calling / agentic patterns
  • system instructions for top-level model behavior
  • promptVariables when using a stored prompt from Bedrock Prompt Management
  • additionalModelRequestFields for model-specific pass-through settings

How to Think About Converse on the Exam

The exam will not ask you to write Converse request bodies from memory. What matters is understanding why Converse exists:

  • a unified message-based API surface
  • easier model portability across Bedrock-supported models
  • one place to pass inference settings, tools, and guardrail configuration

Common Confusion

Do not confuse Converse with InvokeAgent:

  • Converse = unified model interaction API
  • InvokeAgent = managed Bedrock Agent orchestration with multi-step reasoning and trace support

2.4 Retrieval Configuration & Tuning

Top-K Retrieval

  • At query time, the embedding of the user's question is compared against all vectors in the store
  • The top-K most similar chunks are retrieved and injected into the FM prompt
  • Higher K: More context but longer prompts (higher cost, risk of irrelevant chunks)
  • Lower K: Focused context but may miss relevant information
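
The mechanics can be sketched with plain cosine similarity over a tiny in-memory store (the 2-dimensional vectors are toys; real embedding vectors have hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vector"]),
                    reverse=True)
    return [item["text"] for item in ranked[:k]]

store = [
    {"text": "refund policy", "vector": [0.9, 0.1]},
    {"text": "shipping times", "vector": [0.1, 0.9]},
    {"text": "returns process", "vector": [0.8, 0.2]},
]
print(top_k([1.0, 0.0], store, k=2))  # → ['refund policy', 'returns process']
```

Raising `k` would pull in "shipping times" too, illustrating the cost/noise trade-off above.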

Metadata Filtering

  • Attach metadata to chunks during ingestion (e.g., department: "legal", year: 2024)
  • At query time, filter retrieval to only return chunks matching specific metadata
  • Allows scoped retrieval without maintaining separate vector stores per category
Hybrid Search

  • Hybrid search combines semantic vector retrieval with keyword matching
  • Use it when exact terms, product codes, error messages, or legal phrases matter in addition to overall semantic meaning
  • This is often better than pure vector search for enterprise documents and mixed-structure datasets
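
A toy sketch covering both ideas above. In the real service the filter goes in the Retrieve API's retrieval configuration; the keyword-overlap score below merely stands in for the vector/hybrid score:

```python
def filtered_search(query_terms, metadata_filter, store, k=3):
    """Metadata-scoped keyword retrieval sketch."""
    # Stage 1: keep only chunks whose metadata matches the filter
    candidates = [
        item for item in store
        if all(item["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    # Stage 2: toy relevance score = keyword overlap with the query
    def score(item):
        words = set(item["text"].lower().split())
        return len(words & {term.lower() for term in query_terms})
    return [item["text"] for item in sorted(candidates, key=score, reverse=True)[:k]]

store = [
    {"text": "2024 legal retention policy",
     "metadata": {"department": "legal", "year": 2024}},
    {"text": "2023 legal retention policy",
     "metadata": {"department": "legal", "year": 2023}},
    {"text": "2024 hr onboarding guide",
     "metadata": {"department": "hr", "year": 2024}},
]
print(filtered_search(["retention", "policy"],
                      {"department": "legal", "year": 2024}, store))
# → ['2024 legal retention policy']
```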

Reranking

  • Retrieval is often a two-stage process
  • Stage 1: fetch a broader candidate set quickly
  • Stage 2: rerank those candidates with a stronger scoring model or rule set
  • Reranking helps improve precision when the initial top-K results include loosely related chunks
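
The two-stage idea in miniature; the "stronger" scorer here is exact-phrase matching, standing in for a cross-encoder or managed reranking model:

```python
def rerank(query, candidates, final_k=2):
    """Stage-2 rerank: re-score a broad candidate set with a stronger signal."""
    def strong_score(text):
        # Exact-phrase hit first, then word overlap as a tiebreaker
        phrase_hit = query.lower() in text.lower()
        overlap = len(set(query.lower().split()) & set(text.lower().split()))
        return (phrase_hit, overlap)
    return sorted(candidates, key=strong_score, reverse=True)[:final_k]

# Stage 1 already fetched a broad candidate set (e.g. top-20 by vector score)
candidates = [
    "error code E42 means disk full",
    "general troubleshooting tips",
    "how to read error codes",
]
print(rerank("error code E42", candidates, final_k=1))
# → ['error code E42 means disk full']
```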

Precision vs. Recall in RAG

| Problem | Likely Issue | Common Fix |
| --- | --- | --- |
| Relevant chunks are missing | Low recall | Increase candidate set, improve chunking, use hybrid search |
| Too many irrelevant chunks are returned | Low precision | Add metadata filters, rerank results, reduce top-K |
| Good chunks exist but answer still drifts | Prompt/context issue | Tighten prompt instructions, reduce noise, improve citations |

Distance and Similarity Intuition

  • Embedding search relies on a distance or similarity metric
  • In practice, you may see references to cosine similarity or other vector-distance approaches
  • You do not need to memorize math formulas for the exam, but you should understand that retrieval is driven by vector closeness

2.5 Amazon SageMaker for GenAI Applications

What SageMaker Provides

Amazon SageMaker is the managed ML platform for training, fine-tuning, and deploying custom models. For the AIP-C01 exam, know when SageMaker is the right choice over Bedrock.

Key SageMaker Concepts

| Concept | Description |
| --- | --- |
| Training Jobs | Managed compute for training ML models: bring your own script and framework |
| SageMaker JumpStart | Pre-trained foundation models (Llama, Falcon, etc.) that can be fine-tuned and deployed from the SageMaker console |
| SageMaker Endpoints | Real-time inference endpoints for deployed models |
| Batch Transform | Asynchronous batch inference on large datasets (analogous to Bedrock Batch Inference) |

Bedrock vs. SageMaker Decision

```text
Need a managed FM from a major provider (Claude, Titan, Cohere)?
└─ Amazon Bedrock (InvokeModel)

Need full training control, custom framework, or open-source LLM?
└─ Amazon SageMaker (Training Job + Endpoint)

Need a pre-trained open-source LLM deployed on your own infra?
└─ SageMaker JumpStart
```

TIP

The exam will not ask you to configure SageMaker training pipelines in detail. Know when to choose SageMaker vs. Bedrock and what SageMaker JumpStart provides.

SageMaker Inference Optimization Patterns

When a fine-tuned LLM is deployed on a SageMaker endpoint, the exam may test whether you can improve GPU utilization and cost efficiency without changing the model itself.

Important ideas:

  • DJL Serving (Deep Java Library) supports advanced LLM serving patterns on SageMaker
  • Continuous batching improves throughput by processing more requests together
  • If real-world requests are much shorter than the configured maximum sequence length, lowering that limit can free memory
  • Freed memory can often be used to support larger effective batch sizes and higher concurrency per instance

GPU parallelism trade-off:

  • If model weights and activations fit within fewer GPUs than originally allocated, reduce the tensor parallel setting
  • This can allow multiple model replicas per instance instead of one oversized replica spread across all GPUs
  • The result is often better utilization and lower cost for the same traffic

TIP

For SageMaker LLM serving questions, watch for this pattern:

  • Actual requests are much shorter than the configured max sequence length
  • Instance concurrency is low
  • The model can fit on fewer GPUs

That usually points to:

  • reducing maximum sequence length
  • adjusting tensor parallelism
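
These levers usually live in the endpoint's serving configuration. An illustrative DJL/LMI `serving.properties` sketch; the S3 path is a placeholder and exact option names vary by container version, so treat this as an assumption rather than a definitive config:

```properties
engine=Python
option.model_id=s3://my-model-bucket/fine-tuned-llm/
# Fit the model on fewer GPUs so more replicas fit per instance
option.tensor_parallel_degree=2
# Continuous (rolling) batching for higher throughput
option.rolling_batch=vllm
option.max_rolling_batch_size=32
# Lower the max length if real requests are much shorter, freeing memory
option.max_model_len=2048
option.dtype=fp16
```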

2.6 Amazon Comprehend

What Comprehend Does

Amazon Comprehend is a managed NLP service: it analyzes text without requiring a foundation model invocation.

Key Capabilities

| Capability | Description | Exam Use Case |
| --- | --- | --- |
| Entity Recognition | Detect people, places, organizations, dates in text | Pre-process documents before RAG ingestion |
| Sentiment Analysis | Positive / negative / neutral / mixed classification | Post-process FM output or classify user input |
| Key Phrase Extraction | Identify the most important phrases in a document | Summarization metadata, tagging |
| Language Detection | Identify the language of a text input | Route to the correct multilingual model |
| PII Detection | Find and optionally redact PII entities | Compliance pre-processing before sending to an FM |

Comprehend in a GenAI Pipeline

```text
User Input → Comprehend (PII check / sentiment / language)
    ↓ only if safe/appropriate
Amazon Bedrock FM (InvokeModel)
    ↓
FM Output → Comprehend (sentiment / entity tagging)
    ↓
Application Response
```

Comprehend vs. Bedrock Guardrails PII Redaction

Both can detect PII. Choose:

  • Comprehend when PII detection/redaction happens before reaching Bedrock, or as a standalone NLP step
  • Bedrock Guardrails PII Redaction when you want PII handled within the Bedrock request/response cycle
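
A sketch of applying a Comprehend-style PII result before text reaches an FM. The entity entries follow the shape of a DetectPiiEntities response (`Type`, `BeginOffset`, `EndOffset`), but the response here is a local stub, not a real Comprehend call:

```python
def redact_pii(text, pii_entities):
    """Redact spans flagged by a Comprehend DetectPiiEntities-style response.

    Offsets are applied right-to-left so earlier offsets stay valid
    while the string is rewritten.
    """
    for entity in sorted(pii_entities, key=lambda e: e["BeginOffset"],
                         reverse=True):
        text = (text[:entity["BeginOffset"]]
                + f"[{entity['Type']}]"
                + text[entity["EndOffset"]:])
    return text

# Stub standing in for comprehend.detect_pii_entities(...)["Entities"]
entities = [{"Type": "EMAIL", "BeginOffset": 11, "EndOffset": 27}]
print(redact_pii("Contact me jane@example.com before sending to the FM",
                 entities))
# → Contact me [EMAIL] before sending to the FM
```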

2.7 Exposing Bedrock APIs: API Gateway & Container Deployments

Amazon API Gateway + Lambda + Bedrock

The standard pattern to expose a Bedrock-powered feature as an HTTP API:

```text
External Client
    ↓ HTTPS
Amazon API Gateway  (authentication, rate limiting, routing)
    ↓
AWS Lambda  (business logic, prompt construction, Bedrock call)
    ↓
Amazon Bedrock (InvokeModel / InvokeAgent)
```

Why each layer:

  • API Gateway: Handles auth (IAM, Cognito, API Keys), throttling, CORS, and request/response transformation
  • Lambda: Stateless compute to build the prompt, call Bedrock, and format the response
  • Bedrock: Foundation model inference

API Gateway WebSocket APIs for Real-Time GenAI

API Gateway WebSocket APIs are used when the application needs a persistent, bidirectional connection instead of simple request-response HTTP.

Why this matters:

  • The client can send messages to the backend at any time
  • The backend can push updates or model output back immediately
  • The connection stays open, so the app avoids repeated reconnect overhead and client polling

Good fit:

  • Real-time chat interfaces
  • Collaborative applications
  • Streaming GenAI responses where the user sends input and receives incremental output over an ongoing connection

How to think about it on the exam:

  • HTTP API Gateway = standard request/response API exposure
  • WebSocket API Gateway = persistent, full-duplex communication for interactive real-time applications

TIP

If the scenario says bidirectional streaming, server push, persistent connection, or real-time collaboration, think API Gateway WebSocket API rather than a basic HTTP API.

ECS / EKS for GenAI Application Backends

For applications that need persistent processes, long-running containers, or complex orchestration:

| Service | Use Case |
| --- | --- |
| Amazon ECS | Managed container orchestration: simpler than Kubernetes, suitable for most GenAI web backends |
| Amazon EKS | Kubernetes-based: use when the team already runs k8s or needs advanced scheduling |

Typical architecture:

```text
Load Balancer → ECS/EKS (GenAI app container) → Bedrock API
```

Lambda vs. ECS/EKS

  • Lambda: Stateless, event-driven, <15 min execution; best for per-request Bedrock calls
  • ECS/EKS: Long-running services, streaming WebSocket connections, or containers with heavy dependencies

Hard Limits โ€” Lambda & API Gateway

| Service | Limit | Impact |
| --- | --- | --- |
| Lambda | 15 min (900s) max execution | FM inference or document processing exceeding this: use ECS or Bedrock Batch |
| Lambda | 6 MB synchronous response payload | Large FM responses: use streaming (InvokeModelWithResponseStream) |
| API Gateway | 29-second integration timeout (cannot be increased) | Cannot proxy long synchronous FM calls; requires an async submit-and-poll pattern |

Async pattern for long FM calls behind API Gateway:

```text
POST /invoke   → Lambda submits job → returns job ID (fast)
               → SQS / Step Functions processes async
GET  /result/{id} → Lambda checks status → returns result when ready
```
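
The submit-and-poll flow can be sketched with an in-memory job table standing in for DynamoDB/SQS; the handler names and job states below are illustrative:

```python
import uuid

JOBS = {}  # stand-in for DynamoDB / SQS state

def submit_handler(prompt):
    """POST /invoke: record the job and return immediately with an ID."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "PENDING", "result": None}
    # Real system: send {job_id, prompt} to SQS or start Step Functions here
    return {"jobId": job_id}

def worker(job_id, prompt):
    """Async worker (SQS consumer / Step Functions task) doing the long FM call."""
    JOBS[job_id] = {"status": "DONE", "result": f"summary of: {prompt}"}

def result_handler(job_id):
    """GET /result/{id}: return status, and the result once ready."""
    return JOBS.get(job_id, {"status": "NOT_FOUND"})

job = submit_handler("long document...")
print(result_handler(job["jobId"]))            # → {'status': 'PENDING', 'result': None}
worker(job["jobId"], "long document...")
print(result_handler(job["jobId"])["status"])  # → DONE
```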

2.8 Cross-Region Inference & Bedrock Flows

Cross-Region Inference Profiles

Cross-region inference routes model invocations across multiple AWS regions to improve availability and throughput.

  • Bedrock creates a cross-region inference profile that you invoke like a regular model
  • Bedrock automatically routes requests to the region with available capacity
  • Use when: on-demand throughput in a single region is insufficient, or you need higher availability
  • The calling region still applies for billing and data residency awareness; verify compliance requirements

How to invoke:

```text
Instead of: modelId = "anthropic.claude-3-sonnet-..."
Use:        modelId = "us.anthropic.claude-3-sonnet-..."  (cross-region profile ID)
```

Amazon Bedrock Flows

Bedrock Flows is a visual, low-code workflow orchestration tool for chaining FM calls, knowledge base retrievals, and prompt logic.

| Feature | Description |
| --- | --- |
| Node types | FM prompt, Knowledge Base retrieval, Lambda function, Condition, Input/Output |
| Use case | Multi-step GenAI pipelines without writing orchestration code |
| vs. Bedrock Agents | Flows are deterministic, explicitly defined workflows; Agents reason autonomously |

Flows vs. Agents on the Exam

  • Bedrock Agents: autonomous reasoning, decides at runtime which tools to call
  • Bedrock Flows: predefined workflow graph, each step is explicit, no autonomous decision-making

Choose Flows when the workflow is known upfront and deterministic; choose Agents when the task requires dynamic planning.

Low-Ops Multimodal Analysis Pattern

When the workload needs to analyze images or video, extract insights, and present summary trends with minimal infrastructure management, think in terms of a fully managed AWS stack.

Typical pattern:

```text
Photos / videos
    ↓
Amazon Bedrock multimodal FM (for visual analysis)
    ↓
AWS Step Functions (coordinate stages, branching, retries)
    ↓
Store extracted results
    ↓
Amazon QuickSight dashboard
```

Why this is low ops:

  • Bedrock avoids custom model training and hosting
  • Step Functions handles workflow state, retries, and orchestration without servers
  • QuickSight provides managed dashboards without running BI infrastructure

TIP

If the scenario says analyze visual content, extract structured insights, and show trends in dashboards with the least operational overhead, a strong answer is: Bedrock multimodal model + Step Functions + QuickSight


2.9 Human-in-the-Loop with AWS Step Functions Task Token

The Problem

Some GenAI workflows require a human decision before proceeding: content approval, sensitive action confirmation, compliance review. The challenge: how do you pause a workflow for hours or days without polling or consuming resources?

Step Functions Task Token Pattern

AWS Step Functions solves this with the task token callback pattern:

```text
1. Step Functions starts a workflow task → generates a unique task token
2. Token is sent to an external system (email, dashboard, Slack, SQS)
3. Workflow PAUSES (no polling, no resource consumption)
4. Human reviews and approves/rejects
5. External system calls SendTaskSuccess (or SendTaskFailure) with the token
6. Workflow resumes from where it paused
```

Key API calls:

  • SendTaskSuccess(taskToken, output): human approved, workflow continues
  • SendTaskFailure(taskToken, error, cause): human rejected, workflow handles failure

When to Use This Pattern

| Scenario | Pattern |
| --- | --- |
| Content generated by FM needs editor approval before publishing | Task Token (wait can be hours/days) |
| Agent needs one-step human confirmation within a session | Bedrock Agents Return of Control (RoC) |
| Compliance review of AI-generated legal/medical text | Task Token (external reviewer, long wait) |
| Real-time interactive confirmation in a chatbot | Bedrock Agents RoC |

Bedrock Agents RoC vs. Step Functions Task Token

| | Bedrock Agents Return of Control | Step Functions Task Token |
| --- | --- | --- |
| Where | Inside Bedrock Agent orchestration | External Step Functions workflow wrapping Bedrock |
| Pause mechanism | Agent returns control to the caller | Workflow suspends, waits for callback |
| Wait duration | Short: within the same session | Any duration: minutes, hours, or days |
| Resource use while waiting | Session remains active | Zero: workflow is fully suspended |
| Best for | Single-step confirmation during an agent loop | Long-running approval workflows with external systems |

Exam Scenario

"A GenAI application generates content that requires human editorial review before publishing. The review may take up to 48 hours. How should the workflow be designed?" → Step Functions task token pattern: the workflow pauses without consuming resources until the reviewer sends the callback.

If the wait is short and within a single agent session → Bedrock Agents Return of Control.

Standard vs. Express Workflow Limits

| | Standard Workflow | Express Workflow |
| --- | --- | --- |
| Max duration | 1 year | 5 minutes |
| Execution model | At-least-once | At-most-once / at-least-once |
| Audit history | Full auditable history | CloudWatch Logs only |
| Best for | Human-in-the-loop, long approvals | High-throughput event pipelines |

Always use Standard for human review workflows; Express's 5-minute cap rules it out for any human wait.


โ† Domain 1 ยท Next: Domain 3 โ†’

Happy Studying! ๐Ÿš€ โ€ข Privacy-friendly analytics โ€” no cookies, no personal data
Privacy Policy โ€ข AI Disclaimer โ€ข Report an issue