AWS Certification
Generative AI Developer — Professional
AIP-C01 Cheat Sheet
5-Minute Master Reference · All Five Domains · Key Services · Decision Trees
Pass: 750 / 1000

65 scored · 10 unscored · 170 min

Compensatory scoring model

D1 · 31% · FM Integration, Data & Compliance
D2 · 26% · Implementation & Integration
D3 · 20% · AI Safety, Security & Governance
D4 · 12% · Operational Efficiency & Optimization
D5 · 11% · Testing, Validation & Troubleshooting
D1 · Amazon Bedrock Essentials
  • Bedrock = managed API to invoke FMs — no infra to manage
  • FMs available: Anthropic Claude, Meta Llama, Cohere, Amazon Titan, Stability AI, Mistral
  • Inference params: temperature (creativity), top-P/K (sampling), max_tokens, stop_sequences
  • API calls: InvokeModel (sync) · InvokeModelWithResponseStream (streaming)
  • Pricing modes: On-demand = pay per token · Provisioned = reserved capacity, needed for custom models · Batch = async -50% cost
  • Embeddings: Amazon Titan Embeddings, Cohere Embed — convert text → vectors for RAG
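The sync InvokeModel call above can be sketched with boto3. The model ID and the Claude message-body shape are assumptions for illustration; check the target model's current request schema before use.

```python
import json

def build_claude_body(prompt, temperature=0.5, top_p=0.9, max_tokens=512):
    """Build the JSON request body for an Anthropic model behind InvokeModel.

    temperature / top_p / max_tokens are the inference params listed above.
    """
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",  # assumed schema version
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "messages": [{"role": "user", "content": prompt}],
    })

def invoke(prompt):
    # boto3 imported lazily so the payload helper runs without AWS installed
    import boto3
    client = boto3.client("bedrock-runtime")  # sync runtime client
    resp = client.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
        body=build_claude_body(prompt),
    )
    return json.loads(resp["body"].read())
```

For streaming, `invoke_model_with_response_stream` takes the same body and returns chunks as they are generated.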
D1 · RAG Pipeline (highest exam weight)
Docs / S3 (source) → Chunk (split) → Embed (vectorize) → Vector DB (store) → Retrieve (kNN) → Augment (prompt) → FM (generate)
  • Knowledge Bases for Bedrock = fully managed RAG (no code)
  • Vector stores: OpenSearch Serverless · Aurora pgvector · Pinecone · Redis
  • Chunking: fixed-size (simple) · semantic (quality) · hierarchical (docs)
  • Hybrid search = semantic + keyword (BM25) → better recall
  • Metadata filters narrow retrieved chunks before similarity search
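The Chunk step above can be sketched as fixed-size chunking with overlap (the "simple" strategy); sizes here are arbitrary illustrations, and real pipelines usually chunk by tokens rather than characters.

```python
def chunk_fixed(text, size=200, overlap=50):
    """Split text into fixed-size character chunks with overlap.

    Overlap keeps content that straddles a boundary retrievable
    from at least one chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Semantic and hierarchical chunking trade this simplicity for better retrieval quality on long or structured documents.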
D1 · Customization Decision Tree
Scenario → Use
  • Need fresh/dynamic data → RAG
  • Teach model a new style/format → Fine-tune
  • Domain-specific vocabulary adaptation → Continued pre-training
  • Quick start, no training data → RAG
  • Consistent tone in outputs → Fine-tune
  • Latency-critical, no retrieval lag → Fine-tune
  • Fine-tune data format: JSONL with prompt/completion pairs
  • Custom models require provisioned throughput to invoke (not on-demand)
  • Model Distillation: large teacher → small student, same quality cheaper
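The JSONL fine-tuning format above can be sketched as follows; the prompt/completion pairs are invented for illustration.

```python
import json

# Hypothetical training pairs; the fine-tuning job reads one JSON object
# per line (JSONL) from S3.
pairs = [
    {"prompt": "Summarize: Q3 revenue rose 12% on cloud growth.",
     "completion": "Q3 revenue up 12%, driven by cloud."},
    {"prompt": "Summarize: The outage lasted 40 minutes.",
     "completion": "40-minute outage."},
]

def to_jsonl(records):
    """Serialize records as JSON Lines: one object per line."""
    return "\n".join(json.dumps(r) for r in records)
```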
D2 · Prompt Engineering Patterns
  • Zero-shot — no examples, rely on model knowledge · use when the task is simple & the model knows the domain
  • Few-shot — 3–5 examples in prompt · use when format/style must be controlled precisely
  • Chain-of-Thought (CoT) — "think step by step" · use for math, logic, multi-step reasoning
  • ReAct = Reason + Act → basis of Bedrock Agents loop
  • Prompt chaining — output of step N becomes input to step N+1
  • System prompt sets persona/constraints · Human/Assistant turns = conversation
  • Prompt injection — attacker embeds malicious instructions → mitigate with Guardrails
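The few-shot pattern above can be sketched as a simple prompt builder; the separator and "Input:/Output:" labels are an assumed convention, not a fixed API.

```python
def few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the query."""
    parts = [task]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")  # model completes the final Output
    return "\n\n".join(parts)
```

Prompt chaining then simply feeds the model's completion of one such prompt into the `query` (or `task`) of the next.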
D2 · Bedrock Agents (ReAct Loop)
Agent Orchestration Cycle
User Input → FM (Reason) → Decide Tool → Lambda Action → Observe Result → Loop or Answer
  • Action Group = Lambda fn + OpenAPI schema → what the agent can do
  • Knowledge Base attached to agent → automatic RAG retrieval
  • Return of Control (RoC) — agent pauses, awaits human approval
  • Multi-agent: Supervisor agent delegates to Sub-agents by specialty
  • Agent memory: session (within convo) · cross-session (persisted)
  • Bedrock Flows = visual drag-and-drop workflow builder (low-code)
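An Action Group's OpenAPI schema tells the agent what the backing Lambda can do; the agent reads the `description` fields to decide when to call it. A minimal sketch for a hypothetical order-status action (path, names, and descriptions invented):

```python
# Hypothetical Action Group schema; Bedrock pairs this with a Lambda handler.
action_group_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Order API", "version": "1.0.0"},
    "paths": {
        "/orders/{orderId}": {
            "get": {
                "operationId": "getOrderStatus",
                # The agent matches user intent against this description
                "description": "Look up the shipping status of an order",
                "parameters": [{
                    "name": "orderId",
                    "in": "path",
                    "required": True,
                    "schema": {"type": "string"},
                }],
                "responses": {"200": {"description": "Order status"}},
            }
        }
    },
}
```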
D2 · Right Service, Right Job
NeedService
Custom FM app, full controlAmazon Bedrock
Managed RAG chatbot over company docsAmazon Q Business
Enterprise keyword + semantic searchAmazon Kendra
Sentiment, entities, PII from textAmazon Comprehend
Extract data from PDFs / formsAmazon Textract
Train/deploy custom ML modelsAmazon SageMaker
Image / video understandingAmazon Rekognition
Multi-service pipeline orchestrationAWS Step Functions
D3 · Guardrails & Responsible AI
  • Content Filters: hate, violence, sexual, insults · configurable severity
  • Topic Denial: block competitor mentions, legal advice, etc.
  • PII Redaction: SSN, credit card, email, phone → ANONYMIZE or BLOCK
  • Grounding Check: verify the answer is supported by retrieved context (anti-hallucination)
  • Word Filters: custom deny-list · profanity filter
  • Responsible AI pillars: Fairness · Explainability · Privacy · Robustness · Transparency
  • Model invocation logging → S3 / CloudWatch (required for audit)
  • Encryption: KMS at rest · TLS in transit · VPC endpoints for private access
  • IAM: identity-based + resource-based policies · least-privilege per Lambda
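PII redaction itself is handled by the managed Guardrails service; as a purely local illustration of what ANONYMIZE mode does, a regex sketch (patterns deliberately simplified, not production-grade, and far narrower than the managed feature):

```python
import re

# Simplified patterns for illustration only.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def anonymize(text):
    """Replace matched PII with a {TYPE} placeholder, mimicking ANONYMIZE mode."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub("{" + label + "}", text)
    return text
```

BLOCK mode, by contrast, rejects the whole request or response instead of masking matches.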
D4 · Cost & Performance Optimization
Strategy · Cost ↓ · Latency ↓
  • Batch Inference API · ✓ -50% · ✗ async
  • Prompt Caching · ✓ · ✓
  • Streaming responses · ~ · ✓ perceived
  • Smaller model · ✓ · ✓
  • Provisioned throughput · ~ (high vol) · ✓ consistent
  • Token compression · ✓ · ~
  • Prompt caching caches the system prompt portion (static, repeated context)
  • Batch inference = async jobs, for nightly/bulk processing, not real-time
  • CloudWatch metrics: InputTokenCount · OutputTokenCount · InvocationLatency · ThrottledRequests
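Token-based pricing makes cost estimation simple arithmetic over the InputTokenCount / OutputTokenCount metrics above. The per-1K rates below are hypothetical; real Bedrock prices vary by model and region.

```python
# Hypothetical on-demand prices (USD per 1,000 tokens) -- check the
# current Bedrock price list for real figures.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}

def estimate_cost(input_tokens, output_tokens, batch=False):
    """Estimate one request's cost; batch inference is ~50% cheaper per the sheet."""
    cost = (input_tokens / 1000) * PRICE_PER_1K["input"] \
         + (output_tokens / 1000) * PRICE_PER_1K["output"]
    return cost * 0.5 if batch else cost
```

Output tokens typically cost several times more than input tokens, which is why capping max_tokens and compressing prompts both show up as cost levers.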
D5 · Testing, Evaluation & Troubleshooting
  • Model Evaluation jobs (Bedrock built-in): auto metrics or human review
  • Auto metrics: ROUGE (summarization) · BERTScore (semantic) · accuracy
  • RAG-specific (RAGAS): Context Precision · Context Recall · Answer Relevancy · Groundedness
  • Hallucination → use grounding check in Guardrails + citations in response
  • Agent debugging: enable Trace in Bedrock console → shows each ReAct step
  • X-Ray tracing for end-to-end latency across Lambda + Bedrock calls
  • ThrottlingException → implement exponential backoff + jitter, or upgrade to provisioned throughput
  • Common failures: retrieval miss (bad chunking) · context window overflow · prompt injection · format errors
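The ThrottlingException remedy above (exponential backoff plus jitter) can be sketched as a generic retry wrapper; the parameter values are illustrative defaults.

```python
import random
import time

def with_backoff(call, max_retries=5, base=0.5, cap=8.0):
    """Retry `call`, sleeping up to base * 2**attempt (full jitter) between tries."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # in practice, catch botocore's ThrottlingException
            if attempt == max_retries - 1:
                raise  # budget exhausted; surface the error
            delay = min(cap, base * (2 ** attempt))
            time.sleep(random.uniform(0, delay))  # full jitter avoids thundering herd
```

If throttling persists despite backoff, the sustained-traffic fix is provisioned throughput rather than more retries.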
Exam Tips
  • D1+D2 = 57% · master RAG & Agents first
  • No penalty for guessing · always answer all 75 questions
  • "Managed / no-code" in stem → usually Q Business or Knowledge Bases
  • Custom model inference always needs Provisioned Throughput
  • Bulk/async workloads → Batch Inference API
  • Competitor blocking → Topic Denial in Guardrails
  • Compensatory scoring · weak on D5 is OK if D1/D2 are strong