AWS Certification · Generative AI Developer — Professional
AIP-C01 Cheat Sheet
5-Minute Master Reference · All Five Domains · Key Services · Decision Trees
Pass: 750 / 1000  ·  65 scored · 10 unscored · 170 min
Domain weights:
  • D1 · FM Integration, Data & Compliance · 31%
  • D2 · Implementation & Integration · 26%
  • D3 · AI Safety, Security & Governance · 20%
  • D4 · Operational Efficiency & Optimization · 12%
  • D5 · Testing, Validation & Troubleshooting · 11%
D1 · Amazon Bedrock Essentials
  • Bedrock — managed API to invoke 20+ FMs, no infrastructure to manage
  • FMs available: Anthropic Claude, Meta Llama, Cohere, Amazon Titan, Stability AI, Mistral
  • Inference params: temperature (creativity 0→1) · top-P/K (sampling) · max_tokens · stop_sequences
  • API calls: InvokeModel (sync) · InvokeModelWithResponseStream (streaming)
  • Pricing: On-demand = pay per token · Provisioned = reserved capacity, required for custom models · Batch = async, −50% cost
  • Embeddings: Amazon Titan Embeddings, Cohere Embed — convert text → vectors for RAG
  • Cross-region inference: route across AWS regions for high availability
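A minimal InvokeModel sketch tying the inference parameters above together. The model ID is a hypothetical choice and the body follows the Anthropic Messages schema; other FMs expect different body formats, so check the provider docs.

```python
import json

# Hypothetical model choice; swap for any Bedrock model ID you have access to.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_request(prompt: str) -> dict:
    """Assemble an Anthropic Messages-style body with the key inference params."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,                 # hard cap on generated tokens
        "temperature": 0.2,                # low = deterministic, high = creative
        "top_p": 0.9,                      # nucleus-sampling cutoff
        "stop_sequences": ["\n\nHuman:"],  # generation halts on these strings
        "messages": [{"role": "user", "content": prompt}],
    }

def invoke(prompt: str) -> str:
    """Synchronous InvokeModel call (needs AWS credentials at runtime)."""
    import boto3
    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(modelId=MODEL_ID, body=json.dumps(build_request(prompt)))
    return json.loads(resp["body"].read())["content"][0]["text"]
```

For streaming, swap `invoke_model` for `invoke_model_with_response_stream` and iterate the returned event stream chunks.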
D1 · RAG Pipeline
Docs / S3 (source) → Chunk (split) → Embed (vectorise) → Vector DB (store) → Retrieve (kNN) → Augment (prompt) → FM (generate)
  • Knowledge Bases — fully managed RAG, zero code
  • Vector stores: OpenSearch Serverless (default) · Aurora pgvector · Pinecone · Redis
  • Chunking: fixed-size (simple) · semantic (quality) · hierarchical (docs)
  • Hybrid search = semantic + keyword (BM25) → better recall
  • Metadata filters narrow retrieved chunks before similarity search
  • OpenSearch Neural Plugin — semantic + BM25 hybrid in one query
  • Re-ranking — cross-encoder second pass for precision
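The pipeline above can be sketched end to end in a few lines. The ord-sum "embedding" is a deterministic stand-in for a real model such as Titan Embeddings; the chunking, kNN retrieval, and prompt augmentation mirror the real flow.

```python
import math

def chunk(text: str, size: int = 60) -> list[str]:
    """Fixed-size chunking, the 'simple' strategy."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str, dims: int = 16) -> list[float]:
    """Fake, deterministic embedding: bag-of-words hashed into a small vector."""
    v = [0.0] * dims
    for word in text.lower().split():
        v[sum(map(ord, word)) % dims] += 1.0
    return v

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """k-nearest-neighbour search over the (in-memory) vector store."""
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

corpus = ("Knowledge Bases give fully managed RAG. "
          "OpenSearch Serverless is the default vector store. "
          "Hybrid search combines semantic and keyword matching.")
top = retrieve("default vector store", chunk(corpus))
prompt = "Answer only from this context:\n" + "\n".join(top) + \
         "\n\nQ: What is the default vector store?"
```

Knowledge Bases hide all of this behind a `RetrieveAndGenerate` call, but the exam expects you to know each stage.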
D1 · Customisation Decision
Scenario → Use:
  • Need fresh / dynamic data → RAG
  • Teach model a new output format or tone → Fine-tune
  • Domain-specific vocabulary adaptation → Continued Pre-training
  • Quick, no training data available → RAG
  • Consistent JSON / structured output → Fine-tune
  • Latency-critical, no retrieval lag → Fine-tune
  • Fine-tune data format: JSONL prompt / completion pairs
  • Custom models require Provisioned Throughput (not on-demand)
  • Distillation: large teacher → small student, same quality cheaper
  • LoRA — updates <1% of params; adapters load on top of base model
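A sketch of the JSONL fine-tuning format: one JSON object per line with prompt/completion keys. The exact key names vary by model, so verify against the chosen FM's customisation docs.

```python
import json

# Illustrative training examples for a sentiment-classification fine-tune.
examples = [
    {"prompt": "Classify sentiment: 'great service'", "completion": "positive"},
    {"prompt": "Classify sentiment: 'slow and rude'", "completion": "negative"},
]

# JSONL = one self-contained JSON object per line (no enclosing array).
jsonl = "\n".join(json.dumps(e) for e in examples)

# Each line parses back independently, the defining property of JSONL.
rows = [json.loads(line) for line in jsonl.splitlines()]
```

The resulting file goes to S3, and the customisation job output is a custom model that can only be served via Provisioned Throughput.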
D2 · Prompt Engineering
  • Zero-shot — no examples; rely on model knowledge
  • Few-shot — 3–5 examples in prompt; controls format & style precisely
  • Chain-of-Thought (CoT) — "think step by step"; improves logic & maths
  • ReAct = Reason + Act → basis of Bedrock Agents loop
  • Prompt chaining — output of step N becomes input to step N+1
  • System prompt sets persona/constraints · Human/Assistant turns = conversation
  • Prompt injection — attacker embeds malicious instructions → mitigate with Guardrails
  • Bedrock Prompt Management — versioned templates with variable placeholders
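The few-shot and CoT techniques above can be combined in one template. The template format here is illustrative, not a Bedrock API requirement.

```python
def few_shot_prompt(system: str, examples: list[tuple[str, str]], question: str) -> str:
    """Build a prompt: system persona, worked examples, then a CoT cue."""
    parts = [system]
    for q, a in examples:          # 3-5 examples pin down format & style
        parts.append(f"Q: {q}\nA: {a}")
    # Chain-of-Thought cue appended to the final turn
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

p = few_shot_prompt(
    "You are a terse maths tutor.",
    [("2+2?", "4"), ("10*3?", "30")],
    "What is 7*6?",
)
```

For prompt chaining, the model's answer to `p` would be interpolated into the next template as a variable, which is exactly what Prompt Management's placeholders are for.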
D2 · Bedrock Agents (ReAct Loop)
Agent Orchestration Cycle: User Input → Reason (Thought) → Act (Tool) → Observe (Result) → Answer / Loop
  • Action Group = Lambda fn + OpenAPI schema → what the agent can do
  • Knowledge Base attached to agent → automatic RAG retrieval
  • Return of Control (RoC) — agent pauses, application handles human approval
  • Multi-agent: Supervisor → delegates to Specialist Sub-agents
  • Agent memory: session (within convo) · cross-session (persisted)
  • Bedrock Flows — low-code visual pipeline builder
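The orchestration cycle above can be sketched as a loop. A real Bedrock Agent does this for you; here the FM and the action-group tool are both stubs (the tool name and trace format are hypothetical).

```python
# One stubbed "action group": tool name -> callable.
TOOLS = {"get_order_status": lambda order_id: f"Order {order_id}: shipped"}

def fake_llm(scratchpad: str) -> str:
    """Stand-in for the FM: act once, then answer from the observation."""
    if "Observation:" not in scratchpad:
        return "Thought: need order data\nAction: get_order_status[42]"
    return "Final Answer: your order has shipped"

def react(question: str, max_steps: int = 3) -> str:
    """Reason -> Act -> Observe until the model emits a final answer."""
    pad = f"Question: {question}"
    for _ in range(max_steps):
        out = fake_llm(pad)
        if out.startswith("Final Answer:"):
            return out.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[arg]" and call the matching action group.
        action = out.split("Action:")[1].strip()
        name, arg = action.rstrip("]").split("[")
        pad += f"\n{out}\nObservation: {TOOLS[name](arg)}"
    return "gave up"

answer = react("Where is order 42?")
```

In Bedrock the Lambda behind the action group plays the `TOOLS` role, and the OpenAPI schema tells the model what arguments it accepts.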
D2 · AWS Service Selector
  • Custom FM app, full control → Amazon Bedrock
  • No-code chatbot over company docs → Amazon Q Business
  • Enterprise keyword + semantic search → Amazon Kendra
  • Sentiment, entities, PII from text → Amazon Comprehend
  • Extract data from PDFs / forms → Amazon Textract
  • Train / deploy custom ML on GPU → Amazon SageMaker
  • Image / video analysis → Amazon Rekognition
  • Multi-step pipeline orchestration → AWS Step Functions
  • Audio-to-text (multimodal pipelines) → Amazon Transcribe
  • Validate data before FM ingestion → AWS Glue Data Quality
D2 · SageMaker LMI & DJL Serving
Parameter | Controls | Key insight
tensor_parallel_degree | N GPUs per replica | 8 GPUs ÷ 4 = 2 replicas
max_sequence_length | KV-cache size | ↓ length → ↑ batch → ↑ throughput
max_rolling_batch_size | Concurrent requests | Continuous batching fills GPU dynamically
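The replica arithmetic from the first row, plus an illustrative `serving.properties` for DJL Serving. The option names match the table; the values are assumptions for a mid-size model, so tune them per deployment.

```python
def replicas(instance_gpus: int, tensor_parallel_degree: int) -> int:
    """Each replica spans tensor_parallel_degree GPUs, so
    replicas = instance GPUs // degree (e.g. 8 // 4 == 2)."""
    return instance_gpus // tensor_parallel_degree

# Hypothetical serving.properties content; values are assumptions.
SERVING_PROPERTIES = """\
engine=Python
option.rolling_batch=auto
option.tensor_parallel_degree=4
option.max_rolling_batch_size=32
"""
```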
  • Endpoint types: Real-time (<60s) · Async (long jobs) · Serverless (sporadic) · Batch Transform (offline)
  • Model Registry — Pending → Approved → CI/CD deploy (governance gate)
  • Model Cards — programmatic Responsible AI documentation via SDK
  • Glue Data Catalog — data lineage & source attribution for compliance
D3 · Guardrails & Responsible AI
  • Content Filters: hate · violence · sexual · insults (configurable severity threshold)
  • Topic Denial: block competitor mentions, legal advice, investment topics, etc.
  • PII Redaction: SSN · credit card · email · phone → ANONYMIZE or BLOCK
  • Grounding Check: verify the answer is supported by retrieved context (anti-hallucination)
  • Word Filters: custom deny-list · profanity filter
  • Responsible AI pillars: Fairness · Explainability · Privacy · Robustness · Transparency
  • Invocation logging → S3 / CloudWatch (required for audit; captures full prompt + response)
  • Encryption: KMS at rest · TLS in transit · VPC endpoints for private access
  • IAM: identity-based + resource-based policies · least-privilege per Lambda
  • SCP — deny bedrock:InvokeModel without GuardrailIdentifier org-wide
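A sketch of a `create_guardrail` request covering the five capabilities above. Field names follow the boto3 `bedrock` client's `create_guardrail` API as I understand it; verify against current documentation before use.

```python
# Assumed request shape for bedrock.create_guardrail (verify field names).
guardrail_config = {
    "name": "support-bot-guardrail",
    "blockedInputMessaging": "Sorry, I can't help with that.",
    "blockedOutputsMessaging": "Sorry, I can't help with that.",
    # Content filters: per-category input/output severity thresholds.
    "contentPolicyConfig": {"filtersConfig": [
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
    ]},
    # Topic denial: natural-language definition of what to refuse.
    "topicPolicyConfig": {"topicsConfig": [
        {"name": "competitors", "definition": "Mentions of rival products", "type": "DENY"},
    ]},
    # PII redaction: ANONYMIZE masks, BLOCK rejects the whole response.
    "sensitiveInformationPolicyConfig": {"piiEntitiesConfig": [
        {"type": "EMAIL", "action": "ANONYMIZE"},
    ]},
    # Word filters: managed profanity list (custom lists also possible).
    "wordPolicyConfig": {"managedWordListsConfig": [{"type": "PROFANITY"}]},
    # Grounding check: reject answers unsupported by retrieved context.
    "contextualGroundingPolicyConfig": {"filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},
    ]},
}
# boto3.client("bedrock").create_guardrail(**guardrail_config)
```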
D4 · Cost & Performance Optimization
Strategy | Cost ↓ | Latency ↓
Batch Inference API | ✓ −50% | ✗ async
Prompt Caching | ✓ | ✓
Streaming responses | ~ | ✓ perceived
Smaller / distilled model | ✓ | ✓
Provisioned Throughput | ~ high-vol | ✓ no throttle
Reduce max_sequence_length | ✓ ↑ batch | ~
  • Prompt caching caches the static system prompt portion — subsequent calls skip re-compute
  • Batch inference = async jobs for nightly/bulk processing, not real-time
  • CloudWatch metrics: InputTokenCount · OutputTokenCount · InvocationLatency · ThrottledRequests
  • ThrottlingException → exponential backoff + jitter, or upgrade to provisioned throughput
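The backoff-plus-jitter pattern from the last bullet, sketched with an injectable sleep so it runs instantly; in production the exception comes from botocore rather than the local class defined here.

```python
import random
import time

class ThrottlingException(Exception):
    """Stand-in for the botocore ThrottlingException."""

def with_backoff(call, max_retries=5, base=0.5, cap=8.0, sleep=time.sleep):
    """Retry `call` on throttling, with capped exponential backoff + full jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except ThrottlingException:
            if attempt == max_retries - 1:
                raise
            # Full jitter: uniform delay in [0, min(cap, base * 2^attempt)].
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# Demo: fail twice, then succeed (sleep stubbed out for speed).
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottlingException
    return "ok"

result = with_backoff(flaky, sleep=lambda _: None)
```

If backoff alone can't keep up with sustained traffic, that's the signal to move to Provisioned Throughput.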
D5 · Testing, Evaluation & Troubleshooting
  • Model Evaluation jobs (Bedrock built-in): auto metrics or human review
  • Auto metrics: ROUGE (summarisation) · BERTScore (semantic) · accuracy
  • RAG-specific (RAGAS):
      Context Precision · Context Recall · Answer Relevancy · Faithfulness
  • Faithfulness = is answer grounded in retrieved context? Low = hallucination
  • Hallucination → use grounding check in Guardrails + citations in response
  • Agent debugging: enable Trace in Bedrock console → shows each ReAct step (Thought / Action / Observation)
  • X-Ray tracing — end-to-end latency across Lambda + Bedrock calls
  • LLM-as-a-Judge — use an FM to evaluate another FM's output quality
  • Common failures: retrieval miss (bad chunking) · context window overflow · prompt injection · format errors · throttling
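Toy token-overlap versions of ROUGE-1 recall and RAGAS-style faithfulness. Real RAGAS uses an LLM judge over extracted claims, not bare word overlap; this only illustrates what each metric measures.

```python
def tokens(s: str) -> set[str]:
    return set(s.lower().split())

def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference tokens that appear in the candidate summary."""
    ref = tokens(reference)
    return len(ref & tokens(candidate)) / len(ref) if ref else 0.0

def faithfulness(answer: str, context: str) -> float:
    """Share of answer tokens supported by the retrieved context;
    low values flag a likely hallucination."""
    ans = tokens(answer)
    return len(ans & tokens(context)) / len(ans) if ans else 0.0

score = faithfulness("the store is opensearch serverless",
                     "opensearch serverless is the default vector store")
```

A faithfulness score near 1.0 means the answer is grounded; an answer introducing tokens (claims) absent from the context scores lower.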
Exam Tips
  • D1+D2 = 57% — master RAG & Agents first
  • No penalty for guessing — answer all 75 questions
  • "Managed / no-code" → usually Q Business or Knowledge Bases
  • Custom model inference always needs Provisioned Throughput
  • Bulk/async → Batch Inference API
  • Competitor blocking → Topic Denial in Guardrails
  • Reduce max_sequence_length → frees KV-cache → more concurrency
  • Compensatory scoring — weak D5 is OK if D1/D2 are strong