
AIP-C01: Cheatsheet

โ† Overview ยท โ† Exam Guide ยท Visual Study Kit โ†’

Exam Day Reference

Review this page 5 minutes before the exam.


Foundation Model Quick Reference

| FM | Vendor | Key Strength | Best For |
| --- | --- | --- | --- |
| Claude | Anthropic | 200k token context, reasoning | Long docs, complex reasoning |
| Llama | Meta | Open-source, fine-tunable | Custom fine-tuning |
| Mistral | Mistral AI | Efficient, fast | Cost-efficient inference |
| Titan | AWS | AWS-native, embeddings | RAG embeddings, summarization |
| Cohere Embed | Cohere | Multilingual embeddings | Multilingual RAG |

Bedrock API Comparison

| API | Delivery | Use Case |
| --- | --- | --- |
| InvokeModel | Synchronous (full response) | Simple query-response |
| InvokeModelWithResponseStream | Streaming (token by token) | Low-latency UX / chat |
| InvokeAgent | Streaming + trace | Multi-step agentic workflows |
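As a rough illustration of the streaming variant, the sketch below shows an `InvokeModelWithResponseStream` call with boto3. The model ID, region, and the Anthropic-style delta payload in `parse_chunk` are assumptions to verify against current docs; the boto3 call itself is defined but not exercised here.

```python
import json

def stream_text(model_id, prompt, region="us-east-1"):
    """Hypothetical helper: stream a Claude response token by token.
    Requires AWS credentials and Bedrock model access (not run here)."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS

    client = boto3.client("bedrock-runtime", region_name=region)
    body = {
        "anthropic_version": "bedrock-2023-05-31",  # assumed Anthropic-on-Bedrock version string
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }
    response = client.invoke_model_with_response_stream(
        modelId=model_id, body=json.dumps(body)
    )
    # The response body is an event stream; each event wraps a JSON chunk.
    for event in response["body"]:
        text = parse_chunk(event)
        if text:
            yield text

def parse_chunk(event):
    """Extract text from one streamed event (Anthropic-style content_block_delta)."""
    payload = json.loads(event["chunk"]["bytes"])
    if payload.get("type") == "content_block_delta":
        return payload["delta"].get("text", "")
    return ""
```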

InvokeModel: Uses model-specific JSON in the request body. The expected schema varies by model family.
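For instance, the native request bodies for Claude and Titan Text use entirely different field names. A minimal sketch (the exact schemas below reflect the vendor formats as commonly documented; verify against the current Bedrock model parameter reference):

```python
import json

def claude_body(prompt, max_tokens=512):
    # Anthropic "Messages" schema on Bedrock (assumed version string)
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def titan_text_body(prompt, max_tokens=512):
    # Amazon Titan Text schema: different keys entirely
    return {
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": max_tokens, "temperature": 0.2, "topP": 0.9},
    }

# Same question, two incompatible InvokeModel request bodies:
claude_json = json.dumps(claude_body("What is RAG?"))
titan_json = json.dumps(titan_text_body("What is RAG?"))
```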

Converse API: Use when you want a consistent message-based interface across Bedrock models with optional inference settings, tool config, guardrails, and prompt variables.
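A hedged sketch of that unified interface: the helper below builds `converse()` keyword arguments (message, system, and inference-config shapes as documented for the Converse API at the time of writing), and the boto3 call is shown in a function that is not executed here.

```python
def build_converse_kwargs(model_id, user_text, system_text=None,
                          temperature=0.2, top_p=0.9, max_tokens=512):
    """Build kwargs for bedrock-runtime's converse(); same shape for any supported model."""
    kwargs = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"temperature": temperature, "topP": top_p, "maxTokens": max_tokens},
    }
    if system_text:
        kwargs["system"] = [{"text": system_text}]
    return kwargs

def converse_once(model_id, user_text):
    """Hypothetical one-shot call; needs AWS credentials and model access."""
    import boto3
    client = boto3.client("bedrock-runtime")
    resp = client.converse(**build_converse_kwargs(model_id, user_text))
    return resp["output"]["message"]["content"][0]["text"]
```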

Inference settings:

  • Low temperature = more deterministic and consistent
  • Higher temperature = more creative and variable
  • Lower topP = tighter token selection
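The temperature effect can be seen in a toy softmax sketch (plain Python, no Bedrock involved): dividing the logits by a smaller temperature sharpens the distribution, concentrating probability on the top token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

warm = softmax_with_temperature([2.0, 1.0, 0.5], 1.0)   # more spread out
cold = softmax_with_temperature([2.0, 1.0, 0.5], 0.2)   # sharply peaked on the top logit
```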


Agents vs. Knowledge Bases

| Capability | Knowledge Base (RAG only) | Bedrock Agents |
| --- | --- | --- |
| External API calls | No | Yes (via Action Groups + Lambda) |
| Multi-step reasoning | No | Yes |
| Document retrieval | Yes | Yes (optional) |
| Best for | Static knowledge Q&A | Dynamic workflows, tool use |
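Invoking an agent looks roughly like the sketch below: `InvokeAgent` lives on the `bedrock-agent-runtime` client and returns an event stream mixing text chunks with trace events. The IDs are placeholders and the boto3 call is not exercised here; the chunk-collecting helper is what the example actually demonstrates.

```python
def invoke_agent_text(agent_id, alias_id, session_id, prompt):
    """Hypothetical call; requires an existing agent, alias, and AWS credentials."""
    import boto3
    client = boto3.client("bedrock-agent-runtime")
    resp = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,   # same sessionId across calls keeps conversation state
        inputText=prompt,
    )
    return collect_completion(resp["completion"])

def collect_completion(events):
    """Concatenate text chunks from an InvokeAgent event stream, skipping trace events."""
    parts = []
    for event in events:
        if "chunk" in event:
            parts.append(event["chunk"]["bytes"].decode("utf-8"))
    return "".join(parts)
```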

High-Value Service Map

| Service / Feature | Purpose | Quick Refresher |
| --- | --- | --- |
| Amazon Bedrock | Serverless FM access | The central managed entry point for Claude, Titan, Llama, and other supported models |
| Knowledge Bases | Automated RAG | Handles ingestion, chunking, embedding, storage, and retrieval for private data |
| Bedrock Agents | Multi-step reasoning | Best when the workflow must use tools or take actions through Action Groups |
| Bedrock Flows | Orchestrated workflows | Better than Agents when the path is deterministic and explicitly defined |
| Prompt Caching | Cost + latency optimization | Reuses a repeated static prompt prefix such as a long system prompt |
| Amazon OpenSearch Serverless / Vector Engine | Semantic search | Stores embeddings and supports low-latency similarity search for RAG |

Guardrails – Four Filter Types

| Filter Type | What It Controls |
| --- | --- |
| Content Filters | Harmful categories: hate, violence, sexual content, insults |
| Denied Topics | Topics the model must refuse to discuss |
| Word Filters | Exact word/phrase blocklists |
| PII Redaction | Names, emails, SSNs, credit cards – Redact or Block |

Key rules:

  • Guardrails apply to both inputs AND outputs
  • Must be explicitly applied per API call via guardrailIdentifier + guardrailVersion
  • PII modes: Redact (mask with placeholder) vs. Block (reject request/response)
  • Denied Topics = business-policy blocking at the topic level, not just exact keyword blocking
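In code, attaching a guardrail is just two extra identifiers. A hedged sketch (parameter names as documented for `invoke_model` and the Converse API at the time of writing; the IDs are placeholders):

```python
import json

def guarded_invoke_kwargs(model_id, body, guardrail_id, guardrail_version):
    """kwargs for invoke_model: the guardrail is two top-level request parameters."""
    return {
        "modelId": model_id,
        "body": json.dumps(body),
        "guardrailIdentifier": guardrail_id,   # e.g. the guardrail's ID or ARN
        "guardrailVersion": guardrail_version,  # a version number or "DRAFT"
    }

def guarded_converse_config(guardrail_id, guardrail_version):
    """For converse(), the same pair is nested under guardrailConfig instead."""
    return {"guardrailIdentifier": guardrail_id, "guardrailVersion": guardrail_version}
```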

Safety, Governance, and Ops

| Service / Feature | Purpose | Quick Refresher |
| --- | --- | --- |
| Bedrock Guardrails | Content filtering | Filters PII, harmful content, and denied topics across model inputs and outputs |
| Model Evaluation | Quality control | Uses human or automated metrics such as Groundedness, Relevance, Accuracy, and Fluency |
| IAM Policies | Access control | Scope bedrock:InvokeModel permissions tightly; Bedrock does not train on your data by default |
| AWS PrivateLink | Data privacy | Keeps Bedrock traffic between your VPC and the AWS backbone instead of the public internet |
| CloudWatch | Monitoring | Token counts, latency, alarms, dashboards |
| AWS X-Ray | Tracing | End-to-end tracing across multi-service GenAI request paths |

Vector Store Options

| Store | Type | Use When |
| --- | --- | --- |
| Amazon OpenSearch Serverless | Managed, serverless | Bedrock Knowledge Bases (default) |
| Aurora PostgreSQL (pgvector) | RDS extension | Existing PostgreSQL infrastructure |
| Amazon Kendra | Enterprise search | NLP-powered enterprise retrieval |

Retrieval concepts to remember:

  • k-NN = nearest vectors
  • Hybrid search = semantic + keyword
  • BM25 = keyword ranking
  • Reranking = re-score first-pass results
  • Recall = find more relevant chunks
  • Precision = reduce irrelevant chunks
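The core of k-NN retrieval fits in a few lines. A toy sketch (plain Python with a hand-rolled index; real systems use approximate nearest-neighbor structures, not a full sort):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query, index, k=2):
    """index: list of (chunk_id, embedding). Return the k nearest chunk IDs."""
    scored = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]
```

Hybrid search would combine these cosine scores with a keyword score such as BM25; reranking would re-score just the top results with a stronger (slower) model.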

Chunking Strategies

| Strategy | Best For | Trade-off |
| --- | --- | --- |
| Fixed-size | Uniform documents | May break context at boundaries |
| Fixed-size + overlap | Preserving cross-boundary context | Higher storage cost |
| Semantic | Varied, long-form content | Higher processing complexity |
| Hierarchical | Complex docs: broad + fine retrieval | More complex retrieval logic |
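Fixed-size chunking with overlap is simple enough to sketch directly (character-based here for clarity; production chunkers typically count tokens):

```python
def chunk_fixed(text, size=200, overlap=50):
    """Split text into fixed-size chunks; each chunk repeats the tail of the previous one,
    which preserves cross-boundary context at the cost of storing some text twice."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```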

Provisioned Throughput vs. On-Demand

| | Provisioned Throughput (PTU) | On-Demand |
| --- | --- | --- |
| Traffic | Predictable, 24/7 | Sporadic, variable |
| Pricing | Fixed (per MU/hour) | Per token |
| Commitment | 1 month or 6 months | None |
| Best for | Production steady-state | Dev/test, bursts |

Batch Inference = ~50% cheaper than on-demand for non-real-time high-volume jobs.
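Batch inference takes a JSONL file in S3 where each line pairs a `recordId` with a `modelInput` (the same model-specific body InvokeModel would take). A sketch of building that input, hedged on the exact record schema (verify against the current batch inference docs); the Anthropic body inside is the same assumed schema as above:

```python
import json

def batch_records(prompts):
    """Render batch-inference input: one JSON object per line."""
    lines = []
    for i, prompt in enumerate(prompts):
        record = {
            "recordId": f"rec-{i:04d}",  # hypothetical ID scheme; must be unique per line
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

The resulting text would be uploaded to S3 and referenced when creating the batch job; results arrive asynchronously as JSONL in an output S3 location.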


Optimization Concepts

| Concept | Purpose | Quick Refresher |
| --- | --- | --- |
| Provisioned Throughput | Guaranteed capacity | Use for steady production workloads to avoid throttling |
| Model Distillation | Cost optimization | Train a smaller student model to mimic a larger teacher model |
| Temperature | Randomness control | 0.0 = deterministic; higher values = more creative |
| Prompt Caching | Repeated-prefix reuse | Saves cost and latency when large static prompt prefixes repeat across requests |
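In the Converse API, prompt caching is expressed by placing a cache-point marker after the static prefix. The block shape below reflects the `cachePoint` content block as we understand it at the time of writing; treat the field names as an assumption and confirm model support before relying on it:

```python
def cached_system_prompt(static_text):
    """System content blocks with a cachePoint marker after the static prefix,
    so repeated requests can reuse the cached prefix (assumed Converse API shape)."""
    return [
        {"text": static_text},              # large, unchanging instructions
        {"cachePoint": {"type": "default"}},  # everything before this marker is cacheable
    ]
```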

Model Evaluation Metrics

| Metric | What It Measures |
| --- | --- |
| Groundedness | Response supported by retrieved context? (detects hallucinations) |
| Relevance | Response answers the user's question? |
| Accuracy | Factually correct? |
| Fluency | Well-written and natural? |
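To build intuition for groundedness, here is a deliberately crude word-overlap heuristic. This is a toy illustration only, not the metric Bedrock Model Evaluation computes (real groundedness checks use NLI-style or LLM-as-a-Judge scoring, not word overlap):

```python
def toy_groundedness(response, context):
    """Fraction of response words that also appear in the retrieved context.
    A toy heuristic for intuition; NOT Bedrock's groundedness metric."""
    resp_words = response.lower().split()
    ctx_words = set(context.lower().split())
    if not resp_words:
        return 0.0
    return sum(w in ctx_words for w in resp_words) / len(resp_words)
```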

Quick Decision Rules

Vector store for Knowledge Base?
→ Amazon OpenSearch Serverless (default) · pgvector on Aurora (alternative)

Multi-step reasoning + tool use?
→ Bedrock Agents + Action Groups (Lambda)

Need a unified message-based API across models?
→ Converse API

Content moderation / PII?
→ Guardrails for Amazon Bedrock

Lowest cost for repeated large prompt prefixes?
→ Prompt Caching

Highest grounding / accuracy for enterprise knowledge questions?
→ RAG with Knowledge Bases + strong retrieval

Audit trail for compliance?
→ AWS CloudTrail (not CloudWatch)

Operational monitoring (latency, errors, token counts)?
→ Amazon CloudWatch

Predictable 24/7 throughput?
→ Provisioned Throughput (PTU) – 1 or 6 month commitment

Bulk, non-real-time inference?
→ Batch Inference

Private connectivity to Bedrock API?
→ VPC Endpoint – bedrock-runtime for inference calls

Knowledge changes frequently / need traceability?
→ RAG (Knowledge Bases), not fine-tuning

Detect hallucinations in RAG?
→ Groundedness metric in Model Evaluation

Log all prompts and responses for AI governance?
→ Model Invocation Logging (to S3 or CloudWatch Logs)

Need a fixed multi-step workflow with retry/error handling?
→ Step Functions or Bedrock Flows, not Agents
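The "log all prompts and responses" rule above is an account-level setting. A hedged sketch (the `put_model_invocation_logging_configuration` call lives on the `bedrock` control-plane client, not `bedrock-runtime`; the bucket name is a placeholder, and the call itself is not exercised here):

```python
def logging_config(bucket, key_prefix="bedrock-logs/"):
    """loggingConfig payload for put_model_invocation_logging_configuration
    (field names as documented at the time of writing)."""
    return {
        "s3Config": {"bucketName": bucket, "keyPrefix": key_prefix},
        "textDataDeliveryEnabled": True,       # log prompts and text responses
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }

def enable_invocation_logging(bucket):
    """Hypothetical setup call; requires AWS credentials and bucket permissions."""
    import boto3
    client = boto3.client("bedrock")  # control plane, not bedrock-runtime
    client.put_model_invocation_logging_configuration(loggingConfig=logging_config(bucket))
```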


Key Terminology

  • FM: Foundation Model – pre-trained large AI model (Claude, Llama, Titan, etc.)
  • RAG: Retrieval-Augmented Generation – FM inference + vector store retrieval
  • PTU: Provisioned Throughput Unit – reserved Bedrock model capacity
  • MU: Model Unit – unit of PTU capacity purchased
  • PII: Personally Identifiable Information – data that identifies an individual
  • Groundedness: Metric measuring how well a response is grounded in retrieved context
  • Hallucination: FM generating information not present in the provided context
  • Action Group: Lambda function exposed to a Bedrock Agent for tool use
  • OpenSearch Serverless: Managed, serverless vector store used by Bedrock Knowledge Bases
  • Batch Inference: Asynchronous bulk FM inference via S3 JSONL input/output
  • Model Invocation Logging: Bedrock feature that logs all prompts + responses to S3/CloudWatch Logs
  • Prompt Caching: Reuse of a static prompt prefix to reduce repeated compute and cost
  • LLM-as-a-Judge: Using one model to evaluate the quality of another model's output

โ† Overview ยท โ† Exam Guide ยท Visual Study Kit โ†’

Happy Studying! 🚀