
AIP-C01: Cheatsheet

โ† Overview ยท โ† Exam Guide

Exam Day Reference

Review this page 5 minutes before the exam.


Foundation Model Quick Reference

| FM | Vendor | Key Strength | Best For |
|---|---|---|---|
| Claude | Anthropic | 200k-token context, reasoning | Long docs, complex reasoning |
| Llama | Meta | Open-source, fine-tunable | Custom fine-tuning |
| Mistral | Mistral AI | Efficient, fast | Cost-efficient inference |
| Titan | AWS | AWS-native, embeddings | RAG embeddings, summarization |
| Cohere Embed | Cohere | Multilingual embeddings | Multilingual RAG |

Bedrock API Comparison

| API | Delivery | Use Case |
|---|---|---|
| InvokeModel | Synchronous (full response) | Simple query-response |
| InvokeModelWithResponseStream | Streaming (token by token) | Low-latency UX / chat |
| InvokeAgent | Streaming + trace | Multi-step agentic workflows |
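The two invocation styles above differ only in how the response is delivered; the request body is the same. A minimal sketch of an InvokeModel request body, assuming the Anthropic Messages format for a Claude model (all values are illustrative):

```python
import json

# Request body for InvokeModel with a Claude model (Anthropic Messages
# format). Field values here are illustrative assumptions.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Summarize this document."}],
})

# With boto3 this body would be sent synchronously:
#   client = boto3.client("bedrock-runtime")
#   client.invoke_model(modelId="anthropic.claude-...", body=body)
# For streaming, the same body goes to invoke_model_with_response_stream,
# which returns an event stream of token chunks instead of one response.
```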

Agents vs. Knowledge Bases

| | Knowledge Base (RAG only) | Bedrock Agents |
|---|---|---|
| External API calls | No | Yes (via Action Groups + Lambda) |
| Multi-step reasoning | No | Yes |
| Document retrieval | Yes | Yes (optional) |
| Best for | Static knowledge Q&A | Dynamic workflows, tool use |

Guardrails: Four Filter Types

| Filter Type | What It Controls |
|---|---|
| Content Filters | Harmful categories: hate, violence, sexual content, insults |
| Denied Topics | Topics the model must refuse to discuss |
| Word Filters | Exact word/phrase blocklists |
| PII Redaction | Names, emails, SSNs, credit cards (Redact or Block) |

Key rules:

  • Guardrails apply to both inputs AND outputs
  • Must be explicitly applied per API call via guardrailIdentifier + guardrailVersion
  • PII modes: Redact (mask with placeholder) vs. Block (reject request/response)
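Since guardrails are applied per call, the identifier and version travel with every request. A minimal sketch of the call parameters, assuming boto3's bedrock-runtime client; the guardrail ID, version, and model ID are hypothetical placeholders:

```python
# Keyword arguments for a guarded InvokeModel call. Guardrails are not
# applied automatically: the identifier and version (hypothetical values
# below) must be passed on every request.
invoke_kwargs = {
    "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",  # example model
    "body": '{"prompt": "..."}',
    "guardrailIdentifier": "gr-abc123",  # hypothetical guardrail ID
    "guardrailVersion": "1",
}

# boto3 usage (not executed here):
#   bedrock_runtime.invoke_model(**invoke_kwargs)
```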

Vector Store Options

| Store | Type | Use When |
|---|---|---|
| Amazon OpenSearch Serverless | Managed, serverless | Bedrock Knowledge Bases (default) |
| Aurora PostgreSQL (pgvector) | RDS extension | Existing PostgreSQL infrastructure |
| Amazon Kendra | Enterprise search | NLP-powered enterprise retrieval |

Chunking Strategies

| Strategy | Best For | Trade-off |
|---|---|---|
| Fixed-size | Uniform documents | May break context at boundaries |
| Fixed-size + overlap | Preserving cross-boundary context | Higher storage cost |
| Semantic | Varied, long-form content | Higher processing complexity |
| Hierarchical | Complex docs: broad + fine retrieval | More complex retrieval logic |
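The fixed-size + overlap strategy can be sketched in a few lines. The size and overlap values here are arbitrary assumptions, and real pipelines usually chunk by tokens rather than characters:

```python
def chunk_text(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Fixed-size chunking with overlap: each chunk repeats the last
    `overlap` characters of the previous one, so context that straddles
    a chunk boundary appears in both chunks (the storage-cost trade-off)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```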

Provisioned Throughput vs. On-Demand

| | Provisioned Throughput (PTU) | On-Demand |
|---|---|---|
| Traffic | Predictable, 24/7 | Sporadic, variable |
| Pricing | Fixed (per MU/hour) | Per token |
| Commitment | 1 month or 6 months | None |
| Best for | Production steady-state | Dev/test, bursts |

Batch Inference = ~50% cheaper than on-demand for non-real-time, high-volume jobs.
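A back-of-the-envelope comparison of the ~50% batch discount, under assumed prices (illustrative numbers, not published AWS rates):

```python
# Illustrative prices only: on-demand at $3.00 per million input tokens,
# batch at ~50% of the on-demand rate.
tokens = 200_000_000                 # a bulk, non-real-time job
on_demand_per_m = 3.00               # assumed $/1M tokens
batch_per_m = on_demand_per_m * 0.5  # ~50% Batch Inference discount

on_demand_cost = tokens / 1_000_000 * on_demand_per_m
batch_cost = tokens / 1_000_000 * batch_per_m
```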


Model Evaluation Metrics

| Metric | What It Measures |
|---|---|
| Groundedness | Is the response supported by the retrieved context? (detects hallucinations) |
| Relevance | Does the response answer the user's question? |
| Accuracy | Is it factually correct? |
| Fluency | Is it well-written and natural? |
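As a toy illustration of the idea behind groundedness (not the actual Bedrock evaluation metric), one can measure what fraction of a response's words appear in the retrieved context:

```python
def naive_groundedness(response: str, context: str) -> float:
    """Toy word-overlap score, not the Bedrock metric: the fraction of
    response words that also appear in the retrieved context. A low score
    hints the response contains material not grounded in the context,
    i.e. a possible hallucination."""
    resp_words = response.lower().split()
    ctx_words = set(context.lower().split())
    if not resp_words:
        return 0.0
    return sum(w in ctx_words for w in resp_words) / len(resp_words)
```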

Quick Decision Rules

Vector store for Knowledge Base? → Amazon OpenSearch Serverless (default) · pgvector on Aurora (alternative)

Multi-step reasoning + tool use? โ†’ Bedrock Agents + Action Groups (Lambda)

Content moderation / PII? โ†’ Guardrails for Amazon Bedrock

Audit trail for compliance? โ†’ AWS CloudTrail (not CloudWatch)

Operational monitoring (latency, errors, token counts)? โ†’ Amazon CloudWatch

Predictable 24/7 throughput? → Provisioned Throughput (PTU) with a 1- or 6-month commitment

Bulk, non-real-time inference? โ†’ Batch Inference

Private connectivity to Bedrock API? → VPC Endpoint (bedrock-runtime for inference calls)

Knowledge changes frequently / need traceability? โ†’ RAG (Knowledge Bases), not fine-tuning

Detect hallucinations in RAG? โ†’ Groundedness metric in Model Evaluation

Log all prompts and responses for AI governance? โ†’ Model Invocation Logging (to S3 or CloudWatch Logs)
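For the last rule, a sketch of the logging configuration shape, assuming boto3's put_model_invocation_logging_configuration on the bedrock control-plane client; the bucket name and prefix are hypothetical:

```python
# Shape of the Bedrock model-invocation-logging configuration, as it
# would be passed to put_model_invocation_logging_configuration on the
# `bedrock` (control-plane, not bedrock-runtime) client.
logging_config = {
    "loggingConfig": {
        "textDataDeliveryEnabled": True,     # log prompts + responses
        "s3Config": {
            "bucketName": "my-bedrock-logs",  # hypothetical bucket
            "keyPrefix": "invocation-logs/",
        },
        # A cloudWatchConfig block can be used instead of (or alongside)
        # s3Config to deliver logs to CloudWatch Logs.
    }
}

# boto3 usage (not executed here):
#   bedrock = boto3.client("bedrock")
#   bedrock.put_model_invocation_logging_configuration(**logging_config)
```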


Key Terminology

  • FM (Foundation Model): pre-trained large AI model (Claude, Llama, Titan, etc.)
  • RAG (Retrieval-Augmented Generation): FM inference augmented with vector-store retrieval
  • PTU (Provisioned Throughput Unit): reserved Bedrock model capacity
  • MU (Model Unit): unit of PTU capacity purchased
  • PII (Personally Identifiable Information): data that identifies an individual
  • Groundedness: metric measuring how well a response is supported by the retrieved context
  • Hallucination: FM output stating information not present in the provided context
  • Action Group: Lambda function exposed to a Bedrock Agent for tool use
  • OpenSearch Serverless: managed, serverless vector store used by Bedrock Knowledge Bases
  • Batch Inference: asynchronous bulk FM inference via S3 JSONL input/output
  • Model Invocation Logging: Bedrock feature that logs all prompts and responses to S3/CloudWatch Logs

โ† Overview ยท โ† Exam Guide

Happy Studying! 🚀