AIF-C01: Cheatsheet
Exam Day Reference
Review this page 5 minutes before the exam.
Foundation Model Quick Reference
| FM | Vendor | Key Strength | Best For |
|---|---|---|---|
| Claude | Anthropic | 200k token context, reasoning | Long docs, complex reasoning |
| Llama | Meta | Open-source, fine-tunable | Custom fine-tuning |
| Mistral | Mistral AI | Efficient, fast | Cost-efficient inference |
| Titan | AWS | AWS-native, embeddings | RAG embeddings, summarization |
| Cohere Embed | Cohere | Multilingual embeddings | Multilingual RAG |
Bedrock API Comparison
| API | Delivery | Use Case |
|---|---|---|
| InvokeModel | Synchronous (full response) | Simple query-response |
| InvokeModelWithResponseStream | Streaming (token by token) | Low-latency UX / chat |
| InvokeAgent | Streaming + trace | Multi-step agentic workflows |
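A minimal boto3 sketch of the first two runtime calls. The model ID and the Anthropic Messages request shape are assumptions; verify both against the current Bedrock model catalog for your region.

```python
import json

# Assumed model ID -- check the Bedrock model catalog in your region.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_claude_body(prompt: str, max_tokens: int = 512) -> str:
    """Request body in the Anthropic Messages format Bedrock expects."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def invoke(client, prompt: str) -> str:
    """InvokeModel: synchronous -- the full response arrives at once."""
    resp = client.invoke_model(modelId=MODEL_ID, body=build_claude_body(prompt))
    return json.loads(resp["body"].read())["content"][0]["text"]

def invoke_streaming(client, prompt: str):
    """InvokeModelWithResponseStream: yields text deltas as they are generated."""
    resp = client.invoke_model_with_response_stream(
        modelId=MODEL_ID, body=build_claude_body(prompt)
    )
    for event in resp["body"]:
        chunk = json.loads(event["chunk"]["bytes"])
        if chunk.get("type") == "content_block_delta":
            yield chunk["delta"].get("text", "")
```

Here `client` is `boto3.client("bedrock-runtime")`; InvokeAgent lives on the separate `bedrock-agent-runtime` client.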
Agents vs. Knowledge Bases
| Capability | Knowledge Base (RAG only) | Bedrock Agents |
|---|---|---|
| External API calls | No | Yes (via Action Groups + Lambda) |
| Multi-step reasoning | No | Yes |
| Document retrieval | Yes | Yes (optional) |
| Best for | Static knowledge Q&A | Dynamic workflows, tool use |
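The two paths use different request shapes, sketched below with placeholder IDs; both calls go through the `bedrock-agent-runtime` client.

```python
def kb_request(question: str, kb_id: str, model_arn: str) -> dict:
    """Kwargs for retrieve_and_generate: RAG-only Q&A against a Knowledge Base."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,   # placeholder ID
                "modelArn": model_arn,      # FM used for the generation step
            },
        },
    }

def agent_request(question: str, agent_id: str, alias_id: str, session_id: str) -> dict:
    """Kwargs for invoke_agent: multi-step reasoning plus Action Group tool use."""
    return {
        "agentId": agent_id,
        "agentAliasId": alias_id,
        "sessionId": session_id,   # reuse across turns to keep conversation state
        "inputText": question,
        "enableTrace": True,       # surface the agent's step-by-step trace
    }
```

Usage would be `boto3.client("bedrock-agent-runtime").retrieve_and_generate(**kb_request(...))` or `.invoke_agent(**agent_request(...))`.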
Guardrails: Four Filter Types
| Filter Type | What It Controls |
|---|---|
| Content Filters | Harmful categories: hate, violence, sexual content, insults |
| Denied Topics | Topics the model must refuse to discuss |
| Word Filters | Exact word/phrase blocklists |
| PII Redaction | Names, emails, SSNs, credit cards – Redact or Block |
Key rules:
- Guardrails apply to both inputs AND outputs
- Must be explicitly applied per API call via `guardrailIdentifier` + `guardrailVersion`
- PII modes: Redact (mask with placeholder) vs. Block (reject the request/response)
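A small sketch of the per-call wiring (identifiers below are placeholders): the guardrail is not attached globally, so every InvokeModel call must carry both fields.

```python
def guarded_invoke_kwargs(model_id: str, body: str,
                          guardrail_id: str, guardrail_version: str) -> dict:
    """Build InvokeModel kwargs with a guardrail attached.

    Guardrails are not automatic: both the identifier and the version
    must be passed explicitly on each call."""
    return {
        "modelId": model_id,
        "body": body,
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,  # e.g. "1" or "DRAFT"
    }
```

Pass the result as `client.invoke_model(**guarded_invoke_kwargs(...))` on the `bedrock-runtime` client.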
Vector Store Options
| Store | Type | Use When |
|---|---|---|
| Amazon OpenSearch Serverless | Managed, serverless | Bedrock Knowledge Bases (default) |
| Aurora PostgreSQL (pgvector) | RDS extension | Existing PostgreSQL infrastructure |
| Amazon Kendra | Managed semantic search (not a vector store) | NLP-powered enterprise retrieval without managing embeddings |
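Whichever store you pick, retrieval boils down to nearest-neighbor search over embeddings. A toy in-memory stand-in for intuition only, not how you would query OpenSearch Serverless or pgvector in production:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the standard relevance score for embedding search."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2):
    """store: list of (chunk_text, embedding) pairs; returns the k nearest chunks."""
    return sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:k]
```

In a real Knowledge Base, the embeddings come from an embedding FM (e.g. Titan or Cohere Embed) and the index handles this search at scale.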
Chunking Strategies
| Strategy | Best For | Trade-off |
|---|---|---|
| Fixed-size | Uniform documents | May break context at boundaries |
| Fixed-size + overlap | Preserving cross-boundary context | Higher storage cost |
| Semantic | Varied, long-form content | Higher processing complexity |
| Hierarchical | Complex docs: broad + fine retrieval | More complex retrieval logic |
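The first two strategies are simple enough to sketch. This illustrative fixed-size-with-overlap chunker shows why overlap preserves cross-boundary context: consecutive chunks share their edges, so a sentence split at a boundary survives intact in at least one chunk.

```python
def chunk(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Fixed-size chunking with overlap (sizes here are characters for
    simplicity; real pipelines usually count tokens)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # how far the window advances each time
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With `overlap=0` this degrades to plain fixed-size chunking; the overlap is the storage-cost trade-off named in the table.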
Provisioned Throughput vs. On-Demand
| Dimension | Provisioned Throughput (PTU) | On-Demand |
|---|---|---|
| Traffic | Predictable, 24/7 | Sporadic, variable |
| Pricing | Fixed (per MU/hour) | Per token |
| Commitment | 1 month or 6 months | None |
| Best for | Production steady-state | Dev/test, bursts |
Batch Inference = ~50% cheaper than on-demand for non-real-time high-volume jobs.
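Batch jobs read JSONL from S3, one record per line with a `recordId` plus a model-native `modelInput` body. An illustrative builder (Anthropic Messages shape assumed; field names beyond `recordId`/`modelInput` depend on the model):

```python
import json

def batch_jsonl(prompts: list[str]) -> str:
    """Build the JSONL input file for a Bedrock batch inference job:
    one JSON object per line, each with a recordId and a modelInput body."""
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "recordId": f"rec-{i:05d}",  # echoed back in the output file
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return "\n".join(lines)
```

The resulting file is uploaded to S3 and referenced when creating the batch job; results land in S3 as JSONL keyed by the same `recordId`s.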
Model Evaluation Metrics
| Metric | What It Measures |
|---|---|
| Groundedness | Response supported by retrieved context? (detects hallucinations) |
| Relevance | Response answers the user's question? |
| Accuracy | Factually correct? |
| Fluency | Well-written and natural? |
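For intuition only, a crude lexical proxy for groundedness: the fraction of response words that appear in the retrieved context. Bedrock's actual metric is judged by a model, not by word overlap, so treat this purely as an illustration of what the metric is asking.

```python
def groundedness_proxy(response: str, context: str) -> float:
    """Toy stand-in: share of response words present in the retrieved context.
    A low score hints the response contains material the context never said
    (i.e., a possible hallucination)."""
    resp_words = response.lower().split()
    ctx_words = set(context.lower().split())
    if not resp_words:
        return 0.0
    return sum(w in ctx_words for w in resp_words) / len(resp_words)
```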
Quick Decision Rules
Vector store for Knowledge Base? → Amazon OpenSearch Serverless (default) · pgvector on Aurora (alternative)
Multi-step reasoning + tool use? → Bedrock Agents + Action Groups (Lambda)
Content moderation / PII? → Guardrails for Amazon Bedrock
Audit trail for compliance? → AWS CloudTrail (not CloudWatch)
Operational monitoring (latency, errors, token counts)? → Amazon CloudWatch
Predictable 24/7 throughput? → Provisioned Throughput (PTU), 1- or 6-month commitment
Bulk, non-real-time inference? → Batch Inference
Private connectivity to Bedrock API? → VPC endpoint for `bedrock-runtime` (inference calls)
Knowledge changes frequently / need traceability? → RAG (Knowledge Bases), not fine-tuning
Detect hallucinations in RAG? → Groundedness metric in Model Evaluation
Log all prompts and responses for AI governance? → Model Invocation Logging (to S3 or CloudWatch Logs)
Key Terminology
- FM: Foundation Model – pre-trained large AI model (Claude, Llama, Titan, etc.)
- RAG: Retrieval-Augmented Generation – FM inference + vector store retrieval
- PTU: Provisioned Throughput Unit – reserved Bedrock model capacity
- MU: Model Unit – unit of PTU capacity purchased
- PII: Personally Identifiable Information – data that identifies an individual
- Groundedness: Metric measuring how well a response is grounded in retrieved context
- Hallucination: FM generating information not present in the provided context
- Action Group: a set of tool operations, typically backed by Lambda, that a Bedrock Agent can invoke
- OpenSearch Serverless: Managed, serverless vector store used by Bedrock Knowledge Bases
- Batch Inference: Asynchronous bulk FM inference via S3 JSONL input/output
- Model Invocation Logging: Bedrock feature that logs all prompts + responses to S3/CloudWatch Logs