Amazon Bedrock
Core Exam Service
Fully managed API to invoke 20+ FMs. Includes Knowledge Bases (RAG), Agents (multi-step reasoning), Guardrails (safety), Prompt Management, Model Evaluation, and Batch Inference.
Use when scenario mentions:
Building a custom GenAI application
RAG pipeline with control over chunking / retrieval
Bedrock Agents for autonomous multi-step tasks
Fine-tuning, distillation, or prompt management
Safety controls via Guardrails
Custom code · FM included · Agents + RAG
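A minimal boto3 sketch of a single Converse API call (the model ID, region, and prompt are placeholders; the commented guardrailConfig shows where Guardrails attach):

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# One-shot call through the Converse API; Guardrails can be attached per request.
resp = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder FM
    messages=[{"role": "user", "content": [{"text": "Summarize our refund policy."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    # guardrailConfig={"guardrailIdentifier": "my-guardrail-id", "guardrailVersion": "1"},
)
print(resp["output"]["message"]["content"][0]["text"])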
Amazon Q Business
No-Code RAG
Fully managed enterprise AI assistant. Connects to 40+ data sources (S3, SharePoint, Confluence, Salesforce). Employees ask questions in natural language; Q retrieves and answers from company docs.
Use when:
"Zero custom code" or "operational in days"
Internal employee chatbot over company documents
Connect SharePoint, Confluence, Salesforce, S3
No ML engineers available
No custom code · FM included · 40+ connectors
Amazon Kendra
Enterprise Search
Managed intelligent search combining keyword (BM25) and semantic search. Returns ranked results, not conversational answers. Often combined with Bedrock for a generation layer on top.
Use when:
Search product (results list), not a chatbot
Hybrid keyword + semantic search needed
Enterprise doc search with relevance tuning
Often feeds into a Bedrock FM for answer generation
Search only · No generation · 40+ connectors
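A sketch of the common "Kendra retrieves, Bedrock generates" pattern (index ID, model ID, and question are placeholders):

import boto3

kendra = boto3.client("kendra")
bedrock = boto3.client("bedrock-runtime")

# Retrieve returns ranked passages, not a conversational answer.
question = "What is our PTO policy?"
results = kendra.retrieve(IndexId="my-index-id", QueryText=question)
context = "\n".join(item["Content"] for item in results["ResultItems"][:5])

# Hand the passages to an FM for the generation layer.
answer = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder FM
    messages=[{"role": "user", "content": [{"text": f"Answer from this context only:\n{context}\n\nQuestion: {question}"}]}],
)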
Amazon SageMaker
ML Platform + LMI
Full ML lifecycle platform and the primary service for deploying fine-tuned LLMs on GPUs via Large Model Inference (LMI) containers with DJL Serving. Needed when Bedrock doesn't provide enough infrastructure control.
Use when:
Deploying fine-tuned LLMs on GPU (LMI + DJL)
Optimizing GPU: tensor parallelism, continuous batching
Long FM completions (>60s) → Async Inference endpoint
Model versioning + approval → Model Registry
Training from scratch or LoRA with full control
Custom code · No FM included · LMI · DJL · GPU
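A sketch of invoking an Async Inference endpoint for long completions (endpoint name and S3 locations are placeholders):

import boto3

runtime = boto3.client("sagemaker-runtime")

# Async endpoints read the payload from S3 and write the result back to S3,
# so completions longer than the real-time 60s limit don't time out.
resp = runtime.invoke_endpoint_async(
    EndpointName="my-llm-endpoint",
    InputLocation="s3://my-bucket/prompts/request.json",
    ContentType="application/json",
)
print(resp["OutputLocation"])  # S3 URI where the completion will land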
DJL Serving (SageMaker LMI)
GPU Optimization
Open-source serving framework inside SageMaker LMI containers. Key insight: KV-cache ∝ max_sequence_length. Reducing it frees GPU memory for more concurrent requests.
Critical exam parameters:
tensor_parallel_degree=N → shards the model across N GPUs; 8 GPUs ÷ degree 4 = 2 replicas per instance
max_sequence_length → controls KV-cache. ↓ length = ↑ batch size
max_rolling_batch_size → concurrent requests per instance
Continuous batching fills GPU slots dynamically
KV-cache ∝ seq_len · Multi-replica
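A sketch of the LMI/DJL configuration the card describes, written out as serving.properties content; the model location is a placeholder and exact option keys can vary by LMI container version:

# Illustrative DJL Serving config for an LMI container (keys mirror the card's parameters).
serving_properties = """\
engine=MPI
option.model_id=s3://my-bucket/fine-tuned-model/
option.rolling_batch=auto
option.tensor_parallel_degree=4
option.max_rolling_batch_size=32
option.max_sequence_length=4096
"""

# tensor_parallel_degree=4 on an 8-GPU instance -> 8 / 4 = 2 model replicas.
# Lowering max_sequence_length shrinks the KV-cache, leaving room for a larger rolling batch.
with open("serving.properties", "w") as f:
    f.write(serving_properties)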
Amazon Comprehend
NLP Service
Managed NLP for text analysis: sentiment, key phrases, named entities, language detection, PII detection, and custom classification. Commonly used as a pre- or post-processing step alongside FM calls.
Use when:
Sentiment analysis on customer reviews/feedback
Extract named entities from text at scale
Detect and classify PII in text documents
Route tickets by topic before sending to FM
No generation · Text analysis · Pre-processing
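A sketch of sentiment and PII detection as a pre-processing step (the review text is a placeholder):

import boto3

comprehend = boto3.client("comprehend")
review = "The checkout page crashed twice; reach me at jane@example.com."

sentiment = comprehend.detect_sentiment(Text=review, LanguageCode="en")
pii = comprehend.detect_pii_entities(Text=review, LanguageCode="en")

print(sentiment["Sentiment"])                # e.g. NEGATIVE -> route to the right FM prompt
print([e["Type"] for e in pii["Entities"]])  # e.g. ['EMAIL'] -> redact before the FM call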
Amazon Textract
Document Extraction
Extracts text, tables, forms, key-value pairs from scanned documents and PDFs. Essential pre-processing before feeding document content into a Bedrock RAG pipeline.
Use when:
Scanned PDFs, invoices, medical forms, contracts
Extract structured tables from documents
Pre-process docs before RAG ingestion
"Document extraction" or "OCR" in scenario
Ingestion pre-step · Tables + forms · No generation
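A sketch of synchronous extraction before RAG ingestion (bucket and key are placeholders; multi-page PDFs go through the async start_document_analysis flow instead):

import boto3

textract = boto3.client("textract")

resp = textract.analyze_document(
    Document={"S3Object": {"Bucket": "my-bucket", "Name": "invoices/inv-001.png"}},
    FeatureTypes=["TABLES", "FORMS"],  # tables and key-value pairs, not just raw lines
)

# LINE blocks hold the raw text; TABLE/CELL and KEY_VALUE_SET blocks carry the structure.
lines = [b["Text"] for b in resp["Blocks"] if b["BlockType"] == "LINE"]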
Amazon Rekognition
Computer Vision
Managed computer vision: object/scene labels, facial analysis, celebrity recognition, content moderation, text-in-image detection, and activity detection in video.
Use when:
Image or video analysis in the pipeline
Content moderation for user-uploaded media
Multimodal: image → labels → FM prompt
"Face detection" or "image classification"
Vision only · No text generation
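A sketch of label detection plus content moderation on an uploaded image (bucket and key are placeholders):

import boto3

rekognition = boto3.client("rekognition")
image = {"S3Object": {"Bucket": "my-bucket", "Name": "uploads/photo.jpg"}}

labels = rekognition.detect_labels(Image=image, MaxLabels=10, MinConfidence=80)
moderation = rekognition.detect_moderation_labels(Image=image, MinConfidence=80)

# Label names can be folded into an FM prompt for a multimodal (image -> labels -> FM) pipeline.
print([l["Name"] for l in labels["Labels"]])
print([m["Name"] for m in moderation["ModerationLabels"]])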
AWS Step Functions
Workflow Orchestration
Visual workflow orchestration with state machines. Coordinates Lambda, Bedrock, SageMaker, and other services with built-in retry, error handling, parallel branches, and circuit breaker patterns.
Use when:
Multi-step AI pipeline needing retry + error handling
Parallel processing branches required
Coordinate Textract → Comprehend → Bedrock pipeline
"Orchestrate", "circuit breaker", "error handling between steps"
Pipeline only · Retry built-in · No FM
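A sketch of an Amazon States Language definition for the Textract → Comprehend → Bedrock pipeline, with retry and backoff on each task (the Lambda ARNs are placeholders):

import json

definition = {
    "StartAt": "ExtractText",
    "States": {
        "ExtractText": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:call-textract",
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Next": "AnalyzeText",
        },
        "AnalyzeText": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:call-comprehend",
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Next": "GenerateAnswer",
        },
        "GenerateAnswer": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:call-bedrock",
            "End": True,
        },
    },
}
print(json.dumps(definition, indent=2))  # pass to create_state_machine / update_state_machine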
AWS Glue Data Quality
Data Validation
Defines and runs data quality rules (completeness, uniqueness, validity) against datasets before FM fine-tuning or RAG ingestion. Prevents bad data from degrading model or retrieval quality.
Use when:
Validating documents before RAG KB ingestion
Checking training data quality before fine-tuning
"Data validation", "quality rules", "completeness check"
Pre-ingestion · No FM
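A sketch of a DQDL ruleset guarding RAG ingestion; the database, table, and rule thresholds are illustrative assumptions:

import boto3

glue = boto3.client("glue")

# Completeness / uniqueness checks on the source table before documents reach the KB.
ruleset = 'Rules = [ IsComplete "doc_id", Uniqueness "doc_id" > 0.99, Completeness "doc_text" > 0.95 ]'

glue.create_data_quality_ruleset(
    Name="rag-ingestion-checks",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "docs_db", "TableName": "raw_documents"},
)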
Amazon Transcribe
Speech-to-Text
Managed service that converts audio recordings to text. Standard pre-processing step for audio data entering FM pipelines; enables voice/call data to be embedded, indexed, or analyzed.
Use when:
Audio recordings or call data needs FM analysis
Multimodal pipeline includes voice/audio data
"Call recordings", "audio to text", "voice data"
Audio pre-step · No generation
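A sketch of kicking off a transcription job so call audio can enter the FM pipeline (job name, bucket, and file are placeholders):

import boto3

transcribe = boto3.client("transcribe")

transcribe.start_transcription_job(
    TranscriptionJobName="support-call-0042",
    Media={"MediaFileUri": "s3://my-bucket/calls/call-0042.wav"},
    MediaFormat="wav",
    LanguageCode="en-US",
    OutputBucketName="my-bucket",  # transcript JSON lands here for embedding/indexing
)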