Domain 3: Applications of Foundation Models (28%)
3.1: Design Considerations
Cost Optimization
Factors Affecting Cost:
- Input tokens: Text sent to model
- Output tokens: Text generated by model
- Model size: Larger models cost more
- Request frequency: Number of API calls made
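The factors above multiply together, which is why model choice dominates cost at scale. A minimal sketch of the arithmetic, using made-up per-1K-token prices (not real Bedrock pricing; always check the current price list):

```python
# Rough cost estimator for token-priced model calls.
# Per-1K-token prices are illustrative placeholders, NOT real pricing.
PRICES_PER_1K = {
    "small-model": {"input": 0.0003, "output": 0.0006},
    "large-model": {"input": 0.0030, "output": 0.0150},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  requests: int = 1) -> float:
    """Cost = requests * (input_kTok * in_price + output_kTok * out_price)."""
    p = PRICES_PER_1K[model]
    per_request = (input_tokens / 1000) * p["input"] \
                + (output_tokens / 1000) * p["output"]
    return requests * per_request

# Same workload, two model sizes: the factors multiply.
small = estimate_cost("small-model", input_tokens=2000, output_tokens=500, requests=1000)
large = estimate_cost("large-model", input_tokens=2000, output_tokens=500, requests=1000)
```

With these placeholder prices the large model is 15x the cost for an identical workload, which is the intuition behind the strategies below.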
Cost Optimization Strategies:
- ✅ Choose appropriate model size
  - Don't use Claude Opus for simple tasks
- ✅ Optimize prompts
  - Be concise, avoid unnecessary context
- ✅ Cache responses
  - Store responses to common queries
- ✅ Use smaller models when possible
  - Titan for simple tasks, Claude for complex
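The caching strategy can be sketched in a few lines. `call_model` below is a stand-in for a real invocation (e.g. a Bedrock InvokeModel call); here it only counts how often the "model" is actually hit:

```python
import hashlib

# Minimal response-cache sketch. call_model is a hypothetical stand-in
# for a real model API call; it counts invocations so we can see cache hits.
calls = 0

def call_model(prompt: str) -> str:
    global calls
    calls += 1
    return f"response to: {prompt}"

_cache: dict[str, str] = {}

def cached_call(prompt: str) -> str:
    # Hash the prompt so cache keys stay small even for long prompts.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]

cached_call("What are your store hours?")  # miss: one real model call
cached_call("What are your store hours?")  # hit: served from the cache
```

For repeated common queries (FAQs, boilerplate lookups), every cache hit is a model call you don't pay for.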
Latency Considerations
Factors Affecting Latency:
- Model size (larger = slower)
- Output length
- Context window usage
- Concurrent requests
Optimization:
- Use smaller models for real-time apps
- Implement streaming responses
- Asynchronous processing for batch jobs
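For batch jobs, overlapping independent requests hides per-request latency. A sketch with a simulated slow model call (a real client such as boto3's bedrock-runtime would replace `fake_model`):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Simulated model call: sleep stands in for network + inference latency.
def fake_model(prompt: str) -> str:
    time.sleep(0.1)
    return prompt.upper()

prompts = [f"doc {i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_model, prompts))
elapsed = time.perf_counter() - start
# 8 overlapped requests finish in ~0.1 s instead of ~0.8 s sequentially.
```

This improves batch throughput; for interactive apps, streaming partial output is what improves perceived latency.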
3.2: Choosing the Right Foundation Model
Model Selection
Which model for long document analysis?
Anthropic Claude: 200K context window.
Best for analyzing lengthy documents, contracts, and research papers.
Decision Table: Model Selection
| Use Case | Recommended Model | Why |
|---|---|---|
| Long document analysis | Anthropic Claude | 200K context window |
| Cost-effective text generation | Amazon Titan Text | AWS-optimized, lower cost |
| Image generation | Stable Diffusion XL | High-quality images |
| Code generation | Claude or CodeWhisperer | Strong coding abilities |
| Multilingual content | Cohere or Jurassic | Better language support |
| Embeddings for RAG | Amazon Titan Embeddings | Optimized for search |
| General chatbot | Claude or Titan | Good balance |
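The decision table can be encoded as a simple lookup. The values are model families from the table, not exact Bedrock model IDs (those are versioned strings you look up in the Bedrock console or docs):

```python
# Lookup encoding the decision table. Values are model FAMILIES,
# not concrete Bedrock model IDs.
MODEL_FOR_USE_CASE = {
    "long_document_analysis": "Anthropic Claude",
    "cost_effective_text": "Amazon Titan Text",
    "image_generation": "Stable Diffusion XL",
    "embeddings_for_rag": "Amazon Titan Embeddings",
}

def recommend(use_case: str) -> str:
    # Fall back to evaluation rather than guessing for unknown use cases.
    return MODEL_FOR_USE_CASE.get(use_case, "Start with a general model and evaluate")
```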
3.3: Prompt Engineering
Prompt Engineering Techniques
What is Zero-Shot Prompting?
Prompting with the instruction alone, no examples. Example: "Translate to French: Hello"
Works well for common tasks.
Zero-Shot Prompting
No examples, just the instruction
Prompt: "Translate to French: Hello, how are you?"
Output: "Bonjour, comment allez-vous?"
Few-Shot Prompting
Provide examples to guide the model
Prompt:
Sentiment analysis examples:
"I love this!" → Positive
"Terrible experience" → Negative
"It was okay" → Neutral
Now classify: "Best purchase ever!"
Output: Positive
Chain-of-Thought Prompting
Ask model to show its reasoning
Prompt: "If apples cost $2/lb and I buy 3.5 lbs, how much do I pay?
Show your work."
Output:
Let me calculate:
- Price per pound: $2
- Pounds purchased: 3.5
- Total: $2 × 3.5 = $7
Answer: $7
Prompt Engineering Best Practices
✅ Be Specific
- ❌ "Write about dogs"
- ✅ "Write a 200-word informative article about dog breeds suitable for apartment living"
✅ Provide Context
- ❌ "Summarize this"
- ✅ "You are a financial analyst. Summarize this quarterly report for executives."
✅ Use Formatting
- Use bullet points, sections
- Clear instructions
✅ Specify Output Format
- "Respond in JSON format"
- "Use bullet points"
✅ Add Constraints
- "In 100 words or less"
- "Using simple language"
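These practices can be baked into a reusable template. A minimal sketch (the function and its parameters are illustrative, not part of any library):

```python
# Prompt template applying the best practices above:
# role context, a specific task, an explicit output format, constraints.
def build_prompt(role: str, task: str, output_format: str,
                 constraints: list[str]) -> str:
    lines = [
        f"You are {role}.",            # provide context
        f"Task: {task}",               # be specific
        f"Respond in {output_format}.",  # specify output format
    ]
    lines += [f"Constraint: {c}" for c in constraints]  # add constraints
    return "\n".join(lines)

prompt = build_prompt(
    role="a financial analyst",
    task="Summarize this quarterly report for executives",
    output_format="bullet points",
    constraints=["100 words or less", "simple language"],
)
```

Templating like this keeps prompts consistent across an application and makes constraints easy to audit.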
3.4: Retrieval Augmented Generation (RAG)
What is RAG?
Problem: LLMs don't know your private data or recent events
Solution: RAG provides relevant context to the model
RAG Concepts
What problem does RAG solve?
RAG retrieves relevant documents and provides them to the model as context.
Reduces hallucinations and enables access to current data.
How RAG Works:
1. User asks question
↓
2. Convert question to embedding (vector)
↓
3. Search vector database for similar content
↓
4. Retrieve relevant documents
↓
5. Provide documents + question to LLM
↓
6. LLM generates answer with context
RAG Architecture Components
1. Document Ingestion
- Split documents into chunks
- Generate embeddings for each chunk
- Store in vector database
2. Vector Database
- Stores embeddings
- Enables semantic search
- AWS Options: OpenSearch, Bedrock Knowledge Bases
3. Retrieval
- Convert query to embedding
- Find similar documents
- Return top K results
4. Generation
- Send retrieved docs + query to LLM
- LLM generates contextual answer
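The four components above can be sketched end to end in a toy form. Real systems use learned embeddings (e.g. Titan Embeddings) and a vector database; here a bag-of-words vector and cosine similarity stand in for both:

```python
import math
from collections import Counter

# TOY embedding: word counts stand in for a real embedding model.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Ingestion: chunk documents and embed each chunk into an "index".
chunks = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is open Monday through Friday, 9am to 5pm.",
    "Shipping is free for orders over 50 dollars.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]  # 2. the "vector DB"

def retrieve(query: str, k: int = 1) -> list[str]:
    # 3. Retrieval: embed the query, rank chunks by similarity, take top k.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# 4. Generation: the retrieved chunk would be prepended to the LLM prompt.
context = retrieve("refund policy for returns")[0]
```

The same retrieve-then-generate shape holds at production scale; only the embedding model and the search backend change.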
Benefits of RAG
✅ Reduces Hallucinations
- Grounds responses in factual documents
✅ Up-to-Date Information
- Access to current data
✅ Domain-Specific Knowledge
- Use your proprietary data
✅ Citations
- Can reference source documents
✅ Cost-Effective
- No fine-tuning needed
Amazon Bedrock Knowledge Bases
Managed RAG solution
Features:
- Automatic document ingestion
- Vector database management
- Built-in embeddings (Titan Embeddings)
- Simple API integration
Supported Data Sources:
- Amazon S3
- Web crawlers
- Confluence
- SharePoint
- Salesforce
Fine-Tuning vs. Prompt Engineering
| Approach | When to Use | Cost | Effort | Flexibility |
|---|---|---|---|---|
| Prompt Engineering | First choice | Low | Low | High |
| RAG | Need current/private data | Medium | Medium | High |
| Fine-Tuning | Domain-specific tasks | High | High | Low |
Exam Decision Pattern
- Start with prompt engineering (cheapest, fastest)
- Use RAG if you need external knowledge
- Fine-tune only if the above methods are insufficient
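The decision pattern above reduces to two questions, sketched here as a small helper (the function is illustrative, purely for reasoning through exam scenarios):

```python
# Exam decision pattern: pick the cheapest adequate approach.
def choose_approach(needs_external_data: bool,
                    needs_specialized_behavior: bool) -> str:
    if not needs_external_data and not needs_specialized_behavior:
        return "prompt engineering"  # cheapest, fastest; always try first
    if needs_external_data:
        return "RAG"                 # ground answers in current/private data
    return "fine-tuning"             # change the model's behavior itself
```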