Domain 3: Applications of Foundation Models (28%)
3.1: Design Considerations
Cost Optimization
Factors Affecting Cost:
- Input tokens: Text sent to model
- Output tokens: Text generated by model
- Model size: Larger models cost more
- Request frequency: Number of API calls made
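The factors above multiply together, which is why model choice dominates cost at scale. A minimal sketch of the arithmetic, using made-up per-1K-token prices (not real Bedrock pricing; always check the current price list):

```python
# Rough cost estimator for token-priced model calls.
# Per-1K-token prices are illustrative placeholders, NOT real pricing.
PRICES_PER_1K = {
    "small-model": {"input": 0.0003, "output": 0.0006},
    "large-model": {"input": 0.0030, "output": 0.0150},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  requests: int = 1) -> float:
    """Cost = requests * (input_kTok * in_price + output_kTok * out_price)."""
    p = PRICES_PER_1K[model]
    per_request = (input_tokens / 1000) * p["input"] \
                + (output_tokens / 1000) * p["output"]
    return requests * per_request

# Same workload, two model sizes: the factors multiply.
small = estimate_cost("small-model", input_tokens=2000, output_tokens=500, requests=1000)
large = estimate_cost("large-model", input_tokens=2000, output_tokens=500, requests=1000)
```

With these placeholder prices the large model is 15x the cost for an identical workload, which is the intuition behind the strategies below.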
Cost Optimization Strategies:
- ✅ Choose appropriate model size
  - Don't use Claude Opus for simple tasks
- ✅ Optimize prompts
  - Be concise, avoid unnecessary context
- ✅ Cache responses
  - Store responses to common queries
- ✅ Use smaller models when possible
  - Titan for simple tasks, Claude for complex
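The caching strategy can be sketched in a few lines. `call_model` below is a stand-in for a real invocation (e.g. a Bedrock InvokeModel call); here it only counts how often the "model" is actually hit:

```python
import hashlib

# Minimal response-cache sketch. call_model is a hypothetical stand-in
# for a real model API call; it counts invocations so we can see cache hits.
calls = 0

def call_model(prompt: str) -> str:
    global calls
    calls += 1
    return f"response to: {prompt}"

_cache: dict[str, str] = {}

def cached_call(prompt: str) -> str:
    # Hash the prompt so cache keys stay small even for long prompts.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]

cached_call("What are your store hours?")  # miss: one real model call
cached_call("What are your store hours?")  # hit: served from the cache
```

For repeated common queries (FAQs, boilerplate lookups), every cache hit is a model call you don't pay for.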
Latency Considerations
Factors Affecting Latency:
- Model size (larger = slower)
- Output length
- Context window usage
- Concurrent requests
Optimization:
- Use smaller models for real-time apps
- Implement streaming responses
- Asynchronous processing for batch jobs
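For batch jobs, overlapping independent requests hides per-request latency. A sketch with a simulated slow model call (a real client such as boto3's bedrock-runtime would replace `fake_model`):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Simulated model call: sleep stands in for network + inference latency.
def fake_model(prompt: str) -> str:
    time.sleep(0.1)
    return prompt.upper()

prompts = [f"doc {i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_model, prompts))
elapsed = time.perf_counter() - start
# 8 overlapped requests finish in ~0.1 s instead of ~0.8 s sequentially.
```

This improves batch throughput; for interactive apps, streaming partial output is what improves perceived latency.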
3.2: Choosing the Right Foundation Model
Model Selection
Which model for long document analysis?
Anthropic Claude: 200K context window.
Best for analyzing lengthy documents, contracts, and research papers.
Decision Table: Model Selection
| Use Case | Recommended Model | Why |
|---|---|---|
| Long document analysis | Anthropic Claude | 200K context window |
| Cost-effective text generation | Amazon Titan Text | AWS-optimized, lower cost |
| Image generation | Stable Diffusion XL | High-quality images |
| Code generation | Claude or CodeWhisperer | Strong coding abilities |
| Multilingual content | Cohere or Jurassic | Better language support |
| Embeddings for RAG | Amazon Titan Embeddings | Optimized for search |
| General chatbot | Claude or Titan | Good balance |
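The decision table can be encoded as a simple lookup. The values are model families from the table, not exact Bedrock model IDs (those are versioned strings you look up in the Bedrock console or docs):

```python
# Lookup encoding the decision table. Values are model FAMILIES,
# not concrete Bedrock model IDs.
MODEL_FOR_USE_CASE = {
    "long_document_analysis": "Anthropic Claude",
    "cost_effective_text": "Amazon Titan Text",
    "image_generation": "Stable Diffusion XL",
    "embeddings_for_rag": "Amazon Titan Embeddings",
}

def recommend(use_case: str) -> str:
    # Fall back to evaluation rather than guessing for unknown use cases.
    return MODEL_FOR_USE_CASE.get(use_case, "Start with a general model and evaluate")
```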
3.3: Prompt Engineering
Prompt Engineering Techniques
What is Zero-Shot Prompting?
Prompting with the instruction alone, no examples. Example: "Translate to French: Hello"
Works well for common tasks.
Zero-Shot Prompting
No examples, just the instruction
Prompt: "Translate to French: Hello, how are you?"
Output: "Bonjour, comment allez-vous?"
Few-Shot Prompting
Provide examples to guide the model
Prompt:
Sentiment analysis examples:
"I love this!" → Positive
"Terrible experience" → Negative
"It was okay" → Neutral
Now classify: "Best purchase ever!"
Output: Positive
Chain-of-Thought Prompting
Ask model to show its reasoning
Prompt: "If apples cost $2/lb and I buy 3.5 lbs, how much do I pay?
Show your work."
Output:
Let me calculate:
- Price per pound: $2
- Pounds purchased: 3.5
- Total: $2 × 3.5 = $7
Answer: $7
Prompt Engineering Best Practices
✅ Be Specific
- ❌ "Write about dogs"
- ✅ "Write a 200-word informative article about dog breeds suitable for apartment living"
✅ Provide Context
- ❌ "Summarize this"
- ✅ "You are a financial analyst. Summarize this quarterly report for executives."
✅ Use Formatting
- Use bullet points, sections
- Clear instructions
✅ Specify Output Format
- "Respond in JSON format"
- "Use bullet points"
✅ Add Constraints
- "In 100 words or less"
- "Using simple language"
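These practices can be baked into a reusable template. A minimal sketch (the function and its parameters are illustrative, not part of any library):

```python
# Prompt template applying the best practices above:
# role context, a specific task, an explicit output format, constraints.
def build_prompt(role: str, task: str, output_format: str,
                 constraints: list[str]) -> str:
    lines = [
        f"You are {role}.",            # provide context
        f"Task: {task}",               # be specific
        f"Respond in {output_format}.",  # specify output format
    ]
    lines += [f"Constraint: {c}" for c in constraints]  # add constraints
    return "\n".join(lines)

prompt = build_prompt(
    role="a financial analyst",
    task="Summarize this quarterly report for executives",
    output_format="bullet points",
    constraints=["100 words or less", "simple language"],
)
```

Templating like this keeps prompts consistent across an application and makes constraints easy to audit.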
3.4: Retrieval Augmented Generation (RAG)
What is RAG?
Problem: LLMs don't know your private data or recent events
Solution: RAG provides relevant context to the model
RAG Concepts
What problem does RAG solve?
RAG retrieves relevant documents and provides them to the model as context.
Reduces hallucinations and enables access to current data.
How RAG Works:
1. User asks question
↓
2. Convert question to embedding (vector)
↓
3. Search vector database for similar content
↓
4. Retrieve relevant documents
↓
5. Provide documents + question to LLM
↓
6. LLM generates answer with context
RAG Architecture Components
1. Document Ingestion
- Split documents into chunks
- Generate embeddings for each chunk
- Store in vector database
2. Vector Database
- Stores embeddings
- Enables semantic search
- AWS Options: OpenSearch, Bedrock Knowledge Bases
3. Retrieval
- Convert query to embedding
- Find similar documents
- Return top K results
4. Generation
- Send retrieved docs + query to LLM
- LLM generates contextual answer
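The four components above can be sketched end to end in a toy form. Real systems use learned embeddings (e.g. Titan Embeddings) and a vector database; here a bag-of-words vector and cosine similarity stand in for both:

```python
import math
from collections import Counter

# TOY embedding: word counts stand in for a real embedding model.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Ingestion: chunk documents and embed each chunk into an "index".
chunks = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is open Monday through Friday, 9am to 5pm.",
    "Shipping is free for orders over 50 dollars.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]  # 2. the "vector DB"

def retrieve(query: str, k: int = 1) -> list[str]:
    # 3. Retrieval: embed the query, rank chunks by similarity, take top k.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# 4. Generation: the retrieved chunk would be prepended to the LLM prompt.
context = retrieve("refund policy for returns")[0]
```

The same retrieve-then-generate shape holds at production scale; only the embedding model and the search backend change.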
Benefits of RAG
✅ Reduces Hallucinations
- Grounds responses in factual documents
✅ Up-to-Date Information
- Access to current data
✅ Domain-Specific Knowledge
- Use your proprietary data
✅ Citations
- Can reference source documents
✅ Cost-Effective
- No fine-tuning needed
Amazon Bedrock Knowledge Bases
Managed RAG solution
Features:
- Automatic document ingestion
- Vector database management
- Built-in embeddings (Titan Embeddings)
- Simple API integration
Supported Data Sources:
- Amazon S3
- Web crawlers
- Confluence
- SharePoint
- Salesforce
Fine-Tuning vs. Prompt Engineering
| Approach | When to Use | Cost | Effort | Flexibility |
|---|---|---|---|---|
| Prompt Engineering | First choice | Low | Low | High |
| RAG | Need current/private data | Medium | Medium | High |
| Fine-Tuning | Domain-specific tasks | High | High | Low |
Exam Decision Pattern
- Start with prompt engineering (cheapest, fastest)
- Use RAG if you need external knowledge
- Fine-tune only if the above methods are insufficient
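The decision pattern above reduces to two questions, sketched here as a small helper (the function is illustrative, purely for reasoning through exam scenarios):

```python
# Exam decision pattern: pick the cheapest adequate approach.
def choose_approach(needs_external_data: bool,
                    needs_specialized_behavior: bool) -> str:
    if not needs_external_data and not needs_specialized_behavior:
        return "prompt engineering"  # cheapest, fastest; always try first
    if needs_external_data:
        return "RAG"                 # ground answers in current/private data
    return "fine-tuning"             # change the model's behavior itself
```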