
Domain 3: Applications of Foundation Models (28%)



3.1: Design Considerations

Cost Optimization

Factors Affecting Cost:

  • Input tokens: Text sent to model
  • Output tokens: Text generated by model
  • Model size: Larger models cost more
  • Request frequency
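The token-based cost factors above can be sketched as a small estimator. The model IDs and per-1K-token rates below are illustrative placeholders, not real Bedrock pricing; always check the current pricing page:

```python
# Rough per-request cost estimator. The rates below are ILLUSTRATIVE
# placeholders, not actual Bedrock pricing.
ILLUSTRATIVE_PRICE_PER_1K = {
    # model id: (input $/1K tokens, output $/1K tokens)
    "small-model": (0.0003, 0.0006),
    "large-model": (0.0030, 0.0150),
}

def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one request."""
    in_rate, out_rate = ILLUSTRATIVE_PRICE_PER_1K[model_id]
    return (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate
```

Note how output tokens dominate for large models: the same request is roughly five times more expensive per output token than per input token in this sketch, which is why constraining output length ("in 100 words or less") cuts cost directly.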

Cost Optimization Strategies:

  1. ✅ Choose appropriate model size
    • Don't use Claude Opus for simple tasks
  2. ✅ Optimize prompts
    • Be concise, avoid unnecessary context
  3. ✅ Cache responses
    • Store common queries
  4. ✅ Use smaller models when possible
    • Titan for simple tasks, Claude for complex
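Strategy 3 (caching) can be sketched as an in-memory lookup keyed on the prompt. Here `invoke_fn` stands in for whatever function actually calls the model (e.g. a thin wrapper around the Bedrock runtime client); a production system would use a shared cache such as ElastiCache and an expiry policy:

```python
import hashlib

_cache: dict[str, str] = {}

def cached_invoke(prompt: str, invoke_fn) -> str:
    """Return a cached response for repeated prompts; call the model otherwise."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = invoke_fn(prompt)  # only pay for the first call
    return _cache[key]
```

Every repeat of a common query then costs zero tokens.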

Latency Considerations

Factors Affecting Latency:

  • Model size (larger = slower)
  • Output length
  • Context window usage
  • Concurrent requests

Optimization:

  • Use smaller models for real-time apps
  • Implement streaming responses
  • Asynchronous processing for batch jobs
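Streaming can be sketched against the Bedrock runtime's `invoke_model_with_response_stream` API. One simplifying assumption: each event's bytes are treated as plain text here, whereas real models return a small JSON payload per chunk that you would parse first:

```python
import json

def stream_bedrock(client, model_id: str, body: dict):
    """Yield text chunks as they arrive instead of waiting for the full reply.

    `client` is a boto3 bedrock-runtime client, passed in so this sketch
    stays runnable without AWS credentials.
    """
    response = client.invoke_model_with_response_stream(
        modelId=model_id, body=json.dumps(body)
    )
    for event in response["body"]:
        # Flush each piece to the UI immediately for low perceived latency.
        yield event["chunk"]["bytes"].decode("utf-8")
```

The user sees the first token after one chunk's latency rather than after the whole generation completes.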

3.2: Choosing the Right Foundation Model

Model Selection

Q: Which model for long document analysis?

A: Anthropic Claude. Its 200K context window makes it best for analyzing lengthy documents, contracts, and research papers.

Decision Table: Model Selection

| Use Case | Recommended Model | Why |
| --- | --- | --- |
| Long document analysis | Anthropic Claude | 200K context window |
| Cost-effective text generation | Amazon Titan Text | AWS-optimized, lower cost |
| Image generation | Stable Diffusion XL | High-quality images |
| Code generation | Claude or CodeWhisperer | Strong coding abilities |
| Multilingual content | Cohere or Jurassic | Better language support |
| Embeddings for RAG | Amazon Titan Embeddings | Optimized for search |
| General chatbot | Claude or Titan | Good balance |

3.3: Prompt Engineering

Prompt Engineering Techniques

Q: What is Zero-Shot Prompting?

A: No examples, just the instruction (e.g., "Translate to French: Hello"). Works well for common tasks.

Zero-Shot Prompting

No examples, just the instruction

Prompt: "Translate to French: Hello, how are you?"
Output: "Bonjour, comment allez-vous?"

Few-Shot Prompting

Provide examples to guide the model

Prompt:
Sentiment analysis examples:
"I love this!" โ†’ Positive
"Terrible experience" โ†’ Negative
"It was okay" โ†’ Neutral

Now classify: "Best purchase ever!"
Output: Positive
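A helper that assembles few-shot prompts like the one above can be sketched as follows. The `->` arrow is just one labeling convention; the function names are hypothetical:

```python
def few_shot_prompt(examples, query, instruction="Sentiment analysis examples:"):
    """Build a few-shot prompt from (text, label) example pairs."""
    lines = [instruction]
    for text, label in examples:
        lines.append(f'"{text}" -> {label}')
    lines.append(f'Now classify: "{query}"')
    return "\n".join(lines)
```

Keeping examples in data rather than hard-coded strings makes it easy to experiment with how many shots the task actually needs.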

Chain-of-Thought Prompting

Ask model to show its reasoning

Prompt: "If apples cost $2/lb and I buy 3.5 lbs, how much do I pay?
Show your work."

Output:
Let me calculate:
- Price per pound: $2
- Pounds purchased: 3.5
- Total: $2 × 3.5 = $7
Answer: $7

Prompt Engineering Best Practices

  1. ✅ Be Specific

    • ❌ "Write about dogs"
    • ✅ "Write a 200-word informative article about dog breeds suitable for apartment living"
  2. ✅ Provide Context

    • ❌ "Summarize this"
    • ✅ "You are a financial analyst. Summarize this quarterly report for executives."
  3. ✅ Use Formatting

    • Use bullet points, sections
    • Clear instructions
  4. ✅ Specify Output Format

    • "Respond in JSON format"
    • "Use bullet points"
  5. ✅ Add Constraints

    • "In 100 words or less"
    • "Using simple language"

3.4: Retrieval Augmented Generation (RAG)

What is RAG?

Problem: LLMs don't know your private data or recent events

Solution: RAG provides relevant context to the model

RAG Concepts

Q: What problem does RAG solve?

A: LLMs do not know your private data or recent events. RAG retrieves relevant documents and provides them as context, reducing hallucinations and enabling access to current data.

How RAG Works:

1. User asks question
   ↓
2. Convert question to embedding (vector)
   ↓
3. Search vector database for similar content
   ↓
4. Retrieve relevant documents
   ↓
5. Provide documents + question to LLM
   ↓
6. LLM generates answer with context

RAG Architecture Components

1. Document Ingestion

  • Split documents into chunks
  • Generate embeddings for each chunk
  • Store in vector database
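The chunk-splitting step can be sketched as a character-based splitter with overlap (production systems typically split on tokens or sentence boundaries instead; the sizes here are arbitrary):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks
```

The overlap preserves context that would otherwise be cut at a chunk boundary, at the cost of storing slightly more embeddings.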

2. Vector Database

  • Stores embeddings
  • Enables semantic search
  • AWS Options: OpenSearch, Bedrock Knowledge Bases

3. Retrieval

  • Convert query to embedding
  • Find similar documents
  • Return top K results

4. Generation

  • Send retrieved docs + query to LLM
  • LLM generates contextual answer
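The retrieval component reduces to a nearest-neighbor search over embeddings. A minimal sketch using cosine similarity and toy two-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and a vector database does this search at scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The chunks behind the returned indices are what get pasted into the prompt as context for the generation step.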

Benefits of RAG

✅ Reduces Hallucinations

  • Grounds responses in factual documents

✅ Up-to-Date Information

  • Access to current data

✅ Domain-Specific Knowledge

  • Use your proprietary data

✅ Citations

  • Can reference source documents

✅ Cost-Effective

  • No fine-tuning needed

Amazon Bedrock Knowledge Bases

Managed RAG solution

Features:

  • Automatic document ingestion
  • Vector database management
  • Built-in embeddings (Titan Embeddings)
  • Simple API integration
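The "simple API integration" maps to the `RetrieveAndGenerate` API on the `bedrock-agent-runtime` client, which handles retrieval and generation in one call. A sketch assuming you already have a knowledge base ID and model ARN (both placeholders here):

```python
def ask_knowledge_base(question: str, kb_id: str, model_arn: str,
                       client=None) -> str:
    """Query a Bedrock Knowledge Base and return the model's grounded answer.

    kb_id and model_arn are placeholders you must supply; `client` is
    injectable for testing and defaults to a real boto3 client.
    """
    if client is None:
        import boto3  # requires AWS credentials to be configured
        client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )
    return response["output"]["text"]
```

The same response also carries citations back to the source documents, which supports the "Citations" benefit listed above.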

Supported Data Sources:

  • Amazon S3
  • Web crawlers
  • Confluence
  • SharePoint
  • Salesforce

Prompt Engineering vs. RAG vs. Fine-Tuning

| Approach | When to Use | Cost | Effort | Flexibility |
| --- | --- | --- | --- | --- |
| Prompt Engineering | First choice | Low | Low | High |
| RAG | Need current/private data | Medium | Medium | High |
| Fine-Tuning | Domain-specific tasks | High | High | Low |

Exam Decision Pattern

  • Start with prompt engineering (cheapest, fastest)
  • Use RAG if you need external knowledge
  • Fine-tune only if the above methods are insufficient

