Domain 2: Implement generative AI solutions (15-20%)
This domain covers building and deploying generative AI solutions using Azure OpenAI Service and Microsoft AI Foundry. It tests your ability to choose the right approach (RAG vs fine-tuning), design prompts, and evaluate model outputs.
2.1 Azure OpenAI Service and AI Foundry
Key Concepts
| Concept | Description |
|---|---|
| Model Catalog | Browse and deploy models: GPT-4o, GPT-3.5 Turbo, Embeddings (text-embedding-ada-002), DALL-E 3 |
| Deployment | A model instance with a name, capacity type (Standard/PTU), and version |
| AI Foundry Project | Where you build, test, and deploy generative AI apps |
| Foundry SDK | azure-ai-projects + azure-ai-inference packages for app integration |
Chat Completion API Structure
| Role | Purpose | Example |
|---|---|---|
| system | Instructions and persona for the model | "You are a helpful assistant that only answers in English." |
| user | The human's message | "What is the capital of France?" |
| assistant | Prior model responses (conversation history) | "The capital of France is Paris." |
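The roles table above maps directly onto the `messages` list sent with a chat completion request. A minimal sketch, reusing the example contents from the table (the final user turn is an added illustration):

```python
# Conversation history is just an ordered list of role/content dicts.
messages = [
    {"role": "system", "content": "You are a helpful assistant that only answers in English."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    # Append the next user turn to continue the multi-turn conversation:
    {"role": "user", "content": "And the capital of Spain?"},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```

Note that the client resends the full history on every turn; the service itself is stateless.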
Sampling Parameters
| Parameter | Effect | Typical Use |
|---|---|---|
| temperature | Higher = more creative/random; 0 = deterministic | Creative tasks: 0.7–1.0; factual: 0.0–0.2 |
| top_p | Controls vocabulary diversity via nucleus sampling | Alternative to temperature (use one, not both) |
| max_tokens | Hard limit on response length | Prevent runaway output |
| stop | Sequence(s) that terminate the response | Structured output, templates |
| frequency_penalty | Reduces repetition of tokens already used | Long-form generation |
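A hedged sketch of how these parameters combine for a factual, length-bounded request. The deployment name is a placeholder, and the actual network call is shown only as a comment:

```python
# Illustrative request for a factual task: low temperature, hard token cap.
request = {
    "model": "my-gpt4o-deployment",   # placeholder Azure OpenAI deployment name
    "temperature": 0.1,               # near-deterministic, suited to factual answers
    "max_tokens": 256,                # hard cap to prevent runaway output
    "stop": ["###"],                  # terminate when this template delimiter appears
    "frequency_penalty": 0.5,         # discourage token repetition in longer output
    "messages": [{"role": "user", "content": "List three facts about Paris."}],
}

# With the openai Python SDK these become keyword arguments:
#   client.chat.completions.create(**request)
assert "top_p" not in request  # use temperature OR top_p, not both
```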
2.2 Prompt Engineering
Core Techniques
| Technique | What It Does | When to Use |
|---|---|---|
| Zero-shot | Ask the model with no examples | Simple, well-understood tasks |
| Few-shot | Provide 2–5 input/output examples in the prompt | Tasks needing specific format or tone |
| Chain-of-Thought (CoT) | Ask model to "think step by step" | Math, multi-step reasoning |
| Grounding | Include source data in the prompt (RAG context) | Factual accuracy, hallucination reduction |
| Prompt Templates | Parameterized prompts with placeholders | Consistent formatting across inputs |
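Few-shot prompting and prompt templates can be combined in one small helper: each example becomes a user/assistant pair ahead of the real query. A minimal sketch (the helper name and sentiment task are illustrative, not from any SDK):

```python
def build_few_shot_messages(system, examples, query):
    """Assemble a few-shot prompt: each (input, output) example becomes a
    user/assistant message pair, followed by the real user query."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

msgs = build_few_shot_messages(
    "Classify sentiment as Positive or Negative.",
    [("Great product!", "Positive"), ("Terrible service.", "Negative")],
    "The food was amazing.",
)
print(len(msgs))  # 6: system + two examples (2 messages each) + final query
```

The examples teach the model the expected output format without any fine-tuning.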
Grounding vs Fine-tuning
Grounding (RAG): Inject relevant documents into the prompt at runtime so the model answers from the provided context. Fast, updatable, no retraining. Fine-tuning: Bake knowledge/style into model weights through training. Use when tone, format, or rare domain expertise matters AND data doesn't change often.
The exam almost always prefers RAG when data changes frequently.
Multimodal Inputs
- GPT-4o: Accepts images + text in a single message. Can describe, analyze, or reason about images.
- DALL-E 3: Generates images from text prompts. Key parameters: size (1024×1024 etc.), quality (standard/hd), style (vivid/natural).
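A sketch of how those DALL-E 3 parameters fit together in a request; the deployment name and prompt are placeholders, and the SDK call is shown only as a comment:

```python
# Illustrative DALL-E 3 image-generation parameters.
image_request = {
    "model": "my-dalle3-deployment",              # placeholder deployment name
    "prompt": "A watercolor map of Paris at sunrise",
    "size": "1024x1024",                          # square; wide/tall sizes also exist
    "quality": "hd",                              # "standard" or "hd"
    "style": "natural",                           # "vivid" or "natural"
    "n": 1,                                       # number of images to generate
}

# With the openai Python SDK:
#   client.images.generate(**image_request)
print(image_request["size"])  # 1024x1024
```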
2.3 Retrieval Augmented Generation (RAG)
RAG grounds the model in your own data without retraining.
Architecture

User Query
↓ Embed query (text-embedding-ada-002)
↓ Search AI Search index (vector/hybrid search)
↓ Retrieve top-K document chunks
↓ Inject chunks into prompt as context
↓ Model generates grounded response

On Your Data (Azure OpenAI)
A built-in integration that connects Azure OpenAI directly to an Azure AI Search index:
- No custom code needed for the retrieval step
- Supports hybrid search (keyword + vector)
- Returns citations in the response
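The retrieve-then-inject step of the architecture can be sketched with a stubbed search function; a real app would embed the query and call Azure AI Search, which the stub below replaces with a hard-coded corpus (company name and documents are invented):

```python
def retrieve_chunks(query, top_k=3):
    """Stub for the AI Search step. A real implementation would embed the
    query and run a vector/hybrid search against an index."""
    corpus = [
        "Contoso support hours are 9am-5pm CET.",
        "Contoso headquarters is in Amsterdam.",
        "Refunds are processed within 14 days.",
    ]
    return corpus[:top_k]

def build_grounded_prompt(query):
    # Inject the retrieved chunks into the system message as context.
    context = "\n".join(f"- {c}" for c in retrieve_chunks(query))
    system = (
        "Answer ONLY from the sources below. If the answer is not in the "
        f"sources, say you don't know.\n\nSources:\n{context}"
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": query}]

msgs = build_grounded_prompt("When is support open?")
print("support hours" in msgs[0]["content"].lower())  # True
```

The "answer only from the sources" instruction is what makes the response grounded; the On Your Data feature performs this assembly (plus citations) for you.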
RAG vs Fine-tuning Decision
| Factor | RAG | Fine-tuning |
|---|---|---|
| Data changes frequently | ✅ | ❌ |
| Need specific response tone/format | ❌ | ✅ |
| Reduce hallucinations with factual data | ✅ | Partial |
| Expensive, requires retraining | ❌ | ✅ |
2.4 Prompt Flow
A visual orchestration tool in AI Foundry for designing, testing, and deploying LLM workflows.
Flow Types
| Type | Purpose |
|---|---|
| Standard Flow | General LLM pipeline (input → processing → output) |
| Chat Flow | Multi-turn conversational app with session memory |
| Evaluation Flow | Measures performance of another flow against a dataset |
Key Concepts
| Concept | Description |
|---|---|
| Nodes | Steps in the flow: LLM node (model call), Python node (custom code), Prompt node (template) |
| Variants | Multiple versions of a node's prompt, letting you run A/B comparisons in one flow |
| Inputs/Outputs | Define the schema for what the flow accepts and returns |
| Connections | References to Azure OpenAI or other service credentials |
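Conceptually, Variants automate the loop below: run each prompt version over the same dataset and compare a score. This is a plain-Python illustration of the idea, not the Prompt Flow SDK; the model call is a stub:

```python
# Stub standing in for an LLM node call (Prompt Flow would run the real model).
def stub_model(prompt, question):
    return "Paris" if "capital" in question else "unknown"

variants = {
    "variant_a": "Answer concisely.",
    "variant_b": "Answer in one word.",
}
dataset = [("What is the capital of France?", "Paris")]

# Score each variant against the same dataset, like an Evaluation Flow would.
scores = {}
for name, prompt in variants.items():
    correct = sum(stub_model(prompt, q) == truth for q, truth in dataset)
    scores[name] = correct / len(dataset)

print(scores)  # both variants score 1.0 on this toy dataset
```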
Prompt Flow Exam Signals
- "Visual workflow for LLM apps" → Prompt Flow
- "Evaluate prompt variants against a dataset" → Prompt Flow (Evaluation Flow)
- "A/B test two system prompts" → Prompt Flow Variants
2.5 Model Fine-tuning
When to Fine-tune
| Situation | Recommended Approach |
|---|---|
| Data changes frequently, reduce hallucinations | RAG |
| Specific tone, format, or domain jargon | Fine-tuning |
| Teach the model a new task structure | Fine-tuning |
| Need citations from source documents | RAG |
Fine-tuning Workflow
- Prepare data: Format as JSONL with system, user, assistant message objects (min 100–500 examples)
- Upload: Push training file to Azure OpenAI
- Create job: Start the fine-tuning job, specifying base model and hyperparameters
- Monitor: Track loss metrics; job can take hours
- Deploy: Deploy the fine-tuned model as a new endpoint
- Evaluate: Test with held-out examples before promoting to production
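The data-prep step can be sketched in Python: serialize each training example as one JSON object per line (the JSONL format). The legal-assistant example content is the one used in this section; the file path is illustrative:

```python
import json
import os
import tempfile

# One training example: a full system/user/assistant exchange.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a legal document assistant."},
        {"role": "user", "content": "Summarize this contract clause."},
        {"role": "assistant", "content": "This clause limits liability to..."},
    ]},
]

# JSONL = one JSON object per line, no pretty-printing.
path = os.path.join(tempfile.gettempdir(), "train.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

with open(path, encoding="utf-8") as f:
    lines = f.readlines()
print(len(lines))  # 1
```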
// JSONL training example format (pretty-printed here; actual JSONL is one object per line):
{"messages": [
  {"role": "system", "content": "You are a legal document assistant."},
  {"role": "user", "content": "Summarize this contract clause."},
  {"role": "assistant", "content": "This clause limits liability to..."}
]}

2.6 Model Evaluation
Built-in Evaluation Metrics
| Metric | What It Measures |
|---|---|
| Groundedness | Does the response stay within the provided source context? (RAG quality) |
| Relevance | Does the answer address the user's question? |
| Coherence | Is the response logically structured? |
| Fluency | Is the language grammatically correct and natural? |
| Similarity | How close is the output to a golden reference answer? |
Evaluation Flows
- Create an Evaluation Flow in Prompt Flow to run a dataset of questions through your flow and score outputs automatically.
- Reference answers (ground truth) are compared against model outputs using the metrics above.
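To make the ground-truth comparison concrete, here is a deliberately naive similarity scorer based on token overlap. The built-in Foundry metrics above are model-graded, not token counts; this sketch only illustrates the idea of scoring an output against a reference:

```python
def token_overlap(answer, reference):
    """Naive similarity: fraction of reference tokens present in the answer.
    Illustrative only; real evaluation uses model-graded metrics."""
    def tokens(text):
        return {t.strip(".,!?") for t in text.lower().split()}
    ans, ref = tokens(answer), tokens(reference)
    return len(ans & ref) / len(ref) if ref else 0.0

score = token_overlap(
    "The capital of France is Paris.",   # model output
    "Paris is the capital of France.",   # golden reference answer
)
print(score)  # 1.0 (same words, different order)
```

An Evaluation Flow runs a scorer like this (or an LLM judge) over every row of a test dataset and aggregates the results.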
2.7 Monitoring and Safety
| Control | Purpose |
|---|---|
| Content Filters | Applied to inputs AND outputs of Azure OpenAI models (Hate, Violence, Self-Harm, Sexual) |
| Blocklists | Custom term lists; prompts and completions containing blocked terms are filtered |
| Prompt Flow Tracing | See latency, token counts, and node execution for each run |
| Model Reflection | Pattern where model critiques its own draft before returning a final response |
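The Model Reflection pattern from the table can be sketched as a draft/critique/revise loop. The three functions below are stubs standing in for separate chat-completion calls; their contents are invented for illustration:

```python
# Each stub represents a separate model call in the reflection pattern.
def draft(question):
    """First pass: produce an initial answer (contains a deliberate typo)."""
    return "Paris is the capitol of France."

def critique(answer):
    """Second pass: the model reviews its own draft."""
    return "Spelling: 'capitol' should be 'capital'." if "capitol" in answer else "OK"

def revise(answer, feedback):
    """Third pass: apply the critique before returning the final response."""
    return answer.replace("capitol", "capital") if feedback != "OK" else answer

d = draft("What is the capital of France?")
final = revise(d, critique(d))
print(final)  # Paris is the capital of France.
```

The cost is extra model calls per request, traded for higher answer quality.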