
# Domain 2: Implement generative AI solutions (15–20%)

โ† Domain 1 ยท Domain 3 โ†’


This domain covers building and deploying generative AI solutions using Azure OpenAI Service and Azure AI Foundry. It tests your ability to choose the right approach (RAG vs fine-tuning), design prompts, and evaluate model outputs.

## 2.1 Azure OpenAI Service and AI Foundry

### Key Concepts

| Concept | Description |
|---|---|
| Model Catalog | Browse and deploy models: GPT-4o, GPT-3.5 Turbo, embeddings (text-embedding-ada-002), DALL-E 3 |
| Deployment | A model instance with a name, capacity type (Standard/PTU), and version |
| AI Foundry Project | Workspace where you build, test, and deploy generative AI apps |
| Foundry SDK | `azure-ai-projects` + `azure-ai-inference` packages for app integration |

### Chat Completion API Structure

| Role | Purpose | Example |
|---|---|---|
| `system` | Instructions and persona for the model | "You are a helpful assistant that only answers in English." |
| `user` | The human's message | "What is the capital of France?" |
| `assistant` | Prior model responses (conversation history) | "The capital of France is Paris." |
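The role structure above maps directly onto a chat completions call. Here is a minimal sketch using the `openai` Python package's `AzureOpenAI` client; the endpoint, API key, API version, and deployment name are placeholders you would replace with your own resource values.

```python
# Conversation follows the system / user / assistant role structure.
MESSAGES = [
    {"role": "system", "content": "You are a helpful assistant that only answers in English."},
    {"role": "user", "content": "What is the capital of France?"},
]

def ask(deployment: str = "gpt-4o") -> str:
    """Send the conversation to an Azure OpenAI deployment (requires network access)."""
    from openai import AzureOpenAI  # pip install openai

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com",
        api_key="<your-key>",
        api_version="2024-06-01",
    )
    response = client.chat.completions.create(model=deployment, messages=MESSAGES)
    return response.choices[0].message.content
```

Appending each reply as an `assistant` message (and the next question as a `user` message) is how multi-turn history is carried forward.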

### Sampling Parameters

| Parameter | Effect | Typical Use |
|---|---|---|
| `temperature` | Higher = more creative/random; 0 = deterministic | Creative tasks: 0.7–1.0; factual: 0.0–0.2 |
| `top_p` | Controls vocabulary diversity via nucleus sampling | Alternative to temperature (use one, not both) |
| `max_tokens` | Hard limit on response length | Prevent runaway output |
| `stop` | Sequence(s) that terminate the response | Structured output, templates |
| `frequency_penalty` | Reduces repetition of tokens already used | Long-form generation |
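A hypothetical helper showing how these parameters map onto `chat.completions.create()` keyword arguments. The two profiles and their values are illustrative, not prescribed settings; note that only one of `temperature` / `top_p` is set.

```python
# Two illustrative sampling profiles. Set temperature OR top_p, not both.
FACTUAL = {"temperature": 0.0, "max_tokens": 256}      # deterministic Q&A
CREATIVE = {
    "temperature": 0.9,          # more varied word choice
    "max_tokens": 1024,
    "frequency_penalty": 0.5,    # discourage repetition in long output
}

def request_kwargs(messages: list[dict], profile: dict) -> dict:
    """Merge a sampling profile into the kwargs for chat.completions.create()."""
    return {"model": "gpt-4o", "messages": messages, **profile}

kwargs = request_kwargs([{"role": "user", "content": "Write a haiku."}], CREATIVE)
```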

## 2.2 Prompt Engineering

### Core Techniques

| Technique | What It Does | When to Use |
|---|---|---|
| Zero-shot | Ask the model with no examples | Simple, well-understood tasks |
| Few-shot | Provide 2–5 input/output examples in the prompt | Tasks needing specific format or tone |
| Chain-of-Thought (CoT) | Ask model to "think step by step" | Math, multi-step reasoning |
| Grounding | Include source data in the prompt (RAG context) | Factual accuracy, hallucination reduction |
| Prompt Templates | Parameterized prompts with placeholders | Consistent formatting across inputs |
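Few-shot prompting and prompt templates are often combined: the template fixes the format, and the embedded examples fix the expected output style. A minimal sketch (the classification task, example pairs, and placeholder name are all hypothetical):

```python
# A parameterized few-shot prompt: two examples plus a {review} placeholder.
TEMPLATE = """Classify the sentiment of the review as Positive or Negative.

Review: "Great battery life!" -> Positive
Review: "Screen cracked in a week." -> Negative
Review: "{review}" ->"""

def build_prompt(review: str) -> str:
    """Fill the template placeholder with the input to classify."""
    return TEMPLATE.format(review=review)

prompt = build_prompt("Fast shipping and works perfectly.")
```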

> **Grounding vs Fine-tuning**
>
> **Grounding (RAG):** inject relevant documents into the prompt at runtime, so the model answers from the provided context. Fast, updatable, no retraining.
> **Fine-tuning:** bake knowledge or style into the model weights, which requires training. Use it when tone, format, or rare domain expertise matters AND the data doesn't change often.
>
> The exam almost always prefers RAG when data changes frequently.

### Multimodal Inputs

- **GPT-4o:** Accepts images + text in a single message. Can describe, analyze, or reason about images.
- **DALL-E 3:** Generates images from text prompts. Key parameters: `size` (1024×1024 etc.), `quality` (standard/hd), `style` (vivid/natural).
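A GPT-4o image-plus-text request uses a `user` message whose `content` is a list of typed parts rather than a plain string. The image URL below is a placeholder (base64 data URLs also work):

```python
# One user turn containing both text and an image part.
vision_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this chart."},
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
    ],
}
```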

## 2.3 Retrieval Augmented Generation (RAG)

RAG grounds the model in your own data without retraining.

### Architecture

```
User Query
  → Embed query (text-embedding-ada-002)
  → Search Azure AI Search index (vector/hybrid search)
  → Retrieve top-K document chunks
  → Inject chunks into prompt as context
  → Model generates grounded response
```
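The pipeline above can be sketched end to end with a toy retrieval step. Real embeddings come from a model such as text-embedding-ada-002 (1536 dimensions); the 3-dimensional vectors and sample chunks below are stand-ins to show the ranking-by-cosine-similarity idea only.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-D "embeddings" standing in for real embedding-model vectors.
chunks = {
    "Paris is the capital of France.": [0.9, 0.1, 0.0],
    "GPT-4o accepts image input.":     [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "What is France's capital?"

# Retrieve top-K chunks, then inject them into the prompt as context.
top_k = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)[:1]
context = "\n".join(top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: What is France's capital?"
```

In production, the search step is handled by Azure AI Search (vector or hybrid), not an in-memory loop.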

### On Your Data (Azure OpenAI)

A built-in integration that connects Azure OpenAI directly to an Azure AI Search index:

- No custom code needed for the retrieval step
- Supports hybrid search (keyword + vector)
- Returns citations in the response
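The request shape below is a sketch of the `data_sources` block that points a chat completion at an Azure AI Search index. All endpoint, index, and key values are placeholders, and field names can vary across API versions, so verify against the current Azure OpenAI REST reference.

```python
# "On Your Data" sketch: extra request body wiring chat completions
# to an Azure AI Search index. All values are placeholders.
on_your_data = {
    "data_sources": [{
        "type": "azure_search",
        "parameters": {
            "endpoint": "https://<search-resource>.search.windows.net",
            "index_name": "<your-index>",
            "authentication": {"type": "api_key", "key": "<search-key>"},
            "query_type": "vector_simple_hybrid",  # keyword + vector
        },
    }]
}
```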

**RAG vs Fine-tuning Decision**

| Factor | RAG | Fine-tuning |
|---|---|---|
| Data changes frequently | ✅ | ❌ |
| Need specific response tone/format | ❌ | ✅ |
| Reduce hallucinations with factual data | ✅ | Partial |
| Expensive, requires retraining | ❌ | ✅ |

## 2.4 Prompt Flow

A visual orchestration tool in AI Foundry for designing, testing, and deploying LLM workflows.

### Flow Types

| Type | Purpose |
|---|---|
| Standard Flow | General LLM pipeline (input → processing → output) |
| Chat Flow | Multi-turn conversational app with session memory |
| Evaluation Flow | Measures performance of another flow against a dataset |

### Key Concepts

| Concept | Description |
|---|---|
| Nodes | Steps in the flow: LLM node (model call), Python node (custom code), Prompt node (template) |
| Variants | Multiple versions of a node's prompt, for running A/B comparisons in one flow |
| Inputs/Outputs | Define the schema for what the flow accepts and returns |
| Connections | References to Azure OpenAI or other service credentials |

> **Prompt Flow Exam Signals**
>
> - "Visual workflow for LLM apps" → Prompt Flow
> - "Evaluate prompt variants against a dataset" → Prompt Flow (Evaluation Flow)
> - "A/B test two system prompts" → Prompt Flow Variants


## 2.5 Model Fine-tuning

### When to Fine-tune

| Situation | Recommended Approach |
|---|---|
| Data changes frequently, reduce hallucinations | RAG |
| Specific tone, format, or domain jargon | Fine-tuning |
| Teach the model a new task structure | Fine-tuning |
| Need citations from source documents | RAG |

### Fine-tuning Workflow

1. **Prepare data:** Format as JSONL with `system`, `user`, `assistant` message objects (min 100–500 examples)
2. **Upload:** Push the training file to Azure OpenAI
3. **Create job:** Start the fine-tuning job, specifying the base model and hyperparameters
4. **Monitor:** Track loss metrics; the job can take hours
5. **Deploy:** Deploy the fine-tuned model as a new endpoint
6. **Evaluate:** Test with held-out examples before promoting to production
```json
// JSONL training example format (one JSON object per line in the actual file):
{"messages": [
  {"role": "system", "content": "You are a legal document assistant."},
  {"role": "user", "content": "Summarize this contract clause."},
  {"role": "assistant", "content": "This clause limits liability to..."}
]}
```
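Before uploading, it is worth assembling and validating the training file programmatically. A small sketch using only the standard library (the helper names are illustrative):

```python
import json

def make_example(system: str, user: str, assistant: str) -> dict:
    """Build one chat-format training example for fine-tuning."""
    return {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]}

def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(e) for e in examples)

examples = [make_example(
    "You are a legal document assistant.",
    "Summarize this contract clause.",
    "This clause limits liability to...",
)]
jsonl = to_jsonl(examples)  # write this string to train.jsonl and upload
```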

## 2.6 Model Evaluation

### Built-in Evaluation Metrics

| Metric | What It Measures |
|---|---|
| Groundedness | Does the response stay within the provided source context? (RAG quality) |
| Relevance | Does the answer address the user's question? |
| Coherence | Is the response logically structured? |
| Fluency | Is the language grammatically correct and natural? |
| Similarity | How close is the output to a golden reference answer? |

### Evaluation Flows

- Create an Evaluation Flow in Prompt Flow to run a dataset of questions through your flow and score outputs automatically.
- Reference answers (ground truth) are compared against model outputs using the metrics above.

## 2.7 Monitoring and Safety

| Control | Purpose |
|---|---|
| Content Filters | Applied to inputs AND outputs of Azure OpenAI models (Hate, Violence, Self-Harm, Sexual) |
| Blocklists | Custom term lists; content containing blocked terms is filtered from inputs and outputs |
| Prompt Flow Tracing | See latency, token counts, and node execution for each run |
| Model Reflection | Pattern where the model critiques its own draft before returning a final response |
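The reflection pattern is just a sequence of model calls: draft, critique, rewrite. A minimal sketch with a stubbed model function; in a real app, `call_model` (a hypothetical name) would invoke a chat deployment.

```python
# Model-reflection sketch. call_model is a stand-in for a real chat call.
def call_model(prompt: str) -> str:
    return f"DRAFT: response to {prompt!r}"

def reflect_and_answer(question: str) -> str:
    """Draft -> self-critique -> rewrite, using three model calls."""
    draft = call_model(question)
    critique = call_model(f"Critique this draft for factual errors: {draft}")
    final = call_model(
        f"Rewrite the draft using the critique.\nDraft: {draft}\nCritique: {critique}"
    )
    return final
```

The trade-off is roughly triple the latency and token cost per answer, in exchange for a chance to catch errors before the user sees them.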


โ† Domain 1 ยท Domain 3 โ†’
