# Domain 3: Implement AI capabilities in database solutions (25-30%)
This is the DP-800 domain that makes the cert unique. It connects SQL development with modern AI retrieval patterns.
## 3.1 Design and implement models and embeddings

### External model thinking
The study guide expects you to evaluate models by:
- modality
- language support
- model size
- structured output support
### Create and manage external models
Microsoft's study guide is broader than model selection alone. It also expects you to understand how SQL solutions create and manage external models.
| Concern | Why it matters |
|---|---|
| Model capability fit | Different models fit different retrieval, generation, and structured-output tasks |
| Operational management | External models are dependencies that affect latency, cost, and failure paths |
| Security and access | Endpoint configuration and credentials must be handled safely |
| Refresh and compatibility | Model changes can affect embedding quality and downstream search behavior |
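As a hedged sketch of what "create and manage" can look like in SQL Server 2025-style T-SQL: register a scoped credential for the endpoint, then declare the model as a named database object. The account URL, deployment name, and credential here are placeholders, and the exact `CREATE EXTERNAL MODEL` options should be checked against current documentation.

```sql
-- Placeholder endpoint and key; in practice the secret comes from a secure store.
CREATE DATABASE SCOPED CREDENTIAL [https://myaccount.openai.azure.com/]
WITH IDENTITY = 'HTTP Endpoint Headers',
     SECRET   = '{"api-key":"<your-api-key>"}';

-- Declare the external embedding model as a managed database object.
CREATE EXTERNAL MODEL MyEmbeddingModel
WITH (
    LOCATION   = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-02-01',
    API_FORMAT = 'Azure OpenAI',
    MODEL_TYPE = EMBEDDINGS,
    MODEL      = 'text-embedding-3-small',
    CREDENTIAL = [https://myaccount.openai.azure.com/]
);
```

Treating the model as a database object is what makes the operational concerns in the table above manageable: the endpoint, credential, and model version are all visible, permissioned, and swappable in one place.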
### Embedding pipeline

Relational or semi-structured source data
-> select columns to represent meaning
-> split content into chunks
-> generate embeddings
-> store vectors
-> refresh embeddings when source data changes

### Key design decisions
| Decision | Why it matters |
|---|---|
| Which columns to embed | Bad input columns produce bad semantic retrieval |
| Chunk size | Too large hurts precision; too small loses context |
| Refresh method | Must match data volatility and operational complexity |
| Model choice | Affects multilingual support, cost, latency, and output quality |
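The store-vectors step of the pipeline can be sketched like this, assuming a hypothetical `dbo.ProductDocs` table with a `ChunkText` column and an external model named `MyEmbeddingModel`; the `AI_GENERATE_EMBEDDINGS` function and the `VECTOR` type are SQL Server 2025 features, so verify availability in your target platform.

```sql
-- Add a vector column sized to the model's output dimensionality (1536 here is an assumption).
ALTER TABLE dbo.ProductDocs
ADD ContentVector VECTOR(1536);
GO

-- Generate and store embeddings for rows that do not have one yet.
UPDATE dbo.ProductDocs
SET ContentVector = AI_GENERATE_EMBEDDINGS(ChunkText USE MODEL MyEmbeddingModel)
WHERE ContentVector IS NULL;
```

Note how the column choice (`ChunkText`) and the vector size are both explicit design decisions from the table above, not defaults.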
### Embedding maintenance methods called out in the guide
- table triggers
- Change Tracking
- CDC
- Change Event Streaming (CES)
- Azure Functions with SQL trigger binding
- Azure Logic Apps
- Microsoft Foundry
> **TIP**
> If the source data changes frequently, the correct answer usually includes a repeatable embedding refresh path, not one-time manual generation.
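The simplest maintenance method in the list, a table trigger, might be sketched as below: rather than re-embedding synchronously inside the trigger (which would put the model call on the write path), the trigger only flags rows as stale for a background job to re-embed. Table and column names are hypothetical.

```sql
-- Staleness flag for the refresh job to scan.
ALTER TABLE dbo.ProductDocs
ADD EmbeddingStale BIT NOT NULL DEFAULT 0;
GO

CREATE TRIGGER trg_ProductDocs_MarkStale
ON dbo.ProductDocs
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    IF UPDATE(ChunkText)   -- only react when the embedded column changed
    BEGIN
        UPDATE d
        SET EmbeddingStale = 1
        FROM dbo.ProductDocs AS d
        INNER JOIN inserted AS i ON i.DocId = d.DocId;
    END
END;
```

Change Tracking, CDC, or CES would replace the flag column with built-in change capture; the trade-off is operational complexity versus write-path coupling.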
## 3.2 Design and implement intelligent search

### Search modes
| Mode | Best for |
|---|---|
| Full-text search | Keyword/token matching |
| Semantic vector search | Meaning-based similarity |
| Hybrid search | Blend keywords with semantic similarity |
### SQL/vector concepts to know
- vector data type
- vector indexes
- vector index types and metrics
- vector size and dimensionality
- vector distance and normalization functions
- ANN vs ENN search behavior
- reciprocal rank fusion (RRF)
- vector and hybrid search performance evaluation
### Functions Microsoft is signaling explicitly

| Function | What to remember |
|---|---|
| `VECTOR_SEARCH` | Search vectors for nearest matches |
| `VECTOR_DISTANCE` | Compare how close vectors are |
| `VECTOR_NORMALIZE` | Normalize vectors before comparison when the pattern requires it |
| `VECTORPROPERTY` | Inspect vector metadata such as dimensionality |
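Two of these functions can be shown in a minimal exact-search sketch, assuming the hypothetical `dbo.ProductDocs` table and `MyEmbeddingModel` model from earlier examples:

```sql
-- Embed the user's query text with the same model used for the stored vectors.
DECLARE @q VECTOR(1536);
SELECT @q = AI_GENERATE_EMBEDDINGS(N'wireless headphones' USE MODEL MyEmbeddingModel);

-- Exact nearest neighbors: full scan ordered by cosine distance.
SELECT TOP (5)
    DocId,
    ChunkText,
    VECTOR_DISTANCE('cosine', ContentVector, @q) AS Distance
FROM dbo.ProductDocs
ORDER BY Distance;

-- Inspect stored vector metadata, e.g. to confirm dimensionality matches the model.
SELECT TOP (1) VECTORPROPERTY(ContentVector, 'Dimensions') AS Dims
FROM dbo.ProductDocs;
```

The `ORDER BY VECTOR_DISTANCE` pattern is the ENN shape; `VECTOR_SEARCH` over an index is the ANN shape.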
### ANN vs ENN
| Option | Trade-off |
|---|---|
| ANN | Faster, approximate nearest neighbors, better for scale |
| ENN | Exact nearest neighbors, higher precision, usually slower |
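The ANN side of this trade-off pairs a vector index with the `VECTOR_SEARCH` table-valued function. This syntax is a sketch of the SQL Server 2025 preview shape and may shift between releases, so treat option names here (`METRIC`, `TYPE`, `TOP_N`) as assumptions to verify; table and column names are hypothetical.

```sql
-- Build an approximate (DiskANN) index over the vector column.
CREATE VECTOR INDEX vix_ProductDocs
ON dbo.ProductDocs (ContentVector)
WITH (METRIC = 'cosine', TYPE = 'DiskANN');
GO

-- ANN query: @q is a query embedding produced with the same model as the stored vectors.
DECLARE @q VECTOR(1536);
SELECT @q = AI_GENERATE_EMBEDDINGS(N'wireless headphones' USE MODEL MyEmbeddingModel);

SELECT t.DocId, t.ChunkText, s.distance
FROM VECTOR_SEARCH(
        TABLE      = dbo.ProductDocs AS t,
        COLUMN     = ContentVector,
        SIMILAR_TO = @q,
        METRIC     = 'cosine',
        TOP_N      = 5
     ) AS s
ORDER BY s.distance;
```

If an exam scenario stresses exactness over latency, the answer is the index-free `ORDER BY VECTOR_DISTANCE` pattern instead.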
### Hybrid search
Hybrid search matters because enterprise SQL use cases often need both:
- exact domain terms, codes, and identifiers
- semantic similarity over meaning
### Evaluate vector and hybrid search performance
This is an official study-guide verb and deserves explicit attention.
| What to evaluate | Why |
|---|---|
| Recall vs speed | ANN can improve speed but may trade away exactness |
| Exactness requirements | ENN may be the better fit when correctness matters more than latency |
| Query pattern mix | Hybrid retrieval helps when exact terms and semantic meaning both matter |
| Ranking quality | RRF improves blended ranking when combining lexical and vector results |
### Reciprocal Rank Fusion (RRF)
RRF combines ranked results from multiple retrieval methods, typically full-text and vector search, into one final ranked list. If the exam mentions combining multiple ranked result sets into a better blended answer, think RRF.
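A hedged sketch of RRF in T-SQL: rank the same table once by full-text relevance and once by vector distance, then score each document as the sum of `1 / (k + rank)` across both lists (`k = 60` is the conventional constant). It assumes a full-text index on the hypothetical `ChunkText` column and the vector setup from earlier examples.

```sql
DECLARE @q VECTOR(1536);
SELECT @q = AI_GENERATE_EMBEDDINGS(N'wireless headphones' USE MODEL MyEmbeddingModel);

WITH Keyword AS (
    -- Lexical ranking from the full-text engine.
    SELECT [KEY] AS DocId,
           ROW_NUMBER() OVER (ORDER BY [RANK] DESC) AS r
    FROM FREETEXTTABLE(dbo.ProductDocs, ChunkText, N'wireless headphones')
),
Semantic AS (
    -- Semantic ranking by cosine distance.
    SELECT DocId,
           ROW_NUMBER() OVER (ORDER BY VECTOR_DISTANCE('cosine', ContentVector, @q)) AS r
    FROM dbo.ProductDocs
)
SELECT COALESCE(k.DocId, s.DocId) AS DocId,
       COALESCE(1.0 / (60 + k.r), 0) + COALESCE(1.0 / (60 + s.r), 0) AS RrfScore
FROM Keyword AS k
FULL OUTER JOIN Semantic AS s ON s.DocId = k.DocId
ORDER BY RrfScore DESC;
```

The `FULL OUTER JOIN` matters: a document found by only one retrieval method still gets a score, just a smaller one.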
## 3.3 Design and implement retrieval-augmented generation (RAG)

### RAG flow in a SQL-centered app
1. Retrieve relevant rows or chunks.
2. Convert the context into JSON or structured prompt text.
3. Send it to the model.
4. Capture and parse the response.
### SQL-specific clue

The official guide explicitly mentions using `sp_invoke_external_rest_endpoint` to create prompts and send results to a language model.
That means you should be ready to think in this pattern:
- SQL retrieves data
- SQL shapes data into JSON
- SQL calls external AI endpoint
- app or SQL layer parses the response
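The four steps of that pattern can be sketched end to end. This is an illustrative shape, not a canonical implementation: the endpoint URL, credential name, deployment, table, and question are all placeholders, and the `$.result...` response path should be verified against the procedure's documented response envelope.

```sql
DECLARE @q VECTOR(1536);
SELECT @q = AI_GENERATE_EMBEDDINGS(N'commuter headphones' USE MODEL MyEmbeddingModel);

-- 1) SQL retrieves data, 2) shapes it into JSON.
DECLARE @context NVARCHAR(MAX) = (
    SELECT TOP (3) ChunkText
    FROM dbo.ProductDocs
    ORDER BY VECTOR_DISTANCE('cosine', ContentVector, @q)
    FOR JSON PATH
);

DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT(
    'messages': JSON_ARRAY(
        JSON_OBJECT('role': 'system', 'content': 'Answer only from the supplied context.'),
        JSON_OBJECT('role': 'user',
                    'content': CONCAT('Context: ', @context, ' Question: What fits a commuter?'))
    )
);

-- 3) SQL calls the external AI endpoint.
DECLARE @response NVARCHAR(MAX);
EXEC sp_invoke_external_rest_endpoint
    @url        = 'https://myaccount.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01',
    @method     = 'POST',
    @credential = [https://myaccount.openai.azure.com/],
    @payload    = @payload,
    @response   = @response OUTPUT;

-- 4) SQL (or the app) parses the response.
SELECT JSON_VALUE(@response, '$.result.choices[0].message.content') AS Answer;
```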
### RAG implementation details to anchor
- Use SQL retrieval first, not free-form prompting against stale memory
- Shape context into JSON or prompt-ready text
- Call external endpoints from SQL workflows when the scenario is SQL-centered
- Keep the answer tied to current enterprise data
### When RAG is the right answer
- Facts change often
- Grounding must come from current enterprise data
- You need retrieval traceability
- Fine-tuning would be too slow, costly, or stale
> **RAG Trap**
> If the requirement is "use the latest database content" or "include current enterprise records", do not jump to fine-tuning. The exam will usually want retrieval + prompt grounding instead.
## Quick Decision Rules
- **Need keyword-only precision?** Use full-text search.
- **Need meaning-based similarity?** Use vector search.
- **Need both business keywords and semantic relevance?** Use hybrid search, often with RRF.
- **Need generative answers grounded in current SQL data?** Use RAG.
- **Need high recall at scale with faster search?** Lean toward ANN.
- **Need exact nearest-neighbor correctness and can afford the cost?** Lean toward ENN.
## Fast Recall
- Embeddings turn content into vectors
- Chunking controls retrieval quality
- ANN is approximate and fast
- ENN is exact and slower
- Hybrid search = keyword + semantic
- RRF blends ranked results
- `sp_invoke_external_rest_endpoint` is the SQL-native exam clue for external model invocation