DP-800 · Service Decision Flowchart

🔒 Security Feature Selection

Scenario / Requirement	Correct Feature	Why	Common Trap
Users must only see their own rows	Row-Level Security (RLS) — FILTER predicate	Transparent to app. Policy enforced in DB regardless of query source.	Using WHERE in every query — dev can forget it.
Show masked SSN to most users, real value to admins	Dynamic Data Masking (DDM)	db_owner and UNMASK-privileged users see real data. Others see mask.	DDM is NOT encryption — app-tier only. Privileged queries bypass it.
Encrypt a column. App needs to search/join on it (equality).	Always Encrypted — DETERMINISTIC	Same plaintext → same ciphertext → equality comparisons work.	Choosing RANDOMIZED blocks any filtering on the column.
Encrypt highly sensitive column. No filtering needed.	Always Encrypted — RANDOMIZED	Each encryption unique — no pattern leakage. Most secure.	Using DETERMINISTIC unnecessarily exposes patterns.
Azure resource must connect to Azure SQL — no passwords in code	Managed Identity + Azure AD token	No secret to rotate/leak. Integrates with Azure RBAC.	Storing connection string in app config or pipeline YAML.
Prevent user UPDATE from changing row to one they can't see	RLS BEFORE UPDATE BLOCK predicate	Checks the new values against the predicate before allowing UPDATE.	Using FILTER only — doesn't prevent exfiltration via UPDATE.
Store API key for external REST calls in T-SQL	DATABASE SCOPED CREDENTIAL	Encrypted in DB, referenced by name. Never appears in plain T-SQL.	Hardcoding in stored procedure or T-SQL variable.
Secure DAB GraphQL/REST endpoint in production	Managed Identity + Azure AD JWT validation	Roles map to permissions in dab-config.json. No key management.	Using anonymous role for non-public data; API key in URL params.

🔍 Search Strategy Decision Tree

Requirement	Search Type	Implementation	Key Functions
Find documents containing exact phrase "product recall"	Full-Text	Full-text index + catalog on content column	CONTAINS(col, '"product recall"')
Find documents about "car accident" even if they say "vehicle collision"	Semantic Vector	Embedding column + VECTOR_DISTANCE cosine	VECTOR_DISTANCE('cosine', emb, @qEmb)
Find documents matching both keyword and semantic meaning — production RAG	Hybrid (Full-text + Vector + RRF)	Both indexes. FREETEXTTABLE + VECTOR_SEARCH. Merge via RRF.	RRF: 1.0/(60+rank) summed, ORDER BY combined score
Multilingual search — users query in French, docs in English	Semantic Vector	Multilingual embedding model (e.g., text-embedding-3-large)	Embeddings capture cross-lingual semantics
100% recall required on 5,000 rows, latency not critical	ENN (Exact Nearest Neighbor)	ORDER BY VECTOR_DISTANCE — no index needed	Full table scan No index
Sub-50ms search on 5 million rows	ANN (DiskANN index)	CREATE VECTOR INDEX, VECTOR_SEARCH()	Index required DiskANN
Stemming — "runs" matches "running", "ran"	Full-Text with FORMSOF	Full-text index + CONTAINS FORMSOF(INFLECTIONAL)	CONTAINS(col, 'FORMSOF(INFLECTIONAL, run)')
Merge full-text (RANK 0-1000) and vector (distance 0-1) scores	RRF (Reciprocal Rank Fusion)	Rank each list separately, apply 1/(60+rank), sum scores	Scale-invariant: works regardless of score ranges

📡 Change Detection Method Selection

Method	What It Captures	Overhead	Best For	Not For
Change Tracking	Row identity + change type (INSERT/UPDATE/DELETE) since a sync version. No column values.	Very Low	Sync scenarios. Lightweight embedding trigger where you re-read the row.	When you need before/after column values without re-querying.
CDC (Change Data Capture)	Before AND after column values for each change, from transaction log.	Medium	ETL pipelines. Embedding refresh when you need exact new column values. Audit with history.	Very high-write tables (log read overhead). Doesn't work with memory-optimized tables.
CES (Change Event Streaming)	Real-time event stream of changes as they happen.	Higher	Event-driven AI pipelines. Sub-second embedding freshness. Stream processing.	Simple batch sync. Teams without stream processing expertise.
Azure Functions SQL Trigger	CDC-based row change events delivered to a serverless function.	Low (on DB)	Event-driven embedding refresh. Serverless, auto-scales. No polling overhead on DB.	When you need before-values (gets only new values). Long-running processing per row.
AFTER Trigger (SQL)	DML operation. Synchronous — fires within the transaction.	Medium (blocks write)	Simple audit logging. Small tables with infrequent writes. Cascading updates.	Calling external APIs (embedding) — blocks write transaction, risk of failure.
Azure Logic Apps	Connector-based polling or event reception from SQL/storage.	Medium	Low-code integration. Connecting SQL to non-technical downstream systems.	High-volume, low-latency scenarios. Cost-sensitive high-frequency polling.

📊 Index Type Selection Guide

Scenario	Index Type	Key Characteristics
OLTP: point lookups, frequent single-row reads by PK	Clustered Index (B-tree)	Row store. Fast single row / range scans. One per table. Defines physical row order.
Support queries that filter on non-PK columns, avoid key lookups	Non-Clustered Index + INCLUDE	Leaf level includes extra columns. Eliminates key lookup. Multiple per table.
Analytics: aggregations, GROUP BY, large table scans	Clustered Columnstore Index (CCI)	Column storage. Compression. Vectorized processing. Replaces row store. ~10x read perf for analytics.
Mixed OLTP + Analytics on same table	Non-Clustered Columnstore Index (NCCI)	Added alongside clustered row-store index. Analytics benefit without losing row-store OLTP perf.
Partitioned table — enforce partition elimination	Aligned Index (same partition scheme)	Index uses same partition function. Optimizer can skip irrelevant partitions.
Semantic vector similarity search — production scale (millions)	DiskANN Vector Index	ANN graph index. O(log n). Used by VECTOR_SEARCH(). Supports cosine/euclidean/dot_product.
Full-text keyword search with stemming/proximity	Full-Text Index	Requires full-text catalog. Linguistic processing. CONTAINS/FREETEXT predicates.
View needs to return aggregated data fast without recomputing	Indexed View (Clustered Index on View)	Materializes the view result. SCHEMABINDING required. Auto-maintained on DML. Enterprise: auto-used; Standard: NOEXPAND hint.

🗄️ Specialized Table Type Decision Guide

D1 Temporal Table

Use when: You need to query historical state (point-in-time, period-of-time)
Key clause: FOR SYSTEM_TIME AS OF / BETWEEN / ALL
Auto-managed: History table maintained by SQL Server
Don't use for: Compliance tamper-evidence (use Ledger for that)
Exam tip: SYSTEM_TIME period columns cannot be nullable

D1 In-Memory OLTP

Use when: Very high-throughput OLTP, latch contention, lock bottlenecks
Mechanism: Version-based concurrency — no locks on reads or writes
Bonus: Natively compiled SPs for extra performance
Don't use for: Large tables (memory-bound), analytics, complex joins
Exam tip: Does NOT support CDC — use Change Tracking instead

D1 Ledger Table

Use when: Compliance requires proof data was not tampered with
Two types: Append-only (no UPDATE/DELETE) vs Updatable (history maintained)
Mechanism: SHA-256 hash chain across all committed transactions
Exam tip: Temporal ≠ Ledger. Temporal = time-travel. Ledger = tamper-proof.

D1 Graph Tables

Use when: Modeling entity relationships, social graphs, recommendations
NODE table: Entities (Person, Product). Has $node_id.
EDGE table: Relationships (Follows, Purchased). Has $from_id, $to_id.
Query: MATCH clause for pattern traversal. SHORTEST_PATH for paths.
Exam tip: MATCH is in WHERE or FROM — not a function call

D1 External Tables

Use when: Query data in blob/ADLS/another DB without copying it
Fabric: External tables over Fabric Lakehouse files
Push-down: Some predicates pushed to source for filtering
Exam tip: Write support limited. Not for hot-path OLTP.

ALL Partitioned Tables

Use when: Very large tables needing partition elimination or archive switching
Partition function: Defines boundary values and LEFT/RIGHT assignment
Partition scheme: Maps function ranges to filegroups
Elimination: WHERE on partition key → only relevant partitions scanned
Exam tip: Index must use same partition scheme to be "aligned"

🤖 RAG Architecture Decision Guide

D3 Embedding Model Choice

text-embedding-3-small (1536 dims): Good quality, low cost. Most scenarios.
text-embedding-3-large (3072 dims): Higher quality. More storage/compute.
Multilingual-e5: Cross-lingual embeddings. When users query in other languages.
Multimodal: When content includes images alongside text (e.g., product images + description)
Exam tip: Match embedding model at query time to the one used at index time

D3 Chunking Strategy

Fixed-size (100–512 tokens): Simple, fast. Use 10-15% overlap. First choice.
Sentence: NLP boundary detection. Better semantic units. Moderate complexity.
Semantic: Topic-based splits. Best embedding quality. Most complex.
Whole document: Only for very short documents (<200 tokens)
Exam tip: Overlap prevents context loss at chunk boundaries — always mention it

D3 Embedding Maintenance

Trigger (sync): Simple. Blocks write. Only for small tables with rare writes.
Change Tracking + batch: Async, low overhead. Minutes of lag. Good for large tables.
CDC + Azure Functions: Near real-time. Serverless scale. Recommended
CES: Sub-second. Complex. Only when latency < 1s is a requirement.
Exam tip: Az Functions SQL trigger = async, no write-path overhead

D3 sp_invoke_external_rest_endpoint

Purpose: Call any REST API from T-SQL — OpenAI, Azure AI, custom APIs
@credential: DATABASE SCOPED CREDENTIAL (never inline API key)
@payload: JSON body, typically built with FOR JSON PATH
@response OUTPUT: JSON response, parsed with JSON_VALUE / OPENJSON
Exam tip: Core primitive for both embedding generation AND LLM chat completion in RAG

ALL RAG vs Fine-Tuning

RAG: Retrieves facts at query time. No model retraining. Data stays current. No hallucination on retrieved facts.
Fine-Tuning: Bakes knowledge into model weights. Data becomes stale. Can't update without retraining. Use for style/format, not knowledge.
Exam tip: For DP-800, always prefer RAG for enterprise SQL data scenarios. Fine-tuning is out of scope.

D2 DAB vs Direct SQL + sp_invoke

DAB: Declarative REST/GraphQL over SQL objects. No custom code. Config-driven. Best for standard CRUD + search APIs.
Stored Proc + sp_invoke: Custom logic, RAG pipelines, complex orchestration, embedding generation. Use when DAB is too simple.
Exam tip: DAB exposes SPs too — use stored-procedure source type for custom logic via DAB endpoint.