DP-800 ยท SQL AI Developer Associate
Service & Feature Decision Guide
๐Ÿ”’ Security Decisions
๐Ÿ” Search Strategy
๐Ÿ“ก Change Detection
๐Ÿ“Š Index Selection
๐Ÿ—„๏ธ Table Type Selection
๐Ÿค– RAG Architecture
๐Ÿ”’ Security Feature Selection
Scenario / RequirementCorrect FeatureWhyCommon Trap
Users must only see their own rowsRow-Level Security (RLS) โ€” FILTER predicateTransparent to app. Policy enforced in DB regardless of query source.Using WHERE in every query โ€” dev can forget it.
Show masked SSN to most users, real value to adminsDynamic Data Masking (DDM)db_owner and UNMASK-privileged users see real data. Others see mask.DDM is NOT encryption โ€” app-tier only. Privileged queries bypass it.
Encrypt a column. App needs to search/join on it (equality).Always Encrypted โ€” DETERMINISTICSame plaintext โ†’ same ciphertext โ†’ equality comparisons work.Choosing RANDOMIZED blocks any filtering on the column.
Encrypt highly sensitive column. No filtering needed.Always Encrypted โ€” RANDOMIZEDEach encryption unique โ€” no pattern leakage. Most secure.Using DETERMINISTIC unnecessarily exposes patterns.
Azure resource must connect to Azure SQL โ€” no passwords in codeManaged Identity + Azure AD tokenNo secret to rotate/leak. Integrates with Azure RBAC.Storing connection string in app config or pipeline YAML.
Prevent user UPDATE from changing row to one they can't seeRLS BEFORE UPDATE BLOCK predicateChecks the new values against the predicate before allowing UPDATE.Using FILTER only โ€” doesn't prevent exfiltration via UPDATE.
Store API key for external REST calls in T-SQLDATABASE SCOPED CREDENTIALEncrypted in DB, referenced by name. Never appears in plain T-SQL.Hardcoding in stored procedure or T-SQL variable.
Secure DAB GraphQL/REST endpoint in productionManaged Identity + Azure AD JWT validationRoles map to permissions in dab-config.json. No key management.Using anonymous role for non-public data; API key in URL params.
๐Ÿ“ก Change Detection Method Selection
MethodWhat It CapturesOverheadBest ForNot For
Change TrackingRow identity + change type (INSERT/UPDATE/DELETE) since a sync version. No column values.Very LowSync scenarios. Lightweight embedding trigger where you re-read the row.When you need before/after column values without re-querying.
CDC (Change Data Capture)Before AND after column values for each change, from transaction log.MediumETL pipelines. Embedding refresh when you need exact new column values. Audit with history.Very high-write tables (log read overhead). Doesn't work with memory-optimized tables.
CES (Change Event Streaming)Real-time event stream of changes as they happen.HigherEvent-driven AI pipelines. Sub-second embedding freshness. Stream processing.Simple batch sync. Teams without stream processing expertise.
Azure Functions SQL TriggerCDC-based row change events delivered to a serverless function.Low (on DB)Event-driven embedding refresh. Serverless, auto-scales. No polling overhead on DB.When you need before-values (gets only new values). Long-running processing per row.
AFTER Trigger (SQL)DML operation. Synchronous โ€” fires within the transaction.Medium (blocks write)Simple audit logging. Small tables with infrequent writes. Cascading updates.Calling external APIs (embedding) โ€” blocks write transaction, risk of failure.
Azure Logic AppsConnector-based polling or event reception from SQL/storage.MediumLow-code integration. Connecting SQL to non-technical downstream systems.High-volume, low-latency scenarios. Cost-sensitive high-frequency polling.
๐Ÿ“Š Index Type Selection Guide
ScenarioIndex TypeKey Characteristics
OLTP: point lookups, frequent single-row reads by PKClustered Index (B-tree)Row store. Fast single row / range scans. One per table. Defines physical row order.
Support queries that filter on non-PK columns, avoid key lookupsNon-Clustered Index + INCLUDELeaf level includes extra columns. Eliminates key lookup. Multiple per table.
Analytics: aggregations, GROUP BY, large table scansClustered Columnstore Index (CCI)Column storage. Compression. Vectorized processing. Replaces row store. ~10x read perf for analytics.
Mixed OLTP + Analytics on same tableNon-Clustered Columnstore Index (NCCI)Added alongside clustered row-store index. Analytics benefit without losing row-store OLTP perf.
Partitioned table โ€” enforce partition eliminationAligned Index (same partition scheme)Index uses same partition function. Optimizer can skip irrelevant partitions.
Semantic vector similarity search โ€” production scale (millions)DiskANN Vector IndexANN graph index. O(log n). Used by VECTOR_SEARCH(). Supports cosine/euclidean/dot_product.
Full-text keyword search with stemming/proximityFull-Text IndexRequires full-text catalog. Linguistic processing. CONTAINS/FREETEXT predicates.
View needs to return aggregated data fast without recomputingIndexed View (Clustered Index on View)Materializes the view result. SCHEMABINDING required. Auto-maintained on DML. Enterprise: auto-used; Standard: NOEXPAND hint.
๐Ÿ—„๏ธ Specialized Table Type Decision Guide
D1 Temporal Table
  • Use when: You need to query historical state (point-in-time, period-of-time)
  • Key clause: FOR SYSTEM_TIME AS OF / BETWEEN / ALL
  • Auto-managed: History table maintained by SQL Server
  • Don't use for: Compliance tamper-evidence (use Ledger for that)
  • Exam tip: SYSTEM_TIME period columns cannot be nullable
D1 In-Memory OLTP
  • Use when: Very high-throughput OLTP, latch contention, lock bottlenecks
  • Mechanism: Version-based concurrency โ€” no locks on reads or writes
  • Bonus: Natively compiled SPs for extra performance
  • Don't use for: Large tables (memory-bound), analytics, complex joins
  • Exam tip: Does NOT support CDC โ€” use Change Tracking instead
D1 Ledger Table
  • Use when: Compliance requires proof data was not tampered with
  • Two types: Append-only (no UPDATE/DELETE) vs Updatable (history maintained)
  • Mechanism: SHA-256 hash chain across all committed transactions
  • Exam tip: Temporal โ‰  Ledger. Temporal = time-travel. Ledger = tamper-proof.
D1 Graph Tables
  • Use when: Modeling entity relationships, social graphs, recommendations
  • NODE table: Entities (Person, Product). Has $node_id.
  • EDGE table: Relationships (Follows, Purchased). Has $from_id, $to_id.
  • Query: MATCH clause for pattern traversal. SHORTEST_PATH for paths.
  • Exam tip: MATCH is in WHERE or FROM โ€” not a function call
D1 External Tables
  • Use when: Query data in blob/ADLS/another DB without copying it
  • Fabric: External tables over Fabric Lakehouse files
  • Push-down: Some predicates pushed to source for filtering
  • Exam tip: Write support limited. Not for hot-path OLTP.
ALL Partitioned Tables
  • Use when: Very large tables needing partition elimination or archive switching
  • Partition function: Defines boundary values and LEFT/RIGHT assignment
  • Partition scheme: Maps function ranges to filegroups
  • Elimination: WHERE on partition key โ†’ only relevant partitions scanned
  • Exam tip: Index must use same partition scheme to be "aligned"
๐Ÿค– RAG Architecture Decision Guide
D3 Embedding Model Choice
  • text-embedding-3-small (1536 dims): Good quality, low cost. Most scenarios.
  • text-embedding-3-large (3072 dims): Higher quality. More storage/compute.
  • Multilingual-e5: Cross-lingual embeddings. When users query in other languages.
  • Multimodal: When content includes images alongside text (e.g., product images + description)
  • Exam tip: Match embedding model at query time to the one used at index time
D3 Chunking Strategy
  • Fixed-size (100โ€“512 tokens): Simple, fast. Use 10-15% overlap. First choice.
  • Sentence: NLP boundary detection. Better semantic units. Moderate complexity.
  • Semantic: Topic-based splits. Best embedding quality. Most complex.
  • Whole document: Only for very short documents (<200 tokens)
  • Exam tip: Overlap prevents context loss at chunk boundaries โ€” always mention it
D3 Embedding Maintenance
  • Trigger (sync): Simple. Blocks write. Only for small tables with rare writes.
  • Change Tracking + batch: Async, low overhead. Minutes of lag. Good for large tables.
  • CDC + Azure Functions: Near real-time. Serverless scale. Recommended
  • CES: Sub-second. Complex. Only when latency < 1s is a requirement.
  • Exam tip: Az Functions SQL trigger = async, no write-path overhead
D3 sp_invoke_external_rest_endpoint
  • Purpose: Call any REST API from T-SQL โ€” OpenAI, Azure AI, custom APIs
  • @credential: DATABASE SCOPED CREDENTIAL (never inline API key)
  • @payload: JSON body, typically built with FOR JSON PATH
  • @response OUTPUT: JSON response, parsed with JSON_VALUE / OPENJSON
  • Exam tip: Core primitive for both embedding generation AND LLM chat completion in RAG
ALL RAG vs Fine-Tuning
  • RAG: Retrieves facts at query time. No model retraining. Data stays current. No hallucination on retrieved facts.
  • Fine-Tuning: Bakes knowledge into model weights. Data becomes stale. Can't update without retraining. Use for style/format, not knowledge.
  • Exam tip: For DP-800, always prefer RAG for enterprise SQL data scenarios. Fine-tuning is out of scope.
D2 DAB vs Direct SQL + sp_invoke
  • DAB: Declarative REST/GraphQL over SQL objects. No custom code. Config-driven. Best for standard CRUD + search APIs.
  • Stored Proc + sp_invoke: Custom logic, RAG pipelines, complex orchestration, embedding generation. Use when DAB is too simple.
  • Exam tip: DAB exposes SPs too โ€” use stored-procedure source type for custom logic via DAB endpoint.