DP-800 · 5-Minute Cheat Sheet

📝 D1 · T-SQL Power Features

Feature	Key Syntax / Pattern	When
CTE	WITH cte AS (SELECT …) SELECT …	Recursive hierarchies, readability
Window	ROW_NUMBER() OVER (PARTITION BY … ORDER BY …)	Ranking, running totals, lag/lead
JSON	JSON_VALUE / OPENJSON / FOR JSON PATH	Semi-structured data, LLM payloads
REGEXP	REGEXP_LIKE(col, pattern)	Pattern matching (Fabric SQL)
Fuzzy	EDIT_DISTANCE(a, b)	Typo tolerance, similarity search
Graph	MATCH (node1-(edge)->node2)	Social, recommendation graphs
Error	BEGIN TRY … END TRY BEGIN CATCH THROW END CATCH	Stored procedures, RAG callers

🗄️ D1 · Specialized Table Types

Temporal TableSYSTEM_TIME history. FOR SYSTEM_TIME AS OF '2024-01-01'. Point-in-time queries. Separate history table auto-managed.

In-Memory OLTPMEMORY_OPTIMIZED=ON. No locks — version-based concurrency. Natively compiled SPs. Best for high-throughput OLTP.

Columnstore IndexClustered (replaces row store) vs Non-clustered. Delta store → compressed. Best for analytics / aggregations on large tables.

Ledger TableAppend-only or updatable. Cryptographic hash chain. Tamper-evident audit trail.

Graph TableNODE table (entities) + EDGE table (relationships). MATCH clause for pattern traversal. SHORTEST_PATH for path queries.

External TablePolyBase / Fabric. Reads from blob, ADLS, or other DB. No data moved. Push-down filtering.

⚙️ D1 · Programmability Objects

Object	Key Facts
View	WITH SCHEMABINDING for indexed views. Updatable if single-table, no aggregates. NOT WITH CHECK OPTION trap.
Inline TVF	Returns table, single SELECT. Optimised like a view — prefer over multi-statement TVF.
Multi-stmt TVF	Uses table variable. Optimizer can't see row count — cardinality issues. Avoid if performance matters.
Scalar UDF	Inhibits parallelism. Use inline TVF or computed column instead where possible.
Stored Proc	Input/output params. sp_executesql for parameterized dynamic SQL. Core RAG executor.
AFTER Trigger	Fires after DML. Uses inserted/deleted pseudo-tables. Good for audit, embedding update.
INSTEAD OF	Replaces DML. Needed for views. Risky — easy to break insert logic.
SEQUENCE	NEXT VALUE FOR seq. Can gap-skip on rollback. Useful across tables (unlike IDENTITY).

🔒 D2 · Security Quick Reference

Feature	Syntax / Config	Notes
RLS	CREATE SECURITY POLICY FILTER PREDICATE fn(col)	FILTER = SELECT. BLOCK = DML. Inline TVF predicate is mandatory.
DDM	MASKED WITH (FUNCTION='default()')	DEFAULT/EMAIL/PARTIAL/CUSTOM. UNMASK permission bypasses. Not encryption — app-layer only.
Always Encrypted	CMK (stored in Key Vault) → CEK → Column	Deterministic = can filter/join. Randomized = more secure, no search. Driver-side encryption.
Column Encryption	ENCRYPTBYKEY / DECRYPTBYKEY	Symmetric key. Server-side. Weaker than Always Encrypted.
RLS Predicate	CREATE FUNCTION … RETURNS TABLE WITH (NOEXPAND)	Must use SCHEMABINDING. One policy per table max.
Passwordless	Managed Identity → Azure AD token	No password in connection string. System-assigned vs user-assigned MI.
Auditing	Server-level > DB-level. Destinations: Blob, Log Analytics, Event Hub.	Action groups: SUCCESSFUL_DATABASE_AUTHENTICATION_GROUP etc.

⚡ D2 · Query Performance Toolkit

Execution Plan Red FlagsKey Lookup → add covering index. Sort operator → add ORDER BY to index. Clustered Index Scan on large table → check WHERE clause selectivity. Nested Loops on large sets → index or join hint.

Query Store WorkflowRegressed Queries → find plan change → Force Plan. Top Resource Consumers → spot recurring expensive queries. Query Performance Insight (Azure) → cloud UI overlay.

Top DMVssys.dm_exec_query_stats → CPU/reads per query
sys.dm_exec_requests → active sessions
sys.dm_os_wait_stats → bottleneck type
sys.dm_exec_query_plan → XML plan

Isolation Levels (low → high lock)READ UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE → SNAPSHOT (optimistic, no locks on reads)

Blocking / DeadlockDeadlock = circular wait. Graph in Event Session / Extended Events. Resolution: retry logic, lock timeout, index to reduce hold time. RCSI eliminates most read blocks.

🔄 D2 · SQL DB Projects & CI/CD

.sqlproj (SDK-style)MSBuild project. Declares SQL objects. No scripts — declarative model.

dacpacCompiled artifact. Contains schema snapshot. Input to SqlPackage.

publish profile (.publish.xml)Target server, auth, deployment options (IgnorePermissions etc.)

Schema DriftSqlPackage /Action:Report compares dacpac → target. Returns drift XML.

tSQLtUnit test framework. FakeTable, AssertEquals, ExpectException. Runs in SQL.

🔁 Pipeline Stages

PR → Branch Policy

→

Build dacpac

→

Unit Tests (tSQLt)

→

Schema Drift Report

→

Deploy to Staging

→

Approval Gate

→

Prod

Secrets ManagementAzure Key Vault → pipeline variable injection. Never hardcode in .sqlproj.

Code OwnersCODEOWNERS file. Protects main/prod branches. Enforces PR review by owner.

🌐 D2 · Data API Builder (DAB)

// dab-config.json skeleton
{
  "data-source": { "database-type": "mssql", "connection-string": "@env('CONN')" },
  "entities": {
    "Product": {
      "source": "dbo.Products",
      "permissions": [{"role":"authenticated","actions":["read"]}],
      "rest": { "enabled": true, "path": "/products" },
      "graphql": { "enabled": true, "type": "Product" }
    }
  }
}

Config Key	Purpose
rest.path	URL path override. Default = entity name.
graphql.type	GraphQL type name. Enables relationships.
permissions.role	anonymous \| authenticated \| custom role
source.type = stored-procedure	Expose SP as REST/GraphQL endpoint
cache.enabled = true	In-memory cache, TTL configurable
pagination.$limit	Max page size. Use $after cursor for keyset.

📡 D2 · Change Detection & Integration

Method	Granularity	Best For	Overhead
Change Tracking	Row (changed/deleted flag)	Sync, lightweight embedding triggers	Low
CDC	Row + before/after column values	Audit, ETL, embedding pipelines	Medium
CES	Real-time event stream	Streaming analytics, event-driven AI	Higher
Az Functions SQL Trigger	Row-level, event-driven	Serverless embedding refresh on change	Low
Logic Apps	Connector-based polling	Low-code integration pipelines	Medium

📊 Azure Monitor Targets

Application InsightsDAB & app-layer telemetry. Request traces, dependency calls, exceptions.

Log Analytics WorkspaceSQL audit logs, diagnostics. KQL queries across resources.

SQL InsightsDMV-based monitoring. CPU, blocking, wait stats time series.

🧮 D3 · Vector Data & Embeddings

-- Declare vector column (up to 1998 dims)
CREATE TABLE docs (
  id INT, content NVARCHAR(MAX),
  embedding VECTOR(1536)   -- text-embedding-3-small dims
);
-- Cosine similarity search
SELECT TOP 5 id, content,
  VECTOR_DISTANCE('cosine', embedding, @queryVec) AS dist
FROM docs ORDER BY dist;

Function	Purpose
VECTOR_DISTANCE(metric, v1, v2)	cosine \| dot_product \| euclidean
VECTOR_NORMALIZE(v)	Unit vector — required before dot product
VECTORPROPERTY(col, 'Dimensions')	Returns dimension count of a VECTOR column
VECTOR_SEARCH()	ANN top-K via index (DiskANN). Fast at scale.

ANN (Approximate)

Uses vector index (DiskANN)

Fast — O(log n)

May miss some matches

Production scale

ENN (Exact)

Full table scan

Slow — O(n)

100% recall

Small datasets / testing

🔍 D3 · Intelligent Search Strategies

Type	Mechanism	Use When
Full-Text	BM25 TF-IDF scoring. CONTAINS / FREETEXT / FREETEXTTABLE. Requires full-text index.	Keyword search, exact phrase matching, stemming
Semantic Vector	VECTOR_DISTANCE on embeddings. Meaning-based retrieval.	Conceptual match, multilingual, paraphrase detection
Hybrid Search	Full-text score + vector score merged via RRF.	Best of both — production RAG retrieval

🎯 Reciprocal Rank Fusion (RRF)

-- RRF score formula
score = 1 / (k + rank_in_list)   -- k=60 (constant to smooth outliers)
-- Combine: full_text_score + vector_score
-- ORDER BY combined_rrf_score DESC

k=60 prevents top-rank results from dominating. Higher k = more uniform distribution.

CONTAINS(col, 'term')Exact / inflectional / thesaurus / proximity

FREETEXT(col, 'phrase')NLP tokenization — more forgiving than CONTAINS

FREETEXTTABLEReturns RANK score — joinable for RRF hybrid

🤖 D3 · External Models & Embedding Maintenance

-- Create external model (Fabric SQL)
CREATE EXTERNAL MODEL myEmbedModel
  WITH (LOCATION = 'https://…/openai',
        CREDENTIAL = oaiCred,
        TASK = 'EMBEDDINGS',
        MODEL = 'text-embedding-3-small');

Maintenance Method	Trade-off
AFTER INSERT/UPDATE Trigger	Synchronous — blocks write. Simple. Risk: trigger failure = write failure.
Change Tracking + batch job	Asynchronous — no write overhead. Slight lag. Good for bulk tables.
CDC + Azure Function	Event-driven. Near real-time. Serverless scale. Recommended for prod.
CES	Streaming. Lowest latency. Most complex to manage.

🗂️ Chunking Strategies

Fixed-sizeSimple. Risk: breaks mid-sentence. Use overlap (10–15%).

SentenceNLP sentence boundaries. Better semantic units.

SemanticTopic-based splits. Best quality. Most complex.

Include which cols?Content cols + context cols (category, date). Exclude IDs, foreign keys.

⚡ D3 · RAG Pattern in SQL

User Query

→

Embed Query

→

Hybrid Search

→

RRF Rank

→

Build Prompt (FOR JSON)

→

sp_invoke LLM

→

OPENJSON Response

-- Core RAG stored procedure pattern
EXEC sp_invoke_external_rest_endpoint
  @url = N'https://<openai>/chat/completions',
  @method = N'POST',
  @credential = [myCredential],
  @payload = @jsonPayload,   -- built via FOR JSON PATH
  @response = @result OUTPUT;

-- Extract answer from response
SELECT JSON_VALUE(@result, '$.result.choices[0].message.content');

FOR JSON PATHConverts SQL rows → JSON context for LLM prompt

FOR JSON AUTOAuto-infers structure — less control, quick prototype

OPENJSON(@resp,'$.choices')Parse LLM response array back to rows

@credentialDATABASE SCOPED CREDENTIAL — stores API key securely

🎯 Scenario Speed-Decide · Exam Pattern Recognition

Security Scenarios

"Users see only their own rows"→ Row-Level Security (RLS) with FILTER predicate

"Hide last 4 digits of SSN"→ Dynamic Data Masking, PARTIAL function

"Encrypt SSN — app must query it"→ Always Encrypted DETERMINISTIC (allows equality filter)

"Encrypt but no search needed"→ Always Encrypted RANDOMIZED (stronger)

"No passwords in code"→ Managed Identity + Azure AD authentication

Performance Scenarios

"Slow aggregation on 500M rows"→ Add clustered columnstore index

"Query plan regressed after stats update"→ Query Store → Force previous plan

"Readers blocking writers"→ Enable RCSI (READ_COMMITTED_SNAPSHOT)

"Need point-in-time data audit"→ Temporal table + FOR SYSTEM_TIME AS OF

"Deadlock on two tables"→ Enforce consistent lock order + add index

AI / Search Scenarios

"Find semantically similar documents"→ VECTOR_DISTANCE cosine on embedding column

"Best search across keyword + meaning"→ Hybrid search + RRF ranking

"Generate answer from DB content"→ RAG: hybrid search → FOR JSON → sp_invoke → OPENJSON

"Embeddings must stay current"→ Azure Function SQL trigger (CDC-based, async)

"100% recall needed on small set"→ ENN (exact). Large set → ANN + DiskANN index

Deployment / Integration

"Expose table as REST + GraphQL"→ Data API Builder (DAB) with entity config

"DB schema out of sync with code"→ SQL DB Projects + SqlPackage schema drift report

"Real-time row change triggers AI update"→ Azure Functions SQL trigger binding

"Need Copilot to understand SQL conventions"→ .github/copilot-instructions.md instruction file

"GraphQL endpoint must use Managed Identity"→ DAB + Managed Identity auth, no API key