Domain 3: Understand GitHub Copilot Data and Architecture (10–15%)
Exam Tip
This domain tests your understanding of what happens behind the scenes when you type code and Copilot responds. Focus on the data flow, where filtering happens, and the key limitations of LLMs.
Data Usage, Flow, and Sharing
What Data Copilot Uses
When you trigger a suggestion, Copilot sends a prompt to the model. The prompt includes:
- Your current file content: surrounding code, imports, comments
- Open file tabs: files currently open in the IDE contribute context
- Cursor position: Copilot sees what's before and after the cursor (prefix + suffix)
- File type: determines which language model and behavior to apply
- Recent edits: recently changed code has higher weight in context
What Microsoft Does (and Doesn't) Do With Your Data
| Claim | Reality |
|---|---|
| Microsoft trains public models on your code | ✗ False for Copilot Business/Enterprise: your code is NOT used for training |
| Copilot Individual may use prompts for product improvement | ✓ True, depending on user settings (opt-out available) |
| Data is transmitted to GitHub/Azure servers | ✓ True: prompts are sent to the Copilot proxy for inference |
| Suggestions are logged | ✓ Accepted suggestions may be stored temporarily for filtering/safety |
Critical
For Business and Enterprise plans: Microsoft does NOT use your organization's code to train the foundational Copilot model. This is a common exam distractor.
Input Processing and Prompt Building
Prompt Construction
When you trigger Copilot, the system builds a prompt automatically:
- Gather context: scans open files, cursor position, and recent edits
- Truncate: LLMs have context window limits, so context is truncated to fit
- Assemble: the prompt is put together as `[context before cursor] + [cursor marker] + [context after cursor]`
- Send: the assembled prompt is sent to the Copilot proxy
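The assembly step can be sketched as below. This is an illustrative simplification, not Copilot's actual implementation; the `build_prompt` name and the `<CURSOR>` marker are assumptions for the demo.

```python
def build_prompt(prefix: str, suffix: str, open_files: list[str]) -> str:
    """Sketch of fill-in-the-middle prompt assembly (illustrative only).

    The model is asked to complete at the cursor marker, using the code
    before and after the cursor plus context from other open tabs.
    """
    neighbor_context = "\n".join(open_files)  # context from open file tabs
    return f"{neighbor_context}\n{prefix}<CURSOR>{suffix}"

# Example: cursor sits inside a function body awaiting a completion
prompt = build_prompt("def add(a, b):\n    ", "\n", ["# utils.py helpers"])
```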
Context Window Limits
- Every LLM has a maximum token capacity (input + output)
- When your codebase is large, Copilot prioritizes the most relevant context (current file, recently opened files)
- Code far from the cursor or in closed tabs may not be included
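A minimal sketch of budget-based truncation, using a simple character budget in place of real tokenization; the 75/25 split favoring the prefix is an invented illustration, not Copilot's actual heuristic.

```python
def trim_to_budget(prefix: str, suffix: str, budget: int) -> tuple[str, str]:
    """Keep the text nearest the cursor when context exceeds the budget."""
    if len(prefix) + len(suffix) <= budget:
        return prefix, suffix          # everything fits; nothing dropped
    keep_prefix = int(budget * 0.75)   # favor code before the cursor
    keep_suffix = budget - keep_prefix
    # Trim the prefix from its far end and the suffix from its tail,
    # so the characters adjacent to the cursor survive
    return prefix[-keep_prefix:], suffix[:keep_suffix]
```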
Proxy Filtering and Post-Processing
Copilot doesn't send your code directly to the LLM; it goes through a Copilot proxy managed by GitHub.
What the Proxy Does
- Pre-filtering: Strips sensitive information if content exclusions are configured
- Routes the request: Sends the sanitized prompt to the appropriate model
- Post-filtering: Receives the model's response and applies:
- Duplication detection: Filters suggestions that match public repos
- Security warnings: Flags patterns matching known vulnerabilities
- Content safety: Filters harmful or inappropriate output
- Returns the suggestion to your IDE
Code Suggestion Lifecycle
User types code in IDE
    ↓
IDE extension captures context (prefix, suffix, open files)
    ↓
Context is assembled into a prompt
    ↓
Prompt sent to Copilot proxy (HTTPS)
    ↓
Proxy pre-filters (content exclusions applied)
    ↓
Prompt sent to LLM (GPT-4 / Copilot model)
    ↓
LLM returns completion
    ↓
Proxy post-filters (duplication detection, security warnings)
    ↓
Suggestion displayed in IDE (ghost text)
    ↓
User accepts or dismisses

LLM Limitations (What Copilot Can't Do)
| Limitation | Implication |
|---|---|
| No real-time knowledge | Copilot doesn't know about APIs released after its training cutoff |
| Probabilistic, not deterministic | Same prompt can produce different suggestions; output is never guaranteed |
| No execution environment | Copilot doesn't run or test the code it generates (unless in Agent Mode with terminal access) |
| Context window cap | Very large files or projects may exceed context limits, reducing quality |
| Language support varies | Best performance in popular languages (Python, JS, TypeScript, Java, C#); weaker in niche languages |
| Hallucination risk | Especially for obscure APIs, uncommon libraries, or complex logic |
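The "probabilistic, not deterministic" limitation can be illustrated with a toy next-token sampler: a model outputs a probability distribution over tokens, and sampling from it can pick a different token on each run. The vocabulary and probabilities here are invented for the demo.

```python
import random

def sample_next_token(probs: dict[str, float], rng: random.Random) -> str:
    """Sample one token from a next-token probability distribution."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Invented distribution: the same prompt yields these candidate tokens.
next_token_probs = {"return": 0.5, "print": 0.3, "raise": 0.2}

# Different random states can complete the same prompt differently,
# which is why identical prompts don't guarantee identical suggestions.
completions = {sample_next_token(next_token_probs, random.Random(seed))
               for seed in range(20)}
```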
Domain 3 Quick Quiz
Does Microsoft train Copilot models on Copilot Business customers' code?
No. For Business and Enterprise plans, your code is not used to train the foundational model. Only Individual users may have their data used for improvement (opt-out available).