Domain 3: Understand GitHub Copilot Data and Architecture (10–15%)
Exam Tip
This domain tests your understanding of what happens behind the scenes when you type code and Copilot responds. Focus on the data flow, where filtering happens, and the key limitations of LLMs.
Data Usage, Flow, and Sharing
What Data Copilot Uses
When you trigger a suggestion, Copilot sends a prompt to the model. The prompt includes:
- Your current file content: surrounding code, imports, comments
- Open file tabs: files currently open in the IDE contribute context
- Cursor position: Copilot sees what's before and after the cursor (prefix + suffix)
- File type: determines which language model and behavior to apply
- Recent edits: recently changed code has higher weight in context
What Microsoft Does (and Doesn't) Do With Your Data
| Claim | Reality |
|---|---|
| Microsoft trains public models on your code | ✗ False for Copilot Business/Enterprise: your code is NOT used for training |
| Copilot Individual may use prompts for product improvement | ✓ True, depending on user settings (opt-out available) |
| Data is transmitted to GitHub/Azure servers | ✓ True: prompts are sent to the Copilot proxy for inference |
| Suggestions are logged | ✓ Accepted suggestions may be stored temporarily for filtering/safety |
Critical
For Business and Enterprise plans: Microsoft does NOT use your organization's code to train the foundational Copilot model. This is a common exam distractor.
Input Processing and Prompt Building
Prompt Construction
When you trigger Copilot, the system builds a prompt automatically:
- Gather context: scans open files, cursor position, and recent edits
- Truncate: LLMs have context window limits, so context is truncated to fit
- Assemble: the prompt is put together as `[context before cursor] + [cursor marker] + [context after cursor]`
- Send: the assembled prompt is sent to the Copilot proxy
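The assembly step can be sketched as below. This is an illustrative simplification, not Copilot's actual implementation; the `build_prompt` name and the `<CURSOR>` marker are assumptions for the demo.

```python
def build_prompt(prefix: str, suffix: str, open_files: list[str]) -> str:
    """Sketch of fill-in-the-middle prompt assembly (illustrative only).

    The model is asked to complete at the cursor marker, using the code
    before and after the cursor plus context from other open tabs.
    """
    neighbor_context = "\n".join(open_files)  # context from open file tabs
    return f"{neighbor_context}\n{prefix}<CURSOR>{suffix}"

# Example: cursor sits inside a function body awaiting a completion
prompt = build_prompt("def add(a, b):\n    ", "\n", ["# utils.py helpers"])
```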
Context Window Limits
- Every LLM has a maximum token capacity (input + output)
- When your codebase is large, Copilot prioritizes the most relevant context (current file, recently opened files)
- Code far from the cursor or in closed tabs may not be included
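A minimal sketch of budget-based truncation, using a simple character budget in place of real tokenization; the 75/25 split favoring the prefix is an invented illustration, not Copilot's actual heuristic.

```python
def trim_to_budget(prefix: str, suffix: str, budget: int) -> tuple[str, str]:
    """Keep the text nearest the cursor when context exceeds the budget."""
    if len(prefix) + len(suffix) <= budget:
        return prefix, suffix          # everything fits; nothing dropped
    keep_prefix = int(budget * 0.75)   # favor code before the cursor
    keep_suffix = budget - keep_prefix
    # Trim the prefix from its far end and the suffix from its tail,
    # so the characters adjacent to the cursor survive
    return prefix[-keep_prefix:], suffix[:keep_suffix]
```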
Proxy Filtering and Post-Processing
Copilot doesn't send your code directly to the LLM; it goes through a Copilot proxy managed by GitHub.
What the Proxy Does
- Pre-filtering: Strips sensitive information if content exclusions are configured
- Routes the request: Sends the sanitized prompt to the appropriate model
- Post-filtering: Receives the model's response and applies:
- Duplication detection: Filters suggestions that match public repos
- Security warnings: Flags patterns matching known vulnerabilities
- Content safety: Filters harmful or inappropriate output
- Returns the suggestion to your IDE
Code Suggestion Lifecycle
User types code in IDE
    ↓
IDE extension captures context (prefix, suffix, open files)
    ↓
Context is assembled into a prompt
    ↓
Prompt sent to Copilot proxy (HTTPS)
    ↓
Proxy pre-filters (content exclusions applied)
    ↓
Prompt sent to LLM (GPT-4 / Copilot model)
    ↓
LLM returns completion
    ↓
Proxy post-filters (duplication detection, security warnings)
    ↓
Suggestion displayed in IDE (ghost text)
    ↓
User accepts or dismisses

LLM Limitations (What Copilot Can't Do)
| Limitation | Implication |
|---|---|
| No real-time knowledge | Copilot doesn't know about APIs released after its training cutoff |
| Probabilistic, not deterministic | Same prompt can produce different suggestions; output is never guaranteed |
| No execution environment | Copilot doesn't run or test the code it generates (unless in Agent Mode with terminal access) |
| Context window cap | Very large files or projects may exceed context limits, reducing quality |
| Language support varies | Best performance in popular languages (Python, JS, TypeScript, Java, C#); weaker in niche languages |
| Hallucination risk | Especially for obscure APIs, uncommon libraries, or complex logic |
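The "probabilistic, not deterministic" limitation can be illustrated with a toy next-token sampler: a model outputs a probability distribution over tokens, and sampling from it can pick a different token on each run. The vocabulary and probabilities here are invented for the demo.

```python
import random

def sample_next_token(probs: dict[str, float], rng: random.Random) -> str:
    """Sample one token from a next-token probability distribution."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Invented distribution: the same prompt yields these candidate tokens.
next_token_probs = {"return": 0.5, "print": 0.3, "raise": 0.2}

# Different random states can complete the same prompt differently,
# which is why identical prompts don't guarantee identical suggestions.
completions = {sample_next_token(next_token_probs, random.Random(seed))
               for seed in range(20)}
```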
Domain 3 Quick Quiz
Does Microsoft train Copilot models on Copilot Business customers' code?
No. For Business and Enterprise plans, your code is not used to train the foundational model. Only Individual users may have their data used for improvement (opt-out available).