
Domain 3: Understand GitHub Copilot Data and Architecture (10–15%)

โ† Domain 2 ยท Next Domain โ†’

Exam Tip

This domain tests your understanding of what happens behind the scenes when you type code and Copilot responds. Focus on the data flow, where filtering happens, and the key limitations of LLMs.


Data Usage, Flow, and Sharing

What Data Copilot Uses

When you trigger a suggestion, Copilot sends a prompt to the model. The prompt includes:

  1. Your current file content: surrounding code, imports, comments
  2. Open file tabs: files currently open in the IDE contribute context
  3. Cursor position: Copilot sees what's before and after the cursor (prefix + suffix)
  4. File type: determines which language model and behavior to apply
  5. Recent edits: recently changed code has higher weight in context
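The pieces above can be sketched as a simple data structure. This is illustrative only: the field names are assumptions for teaching purposes, not the actual Copilot wire format.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the context an IDE extension might collect
# before building a prompt. Field names are illustrative, not the
# real Copilot protocol.
@dataclass
class SuggestionContext:
    prefix: str                                       # code before the cursor
    suffix: str                                       # code after the cursor
    language_id: str                                  # file type, e.g. "python"
    open_tabs: list = field(default_factory=list)     # snippets from other open files
    recent_edits: list = field(default_factory=list)  # recently changed regions

ctx = SuggestionContext(
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(1, 2))",
    language_id="python",
)
```

Note that the prefix/suffix split mirrors item 3 above: the model sees code on both sides of the cursor, not just what comes before it.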

What Microsoft Does (and Doesn't) Do With Your Data

| Claim | Reality |
| --- | --- |
| Microsoft trains public models on your code | ❌ False. Copilot Business/Enterprise: your code is NOT used for training |
| Copilot Individual may use prompts for product improvement | ✅ Depends on user settings; users can opt out |
| Data is transmitted to GitHub/Azure servers | ✅ Prompts are sent to the Copilot proxy for inference |
| Suggestions are logged | ✅ Accepted suggestions may be stored temporarily for filtering/safety |

Critical

For Business and Enterprise plans: Microsoft does NOT use your organization's code to train the foundational Copilot model. This is a common exam distractor.


Input Processing and Prompt Building

Prompt Construction

When you trigger Copilot, the system builds a prompt automatically:

  1. Gather context: Scans open files, cursor position, and recent edits
  2. Truncate: LLMs have context window limits, so the gathered context is trimmed to fit
  3. Assemble: The prompt is assembled as [context before cursor] + [cursor marker] + [context after cursor]
  4. Send: The assembled prompt is sent to the Copilot proxy
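Step 3 can be sketched in a few lines. This assumes a generic fill-in-the-middle layout; the `<CURSOR>` marker is an illustrative placeholder, and real model formats differ.

```python
def build_prompt(prefix: str, suffix: str, cursor_marker: str = "<CURSOR>") -> str:
    # Assemble [context before cursor] + [cursor marker] + [context after cursor].
    # The marker string is illustrative; actual fill-in-the-middle token
    # formats vary by model and are not public API.
    return f"{prefix}{cursor_marker}{suffix}"

prompt = build_prompt("def greet(name):\n    ", "\n\ngreet('world')")
```

The key idea to remember for the exam: the suffix (code after the cursor) is part of the prompt too, which is why Copilot can complete code in the middle of a file, not just at the end.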

Context Window Limits

  • Every LLM has a maximum token capacity (input + output)
  • When your codebase is large, Copilot prioritizes the most relevant context (current file, recently opened files)
  • Code far from the cursor or in closed tabs may not be included
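One way to picture this prioritization is a greedy selection over ranked chunks. This is a sketch under stated assumptions: the whitespace-based token count and the numeric priority scheme are stand-ins, not Copilot's actual ranking logic.

```python
def truncate_context(chunks, max_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily keep the highest-priority chunks that fit a token budget.

    chunks: list of (priority, text), where a lower number means more
    relevant (e.g. 0 = current file, 1 = recently opened tabs,
    2 = closed or far-away files). Whitespace splitting stands in for
    a real tokenizer.
    """
    kept, used = [], 0
    for _, text in sorted(chunks, key=lambda c: c[0]):
        cost = count_tokens(text)
        if used + cost <= max_tokens:
            kept.append(text)
            used += cost
    return kept

kept = truncate_context(
    [(2, "code from a closed tab far from the cursor"),
     (0, "current file near cursor"),
     (1, "recently opened tab")],
    max_tokens=8,
)
```

With a budget of 8 tokens, the current file and the recently opened tab fit, while the low-priority closed-tab chunk is dropped, which is exactly the behavior the bullets above describe.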

Proxy Filtering and Post-Processing

Copilot doesn't send your code directly to the LLM; it goes through a Copilot proxy managed by GitHub.

What the Proxy Does

  1. Pre-filtering: Strips sensitive information if content exclusions are configured
  2. Routes the request: Sends the sanitized prompt to the appropriate model
  3. Post-filtering: Receives the model's response and applies:
    • Duplication detection: Filters suggestions that match public repos
    • Security warnings: Flags patterns matching known vulnerabilities
    • Content safety: Filters harmful or inappropriate output
  4. Returns the suggestion to your IDE
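The post-filtering stage (step 3) can be sketched as a chain of checks. The function and its checks are hypothetical simplifications; GitHub's real duplication detection and safety filters are far more sophisticated than substring matching.

```python
def proxy_post_filter(suggestion, public_snippets, blocked_terms):
    # Hypothetical sketch of proxy post-filtering. Names and logic are
    # illustrative, not GitHub's implementation.
    if suggestion in public_snippets:
        return None  # duplication detection: suppress public-repo matches
    if any(term in suggestion for term in blocked_terms):
        return None  # content safety: suppress disallowed output
    return suggestion  # passed all filters; deliver to the IDE

ok = proxy_post_filter("return a + b", public_snippets=set(), blocked_terms=set())
```

The point the sketch illustrates: a suggestion the model generates is not guaranteed to reach your editor, because the proxy can suppress it after the fact.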

Code Suggestion Lifecycle

User types code in IDE
        ↓
IDE extension captures context (prefix, suffix, open files)
        ↓
Context is assembled into a prompt
        ↓
Prompt sent to Copilot proxy (HTTPS)
        ↓
Proxy pre-filters (content exclusions applied)
        ↓
Prompt sent to LLM (GPT-4 / Copilot model)
        ↓
LLM returns completion
        ↓
Proxy post-filters (duplication detection, security warnings)
        ↓
Suggestion displayed in IDE (ghost text)
        ↓
User accepts or dismisses
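The lifecycle above can be wired together as a chain of plain functions. Every stage here is a stand-in lambda, assumed for illustration, not a real Copilot component.

```python
def suggestion_lifecycle(prefix, suffix, model, pre_filter, post_filter):
    # Minimal sketch of the flow: capture -> assemble -> pre-filter ->
    # model -> post-filter. Each stage is injected as a plain function.
    prompt = pre_filter(prefix + "<CURSOR>" + suffix)
    completion = model(prompt)
    return post_filter(completion)

result = suggestion_lifecycle(
    "def square(x):\n    ",
    "",
    model=lambda p: "return x * x",   # stand-in for the LLM call
    pre_filter=lambda p: p,           # no content exclusions configured
    post_filter=lambda c: c,          # nothing flagged or filtered
)
```

Notice that the model never sees the raw editor state directly; everything passes through the pre-filter first, and everything it returns passes through the post-filter, mirroring the diagram.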

LLM Limitations (What Copilot Can't Do)

| Limitation | Implication |
| --- | --- |
| No real-time knowledge | Copilot doesn't know about APIs released after its training cutoff |
| Probabilistic, not deterministic | The same prompt can produce different suggestions; output is never guaranteed |
| No execution environment | Copilot doesn't run or test the code it generates (unless in Agent Mode with terminal access) |
| Context window cap | Very large files or projects may exceed context limits, reducing quality |
| Language support varies | Best performance in popular languages (Python, JS, TypeScript, Java, C#); weaker in niche languages |
| Hallucination risk | Especially for obscure APIs, uncommon libraries, or complex logic |

Domain 3 Quick Quiz

❓ Does Microsoft train Copilot models on Copilot Business customers' code?

💡 No. For Business and Enterprise plans, your code is not used to train the foundational model. Only Copilot Individual users may have their data used for product improvement, and they can opt out.

โ† Domain 2 ยท Next Domain โ†’
