When your monthly premium request limit (300/300) is fully used up, Copilot should stop giving you premium-quality answers. Instead, it keeps going. The server even admits you're over the limit in its own response data, but serves you anyway.
GitHub Copilot Pro gives you 300 premium requests per month. "Premium" means you get answers from powerful AI models like GPT-4o or Claude 3.5 Sonnet, instead of the smaller, free model (GPT-4o-mini).
Once you've used all 300, Copilot is supposed to either stop giving you premium answers and switch you to the free model automatically, or show you an error saying you've hit your limit.
After using all 300 premium requests, Copilot kept serving premium-quality answers with no errors, no warnings, and no automatic downgrade. 26 additional premium requests were served past the limit before a single rejection appeared. The server's own response data included a field that literally says "this user is over the limit and did NOT opt in to pay extra" but it served the response anyway.
These numbers come from a 70-minute monitoring session where every request Copilot made was recorded using a custom tool.
The one rejection came from a different backend system than the one serving most requests. The main system that handles GPT-4o requests never rejected a single one, even though the quota was fully exhausted.
This bug only shows up when a specific set of conditions is all true at the same time. If any single one is missing, the bug won't appear and everything will look normal.
| Requirement | Why it matters | How to check |
|---|---|---|
| Copilot Pro subscription (the individual plan, not a company/team plan) | The Pro plan has a hard cap of 300 premium requests. Company plans may work differently. | Go to github.com/settings/copilot and look for "Copilot Pro" |
| You've used all 300 requests (quota at 100% or higher) | The bug is about what happens AFTER you run out. If you still have requests left, everything works normally. | Same page. Look at the premium requests counter. It needs to show 300/300 or higher. |
| Overage billing is OFF (this is the default setting) | If you've opted in to pay for extra requests, then getting served past the limit is expected behavior, not a bug. The bug is that you get served past the limit without paying. | Same page. Look for "Overage" or "Additional usage" and confirm it's off or that you never enabled it. |
| VS Code Desktop app (not the browser version) | The desktop app uses a different internal networking system than the browser version. The bug was observed and tested on the desktop app only. | If you launched VS Code as a desktop application (not through a browser tab), you're good. |
| A premium model is selected in Copilot Chat | The free model (GPT-4o-mini) doesn't use premium quota at all, so testing with it won't show the bug. | Open Copilot Chat and check the model name at the bottom. It should say something like "GPT-4o" or "Claude 3.5 Sonnet". |
Open Copilot Chat with Ctrl+Shift+I.
Type a trivial prompt such as "What is 1+1?" and press Enter.
Reproducibility: 100%. Every single test with an exhausted quota reproduced this bug. It is not random or intermittent. The server consistently serves past the limit.
The Copilot system routes different AI models through different backend servers. These different servers enforce the quota limit differently, which affects how visible the bug is:
| Model | Backend | What happens past the limit | Best for testing? |
|---|---|---|---|
| Claude 3.5 / 3.7 Sonnet | Anthropic servers | Sometimes rejects (402), sometimes serves anyway. The inconsistency itself is evidence of the bug. | Best |
| GPT-4o | OpenAI servers | Never rejects. All 30+ requests past the limit were served with HTTP 200. Zero rejections. | Good |
| GPT-4o-mini | OpenAI servers | This is the free model. It doesn't use premium quota, so testing with it proves nothing. | Useless |
Best choice: Claude 3.5 Sonnet. With Claude, you'll occasionally see the server reject a request (error code 402) only to successfully serve the very next one seconds later. This inconsistency makes the bug extremely obvious: the server clearly knows you're over the limit (it just rejected you!) but then serves the next request like nothing happened.
When your 300 premium requests are used up, Copilot should either show you an error message, automatically switch to the free model (GPT-4o-mini), or return an HTTP 402 error code telling the system "payment required, quota exhausted."
Copilot serves your request at full premium quality, returns a success code (HTTP 200), and the server attaches metadata to its response that openly says "this user is 1.5 requests over the limit and did not enable overage billing" while still fulfilling the request.
There are two independent failures happening at the same time. Each one alone might not be a problem, but together they create a situation where no part of the system actually stops you from using premium requests after your quota runs out.
The Copilot extension running on your computer is supposed to check your quota before sending each request. If you're out of premium requests, it should automatically downgrade you to the free model. Here's why it doesn't:
Every time you send a message, the server sends back two separate pieces of quota information in its response. Think of them as two different "buckets":
Bucket 1, "premium_interactions" (the real premium quota):

- Limit: 300 requests
- Remaining: 0%
- Over the limit by: 1.5
- Overage billing: OFF
- Verdict: "You're out. Stop."

Bucket 2, "chat":

- Limit: Unlimited (-1)
- Remaining: 100%
- Over the limit by: 0
- Overage billing: OFF
- Verdict: "You're totally fine. Keep going."
Both of these arrive in the same response from the server. The Copilot extension reads them one at a time and saves each one into the same storage spot in memory. Whatever it reads last wins.
Bucket 2 ("Chat", unlimited) is always processed second. So the extension's memory ends up storing the "unlimited" verdict. When it checks "is the user out of quota?", it reads the "Chat" bucket and concludes: "Nope, unlimited, let them through." The actual premium limit (Bucket 1) gets thrown away.
Here's the simplified version of what the code does:
// The extension saves each bucket into the same variable.
// Whatever is saved last is what the system reads when checking quota.

// First: the premium bucket arrives (limit = 300, used = 300)
quotaInfo = premiumBucket; // correctly says: exhausted!

// Then: the chat bucket arrives (limit = -1, meaning unlimited)
quotaInfo = chatBucket; // overwrites! Now says: unlimited!

// Later, when deciding whether to block the user:
if (quotaInfo.limit === -1) return false; // -1 means unlimited
// ^ This reads the chat bucket, sees "unlimited", and says "not exhausted".
//   The premium limit is completely invisible now.
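The same last-write-wins behavior can be shown as a self-contained, runnable sketch. The bucket shapes and variable names here are illustrative assumptions, not the extension's real data structures:

```javascript
// Two quota snapshots from one response (shapes are assumed for illustration).
const premiumBucket = { bucket: "premium_interactions", limit: 300, remainingPct: 0 };
const chatBucket = { bucket: "chat", limit: -1, remainingPct: 100 };

// A single storage slot, overwritten by each snapshot as it is processed.
let quotaInfo;
for (const snapshot of [premiumBucket, chatBucket]) {
  quotaInfo = snapshot; // whatever arrives last wins
}

// The exhaustion check only ever sees the final value: the unlimited chat bucket.
const exhausted = quotaInfo.limit !== -1 && quotaInfo.remainingPct <= 0;
console.log(quotaInfo.bucket, exhausted); // prints "chat false"
```

Reversing the order of the two snapshots would make the check work, which is exactly why the behavior looks so fragile from the outside.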
Even if the extension on your computer correctly detected the exhaustion (which it doesn't, due to Failure #1), the server itself should be the final line of defense. It's the server that decides whether to actually process your request or refuse it. But the server also fails to enforce the limit:
The server appears to use a "fail-open" design. This is a software pattern where, if the system isn't sure whether to block or allow, it chooses to allow. The idea is: "it's better to accidentally give a few free requests than to accidentally block a paying customer." This is reasonable in isolation, but combined with Failure #1, it means neither the local software nor the server ever says "no."
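In miniature, fail-open enforcement looks something like the following sketch. The function and names are hypothetical, used only to illustrate the pattern:

```javascript
// Fail-open: when the enforcement check is missing or throws, serve anyway.
// A fail-closed design would return false (block) in those uncertain cases.
function shouldServe(quotaCheck) {
  if (quotaCheck == null) return true; // no quota data -> allow (fail-open)
  try {
    return quotaCheck(); // true = within quota, false = block
  } catch (err) {
    return true; // check failed -> allow (fail-open)
  }
}

console.log(shouldServe(null)); // prints true: no data, served anyway
console.log(shouldServe(() => false)); // prints false: only an explicit "over limit" blocks
```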
Here's the full sequence of what happens when you send a message after your quota is used up:
All of this data was captured by intercepting the actual network requests that Copilot makes behind the scenes. A custom monitoring tool logged every request and response during a 70-minute session.
Every response from the server included hidden metadata. Here's what that metadata said, translated into plain language:
| Field | Value | What it means |
|---|---|---|
| ent | 300 | Your plan allows 300 premium requests |
| rem | 0% | 0% remaining. Completely used up. |
| ov | 1.5 | You are 1.5 requests OVER the limit |
| ovPerm | false | You did NOT opt in to pay for extra usage |
| HTTP Status | 200 | Request served successfully (but shouldn't have been) |
In plain English: the server is saying "This user used all 300 of their requests, they're 1.5 over the limit, they did not agree to pay extra, but here's their answer anyway."
The server did return "402 Payment Required" errors during the 70-minute session, but the rejections were neither consistent nor final. When Claude was the selected model, the same pattern repeated 14 times: the server rejected a request with a 402, confirming the user was over the limit, and then the retry seconds later was served successfully anyway.
| Metric | Count | What it means |
|---|---|---|
| Session duration | ~70 minutes | |
| Total data points captured | 168 snapshots | Every response was logged |
| Anomalies detected | 225+ | Moments where something was clearly wrong |
| Requests served past limit | 26 | Server said "success" (200) when it should have said "blocked" (402) |
| Server rejections (402) | 14 | All on Claude's backend, zero on GPT-4o's backend |
| Bucket switches | ~60 | The quota info flipped between "exhausted" and "unlimited" on every response |
Users can consume premium requests past their paid limit without paying for overage. Depending on how GitHub bills for this internally, it's either lost revenue or untracked compute costs.
Users see their quota at "100% used" in the GitHub settings page, expect to be blocked or downgraded, but keep getting premium answers. The quota counter in the UI bounces between different values on every request because the two buckets keep overwriting each other. The behavior looks random and broken.
The documented behavior (automatic switch to the free model when quota is exhausted) never triggers. The client-side check that's supposed to trigger the downgrade always reads the wrong bucket and concludes quota is not exhausted.
Instead of saving both quota buckets into a single variable (where the second one always overwrites the first), store each bucket separately. When checking whether the user is out of quota, always prefer reading the "premium_interactions" bucket over the "chat" bucket.
// Current code (broken):
// Each bucket overwrites the same variable. Last one wins = chat = unlimited.
this._quotaInfo = parseQuota(snapshotHeader);
// Fixed code:
// Store each bucket separately. Always prefer the premium bucket when checking.
this._quotaInfoByBucket[bucketType] = parseQuota(snapshotHeader);
this._quotaInfo = this._quotaInfoByBucket["premium_interactions"]
?? this._quotaInfoByBucket["chat"];
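A runnable sketch of that per-bucket fix, with hypothetical names, shows the premium limit surviving the second snapshot:

```javascript
// Store each snapshot under its bucket type instead of one shared slot.
const quotaByBucket = {};
function onSnapshot(bucketType, quota) {
  quotaByBucket[bucketType] = quota;
}

// Prefer the premium bucket when deciding whether the user is out of quota.
function effectiveQuota() {
  return quotaByBucket["premium_interactions"] ?? quotaByBucket["chat"];
}

onSnapshot("premium_interactions", { limit: 300, remainingPct: 0 });
onSnapshot("chat", { limit: -1, remainingPct: 100 });
console.log(effectiveQuota().limit); // prints 300: exhaustion is now detectable
```

With this change, the arrival order of the two buckets no longer matters, which removes the fragility entirely.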
When the server's own response includes ovPerm=false (user did not opt into overage)
and ov > 0 (user is past the limit), the response should be HTTP 402 (rejected),
not HTTP 200 (success). The grace period / fail-open logic should not apply when the server's
own data confirms the user is over the limit without permission to go over.
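The recommended server-side rule fits in a few lines. This is a sketch of the decision only; the field names mirror the captured response metadata, and the function itself is hypothetical:

```javascript
// Decide the status the server should return, given its own quota metadata.
// ov = requests over the limit; ovPerm = user opted into overage billing.
function expectedStatus({ ov, ovPerm }) {
  if (ov > 0 && !ovPerm) return 402; // over limit without permission: reject
  return 200; // within quota, or overage explicitly paid for: serve
}

// The exact state captured in the evidence above:
console.log(expectedStatus({ ov: 1.5, ovPerm: false })); // prints 402 (observed: 200)
```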
The evidence above was captured using a custom VS Code extension called "Quota Sentinel" that watches all network traffic from Copilot and logs quota-related data. If you want to independently verify this bug and capture your own evidence, here's how to set up the monitoring tool.
You do NOT need this monitoring tool to reproduce the bug. The bug happens whether you're watching or not. The tool just makes the internal behavior visible so you can see exactly what's going wrong. The simple test in Section 03 above is enough to confirm the bug.
Copilot sends its requests using a special internal networking system called electron.net.fetch.
This is part of the Electron framework that VS Code is built on. It completely bypasses the normal
networking tools that developers typically use to inspect traffic (like the browser DevTools network tab
or Node.js HTTP libraries).
The only way to see these requests is to create a VS Code extension that runs in the same process as Copilot and intercepts calls to this internal networking system before they go out.
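The interception itself amounts to wrapping the fetch function before Copilot calls it. The sketch below demonstrates the wrapping technique against a stand-in target; whether an extension can actually reach electron.net.fetch this way depends on the VS Code build, so treat the names here as illustrative:

```javascript
// Replace target[key] with a wrapper that logs every call, then delegates.
function wrapFetch(target, key, log) {
  const original = target[key];
  target[key] = async function (...args) {
    const response = await original.apply(this, args);
    log.push({ ts: Date.now(), url: String(args[0]), status: response.status });
    return response; // the caller sees the unmodified response
  };
}

// Demonstration with a fake fetch standing in for electron.net.fetch:
const log = [];
const netLike = { fetch: async () => ({ status: 200 }) };
wrapFetch(netLike, "fetch", log);
netLike.fetch("https://example.invalid/chat").then(() => {
  console.log(log[0].status); // prints 200
});
```

Because the wrapper returns the original response untouched, Copilot behaves identically whether or not the monitor is installed, which matches the note above that the tool only observes the bug rather than causing it.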
VS Code loads extensions from a specific folder on your computer. The folder name MUST follow
a specific format: publisher.extensionname-version. If the name is wrong, VS Code
will silently ignore it with no error message.
On Windows (run in PowerShell):
mkdir "$env:USERPROFILE\.vscode\extensions\sufficientdaikon.copilot-quota-sentinel-4.0.0"
On macOS or Linux (run in Terminal):
mkdir -p ~/.vscode/extensions/sufficientdaikon.copilot-quota-sentinel-4.0.0/
Create a file called package.json inside that folder. This tells VS Code basic
information about the extension (its name, when to activate it, etc.).
The file contents are provided in the accompanying source files.
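The real manifest ships with the report's source files; for orientation only, a minimal manifest of roughly this shape is what VS Code needs to activate an extension at startup. The name, publisher, and version must match the folder name from the previous step; the remaining values are illustrative:

```json
{
  "name": "copilot-quota-sentinel",
  "publisher": "sufficientdaikon",
  "version": "4.0.0",
  "engines": { "vscode": "^1.80.0" },
  "main": "./extension.js",
  "activationEvents": ["onStartupFinished"]
}
```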
Create a file called extension.js in the same folder. This is the actual code that
intercepts Copilot's network traffic and logs quota data. The full source (~660 lines) is
provided separately.
Press Ctrl+Shift+P (or Cmd+Shift+P on Mac), type
"Developer: Reload Window", and press Enter. VS Code will restart and
load the new extension.
The extension writes its log to copilot-quota-log.jsonl in your home folder.

| What you see | What's wrong | How to fix it |
|---|---|---|
| Extension doesn't appear in VS Code's extension list | The folder is named incorrectly | The folder must be named exactly sufficientdaikon.copilot-quota-sentinel-4.0.0. No variations. |
| Extension appears but the log file stays empty | The internal networking system isn't accessible. This can happen with Flatpak or Snap installs on Linux. | Use the official .deb, .rpm, or .tar.gz install of VS Code, not a sandboxed version. |
| Log only shows "started" events, nothing else | Copilot isn't making any requests | Open Copilot Chat and send a message. Make sure you're signed into GitHub. |
| Log shows data but no anomalies | Your quota isn't actually at 100% | Check github.com/settings/copilot. If you still have requests left, the bug won't be visible. |
| Copilot automatically switches you to the free model | The quota check actually worked correctly (rare) | Reload VS Code and immediately start chatting. The first request after a reload is the most likely to trigger the bug because the extension has no cached quota data yet. |
| On NixOS: extension loads but captures nothing | VS Code's Electron binary can't access net.fetch without proper library support | Make sure nix-ld is configured with the required system libraries for the proprietary VS Code binary. |
After running a monitoring session, you can count specific markers in the log file to confirm the bug. Run these commands in your terminal (on macOS/Linux) or Git Bash / WSL (on Windows):
# How many times was a request served past the limit?
grep -c 'EXHAUSTED_BUT_WORKING' ~/copilot-quota-log.jsonl
# How many times did the quota bucket flip between "exhausted" and "unlimited"?
grep -c 'BUCKET_SWITCH' ~/copilot-quota-log.jsonl
# How many times did the server reject with 402?
grep -c 'HTTP_402' ~/copilot-quota-log.jsonl
If EXHAUSTED_BUT_WORKING is greater than 0 and BUCKET_SWITCH is
greater than 0, the bug is confirmed. The HTTP_402 count might be 0 (especially
if you used GPT-4o instead of Claude), and that's fine. The overserving is the bug, not the
rejections.