Active Bug

Copilot Keeps Serving Premium Requests After Your Quota Runs Out

When your monthly premium request limit (300/300) is fully used up, Copilot should stop giving you premium-quality answers. Instead, it keeps going. The server even admits you're over the limit in its own response data, but serves you anyway.

Environment: VS Code 1.111.0, copilot-chat 0.39.0, Copilot Pro, Windows 11, 300 premium requests/month

What Happened

GitHub Copilot Pro gives you 300 premium requests per month. "Premium" means you get answers from powerful AI models like GPT-4o or Claude 3.5 Sonnet, instead of the smaller, free model (GPT-4o-mini).

Once you've used all 300, Copilot is supposed to either stop giving you premium answers and switch you to the free model automatically, or show you an error saying you've hit your limit.

The Bug

After using all 300 premium requests, Copilot kept serving premium-quality answers with no errors, no warnings, and no automatic downgrade. 26 additional premium requests were served past the limit before a single rejection appeared. The server's own response data included a field that literally says "this user is over the limit and did NOT opt in to pay extra" but it served the response anyway.

By the Numbers

These numbers come from a 70-minute monitoring session where every request Copilot made was recorded using a custom tool.

26: requests served past the limit
1: actual rejection (HTTP 402)
200: HTTP status returned (should be 402)
14: times the server said "over limit" but served anyway

The one rejection came from a different backend system than the one serving most requests. The main system that handles GPT-4o requests never rejected a single one, even though the quota was fully exhausted.

How to Reproduce It Yourself

What You Need Before Starting

This bug only shows up when several conditions are all true at the same time. If any one of them is missing, the bug won't appear and everything will look normal.

Requirement: Copilot Pro subscription (the individual plan, not a company/team plan)
Why it matters: The Pro plan has a hard cap of 300 premium requests. Company plans may work differently.
How to check: Go to github.com/settings/copilot and look for "Copilot Pro".

Requirement: All 300 requests used (quota at 100% or higher)
Why it matters: The bug is about what happens AFTER you run out. If you still have requests left, everything works normally.
How to check: Same page. The premium requests counter needs to show 300/300 or higher.

Requirement: Overage billing OFF (this is the default setting)
Why it matters: If you've opted in to pay for extra requests, being served past the limit is expected behavior, not a bug. The bug is that you get served past the limit without paying.
How to check: Same page. Look for "Overage" or "Additional usage" and confirm it's off, or that you never enabled it.

Requirement: VS Code Desktop app (not the browser version)
Why it matters: The desktop app uses a different internal networking system than the browser version. The bug was observed and tested on the desktop app only.
How to check: If you launched VS Code as a desktop application (not through a browser tab), you're good.

Requirement: A premium model selected in Copilot Chat
Why it matters: The free model (GPT-4o-mini) doesn't use premium quota at all, so testing with it won't show the bug.
How to check: Open Copilot Chat and check the model name at the bottom. It should say something like "GPT-4o" or "Claude 3.5 Sonnet".

Steps to Reproduce

  1. Confirm your quota is used up. Go to github.com/settings/copilot and verify it says 300/300 premium requests used (or more).
  2. Open VS Code Desktop. Just launch it normally. No special settings or developer modes needed.
  3. Open Copilot Chat. Click the Copilot icon in the sidebar, or press Ctrl+Shift+I.
  4. Make sure a premium model is selected. At the bottom of the chat panel, there's a model selector. It should show a premium model like "Claude 3.5 Sonnet" or "GPT-4o". If it says "GPT-4o-mini", switch it to a premium model.
  5. Send any message. Type something simple like What is 1+1? and press Enter.
  6. Look at the result. You'll get a full, detailed, premium-quality answer. No errors, no warnings, no mention that your quota is exhausted.
  7. Send several more messages. Try 5 to 10 more. All of them will be answered with the premium model at full quality.

Reproduction Rate

100%. Every single test with an exhausted quota reproduced this bug. It is not random or intermittent. The server consistently serves past the limit.

Which AI Model to Use for Testing

The Copilot system routes different AI models through different backend servers. These different servers enforce the quota limit differently, which affects how visible the bug is:

Claude 3.5 / 3.7 Sonnet (Anthropic servers): sometimes rejects (402), sometimes serves anyway. The inconsistency itself is evidence of the bug. Best choice for testing.
GPT-4o (OpenAI servers): never rejects. All 30+ requests past the limit were served with HTTP 200, zero rejections. Good for testing.
GPT-4o-mini (OpenAI servers): the free model. It doesn't use premium quota, so testing with it proves nothing. Useless for testing.

Best choice: Claude 3.5 Sonnet. With Claude, you'll occasionally see the server reject a request (error code 402) only to successfully serve the very next one seconds later. This inconsistency makes the bug extremely obvious: the server clearly knows you're over the limit (it just rejected you!) but then serves the next request like nothing happened.

What Should Happen vs. What Actually Happens

Expected behavior

When your 300 premium requests are used up, Copilot should either show you an error message, automatically switch to the free model (GPT-4o-mini), or return an HTTP 402 error code telling the system "payment required, quota exhausted."

What actually happens

Copilot serves your request at full premium quality, returns a success code (HTTP 200), and the server attaches metadata to its response that openly says "this user is 1.5 requests over the limit and did not enable overage billing" while still fulfilling the request.

Why This Happens

There are two independent failures happening at the same time. Each one alone might not be a problem, but together they create a situation where no part of the system actually stops you from using premium requests after your quota runs out.

Failure #1: The Copilot Extension Never Blocks You

The Copilot extension running on your computer is supposed to check your quota before sending each request. If you're out of premium requests, it should automatically downgrade you to the free model. Here's why it doesn't:

The "Two Buckets" Problem

Every time you send a message, the server sends back two separate pieces of quota information in its response. Think of them as two different "buckets":

Bucket 1: "Premium Interactions"

Limit: 300 requests
Remaining: 0%
Over the limit by: 1.5
Overage billing: OFF

Verdict: "You're out. Stop."

Bucket 2: "Chat"

Limit: Unlimited (-1)
Remaining: 100%
Over the limit by: 0
Overage billing: OFF

Verdict: "You're totally fine. Keep going."

Both of these arrive in the same response from the server. The Copilot extension reads them one at a time and saves each one into the same storage spot in memory. Whatever it reads last wins.

The Overwrite

Bucket 2 ("Chat", unlimited) is always processed second. So the extension's memory ends up storing the "unlimited" verdict. When it checks "is the user out of quota?", it reads the "Chat" bucket and concludes: "Nope, unlimited, let them through." The actual premium limit (Bucket 1) gets thrown away.

Here's the simplified version of what the code does:

// The extension saves each bucket into the same variable.
// Whatever is saved last is what the system reads when checking quota.

// First: the premium bucket arrives (limit = 300, used = 300)
quotaInfo = { bucket: "premium_interactions", limit: 300, used: 300 };
// ^ Correctly says: exhausted!

// Then: the chat bucket arrives (limit = -1, meaning unlimited)
quotaInfo = { bucket: "chat", limit: -1, used: 0 };
// ^ Overwrites! Now says: unlimited!

// Later, when deciding whether to block the user:
if (quotaInfo.limit === -1) return false;  // -1 means unlimited
// ^ This reads the "chat" bucket, sees unlimited, and says "not exhausted".
// The premium limit is completely invisible now.

Failure #2: The Server Doesn't Actually Block You Either

Even if the extension on your computer correctly detected the exhaustion (which it doesn't, due to Failure #1), the server itself should be the final line of defense. It's the server that decides whether to actually process your request or refuse it. But the server also fails to enforce the limit:

The server appears to use a "fail-open" design. This is a software pattern where, if the system isn't sure whether to block or allow, it chooses to allow. The idea is: "it's better to accidentally give a few free requests than to accidentally block a paying customer." This is reasonable in isolation, but combined with Failure #1, it means neither the local software nor the server ever says "no."
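
A fail-open quota gate can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GitHub's actual server code: the function names, field names, and the small grace window are all invented here.

```javascript
// Hypothetical sketch of a fail-open quota gate. All names are illustrative;
// GitHub's real server code is not public.
const GRACE_REQUESTS = 2; // assumed soft grace window past the limit

function shouldServe(quota) {
  // Fail-open: if quota data is missing or the billing check errored,
  // allow the request rather than risk blocking a paying customer.
  if (!quota || quota.error) return true;
  if (quota.limit === -1) return true;               // unlimited bucket
  return quota.used - quota.limit < GRACE_REQUESTS;  // soft enforcement
}

console.log(shouldServe(null));                        // true (fail-open)
console.log(shouldServe({ limit: -1, used: 0 }));      // true (unlimited)
console.log(shouldServe({ limit: 300, used: 301.5 })); // true (inside grace)
```

Note that the 26 served-past-limit requests in the capture imply that if a grace window like this exists, it is far larger than a couple of requests, or the over-limit branch never fires at all.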

How the Two Failures Work Together

Here's the full sequence of what happens when you send a message after your quota is used up:

1 You send a message in Copilot Chat
|
2 Extension checks: "Is this user out of quota?"
|
3 Extension reads the "chat" bucket (unlimited) instead of the "premium" bucket (exhausted)
|
4 Extension decides: "Not exhausted, send it through"
|
5 Request goes to server
|
6 Server checks distributed counter, sees user is over the limit
|
7 Server decides to serve anyway (fail-open / grace period)
|
8 Response includes both quota buckets. The overwrite repeats.
|
9 Cycle starts over for the next message. Nothing ever blocks the user.
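
The cycle above can be condensed into a tiny simulation. Everything here is hypothetical and illustrative, not Copilot's real code; only the bucket values mirror the captured data. The point is that with the overwrite plus fail-open, no branch ever blocks.

```javascript
// Hypothetical simulation of the full cycle (illustrative only).
const premium = { bucket: "premium_interactions", limit: 300, used: 301.5 };
const chat    = { bucket: "chat", limit: -1, used: 0 };

// Step 3: snapshots are applied in order into one variable; last write wins.
function applySnapshots(snapshots) {
  let quotaInfo = null;
  for (const snap of snapshots) quotaInfo = snap;
  return quotaInfo;
}

// Step 4: the client check reads whatever survived the overwrite.
const clientAllows = (q) => q.limit === -1 || q.used < q.limit;

// Step 7: the server serves anyway (fail-open / grace period).
const serverServes = () => true;

let blockedCount = 0;
for (let i = 0; i < 10; i++) {
  const quotaInfo = applySnapshots([premium, chat]); // chat overwrites premium
  if (!(clientAllows(quotaInfo) && serverServes())) blockedCount++;
}
console.log(blockedCount); // 0: ten messages, zero blocks
```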

Evidence from Live Capture

All of this data was captured by intercepting the actual network requests that Copilot makes behind the scenes. A custom monitoring tool logged every request and response during a 70-minute session.

The Server's Own Headers Prove Overserving

Every response from the server included hidden metadata. Here's what that metadata said, translated into plain language:

ent = 300: your plan allows 300 premium requests
rem = 0%: 0% remaining; completely used up
ov = 1.5: you are 1.5 requests OVER the limit
ovPerm = false: you did NOT opt in to pay for extra usage
HTTP status = 200: request served successfully (but shouldn't have been)

In plain English: the server is saying "This user used all 300 of their requests, they're 1.5 over the limit, they did not agree to pay extra, but here's their answer anyway."
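
Turning those four fields into a verdict is mechanical. Here is a hedged sketch: the report doesn't reproduce the real header name or wire encoding, so the "key=value;key=value" format below is an assumption for illustration.

```javascript
// Illustrative parser for the quota metadata fields listed above.
// Assumes a simple "key=value;key=value" encoding for the sketch.
function parseQuotaFields(raw) {
  const fields = {};
  for (const pair of raw.split(";")) {
    const [key, value] = pair.split("=");
    fields[key] = value;
  }
  return {
    entitlement: Number(fields.ent),       // 300 premium requests allowed
    remainingPct: parseFloat(fields.rem),  // "0%" parses to 0
    overage: Number(fields.ov),            // how far past the limit
    overagePermitted: fields.ovPerm === "true",
  };
}

// The decision the server should be making from its own data:
const shouldHaveBeen402 = (q) => q.overage > 0 && !q.overagePermitted;

const q = parseQuotaFields("ent=300;rem=0%;ov=1.5;ovPerm=false");
console.log(shouldHaveBeen402(q)); // true, yet the capture shows HTTP 200
```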

The 402 That Proves the Server Knows

Out of the entire 70-minute session, only one "402 Payment Required" rejection ever reached the user as a final answer. The rest were quietly retried:

The 402-Then-Retry Cycle

When using Claude as the AI model, this pattern repeated 14 times in 70 minutes:

1 Request sent to Claude's servers
|
2 Server returns 402 "Payment Required"
|
3 Copilot immediately refreshes your account info (confirms: yes, you're over the limit)
|
4 Copilot retries the request anyway
|
5 This time: HTTP 200 Success! Answer served at full quality.
|
6 Response includes both quota buckets. The "unlimited chat" bucket overwrites the exhausted one.

The server rejected the request, confirmed the user was over the limit, and then the retry went through anyway. This happened 14 times.
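
The client side of that cycle behaves as if 402 were a transient error rather than a terminal one. Here is a hedged model of the observed behavior; the function names are invented for illustration and this is not Copilot's actual code.

```javascript
// Model of the observed 402-then-retry cycle (illustrative, not real code).
function sendWithRetry(request, transport) {
  let response = transport(request);
  if (response.status === 402) {
    // Observed: Copilot refreshes account info (which confirms the user is
    // over the limit), then retries the request anyway.
    response = transport(request);
  }
  return response;
}

// Fake transport mirroring the capture: first attempt 402, retry 200.
let attempts = 0;
const transport = () => (++attempts === 1 ? { status: 402 } : { status: 200 });

console.log(sendWithRetry({ prompt: "What is 1+1?" }, transport).status); // 200
```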

Full Session Statistics

Session duration: ~70 minutes
Total data points captured: 168 snapshots (every response was logged)
Anomalies detected: 225+ (moments where something was clearly wrong)
Requests served past limit: 26 (server said "success" (200) when it should have said "blocked" (402))
Server rejections (402): 14 (all on Claude's backend, zero on GPT-4o's backend)
Bucket switches: ~60 (the quota info flipped between "exhausted" and "unlimited" on every response)

Why This Matters

Billing & Revenue

Users can consume premium requests past their paid limit without paying for overage. Depending on how GitHub bills for this internally, it's either lost revenue or untracked compute costs.

User Confusion

Users see their quota at "100% used" in the GitHub settings page, expect to be blocked or downgraded, but keep getting premium answers. The quota counter in the UI bounces between different values on every request because the two buckets keep overwriting each other. The behavior looks random and broken.

Broken Auto-Downgrade

The documented behavior (automatic switch to the free model when quota is exhausted) never triggers. The client-side check that's supposed to trigger the downgrade always reads the wrong bucket and concludes quota is not exhausted.

How to Fix It

Fix the Extension (Client-Side)

Instead of saving both quota buckets into a single variable (where the second one always overwrites the first), store each bucket separately. When checking whether the user is out of quota, always prefer reading the "premium_interactions" bucket over the "chat" bucket.

// Current code (broken):
// Each bucket overwrites the same variable. Last one wins = chat = unlimited.
this._quotaInfo = parseQuota(snapshotHeader);

// Fixed code:
// Store each bucket separately. Always prefer the premium bucket when checking.
this._quotaInfoByBucket[bucketType] = parseQuota(snapshotHeader);
this._quotaInfo = this._quotaInfoByBucket["premium_interactions"]
               ?? this._quotaInfoByBucket["chat"];

Fix the Server (Server-Side)

When the server's own response includes ovPerm=false (user did not opt into overage) and ov > 0 (user is past the limit), the response should be HTTP 402 (rejected), not HTTP 200 (success). The grace period / fail-open logic should not apply when the server's own data confirms the user is over the limit without permission to go over.
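
In code, the proposed check is small. The field names come from the captured metadata above; the handler wrapped around it is a hypothetical sketch, not GitHub's real server code.

```javascript
// Proposed enforcement sketch: no grace period when the server's own data
// says the user is over the limit without overage permission.
function enforceQuota(quota) {
  if (quota.ov > 0 && quota.ovPerm === false) {
    return { status: 402, body: "Payment Required: premium quota exhausted" };
  }
  return null; // within limits, or overage billing enabled: proceed
}

console.log(enforceQuota({ ov: 1.5, ovPerm: false }).status); // 402
console.log(enforceQuota({ ov: 1.5, ovPerm: true }));         // null (paying for overage)
console.log(enforceQuota({ ov: 0, ovPerm: false }));          // null (within quota)
```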

Full Reproduction with Monitoring Tool

The evidence above was captured using a custom VS Code extension called "Quota Sentinel" that watches all network traffic from Copilot and logs quota-related data. If you want to independently verify this bug and capture your own evidence, here's how to set up the monitoring tool.

Note

You do NOT need this monitoring tool to reproduce the bug. The bug happens whether you're watching or not. The tool just makes the internal behavior visible so you can see exactly what's going wrong. The simple test in the reproduction steps above is enough to confirm the bug.

Why a custom extension is needed

Copilot sends its requests using a special internal networking system called electron.net.fetch. This is part of the Electron framework that VS Code is built on. It completely bypasses the normal networking tools that developers typically use to inspect traffic (like the browser DevTools network tab or Node.js HTTP libraries).

The only way to see these requests is to create a VS Code extension that runs in the same process as Copilot and intercepts calls to this internal networking system before they go out.
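
The core of that interception, stripped down, is a wrapper around the fetch function. This is a simplified sketch: in the real extension the wrapped function is Electron's internal net.fetch inside the extension host, and the full ~660-line source (provided separately) also parses the quota headers out of each response. Here a generic fetch-bearing object stands in so the sketch is self-contained.

```javascript
// Minimal sketch of the interception idea (illustrative, not the real tool).
function instrumentFetch(target, logLine) {
  const originalFetch = target.fetch;
  target.fetch = async function (...args) {
    const response = await originalFetch.apply(this, args);
    // One JSONL record per request; the real tool also records quota headers.
    logLine(JSON.stringify({
      ts: Date.now(),
      url: String(args[0]),
      status: response.status,
    }));
    return response; // pass the response through unchanged
  };
}

// Example with a fake transport standing in for the real network layer:
const lines = [];
const fake = { fetch: async () => ({ status: 200 }) };
instrumentFetch(fake, (line) => lines.push(line));
fake.fetch("https://api.example.com/chat").then(() => console.log(lines.length)); // 1
```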

Step-by-step: Installing the monitoring extension

Step 1: Create the extension folder

VS Code loads extensions from a specific folder on your computer. The folder name MUST follow a specific format: publisher.extensionname-version. If the name is wrong, VS Code will silently ignore it with no error message.

On Windows (run in PowerShell):

mkdir "$env:USERPROFILE\.vscode\extensions\sufficientdaikon.copilot-quota-sentinel-4.0.0"

On macOS or Linux (run in Terminal):

mkdir -p ~/.vscode/extensions/sufficientdaikon.copilot-quota-sentinel-4.0.0/

Step 2: Create the configuration file

Create a file called package.json inside that folder. This tells VS Code basic information about the extension (its name, when to activate it, etc.).

The file contents are provided in the accompanying source files.

Step 3: Create the extension code

Create a file called extension.js in the same folder. This is the actual code that intercepts Copilot's network traffic and logs quota data. The full source (~660 lines) is provided separately.

Step 4: Reload VS Code

Press Ctrl+Shift+P (or Cmd+Shift+P on Mac), type "Developer: Reload Window", and press Enter. VS Code will restart and load the new extension.

Step 5: Verify it's working

  1. Go to View in the top menu, then click Output
  2. In the Output panel, click the dropdown on the right and select "Quota Sentinel"
  3. You should see a startup message saying the extension is active
  4. A log file will be created at copilot-quota-log.jsonl in your home folder

Common problems and how to fix them

What you see: Extension doesn't appear in VS Code's extension list.
What's wrong: The folder is named incorrectly.
Fix: The folder must be named exactly sufficientdaikon.copilot-quota-sentinel-4.0.0. No variations.

What you see: Extension appears but the log file stays empty.
What's wrong: The internal networking system isn't accessible. This can happen with Flatpak or Snap installs on Linux.
Fix: Use the official .deb, .rpm, or .tar.gz install of VS Code, not a sandboxed version.

What you see: Log only shows "started" events, nothing else.
What's wrong: Copilot isn't making any requests.
Fix: Open Copilot Chat and send a message. Make sure you're signed into GitHub.

What you see: Log shows data but no anomalies.
What's wrong: Your quota isn't actually at 100%.
Fix: Check github.com/settings/copilot. If you still have requests left, the bug won't be visible.

What you see: Copilot automatically switches you to the free model.
What's wrong: The quota check actually worked correctly (rare).
Fix: Reload VS Code and immediately start chatting. The first request after a reload is the most likely to trigger the bug because the extension has no cached quota data yet.

What you see: On NixOS, the extension loads but captures nothing.
What's wrong: VS Code's Electron binary can't access net.fetch without proper library support.
Fix: Make sure nix-ld is configured with the required system libraries for the proprietary VS Code binary.

What to look for in the captured data

After running a monitoring session, you can count specific markers in the log file to confirm the bug. Run these commands in your terminal (on macOS/Linux) or Git Bash / WSL (on Windows):

# How many times was a request served past the limit?
grep -c 'EXHAUSTED_BUT_WORKING' ~/copilot-quota-log.jsonl

# How many times did the quota bucket flip between "exhausted" and "unlimited"?
grep -c 'BUCKET_SWITCH' ~/copilot-quota-log.jsonl

# How many times did the server reject with 402?
grep -c 'HTTP_402' ~/copilot-quota-log.jsonl

If EXHAUSTED_BUT_WORKING is greater than 0 and BUCKET_SWITCH is greater than 0, the bug is confirmed. The HTTP_402 count might be 0 (especially if you used GPT-4o instead of Claude), and that's fine. The overserving is the bug, not the rejections.