GitHub quietly changed an important default-adjacent part of Copilot: for individual non-enterprise plans, Copilot auto model selection can now route users to evaluation models.
That sounds like a tiny changelog item. It is not tiny if you are the person trying to make AI-assisted development predictable, auditable, and boring enough to survive contact with production.
The short version: “auto” is no longer just choosing among known, generally available models. It may also choose models GitHub is still evaluating. GitHub says users can disable this in Copilot settings, and its model documentation adds the part operators should actually notice: evaluation models may show up under codenames, may be added or removed without notice, and GitHub’s own testing found they may perform worse than other models on security-related or other prompt categories.
That is a useful sentence. It is also the kind of sentence you want to read before your IDE cheerfully suggests a diff against auth middleware.
GitHub’s June 1 changelog says Copilot now offers evaluation models to individual non-enterprise users, and that those models may be served through Copilot’s automatic model selection. The linked docs describe evaluation models as coming from, or being fine-tuned by, providers including Microsoft, OpenAI, Anthropic, and Google, and as tested by GitHub/Microsoft before release.
Two operational details matter more than the model-provider trivia:
This does not mean GitHub is doing something malicious. It means “auto” is becoming a policy surface, not just a convenience toggle.
AI coding tools have been marketed as assistants. In practice, they are becoming routing layers: they decide which model sees which prompt, how much reasoning to spend, what context to include, whether to call tools, and how expensive the session becomes.
That routing layer now affects three things teams usually pretend are separate:
For individual developers, the practical answer may be simple: check the setting, decide whether you want evaluation models, and disable them if you are working on sensitive or security-heavy code.
For teams, the lesson is broader: do not let convenience defaults quietly become your AI governance model. Defaults are where policy goes to nap.
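One way to keep a default from becoming the policy is to write the policy down as something reviewable. The sketch below is purely illustrative: `ModelChoice`, `SENSITIVE_GLOBS`, and `allowed` are hypothetical names, not part of Copilot or any GitHub API, and it assumes your tooling can tell you whether a model is labeled as under evaluation and which file a suggestion touches.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical team policy: path patterns treated as security-sensitive.
# These globs are examples only; a real list would come from your own repo.
SENSITIVE_GLOBS = ("*/auth/*", "*/crypto/*", "*.pem")


@dataclass(frozen=True)
class ModelChoice:
    """A model picked by some routing layer (hypothetical shape)."""
    name: str
    evaluation: bool  # True if the provider labels the model as under evaluation


def is_sensitive(path: str) -> bool:
    """Return True if the path matches any sensitive pattern."""
    return any(fnmatchcase(path, pattern) for pattern in SENSITIVE_GLOBS)


def allowed(model: ModelChoice, path: str, allow_evaluation: bool = False) -> bool:
    """Decide whether this model may be used for a change touching `path`.

    Generally available models are always allowed. Evaluation models are
    allowed only if the team opted in, and never for sensitive paths.
    """
    if not model.evaluation:
        return True
    if not allow_evaluation:
        return False
    return not is_sensitive(path)
```

The point is not this particular rule. It is that the decision lives in a file someone can review, rather than in whichever toggle happened to be on.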
If you manage developers using Copilot or similar tools, treat this as a prompt to tighten the boring parts:
None of this requires panic. It requires treating AI assistants like part of the engineering system instead of a magical autocomplete fern in the corner.
This change currently applies to individual non-enterprise Copilot plans, according to GitHub’s changelog. Enterprises have more model-policy machinery, and GitHub has recently been adding controls around model targeting and Copilot metrics. That is good, but it also reinforces the point: AI tooling now needs admin policy, observability, and review paths.
Also, “evaluation model” does not automatically mean “bad model.” It means the model is under evaluation, may be less predictable, and may change faster than your team’s assumptions. That distinction matters.
GitHub’s update is a small signpost for a larger shift: AI coding assistants are becoming managed platforms with routing, budgets, policies, and experimental lanes. Builders should use them, but operators should govern them.
Autocomplete grew a control plane. Try not to act surprised when it starts needing controls.