TL;DR
I had a Foundry agent throwing a flat 400 invalid_payload on every call. The message was blunt: Not allowed when agent is specified. It turned out my code was still thinking in old Foundry terms, passing instructions, temperature and top_p on the request while also pointing at a stored agent. The new Foundry will not let you do both. The agent definition is now the single source of truth. I moved the sampling parameters onto the versioned agent, folded my per-call instructions into the input, pinned the SDK, and the error vanished. The fix was small. The architectural lesson was the interesting part, and it is a good one.
I run a system called Significant Terms. It scans insurance policy documents, pulls out the clauses that actually change cover, and turns a pile of PDFs into something a human can act on without reading every page. The heavy lifting is done by a set of Azure AI Foundry agents called from a Python Function App.
It worked. It was fast. Then one morning the AI calls stopped, and the logs were not subtle about it:
400 - invalid_payload
ValidationError: Not allowed when agent is specified. (param: instructions)
ValidationError: Not allowed when agent is specified. (param: temperature)
ValidationError: Not allowed when agent is specified. (param: top_p)
Three parameters, all rejected, all for the same reason. No retry was going to fix that. The request itself was wrong. What I had on my hands was not a bug in my code so much as my code arguing with a newer version of Foundry that had quietly changed the rules. So I went and read the rules properly. The differences between old and new Foundry are worth writing down, because if you built anything on the first generation of the Agents API, you are going to meet them.
What old Foundry looked like
The first generation of the Azure AI Foundry Agents service borrowed its shape from the OpenAI Assistants API. If you built on it, you will remember the rhythm. You created a thread. You added a message. You created a run. Then you sat in a polling loop asking are we done yet until the run reached a terminal state, and only then did you read the messages back off the thread.
It was a state machine, and it leaked into your code whether you wanted it to or not. You owned the polling. You owned the timeouts. You owned the retry logic for runs that came back queued or in_progress for longer than you liked. And critically for this story, every run let you override the assistant. You could pass instructions, temperature and top_p per run, on top of whatever the stored assistant already had. That flexibility felt great at the time. It also meant behaviour lived in two places at once: a little on the stored assistant, and a little scattered across every call site.
What new Foundry looks like
The new Foundry, built on azure-ai-projects 2.x and the Responses API, takes a different view. Two things change, and both of them are improvements.
First, the calls are synchronous. responses.create() returns the answer. There is no thread, no run, no poll. The state machine that used to live in my Function App is gone. I deleted a meaningful amount of orchestration code and did not miss any of it.
Second, agents are referenced, not overridden. Instead of carrying a behaviour payload on every request, you point at a stored, versioned agent definition with an agent_reference in the request body. The definition holds the model, the instructions, the sampling parameters and the tools. The request holds the input. That is the whole contract.
Here is the client setup, which is worth noting because the method names moved between versions:
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import PromptAgentDefinition
from azure.identity import DefaultAzureCredential
project = AIProjectClient(endpoint=endpoint, credential=DefaultAzureCredential())
# In 2.x this returns an authenticated OpenAI client pointed at
# {endpoint}/openai/v1. The old 1.x get_azure_openai_client() is gone.
openai_client = project.get_openai_client()
The error, decoded
With that context, the 400 stops being mysterious and starts being helpful. My _run_agent helper was a hybrid. It pointed at a stored agent with an agent_reference, which is the new way, but it also set instructions, temperature and top_p on the request, which is the old way. Foundry looked at that payload, saw an agent reference, and refused to also accept request-level overrides for the three things the agent already owns.
The official guidance is explicit about this. When you connect to a service managed Foundry agent, the agent’s own instructions are authoritative, and system or developer messages are stripped before the request is sent. Sampling parameters are expected to live on the agent definition. The endpoint I was hitting simply enforces that with a hard 400 instead of silently dropping the fields, which I would argue is the better behaviour. A silent drop would have left me debugging why my carefully tuned temperature was being ignored. A loud rejection told me exactly what to fix.
Old versus new, side by side
| Concern | Old Foundry (1.x Threads and Runs) | New Foundry (2.x Responses API) |
|---|---|---|
| Call model | Create thread, add message, create run, poll to completion, read messages | One synchronous responses.create() call |
| Where behaviour lives | Split between the stored assistant and per-run overrides | Entirely on the versioned agent definition |
| instructions, temperature, top_p | Allowed per run | Rejected when an agent is referenced |
| Per-call variation | Override fields on the run | Goes into the input, or into a dedicated agent version |
| Client method | get_azure_openai_client() |
get_openai_client() |
| State you own | Polling, timeouts, run status handling | None of it |
The fix, in three parts
1. Stop sending what the agent owns
The broken helper added three fields it was no longer allowed to send. The fix was to drop them from the request and fold the per-call instructions into the input instead. My callers each pass task specific instructions (heading extraction, term extraction, filtering, summarisation), and a single agent only carries one stored system prompt, so the per-call prompt belongs in the input rather than on a forbidden field.
# Before: hybrid payload that Foundry rejects
kwargs = {
"input": content,
"extra_body": {"agent_reference": {"name": agent_id, "type": "agent_reference"}},
}
if temperature is not None:
kwargs["temperature"] = temperature # rejected
if top_p is not None:
kwargs["top_p"] = top_p # rejected
if instructions:
kwargs["instructions"] = instructions # rejected
# After: input carries the task prompt, agent owns everything else
agent_input = f"{instructions}\n\n---\n\n{content}" if instructions else content
kwargs = {
"input": agent_input,
"extra_body": {"agent_reference": {"name": agent_id, "type": "agent_reference"}},
}
response = openai_client.responses.create(**kwargs)
2. Move the sampling parameters onto the agent
I was running my extraction calls at temperature=0.1 and top_p=0.9 for a reason. I wanted focused, low variance output. If I just deleted those fields and walked away, the agent would fall back to its default sampling and my output would drift. So the values had to land somewhere, and the right place is now the agent definition itself, written through create_version on cold start.
client.agents.create_version(
agent_name=agent_id,
definition=PromptAgentDefinition(
model=agent_model,
instructions=system_prompt,
temperature=float(os.environ.get("AGENT_TEMPERATURE", "0.1")),
top_p=float(os.environ.get("AGENT_TOP_P", "0.9")),
tools=[],
),
)
This is the part that turned a fix into an upgrade. The sampling configuration is no longer a magic number copied across half a dozen call sites. It is a property of a named, versioned agent, set once, overridable by environment variable, and visible in the Foundry portal. If I want a more creative agent later, that is a new version, not a code change scattered through the app.
3. Pin the SDK so this does not happen by surprise again
The whole episode was triggered by a version change underneath me. My requirements file used open floors, which is exactly how you end up on a new major without noticing. I switched the Foundry packages to compatible release pins, which allow bug fixes inside the major but block the next breaking jump.
azure-ai-projects~=2.2 # 2.x fixes, no surprise 3.0
openai~=1.66
azure-identity~=1.15
Why this is genuinely better, not just different
It would be easy to read all of this as Microsoft taking flexibility away. I do not see it that way, and if you are a CIO or an architect signing off an agent platform, neither should you. Three things stand out.
Behaviour now has one home. In the old model, the effective behaviour of an agent was the sum of its stored config plus whatever every call site decided to override. That is configuration drift waiting to happen, and it is very hard to audit. In the new model, the agent definition is the single source of truth. What you see in the portal is what runs. For anyone who has to answer the question what exactly was this AI configured to do when it produced that output, that is a real governance win.
Agents are versioned artefacts. create_version is not a throwaway detail. It means an agent is a first class, versioned object with a model, instructions, sampling and tools, the same way you would version any other piece of production configuration. You can promote a version, compare versions, and reason about change. That is the kind of control an enterprise actually needs before it puts an agent in front of regulated content, which in my case is exactly what insurance policy wording is.
Less code is less risk. Deleting the Threads and Runs state machine removed an entire class of failure from my Function App. No more runs stuck in progress, no more polling timeouts, no more reading messages back in the wrong order. The synchronous Responses API gives me the answer or an error, and nothing in between for me to babysit.
A short migration checklist
- Move to
azure-ai-projects2.x and swapget_azure_openai_client()forget_openai_client(). - Replace the thread, run and poll loop with a single
responses.create()call. - Reference agents through
agent_referencerather than passing a behaviour payload. - Remove
instructions,temperatureandtop_pfrom any request that names an agent. - Put per-call instruction differences into the input.
- Put model, sampling and tools onto a versioned agent definition through
create_version. - Pin the SDK with compatible release operators so the next major is a decision, not an incident.
Closing thought
The fix that cleared my 400 was a few lines. The reason I am glad it happened is that the error forced me to stop treating a Foundry agent like a bag of request parameters and start treating it like a governed, versioned artefact. That is the direction the platform is going, and it is the right direction. A stricter contract at the edge bought me a cleaner architecture in the middle.
If you are working through a similar migration and want to go deeper on the agent versioning strategy, the cold start sync pattern, or how I keep sampling deterministic across a fleet of extraction agents, leave a comment. I am happy to get into the detail.