Azure AI · Axis 3 — Build tooling & agents

Managed agents vs. rolling your own

On raw Anthropic/OpenAI APIs you assemble the agent stack yourself. Here's exactly what Foundry hands you pre-built — and the surprise: it runs your SDK code too.

← Back to the decision map (axis overview)

In the orientation map, this axis was summarised as one trade: control vs. time-to-production. That's true, but it hides the most important fact — so let's correct it before anything else.

The reframe: Foundry's Agent Service is not a rival framework

It's tempting to picture "Foundry agents" as Microsoft's competitor to the OpenAI Agents SDK and the Anthropic (Claude) Agent SDK — a third framework you'd have to learn and get locked into. That's wrong. Foundry Agent Service is a managed runtime that explicitly hosts agents written with Agent Framework, LangGraph, the OpenAI Agents SDK, the Anthropic Agent SDK, the GitHub Copilot SDK, or your own code.1

So the question isn't "their framework or mine." It's "how much of the surrounding plumbing do I want to operate myself?" — and Foundry lets you pick a point on that spectrum without rewriting your agent logic.

First: what "rolling your own" actually means

When you build an agent on the raw Anthropic messages or OpenAI responses API, the model call is the easy 10%. These are the layers you end up owning — each is a real component to build, secure, scale, and keep alive:

Orchestration loopThe reason-act-observe cycle, tool-call dispatch, multi-step planning, retries. DIY: LangGraph / provider Agent SDK / hand-rolled.
Tool integrationsWiring each tool (search, code exec, your APIs), plus auth for each. DIY: write + host every tool, or run MCP servers yourself.
Retrieval / RAGChunking, embeddings, a vector store, query rewriting, reranking, citations. DIY: a whole subsystem.
MemoryPersisting and recalling facts across sessions. DIY: your own store + extraction logic.
EvaluationQuality scoring, regression catches before you ship a prompt change. DIY: build an eval harness + datasets.
ObservabilityTracing every model call + tool invocation, metrics, debugging. DIY: instrument it yourself.
Identity & safetyWho can call the agent, what it can reach, prompt-injection defence. DIY: your auth + content filtering.
Hosting & scaleWhere the agent runs, autoscaling, stable endpoints, versioning. DIY: containers + infra.

Anthropic and OpenAI do hand you some of this — both ship agent SDKs, built-in tools (web search, file search, code/computer use), handoffs, guardrails, and tracing.4 But the hosting, enterprise identity, governance, evaluation pipeline, and multi-provider portability are still yours to assemble. That gap is precisely what Foundry sells.

How Foundry fills the gap — a spectrum, not a switch

Foundry exposes three levels of "how much do I hand over," and you can move up the spectrum later without rewriting your agent.1

Hand over the most

Prompt agents

Defined by config alone — instructions + model + tools, authored in the portal or via SDK/REST. Foundry runs it.

You operate: nothing. No code, no compute, no containers.
Hand over hosting

Hosted agents (preview)

Your code (OpenAI/Anthropic SDK, LangGraph…), packaged as a container or zip. Foundry runs it with a managed endpoint, autoscale, per-agent Entra identity.

You operate: your agent logic. Foundry handles the rest.
Hand over the least

Responses API only

Keep your agent running wherever it is; just call Foundry's Responses API for models + platform tools.

You operate: everything except inference + tools access.
Anchor to what you know

The Responses API is Foundry's single entry point for all agent types — it's the successor to OpenAI's Assistants API (being sunset in 2026) and unifies chat + tools.1,4 Mentally: it's where Foundry put the orchestration primitives, the way Anthropic's Messages API + Agent SDK is where Anthropic put theirs. The win is that one endpoint fronts every model in the catalog — "swap models without changing your agent code."1

The managed building blocks

Whichever level you pick, these come pre-built. Map each back to a layer in the DIY stack above:

Built-in tools — web search, file search, code interpreter, function calling, memory — attach without hosting them.2
MCP servers — connect any Model Context Protocol server (open standard; 5,000+ exist) from the Add Tools catalog, with Entra / OAuth-OBO auth.1,4
Foundry IQ — managed RAG: decomposes a query into sub-queries, runs keyword/vector/hybrid search, semantic reranking, returns a synthesized answer with source citations.3
Memory (preview) — extracts & stores long-term memory, surfaced via a memory-search tool or store APIs — no DIY store.2
Observability — end-to-end tracing of every model call and tool invocation, metrics, Application Insights integration.1
Evaluations + Agent Optimizer — run evals to catch regressions; auto-improve a hosted agent's instructions.1
Per-agent Entra identity — each agent gets a scoped identity; OAuth On-Behalf-Of passthrough; no shared credentials.1
A2A protocol (preview) — expose/consume agents as Agent-to-Agent endpoints; publish to Teams & M365 Copilot.1
Two open standards keep this from being a closed box

A fair worry for a buy decision: "does committing to Foundry's agent layer lock me in?" Two things cut against it. MCP (tools) and A2A (agent-to-agent) are open, model-agnostic standards Foundry speaks natively1 — and Toolbox exposes your curated tools through a single MCP-compatible endpoint that any runtime can consume.1 Combined with "bring your own SDK," your agent logic and tool definitions stay portable. What's Foundry-specific is the managed hosting + governance, not your code. (The genuinely sticky, Foundry-exclusive tools are the Microsoft-data ones: SharePoint, Work IQ, Fabric IQ.1)

Decision table: DIY vs. Foundry, layer by layer

"DIY" = raw Anthropic/OpenAI APIs + open-source frameworks you host and operate.
LayerRoll your own DIYFoundry Agent Service managed
Agent frameworkYour choice (you host it)Same choices — but Foundry can host them for you
OrchestrationYou write the loop / use an SDKPrompt agents need none; hosted agents reuse your SDK
RAG / groundingBuild the whole pipelineFoundry IQ — managed, cited, hybrid search
MemoryYour own store + extractionBuilt-in memory tool / APIs (preview)
ToolsHost each + run MCP serversBuilt-in catalog + managed MCP connections
Evals & tracingBuild harness + instrumentBuilt-in evals, optimizer, end-to-end tracing
Identity & safetyYour auth + content filteringPer-agent Entra identity, RBAC, XPIA guardrails
Hosting / scaleYour containers + infraManaged endpoint, autoscale, versioning, publishing
Multi-modelIntegrate each providerOne Responses endpoint over the whole catalog
Best when…Max control, niche orchestration, no Azure footprint, avoid platform feesWant speed + enterprise governance, already on Azure, multi-model, don't want to operate infra

Check yourself — 3 scenarios

Judgement, not code. Instant feedback on each.

1. Your team already has a working agent built on the Anthropic Agent SDK and likes it. A stakeholder says "but if we go Foundry we'll have to rebuild it on Microsoft's framework." The accurate response is:

Hosted agents explicitly support the Anthropic Agent SDK, OpenAI Agents SDK, LangGraph, and custom code. You hand over hosting, not your agent logic.

2. An org wants Foundry's managed RAG so answers come back grounded with citations, without building a vector pipeline. Which component does that — and via what mechanism?

Foundry IQ is the managed knowledge/grounding layer (the evolution of Azure AI Search); agents reach it over MCP, and it returns synthesized, source-referenced answers.

3. A CTO worries adopting Foundry's agent layer means total lock-in. What's the strongest counter — and the honest caveat?

Honest framing matters for a decision: the portability story (open protocols + BYO SDK) is real, but the sticky bits are the connectors into Microsoft's own data estate.
Where to go next

You've now mapped model access (0001) and build tooling & agents (this one). The two axes left to deep-dive are enterprise & compliance (the residency/identity/data-flow teeth) and the commercial model (PTU vs per-token economics with real break-even math). Say which, or throw me a scenario from your own decision and I'll pressure-test it.