TL;DR: Microsoft 365 Copilot is a user productivity license. Copilot Studio is an orchestration and channel layer. Azure AI Foundry is where custom AI workloads are built and run. Each serves a different purpose, and understanding where one ends and another begins will save you significant cost and rework down the line. There are also some licensing dependencies and add-ons worth knowing about before you commit to an architecture.
Why this tends to cause confusion
Microsoft’s AI portfolio is broad, and the naming convention does not make it easy. When everything carries the “Copilot” label, it is reasonable to assume the products overlap more than they do. In practice, they address quite different parts of the solution stack.
The three services worth understanding independently are:
- Microsoft 365 Copilot – a per-user SaaS license that layers AI capabilities across your existing M365 estate.
- Copilot Studio – a low-code agent builder billed via Copilot Credits, used for orchestration and channel delivery via Teams or web.
- Azure AI Foundry + Azure OpenAI + Azure AI Search – consumption-billed Azure services that power custom AI workloads.
These are not interchangeable, and they are not a menu you pick from. They sit at different layers of a solution, and the decisions you make about where to place your workload have a direct impact on cost, scalability, and governance.
What Microsoft 365 Copilot includes
The M365 Copilot license (£23.10/user/month at current pricing) provides AI-assisted experiences across the Microsoft 365 suite:
- Teams: Meeting summaries, live transcription, recap of missed meetings, and in-context message drafting.
- Outlook: Email summarisation, thread catch-up, and draft reply suggestions.
- Word, Excel, PowerPoint: Document generation, natural language data analysis, and presentation drafting.
- Microsoft 365 Chat (formerly Business Chat): A grounded conversational interface that reasons across your emails, calendar, Teams messages, and SharePoint content via Microsoft Graph.
Microsoft 365 Chat is the most relevant capability for document assistant scenarios. It can answer questions grounded in SharePoint-resident content using Graph-based retrieval, with security trimming inherited from your existing permissions model. There is no custom ingestion pipeline to build and no separate index to maintain.
It is also worth noting that users with an M365 Copilot license receive a substantial allocation of Copilot Studio interactions as part of that license. Microsoft’s guidance positions many agent interactions as included for licensed users, which has a meaningful impact on the cost model for any Copilot Studio deployment where the user base is already licensed.
What the M365 Copilot license does not cover:
- Custom document ingestion for user-uploaded content
- Fine-grained control over retrieval behaviour
- Azure-side telemetry and audit logging
- Complex multi-step agentic pipelines with external system connectors beyond what Studio natively supports
For SharePoint-resident, permission-governed content with knowledge workers who will benefit from the broader M365 AI experience, the license makes good commercial sense. For workloads that require more control or custom document handling, the conversation moves into Azure.
The underlying license dependency
One detail that is easy to overlook: the M365 Copilot license requires an underlying Microsoft 365 E3 or E5 license per user. If your organisation is not already licensed at that tier across the relevant user population, the true per-user cost is higher than the headline £23.10/month figure suggests. For operational or frontline worker populations who may be on F1 or F3 licenses, this dependency can significantly change the commercial picture and is worth confirming early in the planning process.
Separately, Copilot Studio includes a baseline tenant-level credit allocation with certain M365 plans. The size of that allocation varies depending on your agreement, and it is worth checking what is already included before purchasing additional messaging capacity packs. In some cases, low-volume pilots can run within the included allocation, which can make the initial business case look more favourable than the production deployment will be.
Copilot Studio messaging capacity: a cost to plan for
Copilot Studio is billed using Copilot Credits. For users without an M365 Copilot license, every interaction draws from a credit pool that is either provisioned via a tenant-level capacity pack or billed on consumption.
| Interaction type | Credit cost |
|---|---|
| Simple (rule-based) answer | ~1 credit |
| Generative answer | ~2 credits |
| Agent action | ~5 credits |
For agents that call Azure OpenAI, query SharePoint, or invoke Power Automate flows, agent action pricing applies to a significant portion of interactions.
The scenario that catches teams out: a Copilot Studio agent is built and piloted with a small group of M365 Copilot licensed users, where consumption is largely covered. The agent is then rolled out to a broader population of unlicensed operational or frontline workers, and monthly credit consumption increases sharply because agent actions draw 5 credits per interaction. Microsoft does offer Copilot Studio messaging capacity add-ons to address this, but they carry real cost at scale and are frequently missing from initial business case estimates.
The key takeaway here is to model costs based on your full production user population rather than your pilot cohort. The licensing profile of those two groups is often quite different, and the cost difference can be substantial.
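To make that modelling concrete, the credit table above can be turned into a small blended-cost function. The credit weights and the ~£0.034/credit rate are the illustrative figures from this article, not an official price list, and the function name is my own.

```python
# Blended Copilot Credit burn for a mix of interaction types.
# Credit weights and the ~£0.034/credit rate are the illustrative
# figures used in this article, not official pricing.

CREDITS = {"simple": 1, "generative": 2, "agent_action": 5}
PRICE_PER_CREDIT_GBP = 0.034  # assumed illustrative rate

def blended_cost(interactions: int, mix: dict[str, float]) -> float:
    """Monthly £ cost for `interactions` split across an interaction-type mix."""
    credits = sum(interactions * share * CREDITS[kind]
                  for kind, share in mix.items())
    return credits * PRICE_PER_CREDIT_GBP

# Example: 10,000 unlicensed interactions, 70% generative / 30% agent actions
cost = blended_cost(10_000, {"generative": 0.7, "agent_action": 0.3})
print(f"£{cost:,.0f}/month")  # 29,000 credits at ~£0.034 -> roughly £986
```

Running this against your projected production mix, rather than the pilot mix, is the quickest way to spot the licensing gap early.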
Environment strategy in Copilot Studio
Another operational consideration that tends to be underestimated is environment management. Running a production-grade Copilot Studio agent properly means maintaining separate DEV, UAT, and PRD environments, with solutions, environment variables, and connection references managed consistently across them. This is non-trivial in Studio and requires deliberate planning. Teams that treat Studio as a simple drag-and-drop tool in the early stages often find themselves with unmanageable deployments by the time they reach production scale. If you are building something intended to last, it is worth applying the same environment hygiene you would expect from any other enterprise software deployment.
A practical scenario: document Q&A for operational staff
Consider 500 operational staff who regularly need to query internal documents — policies, procedures, guidance notes — and want a conversational interface to do so rather than reading through documents manually.
The architecture diverges depending on one key question: how do those documents reach the system?
Pattern A – SharePoint-resident documents
Documents already live in SharePoint. They are governed, permissioned, and indexed via Microsoft Graph. Security trimming is inherited from your existing model. There is no custom ingestion pipeline to build. This is exactly the pattern M365 Copilot and Graph-grounded retrieval are designed for, and if users are already licensed, the Copilot Studio interaction costs are largely covered.
Pattern B – User-uploaded documents
Users upload documents directly into the assistant — PDFs, Word files, content that may not exist in SharePoint. Those documents need chunking, indexing, and retrieval at query time, often on a session basis. This is an Azure workload. It requires Azure AI Search for indexing and hybrid retrieval, Azure AI Foundry and Azure OpenAI for reasoning over retrieved chunks, and Logic Apps or Azure Functions to orchestrate the ingestion pipeline. In this pattern, Copilot Studio sits in front as a thin channel layer rather than doing the heavy lifting.
The end user experience can look almost identical in both patterns. The underlying cost model and build complexity are quite different.
The document ingestion pipeline
For Pattern B, the ingestion pipeline deserves more attention than it typically gets during initial design. The quality of your retrieval results is directly determined by decisions made at ingestion time, and those decisions are difficult to change once an index is in production.
The key choices to work through:
- Chunk size and overlap. Smaller chunks (300-500 tokens) with moderate overlap (10-15%) tend to work well for policy and procedural documents where precise answers to specific questions are expected. Larger chunks reduce retrieval precision but can improve coherence for summarisation workloads. There is no universal right answer — it depends on your document structure and query patterns.
- Metadata enrichment. Storing metadata alongside chunks — document title, section heading, creation date, source URL, document category — significantly improves the ability to filter and rank results. A chunk retrieved without context is much harder to reason over than one that carries structured metadata the model can reference.
- Hybrid retrieval. Azure AI Search supports combining vector search (semantic similarity) with keyword search (BM25) in a single query. For enterprise documents, this combination consistently outperforms either approach in isolation. Vector search handles paraphrased or conceptual queries well; keyword search handles precise terminology, product codes, and proper nouns. Using both gives you better coverage across query types.
- Semantic ranker. Azure AI Search includes a semantic ranker capability that re-scores retrieved results using a language model to improve relevance ordering. It is worth knowing that the semantic ranker is an additional cost on top of the base Search tier — it is not included in the S1 pricing and needs to be budgeted separately. At the query volumes in this example it is not prohibitively expensive, but it is a line item that regularly gets missed in initial cost estimates.
Getting the ingestion pipeline right upfront is considerably less effort than rebuilding an index after users have started reporting poor retrieval quality in production.
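A minimal sketch of the chunking and metadata choices above, assuming whitespace tokenisation for illustration only; a real pipeline would count tokens with the model's actual tokeniser, and the function and field names here are hypothetical rather than any Azure SDK API.

```python
# Fixed-size chunking with overlap, storing metadata alongside each chunk.
# Whitespace "tokens" are a stand-in for real tokeniser counts.

def chunk_document(text: str, title: str, chunk_size: int = 400,
                   overlap: int = 50) -> list[dict]:
    """Split `text` into overlapping chunks, each carrying metadata."""
    tokens = text.split()
    step = chunk_size - overlap  # 50/400 = 12.5% overlap, in the 10-15% range
    chunks = []
    for start in range(0, max(len(tokens), 1), step):
        window = tokens[start:start + chunk_size]
        if not window:
            break
        chunks.append({
            "content": " ".join(window),
            "title": title,              # metadata the index can filter/rank on
            "chunk_index": len(chunks),
            "token_estimate": len(window),
        })
    return chunks

doc = ("word " * 1000).strip()           # a 1,000-token stand-in document
parts = chunk_document(doc, title="Expenses Policy")
print(len(parts), parts[0]["token_estimate"])  # 3 overlapping chunks
```

The point of the sketch is that chunk size, overlap, and metadata are decided here, at ingestion time; changing any of them later means re-chunking and re-indexing the corpus.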
A worked example: 500 users, 10 queries/day
Assumptions: 500 users, 10 queries per day, 22 working days/month, approximately 110,000 queries/month. Split of 70% generative answers and 30% agent actions.
Copilot Studio – unlicensed users
```
// Credit consumption breakdown
generative answers = 110,000 x 0.7 x 2 credits = 154,000 credits
agent actions      = 110,000 x 0.3 x 5 credits = 165,000 credits
total              = ~319,000 credits/month

// At ~£0.034/credit
monthly cost = ~£10,850/month

// Note: Azure OpenAI inference costs are additional if wired via custom connector
```
Microsoft 365 Copilot – licensed users
```
per user license = £23.10/month
user count       = 500
monthly cost     = £11,550/month

// Requires underlying M365 E3 or E5 license per user
// Studio interactions largely covered for licensed users
```
At this scale, the unlicensed Studio consumption cost and the M365 Copilot license cost are within about £700/month of each other. The difference is that the M365 license also gives those users Copilot across Teams, Outlook, and the full Office suite. For knowledge workers who would genuinely benefit from those capabilities, the document assistant becomes commercially neutral within the license, and the question shifts to whether a full M365 Copilot rollout makes sense for the user group.
It is worth noting that the M365 Copilot license does not address the upload-heavy pattern. If users need to query documents that are not in SharePoint, Azure is still required regardless of licensing.
Azure AI Foundry + AI Search – direct model
```
// GPT-4o-mini RAG query: ~2K input tokens, ~500 output tokens
cost per query  = ~£0.0006
monthly queries = 110,000
inference cost  = ~£66/month

// Azure AI Search S1 tier (base index and query costs)
search cost = ~£250/month

// Semantic ranker (additional, not included in S1 base pricing)
semantic ranker = ~£30-50/month at this query volume

// Logic Apps / Function App orchestration
orchestration cost = ~£20-40/month

total azure = ~£366-406/month

// Approximately 27-30x lower than unlicensed Studio consumption at this volume
```
The Azure model is significantly cheaper on a per-query basis at this volume, and the gap increases as usage grows. The trade-off is upfront build investment, which is covered in the next section.
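The arithmetic above can be collected into one small comparison model. Every rate used is the illustrative figure from this worked example rather than an official price list, and the function names are my own.

```python
# Monthly cost comparison for the worked example: 500 users,
# 10 queries/day, 22 working days. All rates are the illustrative
# figures from this article, not official price lists.

USERS, QUERIES_PER_DAY, DAYS = 500, 10, 22
QUERIES = USERS * QUERIES_PER_DAY * DAYS      # 110,000 queries/month

def studio_unlicensed() -> float:
    credits = QUERIES * 0.7 * 2 + QUERIES * 0.3 * 5  # generative + agent actions
    return credits * 0.034                            # ~£0.034/credit

def m365_copilot() -> float:
    return USERS * 23.10                              # £23.10/user/month

def azure_direct(ranker=50, orchestration=40) -> float:
    inference = QUERIES * 0.0006                      # GPT-4o-mini RAG query
    search = 250                                      # AI Search S1 base tier
    return inference + search + ranker + orchestration

for name, cost in [("Studio (unlicensed)", studio_unlicensed()),
                   ("M365 Copilot licenses", m365_copilot()),
                   ("Azure direct (upper bound)", azure_direct())]:
    print(f"{name}: £{cost:,.0f}/month")
```

Rerunning the model with your own user count, query volume, and licensed/unlicensed split is a five-minute exercise that often changes the recommendation.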
Reasons to consider Azure AI Foundry for the intelligence layer
Even where users are fully M365 Copilot licensed, there are good reasons to consider moving the intelligence layer into Azure AI Foundry rather than relying on Copilot Studio’s native generative capabilities.
- Retrieval control. Azure AI Search supports hybrid semantic and keyword retrieval with configurable scoring profiles, chunk size tuning, and re-ranking. Copilot Studio’s generative answers offer limited visibility into what was retrieved, how it was ranked, and how confident the result is.
- Model selection and versioning. Azure AI Foundry lets you choose the model, pin to a specific version, and manage upgrades on your own schedule. With Studio’s native generative answers, the underlying model can change without notice.
- Observability. Queries and responses can be logged to a Log Analytics workspace, giving you token consumption metrics, latency data, retrieval scores, and full Azure Monitor integration. In regulated environments, this level of audit capability is a baseline requirement rather than a nice-to-have.
- Prompt engineering. In Foundry, you own the system prompt, grounding instructions, and safety layers. Copilot Studio provides limited control over how generative answers are constructed.
- Cost at scale. As shown in the worked example, Azure consumption pricing for GPT-4o-mini RAG workloads is considerably cheaper per query than Copilot Credits at volume.
- Platform direction. Microsoft is actively building out the Azure AI Foundry layer — Foundry Agent Service, richer SDK support, deeper Bicep/IaC integration. Organisations building on Azure now are reasonably well positioned for where the platform is heading, and are less dependent on the roadmap decisions Microsoft makes for the Copilot Studio product specifically.
Why GPT-4o-mini for RAG workloads
The worked example uses GPT-4o-mini rather than GPT-4o, and it is worth explaining that choice. For retrieval-augmented generation over well-structured internal documents – policies, procedures, guidance notes – the task is primarily one of reading, extracting, and summarising relevant content from retrieved chunks. GPT-4o-mini handles this well and does so at a fraction of the cost of GPT-4o. The reasoning capability advantage of GPT-4o becomes more relevant for complex multi-step tasks, ambiguous queries, or workloads requiring nuanced judgment. For a straightforward document Q&A scenario with well-indexed content and a clean retrieval pipeline, GPT-4o-mini is the practical choice. If your query patterns become more complex over time, switching models within Foundry is a configuration change rather than an architectural one.
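The per-query figure used throughout can be sanity-checked with a back-of-envelope calculation. The per-million-token rates below are illustrative placeholders chosen to be consistent with the article's ~£0.0006 figure; check current Azure OpenAI pricing for real numbers.

```python
# Back-of-envelope per-query inference cost for the RAG shape in the
# worked example (~2K input tokens, ~500 output tokens). Rates are
# illustrative placeholders, not official Azure OpenAI pricing.

INPUT_RATE_GBP_PER_M = 0.15   # assumed rate per 1M input tokens
OUTPUT_RATE_GBP_PER_M = 0.60  # assumed rate per 1M output tokens

def query_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_RATE_GBP_PER_M
            + output_tokens * OUTPUT_RATE_GBP_PER_M) / 1_000_000

print(f"£{query_cost(2_000, 500):.4f} per query")  # ~£0.0006
```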
Model deprecation and version pinning
One aspect of operating on Azure OpenAI that deserves explicit attention is model deprecation. Azure OpenAI model versions are retired on a published schedule, and when a version reaches end of life, deployments using it will stop functioning. In regulated environments where change control processes can take weeks or months, an unexpected model retirement can create a significant operational risk.
The mitigation is straightforward: pin your deployments to a specific model version rather than a floating alias, monitor the retirement schedule for your pinned version, and build model upgrades into your change management process well in advance. Azure AI Foundry makes this manageable, but it requires treating model versions with the same lifecycle discipline you would apply to any other infrastructure dependency.
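That monitoring step can be as simple as a scheduled check against the published retirement schedule. In the sketch below, the schedule dict, the retirement date, and the warning window are all illustrative; the real dates live in Microsoft's Azure OpenAI model retirement documentation.

```python
# Sketch of a retirement-date check for version-pinned deployments.
# The schedule entries and dates here are illustrative only; source
# the real dates from the Azure OpenAI model retirements page.

from datetime import date, timedelta

RETIREMENT_SCHEDULE = {           # pinned model version -> retirement date
    "gpt-4o-mini-2024-07-18": date(2026, 3, 1),   # illustrative date
}

def versions_needing_upgrade(today: date, warning_days: int = 120) -> list[str]:
    """Return pinned versions whose retirement falls inside the warning window."""
    cutoff = today + timedelta(days=warning_days)
    return [v for v, retired in RETIREMENT_SCHEDULE.items() if retired <= cutoff]

# Run from a scheduled pipeline; a non-empty result should raise a change request.
print(versions_needing_upgrade(date(2025, 12, 1)))
```

A 120-day window is deliberately generous: it leaves room for change control processes that take weeks or months, which is exactly the environment where a surprise retirement hurts most.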
A recommended architecture
In the recommended pattern, Copilot Studio handles user input, calls an APIM-fronted action, and renders the response. The intelligence sits entirely in Azure. Keeping Copilot Studio as a thin orchestration layer, rather than the place where retrieval and reasoning happen, preserves observability, keeps costs predictable, and makes the solution significantly easier to govern.
The honest trade-off is build time. A Foundry-backed solution requires more upfront investment: ingestion pipelines, index schema design, prompt engineering, APIM policy configuration. If the goal is a fast deployment over SharePoint content for a fully licensed user base, Copilot Studio with Graph-grounded retrieval is a reasonable and pragmatic starting point. If the workload is high-volume, compliance-sensitive, and intended to run in production over multiple years, the Azure architecture is the more durable investment.
Governance as a design consideration
In regulated environments, governance requirements tend to push the architecture toward Azure:
- Data residency: Azure UK South keeps data within the required boundary.
- Audit logging: A Log Analytics workspace captures query and response pairs with token-level metadata.
- Security trimming: For user-uploaded documents, the permission model is fully under your control.
- HITL escalation: Human-in-the-loop escalation paths are easier to build as a first-class part of the orchestration layer than to retrofit later.
- Connector control: APIM policy and managed identity give you precise control over what Copilot Studio can reach.
Governance can be applied to a Studio-heavy architecture, but it tends to require more effort and produces less clean audit trails. An Azure-backed design makes governance a natural output of the architecture rather than something applied on top of it.
Things worth considering during planning
- Check the underlying license dependency. M365 Copilot requires E3 or E5 per user. If your target population is on F1 or F3 licenses, the true cost is higher than the Copilot add-on price alone suggests.
- Understand what is already included in your tenant. Some M365 plans include a baseline Copilot Studio credit allocation. Check what you have before purchasing add-ons, particularly for pilot phases.
- M365 Copilot does not eliminate Azure costs. The license covers Studio interaction costs for licensed users, but if you are calling Azure OpenAI directly from Studio via a custom connector, that inference still bills as Azure consumption.
- Pilot cost models often do not reflect production. A pilot running on 20 licensed users will look very different from a production rollout to 500 unlicensed operational staff. It is worth modelling the production population explicitly during the planning phase.
- Retrieval and reasoning logic belongs in Azure. Copilot Studio is not designed for hybrid search, document chunking, or cross-corpus aggregation. Placing that logic in Studio creates maintainability and cost problems down the line.
- The messaging capacity add-on is a real cost. If your deployment includes users without M365 Copilot licenses, the add-on needs to be in the budget from the start.
- The semantic ranker is a separate line item. If you are budgeting for Azure AI Search, remember that the semantic ranker is not included in the base S1 tier pricing and needs to be accounted for separately.
- Plan for model deprecation. Pin your Azure OpenAI deployments to specific model versions and monitor the retirement schedule. Build model upgrades into your change management process well before end-of-life dates.
- Treat Copilot Studio environments like any other enterprise software. DEV, UAT, and PRD environment separation with proper solution management is non-trivial but necessary for production-grade deployments.
Summary
| Scenario | Recommended approach |
|---|---|
| SharePoint content, licensed knowledge workers | M365 Copilot + Studio (Graph-grounded) |
| Upload-heavy or dynamic documents | Azure AI Foundry + AI Search |
| Regulated environment, audit required | Azure backbone regardless of licensing |
| Unlicensed operational users at volume | Azure direct, or evaluate full M365 Copilot licensing |
| Model versioning or prompt control required | Azure AI Foundry |
| Long-term production workload | Azure AI Foundry with version-pinned deployments |
The model itself is rarely the most expensive component of an AI solution. The commercial and architectural decisions that sit around it usually are. Getting clarity on your user licensing profile, your document access patterns, and your operational requirements early on tends to produce better architecture and more predictable costs.
