My first week at SSA Marine, I had a clear technical direction. The procurement team needed an AI agent that could answer common policy questions for stakeholders across the company, and when it couldn't, draft an escalation email to the procurement team on the stakeholder's behalf so the question got routed cleanly instead of disappearing into someone's inbox. The company was Microsoft-native. The documents lived in SharePoint. Microsoft Copilot Studio gave me Teams integration and SSO essentially for free.
The path was obvious. I prototyped in Copilot Studio and the RAG functionality worked beautifully. SharePoint documents indexed in minutes, and the answers were grounded and useful. By the end of week two, I was confident the project would ship on time.
Then I tried to build the email approval workflow.
The wall
The requirement was simple to describe and hard to implement. When the RAG layer couldn't ground an answer to a stakeholder's question, the agent had to draft an escalation email to the procurement team on the stakeholder's behalf, but it could never dispatch one autonomously. The stakeholder had to approve, edit, or revise the draft before anything actually went to procurement. This was non-negotiable for governance reasons.
Copilot Studio could not do this cleanly. The conversational state required to support multi-turn revisions (RAG comes up empty, the agent drafts an escalation email, the user edits it, the agent regenerates with the edits, the user approves, the agent dispatches) was beyond what the platform's branching logic could express. I spent two days trying to make it work. The closest I got involved encoding state in the conversation history and parsing it on every turn, which was fragile and would have failed silently in production.
This was the moment I had to decide.
The decision
I had two options. The first was to descope the email approval workflow to something simpler (a one-shot draft with no revision) and ship on the original timeline. The second was to abandon Copilot Studio entirely and build a custom agent with LangGraph and FastAPI, which would mean rebuilding the RAG pipeline, configuring SSO manually, and shipping a custom Teams tab application instead of a native one.
I chose the second option, knowing it would cost me at least two weeks of work I'd already done.
The lesson I took from this isn't about Copilot Studio specifically. It's about prototyping order: always prototype the hardest workflow first, not the easiest one.
I had inverted that. RAG was the visible feature, the one stakeholders would see in the demo, so I built it first. The HITL workflow was the structurally harder requirement, but I'd treated it as a follow-on. By the time I discovered the platform couldn't support it, I'd invested two weeks in the wrong foundation.
What I built instead
The custom architecture was a coordinator agentic pattern in LangGraph. A top-level coordinator agent routed each question to one of two specialized sub-agents: a RAG sub-agent for policy Q&A grounded in Azure AI Search, and an email sub-agent that took over when the RAG layer came up empty, drafting an escalation email and managing the multi-turn approval state. FastAPI served as the backend. Azure Logic Apps handled the actual email dispatch, gated strictly by the agent's tool calls. The Teams tab application was a React/TypeScript frontend, which gave me an admin dashboard with health checks and node traces I never could have built in Copilot Studio.
It took two extra weeks to get back to where I'd been. The HITL workflow worked correctly. The system shipped on time, with one week of buffer.
What I'd do differently
The decision was right. The order was wrong. If I were building this again, I would have spent the first two days of the project specifically on the HITL flow, not the RAG pipeline, not the indexing strategy, not the agent prompts. The hardest workflow first.
The other thing I'd change: I should have written down the architectural assumption I was making. Copilot Studio can handle multi-turn approval flows was an assumption I'd inherited from the platform's marketing materials, and I never tested it before committing. Writing assumptions down has a way of making them visible enough to interrogate.
This is, broadly, what I think about when I'm building production AI systems now. The capability layer is impressive and improves quickly. The constraints (governance, audit, approval, validation, the parts that look unglamorous) are usually where the actual difficulty lives. Prototype them first.