Before You Let AI Run Your AP, Let It Audit Your AP

The most useful first AI deployment in D365 Finance isn't autonomous. It's diagnostic.

A Finance Manager asked me recently: "Can we build an AI agent that flags anomalies in our expenses?"

Yes. And it's a sharper question than most of what the market is asking right now.

Most finance leaders are asking the wrong AI question. They're asking how to deploy Copilot. The sharper question is that Finance Manager's, widened: can we build an agent that flags anomalies in our expenses, our vendors, our journals? That's where AI in finance earns its keep first.

Here's the part most people don't know. Microsoft has already published the architecture.

The reference architecture nobody's read

In the Power Platform Well-Architected library, Microsoft has published a solution idea called "Build anomaly detection with Copilot Studio and Fabric." It describes a working architecture for a spend anomaly agent that does three things.

It identifies duplicate invoices (same vendor, amount, date proximity).

It flags vendor activity anomalies (volume spikes, unusual category spread, new-vendor risk).

It catches policy threshold violations.
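To make the first of those concrete, here's a minimal sketch of a duplicate-invoice check: same vendor, same amount, dates close together. The published architecture runs its detection logic in KQL; I've used Python here purely for illustration. The `Invoice` fields, the function name, and the seven-day window are my assumptions, not Microsoft's.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Invoice:
    # Hypothetical field names; map them to your own invoice extract.
    invoice_id: str
    vendor_id: str
    amount: float
    invoice_date: date


def find_duplicate_candidates(invoices, max_days_apart=7):
    """Return pairs of invoice ids with the same vendor and amount,
    dated within max_days_apart days of each other."""
    groups = {}
    for inv in invoices:
        # Compare amounts in whole cents to avoid float equality problems.
        key = (inv.vendor_id, round(inv.amount * 100))
        groups.setdefault(key, []).append(inv)

    pairs = []
    for group in groups.values():
        group.sort(key=lambda i: i.invoice_date)
        for earlier, later in combinations(group, 2):
            if (later.invoice_date - earlier.invoice_date).days <= max_days_apart:
                pairs.append((earlier.invoice_id, later.invoice_id))
    return pairs


invoices = [
    Invoice("INV-001", "V100", 1250.00, date(2024, 3, 1)),
    Invoice("INV-002", "V100", 1250.00, date(2024, 3, 4)),   # 3 days after INV-001
    Invoice("INV-003", "V100", 1250.00, date(2024, 5, 20)),  # same amount, months later
    Invoice("INV-004", "V200", 1250.00, date(2024, 3, 2)),   # different vendor
]

print(find_duplicate_candidates(invoices))
# [('INV-001', 'INV-002')]
```

Note what the sketch treats as a match: the vendor ID has to be identical. That's fine when the vendor master is clean, and a problem when it isn't.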

The architecture moves ERP transaction data into a Microsoft Fabric Lakehouse, applies KQL detection logic in a Fabric Eventhouse, scores each detection from 0 to 1 for severity, posts alerts into the finance team's Microsoft Teams channel, and lets finance users investigate in natural language through a Copilot Studio agent. The full architecture is published in Microsoft's documentation as "Build anomaly detection with Copilot Studio and Fabric."

If you've never seen this page, you're not alone. Most finance leaders I talk to haven't. Most CIOs haven't either.

The gap between Microsoft having published the architecture and most finance teams not knowing it exists is exactly the opportunity worth talking about.

Use what's already in your D365 environment

Before you build anything, know what you already have.

Dynamics 365 Finance has Audit Policies. They've been in the product for years. They support a "Duplicate" query type that catches duplicate expense lines, duplicate vendor invoices, and duplicate purchase orders, and they run in batch mode against a defined date range. They're rule-based, deterministic, and they work. (See "Audit policy violations and cases" in Microsoft's documentation.)

If you're not already running these, that's your starting point. Don't build an agent to do what F&O ships with out of the box. Configure the audit policies, run them on the right cadence, route the cases to the right reviewers, close the loop.

That gets you rule-based duplicate detection. What it doesn't get you is the part where AI changes the game: pattern-based detection of things the rules can't see.

Where the AI layer adds value

Rule-based audit policies catch what you knew to look for. Pattern-based detection catches what you didn't.

Examples

  1. An employee submitting expenses for a vendor your company has never paid before.
  2. An approver signing off on their direct report's expenses.
  3. A category of spend that drifted up 40% in one cost centre without anyone noticing.
  4. A vendor whose invoice volume tripled in a quarter.
  5. A journal entry posted at 11pm on the last day of the month by a user who's never posted that type of entry before.

These are the patterns AI is good at. The audit policy rule engine isn't.
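As a rough illustration of what "pattern-based" means for two of those examples (a vendor whose invoice volume tripled in a quarter, and a vendor you've never paid before), here's a minimal Python sketch that compares the current quarter against a baseline. The function names, the `min_prev` floor, and the sample data are all hypothetical. A real detection layer would use richer baselines than a single prior quarter.

```python
from collections import Counter


def volume_spikes(prev_quarter_vendor_ids, curr_quarter_vendor_ids,
                  ratio=3.0, min_prev=5):
    """Flag vendors whose invoice count grew by at least `ratio` times.

    Each argument is a list with one vendor id per invoice.
    Vendors with fewer than `min_prev` invoices in the baseline quarter
    are skipped, because tiny baselines make ratios meaningless.
    """
    prev = Counter(prev_quarter_vendor_ids)
    curr = Counter(curr_quarter_vendor_ids)
    flagged = {}
    for vendor, count in curr.items():
        baseline = prev.get(vendor, 0)
        if baseline >= min_prev and count >= ratio * baseline:
            flagged[vendor] = (baseline, count)
    return flagged


def first_time_vendors(prev_quarter_vendor_ids, curr_quarter_vendor_ids):
    """Vendors invoiced this quarter that never appeared in the baseline."""
    return sorted(set(curr_quarter_vendor_ids) - set(prev_quarter_vendor_ids))


prev_q = ["V100"] * 6 + ["V200"] * 10
curr_q = ["V100"] * 18 + ["V200"] * 12 + ["V300"] * 2

print(volume_spikes(prev_q, curr_q))       # {'V100': (6, 18)}
print(first_time_vendors(prev_q, curr_q))  # ['V300']
```

Neither check says anything is wrong. Each one surfaces a change worth a human look, which is the point of a diagnostic layer.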

Required setup

Microsoft's reference architecture lays out the technical components for this layer. Microsoft Fabric provides the analytical backbone, with a Lakehouse for ERP transaction history and vendor records, and an Eventhouse running KQL-based detection logic in real time. Copilot Studio agents provide the conversational interface, including a primary "Spend Anomaly" agent that finance users interact with through Microsoft Teams or Microsoft 365 Copilot, and a Fabric Data Agent connected to the analytics layer underneath. Severity scoring between 0 and 1 lets the team prioritise. Human-in-the-loop review sits before any action gets taken against a vendor. Agent flows and connectors handle resolution workflows: escalating, holding payments, requesting documentation, updating vendor risk ratings.

The architecture works. It's documented. A capable partner could stand it up in a few weeks if the environment is ready for it.

That's the catch. Most environments aren't.

What the architecture diagram doesn't tell you

This is where the advisory work actually sits. Microsoft's published architecture assumes a set of things about your environment that, in practice, often aren't true.

It assumes your vendor master is clean enough that fuzzy matching produces signal rather than noise. If you have the same supplier in F&O as "Acme Ltd", "Acme Limited", and "ACME LTD" with three different bank accounts, duplicate detection will flag every transaction across all three. The agent will work as designed. The output will be useless.
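Here's a minimal sketch of the kind of check worth running before any agent goes near the data: normalise vendor names and list the records that collapse to the same name. It's Python, the suffix map is deliberately tiny, and the record layout and bank account labels are made up for illustration. Real vendor master cleanup needs far more than this.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small map of legal-suffix abbreviations.
LEGAL_SUFFIXES = {"ltd": "limited", "inc": "incorporated", "corp": "corporation"}


def normalise_vendor_name(name):
    """Lower-case, strip punctuation, and expand common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(LEGAL_SUFFIXES.get(token, token) for token in tokens)


def likely_duplicate_vendors(records):
    """Group (vendor_id, name, bank_account) records by normalised name
    and return only the names shared by more than one vendor record."""
    groups = defaultdict(list)
    for vendor_id, name, bank_account in records:
        groups[normalise_vendor_name(name)].append((vendor_id, bank_account))
    return {name: members for name, members in groups.items() if len(members) > 1}


records = [
    ("V001", "Acme Ltd", "ACCT-A"),
    ("V002", "Acme Limited", "ACCT-B"),
    ("V003", "ACME LTD", "ACCT-C"),
    ("V004", "Borealis Supplies", "ACCT-D"),
]

print(likely_duplicate_vendors(records))
# {'acme limited': [('V001', 'ACCT-A'), ('V002', 'ACCT-B'), ('V003', 'ACCT-C')]}
```

The output is a worklist for a person, not a merge instruction. Three bank accounts under one normalised name might be a data quality problem or might be three legitimate entities, and only someone who knows the supplier can tell.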

It assumes your security model can support an agent that reads transaction data across legal entities and cost centres without breaking your segregation of duties posture. The Dynamics 365 ERP MCP server (production-ready preview) makes it easier than ever to give an agent broad data access. That same accessibility is where a careless deployment causes the most damage.

It assumes your finance team has the operational pattern to deal with the alerts. An agent that posts twenty Teams notifications a day about possible duplicate vendors, with no triage process behind it, gets muted in a week. The architecture isn't the deployment.

It assumes you've decided what the agent is allowed to do versus what it's allowed to flag. Auto-holding a payment against a vendor based on an anomaly score sounds powerful until the vendor is your largest critical supplier and the anomaly is a false positive on the 30th of the month.
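One way to make that decision explicit is to write it down as a policy the agent has to consult before doing anything. The sketch below is Python and entirely hypothetical: the action names echo the resolution workflows in the reference architecture, but the `Permission` enum, the policy table, and the `decide` function are my own illustration, not a Copilot Studio feature.

```python
from enum import Enum


class Permission(Enum):
    AUTOMATIC = "automatic"      # the agent may do this on its own
    NEEDS_HUMAN = "needs_human"  # the agent may only propose it


# Hypothetical policy: what the agent may act on versus only flag.
ACTION_POLICY = {
    "post_alert": Permission.AUTOMATIC,
    "request_documentation": Permission.NEEDS_HUMAN,
    "hold_payment": Permission.NEEDS_HUMAN,
    "update_vendor_risk_rating": Permission.NEEDS_HUMAN,
}


def decide(action, approved_by=None):
    """Return what should happen to a proposed action.

    Actions missing from the policy are denied by default, so a new
    capability cannot run until someone has made a decision about it.
    """
    permission = ACTION_POLICY.get(action)
    if permission is None:
        return "deny"
    if permission is Permission.AUTOMATIC or approved_by:
        return "execute"
    return "queue_for_review"


print(decide("post_alert"))                               # execute
print(decide("hold_payment"))                             # queue_for_review
print(decide("hold_payment", approved_by="ap.manager"))   # execute
print(decide("delete_vendor"))                            # deny
```

The deny-by-default branch matters more than the table. It means the list of things the agent can do grows only when someone chooses to grow it.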

It assumes you've thought about false-positive rates and feedback loops. Microsoft's own Responsible AI guidance for this scenario calls out that anomaly detection "isn't always deterministic, and false allegations can significantly impact a vendor." Translation: build the human-in-the-loop process before you turn the agent loose.
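A feedback loop can start very simply: record what reviewers decided about each alert, then measure how often each detection type was right. Here's a minimal Python sketch. The detector names, the 0.5 cut-off, and the sample dispositions are assumptions for illustration only.

```python
from collections import defaultdict


def precision_by_detector(dispositions):
    """Share of reviewed alerts that reviewers confirmed, per detector.

    `dispositions` is an iterable of (detector_name, confirmed) pairs,
    where `confirmed` is True when a reviewer agreed with the alert.
    """
    totals = defaultdict(lambda: [0, 0])  # detector -> [confirmed, reviewed]
    for detector, confirmed in dispositions:
        totals[detector][1] += 1
        if confirmed:
            totals[detector][0] += 1
    return {d: confirmed / reviewed for d, (confirmed, reviewed) in totals.items()}


def detectors_to_tune(precisions, minimum=0.5):
    """Detectors whose confirmed share falls below the chosen minimum."""
    return sorted(d for d, p in precisions.items() if p < minimum)


dispositions = [
    ("duplicate_invoice", True), ("duplicate_invoice", False),
    ("duplicate_invoice", True), ("duplicate_invoice", True),
    ("volume_spike", False), ("volume_spike", False),
    ("volume_spike", True), ("volume_spike", False),
]

precisions = precision_by_detector(dispositions)
print(precisions)                     # {'duplicate_invoice': 0.75, 'volume_spike': 0.25}
print(detectors_to_tune(precisions))  # ['volume_spike']
```

A detector that's wrong three times out of four is the one that gets the channel muted. Knowing which one it is tells you what to tune or switch off first.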

None of these are reasons not to build. They're the reasons most builds fail to land.

Why diagnostic is the right first AI deployment

Most of the autonomous agent narrative right now is pointed at execution: autonomous AP, autonomous procurement, autonomous reconciliation. Microsoft is investing in this direction and they're right to. The Finance Agents available through Copilot Studio for Financial Reconciliation and Variance Analysis are real, and they're going to keep getting better.

But the right first deployment for most D365 finance environments isn't an agent that acts. It's an agent that observes.

Diagnostic agents have four properties that make them the right place to start.

They're low operational risk. The agent surfaces patterns. Humans decide. Nothing autonomous touches the GL on day one.

They're high learning value. After a quarter of running diagnostic agents, you know things about your data quality, your control gaps, and your operational pattern that you couldn't have learned any other way. That intelligence is what makes the next AI deployment safe.

They produce business cases. Once you've quantified what diagnostic agents are catching, in real numbers, the ROI conversation for the next stage of automation gets easier.

They build trust. Finance teams that have seen AI catch real problems for them are easier to bring along on the autonomous deployment that comes later. Finance teams whose first AI experience was a Copilot demo that hallucinated a journal entry are not.

What I'd tell the Finance Manager who asked

Yes, you can build an agent that flags anomalies in your expenses. Microsoft has published the architecture. The components are real, current, and mostly at production-ready preview or better.

The work that determines whether the deployment succeeds isn't in the architecture diagram. It's in the housekeeping: vendor master cleanup, the security model, the operational pattern for handling alerts, the false-positive feedback loop, and the deliberate decision about what the agent flags versus what it acts on.

Get those right and the autonomous agents Microsoft is building become safe to run later. Skip them and you've automated your control weaknesses at machine speed.

The autonomous future is real. The on-ramp to it is diagnostic.