D365 F&O performance testing: the playbook programme directors should be running, not skipping
If you cannot tell me, today, what 200 of your users will actually be doing on day one of go-live, and whether the system holds up while they do it, your programme is exposed.
I have sat on too many recovery calls where the same line comes out: "We were going to run a performance test. We never got around to it." Then month-end arrives, or peak season, and what looked like a clean go-live turns into a board-level escalation and an SI relationship that may not survive the post-mortem.
Microsoft's own FastTrack team is blunt about this in the Dynamics 365 Implementation Guide: performance is one of the most common reasons D365 F&O projects escalate. The patterns are well known. The tooling is well documented. Most programmes still treat performance testing as an optional checkbox somewhere near UAT, and most programmes pay for it later.
This is the playbook I give to programme directors before they sign off on a cutover plan. It is grounded in the Microsoft Implementation Guide (Chapter 17, "A performing solution, beyond infrastructure") and the FastTrack TechTalk on Performance Testing Setup and Execution, plus what I have actually watched work on D365 F&O rollouts.
1. What is the difference between regression testing and performance testing in D365 F&O?
Most teams use "regression testing" and "performance testing" interchangeably. They are not the same thing, and the confusion is half the problem.
Here is the current Microsoft picture, and the one your test plan should use too:
Microsoft's tool matrix for D365 F&O testing. RSAT and PerfSDK do different jobs - confusing them is where most performance plans go wrong.
- RSAT is for regression testing in a single-user context. It tells you the build still works. It tells you nothing about how the system behaves under load.
- Manual testing is for individual UI scenarios where you time a single user's click path against legacy.
- Apache JMeter is now Microsoft's recommended tool for concurrent UI and integration load testing in D365 F&O. It is third-party (Apache, not Microsoft-supported), but Microsoft publishes a four-part blog series and a dedicated TechTalk on running it against finance and operations apps.
- StresStimulus is a credible third-party alternative for teams that want a more turnkey UI than raw JMeter.
- DMF and Data Task Automation handle file-based integration testing.
- PerfSDK is no longer supported. As of September 2024, Microsoft Learn confirms the Performance SDK is deprecated, following the end of load testing features in Visual Studio versions after 2019. If you are reading older guidance that recommends PerfSDK as the default for multi-user testing, that guidance is out of date.
If your test plan says "we will do performance testing in RSAT", that is regression testing wearing a different hat. The two serve different purposes. You need both. You need to budget for both separately.
2. How do you set acceptance criteria for D365 F&O performance testing?
Performance testing without acceptance criteria is theatre. Agree the following in writing with the business before anyone touches a tool:
- Which scenarios matter. PO create and confirm, sales order create and post, invoice post, AP payment proposal, period close, financial reporting, and the integrations that fire on every transaction.
- Normal and peak transaction volumes for each scenario.
- Number of concurrent users, by persona.
- Target response times. Ideal and maximum acceptable, separately.
- Day-in-the-life workload mix at peak hour.
- Which batch jobs and integrations run concurrently.
A real, testable target looks like this: "The system must support 15 concurrent buyers creating 150 purchase orders in 30 minutes, each PO posting in under 4 seconds, with the EDI sales order import batch running in parallel." That is the kind of statement Microsoft uses in their own demo scenarios. It can pass. It can fail. The business can sign it off.
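One way to keep a target like that honest is to write it down as data rather than prose. The sketch below is a minimal illustration in Python, not part of any Microsoft tooling: the class and field names are my own, and it assumes "under 4 seconds" is checked against the slowest observed posting. It also does the arithmetic hidden in the statement: 150 purchase orders in 30 minutes is 5 per minute across the team, or one PO every 3 minutes per buyer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriterion:
    """One signed-off, testable performance target (names are illustrative)."""
    scenario: str
    concurrent_users: int
    transactions: int
    window_minutes: float
    max_response_seconds: float

    def required_rate_per_minute(self) -> float:
        """Throughput the system must sustain across all users."""
        return self.transactions / self.window_minutes

    def minutes_between_transactions_per_user(self) -> float:
        """Pace each user works at, which also hints at realistic think time."""
        per_user = self.transactions / self.concurrent_users
        return self.window_minutes / per_user

    def passes(self, completed: int, slowest_response_seconds: float) -> bool:
        """Pass only if both the volume and the response-time target are met."""
        return (completed >= self.transactions
                and slowest_response_seconds < self.max_response_seconds)


# The purchase order target quoted above, expressed as data.
po_target = AcceptanceCriterion(
    scenario="Purchase order create and post",
    concurrent_users=15,
    transactions=150,
    window_minutes=30,
    max_response_seconds=4.0,
)

print(po_target.required_rate_per_minute())               # POs per minute, all buyers
print(po_target.minutes_between_transactions_per_user())  # minutes per PO, per buyer
```

If the business would rather judge the response time at a percentile than at the slowest transaction, change what you pass into `passes`. The point is that the rule is written down before the run, not argued about after it.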
"The system should be fast" cannot be tested. Do not let that into your scope.
3. Should you run manual performance tests before load testing?
Yes. Before anyone touches JMeter or any other load testing tool, walk a tester through your top ten scenarios manually. Time each step. Compare against legacy where you can.
This is the lowest-effort, highest-value performance work in the entire programme, and most teams skip it.
It surfaces the obvious bottlenecks early. A 14-second click path for one user becomes unusable for fifty, and you do not need a load testing tool to find that out. You also build the case for architectural fixes while they are still cheap. A performance bug spotted in week 8 is a code review. The same bug found in week 38 is a recovery exercise with the partner ecosystem watching.
The other thing manual testing buys you is a baseline. You cannot claim "the load test showed degradation" if you do not know what good looks like for one user.
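A baseline does not need tooling. A stopwatch and a few lines of script are enough to rank the steps that are slower than legacy. The sketch below assumes you have recorded one timing per step for both systems. The step names and numbers are placeholders for illustration, not measurements from any real programme.

```python
def slowest_steps(timings, threshold_ratio=1.0):
    """Return (step, new_s, legacy_s, ratio) for steps slower than legacy, worst first.

    timings maps a step name to (new_system_seconds, legacy_seconds).
    """
    rows = []
    for step, (new_s, legacy_s) in timings.items():
        ratio = new_s / legacy_s
        if ratio > threshold_ratio:
            rows.append((step, new_s, legacy_s, round(ratio, 2)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Placeholder stopwatch readings in seconds: (D365 F&O, legacy).
manual_timings = {
    "Open purchase order form": (2.0, 2.5),
    "Add ten order lines": (14.0, 7.0),
    "Confirm purchase order": (6.0, 4.0),
}

for step, new_s, legacy_s, ratio in slowest_steps(manual_timings):
    print(f"{step}: {new_s}s vs {legacy_s}s legacy ({ratio}x)")
```

Keep the single-user numbers. They are the "what good looks like" you will compare the load test against later.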
Use the tools that are already in the box. For existing tenants, that includes the Performance Timer in LCS, the Query Store, and Trace parser. For new D365 F&O projects routed through Power Platform Admin Center since February 2026, the same telemetry flows through Application Insights and the Monitoring and telemetry experiences that are replacing LCS monitoring. Most teams have never opened any of them. They are free, they are first-party, and they tell you exactly where the time is going.
4. Which load testing tool should you use for D365 F&O?
For programmes above ~50 concurrent users, this is the decision that drives setup cost and learning curve. Here is the reality.
PerfSDK was the recommended tool for years. It is now deprecated. Microsoft's guidance moved to Apache JMeter for D365 F&O load testing, and Microsoft has published a four-part blog series and a TechTalk walking through how to run it. JMeter is third-party (Apache, not Microsoft-supported), so the tooling is free but the responsibility for setup, scripts, and result analysis sits with you. Some teams prefer StresStimulus for a more guided UI experience. Both work. Both need someone who has done it before.
The principles below apply whichever tool you pick. They are not tool-specific. They are what separates a meaningful load test from a slide deck full of charts:
- A test mix that reflects what users actually do, not a flat distribution. The accounts payable team is not creating sales orders. Model the behaviour by persona.
- A network mix (LAN, WLAN, metered, mobile) so the test reflects how users will really connect. Performance testing from the corporate office does not predict performance for the warehouse pickers.
- A step load pattern to ramp users up gradually rather than slam the system. The breaking point matters less than the curve before it.
- Duration-based testing, not iteration-based. Run for an hour at peak load, not 200 iterations as fast as the agent can fire them. Real users do not work that way.
- Realistic think time between actions. Default settings on most tools assume robotic clicking. Real users pause to read, verify, and switch context, and that changes the load profile dramatically.
Skip any of these and your report is wallpaper. The numbers will not predict production. The tool you used to generate them does not change that.
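Three of those principles (persona mix, step load, think time) are just numbers you should be able to write down before opening any tool. The sketch below is tool-agnostic Python for planning the profile, not a JMeter script. The function names, persona weights, step sizes, and think-time range are all assumptions chosen for illustration.

```python
import random


def step_load_schedule(peak_users, step_users, step_seconds, hold_seconds):
    """Ramp up in steps, then hold at peak (duration-based, not iteration-based).

    Returns (steps, total_seconds); steps is a list of (start_second, active_users).
    Assumes peak_users and step_users are both positive.
    """
    steps = []
    start, users = 0, 0
    while users < peak_users:
        users = min(users + step_users, peak_users)
        steps.append((start, users))
        start += step_seconds
    total_seconds = steps[-1][0] + hold_seconds
    return steps, total_seconds


def users_per_persona(total_users, weights):
    """Split virtual users by integer weight, giving leftovers to the largest remainders."""
    total_weight = sum(weights.values())
    counts = {p: total_users * w // total_weight for p, w in weights.items()}
    leftover = total_users - sum(counts.values())
    by_remainder = sorted(
        weights, key=lambda p: total_users * weights[p] % total_weight, reverse=True
    )
    for persona in by_remainder[:leftover]:
        counts[persona] += 1
    return counts


def think_time(rng, mean_seconds, spread=0.5):
    """Randomised pause around a mean, so virtual users do not click in lockstep."""
    return rng.uniform(mean_seconds * (1 - spread), mean_seconds * (1 + spread))


# Placeholder profile: 50 users, added 10 at a time every 5 minutes, held for an hour.
steps, total_seconds = step_load_schedule(
    peak_users=50, step_users=10, step_seconds=300, hold_seconds=3600
)
mix = users_per_persona(
    50, {"buyer": 30, "ap_clerk": 20, "sales_order_clerk": 40, "warehouse_picker": 10}
)
rng = random.Random(42)
pause = think_time(rng, mean_seconds=10)

print(steps, total_seconds)
print(mix)
```

Whatever tool you pick, its configuration should be traceable back to a profile like this one, and the profile should be traceable back to the acceptance criteria.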
5. What environment, data and security setup is needed for D365 F&O performance testing?
This is where most "performance tests" quietly fail and nobody notices.
Use a dedicated, isolated, production-like environment for the run. Tier 4 or Tier 5 sandbox at minimum, in the same Azure region as your eventual production deployment. A cloud-hosted dev box is not a performance testing environment, no matter what the timeline says.
Use migrated production-volume data, or a representative dataset that matches it. Testing AP performance against 50 vendors when production will have 50,000 is worthless, and slightly insulting to the people you will be asking to sign off the results.
Run the tests under real security roles, not System Administrator. SysAdmin bypasses most of the security overhead that real users hit, and that overhead is one of the most common causes of unexpected slowdown after go-live.
Every extension, integration, and ISV solution needs to be deployed and active for the run. A clean baseline test does not predict your real performance, because your real solution is not a clean baseline.
6. How many performance test iterations should a D365 F&O programme run?
At least two. Expect to run the test once to find the issues and once to verify the fixes. Three or four iterations are normal on programmes that take it seriously.
The teams that do this well automate the loop through Azure DevOps release pipelines. One button. Overnight runs. Trend reports comparing this week's numbers against last week's. That is the bar to aim for. Anything less and your team will skip iterations under deadline pressure, which is the same as not testing at all.
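The trend report is the part teams tend to leave manual, and it is the easiest part to script. The sketch below shows the comparison such a pipeline step could run after each overnight test. It assumes each run produces a p95 response time per scenario. The 10% tolerance, the function name, and the sample figures are placeholders, not recommendations.

```python
def regressions(previous, current, tolerance=0.10):
    """Return {scenario: percent_slower} where p95 worsened by more than tolerance.

    previous and current map scenario name to p95 response time in seconds.
    Scenarios that are new in the current run are skipped (nothing to compare).
    """
    flagged = {}
    for scenario, now in current.items():
        before = previous.get(scenario)
        if before is None:
            continue
        change = (now - before) / before
        if change > tolerance:
            flagged[scenario] = round(change * 100, 1)
    return flagged


# Placeholder p95 figures in seconds from two consecutive runs.
last_week = {"PO create and confirm": 3.0, "Invoice post": 5.0}
this_week = {"PO create and confirm": 3.6, "Invoice post": 5.2}

for scenario, pct in regressions(last_week, this_week).items():
    print(f"REGRESSION {scenario}: p95 is {pct}% slower than the previous run")
```

A check like this can fail the pipeline run on its own, which means a regression gets noticed on Tuesday morning rather than in the week before cutover.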
7. How should you report performance test results to steering?
A screenshot of load test results is not a report. It is evidence. The report sits on top.
What works for steering: a one-page summary per scenario showing target response time, observed response time, percentile distribution (p50, p95, p99), pass or fail, and any blocking defects with owners. The detailed traces, query plans, and Trace parser output sit underneath for the engineers. Programme directors and CFOs need the verdict and the risk profile, not the diagnostics.
If you cannot put your performance results on one slide per scenario, you have not finished the analysis.
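Turning raw response times into that one-page verdict is mechanical. The sketch below uses the nearest-rank method for percentiles and, as an assumption of mine, judges pass or fail on p95 against the maximum acceptable time. Agree which percentile carries the verdict with the business beforehand. The sample timings are placeholders.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def scenario_summary(name, samples, target_seconds, max_seconds):
    """Build the per-scenario line for the steering report."""
    p50, p95, p99 = (percentile(samples, p) for p in (50, 95, 99))
    return {
        "scenario": name,
        "target_s": target_seconds,
        "max_s": max_seconds,
        "p50_s": p50,
        "p95_s": p95,
        "p99_s": p99,
        # Assumption: the verdict is judged on p95, not on the average.
        "verdict": "PASS" if p95 < max_seconds else "FAIL",
    }


# Placeholder response times in seconds for one scenario.
po_post_times = [2.0, 2.2, 2.5, 2.7, 3.0, 3.1, 3.3, 3.6, 3.9, 6.5]
summary = scenario_summary("PO post", po_post_times, target_seconds=3.0, max_seconds=4.0)
print(summary)
```

In this made-up sample the median sits on target while the tail blows through the maximum, which is why the percentile distribution belongs on the slide and the average does not.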
The honest bit
Performance testing is the test that protects every other piece of work in the programme. Architecture, build, data migration, training, change management. None of it matters if the system grinds to a halt the first Monday after go-live and the CFO is on the phone before lunch.
It also has the worst return on neglect of any test type in the programme. A performance issue caught in week 8 is a redesign and a code change. The same issue uncovered in week 38 is a stop-the-clock recovery exercise. The same issue arriving in week 1 of hypercare is an existential threat to the cutover, and to the SI.
If your D365 F&O programme plan does not have a named performance testing workstream, a dedicated environment in the right Azure region, a tooling decision per scenario type, criteria signed off by the business, and at least two iterations baked into the schedule before UAT closes, that is not a finalised plan. That is a hopeful one.
Hope is not a control.
Frequently asked questions
Is PerfSDK still supported in 2026?
No. Microsoft Learn confirms the Performance SDK has been unsupported since September 2024, following the end of load testing features in Visual Studio versions after 2019. Apache JMeter is now Microsoft's recommended replacement.
What does Microsoft now recommend for D365 F&O load testing?
Apache JMeter. Microsoft has published a four-part blog series and a dedicated TechTalk on running JMeter against finance and operations apps. JMeter is third-party and not Microsoft-supported, but Microsoft documents the integration pattern.
Can you do performance testing in a Tier 2 sandbox?
No. Tier 2 sandboxes do not have the resources to produce meaningful load test results. Microsoft recommends Tier 4 or Tier 5 sandbox environments, in the same Azure region as the planned production deployment, for performance testing.
How many iterations of performance testing should we plan?
At minimum, two. One run to find the issues, one to verify the fixes. Three to four iterations are normal on programmes that take performance seriously, and automation through Azure DevOps release pipelines makes that practical.
What is a realistic D365 F&O performance test scenario?
"15 concurrent buyers creating 150 purchase orders in 30 minutes, each PO posting in under 4 seconds, with the EDI sales order import batch running in parallel." That is the kind of statement that can pass, can fail, and that the business can sign off. Compare to "the system should be fast", which cannot.
If your D365 F&O programme is approaching cutover and performance testing is still a "we will get to it" line item, message me. An hour of independent review now buys you a quieter cutover and a CFO who still trusts the team in week one.