01 The problem with typical AI metrics

The most commonly reported AI KPIs share a characteristic: they measure activity rather than value. Number of active users tells you how many people have logged in. It tells you nothing about whether those people are doing more valuable work as a result. Licence utilisation tells you what proportion of licences are being used. It says nothing about whether those uses are generating business benefit.

User satisfaction scores tell you whether users like the tool. They do not tell you whether the tool is producing better outcomes than alternatives would. They are also systematically biased upward: people who use a tool frequently tend to report satisfaction with it regardless of its objective impact, because they have cognitively adjusted their expectations to match the reality.

A CFO who receives these metrics as evidence of AI programme success will, quite correctly, not find them compelling. The question a CFO needs answered is whether the business is better off for having made this investment. Utilisation and satisfaction metrics do not answer that question.

02 The three KPI categories that CFOs respect

Business-outcome AI KPIs fall into three categories that CFOs understand and respect.

Productivity KPIs measure the output produced per unit of input (typically time) for AI-enabled versus non-AI-enabled processes. They require a baseline measurement before AI deployment, a consistent measurement methodology, and conversion of time savings to financial value using loaded cost rates. The output is a cost-equivalent value that can be set against the AI investment cost.
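As a rough illustration of the conversion described above, the arithmetic can be sketched in a few lines of Python. The function names and every figure here (task times, annual volume, loaded rate, annual AI cost) are hypothetical and chosen only to show the calculation; a real programme would substitute its own measured baseline and finance-approved rates.

```python
def productivity_value(baseline_minutes, ai_minutes, tasks_per_year, loaded_hourly_rate):
    """Annual cost-equivalent value of the time saved per task.

    baseline_minutes: measured time per task before AI deployment
    ai_minutes: measured time per task with AI, using the same methodology
    loaded_hourly_rate: fully loaded cost per hour (salary plus overheads)
    """
    minutes_saved_per_task = baseline_minutes - ai_minutes
    hours_saved_per_year = minutes_saved_per_task * tasks_per_year / 60
    return hours_saved_per_year * loaded_hourly_rate


def net_benefit(annual_value, annual_ai_cost):
    """Cost-equivalent value set against the AI investment cost."""
    return annual_value - annual_ai_cost


# Hypothetical figures for illustration only.
value = productivity_value(
    baseline_minutes=45,
    ai_minutes=30,
    tasks_per_year=12_000,
    loaded_hourly_rate=60.0,
)
print(value)                            # 180000.0
print(net_benefit(value, 100_000.0))    # 80000.0
```

Whether saved hours are actually redeployed to valuable work is a separate question that this calculation does not answer; it only expresses the time saving in financial terms.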

Quality KPIs measure the accuracy, completeness, or compliance of AI-enabled outputs compared to the baseline. Error rates in financial processing, compliance rate in customer communications, completeness scores in due diligence reports: these are quality metrics that translate directly into financial value through the cost of errors, rejections, or non-compliance.
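One simple way to express a quality improvement in financial terms is to multiply the reduction in error rate by processing volume and by the average cost of an error. The sketch below assumes error rates are estimated from audited samples; the function names, sample sizes, volume, and cost per error are all hypothetical.

```python
def error_rate(errors_found, items_sampled):
    """Observed error rate in an audited sample."""
    if items_sampled <= 0:
        raise ValueError("items_sampled must be positive")
    return errors_found / items_sampled


def quality_value(baseline_rate, ai_rate, annual_volume, cost_per_error):
    """Annual value of errors avoided, assuming a constant cost per error."""
    errors_avoided = (baseline_rate - ai_rate) * annual_volume
    return errors_avoided * cost_per_error


# Hypothetical audit results: 40 errors per 1,000 items before AI, 25 after.
baseline = error_rate(40, 1_000)
with_ai = error_rate(25, 1_000)

annual_value = quality_value(baseline, with_ai, annual_volume=50_000, cost_per_error=35.0)
print(round(annual_value, 2))
```

A negative result would indicate that the AI-enabled process produces more errors than the baseline, which is as important to report as an improvement.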

Revenue KPIs measure the direct impact of AI on revenue-generating activities: sales cycle length, conversion rates, average deal value, customer retention rates, or new product development speed. These require more sophisticated measurement design (usually controlled comparison or A/B testing) but produce the most compelling ROI evidence when they can be attributed to AI deployment.
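For a controlled comparison of conversion rates, one standard approach is a two-proportion z-test, sketched below using only the Python standard library. This is a minimal illustration, assuming prospects were randomly assigned to an AI-assisted group and a control group; the function name and the counts are hypothetical, and a real analysis would also consider sample-size planning and test duration.

```python
import math


def two_proportion_z_test(conversions_control, n_control, conversions_ai, n_ai):
    """Compare conversion rates between a control group and an AI-assisted group.

    Returns (uplift, z, p_value), where uplift is the absolute difference in
    conversion rate and p_value is two-sided under a normal approximation.
    """
    rate_control = conversions_control / n_control
    rate_ai = conversions_ai / n_ai
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conversions_control + conversions_ai) / (n_control + n_ai)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_ai))
    if standard_error == 0:
        raise ValueError("cannot test when the pooled rate is 0 or 1")
    z = (rate_ai - rate_control) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_ai - rate_control, z, p_value


# Hypothetical counts: 100 of 1,000 control prospects converted, 130 of 1,000 with AI.
uplift, z, p_value = two_proportion_z_test(100, 1_000, 130, 1_000)
print(f"uplift={uplift:.3f}, z={z:.2f}, p={p_value:.3f}")
```

A small p-value suggests the difference is unlikely to be chance alone, but attribution to the AI still depends on the two groups being comparable in every other respect.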

03 Designing for measurement from the start

The most common reason AI programmes cannot produce CFO-credible KPIs is that measurement was not designed into the programme from the start. The AI was deployed, users adopted it, and six months later the question was asked: how do we demonstrate the value? By that point, the baseline data that would have enabled before-and-after comparison no longer exists, the control group that would have enabled attribution is not available, and the metrics that are being tracked are the ones that happened to be available rather than the ones that are relevant to the business case.

Designing measurement from the start means: identifying the specific business outcomes the AI is intended to improve before deployment, establishing baseline measurements of those outcomes before deployment, defining the measurement methodology that will be used to assess post-deployment performance, and identifying who is responsible for maintaining the measurement system and reporting against it.
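The four elements above can be captured as a simple pre-deployment checklist. The sketch below is one hypothetical way to record them; the class name, field names, and example values are assumptions for illustration, not a prescribed template.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MeasurementPlan:
    """Pre-deployment record of what will be measured, how, and by whom."""
    outcome: str                             # business outcome the AI should improve
    methodology: str                         # how post-deployment performance is assessed
    owner: str                               # who maintains and reports the measurement
    baseline_value: Optional[float] = None   # measured before deployment

    def missing_items(self) -> List[str]:
        """List the design elements that are still undefined."""
        missing = []
        if not self.outcome.strip():
            missing.append("outcome")
        if self.baseline_value is None:
            missing.append("baseline")
        if not self.methodology.strip():
            missing.append("methodology")
        if not self.owner.strip():
            missing.append("owner")
        return missing

    def ready_for_deployment(self) -> bool:
        return not self.missing_items()


# Hypothetical plan with no baseline measured yet.
plan = MeasurementPlan(
    outcome="Invoice processing time per item",
    methodology="Monthly timed sample of 200 invoices",
    owner="Finance operations lead",
)
print(plan.missing_items())         # ['baseline']
print(plan.ready_for_deployment())  # False
```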

This design work takes two to four weeks and is one of the highest-value investments an AI programme can make, because it determines whether the programme can demonstrate its value or must rely on advocacy.

04 The benchmark question

Beyond absolute performance, CFOs often want to know how AI investment compares to alternatives. If the same budget had been invested in additional headcount, in process improvement, or in a different technology, would the return have been higher?

Addressing this question requires knowing the counterfactual: what would have happened without the AI investment? For most AI programmes, this is genuinely difficult to establish because no controlled experiment has been run. The best available answer is typically a combination of sector benchmarks (what ROI are comparable organisations achieving from comparable AI investments?), internal comparison (what productivity improvement would have been achievable through alternative means at the same cost?), and honest attribution (what portion of the observed improvement is attributable to the AI specifically, rather than to other factors that changed at the same time?).
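Where a comparable team or process did not receive the AI, one common way to approximate honest attribution is a difference-in-differences calculation: the change in the AI-enabled group minus the change in the comparison group over the same period. The sketch below assumes such a comparison group exists and that both groups would otherwise have followed similar trends; the function names and figures are hypothetical.

```python
def difference_in_differences(ai_before, ai_after, comparison_before, comparison_after):
    """Estimate the change attributable to the AI.

    Subtracts the change seen in a comparison group (which captures other
    factors that changed at the same time) from the change in the AI group.
    """
    ai_change = ai_after - ai_before
    comparison_change = comparison_after - comparison_before
    return ai_change - comparison_change


def attributable_share(ai_before, ai_after, comparison_before, comparison_after):
    """Portion of the AI group's observed improvement attributed to the AI."""
    ai_change = ai_after - ai_before
    if ai_change == 0:
        raise ValueError("no observed change in the AI group")
    effect = difference_in_differences(
        ai_before, ai_after, comparison_before, comparison_after
    )
    return effect / ai_change


# Hypothetical cases closed per person per week.
effect = difference_in_differences(20, 26, 20, 22)
print(effect)  # 4
print(round(attributable_share(20, 26, 20, 22), 2))
```

In this illustration the AI group improved by six cases per week, but two of those would plausibly have happened anyway, so only four are attributed to the AI.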

A CFO who sees that an AI programme has honestly grappled with these questions, even if the answers are imperfect, has more confidence in the programme's credibility than one who is presented with metrics that have not engaged with the counterfactual at all.

Key Takeaways

1. Typical AI metrics (utilisation, satisfaction, prompt volume) measure activity rather than value and will not persuade CFOs or finance committees.
2. CFO-credible AI KPIs fall into three categories: productivity (output per time unit), quality (accuracy and error rate improvement), and revenue (direct commercial impact).
3. Measurement must be designed before deployment to establish the baselines that enable before-and-after comparison; retrofitting measurement after deployment produces inadequate evidence.
4. Honest engagement with the counterfactual question (what would have happened without the AI?) builds CFO credibility more than metrics that ignore alternative explanations.
5. The measurement design work (two to four weeks) is one of the highest-return investments an AI programme can make.

