Beyond the buzzwords: What does good quantitative evaluation actually look like?

This blog is the first in a series of short reflections on quantitative practice by Ciaran McDonald. The series starts by setting out what good quantitative evaluation looks like in practice, before exploring some of the areas where it can break down – including common pitfalls in survey design, how outcomes are translated into indicators, how to think about counterfactuals, and how to assess the strength of evidence when moving from data to claims.

‘Robust’, ‘data-driven’, ‘evidence-based’: these terms are used constantly in evaluation. They appear in proposals, reports and funding bids, often as shorthand for quality – but they are rarely defined. In practice, they are applied to work that varies significantly in how much it can actually tell us.

This matters because quantitative evidence carries particular weight. Numbers often create a sense of objectivity and certainty that is not justified by the underlying design, analysis or interpretation. Weak quantitative practice can therefore appear convincing, while stronger work – which is more careful about what can be claimed – can appear less definitive by comparison.

The risk is not just technical, but practical: decisions are often made on the basis of evidence that may not be as robust as it appears.

So what does ‘good’ quantitative evaluation actually look like?

In practice, good quantitative evaluation is not defined by the methods it uses, but by how clearly it connects three things: the question being asked, the data that is collected, and the claims that are made.

Where quantitative evaluation often falls short

Many of the most common issues in quantitative evaluation are not technical, but conceptual.

We see surveys that collect large amounts of data, but are only loosely connected to evaluation questions. We see outcomes reported as ‘improvements’ without a clear sense of what they are being compared against. We see analysis that is statistically sound, but doesn’t meaningfully inform a decision. And, most commonly, we see findings presented with a level of confidence that the data does not fully support.

None of these issues are unusual. They arise because quantitative methods are often treated as tools to be applied, rather than as part of a broader process of reasoning. The result is evaluation that looks rigorous on the surface, but is less clear about what can actually be concluded.

A more useful way to think about it

Good quantitative evaluation follows the same principles as good evaluation more broadly. The difference is not in the underlying evaluation logic, but in how that logic is translated into something we can measure and analyse.

At its core, this can be understood as a chain:

What are we trying to understand? → What data do we need to answer that? → What can we reasonably conclude from that data?

Where those links do not hold, the resulting conclusions are often weaker than they appear.

1. Start with the question – not the method

Quantitative methods are particularly well suited to answering questions about scale, distribution and change – for example, how many people were reached by a programme, how outcomes vary across groups, or how things have shifted over time. They are less well suited, on their own, to exploring experiences, understanding mechanisms or answering complex ‘why’ questions.

A common starting point, however, is not the question, but the method:

“We’ll run a survey.”

The risk is that data collection begins before there is a clear sense of what the data needs to do in relation to what the evaluation is trying to understand. The result is often a dataset that is detailed, but only loosely connected to the core questions and therefore not especially useful.

Good practice starts by being explicit about purpose:

What is the evaluation trying to inform?
What would a useful answer actually look like?

Only then does it make sense to decide what role quantitative evidence should play.

2. Make sure you’re measuring the right thing

Many evaluation outcomes are abstract – for example, confidence, wellbeing, resilience, community cohesion. What we measure are not these outcomes directly, but indicators of them.

The challenge is that the choice of indicator shapes the story the data can tell.

A single survey question may only partially capture a broader outcome. Satisfaction is often used as a proxy for impact. And measures are sometimes reused for consistency, even when they are not well aligned to the context.

Good quantitative design involves being deliberate about this translation:

If this outcome had genuinely changed, what would we expect to observe?

That question alone can significantly improve both data collection and interpretation.

3. Be clear about what you are comparing against

Most quantitative findings rely on a comparison, even if it isn’t made explicit.

When we say “outcomes improved” or “participants reported higher confidence”, the implicit question is: compared to what?

Without some form of counterfactual – an understanding of what would have happened anyway – it is difficult to interpret change with confidence. In some cases, this can be addressed through more robust designs, such as quasi-experimental approaches. In others, this is not feasible.

Even then, making the comparison explicit (even conceptually) strengthens the analysis. It helps clarify what the data can and cannot tell us.
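To make the ‘compared to what?’ question concrete, here is a minimal sketch in Python. The figures are invented for illustration, and the comparison group is hypothetical. It contrasts a simple before-and-after change with a difference-in-differences estimate, which is one common quasi-experimental approach. The sketch assumes that, without the programme, both groups would have changed by the same amount. That is a strong assumption and would need to be justified in any real evaluation.

```python
def before_after_change(before: float, after: float) -> float:
    """Change over time within one group, with no comparison."""
    return after - before


def difference_in_differences(
    p_before: float, p_after: float, c_before: float, c_after: float
) -> float:
    """Participant change minus comparison-group change.

    Assumes both groups would have followed the same trend
    in the absence of the programme (parallel trends).
    """
    return (p_after - p_before) - (c_after - c_before)


# Illustrative mean confidence scores on a 0-10 scale (not real data).
participants = {"before": 5.0, "after": 6.5}
comparison = {"before": 5.0, "after": 6.0}  # hypothetical comparison group

naive = before_after_change(participants["before"], participants["after"])
adjusted = difference_in_differences(
    participants["before"], participants["after"],
    comparison["before"], comparison["after"],
)

print(f"Before-and-after change: {naive}")               # 1.5
print(f"Difference-in-differences estimate: {adjusted}")  # 0.5
```

In this invented example, two thirds of the apparent improvement would have happened anyway. The same 1.5-point change supports a much more modest claim once the comparison is made explicit.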

4. Match your claims to the strength of your evidence

If design is about asking the right questions in the right way, interpretation is about being clear on what the answers mean.

Quantitative data is well suited to describing patterns: for example, how outcomes vary, how they change over time, and how different groups compare.

But moving from description to explanation – particularly to claims about impact – requires careful judgement.

An observed change does not, on its own, demonstrate that an intervention caused that change. And yet, this is a step that is often taken implicitly.

Good evaluation practice involves calibrating claims carefully:

What can we say with confidence?
What is indicative?
What remains uncertain?

This is less about being cautious for its own sake, and more about being clear about the strength of the evidence.
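One way to put numbers on ‘how confident?’ is to report an interval rather than a single figure. The sketch below is illustrative only. It uses invented survey counts and a normal-approximation 95% interval for the difference between two proportions, and it assumes two independent simple random samples (for example, separate baseline and follow-up respondents rather than the same people surveyed twice). Other interval methods exist and may be more appropriate for small samples.

```python
import math


def diff_in_proportions_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Return (difference, lower, upper) for p2 - p1.

    Normal-approximation interval; assumes independent samples.
    z = 1.96 corresponds to an approximate 95% interval.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p2 - p1
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


# Invented counts: respondents reporting 'high confidence' at each wave.
# Same 40% -> 50% shift, with 100 and then 1,000 responses per wave.
small = diff_in_proportions_ci(40, 100, 50, 100)
large = diff_in_proportions_ci(400, 1000, 500, 1000)

for label, (diff, low, high) in [("n=100", small), ("n=1000", large)]:
    spans_zero = low <= 0 <= high
    print(f"{label}: change={diff:.2f}, 95% interval=({low:.3f}, {high:.3f}), "
          f"includes zero: {spans_zero}")
```

With 100 responses per wave, the interval around the ten-point rise includes zero, so the change is better described as indicative. With 1,000 per wave it does not. Even then, the interval only tells us that the proportion changed; it says nothing about whether the intervention caused the change.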

5. Be transparent about limitations

All quantitative data comes with limitations: small samples, response bias, measurement choices and, in many cases, the absence of a clear counterfactual.

These limitations do not invalidate the findings, but they do shape how those findings should be interpreted.

Being transparent about these limitations is not a weakness. In fact, it is central to credibility. Decision-makers are better served by a clear understanding of the strength of the evidence than by over-confident claims that may not be fully supported.

A simple test

A useful way to bring these elements together is to ask a simple question early in the evaluation process:

What would we expect to see in the absence of this intervention?

If that question is difficult to answer, it is a signal that interpreting change – and making claims about impact – will also be difficult. That does not mean the evaluation cannot proceed, but it does mean that conclusions should be framed with appropriate caution.

Beyond methods

Good quantitative evaluation is not a matter of complexity. It depends less on the specific methods used than on the quality of thinking that underpins them.

Advanced techniques can strengthen an evaluation when they are well applied, but they cannot compensate for unclear questions, weak alignment between data and purpose, or over-interpretation of results. Conversely, even relatively simple data can be used effectively when it is well aligned to the evaluation question and interpreted with care.

Good quantitative evaluation is not defined by methodological sophistication, but by disciplined thinking that connects questions, data and claims, and is honest about what the evidence can genuinely support.

In next month’s blog, Ciaran will explore why a survey can look perfectly sensible yet still fall short in answering your evaluation question – and what stronger survey design looks like in practice.