

Harnessing the potential of natural language processing for MEL

Working in partnership, Itad and CASM Technology have been exploring the power of natural language processing (NLP) techniques to generate evidence for use in evaluations since 2020. Here they share lessons for others considering a similar approach.

When evaluating a particular initiative, we are sometimes interested in understanding the discourse around it, as expressed in online news or social media. However, making sense of such unstructured data at a meaningful scale using traditional methods is prohibitively costly and time-consuming.

Instead, we can use natural language processing (NLP) to translate large volumes of such text into quantitative data. Trends and patterns can then be identified and, after triangulation with other data, used to develop evaluation insights.

Using natural language processing within our evaluations

Itad is working with CASM Technology to apply such techniques. For example, we are currently collaborating on using NLP to examine social media activity related to a global advocacy campaign. We are applying topic modelling to identify the main themes used by different advocates in the campaign, the overlap between those themes, and their reach.
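The post does not describe how themes, overlap and reach are calculated. The sketch below is one simple way to do it, assuming each post has already been assigned a single topic label by a topic model (the model itself is not shown) and carries a per-post reach figure. Overlap between two advocates is measured here with Jaccard similarity, which is our illustrative choice. All advocate names, topics and numbers are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical posts: (advocate, topic label from a topic model, reach of the post).
posts = [
    ("advocate_a", "funding", 1200),
    ("advocate_a", "policy", 800),
    ("advocate_a", "funding", 500),
    ("advocate_b", "policy", 3000),
    ("advocate_b", "health", 700),
    ("advocate_c", "health", 250),
]


def theme_profile(posts):
    """For each advocate, count how often each topic appears in their posts."""
    profile = {}
    for advocate, topic, _ in posts:
        profile.setdefault(advocate, Counter())[topic] += 1
    return profile


def jaccard(a, b):
    """Overlap of two topic sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topic_reach(posts):
    """Total reach per topic, summed across all advocates' posts."""
    reach = Counter()
    for _, topic, r in posts:
        reach[topic] += r
    return reach


profile = theme_profile(posts)
# Pairwise theme overlap between every pair of advocates.
overlap = {
    (x, y): jaccard(set(profile[x]), set(profile[y]))
    for x, y in combinations(sorted(profile), 2)
}
reach = topic_reach(posts)
```

With this toy data, advocates A and B share one of their three combined themes (overlap of one third), while "policy" has the largest total reach.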

Prior to this, Itad and CASM Technology partnered as part of a study to evaluate an organisation’s mobilisation of investors in frontier markets in Asia and Africa. We used CASM Technology’s own ‘Method52’ platform to explore how investor sentiment in these markets had changed over time – combining this with more traditional evaluation data related to the mobilisation efforts.

In both cases, we trained a machine learning algorithm on data scraped from online news or social media, using a small subset that researchers within the evaluation team had classified by hand. Once trained, the algorithm was used to process a large volume of news or social media content, and the results were used to identify topics of interest or to determine the extent to which positive or negative sentiment was being expressed among different groups.
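The post does not say which algorithm Method52 uses, so the following is only a minimal stand-in for the "train on a small labelled subset, then apply to the rest" workflow: a multinomial Naive Bayes classifier written from scratch. The headlines and labels are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.class_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Small hand-labelled subset (hypothetical headlines).
labelled = [
    ("investors upbeat as growth strengthens", "positive"),
    ("strong inflows boost market confidence", "positive"),
    ("currency slump deepens investor fears", "negative"),
    ("weak growth and rising debt worry markets", "negative"),
]
model = NaiveBayes().fit([t for t, _ in labelled], [l for _, l in labelled])

# Apply the trained model to the (much larger, in practice) unlabelled corpus.
unlabelled = ["confidence returns as inflows strengthen", "debt fears hit the currency"]
predictions = [model.predict(t) for t in unlabelled]
```

In a real study the labelled subset would be far larger and the classifier would be validated against held-out human labels before being run across the full corpus.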

In our sentiment analysis work, we identified several key findings relating to the application of the NLP technology. Firstly, although investor sentiment is subtle, it is possible to codify it and to successfully train an algorithm to identify positive and negative sentiment. Secondly, we were able to compare sentiment between countries and markets and identify differences between them (e.g. in the strength of sentiment and its level of volatility).
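The post does not give the formula behind the sentiment index. As an assumption for illustration, the sketch below turns classifier output into a monthly net sentiment score per country, (positive − negative) / (positive + negative), then summarises each country's strength as the mean of that series and its volatility as the population standard deviation. Countries, months and labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical classifier output: (country, month, label) for each relevant article.
classified = [
    ("country_x", "2020-01", "positive"),
    ("country_x", "2020-01", "positive"),
    ("country_x", "2020-01", "negative"),
    ("country_x", "2020-02", "positive"),
    ("country_x", "2020-02", "positive"),
    ("country_y", "2020-01", "negative"),
    ("country_y", "2020-01", "negative"),
    ("country_y", "2020-02", "positive"),
    ("country_y", "2020-02", "negative"),
]


def net_sentiment(labels):
    """(positive - negative) / (positive + negative), from -1 to +1; 0 if neither."""
    pos, neg = labels.count("positive"), labels.count("negative")
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def monthly_index(classified):
    """Group labels by country and month, and score each group."""
    groups = defaultdict(list)
    for country, month, label in classified:
        groups[(country, month)].append(label)
    index = defaultdict(dict)
    for (country, month), labels in groups.items():
        index[country][month] = net_sentiment(labels)
    return index


def summarise(index):
    """Per country: mean level (strength) and standard deviation (volatility)."""
    return {
        country: {
            "strength": mean(list(series.values())),
            "volatility": pstdev(list(series.values())),
        }
        for country, series in index.items()
    }


index = monthly_index(classified)
summary = summarise(index)
```

Here country X is both more positive and less volatile than country Y, which is the kind of cross-country contrast described above.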

Thirdly, to rule out the possibility that our analysis was capturing noise or discussion of past events, we assessed whether sentiment was predictive of investment, stock market prices, bond yields and exchange rates. The strongest finding was that sentiment is predictive of net investment flows in all three countries. As net investment inflows (i.e. liabilities) are the most comprehensive measure of investment available to us, this gives us confidence that our sentiment index is capturing something ‘real’ and is a leading indicator of investment.
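The post does not say which statistical test was used. A simple first look at whether sentiment "leads" investment is to correlate sentiment in period t with flows in period t + lag and see where the correlation peaks. The sketch below does this. It is illustrative only: a more rigorous test (for example a Granger-causality regression that also controls for past flows) would be needed to support a predictive claim. The series are synthetic, with flows built to follow sentiment by exactly one period.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlations(sentiment, flows, max_lag):
    """Correlate sentiment at t with flows at t + lag, for lag = 0..max_lag.

    A peak at a positive lag is consistent with sentiment leading flows,
    but on its own it is not evidence of causation.
    """
    return {
        lag: pearson(sentiment[: len(sentiment) - lag], flows[lag:])
        for lag in range(max_lag + 1)
    }


# Synthetic monthly series: flows[t] = 20 * sentiment[t - 1] (flows[0] is arbitrary).
sentiment = [0.1, 0.4, -0.2, 0.3, -0.1, 0.5]
flows = [0.0, 2.0, 8.0, -4.0, 6.0, -2.0]

corr = lagged_correlations(sentiment, flows, max_lag=2)
best_lag = max(corr, key=corr.get)
```

With real data, each lag also shortens the overlapping sample, so correlations at longer lags rest on fewer observations.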

Learnings and recommendations for those considering a similar approach

In using NLP in this way, we have learned much that may benefit others interested in this approach:

  • NLP is most effective when the ‘human’ touch is retained: NLP techniques appear to work best when members of the evaluation team are actively involved in training and refining the model, which helps to reduce the ‘black box’ effect of machine learning. This has particularly been the case with CASM Technology’s Method52 platform. Staying involved increases the evaluation team’s confidence in, and understanding of, the methodology and findings, and avoids the sense that the analysis has simply been handed over to the ‘machine’.
  • NLP is of value when exploring online media at scale: Applying NLP in evaluations makes it possible to explore, and draw findings from, data that would otherwise be difficult to access at scale. It offers valuable insight into what online news and social media are saying, and gives visibility of how a topic or programme of interest is being discussed in spaces that are traditionally difficult to measure at scale. For example, in the sentiment analysis work we applied NLP to a dataset of 1.8 million online articles from 41 publications spanning 25 years. From this, we classified 165,000 articles as relevant to investment and used these for the analysis.
  • NLP should be used to complement, not replace, other methods: NLP is effective for understanding online outputs (both quantitative and qualitative), but may be less suited to providing evidence of outcomes in and of itself. It can therefore be difficult to evaluate behavioural change or establish causality through NLP research alone, so it should be complemented by other research methods in the evaluation process.
  • There remains a tension between innovation and practical application: In the evaluations where we have used NLP, there has been some scepticism about the technology, particularly as it is unfamiliar to many people. At the same time, there is often a desire to jump to results without fully appreciating the innovation involved. This focus on the end product and its practical application can limit the scope to fully evolve and refine an approach.
  • Ensure the NLP approach is fully embedded in the evaluation design from the outset: Avoid a siloed approach. NLP should be properly sequenced within the evaluation so that NLP and non-NLP research components build on one another. As part of this, the evaluation team should be fully briefed on the role and possibilities of NLP in the evaluation and on how it will complement the other methods being used.
  • Keep in mind the ‘so what’: The technical nature of NLP takes up much of the focus, but it is important to keep coming back to the questions ‘so what does this analysis tell us?’ and ‘how can this analysis be used?’. The ability to use NLP to capture and process thousands (even millions) of media articles to assess sentiment and model topics is a significant advance. However, an isolated assessment in one location (in our example, a single country’s investment market) may not on its own provide enough granular insight to be clearly superior to the work of a locally based research analyst. Instead, the value comes from processing vast quantities of data in near real time and comparing across markets and countries, then narrowing in on patterns around which to ask the ‘so what’ questions.
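The first point above, keeping evaluators involved in training and refining the model, is often put into practice through active learning. We do not know how Method52 implements this internally; the sketch below is a generic illustration of two steps, with hypothetical article ids, probabilities and labels. First, uncertainty sampling picks the articles the model is least sure about so that a human coder labels those next. Second, a simple agreement rate between model and human labels gives the team a transparent check on how far to trust the model.

```python
def most_uncertain(probabilities, k):
    """Return the ids of the k items whose P(positive) is closest to 0.5.

    probabilities: dict mapping item id -> the model's estimated P(positive).
    These are the items a human coder would be asked to label next.
    """
    return sorted(probabilities, key=lambda i: abs(probabilities[i] - 0.5))[:k]


def agreement(model_labels, human_labels):
    """Share of human-coded items on which the model gave the same label."""
    shared = [i for i in human_labels if i in model_labels]
    if not shared:
        return 0.0
    return sum(model_labels[i] == human_labels[i] for i in shared) / len(shared)


# Hypothetical model scores for five unlabelled articles.
probs = {"a1": 0.97, "a2": 0.52, "a3": 0.10, "a4": 0.45, "a5": 0.80}
to_label = most_uncertain(probs, k=2)

# Hypothetical spot check: model labels versus a human coder's labels.
model_labels = {"a1": "positive", "a3": "negative", "a5": "positive"}
human_labels = {"a1": "positive", "a3": "negative", "a5": "negative"}
rate = agreement(model_labels, human_labels)
```

Repeating this loop (label the uncertain items, retrain, re-check agreement) is one way to keep the team close to the model rather than treating it as a black box.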

Catalysing conversations for systemic change

NLP technology has undergone dramatic changes over the last few years, and continues to advance at a rapid pace. It is increasingly clear that this technology can play an important role in the collection and analysis of very large online datasets.

As this capability develops, it will be crucial that we ensure greater transparency in the use of NLP techniques. The risk is that NLP (and other data science) is handed over to data experts using very technical approaches – which are both hard to replicate and difficult to challenge from the outside.

We need to find better ways that allow non-NLP experts (including both evaluators and commissioners) to interrogate the data used and the analysis process, so that we all have more confidence in the findings.

At the same time, we need to improve the way we blend methods, including how we sequence them within evaluations. Ideally, this should allow us to identify patterns and themes that then become the focus of more in-depth enquiry into ‘how’ and ‘for whom’ questions (potentially using more traditional methods).

Meanwhile, we look forward to sharing our experiences with the wider sector, and to discussing the approach further at the upcoming panel event hosted by Itad on technical innovations in MEL for systemic change. We would also love to hear from others who are employing similar approaches, so please share your experiences and perspectives by joining the event and contributing your views on the value of using NLP techniques within evaluation.

Chris Perry is a Principal Consultant at Itad, Chris Barnett is a Partner at Itad, and Jon Jones is an Analyst at CASM Technology.