
The CSO’s household budget survey transformed through intelligent data capture

Discover how Inpute partnered with Ireland's Central Statistics Office (CSO) to intelligently capture and validate a variety of printed store till receipts. The unstructured nature of the data being captured made this project particularly challenging.

Project at a glance

Challenge we helped to solve:

The CSO carries out extensive research on economic and social activities in Ireland. It found that capturing and collating data on larger projects was becoming increasingly difficult. Data accuracy was a major concern, as it needed to capture and index 100,000 pieces of unformatted till receipt data that varied hugely from store to store.

Solution delivered:

Leveraging expertise in intelligent data capture and forms recognition, the Inpute team built a solution that automatically and intelligently extracts data from a variety of document types, finding it by key words and phrases wherever it appears on the document.

Business results we delivered:

  • Eliminated the need for manual data entry, which took months and required extra resources
  • Ability to accurately handle a high variety of till receipts with improved data quality
  • Integration with existing systems
  • Positive feedback from survey staff on ease of use

“The solution has helped to transform the survey processing operation... The system doesn’t require a template for each receipt type. Instead, it can intelligently search the content, extract the relevant data and ignore the irrelevant items.”

John O’Reilly, Central Statistics Office

Challenge

Every five years a random sample of 10,000 households is polled about its expenditure patterns

The household budget survey (HBS) is among the largest and most important of the data collection programmes which the Central Statistics Office in Ireland carries out. Every five years a random sample of 10,000 households is polled about its expenditure patterns. The aim is to determine in detail the pattern of household expenditure in order to update the consumer price index.

Inpute was tasked with devising a system which would automate and streamline the processing of 100,000 pieces of highly varied, unformatted data and deliver substantial cost and resource savings.

John O’Reilly of IT corporate systems at the CSO explains that each household member over the age of 16 is asked to maintain a detailed diary of their expenditure over a two-week period.

Participants were encouraged to return till receipts to the CSO in lieu of entering handwritten detail in the diary booklet. This reduced the burden on respondents while also enhancing the accuracy of the information collected. “It does however add another layer of complexity to the data processing operation,” says O’Reilly.

Information from the expenditure diaries was already being captured by a form recognition software solution which Inpute had implemented some years earlier. For structured templates like the diaries, it worked perfectly. As the system stood, however, it could not automatically capture data from till receipts. Before the new solution, the 2015 receipts data was keyed in manually by a team of data entry operators over a period of months.

Variety is the issue here. Till receipts vary hugely from store to store. Totals, discounts, dates and numbering sequences appear in different places on each receipt. Identical products are described in different ways, and many receipts also carry marketing messages unrelated to the underlying price data. In short, no two till receipts are the same, and the paper itself is invariably of poor quality, fading quickly and creasing easily.

Solution

Tailored intelligent capture software that merges seamlessly with existing systems

“A solution which could deal with the unstructured nature of the till receipts seemed highly unlikely,” says John O’Reilly.

Following a competitive tendering process, Inpute was selected as the preferred solution provider. “They proposed the introduction of software with advanced character recognition capabilities, designed to transform information from unstructured documents, such as receipts, into machine readable data.”

Integrating with existing systems was a key deliverable for the CSO. It was vital that the intelligent capture solution which Inpute provided worked with the existing form recognition software which would continue to capture the diary data. These twin data sets – receipts and diaries – needed to be linked and directly traceable to the source documents.
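The linking requirement can be pictured with a small sketch. It is illustrative only and not Inpute's implementation: the `CapturedRecord` class, its field names and the `household_id` key are all assumptions made for the example. The idea is that every captured value, whether from a diary or a receipt, carries a reference to the scanned document it came from.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturedRecord:
    """One captured expenditure line (hypothetical structure)."""
    household_id: str   # assumed shared key between diaries and receipts
    source_type: str    # "diary" or "receipt"
    source_image: str   # reference to the scanned source document
    description: str
    amount_cents: int


def group_by_household(records):
    """Link diary and receipt records that belong to the same household."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.household_id].append(record)
    return dict(grouped)
```

Because each record keeps its `source_image` reference, any figure in the combined data set can be traced back to the page or receipt it was read from.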

Inpute began working closely with the CSO to agree the specification, and then set about tailoring its intelligent capture software so that it merged seamlessly with the CSO’s existing systems. Implementation went very smoothly; minor issues which arose during the UAT phase were quickly and effectively dealt with.

Result

Difference we delivered

“The solution has helped to transform the survey processing operation,” says John O’Reilly. “The system doesn’t require a template for each receipt type. Instead, it can intelligently search the content, extract the relevant data and ignore the irrelevant items. Now, instead of keying every line item, you only have to correct those items flagged by the software. The recognition capabilities of the solution meant that the resources needed to process the survey were significantly reduced. It’s definitely a much more efficient process.”
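The template-free approach described in the quote can be illustrated with a minimal sketch. It is not the product's actual logic: the keyword lists, the price pattern and the `extract_receipt` function are assumptions chosen for the example, and the input is taken to be text lines already produced by character recognition. Lines are classified by their content rather than their position, noise is ignored, and anything that looks like data but cannot be read is flagged for an operator.

```python
import re
from decimal import Decimal

# Hypothetical keyword lists; a real system would tune these extensively.
TOTAL_KEYWORDS = ("total", "balance due", "amount due")
IGNORE_KEYWORDS = ("thank you", "points", "www.")
# A price is assumed to be the last token on a line, e.g. "1.99".
PRICE_PATTERN = re.compile(r"(\d+\.\d{2})\s*$")


def extract_receipt(lines):
    """Return (items, total, flagged) from recognised receipt text lines."""
    items, flagged, total = [], [], None
    for line in lines:
        text = line.strip()
        lowered = text.lower()
        if not text or any(k in lowered for k in IGNORE_KEYWORDS):
            continue  # marketing messages and other irrelevant content
        match = PRICE_PATTERN.search(text)
        if match is None:
            # Digits without a readable price suggest a misread line:
            # leave it for an operator instead of guessing.
            if any(ch.isdigit() for ch in text):
                flagged.append(text)
            continue
        amount = Decimal(match.group(1))
        if any(k in lowered for k in TOTAL_KEYWORDS):
            total = amount
        else:
            items.append((text[:match.start()].strip(), amount))
    return items, total, flagged
```

In this sketch an operator would only need to look at the entries in `flagged`, which mirrors the shift from keying every line item to correcting only the items the software could not read.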

Feedback from survey staff has been very positive, with ease of use being the key benefit. An image of the receipt is presented side-by-side with a table of what the software has read, making the interface easy to navigate and correct.

“The project was delivered on time and within budget,” says John O’Reilly. “The capabilities of the software coupled with the edits and checks which Inpute built into the system have contributed to the overall data quality. The system is now more flexible and robust.”
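The case study does not say which edits and checks were built in. As a hedged illustration of the kind of rule such a system might apply, the sketch below uses a hypothetical `check_receipt` function and an assumed one-cent tolerance to test whether the captured line items add up to the captured total.

```python
from decimal import Decimal


def check_receipt(items, total, tolerance=Decimal("0.01")):
    """Return a list of problems found; an empty list means all checks passed.

    `items` is a list of (description, amount) pairs and `total` is the
    captured receipt total, or None if no total was found.
    """
    problems = []
    if not items:
        problems.append("no line items captured")
    if total is None:
        problems.append("no total captured")
    for description, amount in items:
        if not description:
            problems.append(f"item priced {amount} has no description")
    if items and total is not None:
        difference = abs(sum(amount for _, amount in items) - total)
        if difference > tolerance:
            problems.append(f"items differ from total by {difference}")
    return problems
```

A receipt that returns any problems would be routed to an operator for review, while a clean receipt could pass straight through.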

He concludes: “Inpute’s solution has transformed our data processing operation for the better. We were impressed with the quality of the product and the support provided by the Inpute team.”


This was a fascinating project. It involved extracting and validating data based on keywords or phrases from documents that vary in size from A4 down to till receipts. The data is also in variable locations on each document. So many use cases could benefit from this type of solution.

Chris Howard, CEO, Inpute