Central Statistics Office

Automating the collection of information from store receipts

The Challenge

The Household Budget Survey (HBS) is one of the largest and most important data collection programmes carried out by the Central Statistics Office (CSO) in Ireland. Every five years, a random sample of 10,000 households is polled about its expenditure patterns. The aim is to determine the pattern of household expenditure in detail in order to update the Consumer Price Index.

Inpute was tasked with devising a system which would automate and streamline the processing of 100,000 pieces of highly varied, unformatted data and deliver substantial cost and resource savings.

John O’Reilly of IT Corporate Systems at the CSO explains that each household member over the age of 16 is asked to maintain a detailed diary of their expenditure over a two-week period.

Participants were encouraged to return till receipts to the CSO in lieu of entering handwritten details in the diary booklet. This reduces the burden on respondents while also improving the accuracy of the information collected.

“It does however add another layer of complexity to the data processing operation,” says O’Reilly.

Information from the expenditure diaries was already being captured by Teleform, a form recognition software solution which Inpute had implemented some years earlier. For structured templates like the diaries, it worked perfectly. As the system stood, however, it could not automatically capture data from till receipts. Prior to HBS 2015, receipt data was keyed in manually by a team of data entry operators over a period of months.

Variety is the issue here. Till receipts vary hugely from store to store. Totals, discounts, dates and numbering sequences appear in different places on different receipts. Identical products are described in different ways, and many receipts also carry marketing messages unrelated to the underlying price data. In short, no two till receipts are the same, and the paper itself is invariably of poor quality: it fades quickly and creases easily.

The Solution

“A solution which could deal with the unstructured nature of the till receipts seemed highly unlikely,” says John O’Reilly.

Following a competitive tendering process, Inpute was selected as the preferred solution provider. “They proposed the introduction of software with advanced character recognition capabilities, designed to transform information from unstructured documents, such as receipts, into machine readable data.”

Integrating with existing systems was a key deliverable for the CSO. It was vital that the intelligent capture solution which Inpute provided worked with the existing form recognition software which would continue to capture the diary data. These twin data sets – receipts and diaries – needed to be linked and directly traceable to the source documents.

Inpute worked closely with the CSO to agree the specification, and then set about customising its intelligent capture software so that it integrated seamlessly with the CSO’s existing systems. Implementation went very smoothly; minor issues which arose during the user acceptance testing (UAT) phase were dealt with quickly and effectively.

The Benefit

“The solution has helped to transform the survey processing operation,” says John O’Reilly. “The system doesn’t require a template for each receipt type. Instead, it can intelligently search the content, extract the relevant data and ignore the irrelevant items. Now, instead of keying every line item, you only have to correct those items flagged by the software. The recognition capabilities of the solution meant that the resources needed to process the survey were significantly reduced. It’s definitely a much more efficient process.”

Feedback from HBS staff has been very positive, with ease of use cited as the key benefit. An image of the receipt is presented side by side with a table of what the software has read, making it easy to navigate the interface and correct errors.

“The project was delivered on time and within budget,” says John O’Reilly. “The capabilities of the software coupled with the edits and checks which Inpute built into the system have contributed to the overall HBS data quality. The system is now more flexible and robust.”

He concludes: “Inpute’s solution has transformed our data processing operation for the better.  We were impressed with the quality of the product and the support provided by the Inpute team.”