Monday, 09 September 2019
The new Cognitive Services Form Recognizer API has a pre-built receipt recognition model. I investigated it for a customer, and this article covers what I found out: how it works, what works well and what is not so good.
#cognitive-services #document-recognition #ocr
This article was published on GitHub. It is open source, and you can make edits, comments etc.
One of the latest Cognitive Services is Form Recognizer, a preview API built to help extract data from electronic forms.
Specifically, Form Recognizer can extract text, key/value pairs and tables from documents and receipts.
There are two modes of operation for Form Recognizer:
- Analyze Form: train a custom model on your own forms and use it to extract key/value pairs and tables.
- Analyze Receipt: use a pre-built model to extract common fields from retail receipts, with no training required.
I've been working with a customer that uses general retail receipts for market research purposes and had a requirement to extract specific data points from photos of receipts. I did some research around the Analyze Receipt function, and this article is what I learnt.
There are common fields found in most retail receipts such as retailer, amount, date etc.
The receipts model looks to extract these common fields; specifically, it can identify the following:
- `MerchantName`
- `MerchantAddress`
- `MerchantPhoneNumber`
- `TransactionDate`
- `TransactionTime`
- `Subtotal`
- `Tax`
- `Total`
Interestingly, the receipts model does not contain any special functionality for recognizing line items within a receipt.
The raw OCR data for line items is included, but line items are not explicitly extracted as 'understood fields'. See the Missing Features section later in this article for details.
The OCR results are the same as those you'd get from the Cognitive Services Computer Vision API.
Because Form Recognizer is still in private preview, you must request access via the Form Recognizer access request form. When access is granted, you should receive an email with a specific link to create the Form Recognizer resource in your Azure subscription.
When I wrote this article (September 2019), the link was as follows: https://portal.azure.com/?microsoft_azure_marketplace_ItemHideKey=microsoft_azure_cognitiveservices_formUnderstandingPreview#create/Microsoft.CognitiveServicesFormRecognizer
Note: you will not find Form Recognizer by searching or browsing in the 'Create' menu whilst it is still a preview service.
Once you have provisioned the service, you'll have your own service endpoint, which will be something like `https://whateveryoucalledit.cognitiveservices.azure.com`. There will also be a key that you'll need; you can get both of these by looking in the Azure Portal in the Quick Start section for the resource.
The Analyze Receipt function uses a two-part approach: you initially post the Analyze Receipt request and then poll Get Receipt Result until you get a 200 response (I suspect that this may be tidied up into a single call before the service goes into general availability).
You can `POST` an image containing a receipt, with your key in the `Ocp-Apim-Subscription-Key` header, to Analyze Receipt. The response will contain a header called `Operation-Location`, which contains the URL you will need to get the result.
When you have `Operation-Location`, you can send a `GET` request to whatever its value was (see Get Receipt Result). The response will be a JSON payload containing all the recognized data.
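To make the flow concrete, here's a minimal Python sketch using the `requests` library. The endpoint, key and file name are placeholders, and the route and status values are taken from the preview quick-start at the time of writing, so treat them as assumptions and check the current docs:

```python
# Minimal sketch of the two-step Analyze Receipt flow (preview API).
# ENDPOINT, KEY and receipt.jpg are placeholders; the route below is
# from the preview quick-start and may change before GA.
import time
import requests

ENDPOINT = "https://whateveryoucalledit.cognitiveservices.azure.com"
KEY = "<your-key-from-the-quick-start-blade>"
ANALYZE_URL = f"{ENDPOINT}/formrecognizer/v1.0-preview/prebuilt/receipt/asyncBatchAnalyze"

# Step 1: POST the receipt image; the response carries the
# Operation-Location header that you then poll for the result.
with open("receipt.jpg", "rb") as image:
    response = requests.post(
        ANALYZE_URL,
        headers={"Ocp-Apim-Subscription-Key": KEY, "Content-Type": "image/jpeg"},
        data=image.read(),
    )
response.raise_for_status()
operation_location = response.headers["Operation-Location"]

# Step 2: poll Get Receipt Result until the analysis has finished.
while True:
    payload = requests.get(
        operation_location,
        headers={"Ocp-Apim-Subscription-Key": KEY},
    ).json()
    if payload.get("status", "").lower() in ("succeeded", "failed"):
        break
    time.sleep(1)
```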
The response is split into two sections:
- `recognitionResults`: the raw OCR output (the lines and words found in the image).
- `understandingResults`: the specially recognized receipt fields listed earlier.
Here is the `understandingResults` section from one of my test receipt images:

```json
"understandingResults": [
{
"pages": [
1
],
"fields": {
"Subtotal": null,
"Total": {
"valueType": "numberValue",
"value": 28.4,
"text": "28.40",
"elements": [
{
"$ref": "#/recognitionResults/0/lines/23/words/0"
},
{
"$ref": "#/recognitionResults/0/lines/23/words/1"
},
{
"$ref": "#/recognitionResults/0/lines/23/words/2"
}
]
},
"Tax": null,
"MerchantAddress": null,
"MerchantName": {
"valueType": "stringValue",
"value": "The Curator",
"text": "The Curator",
"elements": [
{
"$ref": "#/recognitionResults/0/lines/0/words/0"
},
{
"$ref": "#/recognitionResults/0/lines/0/words/1"
}
]
},
"MerchantPhoneNumber": null,
"TransactionDate": {
"valueType": "stringValue",
"value": "2019-02-19",
"text": "19 Feb 2019",
"elements": [
{
"$ref": "#/recognitionResults/0/lines/7/words/0"
},
{
"$ref": "#/recognitionResults/0/lines/7/words/1"
},
{
"$ref": "#/recognitionResults/0/lines/7/words/2"
}
]
},
"TransactionTime": {
"valueType": "stringValue",
"value": "14:46:00",
"text": "14:46",
"elements": [
{
"$ref": "#/recognitionResults/0/lines/7/words/3"
}
]
}
}
}
]
```
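Each recognized field carries an `elements` array of JSON `$ref` pointers back into the raw OCR data, so you can recover the exact words behind a value. A minimal sketch, assuming the full Get Receipt Result response has been parsed into a dict called `payload` (the helper names here are mine, not part of any SDK):

```python
# Resolve the '$ref' JSON pointers in understandingResults against the
# recognitionResults OCR data held in the same response payload.
def resolve_ref(payload: dict, ref: str):
    """Walk a '#/recognitionResults/0/lines/23/words/0' style pointer."""
    node = payload
    for part in ref.lstrip("#/").split("/"):
        node = node[int(part)] if part.isdigit() else node[part]
    return node

def field_words(payload: dict, field: dict) -> list:
    """Return the OCR word texts that make up a recognized field."""
    return [resolve_ref(payload, el["$ref"])["text"] for el in field.get("elements", [])]

# Example usage (assuming `payload` holds the parsed response):
# fields = payload["understandingResults"][0]["fields"]
# print(field_words(payload, fields["Total"]))  # the words behind Total
```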
You can see the official quick-start steps for testing the service out here: https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/curl-receipts
I tested the receipts API with several different receipts from general purpose retailers in the UK, ranging from major high street stores like Halfords to airport bars and restaurants. You can see them all for yourself in my GitHub Content repository.
NOTE: My testing was done on the preview API in July 2019 and I expect the results and accuracy to improve over time as the service matures. Your mileage may vary!
I found that, generally speaking, the API was able to accurately extract some of the `understandingResults` information from each receipt, but it was rarely able to extract every field.
I also found that the set of fields identified was inconsistent, varying from receipt to receipt.
The accuracy was generally good, but not perfect. I noticed mistakes such as:
- `Total` being mistaken for `Subtotal`
The receipts API is a great start, but it has some missing features which, for me, would make it more useful and complete.
The big gap in the existing API is that it does not extract the line items on the receipt as part of the `understandingResults` data set.
I'd love to see each item extracted into an array with the following properties:
- Description
- Quantity
- Price
At the time of writing, the only way to deal with line items is to use the basic OCR results (the `recognitionResults`) and build some custom logic to determine line items within the receipt; a rough sketch of this follows.
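As an illustration only, here's one crude heuristic (my own, not part of the API): treat any OCR line that ends in a price-shaped number as a line item. Real receipts would need much smarter rules, and `payload` is again the parsed Get Receipt Result response:

```python
import re

# Lines that end in a price such as '12.50' (a very naive assumption).
PRICE_AT_END = re.compile(r"^(?P<desc>.+?)\s+(?P<price>\d+\.\d{2})$")

def guess_line_items(payload: dict) -> list:
    """Scan the raw OCR lines for anything that looks like a line item."""
    items = []
    for page in payload.get("recognitionResults", []):
        for line in page.get("lines", []):
            match = PRICE_AT_END.match(line["text"])
            if match:
                items.append({
                    "description": match.group("desc"),
                    "price": float(match.group("price")),
                })
    # Note: this will also match totals and tax lines, which you'd
    # need to filter out with further rules.
    return items
```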
The Analyze Form function of the Form Recognizer API can deal with table recognition and extraction, so you may be able to combine both functions to get the line items as well as the `understandingResults`. However, this requires model training and would only be possible for specific receipt 'shapes'; it would not work for all receipts (which is the whole point of the receipts API).
In bars and restaurants, a tip is often incorporated into the overall bill, and many expenses systems require that it is itemized separately.
It would be helpful if the receipts API were able to extract a tip/gratuity as part of the `understandingResults` data set.
There are other fields that could be useful too.
Other Cognitive Services such as LUIS use a concept called Resolutions to provide alternative options for data points that cannot be fully resolved. The classic example is that if you say "Saturday", LUIS will offer both the Saturday just gone and the next Saturday as resolutions.
Resolutions empower developers to write logic that chooses the right result for the app based on some business logic.
It would be great to see the receipt API offer a similar feature for the `understandingResults` data set.
For example, `Total` and `Subtotal` often get mixed up. If there were Resolutions available, there could be logic that says that `Total` must always be greater than (or equal to) `Subtotal`, thus allowing the application to pick the right result.
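Even without Resolutions, you can apply the same sanity check to today's results. A minimal sketch (the helper names are mine; the field shape follows the JSON sample above):

```python
def get_value(fields: dict, name: str):
    """Pull the parsed value out of a recognized field, if present."""
    field = fields.get(name)
    return field["value"] if field else None

def reconcile_totals(fields: dict):
    """If Total came back smaller than Subtotal, assume the two were swapped."""
    subtotal = get_value(fields, "Subtotal")
    total = get_value(fields, "Total")
    if subtotal is not None and total is not None and total < subtotal:
        subtotal, total = total, subtotal  # Total can never be below Subtotal
    return subtotal, total

# Example usage (assuming `payload` holds the parsed response):
# fields = payload["understandingResults"][0]["fields"]
# subtotal, total = reconcile_totals(fields)
```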
Form Recognizer is a very powerful new API which is able to extract meaningful information from documents and receipts.
At the time of writing there are limitations which will hopefully get resolved, but even with these limitations the receipts API can form the basis of a receipt data extraction system.
The fact that the receipt API gives you the raw OCR results as well as the specially recognized data points means that you can add logic to extract additional information as required.
Got a comment?
All my articles are written and managed as Markdown files on GitHub.
Please add an issue or submit a pull request if something is not right in this article or you have a comment.
If you'd like to simply say "thanks", then please send me a .