The many vision features of Cognitive Services

Microsoft Cognitive Services is a set of APIs that enable developers of any platform or language to add machine learning smarts to their applications. In this article, we'll explore the many features of the Vision family of APIs.

#Cognitive Services #Computer Vision

Monday 06 Jun 2016

This article was written on GitHub. You can raise issues, create pull requests or even fork the content ... it's open source.

Microsoft Cognitive Services is a set of 21 REST APIs that enable developers of any platform or language to add machine learning smarts to their applications.

Whilst there are 21 APIs, many of them contain several distinct functions. For example, the Computer Vision API facilitates the following capabilities:

You would never assume from its name that the Computer Vision API has all of these distinct but powerful capabilities. But it does, and it is easy to overlook a feature that might take your application to the next level because you never knew it was there.

This article is part 1 of a series. In the series, I'll provide a very brief overview of each individual function across all of the Cognitive APIs. This should be your 'go to' article if you've ever asked yourself "I wonder if Microsoft Cognitive can do ....". For this article, I'll focus just on the Vision APIs. Speech, Language, Knowledge and Search APIs will be covered in subsequent articles.

Just so you know ... some of the text in this article has been extracted from the Microsoft Cognitive Services website, but I've added to it and re-worded it where I think it makes sense.

Computer Vision

Extracts rich information from images to categorize and process visual data—and protect your users from unwanted content.
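As a sketch of what calling the API looks like, an analysis request is a simple POST of an image URL. The region, API version and feature names below are assumptions based on the v1.0 service; substitute the endpoint and subscription key from your own Azure subscription:

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL -- your region and key will differ.
BASE = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze"

def build_analyze_url(base, features):
    """Append the comma-separated visualFeatures query parameter."""
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    return f"{base}?{query}"

def analyze_image(image_url, subscription_key):
    """POST an image URL and return the JSON analysis result."""
    req = urllib.request.Request(
        build_analyze_url(BASE, ["Categories", "Description"]),
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same POST-an-image-URL shape applies across most of the Vision APIs; only the path and query parameters change.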



Emotion

The Emotion API takes a facial expression in an image as an input, and returns the confidence across a set of emotions for each face in the image, as well as a bounding box for the face, using the Face API. The emotions detected are anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise.
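Each face in the response carries a scores object with one confidence value per emotion. A small helper can pick the dominant emotion per face; the response below is a hand-written sample mirroring the documented shape, not live API output:

```python
def top_emotion(scores):
    """Return the emotion with the highest confidence score."""
    return max(scores, key=scores.get)

# Hand-written sample: a list of faces, each with a bounding box
# and a confidence score for each of the eight emotions.
sample_response = [
    {
        "faceRectangle": {"left": 68, "top": 97, "width": 64, "height": 64},
        "scores": {
            "anger": 0.01, "contempt": 0.01, "disgust": 0.01, "fear": 0.01,
            "happiness": 0.90, "neutral": 0.03, "sadness": 0.02, "surprise": 0.01,
        },
    }
]

dominant = [top_emotion(face["scores"]) for face in sample_response]
# dominant == ["happiness"]
```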



Face

The Face API can detect human faces in an image and return a plethora of interesting data about them. This facilitates a wide range of features including detection, verification, grouping, and similar face matching.
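Detection returns one object per face found, each with a faceRectangle bounding box (the exact response shape here is an assumption sketched from the documented detect operation). A helper like this pulls the boxes out for drawing or cropping:

```python
def face_boxes(faces):
    """Extract (left, top, width, height) tuples from a Face API
    detect response (a list of face objects)."""
    return [
        (f["faceRectangle"]["left"], f["faceRectangle"]["top"],
         f["faceRectangle"]["width"], f["faceRectangle"]["height"])
        for f in faces
    ]

# Hand-written sample mirroring the documented detect response shape.
sample = [
    {"faceId": "hypothetical-face-id",
     "faceRectangle": {"left": 78, "top": 41, "width": 113, "height": 113}},
]
boxes = face_boxes(sample)
# boxes == [(78, 41, 113, 113)]
```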

There are several entity concepts in the Face API; they include:


The individual functions within the Face API are as follows:


Video

The Video API provides intelligent video processing: it produces stable video output, detects motion, creates intelligent thumbnails, and detects and tracks faces.

Due to the large file size of, and associated latency with, video files, all four functions operate a model whereby a video is POSTed (via a URL or as binary) to the API and then processed in the background; no immediate results are given. You can poll for progress using the Get Operation Result API and then retrieve the final result using the Get Result Video API, which returns the processed video file as a binary application/octet-stream.
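The POST-then-poll flow above can be sketched as a small loop. Here get_status stands in for a GET against the operation's result URL, and the in-progress status names are assumptions based on the documented running/succeeded states:

```python
import time

def poll_until_done(get_status, interval=5.0, max_attempts=120):
    """Call get_status() (a GET against the Get Operation Result URL)
    repeatedly until the operation leaves its in-progress states."""
    in_progress = {"Not Started", "Uploading", "Running"}  # assumed names
    for _ in range(max_attempts):
        result = get_status()
        if result.get("status") not in in_progress:
            return result
        time.sleep(interval)
    raise TimeoutError("video operation did not complete in time")
```

Once the returned status indicates success, a second GET against the Get Result Video endpoint streams back the processed file.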


The Video API contains the following functions:

In summary

We've covered 15 individual functions which span 4 APIs in this article.

This is just the 'Vision' category in the wider Microsoft Cognitive Services suite.

In forthcoming articles, I'll cover the Speech, Language, Knowledge and Search APIs and their individual functions in a similar level of detail ... unless no-one reads or comments on this, in which case I shan't bother! :)

Got a comment?

All my articles are written and managed as Markdown files on GitHub.

Please add an issue or submit a pull request if something is not right on this article or you have a comment.

If you'd like to simply say "thanks", then please send me a tweet so the rest of Twitter can see how awesome my work is.