Object recognition in Augmented Reality

What exactly is object recognition in AR? It is when a digital 3D model is placed onto a real-world object. Because the model is attached to the physical object, users can pick the object up and manipulate it: they can take it apart, explore its parts, and reassemble it.

When we think of augmented reality, one of the key elements to consider is object recognition technology, sometimes referred to as object detection. This is the ability to identify the form and shape of different objects, and their position in an environment, from what the device's camera captures. Augmented reality is the enhancement of the view of the real world with computer-generated imagery (CGI). Because AR applications overlay graphics, text, video, or sound onto that view, object recognition is particularly important across all of them. Most apps are marker-based, which means they use a special type of image, picture, or object to trigger a predefined 3D visualization, animation, video, or soundtrack. In other words, they use object detection and tracking to determine what relevant information should be added to the real-world view.

There are three main types of AR technology:

  • Object recognition: A digital 3D model is fixed to a real-world object that can be picked up and moved around. A camera scans the real-world 3D object, and a simulated 3D model is then attached to it. For example, we can target a real-world object, such as a cube, and place a simulated 3D model onto it. We could then pick up the actual cube and physically manipulate and interact with the model placed on it. If this model were a car, for example, we could manipulate it by opening its doors or windows.
  • Image targeting: Video, text, images, or 3D objects are overlaid onto a real-world 2D image. The image is scanned and registered in the app as a target, and 3D content is then attached to it.
  • Plane detection: A digital 3D model is fixed to a real-world flat surface. This surface is typically stationary, such as the floor, so that you can move around the model. After a real-world flat surface has been scanned, a simulated 3D model is placed in the real environment. We can then examine the details of the 3D model, walk around it, and even step inside it. For example, we can place our 3D car model in a parking lot, walk up close to a wheel, and learn how to change the brake calliper or disc.
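
Plane detection is handled inside the AR framework, and the text does not say how any particular SDK does it. As an illustration only, one common textbook technique for finding a flat surface such as a floor in a cloud of 3D feature points is RANSAC plane fitting: repeatedly fit a plane through three random points and keep the plane that the most points agree with. The sketch below assumes a y-up point cloud is already available; the function name, distance threshold and iteration count are hypothetical choices.

```python
import random
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def fit_plane_ransac(points: List[Vec3], iterations: int = 200,
                     threshold: float = 0.02, seed: int = 0
                     ) -> Optional[Tuple[Vec3, float, List[Vec3]]]:
    """Find the plane n.p + d = 0 supported by the most points (hypothetical helper).

    Returns (unit normal n, offset d, inlier points), or None if every
    random sample was degenerate.
    """
    rng = random.Random(seed)  # seeded so results are repeatable
    best: Optional[Tuple[Vec3, float]] = None
    best_inliers: List[Vec3] = []
    for _ in range(iterations):
        # 1. Hypothesise a plane through three randomly chosen points.
        p1, p2, p3 = rng.sample(points, 3)
        n = _cross(_sub(p2, p1), _sub(p3, p1))
        length = _dot(n, n) ** 0.5
        if length < 1e-9:  # collinear sample: no unique plane
            continue
        n = (n[0] / length, n[1] / length, n[2] / length)
        d = -_dot(n, p1)
        # 2. Count the points lying within `threshold` of that plane.
        inliers = [p for p in points if abs(_dot(n, p) + d) < threshold]
        # 3. Keep the hypothesis with the largest support.
        if len(inliers) > len(best_inliers):
            best, best_inliers = (n, d), inliers
    if best is None:
        return None
    return best[0], best[1], best_inliers
```

With a 10 x 10 grid of floor points plus a few points floating above it, the fit returns a normal of (0, ±1, 0) and exactly the 100 floor points as inliers. Seeding the random generator keeps the result repeatable, which is convenient for testing.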

Augmented reality software

There are two major software development kits (SDKs) for AR. Others are available, but we will explore the two most popular: Vuforia and OpenCV.

Vuforia

“Vuforia is an augmented reality software development kit for mobile devices that enables the creation of augmented reality applications. It uses computer vision technology to recognize and track planar images and 3D objects in real time.”

Vuforia is reported to power the majority of AR applications in the App Store and Google Play Store. It is a standalone SDK that allows applications to recognize images, boxes, cylinders, text, and arbitrary objects in almost any environment. It is fast, robust, and one of the more user-friendly options. It is also tightly integrated with Unity, a leading development engine. Unity and Vuforia have a strong partnership and even have a shared R&D lab. Together they are used to create a wide range of augmented reality experiences.

Vuforia works by matching images captured by the camera against a predefined reference image. Because both images are just arrays of bytes, comparing them directly is unreliable: the same target looks different under different lighting, at different distances, and from different angles. One of Vuforia's most prominent advantages is that it instead searches both images for feature points. A feature point is a distinctive location in an image, typically a high-contrast spot, a curve, or an edge, that does not change significantly even when viewed from different angles.
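
Vuforia's own feature detector is proprietary, so the sketch below swaps in the well-known Harris corner detector purely to illustrate what "high-contrast spots that stay stable" can mean in code. It scores every pixel by how strongly the intensity changes in two directions at once, then keeps only the strongest local peaks. The image format (a list of rows of floats), the 0.5 relative threshold and the function names are assumptions for this example.

```python
from typing import List, Tuple

Image = List[List[float]]


def harris_response(img: Image, k: float = 0.04) -> Image:
    """Harris corner response R = det(M) - k * trace(M)^2 for each pixel."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference image gradients (border pixels are left at 0).
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    r = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Structure tensor M, summed over a 3x3 window around (x, y).
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            r[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return r


def detect_feature_points(img: Image, rel_threshold: float = 0.5,
                          k: float = 0.04) -> List[Tuple[int, int]]:
    """Return (x, y) positions of strong, locally maximal corners."""
    r = harris_response(img, k)
    h, w = len(r), len(r[0])
    peak = max(max(row) for row in r)
    if peak <= 0.0:  # flat or edge-only image: no corner-like points
        return []
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = r[y][x]
            if v < rel_threshold * peak:
                continue
            neighbours = [r[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if v > max(neighbours):  # non-maximum suppression
                points.append((x, y))
    return points
```

On a white square drawn on a black background, only the square's four corners survive. With this particular detector, pixels along the straight edges score negatively because the intensity changes in just one direction there.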

Vuforia processes each reference image only once to extract its feature points. If the image does not have enough feature points, it is unlikely to be detected accurately, so a good reference image should contain plenty of feature points that can act as anchors for recognition. Vuforia remembers the relative positions of all the feature points and links them into a pattern that resembles a star constellation. It then runs the same feature-extraction algorithm on every camera frame and tries to match the two sets of feature points. If most of the reference feature points are found in a camera frame, the image is recognized as the marker. By comparing the relative positions of the reference points (the ‘constellation’) with those of the matched points in the camera frame, Vuforia can work out the marker's position and orientation in the physical world.
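
The matching and orientation steps can be sketched as follows. This assumes that correspondences between reference and camera feature points have already been found (in practice by comparing feature descriptors, which is omitted here), and it simplifies the pose to a 2D similarity transform (rotation, scale and translation) rather than the full 3D pose a real tracker estimates. The point IDs, the majority-rule parameter and the function name are hypothetical; this is not Vuforia's implementation.

```python
import cmath
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def locate_marker(reference: Dict[int, Point], detected: Dict[int, Point],
                  min_fraction: float = 0.5) -> Optional[dict]:
    """Decide whether the marker is present and estimate its 2D pose.

    reference: feature-point id -> (x, y) in the reference image.
    detected:  feature-point id -> (x, y) in the camera frame, for the
               points that were matched (hypothetical matching step).
    """
    common = [i for i in reference if i in detected]
    # Recognition rule: more than `min_fraction` of the constellation found.
    if len(common) < 2 or len(common) <= min_fraction * len(reference):
        return None
    # Treat points as complex numbers so a similarity transform is q = a*p + b.
    p = [complex(*reference[i]) for i in common]
    q = [complex(*detected[i]) for i in common]
    mean_p = sum(p) / len(p)
    mean_q = sum(q) / len(q)
    # Closed-form least-squares fit of a (rotation + scale) on centred points.
    num = sum((pi - mean_p).conjugate() * (qi - mean_q) for pi, qi in zip(p, q))
    den = sum(abs(pi - mean_p) ** 2 for pi in p)
    if den == 0:
        return None  # all matched points coincide: pose is undefined
    a = num / den
    b = mean_q - a * mean_p
    return {
        "matched": len(common),
        "scale": abs(a),
        "rotation_deg": math.degrees(cmath.phase(a)),
        "translation": (b.real, b.imag),
    }
```

For a five-point reference constellation seen rotated by 90 degrees, doubled in size and shifted by (100, 50), three matched points are enough to pass the majority test and recover those values; two matched points are not, so the function returns None.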

Vuforia can also recognize 3D objects. To be recognized, real objects should be opaque and solid, with no moving parts. Recognition can still work if an object has a few moving parts, but the quality of the capture may suffer. Objects should also have contrasting coloration, for example black and white (the best option), or at least contain contrast-based features. Objects that have holes, or that are easily deformed, may not be recognized by Vuforia.
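
The contrast requirement can be made measurable. A minimal sketch, assuming an 8-bit grayscale image stored as a list of rows, computes RMS contrast (the standard deviation of normalized pixel intensities). Vuforia does not publish a rule of this form; the 0.2 cut-off and the function names are hypothetical and only illustrate why black and white scores highest.

```python
from statistics import pstdev
from typing import List

# Hypothetical cut-off chosen only for illustration; it is not a Vuforia value.
MIN_RMS_CONTRAST = 0.2


def rms_contrast(gray: List[List[int]]) -> float:
    """RMS contrast: standard deviation of 8-bit intensities scaled to [0, 1]."""
    values = [v / 255.0 for row in gray for v in row]
    return pstdev(values)


def has_enough_contrast(gray: List[List[int]]) -> bool:
    """True if the image clears the (hypothetical) contrast threshold."""
    return rms_contrast(gray) >= MIN_RMS_CONTRAST
```

A black-and-white checkerboard reaches the maximum RMS contrast of 0.5, while a uniform gray surface scores 0.0.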

3D objects are scanned with Vuforia's own scanning application, Object Scanner. Scanning volumetric objects works on the same principle as scanning planar images. To ensure reliable recognition, you should scan the object from all sides. Vuforia Object Scanner searches for high-contrast feature points and, if it finds enough of them, the object will be recognized quickly.
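
The advice to scan from all sides can be illustrated with a small coverage tracker. This is a hypothetical sketch, not how Object Scanner is implemented: it divides the hemisphere above the object into azimuth sectors and elevation bands, records which cell each camera position looks from, and reports the fraction of cells covered. The bin counts, the y-up convention and the class name are assumptions.

```python
import math
from typing import Set, Tuple

Vec3 = Tuple[float, float, float]


class ScanCoverage:
    """Tracks which viewing directions around an object have been scanned.

    The upper hemisphere around the object is split into azimuth sectors
    and elevation bands (a hypothetical layout); each camera position marks
    the cell it looks from. Coordinates are y-up, as in Unity.
    """

    def __init__(self, azimuth_bins: int = 8, elevation_bins: int = 3):
        self.azimuth_bins = azimuth_bins
        self.elevation_bins = elevation_bins
        self.seen: Set[Tuple[int, int]] = set()

    def add_view(self, camera: Vec3, centre: Vec3 = (0.0, 0.0, 0.0)) -> None:
        dx, dy, dz = (camera[0] - centre[0], camera[1] - centre[1],
                      camera[2] - centre[2])
        horizontal = math.hypot(dx, dz)
        if horizontal == 0.0 and dy == 0.0:
            return  # camera at the object's centre: direction undefined
        azimuth = math.degrees(math.atan2(dz, dx)) % 360.0
        elevation = math.degrees(math.atan2(dy, horizontal))
        if elevation < 0.0:
            return  # below the surface the object rests on
        a = min(int(azimuth / (360.0 / self.azimuth_bins)), self.azimuth_bins - 1)
        e = min(int(elevation / (90.0 / self.elevation_bins)), self.elevation_bins - 1)
        self.seen.add((a, e))

    def coverage(self) -> float:
        """Fraction of hemisphere cells seen at least once (0.0 to 1.0)."""
        return len(self.seen) / (self.azimuth_bins * self.elevation_bins)
```

A single camera position covers 1 of the 24 cells; circling the object at three different heights covers all of them.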

OpenCV

OpenCV, short for Open Source Computer Vision, is an open-source library of programming functions aimed mainly at real-time computer vision and image processing. Originally developed by Intel, it was later supported by Willow Garage and then Itseez. The library is cross-platform and free to use: versions up to 4.4 were released under the Berkeley Software Distribution (BSD) license, and OpenCV 4.5.0 and later use the Apache 2 License. Essentially, OpenCV is a set of filters and operations that can be applied to any 2D image.

Its most noteworthy features include edge and transition detection, circle and line detection, blurring, smoothing, perspective recovery, feature point extraction and face detection.
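
Most of these features are built from small filters slid across the image. The following pure-Python sketch shows two of them, a 3 x 3 box blur and Sobel edge detection, so the arithmetic is visible; in OpenCV itself you would call functions such as cv2.blur, cv2.Sobel or cv2.filter2D instead. The helper names and the choice to leave border pixels at 0 are simplifications made for this example.

```python
import math
from typing import List

Image = List[List[float]]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
ONES = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]


def filter3x3(img: Image, kernel: List[List[int]]) -> Image:
    """Slide a 3x3 kernel over the image (correlation, as cv2.filter2D does).

    Border pixels are left at 0 for simplicity; OpenCV instead extrapolates
    the border.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = float(sum(kernel[j][i] * img[y + j - 1][x + i - 1]
                                  for j in range(3) for i in range(3)))
    return out


def box_blur(img: Image) -> Image:
    """3x3 mean filter, comparable in spirit to cv2.blur(img, (3, 3))."""
    summed = filter3x3(img, ONES)
    return [[v / 9.0 for v in row] for row in summed]


def sobel_magnitude(img: Image) -> Image:
    """Edge strength sqrt(Gx^2 + Gy^2) from the two Sobel kernels."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]
```

On an image that steps from black (0) to white (255), the edge strength is 1020 on the two columns either side of the step and 0 elsewhere, while the blur turns the hard step into intermediate values such as 85 and 170.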

OpenCV has a C# wrapper that can be built for PC, iOS, or Android. It is also a popular choice for Unity developers, because a fully developed OpenCV plug-in for Unity is available. This plug-in offers some useful tracking abilities, such as marker-based AR, facial recognition, hand position tracking, and multi-object tracking based on color.
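
Color-based tracking of several objects can be sketched without any library: mark the pixels close to each object's color, then summarize them by area, centroid and bounding box. The sketch below is illustrative and is not the Unity plug-in's code; with OpenCV you would more typically convert the frame to HSV with cv2.cvtColor, threshold it with cv2.inRange and measure the result with cv2.moments. The RGB tolerance of 40 and the function names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]


def find_color_blob(pixels: List[List[RGB]], target: RGB,
                    tolerance: int = 40) -> Optional[dict]:
    """Locate all pixels within `tolerance` of `target` on every channel."""
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(pixels):
        for x, pixel in enumerate(row):
            if all(abs(c - t) <= tolerance for c, t in zip(pixel, target)):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # color not visible in this frame
    return {
        "area": len(xs),
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
    }


def track_objects(pixels: List[List[RGB]],
                  colors: Dict[str, RGB]) -> Dict[str, Optional[dict]]:
    """Run the color test once per tracked object (one color each)."""
    return {name: find_color_blob(pixels, rgb) for name, rgb in colors.items()}
```

Each tracked object gets its own target color; a color that does not appear in the frame returns None, which an app could treat as "object lost".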

OpenCV can do many of the things a smoothly functioning AR application needs. However, any task that goes beyond simple image recognition requires writing additional code. OpenCV is less user-friendly than Vuforia and can be quite complex, especially for inexperienced developers, since it has a steep learning curve and requires additional background knowledge. On the other hand, unlike Vuforia, OpenCV can work with very large, very small, and sometimes damaged images.


AR learning is especially useful for training exercises, as it avoids the need for real-world equipment or on-site training. We can learn from anywhere, which makes it a very cost-effective alternative to traditional training methods. Not only can it save businesses money, it is also a great learning tool, because we can practice working with simulated objects in a safe environment. For example, we can learn about electronic circuit boards and how they work without a live current running through them.

AR technology is also a cheaper alternative to learning in virtual reality (VR). If the organization already has tablets, or employees have phones they can use, there is no need to purchase additional equipment, only software. Using hardware the organization already owns also makes a training program easier to roll out at scale.

AR object recognition is particularly beneficial for microlearning, as it focuses on one definable object, skill, or process. AR is useful across many processes, from prototyping and construction to maintenance and operations.

This same technology can also be used at a consumer level, and IKEA's Place app is a good example. A customer may see a chair they like but be unsure whether it will work in their home. With the app, they can select the chair and place it digitally in their own room. This helps the customer make a better choice and helps the retailer increase its sales.