Automating grocery shopping

Ocado uses a 3D vision system to identify the optimal grasp point for a suction cup to attach to and pick up an item

The start of the year heralded a new era for food retail with the opening of Amazon’s first Go store, a shop in Seattle where, through a network of hundreds of ceiling-mounted 3D imaging cameras, customers can simply walk in, pick up items off the shelves, and walk out without even reaching for their wallet. The store is able to track individual customers, automatically updating their online shopping cart as they go, and charge the bill to their Amazon account once they leave.

Behind the scenes, food retailers such as British online supermarket Ocado are using machine vision to automate their warehouses, with an innovative combination of robotics and machine vision helping pick and pack around 260,000 orders per week from a range of 50,000 different types of product.

With the use of 3D imaging in agriculture becoming more common, soon the food in our supermarkets will have been under the watchful eye of machine vision from the moment of harvest to the point when it enters our shopping bags.

Automated warehouses

Whether it is for retail or for supermarket warehouses, the cost of a camera system has to be kept down for it to be a viable solution. Low-cost 3D imaging technology is becoming available – one example is Intel’s new Realsense product line of depth cameras with an onboard ASIC – and could open up a range of opportunities in the food industry, both on the retail side and behind the scenes in supermarket warehouses such as those run by Ocado.

‘Robotics is going to play a large role in the food industry,’ commented Mark Williamson, director of corporate market development at Stemmer Imaging, one of the industrial distributors of the Realsense cameras, along with Framos. ‘What’s interesting about Ocado compared to other supermarkets – which are starting to come around to it themselves – is that they’re focusing more on the automation of picking and packing online deliveries. They have done a lot of work to improve the automation and robotics of what they’re looking to do, and are now looking to market this to others.’

Ocado’s automated warehouses stock over 50,000 items and fulfil more than 260,000 orders per week. To accomplish this, a generalised automated picking solution is needed that can retrieve a large variety of items from storage and prepare them for delivery.

The company’s researchers have designed a robotic picking station featuring a suction cup attached to the end of an articulated arm. By creating a vacuum seal, the cup is capable of picking up a wide range of items regardless of their shape or how rigid or flexible they are. The only requirements are that each item must be within a particular weight limit and that the suction cup is able to create an airtight seal with its surface. Successfully picked items are then placed carefully into a delivery tote.

For this process to run smoothly, the algorithm controlling the robot needs to know where the storage totes are located and where the optimal grasp points are on the items within them.

Ocado uses a 3D vision system to identify optimal grasp points that are big enough, flat enough and horizontal enough for the suction cup to attach to, allowing it to pick up an object, rotate it to its optimal orientation, and transfer it to a delivery tote. Additional built-in sensors confirm the strength of the vacuum seal and avoid the risk of crushing or damaging products during packing.

‘This is a model-free approach,’ explained Dr Graham Deacon, robotics research team leader at Ocado. ‘We’ve seen a lot of other people using 3D cameras to build a model of an object, match the model with what they’ve got in their scene, and then use that to calculate how to move a robot to pick up the object.

‘With the model-based approach we would need to first create thousands of models, which would be incredibly time consuming,’ Deacon continued. ‘We’ve managed to circumvent that problem.’

Ocado’s system requires no training and is able to pick completely new, previously unseen products by simply identifying a suitably flat surface for the suction cup to form a seal.
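The search for a surface that is big enough, flat enough and horizontal enough can be illustrated with a minimal Python sketch. This is not Ocado’s algorithm: the function name, the height-map input (assumed to come from a camera looking straight down, with one height value per pixel), the plane-fit test and every threshold are assumptions made for illustration.

```python
import math


def find_grasp_candidates(height_mm, window, pitch_mm,
                          flat_tol_mm=1.0, max_tilt_deg=10.0):
    """Scan a height map for square patches a suction cup could seal against.

    height_mm: 2D list of surface heights in millimetres, one value per pixel.
    window: patch side in pixels (>= 2), chosen to cover the cup ("big enough").
    pitch_mm: distance on the surface between neighbouring pixels.
    Returns a list of (row, col, tilt_deg, residual_mm) for accepted patches.
    """
    if window < 2:
        raise ValueError("window must be at least 2 pixels")
    rows, cols = len(height_mm), len(height_mm[0])
    # Pixel offsets centred on the patch, e.g. [-1, 0, 1] for window=3.
    offsets = [i - (window - 1) / 2 for i in range(window)]
    denom = window * sum(o * o for o in offsets)
    cells = [(i, j) for i in range(window) for j in range(window)]
    candidates = []
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            patch = [height_mm[r + i][c:c + window] for i in range(window)]
            mean = sum(map(sum, patch)) / (window * window)
            # Least-squares plane z = mean + a*x + b*y on a centred grid.
            a = sum(offsets[j] * patch[i][j] for i, j in cells) / denom
            b = sum(offsets[i] * patch[i][j] for i, j in cells) / denom
            # "Flat enough": worst deviation of any pixel from that plane.
            residual = max(
                abs(patch[i][j] - (mean + a * offsets[j] + b * offsets[i]))
                for i, j in cells
            )
            # "Horizontal enough": angle between the plane and the horizontal.
            tilt = math.degrees(math.atan(math.hypot(a, b) / pitch_mm))
            if residual <= flat_tol_mm and tilt <= max_tilt_deg:
                candidates.append((r, c, tilt, residual))
    return candidates
```

Because the test looks only at local surface geometry, nothing in it depends on knowing which product is in the tote, which is the sense in which such an approach is model-free.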

The Soma robotic soft hand is an alternative to a suction cup for gripping items. Credit: Ocado

‘The fact we found a way to bypass modelling our stock keeping units also meant that we could pick a greater range of items than many industrial picking systems,’ the company commented. ‘All in all, the system is streamlined and flexible.’

Ocado’s automated handling solution identifies the optimal grip points from point clouds, meaning its cameras can use either time-of-flight (ToF) or a projected speckle pattern to obtain the required depth information.
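Whichever depth technology is used, a depth image can be turned into a point cloud with the standard pinhole camera model. The Python sketch below shows that conversion only; the function name and the intrinsic parameters (fx, fy, cx, cy, in pixels) are illustrative, and treating a depth of zero as a missing reading is an assumption about the camera.

```python
def depth_to_points(depth_mm, fx, fy, cx, cy):
    """Back-project a depth image into a list of (x, y, z) points in mm.

    depth_mm: 2D list of depth readings along the optical axis.
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    """
    points = []
    for v, row in enumerate(depth_mm):
        for u, z in enumerate(row):
            if z <= 0:  # assumption: non-positive depth marks a missing reading
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, float(z)))
    return points
```

The output is the same regardless of how the depth was measured, which is why a picking algorithm written against point clouds can swap one kind of camera for another.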

Until recently, 3D imaging cameras such as Microsoft’s Kinect – originally a video game component valued at around £100 – have been chosen for this application, according to Deacon, as well as other cameras from Primesense, the Israeli 3D sensing company behind the first-generation Kinect. Primesense has since been purchased by Apple, and its technology forms the basis of the iPhone X’s 3D camera.

While the first-generation Kinect and Primesense cameras used by Ocado operated via projected speckle patterns, the second-generation Kinect uses time-of-flight, with both technologies having been used in Ocado’s automated handling solution.

The cameras are located over the top of the storage and delivery totes in each picking station. ‘Both of these cameras are currently Microsoft Kinect cameras,’ said Deacon. ‘However, we need to start moving away from this technology now, as Microsoft recently ceased its production.’

In October the software firm announced that it had stopped manufacturing the Kinect because of declining sales figures, having sold a total of 35 million units over seven years.

‘The Kinect was a remarkable device, especially for its price point,’ Deacon commented. ‘We haven’t yet been able to find a suitable ToF camera that matches the price of the Kinect, although Intel has recently released its relatively low-cost Realsense technology, which we are going to look into for our application.’

The camera demands of Ocado’s picking solution are reasonably low. A high frame rate isn’t required as, rather than performing real-time tracking, the system captures an image, processes it, then moves the robot depending on the result. While the cameras need to be accurate, a high pixel count isn’t always desirable, as a trade-off exists between pixel count and processing rate. ‘We need it to cover the region of interest accurately enough, while using as few pixels as possible,’ Deacon said.

By identifying grip points in real time, rather than storing them in a model and trying to match the model to the scene, Ocado’s flexible solution allows it to even pick up deformable objects such as packets of crisps.

With an array of products in different orientations in each storage tote, the robot won’t be able to pick every item initially. Ocado’s system therefore identifies which products are possible to pick, ranks them in order from easiest to hardest, and then starts with what it considers the easiest. As these products are removed, products more difficult to grasp tend to fall into a more suitable orientation, according to Deacon, which then allows them to be handled accordingly.
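The pick-the-easiest-first loop can be sketched in Python as follows. The Candidate fields, the difficulty score and its weighting are hypothetical; the article does not say how Ocado scores or ranks candidates.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    tilt_deg: float      # how far the grasp surface is from horizontal
    residual_mm: float   # how far the grasp surface is from flat
    pickable: bool       # whether a usable grasp point was found at all


def difficulty(candidate):
    """Hypothetical score: lower means easier to pick."""
    return candidate.tilt_deg + 5.0 * candidate.residual_mm


def rank_pickable(candidates):
    """Keep only pickable items and order them from easiest to hardest."""
    return sorted((c for c in candidates if c.pickable), key=difficulty)


def empty_tote(scan, pick, max_attempts=100):
    """Re-image the tote before every pick and always take the easiest item.

    scan: callable returning the current list of Candidate objects.
    pick: callable that removes the given Candidate from the tote.
    Returns the item ids in the order they were picked.
    """
    picked = []
    for _ in range(max_attempts):
        ranked = rank_pickable(scan())
        if not ranked:
            break  # nothing pickable is left in view
        pick(ranked[0])
        picked.append(ranked[0].item_id)
    return picked
```

Rescanning before every pick is what lets the loop benefit from items settling into a better orientation once their neighbours have been removed.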

Intel's Realsense depth camera is one low-cost option for retail and supermarket warehouses

For objects such as bags of lentils and rice, where the centre of mass can shift, as long as the vacuum seal of the cup is strong enough, the mass can redistribute itself without falling away from the cup. Robotic hands, however, often struggle with items such as these, as their fingers tend to lock in place when picking objects up, with the redistribution of mass causing items to fall from between the fingers.

While the suction cup offers a suitable solution for a large portion of Ocado’s product range, it cannot account for everything; for example, porous and corrugated surfaces where a vacuum seal is harder to form. Ocado is therefore exploring alternative end-effectors – tools on the end of robotic arms – using the same principles as the suction cup technique.

‘Using the same imaging principle … we won’t need a model of these objects, but we can characterise an affordance,’ explained Deacon. ‘An affordance is where objects present certain opportunities to be manipulated depending on the capabilities of the end-effector – for example the handle of a cup is an affordance for a human hand (but not a suction cup). The geometry on an object, in conjunction with the capabilities of an end-effector, dictates what kind of grasping functionality is possible. Whatever we choose to use as our next end-effector, we want our vision system to be able to look at the object and identify where the affordances are for the end-effector in use.’

Robot see, robot do

To ensure that the object selected by the robot is the correct product, Ocado is looking at using barcode scanners between the totes. Fairly regularly, however, the suction cup can obscure the barcode, according to Deacon, so one of the things that the company is considering is putting an additional camera on the robot so it can see what it’s holding.

‘We could then train a neural network to identify each product by what can be seen by the robot,’ Deacon explained.

Deep learning and machine learning have become increasingly important in the modern food industry, having applications anywhere from the identification and harvesting of produce, to its eventual packaging. Because of the sheer variety of products in the food industry, however, deep learning systems can be quite prohibitive, according to Williamson, as they require powerful GPUs and large amounts of time and sample objects to train.

‘We instead use machine learning for the food industry, which takes elements of deep learning and uses other techniques that mean we can teach it a lot quicker and use far fewer samples,’ he said.

Stemmer Imaging’s Polimago software, a polymorphic object recognition tool, achieves high accuracies with food, for example recognising not only that an object is an apple but also which type of apple it is.

‘This has the potential of automating the checkout systems in supermarkets, but at the moment is mostly used for automated farming – checking whether the right sized products have been picked and if they have been placed and packaged correctly,’ said Williamson.

Polimago delivers powerful results when combined with Intel’s Realsense cameras, according to Williamson.

‘A depth image captured by a Realsense camera, while it is 3D data, is represented as 2D data, so effectively the brightness of the image is the height,’ he said. ‘From that, you can use 2D processing capabilities, like machine learning, to recognise shape. If you combine that with a colour image, you can achieve fantastic recognition rates, identifying product type and quality, as well as whether defects, such as bruising, have occurred.’
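The idea of brightness standing in for height can be shown with a short Python sketch that rescales depth readings to an 8-bit greyscale image. It does not use the Realsense SDK or Polimago; the function name, the working range and the handling of missing readings (assumed to be reported as zero) are illustrative assumptions.

```python
def depth_to_gray(depth_mm, near_mm, far_mm):
    """Map depth readings to 0-255 so that taller (nearer) surfaces are brighter.

    depth_mm: 2D list of distances from a downward-facing camera.
    near_mm, far_mm: working range; readings outside it are clamped.
    """
    if far_mm <= near_mm:
        raise ValueError("far_mm must be greater than near_mm")
    span = far_mm - near_mm
    image = []
    for row in depth_mm:
        out = []
        for d in row:
            if d <= 0:  # assumption: non-positive depth marks a missing reading
                out.append(0)
                continue
            clamped = min(max(d, near_mm), far_mm)
            out.append(round(255 * (far_mm - clamped) / span))
        image.append(out)
    return image
```

The result is an ordinary single-channel image, so it can be passed to 2D tools as it is or placed alongside the colour image of the same scene.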

A second pair of hands

Together with experts from the Karlsruhe Institute of Technology (KIT), École polytechnique fédérale de Lausanne (EPFL), University College London (UCL) and the Sapienza University of Rome, Ocado’s researchers are developing a robot equipped with machine vision and deep learning that can provide assistance to the company’s maintenance technicians.

‘In our warehouses at around two o’clock every day, everything stops and some routine and preventive maintenance is carried out,’ said Deacon. ‘We thought that we could make this maintenance process more efficient by having a robot assist our technicians.’

Equipped with a Primesense camera in its head, as well as a Flea3 stereovision system from Flir’s Point Grey, and a further, fairly new stereovision device from robotic vision firm Roboception, the Horizon 2020-funded SecondHands robot will be capable of, among other things, passing tools to technicians, supporting equipment while it’s being disassembled, or shining a light on a piece of equipment being inspected.

The SecondHands robot is being developed to help with maintenance in Ocado's warehouses

‘The whole point of the robot is to anticipate what a person is trying to do and proactively offer assistance,’ Deacon explained. ‘While there is a natural language interface that a technician could use to issue a command, we want the robot to have an intuitive understanding of what the task is about and what it can do that will be useful to the technician.’

The High Performance Humanoids Group at KIT designed and built the robot, along with all the low-level controlling software to go with it, while the institute’s Interactive Systems Lab developed the natural language interface.

UCL is building the vision components for the robot, according to Deacon, and the Sapienza University of Rome is working on the artificial intelligence planning and activity recognition.

Lastly, EPFL is studying human-human interaction and replicating it between a human and a robot. ‘One example of this is designing a handover mechanism when tools need to be passed to an engineer,’ said Deacon. ‘Rather than having the robot just open its hand, the EPFL researchers have studied how humans hand over objects – including how one person lessens the grip on a load gradually as it is taken up by another person – which will enable very natural interactions with the robot.’
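A gradual handover of this kind can be illustrated with a simple proportional rule, sketched here in Python. This is not the EPFL mechanism: the function name, the force inputs (assumed to come from a sensor in the robot’s wrist) and the release threshold are all hypothetical.

```python
def handover_grip(load_on_robot_n, full_load_n, hold_grip_n, release_below=0.05):
    """Return a grip force (N) that eases off as the receiver takes the load.

    load_on_robot_n: weight the robot currently feels.
    full_load_n: weight of the tool when the robot carries all of it.
    hold_grip_n: grip force used while the robot carries the whole tool.
    release_below: share of the load under which the hand opens fully.
    """
    if full_load_n <= 0:
        raise ValueError("full_load_n must be positive")
    # Share of the load still carried by the robot, clamped to [0, 1].
    share = max(0.0, min(1.0, load_on_robot_n / full_load_n))
    if share < release_below:
        return 0.0  # the receiver has the tool, so let go
    return hold_grip_n * share
```

Called repeatedly in a control loop, a rule like this reduces the grip in step with the load instead of opening the hand all at once.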
