Eyes on the aisle

Online retail sales in the US exceeded $453 billion in 2017, according to the US Department of Commerce. Although this may seem like a substantial amount, it accounts for only 13 per cent of the total retail sales made in the country throughout the year, meaning the majority of transactions still come from the millions of customers walking through the doors and aisles of physical stores every day.

Store managers could be set to gain more than just sales from these customers in the future, however. A combination of camera, sensor, computer vision and deep learning technology is beginning to penetrate the trillion-dollar retail sector, with the aim of revolutionising the way customers shop in stores – all while capturing a swathe of data on their shopping behaviours, item preferences and the amount of remaining stock on the shelves.

‘Retail is one of the industries and vertical markets, respectively, that are starting to see a massive ramp-up in computer vision-based systems,’ confirmed Gerrit Fischer, head of product market management at camera manufacturer Basler. ‘Many applications, like people tracking, for example, which have already widely been used in other camera-related areas – such as security and surveillance – are nowadays being adapted to retail to come up with analytics. These new benefits of camera-based applications are now widely recognised in the retail sector as they provide possibilities for optimising the margin-critical retail business.’

According to Fischer, even though the benefits of computer vision-based solutions have been identified in retail, ‘the numbers are still very low at the moment – compared to security and surveillance – because the new set of analytics and opportunities [enabled] by using cameras still need to be adopted by this vertical.’

This could soon be set to change, however, as this year a number of new, futuristic stores have been established across the US that are using an array of ceiling-mounted cameras, sensors, computer vision and deep learning algorithms to automate the way customers pay for products. In these stores, customers no longer have to queue up and pay for their items using a standard checkout system; instead they are simply billed automatically as they exit the premises, according to whichever items they have picked up and kept during their trip.

This extraordinary development has been made possible by combining vision, sensor and deep learning technology to keep track of individual customers and detect exactly what items they pick up as they move through the store. A personal online shopping cart is created and continuously updated for each customer as they accumulate items during their visit, and its contents are automatically charged to a pre-existing online account as they leave.

The first such store to open in the US was Amazon’s ‘Go’ store in Seattle, Washington, at the start of the year, which has since been followed by two further stores in Seattle and another in Chicago, Illinois. The shops operate using hundreds of overhead cameras – which initial patent filings suggested could include depth sensing cameras, RGB cameras, and infrared sensors – in combination with weight sensors, computer vision and deep learning algorithms. Together, the systems are able to identify exactly when and what items are picked up or placed back on shelves by individual customers, enabling their Amazon accounts to be billed accurately as they leave. Amazon is considering a plan to open up to 3,000 of these stores across the US by 2021, according to media outlet Bloomberg, which would make it one of the largest retail chains in America.

Tracking customers

Meanwhile, in San Francisco, California, a number of budding startups are racing to develop their own take on cashier-less technology in pursuit of a bigger prize. Rather than looking to compete with the 155,000 convenience stores in the US by introducing their own chain of automated stores, the firms are looking to augment existing stores with their own suite of computer vision technology.

Standard Cognition, for example, hopes to equip convenience stores with its solution that, unlike Amazon’s, relies solely on camera technology – without needing additional sensors – to track individual customers and items around the premises. According to the firm, this makes the technology scalable, reduces its operational complexity and the amount of required maintenance, and enables a lightweight installation process that causes minimal disruption to the store – it can be done overnight and without having to rearrange any shelves or products. The firm raised a total of $10.6 million across three separate funding rounds between August 2017 and July 2018, and is now demonstrating its solution at a prototype store open to the public in San Francisco.

Not only is Standard Cognition’s camera network capable of tracking multiple customers each picking up individual items, but the solution can also identify when multiple items have been picked up simultaneously by individual customers; detect when items have been put down by customers around the store (and remove these items from their online shopping cart); and identify when items have been passed or even thrown between customers. In doing this, the solution also produces a wealth of data on customers’ shopping preferences and behaviour for the retailer to examine.
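The bookkeeping behind such a system can be pictured with a minimal Python sketch. It is not Standard Cognition’s implementation, which the firm has not detailed here: the class and method names (`VirtualCart`, `Store`, `hand_over`), the shopper IDs and the prices are all hypothetical, and the events are assumed to arrive already classified by the vision system.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VirtualCart:
    """Running tally of items attributed to one anonymous shopper (hypothetical)."""
    items: Counter = field(default_factory=Counter)

    def pick_up(self, sku: str, quantity: int = 1) -> None:
        # Several items picked up at once arrive as one event with quantity > 1.
        if quantity > 0:
            self.items[sku] += quantity

    def put_back(self, sku: str, quantity: int = 1) -> int:
        # Remove at most what the shopper is recorded as holding.
        removed = min(self.items[sku], max(quantity, 0))
        self.items[sku] -= removed
        if self.items[sku] == 0:
            del self.items[sku]
        return removed


class Store:
    """Maps anonymous shopper IDs to carts; prices are integers in cents."""

    def __init__(self, prices_in_cents: dict) -> None:
        self.prices = prices_in_cents
        self.carts = {}

    def enter(self, shopper_id: str) -> None:
        self.carts[shopper_id] = VirtualCart()

    def hand_over(self, giver: str, receiver: str, sku: str, quantity: int = 1) -> None:
        # An item passed (or thrown) between shoppers moves from one cart to the other.
        moved = self.carts[giver].put_back(sku, quantity)
        self.carts[receiver].pick_up(sku, moved)

    def leave(self, shopper_id: str) -> int:
        # On exit the cart is closed and its total, in cents, is returned for billing.
        cart = self.carts.pop(shopper_id)
        return sum(self.prices[sku] * count for sku, count in cart.items.items())


store = Store({"cola": 150, "crisps": 100})
store.enter("shopper-a")
store.enter("shopper-b")
store.carts["shopper-a"].pick_up("cola", 2)    # two cans picked up together
store.carts["shopper-a"].pick_up("crisps")
store.carts["shopper-a"].put_back("crisps")    # changed their mind
store.hand_over("shopper-a", "shopper-b", "cola")
print(store.leave("shopper-a"), store.leave("shopper-b"))
```

Keying carts by an opaque shopper ID rather than by any personal attribute mirrors the anonymity the firm describes, although how the real system assigns identities is not stated.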

^{Trigo Vision’s highly sophisticated algorithms enable it to offer a cashier-less shopping experience using simple IP cameras.}

Standard Cognition says that this data is captured anonymously and without collecting any biometric information – such as facial recognition data – from customers. Each year US retailers lose $130 billion in profit to cashier-related overheads and shrinkage, such as shoplifting, administrative errors and damaged goods. With rising minimum wages and rent on top of already thin profit margins, many retailers are struggling to compete with larger chains and online options such as Amazon, and could eventually go out of business.

‘We want to help retailers, both small and large, thrive and eliminate the cumbersome, expensive checkout experience as it exists today,’ commented Michael Suswal, co-founder and chief operating officer of Standard Cognition. ‘We saw the need in the market for a better commerce solution for brick-and-mortar retailers that would leverage the latest AI technology to help them dramatically cut costs, get better analytics, get insight into inventory and shrinkage, and improve the checkout experience for their customers.’

The firm announced in July that it is installing systems for multiple retailers around the world, with the first revealed as Paltac – the largest Japanese wholesaler of fast-moving consumer goods and over-the-counter drugs, which has over $8.6 billion in annual revenues, and services thousands of retail stores across the country.

‘In Japan, retailers considering autonomous checkout have [so far] really only had RFID or scan-and-go type options,’ said Yohei Nishiyama, general manager of Standard Cognition’s recently established Japanese office. ‘Both are rather old fashioned. Standard’s approach is very different – it has a much lighter footprint, it’s easier for shoppers, and it provides a lot of data to retailers without compromising shoppers’ privacy. Standard’s solution has been very well received by Japanese retailers.’

‘When we show them what Standard can do with very little hardware and no scanning at all, they are generally blown away,’ Suswal added.

Also in San Francisco, another startup, Zippin, which as of August had secured $3 million in seed investment, has opened a prototype store to demonstrate its cashier-less concept. Like Amazon’s, it uses shelf-based sensors, in addition to cameras, to confirm purchases.

Meanwhile, AiFi, another startup and a finalist at this year’s Vision Tank Award – bestowed by the Embedded Vision Alliance – also has plans to open a very large demo store in the San Francisco Bay Area later this year. AiFi’s cashier-less technology solution combines a network of sensors and ‘sophisticated camera technology’, which it claims can adapt to any size of store and track hundreds, or even thousands, of shoppers and products using low-power mobile devices and edge-based processing.

Additionally, a fourth startup, Aipoly – also based in San Francisco – is, like Standard Cognition, developing a camera-only cashier-less store concept. The firm trains its system by using a simulated environment to generate data of a store and its products. ‘All we have to do is place each product inside a machine for five minutes, and voilà, the AI will learn to identify it in all plausible situations,’ the firm explained.

Lastly, a fifth startup in the San Francisco Bay Area, Inokyo, has also opened a prototype store to demonstrate its solution and gather data in order to train its AI technology. For this concept, however, rather than using weight sensors on the shelves – as Amazon Go does – to confirm whether a product has been taken, Inokyo instead equips its shelves with another set of cameras, adding them to those already on the walls of its shop.

Cashier-less technology has not only caught the attention of startups in Silicon Valley, however. Earlier this year in Israel, startup Trigo Vision announced its successful $7 million seed funding round and the development of a camera-only cashier-less solution that requires significantly fewer cameras than the other solutions currently being developed. Speaking to Imaging and Machine Vision Europe, Jenya Beilin, COO of Trigo Vision, explained that the firm is able to achieve this thanks to the proprietary algorithm that it uses to process the images from its camera network. ‘Other systems invest in advanced hardware to overcome the challenges of computer vision,’ he said. ‘Yet we are able to reduce the number of cameras used by enhancing our software.’

Trigo Vision is also targeting a scalable cashier-less solution that can be implemented in retail outlets, ranging from small convenience stores to large supermarkets. To do this it is using cameras that are both basic and affordable, in order to minimise cost and enable a swift and simple deployment. ‘We are using basic IP cameras without any sophisticated technology, primarily for scalability and cost-effectiveness,’ Beilin confirmed.

IP (internet protocol) cameras are commonly employed for security and surveillance, and can send and receive data via computer networks and the internet. Such devices are commonly available for under €50. While these systems are indeed very affordable, and thus suit the price-critical retail sector, Beilin noted that issues caused by changes in lighting, fast-moving people, occlusions in a busy shop and other vision-based challenges – which have already proven difficult for computer vision and AI systems – are even more evident when using basic cameras. ‘However, with Trigo Vision, we have managed to overcome this by developing highly sophisticated algorithms, rather than leveraging advanced expensive hardware,’ he said. ‘We believe this is what makes our system unique and where our competitive advantage lies.’

Penetrating the market

In order for the cameras of the machine vision industry to penetrate the cost-sensitive retail market, the cost of the technology has to first come down, according to Fischer at Basler.

‘Typical manufacturers of machine vision cameras hardly meet the expected price point [for retail], as the technical requirements [of machine vision] do not fit with the ones from the retail market,’ he confirmed. ‘They need a frugal design and a product concept for a dedicated market approach. Cost to design and price/performance need to be optimised, ideally through a disruptive innovation.’

According to Mark Williamson, managing director of corporate market development at Stemmer Imaging, for 3D imaging cameras such decreases in cost are already underway, thanks to a recent increase in the use of 3D imaging in mobile phones that has resulted in the mass production of powerful image processing chips.

^{The Intel Realsense D400 offers 3D stereovision at a reduced price thanks to a powerful, mass-produced ASIC chip designed for consumer electronics. (Credit: Stemmer Imaging)}

‘Intel have developed an ASIC that can perform image processing that they are producing in high volumes and selling to mobile phone companies for applications like facial recognition, such as the functionality in the new iPhone X,’ Williamson explained. This mass production allows the price of the ASIC to be brought down, which in turn brings down the price of the cameras using it.

Stemmer Imaging was recently appointed as a supplier for Intel’s Realsense technology product line, a range of vision processors, depth cameras, and depth modules that together offer a combination of depth sensing and full colour HD images via Intel’s high-performance, mass-produced ASIC chips. One particular Realsense system that Williamson believes has a future in retail is the Intel Realsense D400, which comprises a pair of stereo infrared cameras combined with a pattern projector and an additional RGB camera. Released in January, the D400 – through its use of low-cost ASIC technology – is available at £170 as a housed product and below £100 at the module level, making it much more suitable for retailers looking to deploy the technology in multiple instances.

‘With the D400 Intel are looking to get the price point down, enabling it to be very disruptive and have applications everywhere,’ Williamson commented. ‘The camera delivers 3D data in full HD at up to 90fps. A lower cost rolling shutter version is also available, along with a higher cost global shutter version for fast-moving objects.’ These cameras could all be very significant in the retail sector, according to Williamson, especially for producing rapid results when identifying what products are being picked off a shelf.

‘The Realsense D400 could play a big role in the food [retail] industry and is a major step up from any stereovision technology in the past,’ said Williamson. ‘While time-of-flight technology could achieve similar depth sensing results, its price point is currently in the thousands. In a controlled environment Realsense gives you excellent results at a fraction of the price, which is the main factor for supermarkets looking to purchase multiple systems. It’s a product that touches the old machine vision world and the new embedded machine vision world.’

‘I think that this could be very interesting for retail, going forward,’ Williamson continued. ‘I’m aware of a lot of research departments within supermarkets that are exploring this sort of technology. They’ve started off by trialling more automated checkouts and hand-held scanners, but they need to investigate in a different way to work out the best options for automating their retail experience.

‘I think that with the high level of quality that can be achieved with modern 3D sensing technology, in addition to the price point that it is coming down to, [it] gives the potential to open up massive opportunities in the food [retail] industry.

‘Looking at the new Amazon supermarkets, for example, if a Realsense camera was used every three metres along the aisles to see when customers pick items up, with an additional embedded board each system could be built for around $500. At these price points this use of [3D] vision in retail suddenly becomes a lot more viable.’
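Williamson’s figures lend themselves to a rough back-of-the-envelope estimate, sketched here in Python. Only the three-metre spacing and the $500 per system come from his comments; the number of aisles and their lengths are hypothetical, and the sketch ignores installation, servers and software.

```python
import math

CAMERA_SPACING_M = 3.0      # from the quote: one camera every three metres
COST_PER_SYSTEM_USD = 500   # from the quote: camera plus embedded board


def systems_for_aisle(aisle_length_m: float, spacing_m: float = CAMERA_SPACING_M) -> int:
    """Camera systems needed to cover one aisle, rounding up to a whole unit."""
    return math.ceil(aisle_length_m / spacing_m)


def store_hardware_cost(aisle_lengths_m, cost_per_system: int = COST_PER_SYSTEM_USD) -> int:
    """Total hardware cost in dollars for all aisles in a store."""
    total_systems = sum(systems_for_aisle(length) for length in aisle_lengths_m)
    return total_systems * cost_per_system


# Hypothetical store: ten aisles of 20 metres each.
aisles = [20.0] * 10
print(systems_for_aisle(20.0))      # 7 systems per aisle
print(store_hardware_cost(aisles))  # 35000 dollars for 70 systems
```

Under these assumptions the camera hardware for a mid-sized store comes to tens of thousands of dollars, which gives a sense of scale for the viability Williamson describes.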

In regard to how the introduction of cashier-less supermarkets could impact the machine vision market, Fischer concluded: ‘We will see that the machine vision market will adopt the technology which is nowadays already being used in the retail market – this is embedded vision technology, where you enable customers to design very cost-effective systems and lean solutions by utilising technology like ARM-based processing capabilities and consumer-driven interfaces like MIPI.’

^{Top image: Trigo Vision}

Retailers in the US and UK are expected to trial an age verification vision system at checkouts for customers buying age-restricted items.

NCR, a technology firm specialising in financial transaction hardware, has integrated a camera-based age verification platform that uses facial recognition software into its self-checkout system. The US and UK retailers will trial the solution in the coming months.

The self-checkout’s built-in camera system, with age detection technology from Yoti, can estimate a customer’s age and flag any customers determined to be under the required age limit for a product.

This threshold is configurable by the retailer and is generally ten years above the legal limit.
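The rule described above amounts to a single comparison, sketched here in Python. This is an illustration only, not the Yoti or NCR interface: the function name and parameters are hypothetical, and the ten-year buffer is simply the typical value quoted.

```python
def needs_staff_check(estimated_age: float, legal_age: int, buffer_years: int = 10) -> bool:
    """Flag a customer whose estimated age falls below the retailer's threshold.

    The threshold is the legal age plus a retailer-configurable buffer
    (hypothetical parameter names; the default of ten years is the typical
    value quoted in the text).
    """
    threshold = legal_age + buffer_years
    return estimated_age < threshold


# With a legal age of 18 and the default buffer, the threshold is 28.
print(needs_staff_check(25.0, legal_age=18))  # True: flagged for a manual check
print(needs_staff_check(35.0, legal_age=18))  # False: no check needed
```

Setting the threshold well above the legal limit leaves a margin for estimation error, so that only customers who appear comfortably older pass without a manual check.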

‘Waiting for age approval at self-checkouts is a source of frustration for many shoppers, who just want to get home as quickly as possible,’ said Robin Tombs, CEO and co-founder of Yoti, a company offering digital solutions for proving identity. ‘Our integration with NCR delivers a frictionless and innovative way for customers to prove their age in seconds. It’s a simple process that helps retailers meet the requirements of regulators worldwide. Customers will spend less time at the self-checkout, and employees can assist with other tasks, improving the overall shopping experience.’
