Turning Grand Theft Auto into a deep learning dataset

A group at the University of Bologna is trying to make images from Grand Theft Auto more realistic so that they can act as training data for neural networks. Greg Blackman listens to Pierluigi Zama Ramirez’s presentation at the European Machine Vision Forum in Bologna in September.

What can computer vision learn from video games? Researchers at the University of Bologna in Italy have trained neural networks using images from the video game Grand Theft Auto (GTA). The idea is to see if the computer graphics from GTA can be made to seem like real images – from the neural network’s perspective at least.

Pierluigi Zama Ramirez, a PhD student at the University of Bologna, described the work at the European Machine Vision Association’s machine vision forum, held in Bologna from 5 to 7 September.

One of the big problems with deep learning, Zama Ramirez explained, is annotating the large amount of image data needed to get an accurate output from a neural network. He said that a task like semantic segmentation – classifying each pixel of the image – mostly has to be done manually, which can take two to six hours for each image.

The advantage of training a neural network on synthetic data, such as those produced by computer graphics, is that ‘you can obtain the labels almost for free’, Zama Ramirez said, as well as having access to a lot of images.

The downside is that models trained on synthetic data do not perform as well on real images as models trained on real data.

The researchers therefore set about trying to make GTA images look more realistic using generative adversarial networks (GANs). A GAN consists of two neural networks: a generator and a discriminator. The generator takes a synthetic image from the video game and tries to transform it into a realistic one. The discriminator is shown both the adapted images and images from a real dataset, and tries to classify which are real and which are fake. Over time the generator gets better at producing realistic images, while the discriminator becomes more adept at flagging synthetic data. The result of this competition is a generator whose output looks increasingly like a real image.
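The tug-of-war between the two networks can be illustrated with a minimal sketch of a standard adversarial loss. The article does not say which loss the Bologna group used, so the binary cross-entropy formulation, the function names and the discriminator scores below are all assumptions chosen for illustration.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))

def discriminator_loss(d_real, d_fake):
    """Discriminator wants real images scored 1 and adapted images scored 0."""
    terms = [bce(p, 1) for p in d_real] + [bce(p, 0) for p in d_fake]
    return sum(terms) / len(terms)

def generator_loss(d_fake):
    """Generator wants the discriminator to score its adapted images as real."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs: probability that an image is real.
d_real = [0.9, 0.8]
d_fake_early = [0.1, 0.2]  # early in training: adapted images are easy to spot
d_fake_late = [0.5, 0.6]   # later: adapted images are harder to tell apart

gen_early = generator_loss(d_fake_early)
gen_late = generator_loss(d_fake_late)
disc_early = discriminator_loss(d_real, d_fake_early)
disc_late = discriminator_loss(d_real, d_fake_late)
```

With these made-up scores the generator's loss falls as its images become harder to spot, while the discriminator's loss rises, which is the dynamic described above.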

GAN-based methods fall into two branches: the pixel-level approach, like Cycle-GAN, and the feature-level method. Pixel-level approaches don’t exploit any semantic information, i.e. what each part of the image depicts, so a framework like Cycle-GAN can introduce a lot of artefacts, such as trees sitting in the sky.

Zama Ramirez worked with a new pixel-level GAN approach that exploits semantic information during the generation process. Here, the discriminator not only classifies whether an image is real or fake, but also performs semantic segmentation of the image. This leads the generator to produce images that have the same semantic content as the source synthetic images.
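One way to picture this is as an extra term in the generator's objective: an adversarial term plus a per-pixel cross-entropy between the segmentation of the adapted image and the labels of the source GTA image. The sketch below is only an illustration under that assumption; the function names, the `seg_weight` parameter and the toy three-pixel image are hypothetical, and the per-pixel class probabilities are simply given rather than produced by a real discriminator.

```python
import math

def pixel_cross_entropy(class_probs, labels):
    """Mean cross-entropy over pixels.

    class_probs[i] holds the predicted per-class probabilities for pixel i,
    labels[i] is the class index that pixel has in the source synthetic image.
    """
    eps = 1e-12
    total = 0.0
    for probs, label in zip(class_probs, labels):
        total += -math.log(max(probs[label], eps))
    return total / len(labels)

def generator_objective(d_fake_prob, class_probs, source_labels, seg_weight=1.0):
    """Adversarial term plus a semantic-consistency term (hypothetical weighting)."""
    adversarial = -math.log(max(d_fake_prob, 1e-12))
    semantic = pixel_cross_entropy(class_probs, source_labels)
    return adversarial + seg_weight * semantic

# Toy image of three pixels with classes 0 = road, 1 = tree, 2 = sky.
source_labels = [0, 1, 2]
# Adapted image that keeps the source content.
preserved = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]]
# Adapted image with an artefact: the sky pixel now looks like a tree.
artefact = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.9, 0.05]]

loss_preserved = generator_objective(0.5, preserved, source_labels)
loss_artefact = generator_objective(0.5, artefact, source_labels)
```

Both toy images fool the discriminator equally well, but the one with a tree in the sky is penalised more heavily, so the generator is pushed to keep the semantic content of the source image.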

The group worked with a dataset of 20,000 training images with semantic labels from Grand Theft Auto V, and a set of 2,975 real Cityscapes images without labels. A network was trained on the adapted GTA images, and its performance was then evaluated on the Cityscapes validation set.

Compared with training on raw GTA synthetic data, training on the adapted GTA images raised mean intersection over union (mIoU) from 18.23 per cent to 31.4 per cent, and pixel accuracy from 60.43 per cent to 80 per cent.
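The two metrics measure different things, which is why the mIoU figures are so much lower than the pixel accuracy figures. The following minimal sketch computes both from a confusion matrix using their standard definitions; the ten-pixel example and the function names are invented for illustration and have nothing to do with the group's data.

```python
def confusion_matrix(truth, pred, num_classes):
    """matrix[t][p] counts pixels of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix

def pixel_accuracy(matrix):
    """Fraction of all pixels that were assigned the correct class."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def mean_iou(matrix):
    """Average over classes of intersection / union of predicted and true pixels."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious)

# Hypothetical ten-pixel image: class 0 dominates, classes 1 and 2 are rare.
truth = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
pred = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0]

matrix = confusion_matrix(truth, pred, num_classes=3)
accuracy = pixel_accuracy(matrix)  # 8 of 10 pixels correct
miou = mean_iou(matrix)            # (0.75 + 0.5 + 0.5) / 3
```

In this toy case pixel accuracy is 80 per cent while mIoU is only about 58 per cent, because mIoU weights every class equally and the two rare classes are each only half right.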

‘Training a network on our adapted images can achieve almost double that from training a network on just synthetic data,’ Zama Ramirez commented. ‘The adapted images belong much more to the real distribution than the synthetic images.’

However, he added that the adapted images ‘still can’t reach the same accuracy of performance as when trained on real data.’

The group is now employing a Cycle-GAN approach consisting of two generators and two discriminators to try to achieve even better performance. Whether Grand Theft Auto can be made to appear completely real is yet to be seen.
