Artificial Data Generation With Generative Adversarial Networks (GANs)

Artificial Data Generation With Generative Adversarial Networks (GANs)
Deep learning is nowadays one of the most exciting technologies. Over the last decade it has been applied with a great success to various computer vision tasks, such as image classification or object detection. The real breakthrough was in 2014, when a new type of neural networks was introduced, namely Generative Adversarial Networks (GANs) [1], which can be used to generate real-looking artificial data.

In the following sections the main idea behind GANs will be presented as well as how we use them at NeuroSYS.

Why use GANs?

One of our tasks at NeuroSYS is to build an image classifier for distinguishing Petri dishes with bacteria or fungi grown in them from empty ones. Most images in our dataset contain multiple, large organisms, located close to the center of the dish and our classifier can recognize such samples with high accuracy. However, we also have images that contain only few, small organisms near the edges of the dish. Since there is a small amount of such samples, our model cannot learn to correctly recognize them.

Below two types of samples are presented:

Common samples - multiple, large organisms located close to the center of the Petri dish

Uncommon samples - small organisms located near the edges of the Petri dish

We applied GANs to produce artificial images of bacteria and fungi in Petri dishes with hope that we will generate more untypical samples.

Generative vs. Discriminative Model

To understand GANs, it is essential to first learn how generative algorithms work and a great way to do that is to compare generative and discriminative approaches.

Suppose that you have pictures of cats and dogs and want to classify them. Given your training set, a discriminative algorithm will try to find a decision boundary that separates cats and dogs. Then, to classify a new picture, it checks on which side of the decision boundary this image falls, and makes its prediction accordingly. In contrast, a generative algorithm first looks at cat pictures and builds a model of what cats look like. Analogously, it creates a separate model for dogs. To classify a new picture we can match it against both models, to see whether it looks more like cats or more like dogs we had seen in the training set.

So a discriminative algorithm only models a decision boundary between different classes while a generative algorithm models the actual distribution of each class. Both models can be used for classification, however a generative model can do much more than that, namely it can be used to generate new instances of a class, for example new pictures of cats and dogs. GAN is an especially powerful type of generative models.

How GANs work?

GAN consists of two neural networks competing against each other (hence the word “adversarial”). One network, called generator, generates new instances of data, for example images. The second network, called discriminator, evaluates them, i.e. it decides whether the instance belongs to the original training set or have been generated by the generator. The goal of the discriminator is to correctly recognize fake and authentic data instances while the generator aims at creating data instances, that will be deemed authentic by the discriminator. The hope is that during training both networks will get better and better, with the end result being a generator that produces realistic data instances.

The process is depicted in the figure below:

In practice, when GAN is applied to visual data as images, the discriminator is a Convolutional Neural Network (CNN) that can categorize images and the generator is another CNN that learns a mapping from random noise to the particular data distribution of interest.

Real-world examples

As stated above, at NeuroSYS we applied GANs to produce fake images of bacteria and fungi in Petri dishes. We aimed at creating samples that we lacked in the original set, namely images with only few, small organisms located near the edges of the dish. Below we present some generated images and compare them with original ones. Specifically we used a variation of standard GANs called Improved Wasserstein GANs [2, 3].

Original images

Generated images

We did not manage to generate real-looking samples. While fake Petri dishes look very realistically, bacteria and fungi on generated images can be easily distinguished from the original organisms. The reason may be the fact, that the Petri dishes are large structures, present on every image. In addition, all Petri dishes in our training set are similar. Bacteria and fungi are much smaller and differ significantly in terms of size and appearance. Moreover they are not present on every image. We need more data for GAN to learn all the features of bacteria and fungi and create real-looking samples.

On the other hand, our GAN generates organisms of various sizes, both close to the center of the Petri dish and near the edges. If generated organisms were more similar to the original ones, we would be able to produce needed samples.

To sum up

GANs are an extremely powerful tool for generating artificial data and impressive results have been achieved on standard datasets used in machine learning. However it can be hard to successfully apply them to the real-world problems. We are looking forward to conduct more experiments and hope to produce real-looking samples in the future.


[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial nets, Advances in neural information processing systems, pages 2672–2680, 2014

[2] M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein gan, arXiv preprint arXiv:1701.07875, 2017

[3] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, Improved training of wasserstein gans, arXiv preprint arXiv:1704.00028, 2017

About the author:

Agnieszka Pawlak
Data Scientist at NeuroSYS

Related articles
What works well and what you should be aware of while working with TensorFlow. READ MORE
Learning methods go through tremendous changes. Right after the e-learning and video tutorials boom, another technology emerged that promises to revolutionize the way we share information. Find out why there is such hype around augmented reality and how it can transform your business in terms of training and knowledge retention. READ MORE
Retrospective meetings are a great occasion for generating insights and ideas to improve workflow efficiency. Sometimes such improvements start travelling from one project to another and become best practices eventually. We want to share our top 7 tried and tested retrospective findings - hope they come in handy for you as well. READ MORE
For tech professionals, staying up-to-date is more than a natural curiosity, it’s a must. How to be wired up without spending too much time on it, and what blogs to follow - further in the blog post. READ MORE
Smooth collaboration and effective project management are crucial for software development. No wonder there are so many approaches and tools designed especially for software project management. And whereas there is no single cure-all tool, you might find our experience in using the following tools helpful! READ MORE
Inspired with the idea to improve the reliability of computer object detection and eliminate the necessity to handle it manually, we initiated a proof-of-concept project on automating microbiological analysis with the help of deep learning for one of our clients. After few rounds of the meticulous testing, we are happy to share the outcomes. READ MORE
The role of a CTO involves constant challenges. And a product release doesn't mean a retirement for a CTO, but a new mode of working and new issues to be prepared for. Check out these top 5 challenges that every CEO of a growing company might face and our tips for handling them. READ MORE
When there are so many examples of tremendous success done by pioneers of certain technologies, the fear of missing out becomes quite reasonable. Don’t miss out two biggest trends of 2018 (from our perspective) and learn how your business can respond to them by reading this blog post. READ MORE
A university-industry partnership is a win-win deal that gives a competitive advantage for both parties. Read more about our experience of collaborating with a university and how we benefited from it - in this blog post. READ MORE
The transition to Scrum is an exciting learning experience for everyone involved. However, it might be overshadowed with human skepticism and doubts. Read more to learn what challenged to expect during the Scrum adoption and how to avoid them. READ MORE
Changing the way how people used to work might be quite a pain. And doing it by introducing Scrum is not an exception. If you’re considering to adopt Scrum or started the transition process already, this blog post is for you. READ MORE
When development becomes painful and time-consuming the debate over “what to do now?” starts. Fueled by developers’ claims “the code is a mess”, “we can’t work with the legacy code”, “let’s rewrite it” from one side, and the business trying to solve the issue less dramatically - from the other, the discussion seems to be never-ending. This blog post will guide you through potential pitfalls while deciding whether to improve the existing code (by refactoring, running unit tests etc.) or rewrite it from scratch. READ MORE
Bringing together people to work on the same project from different locations is becoming a common practice. But distance and lack of physical connection take its toll and may bring impediments to the workflow. After 7 years of experience in setting up distributed teams using Scrum, we cut our teeth on the possible pitfalls and collected several rules to make things work... READ MORE