Machine Learning part II – Generative Adversarial Networks (GANs)

Generative Adversarial Networks: Artificial Intelligence learns to create with Game Theory

In the previous article of the series, we addressed the basic concepts of deep learning, explaining how a generic neural network works. Research in neural networks has made considerable progress: networks can already recognize objects and voices at levels that rival, and on some tasks exceed, those of humans. Progress has also been made in natural language processing.

 “What I cannot create, I do not understand.” (R. Feynman).

Recognition is a field of great interest, but the results become even more surprising when we turn to synthesis, that is, creating images, videos, and voices.

Competing networks

Generative Adversarial Networks (GANs) are a type of neural network where research is flourishing. The idea is quite recent: it was introduced in 2014 by Ian Goodfellow and colleagues at the University of Montreal. Their article, entitled Generative Adversarial Nets, describes an architecture in which two neural networks compete in a zero-sum game.
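
The zero-sum game can be written as a single value function that one network tries to maximize and the other to minimize. The formula below is the minimax objective as stated in the Generative Adversarial Nets paper, where D(x) is the discriminator's estimated probability that x is a real sample, G(z) is the sample the generator produces from noise z, p_data is the distribution of real data, and p_z is the noise distribution.

```latex
\[
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
\]
```

The first term rewards the discriminator for scoring real samples close to 1. The second rewards it for scoring generated samples close to 0, which is exactly what the generator tries to prevent.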



The figure above illustrates the operation in a face recognition context. The discriminator consists of a convolutional network [1] coupled to a “fully connected” network. The convolutional part gradually extracts the salient features of the faces, while the final part generalizes the recognition [2].

Convolutional process
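
The feature extraction described above can be sketched in a few lines of plain Python. This is only an illustration, not the architecture from the figure: the toy image, the hand-picked `vertical_edge` kernel, and the weights passed to `fully_connected` are all assumptions made for the example, whereas a real discriminator learns its kernels and weights during training.

```python
import math


def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    This is the cross-correlation that deep learning libraries call
    "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


def fully_connected(feature_map, weights, bias):
    """Flatten the features and squash a weighted sum into a (0, 1) score."""
    flat = [v for row in feature_map for v in row]
    z = sum(w * v for w, v in zip(weights, flat)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy 5x5 "image": dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
# In a trained network these weights are learned, not chosen by hand.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

features = relu(conv2d(image, vertical_edge))
print(features)  # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]

# Hypothetical weights and bias for the final layer.
score = fully_connected(features, weights=[0.1] * 9, bias=-1.0)
print(score)  # a value between 0 and 1, read as "probability of real"
```

The feature map is strongest where the window covers the edge and zero where the image is uniform. The final layer then combines all the responses into a single judgment.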

Before training the GAN, the discriminator is first trained on its own to recognize real faces with a satisfactory degree of accuracy. Its final aim is to distinguish whether a given image shows a real or an artificial face, that is, to “beat” the generator by recognizing the fakes.


The generator, on the other hand, is a deconvolutional network [3], which takes random noise as input and tries to generate images through interpolation processes. Its final aim is to generate realistic images able to “cheat” the discriminator, that is, faces that are indistinguishable from real ones.
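
How a small block of noise grows into a larger image can be sketched with a single transposed convolution, the operation usually meant by “deconvolution” in this context. This is a minimal sketch under stated assumptions: the function `conv_transpose2d`, the 2x2 noise input, and the bilinear-style kernel are hypothetical choices for illustration, while an actual generator stacks several such layers with learned kernels.

```python
import random


def conv_transpose2d(x, kernel, stride=2):
    """Transposed convolution with no padding.

    Each input value adds a scaled copy of the kernel to the output,
    with copies placed `stride` pixels apart, so the output is larger
    than the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    in_h, in_w = len(x), len(x[0])
    out_h = (in_h - 1) * stride + kh
    out_w = (in_w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            for di in range(kh):
                for dj in range(kw):
                    out[i * stride + di][j * stride + dj] += x[i][j] * kernel[di][dj]
    return out


# Random noise as the generator's input (fixed seed for repeatability).
rng = random.Random(0)
noise = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]

# Hypothetical bilinear-style kernel; a trained generator learns its own.
kernel = [[0.25, 0.5, 0.25],
          [0.50, 1.0, 0.50],
          [0.25, 0.5, 0.25]]

image = conv_transpose2d(noise, kernel, stride=2)
print(len(image), "x", len(image[0]))  # 5 x 5
```

With this kernel the center pixel of the output is the average of the four noise values, which is the interpolation effect mentioned above. Because neighboring copies of the kernel overlap, some output pixels receive more contributions than others.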


Once the GAN is assembled, the images created by the generator become the input to the discriminator, together with real images.

When the discriminator provides its assessment (“real” or “artificial”), the verdict is checked against the known origin of the image (a real image from the dataset or one produced by the generator) to determine whether the judgment was correct. If the judgment is wrong (an artificial face is mistaken for a real one), the discriminator learns from the error and improves its subsequent answers. If the judgment is correct (an artificial face is identified as such), the feedback helps the generator improve its subsequent images.
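
The feedback described above is usually expressed as two loss values, one per network. The sketch below uses binary cross-entropy for the discriminator and the “non-saturating” generator loss suggested in the Generative Adversarial Nets paper. The function names and the sample scores are hypothetical, chosen only to show which network is penalized in which situation.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real images should score near 1, fakes near 0."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Non-saturating loss: low when the discriminator scores fakes as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probability that an image is real).
d_real = [0.9, 0.8]   # scores on two real faces
fooled = [0.7, 0.9]   # artificial faces mistaken for real ones
caught = [0.1, 0.2]   # artificial faces correctly identified

# Wrong judgment: the discriminator's loss is large, so it must adjust.
print(discriminator_loss(d_real, fooled), discriminator_loss(d_real, caught))

# Correct judgment: the generator's loss is large, so it must improve.
print(generator_loss(caught), generator_loss(fooled))
```

In training, each network's weights are nudged in the direction that lowers its own loss, and the two updates alternate.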

This process goes on until the images produced by the generator and the discriminator’s judgments become stable. In practice, the two neural networks train each other by competition.
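
The stable point has a precise form in the Generative Adversarial Nets paper. For a fixed generator, the best possible discriminator is given by the first expression below, where p_data is the distribution of real images and p_g is the distribution of generated ones. When the generator matches the real data exactly, the discriminator can do no better than a coin flip.

```latex
\[
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},
\qquad
p_g = p_{\text{data}} \;\Rightarrow\; D^*(x) = \tfrac{1}{2}
\]
```

In practice, training rarely reaches this ideal point exactly, so “stable” usually means that the discriminator's scores hover around one half and the generated images stop changing noticeably.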

The results can be astonishing, as you can see in the video below, where the whole scene, including the horse, is completely transformed in real time.


GANs are being applied across many fields of AI.

Image synthesis

One of the most popular applications is image synthesis, for example generating images of animals or turning zebras into horses.

This kind of application (known as style transfer) allows us to completely change the graphic style of an image, for example recreating a photograph in the style of Van Gogh, or even in the style of other photos!


The conversion of text descriptions into images is another field where GANs are producing extremely interesting results, to the point of generating photorealistic images.


Video Generation

Vondrick and colleagues proposed an architecture that generates video by modeling foreground and background separately.

Advances in the generation of lip-synced videos are also very promising, as seen in the video below, produced by Klingemann with Alternative Face.

Detail recovery

The recovery of detail in videos and images, until now obtainable only with manual work, seems to be moving towards automation.

Below is an example of denoising (noise reduction in images).

Below, a GAN is used to restore underwater video by removing color cast and haze.


1. The concept of ConvNet will be expanded in another article.

2. The generalization makes the recognition more robust to disturbances such as noise and rotations or translations.

3. The term “deconvolutional” in this area is strongly debated. For an in-depth discussion of deconvolution in neural networks, see the excellent article by Wenzhe Shi and colleagues.


Ian Goodfellow et al.: Generative Adversarial Nets [arXiv, pdf]

Wenzhe Shi et al.: Is the deconvolution layer the same as a convolutional layer?

Could 3D GAN Be the Next Step Forward for Faster 3D Modeling?

Generative Adversarial Networks for Beginners [O’Reilly]

Introductory guide to Generative Adversarial Networks (GANs) and their promise! [Analytics Vidhya]

Carl Vondrick et al.: Generating Videos with Scene Dynamics

Deconvolution and Checkerboard Artifacts

Multi-Agent Diverse Generative Adversarial Networks [arXiv]

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Computer Vision Foundation – open access]

WaveNet: A Generative Model for Raw Audio [DeepMind]

Google’s Dueling Neural Networks Spar To Get Smarter, No Humans Required [Wired]

Generative Adversarial Networks Using Adaptive Convolution [ICLR, pdf]

About Nonteek

Andrea has been working in Information Technology for about 20 years now, covering most of it, from development to business analysis, to project management. Today we can say he is a light-hearted gnome, passionate about technologies, cognitive sciences, and photography.