Generative Adversarial Networks: Artificial Intelligence learns to create with Game Theory
In the previous article of the series, we addressed the basic concepts of deep learning, explaining how a generic neural network works. Research in neural networks has made considerable progress: on some benchmarks, networks already recognize objects and voices as well as or better than humans. Progress has also been made in natural language processing.
“What I cannot create, I do not understand.” (R. Feynman).
But while recognition is a field of great interest, the results become even more surprising when we turn to synthesis, that is, creating images, videos, and voices.
Competing networks
Generative Adversarial Networks (GANs) are a type of neural network where research is flourishing. The idea is quite recent, introduced by Ian Goodfellow and colleagues at the University of Montreal in 2014. Their article, entitled Generative Adversarial Nets, describes an architecture in which two neural networks compete in a zero-sum game.
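The zero-sum game can be made concrete with the value function from the Goodfellow et al. paper, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator tries to maximize and the generator tries to minimize. The sketch below only estimates that quantity from a handful of discriminator outputs; the function name `gan_value` and the sample scores are hypothetical, chosen for illustration.

```python
import math

def gan_value(d_on_real, d_on_fake):
    """Estimate V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_on_real: discriminator outputs (probability of "real") on real samples.
    d_on_fake: discriminator outputs on generated samples.
    """
    real_term = sum(math.log(p) for p in d_on_real) / len(d_on_real)
    fake_term = sum(math.log(1.0 - p) for p in d_on_fake) / len(d_on_fake)
    return real_term + fake_term

# Hypothetical scores: a sharp discriminator versus a completely fooled one.
sharp = gan_value(d_on_real=[0.9, 0.95], d_on_fake=[0.1, 0.05])
fooled = gan_value(d_on_real=[0.5, 0.5], d_on_fake=[0.5, 0.5])

# Zero-sum: whatever the discriminator gains, the generator loses.
discriminator_payoff = sharp
generator_payoff = -sharp

print(sharp, fooled)
```

When the discriminator can no longer tell real from generated samples (every output is 0.5), the value falls to -log 4, about -1.386, which is the equilibrium value derived in the paper.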
Discriminator
The figure above illustrates the operation in a face recognition context. The discriminator consists of a convolutional network (see note 1) coupled to a “fully connected” network. The convolutional part gradually extracts the salient features of the faces, while the final part generalizes the recognition (see note 2).
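As a rough illustration of the two stages just described, here is a minimal pure-Python sketch: one convolution extracts features, and one fully connected unit turns them into a "real" probability. It is not the architecture from the figure; the function names, the 6x6 toy image, the edge filter, and the random untrained weights are all assumptions made for the example.

```python
import math
import random

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminate(image, kernel, weights, bias):
    """Convolutional feature extraction, then one fully connected output unit."""
    features = relu(conv2d(image, kernel))
    flat = [v for row in features for v in row]
    logit = sum(w * v for w, v in zip(weights, flat)) + bias
    return sigmoid(logit)  # probability that the image is real

random.seed(0)
image = [[random.random() for _ in range(6)] for _ in range(6)]  # toy 6x6 "face"
kernel = [[1.0, 0.0, -1.0]] * 3                                   # vertical-edge filter
weights = [random.uniform(-0.5, 0.5) for _ in range(4 * 4)]       # untrained weights
p_real = discriminate(image, kernel, weights, bias=0.0)
print(p_real)
```

A real discriminator stacks many such convolutional layers with learned kernels; the single hand-written filter here only shows how the data flows from pixels to a single judgment.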

Before the GAN is trained, the discriminator is first trained separately to recognize real faces with a satisfactory degree of precision. Its final aim is to distinguish whether a given image represents a real or an artificial face, that is, to “beat” the generator by recognizing the fakes.
Generator
The generator, on the other hand, is a deconvolutional network (see note 3), which takes random noise as input and tries to generate images through interpolation processes. Its final aim is to generate realistic images able to “cheat” the discriminator, producing faces that are indistinguishable from the real ones.
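The upsampling step of such a generator can be sketched with a single transposed convolution, the operation usually meant by "deconvolution" in this context. This is a minimal pure-Python illustration, not the generator of any specific paper; the function name `transposed_conv2d`, the 4x4 noise grid, and the random untrained kernel are assumptions for the example.

```python
import random

def transposed_conv2d(inp, kernel, stride=2):
    """Each input value "stamps" a scaled copy of the kernel onto a larger output."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(inp) - 1) * stride + kh
    out_w = (len(inp[0]) - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i, row in enumerate(inp):
        for j, value in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    # Overlapping stamps are summed.
                    out[i * stride + a][j * stride + b] += value * kernel[a][b]
    return out

random.seed(0)
# Latent input z: a small grid of random noise.
noise = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
# Untrained 3x3 kernel; in a real generator these weights are learned.
kernel = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]

image = transposed_conv2d(noise, kernel, stride=2)
print(len(noise), "x", len(noise[0]), "->", len(image), "x", len(image[0]))
```

With a 3x3 kernel and stride 2, the 4x4 noise grid grows to 9x9. Because neighbouring stamps overlap unevenly in this configuration, some output pixels receive more contributions than others, which is the mechanism behind the checkerboard artifacts discussed in one of the linked articles.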
GAN
Once the GAN is assembled, the images created by the generator become the input to the discriminator.
When the discriminator provides its assessment (“real” or “artificial”), the assessment is checked against the known origin of the input image (a real image from the data set or an output of the generator) to determine whether the judgment is correct. If the judgment is wrong (an artificial face is mistaken for a real one), the discriminator learns from the error and improves its subsequent answers. If the judgment is correct (an artificial face is identified as such), the feedback helps the generator improve its subsequent images.
This process goes on until the images produced by the generator and the discriminator’s judgments become stable. In practice, the two neural networks train each other by competition.
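The alternating training described above can be sketched on a deliberately tiny problem: one-dimensional "images". In this toy setup, which is an assumption made for illustration and not the setup of the original paper, real data are numbers drawn around 4.0, the generator is just a learnable shift `b` added to noise, and the discriminator is a logistic unit with parameters `w` and `c`. The generator step uses the non-saturating loss -log D(G(z)), which the Goodfellow et al. paper suggests as a practical alternative to the strict zero-sum form.

```python
import math
import random

def sigmoid(x):
    # Clamp the logit to keep math.exp from overflowing.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))

def train_toy_gan(steps=2000, batch=32, lr=0.05, real_mean=4.0, seed=0):
    rng = random.Random(seed)
    w, c = 0.0, 0.0   # discriminator: D(x) = sigmoid(w * x + c)
    b = 0.0           # generator: G(z) = z + b, a learnable shift of the noise
    for _ in range(steps):
        reals = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fakes = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]

        # Discriminator step: minimize -[log D(x) + log(1 - D(G(z)))].
        grad_w = grad_c = 0.0
        for x, g in zip(reals, fakes):
            d_real, d_fake = sigmoid(w * x + c), sigmoid(w * g + c)
            grad_w += (-(1.0 - d_real) * x + d_fake * g) / batch
            grad_c += (-(1.0 - d_real) + d_fake) / batch
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimize -log D(G(z)), i.e. try to be judged "real".
        grad_b = sum(-(1.0 - sigmoid(w * g + c)) * w for g in fakes) / batch
        b -= lr * grad_b
    return w, c, b

w, c, b = train_toy_gan()
print(f"generator shift b = {b:.2f} (real data mean is 4.0)")
```

In a toy like this the shift `b` tends to drift toward the real mean as the two players push against each other, but GAN training in general is not guaranteed to settle, so the printed value should be read as an illustration rather than a benchmark.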
The results can be astonishing, as you can see in the video below, where the whole scene, including the horse, is completely transformed in real time.
Applications
GANs are being applied across a wide range of AI fields; a few examples follow.
Image synthesis
One of the most popular applications is image synthesis, such as generating animal images, for example by turning zebras into horses.
This kind of application (known as style transfer) allows us to completely change the graphic style of an image: for example, recreating a photograph in Van Gogh’s style, or even in the style of other photos!
Text-to-image
The conversion of text descriptions into images is another field where GANs are producing extremely interesting results, to the point of generating photorealistic images.
Video Generation
Vondrick and colleagues proposed an architecture that generates video by modeling foreground and background separately.
The advances in generating lip-synced videos are also very promising, as seen in the video below, produced by Klingemann with Alternative Face.
Detail recovery
The recovery of details in videos and images, until now obtainable only with manual work, seems to be moving towards automation.
Below is an example of denoising (removing noise from images).
Notes
1. The concept of ConvNet will be expanded in another article.
2. The generalization makes the recognition more robust to disturbances such as noise, rotations, or translations.
3. The term “deconvolutional” is strongly debated in this area. For an in-depth discussion of deconvolution in neural networks, see the excellent article by Wenzhe Shi and colleagues.
Links
Ian Goodfellow et al.: Generative Adversarial Nets [arXiv, pdf]
Is the deconvolution layer the same as a convolutional layer?
Could 3D GAN Be the Next Step Forward for Faster 3D Modeling? [3dprinting.com]
Generative Adversarial Networks for Beginners [O’Reilly]
Introductory guide to Generative Adversarial Networks (GANs) and their promise! [Analytics Vidhya]
Carl Vondrick et al.: Generating Videos with Scene Dynamics
Deconvolution and Checkerboard Artifacts
Multi-Agent Diverse Generative Adversarial Networks [arXiv]
WaveNet: A Generative Model for Raw Audio [DeepMind]
Google’s Dueling Neural Networks Spar To Get Smarter, No Humans Required [Wired]
Generative Adversarial Networks Using Adaptive Convolution [ICLR, pdf]
Andrea has been working in IT for almost 20 years, covering just about everything, from development to business analysis to project management.
Today we can say he is a carefree gnome, passionate about neuroscience, Artificial Intelligence, and photography.