Generative Adversarial Networks: Artificial Intelligence learns to create with Game Theory
In the previous article of the series, we covered the basic concepts of deep learning and explained how a generic neural network works. Research in neural networks has made considerable progress: on some tasks, networks can already recognize objects and voices at levels higher than humans. Progress has also been made in natural language processing.
“What I cannot create, I do not understand.” (R. Feynman).
But if recognition is a field of extreme interest, the results can be even more surprising when we turn to synthesis, that is, creating images, videos, and voices.
Generative Adversarial Networks (GANs) are a type of neural network where research is flourishing. The idea is quite recent: it was introduced by Ian Goodfellow and colleagues at the University of Montreal in 2014. Their article, entitled Generative Adversarial Nets, describes an architecture in which two neural networks compete in a zero-sum game.
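The zero-sum game can be written compactly. In the formulation of the original paper, a discriminator D and a generator G play a minimax game over a value function V(D, G). Here x is a real sample, z is random noise, G(z) is a generated sample, and D(·) is the probability the discriminator assigns to its input being real. The discriminator tries to maximize V, while the generator tries to minimize it:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The first term rewards the discriminator for scoring real samples close to 1. The second rewards it for scoring generated samples close to 0, which is exactly what the generator works against.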
The figure above illustrates how a GAN operates in a face recognition context. The discriminator consists of a convolutional network [1] coupled to a “fully connected” network. The convolutional part gradually extracts the salient features of the faces, while the final part generalizes the recognition [2].
Before the GAN is trained, the discriminator is first trained on its own to recognize real faces with a satisfactory degree of accuracy. Its final aim is to distinguish whether a given image represents a real or an artificial face, that is, to “beat” the generator by recognizing the fakes.
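To make the two-part structure concrete, here is a minimal sketch of a discriminator forward pass in plain Python. It is an illustration under simplifying assumptions, not the architecture from the figure: a single 3x3 convolution stands in for the convolutional part, a single weighted sum stands in for the fully connected part, and the names `conv2d`, `discriminator`, and the toy 6x6 "image" are hypothetical.

```python
import math
import random

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) to extract local features."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Keep positive activations, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator(image, kernel, weights, bias):
    """Convolutional feature extraction followed by one fully connected unit."""
    features = relu(conv2d(image, kernel))
    flat = [v for row in features for v in row]            # flatten the feature map
    score = sum(w * v for w, v in zip(weights, flat)) + bias
    return sigmoid(score)                                  # probability that the image is real

# Toy inputs (assumptions for illustration): a random 6x6 grayscale image and random parameters.
rng = random.Random(0)
image = [[rng.random() for _ in range(6)] for _ in range(6)]
kernel = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
weights = [rng.uniform(-1, 1) for _ in range(16)]          # 4x4 feature map, flattened
p_real = discriminator(image, kernel, weights, 0.0)
```

A real discriminator stacks many such convolutional layers with learned kernels; the point here is only the flow from feature extraction to a single real-versus-fake probability.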
The generator, on the other hand, is a deconvolutional network [3], which takes random noise as input and tries to generate images through interpolation processes. Its final aim is to generate realistic images that can “fool” the discriminator, producing faces that are indistinguishable from real ones.
Once the GAN is assembled, the images created by the generator become input to the discriminator.
When the discriminator gives its assessment (“real” or “artificial”), that assessment is checked against the known origin of the image, either real (taken from the dataset) or generated, to determine whether the judgment was correct. If the judgment is wrong (an artificial face is mistaken for a real one), the discriminator learns from the error and improves its subsequent answers. If the judgment is correct (an artificial face is identified as such), the feedback helps the generator improve its subsequent images.
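One common reading of "deconvolutional" is a transposed convolution, which upsamples a small input into a larger image. The sketch below assumes that reading. The function names, the 4x4 noise grid, the 2x2 kernel, and the `tanh` output squashing are illustrative choices, not details taken from the original architecture.

```python
import math
import random

def conv_transpose2d(inp, kernel, stride=2):
    """Transposed convolution: each input value stamps a scaled copy of the kernel onto a larger output."""
    kh, kw = len(kernel), len(kernel[0])
    in_h, in_w = len(inp), len(inp[0])
    out_h = (in_h - 1) * stride + kh
    out_w = (in_w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            for di in range(kh):
                for dj in range(kw):
                    # Overlapping stamps add up, which blends neighbouring regions.
                    out[i * stride + di][j * stride + dj] += inp[i][j] * kernel[di][dj]
    return out

def generator(noise, kernel):
    """Upsample a noise grid into an image and squash pixel values into [-1, 1]."""
    upsampled = conv_transpose2d(noise, kernel, stride=2)
    return [[math.tanh(v) for v in row] for row in upsampled]

# Toy inputs (assumptions for illustration): 4x4 Gaussian noise and a random 2x2 kernel.
rng = random.Random(0)
noise = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
kernel = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
fake_image = generator(noise, kernel)   # 8x8 grid of pixel values
```

With an untrained kernel the output is just structured noise. Training adjusts the kernel values so that the upsampled images start to resemble the real data.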
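The feedback described above is usually expressed as two loss functions. The sketch below uses binary cross-entropy for the discriminator and the "non-saturating" variant for the generator, a common choice rather than the only one. The function names and the sample probabilities are hypothetical; `d_real` and `d_fake` stand for the discriminator's "real" probabilities on real and generated images.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Low when real images score near 1 and generated images score near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Low when the discriminator is fooled, i.e. generated images score near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A confident, correct discriminator has a lower loss than an unsure one.
confident = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
unsure = discriminator_loss(d_real=[0.6, 0.5], d_fake=[0.4, 0.5])

# The generator's loss drops as its fakes are rated more "real".
caught = generator_loss(d_fake=[0.1, 0.2])
fooling = generator_loss(d_fake=[0.8, 0.9])
```

The two losses pull in opposite directions on the same quantity, the score of generated images, which is where the competition between the networks comes from.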
This process goes on until the images produced by the generator and the discriminator’s judgments become stable. In practice, the two neural networks train each other by competition.
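The alternating competition can be sketched on a deliberately tiny problem: real data are numbers drawn around 4, the generator is a line `G(z) = a * z + b`, and the discriminator is a logistic unit `D(x) = sigmoid(w * x + c)`. Everything here is an assumption chosen for illustration (the 1-D data, the parameter names, the learning rate, hand-derived gradients, and the non-saturating generator loss). The separate pre-training of the discriminator is skipped for brevity.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=1000, lr=0.02, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters: G(z) = a * z + b
    w, c = 0.1, 0.0   # discriminator parameters: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        # Discriminator step: push D(real) towards 1 and D(fake) towards 0.
        x_real = rng.gauss(4.0, 1.0)
        x_fake = a * rng.gauss(0.0, 1.0) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        # Gradients of -log D(x_real) - log(1 - D(x_fake)) with respect to w and c.
        grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
        grad_c = (d_real - 1.0) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: push D(fake) towards 1, with the discriminator held fixed.
        z = rng.gauss(0.0, 1.0)
        d_fake = sigmoid(w * (a * z + b) + c)
        # Gradients of -log D(G(z)) with respect to a and b.
        grad_a = (d_fake - 1.0) * w * z
        grad_b = (d_fake - 1.0) * w
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, w, c

a, b, w, c = train_toy_gan()
```

In a toy setup like this the parameters may keep oscillating rather than settle, which hints at why stabilizing GAN training is a research topic in its own right.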
The results can be astonishing, as you can see in the video below, where the whole scene, including the horse, is completely transformed in real time.
GANs are being applied in many fields of AI.
One of the most popular applications is image synthesis, such as generating or transforming images of animals, for example turning zebras into horses.
This kind of application (known as style transfer) allows us to completely change the graphic style of an image: for example, recreating a photograph in Van Gogh’s style, or even in the style of another photo!
The conversion of text descriptions into images is another field where GANs are producing extremely interesting results, to the point of generating photorealistic images.
Vondrick and colleagues proposed an architecture that generates video by modeling foreground and background separately.
The advances in generating lip-synced videos are also very promising, as seen in the video below, produced by Klingemann with Alternative Face.
The recovery of details in videos and images, until now only achievable with manual work, seems today to be moving towards automation.
Below is an example of denoising (removing noise from images).
1. The concept of ConvNet will be expanded in another article.
2. The generalization makes the recognition more robust to disturbances such as noise and rotations or translations.