Basic Framework of GAN: Generative Adversarial Network, Part 1
Like many breakthroughs in history, GAN isn’t complicated. It’s brilliant. If you are familiar with object-oriented programming and rudimentary machine learning, GAN is the obvious next step: the dynamics of multiple models working together. Not only does GAN create amazing results, it also teaches us an effective learning method: healthy competition. And this is only the tip of the iceberg, where a classifier model is pitted against a generator model. Imagine the possibilities of using more than two models, or different kinds of relationships instead of competition. The implications are limitless.
Generator, Discriminator, and loss functions
Let me make an analogy. A cell is composed of molecules, which themselves consist of chemical elements, all the way down to subatomic particles. GAN is similar: it is composed of models, which themselves consist of deep neural networks, all the way down to a single line of code. I hope you see what I am trying to do here. I am trying to say that GAN is nothing more than the interaction between a few components: a Generator, a Discriminator, and their loss functions, at least for the very simplest GAN. If you understand this, you understand the basic architecture of GAN. Of course, in reality GAN has evolved with many nuances to improve its results, much like a 2021 Audi engine isn’t the same as the first combustion engine, but this series will focus on the simplest combustion engine. If I have time, I might write about more contemporary GANs in the future.
Generator
There are two components to the Generator: get_generator_block and Generator. Let’s take a close look at what’s inside get_generator_block.
We begin by importing PyTorch on the first line, which will save us a lot of time. Of course, other frameworks such as TensorFlow can achieve the same thing. The only problem with PyTorch is that it tucks away many parts that are key to understanding, so I will go over it line by line.
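Since the original code isn’t reproduced here, the following is a minimal sketch of what a block like this typically looks like; the argument names (input_dim, output_dim) are my assumptions, not necessarily the author’s exact code:

```python
import torch
from torch import nn

def get_generator_block(input_dim, output_dim):
    """One 'block' of the Generator: Linear -> BatchNorm -> ReLU.

    Note: the parameter names here are illustrative assumptions.
    """
    return nn.Sequential(
        nn.Linear(input_dim, output_dim),  # fully connected layer
        nn.BatchNorm1d(output_dim),        # normalize activations across the batch
        nn.ReLU(inplace=True),             # non-linearity
    )

block = get_generator_block(10, 20)
out = block(torch.randn(32, 10))  # a batch of 32 ten-dimensional inputs
print(out.shape)                  # torch.Size([32, 20])
```

Passing a batch through the block maps each 10-dimensional input to a 20-dimensional output, and because the block ends in ReLU, every output value is non-negative.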
This is what an nn.Linear(p1, p2) layer looks like. The first parameter sets the number of input features, i.e., how many coefficients (weights) each linear node (purple circle) has, plus a bias term (the y-intercept). For visual data, the inputs that interact with these coefficients are pixel values. The second parameter sets how many linear nodes the layer contains, i.e., the number of output features.
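A quick way to see these two parameters in action is to inspect the layer’s weight and bias shapes directly:

```python
import torch
from torch import nn

layer = nn.Linear(3, 5)      # 3 input features, 5 linear nodes
print(layer.weight.shape)    # torch.Size([5, 3]) -- 5 nodes, 3 coefficients each
print(layer.bias.shape)      # torch.Size([5])    -- one bias term per node
x = torch.randn(4, 3)        # a batch of 4 inputs (e.g. 3 pixel values each)
print(layer(x).shape)        # torch.Size([4, 5]) -- 5 outputs per input
```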
The second layer, nn.BatchNorm1d(), normalizes the activations across the batch: for each feature, it subtracts the batch mean and divides by the batch standard deviation, then applies a learnable scale and shift. After this step, each feature has roughly zero mean and unit variance over the batch. Normalizing activations this way significantly speeds up training by helping the network converge faster.
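To make this concrete, here is a small sketch showing that after BatchNorm1d each feature column of a batch ends up with roughly zero mean and unit variance (the input values are arbitrary examples of mine):

```python
import torch
from torch import nn

bn = nn.BatchNorm1d(3)            # 3 features; scale/shift start at 1 and 0
x = torch.tensor([[0., 3., 4.],
                  [2., 1., 0.],
                  [4., 5., 8.]])  # a batch of 3 samples, 3 features each
y = bn(x)                         # training mode: uses this batch's statistics
print(y.mean(dim=0))              # each feature's mean is ~0
print(y.std(dim=0, unbiased=False))  # each feature's std is ~1
```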
We use nn.ReLU() because substitutes such as the sigmoid and tanh functions suffer from the vanishing/exploding gradient problem. Since this problem is outside the scope of this post, I will post links down below to two videos that do a very good job of explaining it.
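You can see the vanishing-gradient issue directly with autograd: at a large input, sigmoid saturates and passes back almost no gradient, while ReLU passes the gradient through unchanged:

```python
import torch

# Gradient of sigmoid at x = 5: the function has saturated near 1,
# so the gradient is tiny -- this is the "vanishing" gradient.
x = torch.tensor(5.0, requires_grad=True)
torch.sigmoid(x).backward()
print(x.grad)   # ~0.0066

# Gradient of ReLU at the same input: for any positive x it is exactly 1,
# so the gradient flows through undiminished.
x2 = torch.tensor(5.0, requires_grad=True)
torch.relu(x2).backward()
print(x2.grad)  # 1.0
```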
Conclusion
Keep in mind that although get_generator_block gives us one layer of the deep neural network, it itself consists of three layers: Linear, BatchNorm, and ReLU. I understand this can be confusing, which is why you will see some of the literature call it a block instead of a layer; the name get_generator_block itself is an example. But I do believe there is value in calling it a layer, because it is easier to envision something passing through a layer than through a block. That is all for get_generator_block. I will finish up the Generator model in the next blog post, so stay tuned.