M. Lin, Network In Network, 2013
Conventional convolution layer
- Input --> Linear filter --> Feature map
- Linear filter
- Generalized linear model (GLM)
- Low level of abstraction
- Abstraction = invariance to variants of the same concept
- => Can we make the filters more "non-linear"?
Linearity
- H(x) = y
- H(kx) = kH(x) = ky
- H(x1 + x2) = H(x1) + H(x2) = y1 + y2
- (see the sketch after this list)
- Linearly separable
- Points in n dimensions can be separated by an (n-1)-dimensional hyperplane
- Workaround: increase the feature dimension
- i.e. use an "over-complete set of filters"
- But this puts an extra burden on the next layer
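A quick NumPy check of the linearity properties above, treating the filter response at one spatial location as H(x) = w^T x. The filter and patches are random, purely for illustration:

```python
import numpy as np

# A linear filter H(x) = w.T x satisfies scaling and superposition.
rng = np.random.default_rng(0)
w = rng.standard_normal(9)  # a 3x3 filter, flattened

def H(x):
    return w @ x  # linear filter response at one spatial location

x1, x2 = rng.standard_normal(9), rng.standard_normal(9)
k = 2.5

print(np.isclose(H(k * x1), k * H(x1)))       # H(kx) = kH(x)
print(np.isclose(H(x1 + x2), H(x1) + H(x2)))  # H(x1+x2) = H(x1)+H(x2)
```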
MLP convolution layers
- Add a "micro network" (an MLP) on each local patch
- Why an MLP?
- MLP is trained with back-propagation, consistent with the rest of the network
- MLP can be a deep model itself
- -> Cross feature map pooling
- -> Equivalent to a 1x1 convolution layer (see the sketch after this list)
- Comparison to maxout layers
- MLPconv can model any function (maxout: only convex functions)
- MLP is a universal function approximator
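A minimal sketch of an mlpconv block as a spatial convolution followed by stacked 1x1 convolutions, assuming PyTorch; the layer widths and kernel size here are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# One mlpconv block: a spatial convolution followed by 1x1 convolutions.
# The 1x1 convolutions act as a shared MLP applied at every spatial location.
def mlpconv(in_ch, mid_ch, out_ch, kernel_size, padding):
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size, padding=padding),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, kernel_size=1),  # MLP layer 1
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1),  # MLP layer 2
        nn.ReLU(inplace=True),
    )

block = mlpconv(3, 96, 96, kernel_size=5, padding=2)
x = torch.randn(1, 3, 32, 32)
print(block(x).shape)  # torch.Size([1, 96, 32, 32])
```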
Global Average Pooling
- Fully connected layers are prone to overfitting
- Replace them: average each feature map to one value, feed the vector to softmax
- Global average pooling has no parameters, so nothing to overfit
- (see the sketch after this list)
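A minimal sketch of global average pooling replacing the fully connected classifier, again assuming PyTorch; the channel count (10 classes) is an assumption for illustration:

```python
import torch
import torch.nn as nn

# Global average pooling: reduce each feature map to a single value.
# With one feature map per class, the pooled vector feeds softmax directly,
# so no fully connected layer (and no extra parameters) is needed.
features = torch.randn(1, 10, 8, 8)  # 10 class feature maps, 8x8 spatial
gap = nn.AdaptiveAvgPool2d(1)        # -> shape (1, 10, 1, 1)
logits = gap(features).flatten(1)    # -> shape (1, 10)
probs = torch.softmax(logits, dim=1)
print(probs.shape)  # torch.Size([1, 10])
```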
Layer calculations
- Linear convolution layer (with ReLU)
- $f_{i,j,k} = \max(w_k^T x_{i,j},\ 0)$
- MLPconv layer
- $f^1_{i,j,k_1} = \max({w^1_{k_1}}^T x_{i,j} + b_{k_1},\ 0)$
- $\cdots$
- $f^n_{i,j,k_n} = \max({w^n_{k_n}}^T f^{n-1}_{i,j} + b_{k_n},\ 0)$
- Layer n takes the previous layer's output $f^{n-1}_{i,j}$ as input (see the sketch below)
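A minimal NumPy sketch of these equations, with illustrative shapes; because the weights are shared across all locations (i, j), each step is exactly a 1x1 convolution:

```python
import numpy as np

# x has C input channels at each spatial location (i, j).
H, W, C, K1, K2 = 8, 8, 3, 16, 10
x = np.random.randn(H, W, C)
w1 = np.random.randn(C, K1)
b1 = np.random.randn(K1)
w2 = np.random.randn(K1, K2)
b2 = np.random.randn(K2)

# Layer 1: f1[i, j, k1] = max(w1[:, k1] . x[i, j] + b1[k1], 0)
f1 = np.maximum(x @ w1 + b1, 0)
# Layer 2 takes f1 (not x) as input:
# f2[i, j, k2] = max(w2[:, k2] . f1[i, j] + b2[k2], 0)
f2 = np.maximum(f1 @ w2 + b2, 0)
print(f2.shape)  # (8, 8, 10): same spatial grid, K2 output feature maps
```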
Reference: https://arxiv.org/pdf/1312.4400.pdf