Network in Network Notes

M. Lin et al., Network In Network, 2013

Conventional convolution layer

  • Input --> Linear filter --> Feature map (sketched in code after this list)
  • Linear filter
    • Generalized linear model (GLM)
      • Low level of abstraction
      • Abstraction = invariance to variants of the same concept
        • => Can the GLM be replaced with a more non-linear function approximator?
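
A minimal sketch of this conventional pipeline in PyTorch (channel counts and kernel size are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

# Conventional convolution layer: a linear filter (the GLM) applied to each
# local patch, followed by a pointwise nonlinearity (ReLU here).
linear_filter = nn.Conv2d(in_channels=3, out_channels=16,
                          kernel_size=5, padding=2)

x = torch.randn(1, 3, 32, 32)                # a batch of input images
feature_map = torch.relu(linear_filter(x))   # Input -> linear filter -> feature map
print(feature_map.shape)                     # torch.Size([1, 16, 32, 32])
```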

Linearity

  • H(x) = y
    • Homogeneity: H(kx) = ky
    • Additivity: H(x_1 + x_2) = H(x_1) + H(x_2) = y_1 + y_2
    • Both properties are checked in the sketch after this list
  • Linearly separable
    • Points in n dimensions can be separated by an (n-1)-dimensional hyperplane
  • Conventional CNNs compensate for the GLM's low abstraction by increasing the feature dimension
    • i.e., by using an "over-complete set of filters" to cover all variants of a concept
    • This puts an extra burden on the next layer
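
A quick numerical check of the two linearity properties, using a bias-free convolution as H (a toy sketch; shapes are arbitrary):

```python
import torch
import torch.nn.functional as F

w = torch.randn(8, 3, 5, 5)      # filter bank; no bias, so H is purely linear
x1 = torch.randn(1, 3, 32, 32)
x2 = torch.randn(1, 3, 32, 32)

def H(x):
    return F.conv2d(x, w)        # convolution without bias or activation

# Homogeneity: H(kx) = k * H(x)
print(torch.allclose(H(3.0 * x1), 3.0 * H(x1), atol=1e-5))   # True

# Additivity: H(x1 + x2) = H(x1) + H(x2)
print(torch.allclose(H(x1 + x2), H(x1) + H(x2), atol=1e-5))  # True
```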

MLP convolution layers

  • Add a "micro network" (an MLP) after each convolution
  • Why an MLP?
    • An MLP is trained with back-propagation, like the rest of the network
    • An MLP can be a deep model itself
    • -> Acts as cross feature map (cross-channel) parametric pooling
    • -> Equivalent to a stack of 1x1 convolution layers (see the sketch after this list)
  • Comparison to maxout layers
    • An MLP can model any function (maxout: only convex functions)
    • i.e., the MLP is a universal function approximator
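
A minimal mlpconv block in PyTorch: one spatial convolution followed by 1x1 convolutions, which apply the same micro MLP at every location (channel counts here are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

# mlpconv block: a 5x5 linear filter followed by a per-pixel two-layer MLP,
# implemented as 1x1 convolutions (cross feature map parametric pooling).
mlpconv = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv2d(96, 96, kernel_size=1), nn.ReLU(),   # micro-MLP layer 1
    nn.Conv2d(96, 96, kernel_size=1), nn.ReLU(),   # micro-MLP layer 2
)

x = torch.randn(1, 3, 32, 32)
print(mlpconv(x).shape)   # torch.Size([1, 96, 32, 32])
```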

Global Average Pooling

  • Fully connected classifier layers are prone to overfitting
  • Global average pooling has no parameters, so nothing in it can overfit
  • Each feature map is averaged to a single value and the resulting vector is fed directly to softmax (sketched below)
    • This enforces a correspondence between feature maps and categories
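
A sketch of global average pooling feeding softmax directly (the class count is illustrative):

```python
import torch

num_classes = 10
feature_maps = torch.randn(1, num_classes, 8, 8)  # last mlpconv: one map per class

# Global average pooling: average each map over its spatial dimensions.
# There are no weights here, so this layer cannot overfit.
logits = feature_maps.mean(dim=(2, 3))            # shape (1, num_classes)
probs = torch.softmax(logits, dim=1)
print(probs.shape)                                # torch.Size([1, 10])
```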

Layer calculations

  • Linear convolution layer
    • f_{i,j,k} = max(w_k^T x_{i,j}, 0)
    • (x_{i,j}: input patch centered at pixel (i, j); k: feature map channel index)
  • MLPconv layer (layer 1 takes the patch x_{i,j}; each later layer takes the previous layer's output)
    • f^1_{i,j,k_1} = max((w^1_{k_1})^T x_{i,j} + b_{k_1}, 0)
    • ...
    • f^n_{i,j,k_n} = max((w^n_{k_n})^T f^{n-1}_{i,j} + b_{k_n}, 0)
    • This recursion is checked against 1x1 convolutions in the sketch below
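
The MLPconv recursion is the same computation as a stack of 1x1 convolutions: at each pixel (i, j), the same small MLP is applied to the channel vector. A toy check (all shapes and weights here are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)                           # x_{i,j}: 3-channel vector at (i, j)
w1 = torch.randn(16, 3, 1, 1); b1 = torch.randn(16)   # micro-MLP layer 1 weights
w2 = torch.randn(10, 16, 1, 1); b2 = torch.randn(10)  # micro-MLP layer 2 weights

# mlpconv computed as 1x1 convolutions over the whole feature map
f1 = F.relu(F.conv2d(x, w1, b1))
f2 = F.relu(F.conv2d(f1, w2, b2))

# the same values computed as a per-pixel MLP at one location (i, j) = (4, 5)
v = x[0, :, 4, 5]                                     # x_{i,j}
h1 = F.relu(w1.view(16, 3) @ v + b1)                  # f^1_{i,j}
h2 = F.relu(w2.view(10, 16) @ h1 + b2)                # f^2_{i,j}
print(torch.allclose(f2[0, :, 4, 5], h2, atol=1e-5))  # True
```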

Reference: https://arxiv.org/pdf/1312.4400.pdf