LeNet Notes

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, 1998

Past (the traditional pattern-recognition pipeline):

Raw input ==> Feature extraction module ==> Feature vector ==> Trainable classifier module ==> Class scores (sketched below)
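
A minimal sketch of this two-stage pipeline, assuming a hand-designed feature extractor (the statistics below are placeholders for real engineered descriptors) feeding the one trainable module, a linear classifier:

```python
import numpy as np

def extract_features(image):
    # Fixed, hand-designed features (not learned): a stand-in for the
    # engineered descriptors used before end-to-end learning.
    return np.array([image.mean(), image.std(),
                     image[:16].mean(), image[16:].mean()])

class LinearClassifier:
    # The only trainable module in the old pipeline.
    def __init__(self, n_features, n_classes):
        self.W = np.zeros((n_classes, n_features))
        self.b = np.zeros(n_classes)

    def scores(self, features):
        return self.W @ features + self.b

image = np.random.rand(32, 32)               # raw input
features = extract_features(image)           # fixed feature extraction
clf = LinearClassifier(len(features), 10)    # trainable classifier module
class_scores = clf.scores(features)          # class scores
```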

Learning from Data

  • Loss Function
    • E_p = D(D_p, F(Z_p, W))
    • D measures the discrepancy between the prediction Y_p = F(Z_p, W) and the desired output D_p
    • E_train = (1/P) * sum_p E_p
    • Adjust W (e.g. by gradient descent) to make E_train smaller; a minimal sketch follows this list
  • Performance on the test set matters more than performance on the training set.
  • E_test - E_train = k * (h/P)^alpha
    • P: number of training samples
    • h: a measure of the model's capacity (complexity)
    • 0.5 <= alpha <= 1
    • k is a constant
    • so the gap shrinks as P grows: with alpha = 1, doubling P halves the bound (numeric illustration after this list)
  • Structural risk minimization
    • minimize E_train + beta * H(W), where H(W) penalizes model complexity
    • general form: L(W) = (1/N) * sum_i L_i(f(x_i, W), y_i) + lambda * R(W)
      • Data loss: model predictions should match the training data
      • Regularization: prevent the model from doing too well on (memorizing) the training data
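
A minimal numpy sketch of the learning setup above, assuming a linear model for F(Z_p, W), squared error for the loss D, and an L2 penalty for R(W); the variable names follow the formulas in this list:

```python
import numpy as np

def loss(W, X, Y, lam):
    # L(W) = (1/N) * sum_i L_i(f(x_i, W), y_i) + lambda * R(W)
    preds = X @ W                                           # f(x_i, W), a linear stand-in
    data_loss = np.mean(np.sum((preds - Y) ** 2, axis=1))   # data loss (squared error)
    return data_loss + lam * np.sum(W ** 2)                 # + L2 regularization R(W)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))    # N = 100 training inputs Z_p
Y = rng.normal(size=(100, 10))    # desired outputs D_p
W = np.zeros((20, 10))
lam, lr = 0.1, 0.01
for _ in range(200):
    # analytic gradient of the objective above: gradient descent on W
    grad = (2 / len(X)) * X.T @ (X @ W - Y) + 2 * lam * W
    W -= lr * grad                # adjust W to make E_train smaller
print(loss(W, X, Y, lam))         # decreases over the run
```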
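To see what the E_test - E_train bound implies, a small numeric illustration (k, h, and alpha are chosen arbitrarily here): the gap shrinks polynomially as the training set grows, and faster for larger alpha.

```python
# Gap bound k * (h / P) ** alpha with illustrative constants.
k, h = 1.0, 1000
for alpha in (0.5, 1.0):
    for P in (1_000, 10_000, 100_000):
        print(f"alpha={alpha}, P={P}: gap <= {k * (h / P) ** alpha:.4f}")
```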

Present (the LeNet approach):

  • Convolutional Network
    • Combines three architectural ideas to ensure some degree of shift, scale, and distortion invariance:
      • Local Receptive Fields
      • Shared Weights
      • Spatial or Temporal Subsampling
    • Local receptive fields let early layers extract elementary visual features: oriented edges, end-points, corners
    • Architecture (a runnable sketch follows at the end of this section)

      | Layer        | Input | C1                 | S2                     | C3                  | S4                    | C5                  | F6                 | Output |
      |--------------|-------|--------------------|------------------------|---------------------|-----------------------|---------------------|--------------------|--------|
      | Feature maps | 32x32 | 6@28x28            | 6@14x14                | 16@10x10            | 16@5x5                | 120                 | 84                 | 10     |
      | Filter size  |       | 5x5                | 2x2                    | 5x5                 | 2x2                   | 5x5                 |                    |        |
      | Parameters   |       | (5x5+1)x6 = 156    | (1+1)x6 = 12           | 1516                | (1+1)x16 = 32         | (400+1)x120 = 48120 | (120+1)x84 = 10164 |        |
      | Connections  |       | 156x28x28 = 122304 | (2x2+1)x14x14x6 = 5880 | 1516x10x10 = 151600 | (2x2+1)x5x5x16 = 2000 | 48120               | 10164              |        |

      • C3 is not fully connected to S2: each of its 16 maps takes a subset of S2's 6 maps (60 5x5 kernels in total), giving 60x25 + 16 = 1516 parameters
  • Other techniques
    • Back-propagation
    • HOS (heuristic over-segmentation): heuristically cut the input into candidate character segments, then let the recognizer score the candidates
    • Practical character recognition is not about single letters but about
      • zip codes
      • check amounts
      • words and running text
    • Advantages of word-level recognition
      • can reject badly segmented candidates
      • lowers the overall recognition error rate
    • Viterbi transformer
      • finds the best path through the recognition graph (a minimal implementation follows this list)
      • Viterbi Algorithm
        • a dynamic programming algorithm
        • finds the most likely sequence of hidden states (the Viterbi path) behind an observed sequence of events
        • used especially with Markov information sources and hidden Markov models
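
A runnable sketch of the LeNet-5 architecture from the table above, written in PyTorch purely as an assumption of framework (the paper predates it). Average pooling stands in for LeNet's trainable subsampling, tanh for its squashing function, and C3 is fully connected to S2 here instead of using the paper's partial connection table:

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(1, 6, kernel_size=5)     # 32x32 -> 6@28x28
        self.s2 = nn.AvgPool2d(2)                    # -> 6@14x14
        self.c3 = nn.Conv2d(6, 16, kernel_size=5)    # -> 16@10x10
        self.s4 = nn.AvgPool2d(2)                    # -> 16@5x5
        self.c5 = nn.Conv2d(16, 120, kernel_size=5)  # -> 120@1x1
        self.f6 = nn.Linear(120, 84)
        self.out = nn.Linear(84, 10)                 # the paper uses RBF units here

    def forward(self, x):
        x = self.s2(torch.tanh(self.c1(x)))
        x = self.s4(torch.tanh(self.c3(x)))
        x = torch.tanh(self.c5(x)).flatten(1)        # 120-dim vector
        return self.out(torch.tanh(self.f6(x)))

scores = LeNet5()(torch.randn(1, 1, 32, 32))         # class scores, shape (1, 10)
```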
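And a minimal Viterbi implementation on a toy hidden Markov model, to make the dynamic-programming idea concrete; the transition and emission numbers are made up for illustration:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    # pi[s]: initial probs; A[s, t]: transition probs; B[s, o]: emission probs.
    T, n = len(obs), len(pi)
    delta = np.zeros((T, n))               # best path probability ending in state s
    psi = np.zeros((T, n), dtype=int)      # backpointers to the best predecessor
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A  # score of every predecessor choice
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]       # backtrack the best (Viterbi) path
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy 2-state HMM with 2 observation symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 1, 1, 0], pi, A, B))     # most likely hidden-state sequence
```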

Reference: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf