Y LeCun, Gradient-Based Learning Applied to Document Recognition, 1998
Before (traditional pattern recognition):
Raw input ==> Feature extraction module ==> Feature input ==> Trainable classifier module ==> Class scores
Learning from Data
- Loss Function
    - Ep = D(Dp, F(Zp, W))
    - measures the discrepancy between the desired output Dp and the actual output Yp = F(Zp, W)
    - Etrain = (1/P) * sum_p Ep
    - adjust W to minimize Etrain
- Test-set performance matters more than training-set performance.
- Etest - Etrain = k * (h/P)^alpha
    - P: number of training samples
    - h: effective capacity (complexity) of the model
    - 0.5 <= alpha <= 1
    - k: a constant
- Structural risk minimization
    - minimize Etrain + beta * H(W)
- L(W) = (1/N) * sum_i Li(f(xi, W), yi) + lambda * R(W)
    - Data loss: model predictions should match the training data
    - Regularization: prevents the model from doing too well on the training data (overfitting)
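The loss above can be sketched in a few lines of numpy. The concrete choices here (a linear model for f(x, W), squared error for Li, and an L2 penalty for R(W)) are illustrative assumptions, not choices made in the paper:

```python
import numpy as np

def total_loss(W, X, y, lam):
    """L(W) = (1/N) * sum_i Li(f(xi, W), yi) + lambda * R(W).

    f is a linear model, Li is squared error, R is the L2 norm --
    all placeholder choices for illustration.
    """
    preds = X @ W                             # f(xi, W) for every sample
    data_loss = np.mean((preds - y) ** 2)     # (1/N) * sum_i Li
    reg = lam * np.sum(W ** 2)                # lambda * R(W)
    return data_loss + reg
```

With W = 0 the regularizer vanishes and the loss reduces to the mean squared target value, which makes the two terms easy to check in isolation.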
Now (convolutional networks):
- Convolutional Network
- Combines three architectural ideas to ensure some degree of shift, scale, and distortion invariance:
    - Local Receptive Fields
    - Shared Weights
    - Spatial or Temporal Subsampling
- Extracts oriented edges, endpoints, and corners
- Architecture
    | Layer       | Input | C1              | S2              | C3            | S4          | C5            | F6           | Output |
    |-------------|-------|-----------------|-----------------|---------------|-------------|---------------|--------------|--------|
    | Size        | 32x32 | 6@28x28         | 6@14x14         | 16@10x10      | 16@5x5      | 120           | 84           | 10     |
    | Filter      |       | 5x5             | 2x2             | 5x5           | 2x2         | 5x5           |              |        |
    | Parameters  |       | (5x5+1)x6       | 6x2             | (5x5x6x10)+16 | (1+1)x16=32 | (400x120)+120 | (120x84)+84  |        |
    | Connections |       | (5x5+1)x28x28x6 | (2x2+1)x14x14x6 | parameters x 10x10 | 80x5x5 | (400x120)+120 | (120x84)+84  |        |
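The parameter counts in the table can be checked with plain arithmetic; the expressions below simply mirror the table rows (a sketch, not the paper's code):

```python
# LeNet-5 trainable parameter counts, layer by layer.
c1 = (5 * 5 + 1) * 6        # 156: six 5x5 filters, each with one bias
s2 = 6 * 2                  # 12: one trainable coefficient + one bias per map
c3 = (5 * 5 * 6 * 10) + 16  # 1516: 60 connected 5x5 kernels + 16 biases
s4 = (1 + 1) * 16           # 32
c5 = (400 * 120) + 120      # 48120: fully connected from 16@5x5 = 400 inputs
f6 = (120 * 84) + 84        # 10164
total = c1 + s2 + c3 + s4 + c5 + f6
print(total)                # 60000
```

Summing the rows gives exactly 60,000 trainable parameters, matching the figure quoted in the paper.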
- Other techniques
- Back-propagation
- HOS (Heuristic Over-Segmentation): heuristically cut a word image into candidate characters
- In practice, character recognition targets not just a single letter but
    - ZIP codes
    - check amounts
    - words/text
- Advantages of word-level recognition
    - can reject erroneous segmentations
    - lowers the overall recognition error rate
- Viterbi transformer
    - finds the best path through the graph
- Viterbi Algorithm
    - a dynamic-programming algorithm
    - finds the Viterbi path: the most likely sequence of hidden states behind an observed sequence of events
    - used especially with Markov information sources and hidden Markov models
Reference: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf