Deep Neural Network


Deep L-layer Neural Network

  • Logistic regression (a 1-layer model) and networks with only a few layers are called "shallow" neural networks

  • The more hidden layers a network has, the deeper it is

  • Notation for deep neural networks:

  • $L$ denotes the number of layers

  • $n^{[l]}$ denotes the number of units in layer $l$

  • $a^{[l]}$ denotes the activations of layer $l$

    • $a^{[l]} = g^{[l]}(z^{[l]})$

  • $w^{[l]}$ and $b^{[l]}$ denote the weights and bias used to compute $z^{[l]}$

  • Additionally:

    • $x = a^{[0]}$ is the input layer, with $n^{[0]}$ (i.e. $n_x$) units

    • $\hat{y} = a^{[L]}$ is the output layer, with $n^{[L]}$ units

Forward Propagation in a Deep Neural Network

  • Consider running forward propagation on the 5-layer network shown above

  • The procedure is just like the 2-layer case we saw before; we simply iterate the following step layer by layer

  • The generalized forward propagation step is:

$$
\begin{aligned}
Z^{[l]} &= W^{[l]}A^{[l-1]} + b^{[l]} && X = A^{[0]} \\
A^{[l]} &= g^{[l]}(Z^{[l]})
\end{aligned}
$$

  • With $L$ layers, we iterate this step for l = 1:L

  • Running through all the layers requires an explicit for-loop; there is no way to vectorize across layers
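A minimal NumPy sketch of that loop, assuming the parameters live in a dict keyed "W1", "b1", ... and that hidden layers use ReLU with a sigmoid output (the helper names and dict layout are my own illustration, not the course's exact code):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_propagation(X, parameters, L):
    """Run A[0] = X through layers 1..L: ReLU for hidden layers, sigmoid at the output."""
    A = X                                      # A[0] = X, shape (n[0], m)
    for l in range(1, L + 1):                  # the explicit loop over layers
        W, b = parameters["W" + str(l)], parameters["b" + str(l)]
        Z = W @ A + b                          # Z[l] = W[l] A[l-1] + b[l]
        A = sigmoid(Z) if l == L else relu(Z)  # A[l] = g[l](Z[l])
    return A                                   # A[L] = y_hat
```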

Getting your matrix dimensions right (Debug)

  • When implementing a network, working out the dimension of every vector and matrix is essential

  • It is one of the most effective ways to catch bugs in your program

Parameters $W^{[l]}$ and $b^{[l]}$

  • First look at the dimensions of $w$ and $b$ when computing $z$ for the first layer (here $n^{[0]} = 2$ and $n^{[1]} = 3$):

$$
\begin{aligned}
&z^{[1]} = &w^{[1]} &\times &x &+ &b^{[1]} \\
&(3, 1) &(3, 2) & &(2, 1) & &(3, 1) \\
&(n^{[1]}, 1) &(n^{[1]}, n^{[0]}) & &(n^{[0]}, 1) & &(n^{[1]}, 1)
\end{aligned}
$$

  • These dimensions generalize to any layer $l$

  • During backprop, $dw$ and $db$ have exactly the same dimensions as $w$ and $b$

$$
\begin{aligned}
w^{[l]} &: (n^{[l]}, n^{[l-1]}) \\
b^{[l]} &: (n^{[l]}, 1) \quad (\text{same shape as } z^{[l]})
\end{aligned}
$$
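One way to lock these shapes in is to assert them when the parameters are created; a small sketch, assuming a `layer_dims = [n[0], n[1], ..., n[L]]` list (the helper name and scaling factor are illustrative):

```python
import numpy as np

def initialize_parameters(layer_dims):
    """layer_dims = [n[0], n[1], ..., n[L]]; W[l]: (n[l], n[l-1]), b[l]: (n[l], 1)."""
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
        # the dimension checks this section is about
        assert parameters["W" + str(l)].shape == (layer_dims[l], layer_dims[l - 1])
        assert parameters["b" + str(l)].shape == (layer_dims[l], 1)
    return parameters

# e.g. initialize_parameters([2, 3, 1]) gives W1: (3, 2), b1: (3, 1), W2: (1, 3), b2: (1, 1)
```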

Vectorized Implementation

  • With vectorization, forward propagation processes all m training examples at once

  • The original column vector $z$ becomes a matrix with m columns

$$
Z^{[1]} = \begin{bmatrix} | & | & & | \\ z^{[1](1)} & z^{[1](2)} & \cdots & z^{[1](m)} \\ | & | & & | \end{bmatrix}
$$

  • The computation of $z$ then becomes:

    • $b^{[1]}$ keeps its $(n^{[1]}, 1)$ shape because Python broadcasting automatically expands it across the m columns

$$
\begin{aligned}
&Z^{[1]} = &W^{[1]} &\times &X &+ &b^{[1]} \\
&(n^{[1]}, m) &(n^{[1]}, n^{[0]}) & &(n^{[0]}, m) & &(n^{[1]}, 1)
\end{aligned}
$$

  • In short, going from $z, a$ to $Z, A$ just means going from 1 column to m columns:

$$
z^{[l]}, a^{[l]} : (n^{[l]}, 1) \rightarrow Z^{[l]}, A^{[l]} : (n^{[l]}, m)
$$
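A tiny shape check of this broadcasting behaviour (the sizes chosen here are arbitrary):

```python
import numpy as np

n0, n1, m = 2, 3, 5              # n[0] = 2, n[1] = 3, m = 5 training examples
W1 = np.random.randn(n1, n0)     # (n[1], n[0])
X  = np.random.randn(n0, m)      # (n[0], m)
b1 = np.zeros((n1, 1))           # (n[1], 1) -- broadcast across the m columns
Z1 = W1 @ X + b1
print(Z1.shape)                  # (3, 5), i.e. (n[1], m)
```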

Why Deep Representations?

Two perspectives on why deeper networks work better:

Intuition

  • With a deep neural network,

  • the first hidden layer can perform simple computations and detections on an image

    • e.g. finding horizontal or vertical edges

  • the next layer combines the previous layer's results to compute something more elaborate

    • e.g. finding facial features or parts

  • the deeper you go, the more complex the computed features become

    • e.g. the final layers can recognize whole faces

  • Another example is a speech recognition system

    • Audio -> low level audio waves -> phonemes -> words -> sentences

  • So deep learning composes simple pieces into complex ones to approximate the target function

Circuit Theory

  • Another perspective comes from logic circuits

  • Suppose we want to compute $x_1 \text{ XOR } x_2 \text{ XOR } x_3 \text{ XOR } \cdots \text{ XOR } x_n$

  • A deep, tree-structured network can compute this with depth $O(\log n)$ (this is the advantage of a deep nn)

  • A shallow network would instead need on the order of $O(2^n)$ nodes to do the same

Informally: there are functions you can compute with a "small" L-layer deep neural network that shallower networks require exponentially more hidden units to compute.
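A rough count behind that statement: pairing the inputs up into a balanced tree of two-input XOR gates needs about $n-1$ gates arranged in depth $O(\log n)$, whereas a single hidden layer essentially has to enumerate the exponentially many input patterns, which is where the $O(2^n)$ comes from:

$$
\underbrace{(x_1 \oplus x_2) \oplus (x_3 \oplus x_4) \oplus \cdots}_{\text{balanced tree}} : \ \text{depth} \approx \log_2 n,\ \ n-1 \text{ gates}
\qquad \text{vs.} \qquad
\text{one hidden layer} : \ O(2^{n}) \text{ units}
$$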

Building Blocks of Deep Neural Networks

  • When actually implementing a deep nn, each forward step $A^{[l]} \rightarrow A^{[l+1]}$ can be treated as a block

  • Likewise, each backward step $dA^{[l-1]} \leftarrow dA^{[l]}$ can be treated as a block

  • The forward pass caches its intermediate values for the backward pass to use

Forward Propagation

  • Input: $a^{[l-1]}$

  • Output: $a^{[l]}$

  • Cache: $z^{[l]}, w^{[l]}, b^{[l]}$

  • Process:

$$
\begin{aligned}
Z^{[l]} &= W^{[l]} \cdot A^{[l-1]} + b^{[l]} \\
A^{[l]} &= g^{[l]}(Z^{[l]})
\end{aligned}
$$

Backward Propagation

  • Input: $da^{[l]}$ (plus the cache from the forward pass)

  • Output: $da^{[l-1]}, dW^{[l]}, db^{[l]}$

  • Process:

$$
\begin{aligned}
dZ^{[l]} &= dA^{[l]} \ast g^{[l]\prime}(Z^{[l]}) \\
dW^{[l]} &= \frac{1}{m}\, dZ^{[l]} \cdot A^{[l-1]T} \\
db^{[l]} &= \frac{1}{m}\, \text{np.sum}(dZ^{[l]}, \text{axis}=1, \text{keepdims}=\text{True}) \\
dA^{[l-1]} &= W^{[l]T} \cdot dZ^{[l]}
\end{aligned}
$$
  • Chained together, these blocks form the complete forward/backward pass shown above
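A NumPy sketch of one such backward block, assuming the forward block cached the tuple `(A_prev, W, b, Z)` and that `dg` is the derivative of that layer's activation (the names are illustrative, not the course's exact API):

```python
import numpy as np

def linear_activation_backward(dA, cache, dg):
    """One backward block: takes dA[l] and the cache (A[l-1], W[l], b[l], Z[l]),
    returns dA[l-1], dW[l], db[l]. `dg` is the derivative g[l]'(.) of the activation."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]                               # number of training examples
    dZ = dA * dg(Z)                                   # dZ[l] = dA[l] * g[l]'(Z[l])
    dW = (1 / m) * dZ @ A_prev.T                      # dW[l] = 1/m * dZ[l] A[l-1]^T
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)  # db[l] = 1/m * sum over examples
    dA_prev = W.T @ dZ                                # dA[l-1] = W[l]^T dZ[l]
    return dA_prev, dW, db

# example derivative for ReLU: dg = lambda Z: (Z > 0).astype(float)
```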

HyperParameters

  • The ordinary parameters are $W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, \cdots$

  • Hyperparameters are the settings that determine how those parameters end up being learned

  • Choosing them is largely a matter of experience and experimentation

  • Examples so far include:

    • learning rate $\alpha$

    • number of iterations

    • number of hidden layers $L$

    • number of hidden units $n^{[1]}, n^{[2]}, \cdots$

    • choice of activation function (tanh, ReLU, ...)

  • Later in the course we will also meet:

    • momentum

    • mini-batch size

    • regularization algorithms

  • Applied deep learning is therefore a very empirical process

    • You constantly cycle through Idea -> Code -> Experiment
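In practice these knobs often end up in a single config that gets tweaked from experiment to experiment; a purely hypothetical example (the values are placeholders, not recommendations):

```python
# Hypothetical hyperparameter configuration for one experiment run
hyperparameters = {
    "learning_rate": 0.01,             # alpha
    "num_iterations": 3000,
    "num_hidden_layers": 4,            # L
    "hidden_units": [64, 32, 16, 8],   # n[1], n[2], ...
    "activation": "relu",              # choice of activation function
    # covered later in the course:
    "mini_batch_size": 64,
    "momentum": 0.9,
    "regularization": "L2",
}
```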