Python and Vectorization


Vectorization

The following Python example illustrates the benefit of vectorization.

ๅ‡่จญๆˆ‘ๅ€‘่ฆๆฑ‚ wTxw^TxwTx๏ผŒ่€Œ w,xw, xw,x ้ƒฝๆ˜ฏ 1000000 * 1 ็š„็Ÿฉ้™ฃ

Here is the for-loop approach:

import numpy as np

w = np.random.rand(1000000)
x = np.random.rand(1000000)

# explicit for loop: accumulate the dot product one element at a time
c = 0
for i in range(1000000):
    c += w[i] * x[i]

# time = 474.29 ms

And here is the vectorized approach:

import numpy as np

# vectorized dot product
c = np.dot(w, x)

# time = 1.5 ms

้€™ไบ›ๅ‘้‡ๅŒ–็š„ๆ–นๆณ•๏ผŒๆ˜ฏๅˆฉ็”จ CPU ๆˆ– GPU ็š„ SIMD ๅนณ่กŒๆŒ‡ไปค

ๆ‰€ไปฅๅฏไปฅ็œ‹ๅˆฐ vectorization ๆฏ” for loop ๅฟซไธŠไบ†ไธ‰็™พๅคšๅ€

ๅ› ๆญคไปฅๅพŒๅœจ็ทจๅฏซ็จ‹ๅผๆ™‚๏ผŒๅœจๅฏไปฅไฝฟ็”จ vectorization ๆ™‚ๅฐฑๆ‡‰็›ก้‡้ฟๅ…ไฝฟ็”จ for loop

Guideline : "Whenever possible, avoid explicit for-loops"
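
The speed-up is easy to check yourself. Below is a minimal timing sketch, assuming the same one-million-element vectors as above; the absolute numbers will vary by machine:

import time
import numpy as np

w = np.random.rand(1000000)
x = np.random.rand(1000000)

# explicit for-loop version
tic = time.time()
c = 0
for i in range(1000000):
    c += w[i] * x[i]
toc = time.time()
print(f"for loop:   {1000 * (toc - tic):.2f} ms, c = {c:.4f}")

# vectorized version
tic = time.time()
c = np.dot(w, x)
toc = time.time()
print(f"vectorized: {1000 * (toc - tic):.2f} ms, c = {c:.4f}")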

Vectorizing Logistic Regression

Let's convert the for-loop version of forward propagation into a vectorized one.

With for-loops, we have to compute each training example separately:

$$\begin{aligned} &z^{(1)} = w^Tx^{(1)}+b && z^{(2)} = w^Tx^{(2)}+b && z^{(3)} = w^Tx^{(3)}+b && \cdots \\ &a^{(1)} = \sigma(z^{(1)}) && a^{(2)} = \sigma(z^{(2)}) && a^{(3)} = \sigma(z^{(3)}) && \cdots \end{aligned}$$

We know that $X$ is an $n_x \times m$ matrix whose columns are the training examples:

$$X = \begin{bmatrix} |&|&&|\\ x^{(1)}&x^{(2)}&\cdots&x^{(m)}\\ |&|&&| \end{bmatrix}$$

Multiplying the whole matrix by $w^T$ at once gives a $1 \times m$ matrix $Z$:

$$\begin{aligned} Z &= \begin{bmatrix}z^{(1)} & z^{(2)} & \cdots & z^{(m)}\end{bmatrix}\\ &= w^TX + b\\ &= \begin{bmatrix}w^Tx^{(1)}+b & w^Tx^{(2)}+b & \cdots & w^Tx^{(m)}+b\end{bmatrix} \end{aligned}$$

We then compute the $1 \times m$ matrix $A$:

$$A = \begin{bmatrix}a^{(1)} & a^{(2)} & \cdots & a^{(m)}\end{bmatrix} = \sigma(Z)$$

In Python, $Z$ is computed as:

Z = np.dot(w.T, X) + b
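
As a sanity check, here is a small sketch (with hypothetical toy dimensions n_x = 3 and m = 5, chosen only for illustration) comparing the vectorized forward pass against the per-example loop:

import numpy as np

def sigmoid(z):
    # elementwise logistic function sigma(z) = 1 / (1 + exp(-z))
    return 1 / (1 + np.exp(-z))

n_x, m = 3, 5                  # toy dimensions (assumed for illustration)
X = np.random.rand(n_x, m)     # columns are the training examples x^(i)
w = np.random.rand(n_x, 1)
b = 0.1

# vectorized forward propagation: Z and A for all m examples at once
Z = np.dot(w.T, X) + b         # shape (1, m); the scalar b is broadcast
A = sigmoid(Z)

# per-example loop for comparison
for i in range(m):
    z_i = np.dot(w.T, X[:, i]).item() + b
    assert np.isclose(Z[0, i], z_i)
    assert np.isclose(A[0, i], sigmoid(z_i))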

Vectorizing Gradient Output

Now let's vectorize the gradients (backpropagation) $dZ, dw, db$ as well.

$dZ$ is simply the matrix $A$ minus the matrix $Y$:

$$\begin{aligned} dZ &= \begin{bmatrix}dz^{(1)} & dz^{(2)} & \cdots & dz^{(m)}\end{bmatrix} \\ &= A - Y \\ &= \begin{bmatrix}a^{(1)} - y^{(1)} & a^{(2)} - y^{(2)} & \cdots & a^{(m)} - y^{(m)}\end{bmatrix} \end{aligned}$$

We combine $dw_1, dw_2, \dots$ into a single $n_x \times 1$ matrix $dw$:

$$\begin{aligned} dw &= \frac{1}{m}X\, dZ^T \\ &= \frac{1}{m} \begin{bmatrix} |&|&&|\\ x^{(1)}&x^{(2)}&\cdots&x^{(m)}\\ |&|&&| \end{bmatrix} \begin{bmatrix} dz^{(1)}\\\vdots\\dz^{(m)} \end{bmatrix} \end{aligned}$$

And $db$ is obtained by summing the entries of $dZ$ and averaging over $m$:

$$\begin{aligned} db &= \frac{1}{m} \sum_{i=1}^m dz^{(i)} \\ &= \frac{1}{m} \text{np.sum}(dZ) \end{aligned}$$

Now we can compute one iteration of gradient descent with a single vectorized pass:

# forward propagation
Z = np.dot(w.T, X) + b
A = sigmoid(Z)

# backward propagation (vectorized gradients)
dZ = A - Y
dw = 1/m * np.dot(X, dZ.T)
db = 1/m * np.sum(dZ)

# gradient descent update (alpha is the learning rate)
w = w - alpha * dw
b = b - alpha * db
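
Putting the forward and backward passes together, a self-contained sketch of the vectorized training loop could look like the following; the toy random data and the names n_x, m, alpha are assumptions for illustration, not the course's exact setup:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# toy data: n_x features, m examples, binary labels (assumed for illustration)
n_x, m = 4, 100
X = np.random.randn(n_x, m)
Y = (np.random.rand(1, m) > 0.5).astype(float)

w = np.zeros((n_x, 1))
b = 0.0
alpha = 0.01                        # learning rate

for _ in range(1000):
    # forward propagation (all m examples at once)
    Z = np.dot(w.T, X) + b          # (1, m)
    A = sigmoid(Z)                  # (1, m)

    # vectorized gradients
    dZ = A - Y                      # (1, m)
    dw = np.dot(X, dZ.T) / m        # (n_x, 1)
    db = np.sum(dZ) / m             # scalar

    # gradient descent update
    w = w - alpha * dw
    b = b - alpha * db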