Introduction


Mean $\mu$

The mean is simply the average of all the values.

$$\begin{aligned} &4, 3, 1, 6, 1, 7 \\ &\mu = \frac{4+3+1+6+1+7}{6} = \frac{22}{6} \approx 3.67 \end{aligned}$$

Variance $\sigma^2$

Measures how spread out the values are: the larger the variance, the more dispersed the data.

It is the average of the squared differences between each value and the mean.

$$\begin{aligned} &4, 3, 1, 6, 1, 7 \\ &\sigma^2 = \frac{(4-3.67)^2+(3-3.67)^2+(1-3.67)^2+(6-3.67)^2+(1-3.67)^2+(7-3.67)^2}{6} \approx 5.22 \end{aligned}$$

Linearity of variance

$$\begin{aligned} &\text{if } x, y \text{ are independent:} \quad VAR[x+y] = VAR[x] + VAR[y] \\ &\text{otherwise:} \quad VAR[x+y] = VAR[x] + VAR[y] + 2\,COV[x, y] \\ &\text{where } COV[x, y] = E[(x_i - \mu_x)(y_i - \mu_y)], \quad \mu_x = E[x], \quad \mu_y = E[y] \end{aligned}$$

$$VAR[x+x] = VAR[x] + VAR[x] + 2\,COV[x, x] = 4\,VAR[x]$$

Pearson's correlation $= \frac{COV[x, y]}{\sqrt{VAR[x]\,VAR[y]}}$

Spearman's rank correlation
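The identities above are easy to check numerically. A minimal sketch (not part of the original notes; the arrays are made-up example data):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# made-up example data (x reuses the six numbers from above)
x = np.array([4, 3, 1, 6, 1, 7], dtype=float)
y = np.array([2, 5, 1, 4, 0, 8], dtype=float)

# population covariance: COV[x, y] = E[(x_i - mu_x)(y_i - mu_y)]
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# VAR[x + y] = VAR[x] + VAR[y] + 2 COV[x, y]
print(np.var(x + y), np.var(x) + np.var(y) + 2 * cov_xy)

# Pearson's correlation = COV[x, y] / sqrt(VAR[x] VAR[y])
print(cov_xy / np.sqrt(np.var(x) * np.var(y)), pearsonr(x, y)[0])

# Spearman's rank correlation: Pearson's correlation computed on the ranks
print(spearmanr(x, y)[0])
```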

Standard Deviation $\sigma$

Another way to express how spread out the values are,

but it better captures how far, on average, each value lies from the mean.

It is simply the square root of the variance.

$$\begin{aligned} &4, 3, 1, 6, 1, 7 \\ &\sigma = \sqrt{5.22} \approx 2.29 \end{aligned}$$
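The three quantities above can be checked directly with numpy (a small sketch using the same six numbers; not part of the original notes):

```python
import numpy as np

data = np.array([4, 3, 1, 6, 1, 7], dtype=float)

mu = data.mean()       # mean: 22/6 ≈ 3.67
var = np.var(data)     # population variance (divide by n) ≈ 5.22
sigma = np.std(data)   # standard deviation = sqrt(variance) ≈ 2.29

print(mu, var, sigma)
```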

Expectation $E$

The expectation is just the sum of each value multiplied by the probability that it occurs, and it can also be written as $E(X) = \mu_X$.

For example, for a fair die the probabilities are:

| X | prob(X) |
| --- | --- |
| 1 | 1/6 |
| 2 | 1/6 |
| 3 | 1/6 |
| 4 | 1/6 |
| 5 | 1/6 |
| 6 | 1/6 |

So the expected value is

$$E(X) = 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{1}{6} + 6 \times \frac{1}{6} = 3.5$$

Linearity of expectation

  • expectation (p1)

    $$E[f(x)] = \sum_i p(x_i)\, f(x_i)$$
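A quick check of the die example and of $E[f(x)] = \sum_i p(x_i) f(x_i)$ (a small sketch, not from the original notes):

```python
import numpy as np

faces = np.arange(1, 7)        # faces 1..6
probs = np.full(6, 1 / 6)      # fair die: every face has probability 1/6

# E[X] = sum_i p(x_i) * x_i
print(np.sum(probs * faces))       # 3.5

# E[f(X)] for an arbitrary f, e.g. f(x) = x^2
print(np.sum(probs * faces ** 2))  # 91/6 ≈ 15.17
```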

Normal Distribution

Also called the Gaussian distribution; it refers to the distribution most commonly seen in real-world data.

  • It is bell-shaped.

  • mean = median, and both lie at the center of the distribution.

  • About 68% of the values fall within 1 standard deviation of the mean.

  • About 95% of the values fall within 2 standard deviations of the mean.

  • About 99.7% of the values fall within 3 standard deviations of the mean.

  • These three percentages are known as the empirical rule.

  • The probability density of the normal distribution is

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
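As a sanity check of the density (note the minus sign in the exponent), a hand-written version can be compared against scipy.stats.norm; a minimal sketch, not part of the original notes:

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma2):
    """f(x | mu, sigma^2) = 1 / sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x = np.linspace(-3, 3, 7)
mu, sigma2 = 0.5, 2.0          # arbitrary example parameters

# scipy parameterizes the normal by its standard deviation, so pass sqrt(sigma2)
print(np.allclose(normal_pdf(x, mu, sigma2), norm.pdf(x, loc=mu, scale=np.sqrt(sigma2))))
```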

Standard Normal Distribution

The standard normal distribution is a special case of the normal distribution.

Its mean is 0 and its variance is 1:

$$\mu = 0, \quad \sigma^2 = 1$$

  • The probability density of the standard normal distribution is

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}$$

Entropy $H$ (p32)

$$H(p_1, \cdots, p_k) = \sum_i p_i \log\frac{1}{p_i} = E\left[\log\frac{1}{p_i}\right]$$

Entropy predicts the minimum number of bits needed, on average, to transmit a symbol. For example, with $a = 1/2,\ b = 1/2$ we get $H = 1$; with $a = 2/3,\ b = 1/3$ we get $H \approx 0.92$, so the message can be coded with fewer than 1 bit per symbol on average (e.g. by Huffman coding blocks of symbols).
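A small sketch (not in the original notes) reproducing the two entropy values quoted above, using log base 2 so the unit is bits:

```python
import numpy as np

def entropy(p):
    """H(p) = sum_i p_i * log2(1 / p_i), in bits."""
    p = np.asarray(p, dtype=float)
    return np.sum(p * np.log2(1 / p))

print(entropy([1 / 2, 1 / 2]))  # 1.0 bit
print(entropy([2 / 3, 1 / 3]))  # ≈ 0.918 bits, i.e. less than 1 bit per symbol on average
```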

Probability

Likelihood

Prior Probability

Prior odds: with $P(A) = 20\%$ and $P(-A) = 80\%$, the prior odds are $\frac{P(A)}{P(-A)} = \frac{1}{4}$.

Posterior Probability

posterior odds

$$\frac{P[A \mid S]}{P[-A \mid S]} = \frac{P[A]}{P[-A]} \cdot \frac{P[S \mid A]}{P[S \mid -A]}$$

posterior odds = prior odds × likelihood ratio

This odds form is often more convenient than applying Bayes' law directly, because $P[S]$ is not easy to obtain.
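A tiny numerical sketch of the odds update (not in the original notes; the likelihoods for the evidence $S$ are made-up numbers):

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Posterior odds = prior odds * likelihood ratio P[S|A] / P[S|-A]."""
    return prior_odds * likelihood_ratio

# prior from the notes: P(A) = 20%, P(-A) = 80%  ->  prior odds = 1/4
prior = 0.2 / 0.8

# hypothetical evidence S with P[S|A] = 0.9 and P[S|-A] = 0.3  ->  likelihood ratio = 3
post = posterior_odds(prior, 0.9 / 0.3)
print(post)               # 0.75 (odds in favour of A)
print(post / (1 + post))  # back to a probability: P[A|S] ≈ 0.43
```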

Bayes' law

  • odds form of Bayes' law (p29)

Odds ratio

Bent coin

beta integral with gamma function

$$\forall k: \quad P[k \text{ successes in } F \text{ tosses}] = \frac{1}{F+1}$$

$$F_a + F_b = F$$

$$\int_0^1 p^{F_a}(1-p)^{F_b}\,dp = \frac{F_a!\,F_b!}{(F_a + F_b + 1)!} = \frac{1}{F+1}\cdot\frac{F_a!\,(F-F_a)!}{F!} = \frac{1}{(F+1)\binom{F}{F_a}}$$
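A numerical check (not from the notes) that, under a uniform prior on the coin bias $p$, every success count is equally likely, i.e. $P[k \text{ successes in } F \text{ tosses}] = \frac{1}{F+1}$:

```python
from math import comb
from scipy.integrate import quad

F = 10  # number of tosses (an arbitrary example)

for k in range(F + 1):
    # integrate C(F, k) * p^k * (1 - p)^(F - k) over the uniform prior p ~ U(0, 1)
    prob, _ = quad(lambda p, k=k: comb(F, k) * p ** k * (1 - p) ** (F - k), 0, 1)
    assert abs(prob - 1 / (F + 1)) < 1e-8

print("every k has probability", 1 / (F + 1))
```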

3.5 answer

$$P[P_a \mid aba] = \frac{P[P_a] \times P[aba \mid P_a]}{P[aba]}$$

$$P[aba \mid P_a] = P_a^2\,(1-P_a)$$

$$P[aba] = \frac{1}{(3+1)\binom{3}{2}} = \frac{1}{12}$$

$$P[P_a \mid aba] = \frac{P[P_a]\,P_a^2\,(1-P_a)}{1/12}$$
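The normalizer $P[aba] = \frac{1}{12}$ can be verified by integrating the likelihood over a uniform prior on $P_a$ (a small sketch, not part of the original notes):

```python
from scipy.integrate import quad

# P[aba] = integral of P_a^2 * (1 - P_a) over a uniform prior on [0, 1]
evidence, _ = quad(lambda p: p ** 2 * (1 - p), 0, 1)
print(evidence)  # 1/12 ≈ 0.0833

# the resulting posterior, 12 * P_a^2 * (1 - P_a), integrates to 1
total, _ = quad(lambda p: p ** 2 * (1 - p) / evidence, 0, 1)
print(total)     # 1.0
```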

Course goal: full understanding of inference of $P_a$

  • binomial distribution inference

  • chapter 3

Binomial Distribution

Expectation of the binomial distribution

Maximum Likelihood

Gaussian mixture model

Bayesian inference

Belief update

Maximum Likelihood Estimation

$$\text{Given } f(x) = N(\mu, \sigma^2) \text{ with fixed } \sigma^2 \text{ and data } x_1, \cdots, x_n, \text{ what is the maximum likelihood estimate of } \mu?$$
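A short derivation (added here; not in the original notes): setting the derivative of the log-likelihood to zero shows that the MLE of $\mu$ is simply the sample mean.

$$\log L(\mu) = \sum_{i=1}^{n} \log f(x_i) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

$$\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0 \quad\Longrightarrow\quad \hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} x_i$$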

Random Variables

$$I(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$$

If $X$ and $Y$ are independent:

$$I(X, Y) = 0$$

If $X$ and $Y$ determine each other (a one-to-one relationship):

$$H(X \mid Y) = H(Y \mid X) = 0, \quad I(X, Y) = H(X) = H(Y)$$
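A small sketch (not from the notes) computing $I(X, Y)$ from a made-up joint distribution and checking $I(X, Y) = H(X) - H(X \mid Y)$:

```python
import numpy as np

def H(p):
    """Entropy in bits of a distribution given as an array of probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# made-up joint distribution P(X, Y): rows index X, columns index Y
joint = np.array([[0.3, 0.1],
                  [0.1, 0.5]])

px = joint.sum(axis=1)  # marginal P(X)
py = joint.sum(axis=0)  # marginal P(Y)

# I(X, Y) = H(X) + H(Y) - H(X, Y)
mi = H(px) + H(py) - H(joint.ravel())

# equivalently I(X, Y) = H(X) - H(X|Y), where H(X|Y) = H(X, Y) - H(Y)
print(mi, H(px) - (H(joint.ravel()) - H(py)))
```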

Homework: the p-value is the probability of observing data at least as extreme as what was seen, given the (null) hypothesis.

P-value

Clustering

k nearest neighbors

k-means clustering

soft k-means clustering

p289

Curse of dimensionality

Dimension Reduction

feature selection, PCA

Beta Binomial Reasoning

  • Conjugate (a small numerical sketch follows this list)

$$\begin{aligned} \text{prior} \times \text{likelihood} &\Rightarrow \text{posterior of the same form} \\ P[P_H] &\propto P_H^a\,(1-P_H)^b \end{aligned}$$
  • Its normalizing constant is the beta integral:

$$\mathrm{Beta}(x; a, b) = \frac{x^{a-1}(1-x)^{b-1}}{B(a, b)}, \quad B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$$
  • When $a = 1, b = 1$ it is the uniform prior:

$$\frac{\Gamma(2)}{\Gamma(1)\,\Gamma(1)}\, x^{0}(1-x)^{0} = 1, \quad \text{where } \Gamma(i) = (i-1)!$$
  • A single beta distribution may not be expressive enough;

    • in that case a mixture model (a mixture of betas) can be added.

  • murphy p43
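A minimal sketch of the conjugate update mentioned in the list above (not from the original notes): starting from a uniform $Beta(1, 1)$ prior on $P_H$ and observing made-up counts of heads and tails, the posterior is again a beta distribution.

```python
from scipy.stats import beta

# uniform prior: Beta(a = 1, b = 1)
a, b = 1, 1

# hypothetical data: 7 heads, 3 tails
heads, tails = 7, 3

# conjugacy: posterior over P_H is Beta(a + heads, b + tails)
posterior = beta(a + heads, b + tails)

print(posterior.mean())    # (a + heads) / (a + b + heads + tails) = 8/12 ≈ 0.67
print(posterior.pdf(0.5))  # posterior density at P_H = 0.5
```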

Naive Bayes

Assumes the features are independent.
