Introduction

Mean μ

就是求平均值

\begin{aligned} &4, 3, 1, 6, 1, 7 \\ &\mu = \frac{4+3+1+6+1+7}{6} \approx 3.67 \end{aligned}
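The same arithmetic as a quick check (a minimal sketch in plain Python; the list is just the example data above):

```python
# Mean: sum all values and divide by the count
data = [4, 3, 1, 6, 1, 7]
mu = sum(data) / len(data)
print(mu)  # 3.666... (≈ 3.67)
```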

Variance σ²

Expresses how spread out the values are; the larger it is, the more spread out they are.

Take the difference between each value and the mean, square it, then average over all values:

\begin{aligned} &4, 3, 1, 6, 1, 7 \\ &\sigma^2 = \frac{(4-3.67)^2+(3-3.67)^2+(1-3.67)^2+(6-3.67)^2+(1-3.67)^2+(7-3.67)^2}{6} \approx 5.2 \end{aligned}
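The same computation in code (plain Python sketch, reusing the example data):

```python
# Population variance: average squared deviation from the mean
data = [4, 3, 1, 6, 1, 7]
mu = sum(data) / len(data)
var = sum((x - mu) ** 2 for x in data) / len(data)
print(var)  # ≈ 5.22
```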

Linearity of variance

\begin{aligned} &\text{if } x, y \text{ independent:} \quad VAR[x+y] = VAR[x] + VAR[y] \\ &\text{else:} \quad VAR[x+y] = VAR[x] + VAR[y] + 2\,COV[x, y] \\ &COV[x, y] = E[(x_i - \mu_x)(y_i - \mu_y)] \\ &\mu_x = E[x], \quad \mu_y = E[y] \end{aligned}
VAR[x+x] = VAR[x] + VAR[x] + 2COV[x,x] = 4VAR[x]
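A numeric sanity check of the identity (a sketch assuming NumPy is available; the identity holds exactly when variance and covariance use the same normalization):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# Population covariance: mean of (x - mu_x)(y - mu_y)
cov = ((x - x.mean()) * (y - y.mean())).mean()

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * cov
print(np.isclose(lhs, rhs))  # True
```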

Pearson's correlation = \frac{COV[x, y]}{\sqrt{VAR[x]\,VAR[y]}}

Spearman's rank correlation
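Both correlations are one call away if SciPy is available (hedged sketch; `pearsonr` and `spearmanr` each return a statistic and a p-value):

```python
from scipy.stats import pearsonr, spearmanr

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]        # roughly increasing with x

r, _ = pearsonr(x, y)      # linear correlation
rho, _ = spearmanr(x, y)   # rank correlation
print(r, rho)
```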

Standard Deviation σ

Also expresses how spread out the values are.

But it better conveys how far, on average, each value is from the mean.

Just take the square root of the variance:

\begin{aligned} &4, 3, 1, 6, 1, 7 \\ &\sigma = \sqrt{5.2} \approx 2.28 \end{aligned}

Expectation E

The expected value is just each value multiplied by its probability of occurring, summed up.

The expectation can also be written as E(X) = \mu_X

For example, the probabilities for a single die are:

| X | prob(X) |
| --- | --- |
| 1 | 1/6 |
| 2 | 1/6 |
| 3 | 1/6 |
| 4 | 1/6 |
| 5 | 1/6 |
| 6 | 1/6 |

So the expectation is

E(X) = 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{1}{6} + 6 \times \frac{1}{6} = 3.5
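The same sum in code (plain Python sketch):

```python
# Expectation of a fair die: each outcome times its probability, summed
E = sum(x * (1 / 6) for x in [1, 2, 3, 4, 5, 6])
print(E)  # 3.5
```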

Linearity of expectation

  • expectation (p1)

    E[f(x)] = \sum_i p(x_i)\,f(x_i)

Normal Distribution

This is the normal distribution commonly seen in real-world data.

Also called the Gaussian distribution.

  • Bell-shaped

  • mean = median, and both sit at the center of the distribution

  • About 68% of values lie within 1 standard deviation of the mean

  • About 95% of values lie within 2 standard deviations of the mean

  • About 99.7% of values lie within 3 standard deviations of the mean

  • These three percentages are known as the empirical rule

  • The probability density of the normal distribution is

f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}
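The density as a function (plain Python sketch of the formula above; `normal_pdf` is just a name chosen here):

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

print(normal_pdf(0, 0, 1))  # ≈ 0.3989, the peak of the standard normal
```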

Standard Normal Distribution

The standard normal distribution is a special case of the normal distribution.

Its mean is 0 and its variance is 1.

\mu = 0, \quad \sigma^2 = 1

  • The probability density of the standard normal distribution is

\varphi(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}

Entropy H

entropy (p32) https://planetcalc.com/2476/

H(p_1 \cdots p_k) = \sum_i p_i \log\frac{1}{p_i} = E\left[\log\frac{1}{p_i}\right]

Entropy predicts the minimum average number of bits needed to transmit a message. For a = 1/2, b = 1/2, H = 1; for a = 2/3, b = 1/3, H ≈ 0.92, so the source can be coded with fewer than 1 bit per symbol on average (e.g. with a Huffman code over blocks of symbols).
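The two examples above, computed directly (plain Python sketch; log base 2 gives bits):

```python
import math

def entropy(ps):
    # H = sum of p * log2(1/p), skipping zero-probability outcomes
    return sum(p * math.log2(1 / p) for p in ps if p > 0)

print(entropy([1/2, 1/2]))  # 1.0
print(entropy([2/3, 1/3]))  # ≈ 0.918
```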

Probability

Likelihood

https://www.statisticshowto.datasciencecentral.com/likelihood-function/

Prior Probability

Prior odds: if P(A) = 20% and P(-A) = 80%, the prior odds are P(A)/P(-A) = 1/4.

Posterior Probability

https://www.statisticshowto.datasciencecentral.com/posterior-distribution-probability/

posterior odds

\frac{P[A\mid S]}{P[-A\mid S]} = \frac{P[A]}{P[-A]} \cdot \frac{P[S\mid A]}{P[S\mid -A]}

posterior odds = prior odds × likelihood ratio

This is often handier than applying Bayes' law directly, since P[S] is not easy to get.
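A numeric sketch of the update (the likelihood values here are made up purely for illustration):

```python
# Posterior odds = prior odds * likelihood ratio
p_a = 0.20                    # P[A], from the prior-odds example above
prior_odds = p_a / (1 - p_a)  # 1/4

# Hypothetical likelihoods of evidence S under A and under -A
p_s_given_a = 0.9
p_s_given_not_a = 0.3

posterior_odds = prior_odds * (p_s_given_a / p_s_given_not_a)
posterior_prob = posterior_odds / (1 + posterior_odds)  # back to a probability
print(posterior_odds, posterior_prob)  # 0.75, ≈ 0.43
```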

Bayes' law

https://www.khanacademy.org/math/ap-statistics/probability-ap#stats-conditional-probability

  • odds form of Bayes' law (p29)

Odds ratio

https://www.statisticshowto.datasciencecentral.com/odds-ratio/

Bent coin

beta integral with gamma function

\forall k,\ P[k \text{ successes}] = \frac{1}{F+1}
F_a + F_b = F
\frac{F_a!\,F_b!}{(F_a + F_b + 1)!} = \frac{1}{F+1}\cdot\frac{F_a!\,(F-F_a)!}{F!} = \frac{1}{(F+1)\binom{F}{F_a}}
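Checking the identity numerically (plain Python; F = 3, F_a = 2 matches the `aba` example below):

```python
from math import factorial, comb

F, Fa = 3, 2
Fb = F - Fa

lhs = factorial(Fa) * factorial(Fb) / factorial(F + 1)
rhs = 1 / ((F + 1) * comb(F, Fa))
print(lhs, rhs)  # both 1/12 ≈ 0.0833
```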

Exercise 3.5 answer

P[P_a \mid aba] = \frac{P[P_a] \times P[aba \mid P_a]}{P[aba]}
P[aba \mid P_a] = P_a^2(1-P_a)
P[aba] = \frac{1}{(3+1)\binom{3}{2}} = \frac{1}{12}
P[P_a \mid aba] = \frac{P[P_a]\,P_a^2(1-P_a)}{1/12} = 12\,P_a^2(1-P_a) \quad \text{(with the uniform prior } P[P_a] = 1\text{)}

Course goal: full understanding of the inference of P_a

  • binomial distribution inference

  • chapter 3

Binomial Distribution

Maximum Likelihood

Gaussian mixture model

https://medium.com/@chih.sheng.huang821/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-em-%E6%BC%94%E7%AE%97%E6%B3%95-expectation-maximization-algorithm-em-%E9%AB%98%E6%96%AF%E6%B7%B7%E5%90%88%E6%A8%A1%E5%9E%8B-gaussian-mixture-model-gmm-%E5%92%8Cgmm-em%E8%A9%B3%E7%B4%B0%E6%8E%A8%E5%B0%8E-c6f634410483

Bayesian inference

https://en.wikipedia.org/wiki/Bayesian_inference

Belief update

https://en.wikipedia.org/wiki/Belief_revision

Maximum Likelihood Estimation

https://bookdown.org/ccwang/medical_statistics6/likelihood-definition.html

\begin{aligned} &\text{Given } f(x) = N(\mu, \sigma^2) \text{ with fixed } \sigma^2 \\ &\text{and data } x_1 \cdots x_n, \\ &\text{what is the maximum likelihood estimation of } \mu? \end{aligned}
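The standard answer is the sample mean; a sketch of the derivation (set the derivative of the log-likelihood to zero):

\begin{aligned} \log L(\mu) &= -\frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2 + \text{const} \\ \frac{d}{d\mu}\log L(\mu) &= \frac{1}{\sigma^2}\sum_i (x_i - \mu) = 0 \\ \Rightarrow \hat{\mu} &= \frac{1}{n}\sum_i x_i \end{aligned}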

Random Variables

I(X, Y) = H(X) - H(X\mid Y) = H(Y) - H(Y\mid X)

if x, y independent

I(X, Y) = 0

if X determines Y

H(Y\mid X) = 0, \quad I(X, Y) = H(Y) \quad \text{(and if each determines the other, } I(X, Y) = H(X) = H(Y)\text{)}
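A direct computation from a joint distribution table (plain Python sketch with toy joints; the independent uniform case returns 0 and the "copy" case returns H(X) = H(Y) = 1, matching the claims above):

```python
import math

def mutual_information(joint):
    # joint: dict mapping (x, y) -> P(x, y)
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(mutual_information(independent))  # 0.0

copy = {(0, 0): 0.5, (1, 1): 0.5}       # X determines Y (and vice versa)
print(mutual_information(copy))          # 1.0
```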

hw: p-value = probability of data at least as extreme as observed, given the (null) hypothesis

P-value

Clustering

k nearest neighbors

https://zh.wikipedia.org/wiki/%E6%9C%80%E8%BF%91%E9%84%B0%E5%B1%85%E6%B3%95

k-means clustering

https://medium.com/@chih.sheng.huang821/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-%E9%9B%86%E7%BE%A4%E5%88%86%E6%9E%90-k-means-clustering-e608a7fe1b43
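A minimal Lloyd's-algorithm sketch (assumes NumPy; no handling of empty clusters, which a real implementation would need):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(X, k=2)
print(centers)  # roughly (0, 0) and (5, 5)
```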

soft k-means clustering

p289

Curse of dimensionality

Dimension Reduction

https://en.wikipedia.org/wiki/Dimensionality_reduction#Feature_selection — feature selection, PCA; assumes features are independent

Beta Binomial Reasoning

  • Conjugate

\begin{aligned} \text{prior} \times \text{likelihood} &\Rightarrow \text{posterior of the same form} \\ P[P_H] &\propto P_H^a(1-P_H)^b \end{aligned}
  • Integrating this gives the beta integral (a conjugate-update sketch follows this list)

Beta(x; a, b) = \frac{x^{a-1}(1-x)^{b-1}}{B(a, b)} = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,x^{a-1}(1-x)^{b-1}
  • When a = 1, b = 1 this is the uniform prior

\frac{\Gamma(2)}{\Gamma(1)\Gamma(1)}\,x^{0}(1-x)^{0} = 1 \mid \text{ where } \Gamma(i) = (i-1)!
  • A single beta distribution may not be expressive enough

    • a mixture model (of betas) can be used instead

  • murphy p43
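As referenced above, a sketch of the conjugate update (assumes SciPy; the counts match the `aba` example: 2 a's/heads, 1 b/tail):

```python
from scipy.stats import beta

a0, b0 = 1, 1        # Beta(1, 1) = uniform prior
heads, tails = 2, 1  # observed data, e.g. 'aba'

posterior = beta(a0 + heads, b0 + tails)  # Beta(3, 2)
print(posterior.mean())   # 0.6 = (a0 + heads) / (a0 + b0 + heads + tails)
print(posterior.pdf(0.5)) # density at P_H = 0.5
```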

Naive Bayes
