That is, take the average of all the values.

$$
\begin{aligned}
&4, 3, 1, 6, 1, 7 \\
&\mu = \frac{4+3+1+6+1+7}{6} \approx 3.67
\end{aligned}
$$

Variance

Represents how spread out the values are; the larger it is, the more dispersed the data.
That is, find the average squared distance of each value from the mean.

$$
\begin{aligned}
&4, 3, 1, 6, 1, 7 \\
&\sigma^2 = \frac{(4-3.67)^2+(3-3.67)^2+(1-3.67)^2+(6-3.67)^2+(1-3.67)^2+(7-3.67)^2}{6} \approx 5.22
\end{aligned}
$$

Linearity of variance
$$
\begin{aligned}
&\text{if } x, y \text{ independent:} \\
&\quad VAR[x+y] = VAR[x] + VAR[y] \\
&\text{otherwise:} \\
&\quad VAR[x+y] = VAR[x] + VAR[y] + 2\,COV[x, y] \\
&\text{Covariance} = COV[x, y] = E[(x_i-\mu_x)(y_i-\mu_y)] \\
&\mu_x = E[x] \\
&\mu_y = E[y]
\end{aligned}
$$

For example, adding a variable to itself:

$$
VAR[x+x] = VAR[x] + VAR[x] + 2\,COV[x,x] = 4\,VAR[x]
$$

Pearson's correlation

$$
\frac{COV[x, y]}{\sqrt{VAR[x]\,VAR[y]}}
$$
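These identities are easy to check numerically. A minimal Python sketch (my addition, using the population definitions above, i.e. dividing by n; the second list `y` is made up purely for illustration):

```python
# Covariance, the variance-of-a-sum identity, and Pearson correlation.
def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

x = [4, 3, 1, 6, 1, 7]
y = [2, 5, 1, 4, 0, 6]  # made-up second variable

# VAR[x+y] = VAR[x] + VAR[y] + 2 COV[x, y]
print(var([a + b for a, b in zip(x, y)]), var(x) + var(y) + 2 * cov(x, y))

# VAR[x+x] = 4 VAR[x]
print(var([2 * a for a in x]), 4 * var(x))

# Pearson's correlation
print(cov(x, y) / (var(x) * var(y)) ** 0.5)
```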
Spearman's rank correlation
Standard Deviation
$\sigma$ likewise expresses how spread out the values are,
but it better conveys the average distance of each value from the mean.
It is simply the square root of the variance.
$$
\begin{aligned}
&4, 3, 1, 6, 1, 7 \\
&\sigma = \sqrt{5.22} \approx 2.28
\end{aligned}
$$
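A quick numeric check of the three quantities above (my addition), using the Python standard library's population variance to match these formulas:

```python
# Mean, population variance, and population standard deviation
# for the running example 4, 3, 1, 6, 1, 7.
from statistics import mean, pvariance, pstdev

data = [4, 3, 1, 6, 1, 7]
print(mean(data))       # 3.666... ≈ 3.67
print(pvariance(data))  # 5.222... ≈ 5.22
print(pstdev(data))     # 2.285... ≈ 2.28
```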
Expectation

The expected value can also be written as $E(X) = \mu_X$.
For example, each face of a die comes up with probability $\frac{1}{6}$,
so the expected value is
$$
E(X) = 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{1}{6} + 6 \times \frac{1}{6} = 3.5
$$
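The same sum as a one-line sketch (my addition):

```python
# Expected value of a fair six-sided die: sum of face value × probability.
print(sum(face * (1 / 6) for face in range(1, 7)))  # 3.5
```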
Linearity of expectation
expectation (p1)
$$
E[f(x)] = \sum_i p(x_i) f(x_i)
$$

Normal Distribution
Also called the Gaussian distribution.
mean = median, and both sit at the center of the distribution.
About 68% of the values lie within 1 standard deviation of the mean.
About 95% of the values lie within 2 standard deviations of the mean.
About 99.7% of the values lie within 3 standard deviations of the mean.
These three figures are known as the empirical rule.
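A sampling check of the empirical rule (my addition, using numpy; the sample size is arbitrary):

```python
# Empirically check the 68-95-99.7 rule by sampling a standard normal.
import numpy as np

samples = np.random.default_rng(0).normal(loc=0.0, scale=1.0, size=1_000_000)
for k in (1, 2, 3):
    print(k, np.mean(np.abs(samples) <= k))  # ≈ 0.6827, 0.9545, 0.9973
```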
The probability density of the normal distribution is
$$
f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$
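A small sketch (my addition) that writes out this density and checks that it integrates to roughly 1:

```python
# Normal probability density, written out from the formula above,
# then numerically integrated (simple Riemann sum) as a sanity check.
import numpy as np

def normal_pdf(x, mu, sigma2):
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

xs = np.linspace(-10, 10, 10_001)
dx = xs[1] - xs[0]
print(np.sum(normal_pdf(xs, 0.0, 1.0)) * dx)  # ≈ 1.0
```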
Standard Normal Distribution

The standard normal distribution is a special case of the normal distribution.
Its mean is 0 and its variance is 1:

$$
\mu = 0, \quad \sigma^2 = 1
$$
The probability density of the standard normal distribution is
$$
\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}
$$

Entropy

$$
H(p_1 \cdots p_k) = \sum_i p_i \log\frac{1}{p_i} = E\left[\log\frac{1}{p_i}\right]
$$

Entropy predicts the fewest bits needed, on average, to transmit a symbol.
a = 1/2, b = 1/2 => h = 1
a = 2/3, b = 1/3 => h ≈ 0.92 => fewer than 1 bit per symbol is achievable on average (e.g. Huffman coding over blocks of symbols)
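A short sketch (my addition) computing the two entropies quoted above:

```python
# Shannon entropy in bits of a discrete distribution.
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([1 / 2, 1 / 2]))  # 1.0 bit
print(entropy([2 / 3, 1 / 3]))  # ≈ 0.918 bits
```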
Probability
Likelihood
Prior Probability
prior odds: P(A) = 20%, P(-A) = 80%, so the prior odds P(A)/P(-A) = 1/4
Posterior Probability
posterior odds
$$
\frac{P[A\mid S]}{P[-A\mid S]} = \frac{P[A]}{P[-A]} \cdot \frac{P[S\mid A]}{P[S\mid -A]}
$$

posterior odds = prior odds × likelihood ratio
This is often more convenient than Bayes' law in its plain form, because P[S] is not easy to obtain.
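A tiny sketch of the odds-form update (my addition; the likelihood ratio of 6 is made up purely for illustration), starting from the prior odds of 1/4 above:

```python
# Odds-form Bayes: posterior odds = prior odds × likelihood ratio.
prior_odds = 0.20 / 0.80      # P(A) / P(-A) = 1/4
likelihood_ratio = 6.0        # assumed P(S|A) / P(S|-A), illustration only

posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)
print(posterior_odds, posterior_prob)  # 1.5, 0.6
```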
Bayes law
odds form of Bayes law (p29)
Odds ratio
Bent coin
beta integral with gamma function
$$
\forall k:\ \text{Prob}[k \text{ successes}] = \frac{1}{n+1}
$$

$$
F_a + F_b = F
$$

$$
\frac{F_a!\,F_b!}{(F_a + F_b + 1)!} = \frac{1}{F+1}\cdot\frac{F_a!\,(F-F_a)!}{F!} = \frac{1}{(F+1)\binom{F}{F_a}}
$$
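Both the beta integral and the closed form can be checked numerically; a sketch of my own:

```python
# Check that ∫ p^Fa (1-p)^Fb dp = Fa! Fb! / (Fa+Fb+1)! = 1 / ((F+1) * C(F, Fa)).
from math import factorial, comb
import numpy as np

Fa, Fb = 2, 1
F = Fa + Fb

p = np.linspace(0, 1, 100_001)
dp = p[1] - p[0]
integral = np.sum(p**Fa * (1 - p)**Fb) * dp
closed_form = factorial(Fa) * factorial(Fb) / factorial(Fa + Fb + 1)
print(integral, closed_form, 1 / ((F + 1) * comb(F, Fa)))  # all ≈ 1/12
```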
3.5 answer

$$
P[P_a\mid aba] = \frac{P[P_a]\times P[aba\mid P_a]}{P[aba]}
$$

$$
P[aba\mid P_a] = P_a^2(1-P_a)
$$

$$
P[aba] = \frac{1}{(3+1)\binom{3}{2}} = \frac{1}{12}
$$

With a uniform prior ($P[P_a] = 1$):

$$
P[P_a\mid aba] = \frac{P_a^2(1-P_a)}{1/12} = 12\,P_a^2(1-P_a)
$$
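A quick check (my addition) that this posterior is properly normalized and peaks at $P_a = 2/3$:

```python
# The posterior 12 * p^2 * (1 - p) should integrate to 1 over [0, 1].
import numpy as np

p = np.linspace(0, 1, 100_001)
posterior = 12 * p**2 * (1 - p)
dp = p[1] - p[0]
print(np.sum(posterior) * dp)   # ≈ 1.0
print(p[np.argmax(posterior)])  # ≈ 0.6667, i.e. the mode is at 2/3
```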
Course goal: full understanding of the inference of $P_a$
Binomial distribution inference
Binomial Distribution
Maximum Likelihood
Gaussian mixture model
Bayesian inference
Belief update
Maximum Likelihood Estimation
$$
\begin{aligned}
&\text{Given } f(x) = N(\mu, \sigma^2) \text{ with fixed } \sigma^2 \\
&\text{data } x_1 \cdots x_n \\
&\text{What is the maximum likelihood estimate of } \mu?
\end{aligned}
$$
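The standard answer (not worked out in the original notes) is the sample mean; a brief derivation:

$$
\begin{aligned}
\log L(\mu) &= \sum_{i=1}^{n} \log\!\left(\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}\right) = \text{const} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \\
\frac{d}{d\mu}\log L(\mu) &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0 \quad\Rightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i
\end{aligned}
$$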
Random Variables

$$
I(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
$$

if X, Y independent:

$$
I(X, Y) = 0
$$

if X determines Y (and vice versa):

$$
H(X|Y) = H(Y|X) = 0, \quad I(X, Y) = H(X) = H(Y)
$$
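A small sketch (my addition; the joint table is made up) computing these quantities from a joint distribution:

```python
# Entropy and mutual information from a small joint distribution P(X, Y).
import numpy as np

def H(probs):
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

pxy = np.array([[0.4, 0.1],    # made-up 2x2 joint table, rows = X, cols = Y
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

H_x, H_y, H_xy = H(px), H(py), H(pxy.flatten())
print(H_x + H_y - H_xy)        # I(X, Y) = H(X) + H(Y) - H(X, Y)
print(H_x - (H_xy - H_y))      # same value as H(X) - H(X|Y)
```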
P-value

hw: p-value = the probability of data at least this extreme, given the (null) hypothesis
Clustering
k nearest neighbors
k-means clustering
soft k-means clustering (p289)
Curse of dimensionality
Dimension Reduction
Beta Binomial Reasoning
$$
\begin{aligned}
\text{prior} \times \text{likelihood} &\Rightarrow \text{posterior of the form} \\
P[P_H] &\propto P_H^a(1-P_H)^b
\end{aligned}
$$

$$
Beta(x; a, b) = \frac{x^{a-1}(1-x)^{b-1}}{B(a, b)} = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,x^{a-1}(1-x)^{b-1}
$$

When a = 1 and b = 1 this is the uniform prior:

$$
\frac{\Gamma(2)}{\Gamma(1)\Gamma(1)}\,x^{0}(1-x)^{0} = 1 \quad\text{where } \Gamma(i) = (i-1)!
$$
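A sketch (my addition) of the conjugate update implied here, evaluating the Beta density with `math.gamma`: with a = b = 1 the prior is flat, and after observing heads and tails the exponents simply add.

```python
# Beta density and the conjugate beta-binomial posterior update.
from math import gamma

def beta_pdf(x, a, b):
    return gamma(a + b) / (gamma(a) * gamma(b)) * x**(a - 1) * (1 - x)**(b - 1)

# Beta(1, 1) is the uniform prior: density 1 everywhere on (0, 1).
print(beta_pdf(0.3, 1, 1), beta_pdf(0.8, 1, 1))  # 1.0 1.0

# Starting from Beta(a, b) and observing 2 heads and 1 tail (the "aba" example),
# the posterior is Beta(a + 2, b + 1) = Beta(3, 2) = 12 p^2 (1 - p).
a, b = 1, 1
print(beta_pdf(2 / 3, a + 2, b + 1))  # 12 * (2/3)**2 * (1/3) ≈ 1.78
```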
Naive Bayes