

Probability

These notes are based on 醫學統計學 (Medical Statistics).

Axiom

  • $0 \le P(A) \le 1$

  • $P(\Omega) = 1$, where $\Omega$ is the total sample space

  • When $A_1, A_2, \cdots, A_n$ are mutually exclusive (disjoint):

    • $P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)$

Conditional Probability

  • $P(A \mid S) = \frac{P(A \cap S)}{P(S)}$

  • $P(A \cap S) = P(A \mid S)\,P(S)$

Independence

  • $P(A \cap B) = P(A)P(B)$

    • Equivalently, $P(A \mid B) = P(A)$

      • Knowing $B$ provides no information about $A$.

Bayes Theorem

  • Used to invert the direction of a conditional probability.

  • We know that

    $$\begin{aligned} P(A \cap S) &= P(A\mid S)P(S) \\ &= P(S\mid A)P(A) \end{aligned}$$

  • Therefore

    $$\begin{aligned} P(S\mid A)P(A) &= P(A\mid S)P(S) \\ \Rightarrow P(S\mid A) &= \frac{P(A\mid S)P(S)}{P(A)} \end{aligned}$$

  • We also know, by the law of total probability,

    $$\begin{aligned} P(A) &= P(A \cap S) + P(A \cap \bar{S})\\ &= P(A\mid S)P(S) + P(A\mid\bar{S})P(\bar{S}) \end{aligned}$$

  • The final Bayes theorem formula:

    $$P(S\mid A) = \frac{P(A\mid S)P(S)}{P(A\mid S)P(S) + P(A\mid\bar{S})P(\bar{S})}$$

Example

  • 20% of the population smokes: $P(S) = 0.2$

  • 9% of smokers have asthma: $P(A \mid S) = 0.09$

  • 7% of non-smokers have asthma: $P(A \mid \bar{S}) = 0.07$

  • A person with asthma appears; what is the probability that they smoke? Find $P(S \mid A)$ (verified numerically below):

    $$\begin{aligned} P(S\mid A) &= \frac{P(A\mid S)P(S)}{P(A\mid S)P(S) + P(A\mid\bar{S})P(\bar{S})} \\ &= \frac{0.09 \cdot 0.2}{0.09 \cdot 0.2 + 0.07 \cdot 0.8} \\ &\approx 0.24 \end{aligned}$$
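A quick numeric check of this example, as a minimal Python sketch (the probabilities are the ones given above; the variable names are mine):

```python
# Numbers from the smoking/asthma example above.
p_s = 0.2      # P(S): probability of being a smoker
p_a_s = 0.09   # P(A|S): asthma rate among smokers
p_a_ns = 0.07  # P(A|~S): asthma rate among non-smokers

# Law of total probability: P(A)
p_a = p_a_s * p_s + p_a_ns * (1 - p_s)

# Bayes theorem: P(S|A)
print(p_a_s * p_s / p_a)  # ~0.2432
```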

Expectation and Variance

  • Expectation

    • The expected value (mean) of a discrete random variable $X$

    • Multiply each value of $X$ by its probability and sum; it is also written as $\mu$

      $$E(X) = \sum_x x\cdot P(X=x)$$

  • Variance

    • Measures how spread out a set of values is

      $$Var(X) = E((X-\mu)^2), \quad \mu = E(X)$$

    • It can also be written as

      $$Var(X) = E(X^2) - E(X)^2$$

  • Variance has some useful properties:

    • $Var(X+b) = Var(X)$

    • $Var(aX) = a^2 Var(X)$

    • $Var(aX+b) = a^2 Var(X)$
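As a sketch of these definitions (assuming NumPy; the fair die is my example, not from the notes), the two variance formulas and the scaling property can be checked numerically:

```python
import numpy as np

# Distribution of a fair six-sided die.
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

mu = np.sum(values * probs)                  # E(X) = sum_x x * P(X=x)
var = np.sum((values - mu) ** 2 * probs)     # Var(X) = E((X - mu)^2)
var_alt = np.sum(values**2 * probs) - mu**2  # Var(X) = E(X^2) - E(X)^2
print(mu, var, var_alt)                      # 3.5, ~2.917, ~2.917

# Var(aX + b) = a^2 Var(X)
a, b = 3.0, 10.0
var_axb = np.sum((a * values + b - (a * mu + b)) ** 2 * probs)
print(np.isclose(var_axb, a**2 * var))       # True
```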

Bernoulli distribution

  • Each trial is independent of the others

  • $X$ is a random variable taking one of two values, $\{0, 1\}$

  • Suppose the probability of taking $1$ is $P$

  • The expectation of $X$ is

    $$\begin{aligned} E(X) &= \sum_x x\cdot P(X=x) \\ &= 1 \cdot P + 0 \cdot (1-P) \\ &= P \end{aligned}$$

  • For the variance, note that $x = x^2$ when $x \in \{0, 1\}$, so $E(X^2) = E(X)$:

    $$\begin{aligned} Var(X) &= E[X^2] - E[X]^2 \\ &= E[X] - E[X]^2 \\ &= P - P^2 \\ &= P(1-P) \end{aligned}$$

Proof of the two key formulas for independent X, Y

  • If $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$:

    $$\begin{aligned} E(XY) &= \sum_x \sum_y xy\,P(X=x, Y=y) \\ &= \sum_x \sum_y xy\,P(X=x)P(Y=y) \\ &= \sum_x xP(X=x) \sum_y yP(Y=y) \\ &= E(X)E(Y) \end{aligned}$$

  • If $X$ and $Y$ are independent, then $Var(X+Y) = Var(X) + Var(Y)$:

    $$\begin{aligned} Var(X+Y) &= E((X+Y)^2) - E(X+Y)^2 \\ &= E(X^2+2XY+Y^2) - (E(X) + E(Y))^2 \\ &= E(X^2) + E(Y^2) + 2E(XY) - E(X)^2 - E(Y)^2 - 2E(X)E(Y) \\ &= E(X^2) - E(X)^2 + E(Y^2) - E(Y)^2 \\ &= Var(X) + Var(Y) \end{aligned}$$

    where the last cancellation uses $E(XY) = E(X)E(Y)$ from the first formula.
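Both formulas can also be checked by simulation (a Monte Carlo sanity check, not a proof; assumes NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=1_000_000)  # Bernoulli(0.3)
y = rng.binomial(1, 0.7, size=1_000_000)  # Bernoulli(0.7), independent of x

print(np.mean(x * y), np.mean(x) * np.mean(y))  # both ~0.21
print(np.var(x + y), np.var(x) + np.var(y))     # both ~0.42
```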

Binomial Distribution

  • The number of successes in $n$ independent Bernoulli trials, each with success probability $P$

  • If $X$ follows a binomial distribution, we write $X \sim \text{Binomial}(n, P)$

  • The probability of exactly $x$ successes is

    $$P(X=x) = \binom{n}{x}P^x(1-P)^{n-x}, \quad x = 0, 1, 2, \cdots, n$$
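A small sketch of the pmf in plain Python (the helper name is mine), checking that the probabilities over $x = 0, \cdots, n$ sum to 1:

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3
print(sum(binom_pmf(x, n, p) for x in range(n + 1)))  # 1.0 (up to rounding)
print(binom_pmf(3, n, p))  # ~0.2668, the most likely count for n=10, p=0.3
```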

Expectation and Variance of Binomial Distribution

  • For $X \sim \text{Binomial}(n, P)$, write $X = \sum_{i=1}^n X_i$, where $X_i,\ i = 1, 2, \cdots, n$ are the independent Bernoulli trials

  • The expectation is

    $$\begin{aligned} E(X) &= E\left(\sum_{i=1}^n X_i\right) \\ &= E(X_1) + E(X_2) + \cdots + E(X_n) \\ &= \sum_{i=1}^n P \\ &= nP \end{aligned}$$

  • The variance, using the independence of the trials, is

    $$\begin{aligned} Var(X) &= Var\left(\sum_{i=1}^n X_i\right) \\ &= Var(X_1) + Var(X_2) + \cdots + Var(X_n) \\ &= \sum_{i=1}^n P(1-P) \\ &= nP(1-P) \end{aligned}$$
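A simulation sketch of both results (assuming NumPy; the particular n and P are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 20, 0.3
samples = rng.binomial(n, p, size=1_000_000)

print(samples.mean(), n * p)           # ~6.0 vs 6.0   (E(X) = nP)
print(samples.var(), n * p * (1 - p))  # ~4.2 vs 4.2   (Var(X) = nP(1-P))
```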

Poisson Distribution

  • Models an event that occurs at rate $\lambda$ per unit time over an interval $T$

  • The expected number of occurrences in that interval is $E(X) = \lambda T$

  • Intuition: divide $T$ into $n$ tiny sub-intervals with $n \rightarrow \infty$

    • Each tiny sub-interval can be viewed as a Bernoulli trial (the event either occurs or does not)

    • The total count over $T$ can then be viewed as the limit of a binomial distribution

  • If $X$ counts the occurrences in $T$, then $X$ follows a Poisson distribution:

    $$X \sim \text{Poisson}(\mu = \lambda T)$$

  • The probability function of a Poisson distribution:

    $$P(X = x) = \frac{\mu^x}{x!}e^{-\mu}$$

  • Expectation of the Poisson: $E(X) = \mu$

  • Variance of the Poisson: $Var(X) = \mu$
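The binomial-limit intuition above can be made concrete: Binomial(n, μ/n) probabilities approach the Poisson(μ) pmf as n grows (a sketch in plain Python; the helper names are mine):

```python
from math import comb, exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    # P(X = x) = mu^x / x! * e^(-mu)
    return mu**x / factorial(x) * exp(-mu)

def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)

mu, x = 3.0, 2
for n in (10, 100, 10_000):
    print(n, binom_pmf(x, n, mu / n))   # approaches the Poisson value
print("Poisson:", poisson_pmf(x, mu))   # ~0.2240
```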

Normal Distribution

Probability density function, PDF

  • Given a range $[a, b]$ with $a < b$, the probability that a continuous random variable $X$ falls in it satisfies

    $$P(a \le X \le b) = \int_a^b f(x)\,dx$$

    • $f$ is called the probability density function (PDF)

    • Integrating over the interval $[a, b]$ gives the probability that the continuous variable takes a value in that interval

  • The total area over the whole range equals 1:

    $$\int_{-\infty}^\infty f(x)\,dx = 1$$

    • Expectation: $E(X) = \int_{-\infty}^\infty x\cdot f(x)\,dx$

    • Variance: $Var(X) = \int_{-\infty}^\infty (x-\mu)^2 f(x)\,dx$
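To illustrate "probability is area under the PDF", here is a sketch (assuming NumPy; the trapezoidal helper is mine) that integrates the standard normal density numerically:

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def integrate(f, a, b, n=100_000):
    # Trapezoidal rule over [a, b].
    x = np.linspace(a, b, n + 1)
    y = f(x)
    return np.sum((y[:-1] + y[1:]) / 2) * (b - a) / n

print(integrate(normal_pdf, -1.96, 1.96))  # ~0.95 (cf. the 1.96 rule below)
print(integrate(normal_pdf, -10, 10))      # ~1.0: total area under the PDF
```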

Normal Distribution

  • When data $X$ follow a normal distribution, the distribution is usually described by its expectation and variance: $X \sim N(\mu, \sigma^2)$

  • Its probability density function can be written as

    $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

  • Expectation: $E(X) = \mu$

  • Variance: $Var(X) = \sigma^2$

Standard Normal Distribution

  • The normal distribution with $\mu = 0,\ \sigma^2 = 1$, written $Z \sim N(0, 1)$

  • Its probability density function can be written as

    $$f(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}}$$

  • About 95% of its area lies between standard deviations $-1.96$ and $1.96$

  • Any normal distribution can be standardized using

    $$Z = \frac{X-\mu}{\sigma}$$
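A simulation sketch of standardization (assuming NumPy; the particular μ and σ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 50.0, 10.0
x = rng.normal(mu, sigma, size=1_000_000)

z = (x - mu) / sigma               # Z = (X - mu) / sigma
print(z.mean(), z.std())           # ~0, ~1: Z ~ N(0, 1)
print(np.mean(np.abs(z) <= 1.96))  # ~0.95 of the mass within +-1.96
```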

Central Limit Theorem

Covariance

  • Recall that the variance of a sum of two independent variables splits:

    $$Var(X+Y) = Var(X) + Var(Y)$$

  • When the two variables affect each other (are not independent):

    $$\begin{aligned} Var(X+Y) &= E[((X+Y) - E(X+Y))^2] \\ &= E[((X+Y) - (E(X) + E(Y)))^2] \\ &= E[((X - E(X)) + (Y - E(Y)))^2] \\ &= E[(X-E(X))^2 + (Y - E(Y))^2 + 2(X-E(X))(Y - E(Y))] \\ &= Var(X) + Var(Y) + 2E[(X-E(X))(Y-E(Y))] \end{aligned}$$

  • The extra term $E[(X-E(X))(Y-E(Y))]$ is called the covariance:

    $$Cov(X, Y) = E[(X-E(X))(Y-E(Y))]$$

  • So when the two variables affect each other and are not independent, the variance is

    $$Var(X+Y) = Var(X) + Var(Y) + 2Cov(X, Y)$$

  • Note: covariance only measures the linear association between $X$ and $Y$

  • Some properties of covariance:

    • $Cov(X, X) = Var(X)$

    • $Cov(X, Y) = Cov(Y, X)$

    • $Cov(aX, bY) = ab\,Cov(X, Y)$

    • $Cov(X+Y, X-Y) = Var(X) - Var(Y)$

    • $X, Y$ independent $\Rightarrow Cov(X, Y) = 0$ (the converse does not hold)

Correlation

  • The magnitude of covariance is unstable: it is affected by the scale and units of each variable

  • Dividing by the standard deviations (standardization) gives the correlation coefficient, which ranges from $-1$ to $1$:

    $$Corr(X, Y) = \frac{Cov(X, Y)}{SD(X)SD(Y)} = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}$$
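A sketch of why correlation is preferred (assuming NumPy; the linear relationship is my example): rescaling a variable changes the covariance but not the correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, size=100_000)
y = 2.0 * x + rng.normal(0, 1, size=100_000)  # linearly related to x

print(np.cov(x, y)[0, 1], np.corrcoef(x, y)[0, 1])  # ~2.0, ~0.894
# A change of units (x -> 100x) inflates Cov but leaves Corr alone:
print(np.cov(100 * x, y)[0, 1], np.corrcoef(100 * x, y)[0, 1])  # ~200, ~0.894
```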

The Central Limit Theorem

  • For samples of size $n$, the sampling distribution of the sample mean is denoted $\bar{X}_n$

  • As the sample size grows ($n \rightarrow \infty$), it approaches a normal distribution

  • It can be written as

    $$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

  • or equivalently as

    $$\sum_{i=1}^n X_i \sim N(n\mu, n\sigma^2)$$

  • In short, as long as the sample size is large enough, the sampling distribution is approximately a normal distribution, regardless of the population's own distribution (simulated below)
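A simulation sketch (assuming NumPy; the exponential population is my choice of a clearly non-normal distribution): the means of size-n samples behave like $N(\mu, \sigma^2/n)$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = sigma = 1.0  # Exponential(1) has mean 1 and standard deviation 1
n, reps = 100, 50_000

means = rng.exponential(mu, size=(reps, n)).mean(axis=1)
print(means.mean(), mu)                 # ~1.0
print(means.std(), sigma / np.sqrt(n))  # ~0.1
# ~95% of sample means fall within mu +- 1.96 * sigma / sqrt(n):
print(np.mean(np.abs(means - mu) <= 1.96 * sigma / np.sqrt(n)))
```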

Binomial Distribution with Central Limit Theorem

  • For $X \sim \text{Binomial}(n, P)$ with very large $n$, the exact probabilities become difficult to compute

  • So the Central Limit Theorem can be applied to approximate

    $$X \sim N(nP, nP(1-P))$$

  • How large does $n$ have to be? A common rule of thumb:

    $$n > 20 \text{ and } nP > 5 \text{ and } n(1-P) > 5$$
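A sketch comparing the exact binomial tail with its CLT approximation (plain Python; the $+0.5$ continuity correction is a standard refinement, not from the notes):

```python
from math import comb, erf, sqrt

def binom_cdf(k: int, n: int, p: float) -> float:
    # Exact P(X <= k) for X ~ Binomial(n, p).
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p, k = 100, 0.3, 25             # n > 20, nP = 30 > 5, n(1-P) = 70 > 5
mu, sigma = n * p, sqrt(n * p * (1 - p))

print(binom_cdf(k, n, p))              # exact:  ~0.163
print(normal_cdf(k + 0.5, mu, sigma))  # approx: ~0.163
```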
