Inference


Inference Concept

  1. Population and sample

    • The population is all the data; a sample is the subset actually drawn from it

    • When discussing sample data (a sample), ask:

      • Is the sample representative?

      • Is the population precisely defined?

      • Can the population be infinitely large?

      • Are we sampling from all possible populations?

  2. Sample and statistic

    • An experiment usually yields only sample data

    • From the sample we want to derive statistics that describe the population

      • This derivation is called inference

      • Using a known sample to estimate an unknown population is called estimation

    • The population quantity we want to infer is called a parameter

      • The statistic computed from the sample to estimate it is called an estimator

    • Every statistic has a sampling distribution

      • The distribution of the statistic over infinitely many repeated samples

        1. Draw a sample of size n from the population

        2. Compute a suitable statistic from that sample to estimate the population parameter

        3. Derive the sampling distribution of that statistic (imagining the sampling repeated endlessly)

        4. If the sampling distribution can be characterized precisely, the accuracy of the population estimate can be assessed

  3. Estimation

    • Inferring the population mean from the sample mean is one example of estimation

    • An estimate has both bias and precision

    • Bias is the gap between the estimator and the population parameter

    • Precision can be assessed by the spread of the sampling distribution; for the mean, the standard error is

      $$\frac{\text{True standard deviation}}{\sqrt{\text{sample size}}}$$
  4. Confidence intervals

    • The estimate computed from a single sample is a point estimate

    • A confidence interval expresses the precision of these point estimates

      • The narrower the interval, the higher the precision

    • A confidence interval has a lower and an upper bound

    • Each sample yields a different confidence interval

    • These intervals therefore have a sampling distribution of their own (see the simulation sketch after this list)
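A minimal NumPy sketch of the ideas above — repeated sampling, the sampling distribution of the mean, its standard error, and a per-sample confidence interval. The population parameters, sample size, and trial count are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population (in practice we never observe this).
population = rng.normal(loc=170, scale=10, size=1_000_000)
mu, sigma = population.mean(), population.std()

n = 25           # sample size
trials = 10_000  # approximates "infinitely many" repeated samples

# Sampling distribution of the sample mean (the estimator).
sample_means = np.array(
    [rng.choice(population, size=n).mean() for _ in range(trials)]
)

print("bias         :", sample_means.mean() - mu)  # ~0: estimator is unbiased
print("precision    :", sample_means.std())        # ~ sigma / sqrt(n)
print("sigma/sqrt(n):", sigma / np.sqrt(n))        # the standard error formula

# A 95% confidence interval from one sample; it differs sample to sample.
sample = rng.choice(population, size=n)
se = sample.std(ddof=1) / np.sqrt(n)
print("one 95% CI   :", (sample.mean() - 1.96 * se, sample.mean() + 1.96 * se))
```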

Likelihood

  • We sometimes assume a coin has $Prob(\text{head}) = 0.5$

  • We can then compute, say, the probability of getting 4 heads in 10 tosses:

    $$\binom{10}{4}\times 0.5^4\times 0.5^{10-4} = 0.205$$

  • But whether 0.5 is the true value, only God knows (if God exists)

  • This 0.5 is the parameter whose likelihood we evaluate

    • We may never know the true value

    • But we can estimate it

  • So now the parameter becomes the unknown $P$:

    $$Prob(\text{head} = 4\mid P) = \binom{10}{4}\times P^4\times (1-P)^{10-4}$$

    • The task becomes finding which $P$ (between 0 and 1) maximizes $Prob(X = 4\mid P)$

    • The table below shows $Prob(\text{head} = 4\mid P)$ for several values of $P$; $P = 0.4$ makes 4 heads most probable

| $P$ | head = 4 |
| --- | --- |
| 0.0 | 0.000 |
| 0.2 | 0.088 |
| 0.4 | 0.251 |
| 0.5 | 0.205 |
| 0.6 | 0.111 |
| 1.0 | 0.000 |

  • Plotted as a curve over $P$, this likelihood peaks at $P = 0.4$

  • The likelihood function for this example can be written as

    $$L(P\mid \text{head} = 4) = \binom{10}{4}\times P^4\times (1-P)^{10-4}$$
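A few lines of Python reproduce the table above and locate the maximizing $P$ (the grid resolution is an arbitrary choice):

```python
from math import comb

def likelihood(p, heads=4, tosses=10):
    """Binomial likelihood L(p | heads) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)

for p in [0.0, 0.2, 0.4, 0.5, 0.6, 1.0]:
    print(f"P = {p:.1f}  ->  {likelihood(p):.3f}")   # matches the table

# Grid search for the maximum: it sits at P = 4/10 = 0.4.
grid = [i / 1000 for i in range(1001)]
print("argmax P:", max(grid, key=likelihood))        # 0.4
```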

General Likelihood

  • To generalize the likelihood, first define two variables:

    • the parameter $\theta$

    • the observed data $x$

  • The likelihood function is then

    $$L(\theta\mid x) = P(x\mid \theta)$$

  • The table below gives the probabilities $f(x\mid\theta)$ for $x = 0\sim 4$ and $\theta = 1\sim 3$

    • The goal is to find, for each $x$, the $\theta$ that maximizes $f(x\mid\theta)$; that is the $\hat{\theta}$ column (reproduced in the sketch below)

| $x$ | $f(x\mid 1)$ | $f(x\mid 2)$ | $f(x\mid 3)$ | $\hat{\theta}$ |
| --- | --- | --- | --- | --- |
| 0 | 1/3 | 1/4 | 0 | 1 |
| 1 | 1/3 | 1/4 | 0 | 1 |
| 2 | 0 | 1/4 | 1/6 | 2 |
| 3 | 1/6 | 1/4 | 1/2 | 3 |
| 4 | 1/6 | 0 | 1/3 | 3 |
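The $\hat{\theta}$ column can be reproduced mechanically: for each observed $x$, pick the $\theta$ whose column assigns it the highest probability. A minimal sketch over the table's values:

```python
from fractions import Fraction as F

# f(x | theta) from the table above, indexed by x = 0..4.
f = {
    1: [F(1, 3), F(1, 3), F(0), F(1, 6), F(1, 6)],
    2: [F(1, 4), F(1, 4), F(1, 4), F(1, 4), F(0)],
    3: [F(0), F(0), F(1, 6), F(1, 2), F(1, 3)],
}

for x in range(5):
    # The MLE maximizes L(theta | x) = f(x | theta) over theta.
    theta_hat = max(f, key=lambda theta: f[theta][x])
    print(f"x = {x}  ->  theta_hat = {theta_hat}")
```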

Log-likelihood

  • It turns out that taking the log of the likelihood first, $\ell(\theta\mid x) = \log L(\theta\mid x)$, makes the maximization easier to compute

  • Maximizing the likelihood means finding the maximum of $L(\theta\mid x)$ (equivalently, of $\ell(\theta\mid x)$)

  • That is, the point where the first derivative is zero and the second derivative is negative:

    $$\begin{aligned} \frac{d\ell}{d\theta} &= \ell'(\theta\mid x) = 0 \\ \frac{d^2\ell}{d\theta^2} &< 0 \end{aligned}$$

  • The value that attains this maximum wears a hat: it is written $\hat{\theta}$
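For the coin example this becomes a one-line derivative, a worked instance of the condition above:

$$\begin{aligned} \ell(P\mid \text{head}=4) &= \log\binom{10}{4} + 4\log P + 6\log(1-P) \\ \ell'(P) &= \frac{4}{P} - \frac{6}{1-P} = 0 \;\Rightarrow\; 4(1-P) = 6P \;\Rightarrow\; \hat{P} = \frac{4}{10} = 0.4 \end{aligned}$$

The second derivative $-4/P^2 - 6/(1-P)^2$ is negative everywhere, so $\hat{P} = 0.4$ is indeed the maximum, matching the table earlier.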

Maximum Likelihood Estimator, MLE

  • Asymptotically unbiased: $n\rightarrow\infty \Rightarrow E(\hat{\theta}) \rightarrow \theta$

  • Asymptotically efficient: $n\rightarrow\infty \Rightarrow Var(\hat{\theta})$ is minimal

  • Asymptotically normal: $n\rightarrow\infty \Rightarrow \hat{\theta} \sim N(\theta, Var(\hat{\theta}))$

  • Transformation invariant: $\hat{\theta}$ is the MLE of $\theta \Rightarrow g(\hat{\theta})$ is the MLE of $g(\theta)$

  • Sufficient information: $\hat{\theta}$ captures all the information the data carry about $\theta$

  • Consistent: $n\rightarrow\infty \Rightarrow \hat{\theta} \rightarrow \theta$
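A quick simulation hints at consistency for the coin's MLE $\hat{P} = \text{heads}/n$ (the true $P$ and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
true_p = 0.4

for n in [10, 100, 10_000, 1_000_000]:
    # For a binomial, the MLE of P is the sample proportion of heads.
    p_hat = rng.binomial(n, true_p) / n
    print(f"n = {n:>9}  ->  p_hat = {p_hat:.4f}")

# p_hat tightens around 0.4 as n grows (consistency); its spread
# shrinks like sqrt(p(1-p)/n) (asymptotic efficiency and normality).
```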
