Introduction


Last updated 5 years ago


  • This page is based entirely on the official scikit-learn documentation

Loading Dataset

  • scikit-learn ships with many built-in datasets that are handy for testing

from sklearn import datasets

iris = datasets.load_iris()     # iris flower classification
digits = datasets.load_digits() # handwritten digit recognition

  • The loaded object has several attributes

    • data : training data X

    • target : label y

print(digits.data)
# [[ 0.   0.   5. ...   0.   0.   0.]
#  [ 0.   0.   0. ...  10.   0.   0.]
#  [ 0.   0.   0. ...  16.   9.   0.]
#  ...
#  [ 0.   0.   1. ...   6.   0.   0.]
#  [ 0.   0.   2. ...  12.   0.   0.]
#  [ 0.   0.  10. ...  12.   1.   0.]]

print(digits.target)
# [0 1 2 ... 8 9 8]
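
The shapes of these attributes make the layout easier to see. The following is a minimal sketch, not part of the original walkthrough; the `images` attribute is not mentioned above, but `load_digits` also exposes it as the 8x8 form of each row in `data`.

```python
from sklearn import datasets

digits = datasets.load_digits()

# data is a 2D array: one row per sample, one column per feature
print(digits.data.shape)    # (1797, 64)

# target holds one label per sample
print(digits.target.shape)  # (1797,)

# images keeps the original 8x8 pixel layout; data is the flattened version
print(digits.images.shape)  # (1797, 8, 8)
flattened = digits.images.reshape(len(digits.images), -1)
print((flattened == digits.data).all())  # True
```

Estimators expect the flattened `(n_samples, n_features)` form, which is why `digits.data` rather than `digits.images` is passed to `fit` later on.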

Learning and Predicting

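  • scikit-learn provides a wide range of classical machine learning estimators

  • You instantiate an estimator, set its hyperparameters, and call fit to start training

  • The example below uses the built-in SVM module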

from sklearn import svm

X, y = digits.data, digits.target

clf = svm.SVC(gamma=0.001, C=100.)  # support vector classification from the svm module
clf.fit(X, y)  # after fit, clf is the trained model (hypothesis)

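print(clf)
# The output below is from an older scikit-learn release;
# recent releases print only the non-default parameters.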
# SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
#    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
#    max_iter=-1, probability=False, random_state=None, shrinking=True,
#    tol=0.001, verbose=False)
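
  • The trained model can now be used for prediction

    • predict expects a 2D array-like of shape (n_samples, n_features), not a single 1D sample

    • That is why X[-1:] is used here: the slice keeps the 2D shape, whereas X[-1] would return a 1D array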

ans = clf.predict(X[-1:])  # try predicting the last sample

print(ans)
# [8]
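
Note that the example above predicts a sample the model was already trained on, so it says little about how well the model generalizes. A minimal sketch of a held-out evaluation follows; the 25% split and `random_state=0` are arbitrary choices for illustration, not values from the original page.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Hold out 25% of the samples; random_state=0 is an arbitrary seed for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)  # train only on the training split

# score returns the mean accuracy on the held-out samples
accuracy = clf.score(X_test, y_test)
print(accuracy)

# A single sample must be reshaped to 2D before calling predict
single = X_test[0].reshape(1, -1)
print(clf.predict(single), y_test[0])
```

`reshape(1, -1)` is an alternative to slicing when only a 1D sample is at hand.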

Save Model

  • Both Python's built-in pickle module and the third-party joblib package can save a model for later use

  • The following is a pickle example

import pickle

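s = pickle.dumps(clf)  # serialize the model to a bytes object

clf2 = pickle.loads(s)  # restore the model from the bytes object
clf2.predict(X[-1:])  # array([8])

  • joblib is more efficient for larger models that hold big NumPy arrays, but it saves to a file on disk rather than to an in-memory bytes object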

from joblib import dump, load

dump(clf, 'myModel.joblib')
clf3 = load('myModel.joblib')
clf3.predict(X[-1:])  # array([8])
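
As a rough check that a saved model behaves like the original, the sketch below round-trips the classifier through both a pickle file and a joblib file and compares predictions. The file names and the use of a temporary directory are assumptions made for this illustration only.

```python
import os
import pickle
import tempfile

import numpy as np
from joblib import dump, load
from sklearn import datasets, svm

digits = datasets.load_digits()
X, y = digits.data, digits.target
clf = svm.SVC(gamma=0.001, C=100.).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "model.pkl")     # hypothetical file name
    joblib_path = os.path.join(tmp_dir, "model.joblib")  # hypothetical file name

    # pickle can also write to a file instead of a bytes object
    with open(pickle_path, "wb") as f:
        pickle.dump(clf, f)
    with open(pickle_path, "rb") as f:
        clf_pickle = pickle.load(f)

    # joblib writes to and reads from a path directly
    dump(clf, joblib_path)
    clf_joblib = load(joblib_path)

# The restored models should give the same predictions as the original
expected = clf.predict(X[:10])
same_pickle = np.array_equal(expected, clf_pickle.predict(X[:10]))
same_joblib = np.array_equal(expected, clf_joblib.predict(X[:10]))
print(same_pickle, same_joblib)
```

Both formats can execute arbitrary code when loaded, so only load model files from a source you trust, and preferably with the same scikit-learn version that saved them.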

Refitting and Updating Hyperparameters

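  • The hyperparameters of an existing model can be changed or overwritten at any time with set_params; calling fit again replaces whatever was learned before

  • The example first fits an SVC with a linear kernel

  • It then switches back to an rbf kernel and refits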

clf = svm.SVC()

clf.set_params(kernel='linear').fit(X, y)
clf.predict(X[:1])

clf.set_params(kernel='rbf', gamma='scale').fit(X, y)
clf.predict(X[:1])
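
To confirm what `set_params` actually changed, the current hyperparameters can be read back with `get_params`. This is a minimal sketch; `C=10.0` is an arbitrary value chosen for illustration and does not come from the original page.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
X, y = digits.data, digits.target

clf = svm.SVC()
print(clf.get_params()["kernel"])  # rbf (the default)

# set_params updates the estimator in place and returns it
clf.set_params(kernel="linear")
print(clf.get_params()["kernel"])  # linear

# Several hyperparameters can be changed at once; C=10.0 is illustrative only
clf.set_params(kernel="rbf", gamma="scale", C=10.0)
clf.fit(X, y)  # refitting discards the previously learned model
print(clf.get_params()["C"])  # 10.0
```

Because `set_params` returns the estimator itself, it can be chained with `fit`, as in the example above.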

Reference: https://scikit-learn.org/stable/tutorial/basic/tutorial.html