Academic
  • Introduction
  • Artificial Intelligence
    • Introduction
    • AI Concepts, Terminology, and Application Areas
    • AI: Issues, Concerns and Ethical Considerations
  • Biology
    • Scientific Method
    • Chemistry of Life
    • Water, Acids, Bases
    • Properties of carbon
    • Macromolecules
    • Energy and Enzymes
    • Structure of a cell
    • Membranes and transport
    • Cellular respiration
    • Cell Signaling
    • Cell Division
    • Classical and molecular genetics
    • DNA as the genetic material
    • Central dogma
    • Gene regulation
  • Bioinformatics
    • Bioinformatics Overview
  • Deep Learning
    • Neural Networks and Deep Learning
      • Introduction
      • Logistic Regression as a Neural Network
      • Python and Vectorization
      • Shallow Neural Network
      • Deep Neural Network
    • Improving Deep Neural Networks
      • Setting up your Machine Learning Application
      • Regularizing your Neural Network
      • Setting up your Optimization Problem
      • Optimization algorithms
      • Hyperparameter, Batch Normalization, Softmax
    • Structuring Machine Learning Projects
    • Convolutional Neural Networks
      • Introduction
    • Sequence Models
      • Recurrent Neural Networks
      • Natural Language Processing & Word Embeddings
      • Sequence models & Attention mechanism
  • Linear Algebra
    • Vectors and Spaces
      • Vectors
      • Linear combinations and spans
      • Linear dependence and independence
      • Subspaces and the basis for a subspace
      • Vector dot and cross products
      • Matrices for solving systems by elimination
      • Null space and column space
    • Matrix transformations
      • Functions and linear transformations
      • Linear transformation examples
      • Transformations and matrix multiplication
      • Inverse functions and transformations
      • Finding inverses and determinants
      • More Determinant Depth
  • Machine Learning
    • Introduction
    • Linear Regression
      • Model and Cost Function
      • Parameter Learning
      • Multivariate Linear Regression
      • Computing Parameters Analytically
      • Octave
    • Logistic Regression
      • Classification and Representation
      • Logistic Regression Model
    • Regularization
      • Solving the Problem of Overfitting
    • Neural Networks
      • Introduction of Neural Networks
      • Neural Networks - Learning
    • Improve Learning Algorithm
      • Advice for Applying Machine Learning
      • Machine Learning System Design
    • Support Vector Machine
      • Large Margin Classification
      • Kernels
      • SVM in Practice
  • NCKU - Artificial Intelligence
    • Introduction
    • Intelligent Agents
    • Solving Problems by Searching
    • Beyond Classical Search
    • Learning from Examples
  • NCKU - Computer Architecture
    • First Week
  • NCKU - Data Mining
    • Introduction
    • Association Analysis
    • FP-growth
    • Other Association Rules
    • Sequence Pattern
    • Classification
    • Evaluation
    • Clustering
    • Link Analysis
  • NCKU - Machine Learning
    • Probability
    • Inference
    • Bayesian Inference
    • Introduction
  • NCKU - Robotic Navigation and Exploration
    • Kinetic Model & Vehicle Control
    • Motion Planning
    • SLAM Back-end (I)
    • SLAM Back-end (II)
    • Computer Vision / Multi-view Geometry
    • Lie group & Lie algebra
    • SLAM Front-end
  • Python
    • Numpy
    • Pandas
    • Scikit-learn
      • Introduction
      • Statistic Learning
  • Statstics
    • Quantitative Data
    • Modeling Data Distribution
    • Bivariate Numerical Data
    • Probability
    • Random Variables
    • Sampling Distribution
    • Confidence Intervals
    • Significance tests
Powered by GitBook
On this page
  • Suffix Tree
  • Keyword Trees
  • Suffix Tree
  • FP-Growth Algorithm
  • FP-Trees Construction
  • FP-Growth
  • FP-Growth Adventages

Was this helpful?

  1. NCKU - Data Mining

FP-growth

PreviousAssociation AnalysisNextOther Association Rules

Last updated 5 years ago

Was this helpful?

FP-growth 的主要目的是破解 candidate generation 所引起的 bottleneck

所以 FP-growth 將不會用到任何 candidate generation

並且在 main memory 實作,進而減少對 database 的 scans

FP-growth 主要的概念是 divide-and-conquer

並且利用了 suffix tree 的概念

Suffix Tree

Keyword Trees

  • 將 keywords 儲存在 rooted labeled tree

  • 每個從 root 到 leaf 的 path 都會對應一個 keyword

  • 利用 "threading" 技巧 traverse 文章找出 keywords

  • 每當我們 reach 到 leaf node 時代表找到 keyword

Suffix Tree

  • 運用 keyword tree 的概念

  • 利用 text 的 suffixes 來建立一種 trie data structure

  • Suffix tree 的 leaves 會是每個 suffix 在文章的 start position

  • Suffix tree 可以掌握所有 text 的 suffixes

  • 例如底下是一個將基因序列 index 成 suffix tree 的例子

FP-Growth Algorithm

  1. Construct FP-Tree (Frequent pattern tree)

  2. FP-Growth (Frequent pattern growth)

    1. 根據不同 frequent item 將 FP-tree 分裂成 Conditional FP-Tree

    2. 針對每一個 conditional FP-Tree 進行 mining

FP-Trees Construction

  1. 一樣先 scan 所有 1-itemset 的 support ,但要 sort 成 descending order

    • Descend 可以改變 transaction 結構,幫助我們用更少 nodes 來建樹

  1. 將 ordered 的 itemset 建立成 FP-tree,建立方法如下

    • 從第一筆開始建樹

    • 一樣的 character 就 + 1

    • 不一樣的 character 就建立分支

  1. 最後將 tree 中相同 character 從 header table 串連起來

FP-Growth

接續上面產生的 FP-Tree

  1. 先從每一個 1-item 來建立 Conditional Pattern Base (由下往上)

    • p 可以從 fcam (2次) 接過來

    • p 也可從 cb (1次) 接過來

  2. 接著從 conditional pattern base 建立出 Conditional FP-Tree

    • 找出在 conditional pattern base 有相同 prefix 的 items

    • 並且可以符合 minsup

  3. 最後我們將 conditional FP-Tree 找到的 items 各自拆開與原本的 item 結合

FP-Growth Adventages

  • Divide-and-Conquer

    • Decompose mining and DB

  • No candidate required

  • Compressed DB (FP-Tree)

  • No repeated scan of DB

  • No pattern searching and matching

並且在 O(length)O(\text{length})O(length) 的時間就可以建好 tree

例如 p 的 fcam 跟 cb 代表 c→pc \rightarrow pc→p 共有出現 3 次

例如 c  ⟹  (c→p)c \implies (c \rightarrow p)c⟹(c→p)

Here is a complete FP-Growth Example