Statistics For Data Science with Python — Classification (2/10)

Let’s Look at Classification Algorithms

1. Naive Bayes

The examples use scikit-learn, which describes itself as:

  • Simple and efficient tools for predictive data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib

Notebook Python

a) Loading Data Set

b) Data Clean

c) Train & Test Base

d) Gaussian Naive Bayes Classification

[**GaussianNB**](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB "sklearn.naive_bayes.GaussianNB") implements the Gaussian Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

The parameters $\sigma_y$ and $\mu_y$ are estimated using maximum likelihood.

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.naive_bayes import GaussianNB
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
>>> gnb = GaussianNB()
>>> y_pred = gnb.fit(X_train, y_train).predict(X_test)
>>> print("Number of mislabeled points out of a total %d points : %d"
... % (X_test.shape[0], (y_test != y_pred).sum()))
Number of mislabeled points out of a total 75 points : 4
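
To make the Gaussian likelihood above concrete, here is a minimal NumPy sketch that estimates the per-class mean and variance by maximum likelihood and predicts the class with the highest log posterior. The function names and the toy data are hypothetical and chosen only for illustration. scikit-learn's GaussianNB also adds a small smoothing term to the variances (var_smoothing), which this sketch omits.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, means, and variances by maximum likelihood."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # np.var defaults to ddof=0, which is the maximum-likelihood estimate.
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    """Pick the class with the highest log prior plus summed log likelihood."""
    # Broadcast to shape (n_samples, n_classes, n_features).
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    # "Naive" assumption: features are independent, so log likelihoods add up.
    log_post = np.log(priors)[None, :] + log_lik.sum(axis=2)
    return classes[np.argmax(log_post, axis=1)]

# Hypothetical toy data: two well-separated Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(4.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

params = fit_gaussian_nb(X, y)
y_pred = predict_gaussian_nb(X, *params)
accuracy = np.mean(y_pred == y)
print("Training accuracy: %.2f" % accuracy)
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.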

2. Undersampling vs. Oversampling
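
As a rough illustration of the two ideas named in this heading, the sketch below balances an imbalanced data set either by randomly dropping majority-class rows (undersampling) or by randomly repeating minority-class rows (oversampling). The helper `random_resample` and the toy data are hypothetical and are not taken from the notebook.

```python
import numpy as np

def random_resample(X, y, target_count, rng):
    """Resample every class to target_count rows.

    Classes larger than target_count are sampled without replacement
    (undersampling); smaller ones are sampled with replacement (oversampling).
    """
    keep = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        replace = len(idx) < target_count
        keep.append(rng.choice(idx, size=target_count, replace=replace))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

# Hypothetical imbalanced data: 90 samples of class 0, 10 of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

counts = np.bincount(y)
X_under, y_under = random_resample(X, y, counts.min(), rng)  # shrink majority
X_over, y_over = random_resample(X, y, counts.max(), rng)    # grow minority

print(np.bincount(y_under), np.bincount(y_over))
```

Undersampling discards information from the majority class, while random oversampling repeats minority rows exactly, so resampling is usually applied to the training split only.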

4. Oversampling with SMOTE
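
SMOTE creates new minority samples by interpolating between a minority point and one of its nearest minority neighbors, instead of repeating existing rows. The following is a simplified NumPy sketch of that idea, not the notebook's code; `smote_sketch`, its parameters, and the toy data are hypothetical. In practice, the imbalanced-learn package provides a SMOTE class that would normally be used instead.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by neighbor interpolation.

    Assumes X_min holds only minority-class rows and that k < len(X_min).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # k nearest per point

    base = rng.integers(0, len(X_min), size=n_new)            # seed points
    picked = neighbors[base, rng.integers(0, k, size=n_new)]  # one neighbor each
    gap = rng.random((n_new, 1))                              # position on segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Hypothetical minority class with 10 samples and 2 features.
rng = np.random.default_rng(0)
X_min = rng.normal(size=(10, 2))
X_new = smote_sketch(X_min, n_new=80, k=5, rng=rng)
print(X_new.shape)
```

Each synthetic point lies on the line segment between two existing minority points, so the new samples stay inside the region the minority class already occupies.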

Notebook Python with Code

  • Naive Bayes
  • Undersampling
  • Oversampling

An example with Random Forest

Random Forest Code with Python