# Statistics For Data Science with Python — Classification (2/10)

*Let’s look at some classification algorithms.*

### 1\. Naive Bayes

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832560262/JfdtKEF8r.jpeg)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832561635/dA7l3ik4n.png)

The examples below use scikit-learn, which describes itself as:

*   Simple and efficient tools for predictive data analysis
*   Accessible to everybody, and reusable in various contexts
*   Built on NumPy, SciPy, and matplotlib

### Python Notebook

#### a) Loading the Dataset

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832562927/4NbOdZGqA.png)

#### b) Data Cleaning

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832564291/-ZSeOmh-5.png)

#### c) Train & Test Split

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832565621/_WH0roDUu.png)

#### d) Gaussian Naive Bayes Classification

[`GaussianNB`](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB "sklearn.naive_bayes.GaussianNB") implements the Gaussian Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma^2_y}\right)$$

The parameters $\sigma_y$ and $\mu_y$ are estimated using maximum likelihood.
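To make the formula concrete, here is a minimal from-scratch sketch in NumPy. It is not scikit-learn's implementation. The per-class means and variances are the maximum-likelihood estimates, so the variance divides by n and not n − 1. Prediction picks the class with the largest log prior plus the summed log Gaussian likelihoods, which is the naive (feature-independence) assumption written in log space. The function names, the `eps` term and the two-cluster toy data are hypothetical. `GaussianNB` handles the same stability issue through its own `var_smoothing` parameter.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Maximum-likelihood estimates of class priors, means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # MLE variance divides by n (NumPy's default ddof=0), not n - 1
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances


def log_gaussian(X, mean, var):
    """Sum over features i of log P(x_i | y) for the Gaussian density above."""
    return np.sum(-0.5 * np.log(2 * np.pi * var) - (X - mean) ** 2 / (2 * var), axis=1)


def predict_gaussian_nb(X, classes, priors, means, variances, eps=1e-9):
    """Pick the class maximising log P(y) + sum_i log P(x_i | y)."""
    scores = np.column_stack([
        np.log(prior) + log_gaussian(X, mean, var + eps)  # hypothetical eps avoids division by zero
        for prior, mean, var in zip(priors, means, variances)
    ])
    return classes[np.argmax(scores, axis=1)]


# Toy data (hypothetical): two well-separated 2-D Gaussian clusters
rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
X1 = rng.normal(loc=4.0, scale=1.0, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

params = fit_gaussian_nb(X, y)
y_pred = predict_gaussian_nb(X, *params)
print("Training accuracy:", np.mean(y_pred == y))
```

Working with log-probabilities avoids the numerical underflow that comes from multiplying many small densities together.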


```python
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.naive_bayes import GaussianNB
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
>>> gnb = GaussianNB()
>>> y_pred = gnb.fit(X_train, y_train).predict(X_test)
>>> print("Number of mislabeled points out of a total %d points : %d"
...       % (X_test.shape[0], (y_test != y_pred).sum()))
Number of mislabeled points out of a total 75 points : 4
```
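Here is a quick check on that output. The split holds out half of the 150 iris samples, giving 75 test points, and 4 of them are mislabeled. The test accuracy is therefore:

$$\text{accuracy} = 1 - \frac{4}{75} = \frac{71}{75} \approx 0.947$$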

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832566950/i98grkkMk.jpeg)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832568301/fgcjNmqQS.png)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832569979/QquSmIklg.png)

### 2\. Undersampling vs. Oversampling

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832571267/h9QwfE6TA.png)
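The two ideas can be sketched in a few lines of NumPy, assuming plain random (uniform) sampling. Undersampling discards rows from the larger class until the classes are balanced. Oversampling duplicates rows from the smaller class until they match the larger one. The helper names and the 90:10 toy data are hypothetical. The imbalanced-learn library provides `RandomUnderSampler` and `RandomOverSampler` for the same job.

```python
import numpy as np


def random_undersample(X, y, rng):
    """Randomly drop rows so every class shrinks to the size of the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]


def random_oversample(X, y, rng):
    """Randomly duplicate rows so every class grows to the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = np.concatenate([
        np.concatenate([
            np.flatnonzero(y == c),  # keep every original row
            rng.choice(np.flatnonzero(y == c), size=n_max - n, replace=True),  # plus random copies
        ])
        for c, n in zip(classes, counts)
    ])
    return X[idx], y[idx]


# Toy data (hypothetical): 90 majority (class 0) vs 10 minority (class 1) rows
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

X_under, y_under = random_undersample(X, y, rng)
X_over, y_over = random_oversample(X, y, rng)
print(np.bincount(y_under))  # [10 10]
print(np.bincount(y_over))   # [90 90]
```

The trade-off is visible in the shapes. Undersampling keeps only 20 of the 100 rows and throws information away. Oversampling keeps everything but repeats the same 10 minority rows, which can encourage overfitting.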

### 3\. Undersampling with Tomek Links

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832572624/_l65fd7qU.png)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832574330/MITwGpSRC.png)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832575818/da20kK6PE.png)
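A Tomek link is a pair of samples from different classes that are each other's nearest neighbour. Such pairs sit on the class boundary or overlap it. Undersampling with Tomek links removes the majority-class member of each pair, which cleans up the boundary.

Below is a brute-force NumPy sketch for the binary case. It builds an O(n²) distance matrix, so it is only suitable for small data. The helper names and the 1-D toy data are hypothetical. imbalanced-learn's `TomekLinks` is the library counterpart.

```python
import numpy as np


def nearest_neighbor(X):
    """Index of each row's nearest other row (brute-force Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    return d.argmin(axis=1)


def tomek_links(X, y):
    """Pairs (i, j), i < j, of mutual nearest neighbours that belong to different classes."""
    nn = nearest_neighbor(X).tolist()
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_majority(X, y, majority):
    """Binary case: drop the majority-class member of every Tomek link."""
    drop = {i if y[i] == majority else j for i, j in tomek_links(X, y)}
    keep = [k for k in range(len(y)) if k not in drop]
    return X[keep], y[keep]


# Toy data (hypothetical): the minority point at 2.5 is the mutual
# nearest neighbour of the majority point at 2.2
X = np.array([[0.0], [1.0], [2.2], [2.5], [5.0]])
y = np.array([0, 0, 0, 1, 0])

print(tomek_links(X, y))  # [(2, 3)]
X_res, y_res = remove_tomek_majority(X, y, majority=0)
print(y_res)  # [0 0 1 0]
```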

### 4\. Oversampling with SMOTE

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832577225/p_NPhJkSZ.png)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832578558/rjNwGbIu3.png)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832580007/JP2Cm-eVe.png)
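SMOTE (Synthetic Minority Over-sampling Technique) creates new minority samples by interpolation rather than duplication. It picks a minority sample, picks one of its k nearest minority-class neighbours, and places a new point at a random position on the segment between them.

The minimal NumPy sketch below uses k = 5, which is also the default `k_neighbors` in imbalanced-learn's `SMOTE`. The function name and the toy data are hypothetical, and this is not the library's implementation.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority rows by SMOTE-style interpolation."""
    rng = np.random.default_rng() if rng is None else rng
    k = min(k, len(X_min) - 1)  # cannot have more neighbours than other minority rows
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                            # exclude each row from its own neighbours
    neighbors = np.argsort(d, axis=1)[:, :k]               # k nearest minority neighbours per row
    base = rng.integers(0, len(X_min), size=n_new)         # random minority rows
    nn = neighbors[base, rng.integers(0, k, size=n_new)]   # one random neighbour for each
    gap = rng.random((n_new, 1))                           # position on the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])


# Toy minority class (hypothetical): 10 rows in 2-D, topped up with 80 synthetic rows
rng = np.random.default_rng(0)
X_min = rng.normal(loc=3.0, size=(10, 2))
X_syn = smote(X_min, n_new=80, k=5, rng=rng)
print(X_syn.shape)  # (80, 2)
```

Each synthetic row is a convex combination of two real minority rows, so it always stays inside the region the minority class already occupies.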

### Python Notebook with Code

*   Naive Bayes
*   Undersampling
*   Oversampling

### An Example with Random Forest

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832581394/f57fMmE_2.jpeg)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662832584060/Yg7rbqy4O.gif)

#### Random Forest Code with Python
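A random forest averages many decision trees. Each tree is trained on a bootstrap sample of the data and considers only a random subset of features at each split. Here is a minimal sketch using scikit-learn's `RandomForestClassifier`, which is not necessarily the code in the author's notebook. It assumes the same iris split as the Gaussian Naive Bayes example, so the two results can be compared. The choice of `n_estimators=100` and `random_state=0` is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same 50/50 iris split as the Gaussian Naive Bayes example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# 100 trees, each grown on a bootstrap sample; at each split only a random
# subset of features (sqrt(n_features) by default for classification) is considered
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Feature importances:", rf.feature_importances_)
```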