Statistics For Data Science with Python — Classification (2/10)

Let’s Look at Some Classification Algorithms
1. Naive Bayes


The examples use scikit-learn:
- Simple and efficient tools for predictive data analysis
- Accessible to everybody, and reusable in various contexts
- Built on NumPy, SciPy, and matplotlib
Python Notebook
a) Loading the Data Set
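The notebook's own data set is not shown here. The sketch below is a minimal stand-in that assumes the Iris data set bundled with scikit-learn, which is the same data the Gaussian Naive Bayes example uses. With `as_frame=True`, the features and the label come back as a single pandas DataFrame.

```python
from sklearn.datasets import load_iris

# Assumption: the Iris data set stands in for the notebook's own data.
# as_frame=True returns a Bunch whose .frame attribute is a pandas DataFrame
# holding the four feature columns plus a "target" column.
iris = load_iris(as_frame=True)
df = iris.frame

print(df.shape)  # (150, 5)
print(df.head())
```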

b) Data Cleaning
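The cleaning steps themselves are not listed. The sketch below shows three common ones and is only an illustration: it drops exact duplicate rows, drops rows with no label, and fills missing feature values with the column median. It assumes a pandas DataFrame whose label column is named `target`. That name, the `clean` function, and the `toy` frame are all hypothetical.

```python
import numpy as np
import pandas as pd

def clean(df: pd.DataFrame, target: str = "target") -> pd.DataFrame:
    """Hypothetical cleaning helper: dedupe, drop unlabeled rows,
    median-impute missing feature values."""
    # Drop exact duplicate rows, then rows whose label is missing.
    out = df.drop_duplicates().dropna(subset=[target]).copy()
    # Fill remaining gaps in each feature column with that column's median.
    feature_cols = out.columns.drop(target)
    out[feature_cols] = out[feature_cols].fillna(out[feature_cols].median())
    return out.reset_index(drop=True)

# Made-up example: row 1 duplicates row 0, and row 4 has no label.
toy = pd.DataFrame({
    "x1": [1.0, 1.0, np.nan, 4.0, 5.0],
    "x2": [2.0, 2.0, 3.0, np.nan, 6.0],
    "target": [0, 0, 1, 1, np.nan],
})
cleaned = clean(toy)
print(cleaned)
```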

c) Train and Test Split
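Here is a sketch of the split step. The split ratio is not stated, so a 70/30 split on the Iris data is assumed. `stratify=y` keeps the class proportions the same in both parts, and `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Assumed 70/30 split; stratify=y preserves the 1:1:1 class balance of Iris.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
print(np.bincount(y_test))          # [15 15 15]
```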

d) Gaussian Naive Bayes Classification
[**GaussianNB**](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB "sklearn.naive_bayes.GaussianNB") implements the Gaussian Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian:
$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma^2_y}\right)$$
The parameters $\sigma_y$ and $\mu_y$ are estimated using maximum likelihood.
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.naive_bayes import GaussianNB
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
>>> gnb = GaussianNB()
>>> y_pred = gnb.fit(X_train, y_train).predict(X_test)
>>> print("Number of mislabeled points out of a total %d points : %d"
... % (X_test.shape[0], (y_test != y_pred).sum()))
Number of mislabeled points out of a total 75 points : 4



2. Undersampling vs. Oversampling
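Random undersampling discards majority-class samples until the classes are the same size. This loses information. Random oversampling duplicates minority-class samples instead. It keeps all the data but can encourage overfitting to the repeated points.

Below is a plain NumPy sketch of both, run on a made-up 90/10 data set. The function names are hypothetical. For real use, imbalanced-learn provides `RandomUnderSampler` and `RandomOverSampler`.

```python
import numpy as np

def random_undersample(X, y, rng):
    """Shrink every class to the size of the smallest one (no replacement)."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[idx], y[idx]

def random_oversample(X, y, rng):
    """Grow every class to the size of the largest one by duplicating rows."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts = []
    for c in classes:
        members = np.flatnonzero(y == c)
        # Keep all originals and add randomly chosen copies (with replacement).
        extra = rng.choice(members, size=n_max - len(members), replace=True)
        parts.append(np.concatenate([members, extra]))
    idx = np.concatenate(parts)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

X_u, y_u = random_undersample(X, y, rng)
X_o, y_o = random_oversample(X, y, rng)
print(np.bincount(y_u))  # [10 10]
print(np.bincount(y_o))  # [90 90]
```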

3. Undersampling with Tomek Links
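A Tomek link is a pair of samples from different classes that are each other's nearest neighbour. Such pairs sit on the class boundary or are noise. Undersampling with Tomek links removes the majority-class member of each pair, which cleans the boundary rather than fully balancing the classes.

The sketch below assumes a single majority class and Euclidean distance, and the function name is hypothetical. imbalanced-learn's `TomekLinks` class is the ready-made version.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def tomek_links_undersample(X, y):
    """Drop the majority-class member of every Tomek link (hypothetical helper)."""
    # kneighbors() without an argument excludes each point from its own
    # neighbour list, so column 0 is the nearest *other* sample.
    nearest = NearestNeighbors(n_neighbors=1).fit(X).kneighbors(return_distance=False)[:, 0]
    idx = np.arange(len(y))
    # A link: mutual nearest neighbours that carry different labels.
    is_link = (nearest[nearest] == idx) & (y[nearest] != y)
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[counts.argmax()]
    keep = ~(is_link & (y == majority))
    return X[keep], y[keep]

# Made-up 1-D data: 2.0 (class 0) and 2.1 (class 1) form the only Tomek link.
X = np.array([[0.0], [1.2], [2.0], [2.1], [4.0], [9.0]])
y = np.array([0, 0, 0, 1, 0, 1])
X_res, y_res = tomek_links_undersample(X, y)
print(X_res.ravel(), y_res)
```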



4. Oversampling with SMOTE
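SMOTE (Synthetic Minority Over-sampling Technique) creates new minority samples instead of copying existing ones. It picks a minority sample and one of its k nearest minority-class neighbours, then places a new point at a random position on the line segment between them.

The function below is a simplified sketch of that idea. k = 5, the function name, and the toy data are assumptions. imbalanced-learn's `SMOTE` class is the library implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic rows from minority-class rows X_min
    (requires len(X_min) > k)."""
    rng = np.random.default_rng(rng)
    # k nearest minority neighbours of each minority row, excluding itself.
    neigh = NearestNeighbors(n_neighbors=k).fit(X_min).kneighbors(return_distance=False)
    base = rng.integers(0, len(X_min), size=n_new)         # random starting rows
    partner = neigh[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                            # position along the segment
    return X_min[base] + gap * (X_min[partner] - X_min[base])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(90, 2)), rng.normal(3, 0.5, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)

X_min = X[y == 1]
X_syn = smote(X_min, n_new=80, k=5, rng=rng)
X_bal = np.vstack([X, X_syn])
y_bal = np.concatenate([y, np.ones(80, dtype=int)])
print(np.bincount(y_bal))  # [90 90]
```

Because every synthetic point is an interpolation between two real minority points, it always stays inside the region already occupied by the minority class.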



Python Notebook with Code
- Naive Bayes
- Undersampling
- Oversampling
An example with Random Forest
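Here is one possible shape for such an example, written as a sketch. The data come from `make_classification`, so they are synthetic rather than the notebook's data set. The minority class is randomly oversampled in the training split only. A `RandomForestClassifier` is then trained on both the original and the oversampled training data, so that their minority-class recall on the untouched test split can be compared. No particular outcome is implied, because the numbers depend on the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data: roughly 90% class 0 and 10% class 1.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

def oversample_minority(X, y, rng):
    """Hypothetical helper: duplicate class-1 rows until both classes match."""
    minority = np.flatnonzero(y == 1)
    extra = rng.choice(minority, size=(y == 0).sum() - len(minority), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]

# Resample the training split only; the test split keeps its real class balance.
X_bal, y_bal = oversample_minority(X_train, y_train, np.random.default_rng(0))

recalls = {}
for name, (X_fit, y_fit) in {"original": (X_train, y_train),
                             "oversampled": (X_bal, y_bal)}.items():
    model = RandomForestClassifier(random_state=0).fit(X_fit, y_fit)
    # Recall on class 1: the share of minority samples the model recovers.
    recalls[name] = recall_score(y_test, model.predict(X_test))
    print(f"{name}: minority-class recall = {recalls[name]:.3f}")
```

Resampling only after the split matters. If duplicated minority rows landed in the test set, the evaluation would be overly optimistic.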






