# Some Data Science Projects Every Data Scientist Must Know

![Data Science](https://cdn.hashnode.com/res/hashnode/image/upload/v1662833413381/b87ZKW7Up.jpeg)

**Open source data science projects to enhance your portfolio  
Let’s divide the projects into categories:**

1\. **Open Source Computer Vision**

*   FaceX-Zoo
*   Bottleneck Transformer (PyTorch)
*   StyleGAN2-ADA — Official PyTorch implementation

2\. **Open Source Natural Language Processing**

*   Trankit
*   EasyNMT — Easy to use, state-of-the-art Neural Machine Translation

3\. **Open Source Machine Learning**

*   SeaLion

### 1\. Open Source Computer Vision

### FaceX-Zoo

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662833414927/_4zchOT9y.png)

FaceX-Zoo has to be one of the most impressive projects of the month. With face recognition becoming increasingly relevant in computer vision, FaceX-Zoo is an open-source data science project you do not want to miss.

FaceX-Zoo is a PyTorch toolbox for face recognition. It comes with a training module that offers various supervisory heads and backbones for state-of-the-art face recognition, and a standardized evaluation module that lets you evaluate models on most popular benchmarks just by editing a simple configuration.
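
To give a feel for what a "supervisory head" does, here is a minimal sketch of an ArcFace-style additive angular margin head, a popular choice for training face recognition models. It is written from scratch in plain Python and is not FaceX-Zoo's code; the scale `s=64` and margin `m=0.5` are commonly used values, assumed here for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def arcface_logits(embedding, class_weights, label, s=64.0, m=0.5):
    """ArcFace-style logits: s*cos(theta + m) for the true class, s*cos(theta) otherwise.

    embedding:     feature vector produced by the backbone
    class_weights: one weight vector per identity (the head's parameters)
    label:         index of the true identity
    s, m:          scale and additive angular margin (assumed defaults)
    """
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding errors
        if j == label:
            theta = math.acos(cos_theta)
            logits.append(s * math.cos(theta + m))  # penalize the target angle
        else:
            logits.append(s * cos_theta)
    return logits
```

During training these logits feed an ordinary softmax cross-entropy loss; the margin forces embeddings of the same identity to cluster more tightly. Full implementations also add a fallback for when `theta + m` exceeds pi, which this sketch omits.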

FaceX-Zoo also provides a simple yet fully functional face SDK for validating trained models and for their primary applications. In addition, the toolbox is designed to be easy to upgrade and extend as face-related research develops.
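
Applying a trained face model usually comes down to comparing embeddings. The sketch below uses hypothetical helpers in plain Python (not the FaceX-Zoo SDK's API): two faces count as the same person when the cosine similarity of their embeddings reaches a threshold, and the threshold is picked by accuracy on labeled pairs.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def verify(emb_a, emb_b, threshold):
    """Declare 'same person' when the similarity reaches the threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

def best_threshold_accuracy(scores, labels):
    """Pick the threshold (among observed scores) that maximizes pair accuracy.

    scores: similarity per pair; labels: True if the pair shows the same identity.
    """
    best_t, best_acc = None, 0.0
    for t in sorted(set(scores)):
        correct = sum((s >= t) == same for s, same in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

Benchmarks such as LFW choose the threshold on held-out folds rather than on the test pairs themselves; this simplified version skips that cross-validation step.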

[**GitHub - Medium-Posts/FaceX-Zoo: A PyTorch Toolbox for Face Recognition**  
*FaceX-Zoo is a PyTorch toolbox for face recognition. It provides a training module with various supervisory heads and…*github.com](https://github.com/Medium-Posts/FaceX-Zoo "https://github.com/Medium-Posts/FaceX-Zoo")

### Bottleneck Transformer (PyTorch)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662833416259/KnmgqdTor.png)

Bottleneck Transformer (BoTNet) is another impressive computer vision project and a very good addition to your data science portfolio.

The paper describes it as follows:

> *“[BoTNet is] a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation.”*

Simply replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet, with no other changes, significantly improves on the baseline models. Sounds promising, doesn’t it?
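
In BoTNet, the 3x3 spatial convolution inside each of the last three bottleneck blocks is swapped for multi-head self-attention. The sketch below shows the core operation with a single head and plain Python lists; it leaves out the relative position encodings the paper's attention layer uses and is an illustration, not the repository's implementation.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_self_attention(feature_map, Wq, Wk, Wv):
    """Single-head self-attention over all H*W positions of a feature map.

    feature_map: H x W grid of d-dimensional feature vectors.
    Every output position attends to every input position (a global
    receptive field), unlike a 3x3 convolution that only sees neighbors.
    """
    H, W = len(feature_map), len(feature_map[0])
    tokens = [feature_map[i][j] for i in range(H) for j in range(W)]  # flatten to H*W tokens
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    d_k = len(k[0])
    out = []
    for qi in q:
        # scaled dot-product scores against every key, then a weighted sum of values
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    # reshape the H*W outputs back into an H x W grid
    return [out[i * W:(i + 1) * W] for i in range(H)]
```

Because every position attends to every other, the cost grows quadratically with H*W, which makes self-attention most affordable in the last ResNet stage, where feature maps are smallest.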

The Bottleneck Transformer has the potential to serve as a strong baseline for future research on self-attention models for vision.

[**GitHub - Medium-Posts/bottleneck-transformer-pytorch: Implementation of Bottleneck Transformer in…**  
*Implementation of Bottleneck Transformer, SotA visual recognition model with convolution + attention that outperforms…*github.com](https://github.com/Medium-Posts/bottleneck-transformer-pytorch "https://github.com/Medium-Posts/bottleneck-transformer-pytorch")

### StyleGAN2-ADA — Official PyTorch implementation

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662833418469/xX7JNUara.png)

When generative adversarial networks are trained on too little data, the discriminator tends to overfit, causing training to diverge. This project addresses the problem with an adaptive discriminator augmentation mechanism that stabilizes training in limited data regimes.
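
The StyleGAN2-ADA paper controls augmentation strength with a probability `p` that is raised or lowered based on an overfitting heuristic, `r_t = E[sign(D(x_real))]`, compared against a target (0.6 in the paper). The update below is a simplified sketch of that idea: the interval of 4 minibatches and the 500k-image ramp are the defaults as I understand the official code and should be treated as assumptions, and clamping `p` to [0, 1] is a choice made for this sketch.

```python
def sign(x):
    """Return 1, 0 or -1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def update_augment_p(p, real_logits, batch_size, target=0.6, interval=4, ada_kimg=500):
    """One adaptive-augmentation update (simplified sketch).

    real_logits: discriminator outputs on real images since the last update.
    r_t near 1 means D confidently separates training images, i.e. it overfits,
    so p goes up; otherwise p goes down. The step size lets p travel from
    0 to 1 over ada_kimg thousand real images (assumed default).
    """
    r_t = sum(sign(x) for x in real_logits) / len(real_logits)
    step = (batch_size * interval) / (ada_kimg * 1000)
    p += sign(r_t - target) * step
    return min(max(p, 0.0), 1.0), r_t  # keep p a valid probability
```

The augmentations themselves (flips, rotations, color transforms, and so on) are each applied with probability `p` to every image the discriminator sees, real or generated.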

The project comes with several promises, including:

*   Full support for all primary training configurations.
*   Extensive verification of image quality, training curves, and quality metrics against the TensorFlow version.
*   Results are expected to match in all cases, excluding the effects of pseudo-random numbers and floating-point arithmetic.

With faster training and better efficiency than the original TensorFlow version, [StyleGAN2-ADA](https://github.com/NVlabs/stylegan2-ada-pytorch) is a nice open-source project to add to your portfolio.

[**GitHub - Medium-Posts/stylegan2-ada-pytorch: StyleGAN2-ADA - Official PyTorch implementation**  
*Training Generative Adversarial Networks with Limited Data Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine…*github.com](https://github.com/Medium-Posts/stylegan2-ada-pytorch "https://github.com/Medium-Posts/stylegan2-ada-pytorch")

### 2\. Open Source Natural Language Processing

### Trankit

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1662833419968/aBcadNfJi.jpeg)

The fascinating world of NLP is not far behind when it comes to impressive open-source data science projects. Trankit is another popular project released last month.

Trankit is a lightweight Transformer-based Python toolkit for multilingual natural language processing. Its two main components are:

*   A trainable pipeline for fundamental NLP tasks in over [100 languages](https://trankit.readthedocs.io/en/latest/pkgnames.html#trainable-languages)
*   90 [downloadable](https://trankit.readthedocs.io/en/latest/pkgnames.html#pretrained-languages-their-code-names) pretrained pipelines for [56 languages](https://trankit.readthedocs.io/en/latest/pkgnames.html#pretrained-languages-their-code-names)

Another impressive thing about Trankit is that it **beats the current state-of-the-art multilingual toolkit Stanza (StanfordNLP)** in many tasks over [90 Universal Dependencies v2.5 treebanks of 56 different languages](https://trankit.readthedocs.io/en/latest/performance.html#universal-dependencies-v2-5) while remaining efficient in memory usage and speed, which makes it practical for a wider audience.
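
Universal Dependencies treebanks are distributed in the CoNLL-U format (ten tab-separated columns per word), and dependency parsing results on them are commonly reported as UAS (correct head) and LAS (correct head and label). The sketch below reads CoNLL-U and computes both scores; it is a minimal illustration, not Trankit's code or the official evaluation script, and it assumes gold and predicted files share identical tokenization and compares full labels.

```python
def read_conllu(text):
    """Parse CoNLL-U text into sentences of (id, form, head, deprel) tuples.

    Comment lines, multiword-token ranges (IDs like "1-2") and empty nodes
    (IDs like "8.1") are skipped, so only syntactic words remain.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")  # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if current:  # the text may not end with a blank line
        sentences.append(current)
    return sentences


def attachment_scores(gold, pred):
    """Return (UAS, LAS) over aligned words of gold and predicted sentences."""
    total = correct_head = correct_both = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for (_, _, g_head, g_rel), (_, _, p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                correct_head += 1
                if g_rel == p_rel:
                    correct_both += 1
    return correct_head / total, correct_both / total
```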

[**GitHub - Medium-Posts/trankit: Trankit is a Light-Weight Transformer-based Python Toolkit for…**  
*Our technical paper for Trankit won the Outstanding Demo Paper Award at EACL 2021. Please cite the paper if you use…*github.com](https://github.com/Medium-Posts/trankit "https://github.com/Medium-Posts/trankit")

### EasyNMT — Easy to use, state-of-the-art Neural Machine Translation

![Neural Machine Translation](https://cdn.hashnode.com/res/hashnode/image/upload/v1662833421488/dCa8QatBgK.png)

With easy installation and usage, plus automatic download of pre-trained machine translation models, EasyNMT can easily make your NLP portfolio stand out.

It supports translation between 150+ languages, automatic language detection for 170+ languages, and both sentence and document translation.
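
Document translation is typically done by splitting the text into sentences, translating them, and reassembling the result. The sketch below illustrates that general approach with a naive regex splitter and a hypothetical `translate_batch` stand-in for a real model call; it is not EasyNMT's implementation or API.

```python
import re
from typing import Callable, List

def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

def translate_document(document: str,
                       translate_batch: Callable[[List[str]], List[str]]) -> str:
    """Translate a document sentence by sentence, keeping paragraph breaks.

    translate_batch is a hypothetical stand-in that maps a list of source
    sentences to a list of translated sentences.
    """
    out_paragraphs = []
    for paragraph in document.split("\n\n"):
        sentences = split_sentences(paragraph)
        translated = translate_batch(sentences) if sentences else []
        out_paragraphs.append(" ".join(translated))
    return "\n\n".join(out_paragraphs)
```

Batching all sentences of a paragraph in one call keeps the model busy, while splitting keeps each input short enough for a sentence-level translation model.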

At present, the project provides the following models:

*   [Opus-MT](https://github.com/UKPLab/EasyNMT#Opus-MT) from [Helsinki-NLP](https://github.com/Helsinki-NLP/Opus-MT)
*   [mBART50\_m2m](https://github.com/UKPLab/EasyNMT#mBART_50) from [Facebook Research](https://arxiv.org/abs/2008.00401)
*   [M2M\_100](https://github.com/UKPLab/EasyNMT#M2M_100) from [Facebook Research](https://arxiv.org/abs/2010.11125)

[**GitHub - Medium-Posts/EasyNMT: Easy to use, state-of-the-art Neural Machine Translation for 100+…**  
*This package provides easy to use, state-of-the-art machine translation for more than 100+ languages.*github.com](https://github.com/Medium-Posts/EasyNMT "https://github.com/Medium-Posts/EasyNMT")

### 3\. Open Source Machine Learning

### SeaLion

SeaLion is a brilliant machine learning project created to teach ML concepts more easily, using concise implementations of algorithms that still perform their tasks efficiently.

![SeaLion](https://cdn.hashnode.com/res/hashnode/image/upload/v1662833422920/eOCwsPmh9.png)

> *SeaLion is designed to teach today’s aspiring ml-engineers the popular machine learning concepts of today in a way that gives both intuition and ways of application.*

It is beginner-friendly when it comes to working with standard datasets like iris, breast cancer, swiss roll, the moons dataset, MNIST, etc. The algorithms in SeaLion include:

1.  **Deep Neural Networks**
2.  **Regression**
3.  **Dimensionality Reduction**
4.  **Unsupervised Clustering**
5.  **Naive Bayes**
6.  **Trees**
7.  **Ensemble Learning**
8.  **Nearest Neighbors**
9.  **Utils**
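
In the same spirit of learning by reading compact implementations, here is a from-scratch k-nearest-neighbors classifier on a tiny toy dataset. It is an illustrative sketch in plain Python, not SeaLion's code or API.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # sort training points by Euclidean distance to x and keep the closest k
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]  # on a tie, the label seen first (nearest) wins

# two well-separated toy clusters
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, [0.05, 0.05]))  # → a
```

k-NN has no training step at all: the "model" is the stored data, which makes it a good first algorithm for building intuition about distance-based learning.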

[**GitHub - Medium-Posts/SeaLion: The first machine learning framework that encourages learning ML…**  
*SeaLion is designed to teach today's aspiring ml-engineers the popular machine learning concepts of today in a way that…*github.com](https://github.com/Medium-Posts/SeaLion "https://github.com/Medium-Posts/SeaLion")
