Some Data Science Projects Every Data Scientist Must Know


Open source data science projects to enhance your portfolio
Let’s divide the projects into categories:
1. Open Source Computer Vision
- FaceX-Zoo
- Bottleneck Transformer — Pytorch
- StyleGAN2-ADA — Official PyTorch implementation
2. Open Source Natural Language Processing
- Trankit
- EasyNMT — Easy to use, state-of-the-art Neural Machine Translation
3. Open Source Machine Learning
- SeaLion
1. Open Source Computer Vision
FaceX-Zoo

FaceX-Zoo has to be one of the most impressive projects of the month. With face recognition becoming increasingly relevant in computer vision, FaceX-Zoo is an open-source data science project you do not want to miss.
FaceX-Zoo is a PyTorch toolbox for face recognition. It provides a training module with various supervisory heads and backbones for state-of-the-art face recognition, and a standardized evaluation module that lets you evaluate models on most of the popular benchmarks just by editing a simple configuration.
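To make "supervisory head" concrete: a head turns an embedding into training logits. Margin-based heads such as ArcFace (additive angular margin) are a common choice in face recognition. The pure-Python sketch below is our own illustration of that technique, not FaceX-Zoo's code; the defaults `s=64.0` and `m=0.5` are assumed typical values, and the fallback that production ArcFace code uses when `theta + m` exceeds pi is omitted for brevity.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding: List[float],
                   class_weights: List[List[float]],
                   label: int,
                   s: float = 64.0,
                   m: float = 0.5) -> List[float]:
    """Additive angular margin logits (ArcFace-style sketch).

    Non-target logits are s * cos(theta_j); the target class gets
    s * cos(theta_y + m), so the model must pull embeddings closer to
    their class centre to achieve the same score.
    """
    x = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_t = sum(a * b for a, b in zip(x, l2_normalize(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # guard acos against rounding
        if j == label:
            theta = math.acos(cos_t)
            logits.append(s * math.cos(theta + m))
        else:
            logits.append(s * cos_t)
    return logits
```

The logits would then feed an ordinary softmax cross-entropy loss; the margin only changes how hard the target class is to satisfy.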
A simple yet fully functional face SDK is also provided for validating the trained models and for their primary applications. In addition, FaceX-Zoo can be easily upgraded and extended as face-related domains develop.
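One typical application of a trained recognition model is face verification: embed two face images and compare the embeddings. The sketch below shows only that comparison step, assuming the embeddings have already been computed; the `same_person` helper and its threshold of 0.5 are hypothetical, since real thresholds are tuned per model and benchmark.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(emb_a: Sequence[float], emb_b: Sequence[float],
                threshold: float = 0.5) -> bool:
    """Hypothetical helper: declare a match when similarity clears the threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```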
[GitHub - Medium-Posts/FaceX-Zoo: A PyTorch Toolbox for Face Recognition](https://github.com/Medium-Posts/FaceX-Zoo)
Bottleneck Transformer — Pytorch

Bottleneck Transformer is another mind-blowing computer vision project and would make a very good addition to your data science portfolio.
The paper describes it as a “simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection, and instance segmentation.”
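Simply replacing the spatial convolutions in the last three bottleneck blocks of a ResNet with global self-attention, and making no other changes, significantly improves on the baselines in object detection and instance segmentation. Sounds promising, doesn’t it?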
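The core idea can be sketched without a deep learning framework: flatten an H×W feature map into H·W position vectors and let every position attend to every other one, which is what "global self-attention in place of a spatial convolution" means. This pure-Python sketch is a deliberate simplification: a single head, identity query/key/value projections, and no position encodings, whereas BoTNet itself uses multi-head self-attention with relative position encodings.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [row][col][channel]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def global_self_attention(feature_map: FeatureMap) -> FeatureMap:
    """Single-head scaled dot-product attention over all spatial positions.

    Queries, keys and values are the features themselves (identity
    projections, a simplification). Returns a map of the same shape.
    """
    height, width = len(feature_map), len(feature_map[0])
    # Flatten the H x W grid into a sequence of channel vectors.
    tokens = [feature_map[r][c] for r in range(height) for c in range(width)]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in tokens:
        # Every position scores every other position: a global receptive field.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * v[ch] for w, v in zip(weights, tokens))
                        for ch in range(dim)])
    # Un-flatten back to H x W.
    return [outputs[r * width:(r + 1) * width] for r in range(height)]
```

Unlike a 3×3 convolution, which mixes only a local neighbourhood, each output position here is a weighted average over the whole feature map, which is why the replacement is applied only in the last, low-resolution stages where H·W is small.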
The Bottleneck Transformer has the potential to serve as a strong baseline for future research on self-attention models for vision.
[GitHub - Medium-Posts/bottleneck-transformer-pytorch: Implementation of Bottleneck Transformer in…](https://github.com/Medium-Posts/bottleneck-transformer-pytorch)
StyleGAN2-ADA — Official PyTorch implementation

Training generative adversarial networks on too little data typically leads to discriminator overfitting, which causes training to diverge. This project addresses the problem with an adaptive discriminator augmentation mechanism that stabilizes training in limited-data regimes.
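The "adaptive" part fits in a few lines. The paper estimates overfitting with the heuristic r_t = E[sign(D_train)], the mean sign of the discriminator's outputs on real training images (0 means no overfitting, 1 means complete overfitting), and raises the augmentation probability p when r_t exceeds a target of 0.6, lowers it otherwise, and clamps p at zero. The sketch below covers only that update rule, not the augmentation pipeline; the step size follows the paper's idea that p should be able to rise from 0 to 1 within about 500k images, and the batch size of 64 and four-minibatch interval are assumed defaults for illustration.

```python
from typing import Sequence


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def overfitting_heuristic(d_real_outputs: Sequence[float]) -> float:
    """r_t = E[sign(D_train)] over discriminator outputs on real images."""
    return sum(sign(d) for d in d_real_outputs) / len(d_real_outputs)


def update_augment_p(p: float,
                     r_t: float,
                     target: float = 0.6,
                     batch_size: int = 64,
                     interval: int = 4,
                     ada_kimg: int = 500) -> float:
    """One adjustment of p, applied once every `interval` minibatches.

    The step is sized so that consistent adjustments could move p from
    0 to 1 after ada_kimg * 1000 training images.
    """
    step = batch_size * interval / (ada_kimg * 1000)
    return max(p + sign(r_t - target) * step, 0.0)
```

Because p reacts to a measured signal rather than following a fixed schedule, the same training setup can serve both tiny and large datasets: with plenty of data r_t stays below the target and p remains near zero.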
The project promises:
- Full support for all primary training configurations.
- Extensive verification of image quality, training curves, and quality metrics against the TensorFlow version.
- Results are expected to match in all cases, excluding the effects of pseudo-random numbers and floating-point arithmetic.
With its speed and efficiency gains over the original TensorFlow implementation, StyleGAN2-ADA is a great open-source project to add to your portfolio.
[GitHub - Medium-Posts/stylegan2-ada-pytorch: StyleGAN2-ADA - Official PyTorch implementation](https://github.com/Medium-Posts/stylegan2-ada-pytorch)
Paper: Training Generative Adversarial Networks with Limited Data. Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine…
2. Open Source Natural Language Processing
Trankit

The fascinating world of NLP is not far behind when it comes to impressive open-source data science projects. Trankit is another popular project released last month.
Trankit is a lightweight Transformer-based Python toolkit for multilingual natural language processing. Its two main components are:
- A trainable pipeline for fundamental NLP tasks in over 100 languages
- 90 downloadable pretrained pipelines for 56 languages
Another impressive thing about Trankit is that it outperforms the current state-of-the-art multilingual toolkit, Stanza (StanfordNLP), in many tasks across 90 Universal Dependencies v2.5 treebanks of 56 different languages, while remaining efficient in memory usage and speed. This makes it usable by a larger audience.
[GitHub - Medium-Posts/trankit: Trankit is a Light-Weight Transformer-based Python Toolkit for…](https://github.com/Medium-Posts/trankit)
The Trankit technical paper won the Outstanding Demo Paper Award at EACL 2021.
EasyNMT — Easy to use, state-of-the-art Neural Machine Translation

With easy installation and usage, plus automatic download of pretrained machine translation models, EasyNMT will make your NLP portfolio stand out.
It supports translation between 150+ languages, automatic language detection for 170+ languages, and both sentence and document translation.
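Document translation, as we understand EasyNMT's approach, works by splitting long text into sentences, translating those, and stitching the results back together, because MT models are typically trained on sentence-length inputs. The sketch below illustrates only that general idea and is not EasyNMT's code: `translate_sentences` is a hypothetical stand-in for a real model call, and the regex splitter is a deliberately naive assumption.

```python
import re
from typing import Callable, List

# Hypothetical batch translator: a list of sentences in, translations out.
Translator = Callable[[List[str]], List[str]]


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def translate_document(text: str, translate_sentences: Translator) -> str:
    """Translate paragraph by paragraph, batching each paragraph's sentences."""
    translated_paragraphs = []
    for paragraph in text.split("\n"):
        sentences = split_sentences(paragraph)
        translated = translate_sentences(sentences) if sentences else []
        translated_paragraphs.append(" ".join(translated))
    # Re-join with newlines so the paragraph layout survives translation.
    return "\n".join(translated_paragraphs)
```

For a quick dry run without a model, a fake translator such as `lambda batch: [s.upper() for s in batch]` turns `"Hello there. How are you?\nFine!"` into `"HELLO THERE. HOW ARE YOU?\nFINE!"`.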
At present, the project provides the following models:
- Opus-MT from Helsinki-NLP
- mBART50_m2m from Facebook Research
- M2M_100 from Facebook Research
[GitHub - Medium-Posts/EasyNMT: Easy to use, state-of-the-art Neural Machine Translation for 100+…](https://github.com/Medium-Posts/EasyNMT)
3. Open Source Machine Learning
SeaLion
SeaLion is a brilliant machine learning project created to teach ML concepts more easily, using concise algorithms that still perform their tasks efficiently.

SeaLion is designed to teach today’s aspiring ML engineers popular machine learning concepts in a way that conveys both the intuition behind them and how to apply them.
It is beginner-friendly when it comes to tackling standard datasets such as Iris, Breast Cancer, Swiss Roll, Moons, and MNIST. SeaLion’s modules cover:
- Deep Neural Networks
- Regression
- Dimensionality Reduction
- Unsupervised Clustering
- Naive Bayes
- Trees
- Ensemble Learning
- Nearest Neighbors
- Utils
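In the same from-scratch teaching spirit, here is a minimal pure-Python k-nearest-neighbors classifier, one of the algorithm families listed above. It is our own illustration of the technique, not SeaLion's API; the class name, the toy data in the usage note, and `k=3` are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KNearestNeighbors:
    """Classify a point by majority vote among its k closest training points."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.points: List[Sequence[float]] = []
        self.labels: List[int] = []

    def fit(self, points: List[Sequence[float]],
            labels: List[int]) -> "KNearestNeighbors":
        # k-NN is a lazy learner: "training" just stores the data.
        self.points, self.labels = list(points), list(labels)
        return self

    def predict_one(self, query: Sequence[float]) -> int:
        # Rank training points by distance and vote among the k nearest.
        ranked = sorted(range(len(self.points)),
                        key=lambda i: euclidean(query, self.points[i]))
        votes = Counter(self.labels[i] for i in ranked[: self.k])
        return votes.most_common(1)[0][0]

    def predict(self, queries: List[Sequence[float]]) -> List[int]:
        return [self.predict_one(q) for q in queries]
```

With two well-separated toy clusters, `X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]` labelled `[0, 0, 0, 1, 1, 1]`, the call `KNearestNeighbors(k=3).fit(X, y).predict([[0.5, 0.5], [5.5, 5.5]])` returns `[0, 1]`.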
[GitHub - Medium-Posts/SeaLion: The first machine learning framework that encourages learning ML…](https://github.com/Medium-Posts/SeaLion)




