Data Science Mlxtend: Feature Selection Tutorial In this article, I present Mlxtend (machine learning extensions), a Python library of useful tools for day-to-day data science tasks. To showcase its strength, I use the library to

Data Science Silhouette Analysis vs Elbow Method vs Davies-Bouldin Index: Selecting the optimal number of clusters for KMeans clustering In this article, I compare three well-known techniques for validating the quality of clustering: the Davies-Bouldin Index, the Silhouette Score and the Elbow Method. All the aforementioned techniques are used

Data Science Implementation of K-means from scratch in Python (9 lines) Last week, I was asked to implement the K-Means clustering algorithm from scratch in Python as part of my MSc Data Science Degree Apprenticeship from the University of Exeter. In

Data Science How to create and add a conda environment as Jupyter Kernel? As a data scientist, I work daily with Jupyter Notebook/Jupyter Lab. One thing that I used to google every time I started a new project is how to

Data Science When (& why) to use log transformation in regression? As a data scientist working on regression problems, I have often faced datasets with right-skewed target distributions. By googling it, I found out that log transformation can help

Data Science Python vs Swift for data science: Python's days are numbered At the TensorFlow Developer Summit in March 2018, Swift for TensorFlow was announced as an open-source project on GitHub. Later, in March 2019, Jeremy Howard, founder of fastai, announced that

Data Science Top 10 Technical Machine Learning YouTube Channels to follow In this article, I will present my favorite top-10 Machine Learning YouTube Channels to follow in order to keep up with the current trends. The list of my favorite channels

Data Science Hypothesis Testing: Z-test & Student's t-test Today I am going to speak about Hypothesis Testing, which is frequently used by data scientists to: test a particular idea; construct an experiment to answer a particular question. ✏️ Table of

Data Science Pandas-Profiling: A useful EDA tool When loading a new data set, the first thing we do is to get an understanding of the data. This includes steps like determining the number of unique values, identifying

Data Science Linear Models Decoded - Part 1 In this article, I will present the linear models in terms of questions and answers that can be asked during an interview process. I will try to start from very

Data Science Tmux an essential tool for Data Scientists Many data scientists get stuck with the multitude of tools available to them. In this article, I will present the Tmux tool. What it does best is turn a single

NLP What is an N-gram Multichannel Convolutional Neural Network for Text Classification Deep neural networks have achieved remarkable results in some NLP tasks. One of them is text classification, i.e., assigning a set of pre-defined tags to a text based on

Data Science Importance of Cross-Validation Validation is probably one of the most important techniques that a data scientist uses, as there is always a need to validate the stability of the machine learning model: how well

NLP Sentiment Analysis on IMDB movie dataset - Achieve state of the art result using a simple Neural Network In my previous articles, I used two models to predict whether the movie reviews were positive or negative using the IMDB dataset. If you haven't read those articles I would

Data Science What is an Embedding Layer? A couple of months ago I asked myself the same question, so I thought of writing an article to summarize and document my understanding of an embedding layer. ✏️ Table

NLP Sentiment Analysis on IMDB movie dataset - Achieve state of the art result using Logistic Regression In my previous article, I used the Naive Bayes model to predict whether the movie reviews were positive or negative using the IMDB dataset. If you haven't read this article

Data Science Support Vector Machine vs Logistic Regression Support Vector Machine (SVM) is an algorithm used for classification problems, similar to Logistic Regression (LR). LR and SVM with a linear kernel generally perform comparably in practice. The goal of

NLP Sentiment Analysis on IMDB movie dataset - Achieve state of the art result using Naive Bayes NLP refers to any kind of modelling where we are working with natural language text. Sentiment Analysis is one of the most common NLP tasks that Data Scientists need

NLP NLP Tutorial: MultiLabel Classification Problem using Linear Models This article presents in detail how to predict tags for posts from StackOverflow using linear models after carefully preprocessing our text features. Table of Contents: Introduction, Dataset, Import Libraries and Load the data, Text

NLP Transforming tokens into useful features (BOW,TF-IDF) In my previous article, I presented different methods to preprocess text and extract useful tokens. However, these tokens are only useful if you can transform them into features for your

NLP All you need to know about NLP Text Preprocessing Text preprocessing is a severely overlooked topic, and a lot of NLP applications fail badly due to the use of the wrong kind of text preprocessing. With that in mind, I thought of

Data Science Random Forest regression model Advanced Topics (+ Python code snippet using Sklearn) In my previous article, I presented the Random Forest Regressor model. If you haven't read this article I would urge you to read it before continuing. In simple terms, a

Data Science Random Forest Regressor explained in depth In my previous article, I presented the Decision Tree Regressor algorithm. If you haven't read this article I would urge you to read it before continuing. The reason is that

Data Science Entity Embeddings of Categorical Variables in Neural Networks Categorical variables are known to hide and mask lots of interesting information in a data set and many times they might even be the most important variables in a model.

Data Science What are Recurrent Neural Networks (RNNs) and Gated Recurrent Units (GRUs)? Recurrent Neural Networks (RNNs) are popular models that have shown great promise on many sequential data tasks and are used by, among others, Apple's Siri and Google's Voice Search. Their great advantage