Skip to content

Docs #18

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Docs #18

Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 65 additions & 33 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,39 +1,54 @@
[![License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)
[![GitHub forks](https://img.shields.io/github/forks/tirthajyoti/Machine-Learning-with-Python.svg)](https://github.com/tirthajyoti/Machine-Learning-with-Python/network)
[![GitHub stars](https://img.shields.io/github/stars/tirthajyoti/Machine-Learning-with-Python.svg)](https://github.com/tirthajyoti/Machine-Learning-with-Python/stargazers)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/tirthajyoti/Machine-Learning-with-Python/pulls)

# Python Machine Learning Notebooks (Tutorial style)
# Python Machine Learning Jupyter Notebooks ([ML website](https://machine-learning-with-python.readthedocs.io/en/latest/))

### Dr. Tirthajyoti Sarkar, Fremont, California ([Please feel free to connect on LinkedIn here](https://www.linkedin.com/in/tirthajyoti-sarkar-2127aa7))

### Dr. Tirthajyoti Sarkar, Fremont, CA ([Please feel free to add me on LinkedIn here](https://www.linkedin.com/in/tirthajyoti-sarkar-2127aa7))
---

### Requirements
* **Python 3.5+**
* **NumPy (`$ pip install numpy`)**
* **Pandas (`$ pip install pandas`)**
* **Scikit-learn (`$ pip install scikit-learn`)**
* **SciPy (`$ pip install scipy`)**
* **Statsmodels (`$ pip install statsmodels`)**
* **MatplotLib (`$ pip install matplotlib`)**
* **Seaborn (`$ pip install seaborn`)**
* **Sympy (`$ pip install sympy`)**
## Also check out these super-useful Repos that I curated

[Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning](https://github.com/tirthajyoti/Papers-Literature-ML-DL-RL-AI)

[Carefully curated resource links for data science in one place](https://github.com/tirthajyoti/Data-science-best-resources)

## Requirements
* **Python 3.6+**
* **NumPy (`pip install numpy`)**
* **Pandas (`pip install pandas`)**
* **Scikit-learn (`pip install scikit-learn`)**
* **SciPy (`pip install scipy`)**
* **Statsmodels (`pip install statsmodels`)**
* **MatplotLib (`pip install matplotlib`)**
* **Seaborn (`pip install seaborn`)**
* **Sympy (`pip install sympy`)**
* **Flask (`pip install flask`)**
* **WTForms (`pip install wtforms`)**
* **Tensorflow (`pip install tensorflow>=1.15`)**
* **Keras (`pip install keras`)**
* **pdpipe (`pip install pdpipe`)**

---

You can start with this article that I wrote in Heartbeat magazine (on Medium platform):
### ["Some Essential Hacks and Tricks for Machine Learning with Python"](https://heartbeat.fritz.ai/some-essential-hacks-and-tricks-for-machine-learning-with-python-5478bc6593f2)
<img src="https://cookieegroup.com/wp-content/uploads/2018/10/2-1.png" width="450" height="300"/>

## Essential tutorial-type notebooks on Pandas and Numpy
Jupyter notebooks covering a wide range of functions and operations on the topics of NumPy, Pandans, Seaborn, matplotlib etc.

### [Basic Numpy operations](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Basics%20of%20Numpy%20arrays.ipynb)
### [Basic Pandas operations](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Basics%20of%20Pandas%20DataFrame.ipynb)
### [Basics of visualization with Matplotlib and descriptive stats](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Basics%20of%20Matplotlib%20and%20Descriptive%20Statistics.ipynb)
### [Advanced Pandas operations](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Advanced%20Pandas%20Operations.ipynb)
### [How to read various data sources](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Read_data_various_sources/How%20to%20read%20various%20sources%20in%20a%20DataFrame.ipynb)
### [PDF reading and table processing demo](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Read_data_various_sources/PDF%20table%20reading%20and%20processing%20demo.ipynb)
### [How fast are Numpy operations compared to pure Python code?](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/How%20fast%20are%20NumPy%20ops.ipynb) (Read my [article](https://towardsdatascience.com/why-you-should-forget-for-loop-for-data-science-code-and-embrace-vectorization-696632622d5f) on Medium related to this topic)
### [Fast reading from Numpy using .npy file format](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Numpy_Reading.ipynb) (Read my [article](https://towardsdatascience.com/why-you-should-start-using-npy-file-more-often-df2a13cc0161) on Medium on this topic)
Jupyter notebooks covering a wide range of functions and operations on the topics of NumPy, Pandans, Seaborn, Matplotlib etc.

* [Detailed Numpy operations](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Numpy_operations.ipynb)
* [Detailed Pandas operations](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Pandas_Operations.ipynb)
* [Numpy and Pandas quick basics](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Numpy_Pandas_Quick.ipynb)
* [Matplotlib and Seaborn quick basics](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Matplotlib_Seaborn_basics.ipynb)
* [Advanced Pandas operations](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Advanced%20Pandas%20Operations.ipynb)
* [How to read various data sources](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Read_data_various_sources/How%20to%20read%20various%20sources%20in%20a%20DataFrame.ipynb)
* [PDF reading and table processing demo](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Read_data_various_sources/PDF%20table%20reading%20and%20processing%20demo.ipynb)
* [How fast are Numpy operations compared to pure Python code?](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/How%20fast%20are%20NumPy%20ops.ipynb) (Read my [article](https://towardsdatascience.com/why-you-should-forget-for-loop-for-data-science-code-and-embrace-vectorization-696632622d5f) on Medium related to this topic)
* [Fast reading from Numpy using .npy file format](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Numpy_Reading.ipynb) (Read my [article](https://towardsdatascience.com/why-you-should-start-using-npy-file-more-often-df2a13cc0161) on Medium on this topic)

## Tutorial-type notebooks covering regression, classification, clustering, dimensionality reduction, and some basic neural network algorithms

Expand All @@ -47,34 +62,44 @@ Jupyter notebooks covering a wide range of functions and operations on the topic
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/L1_and_L2_balls.svg/300px-L1_and_L2_balls.svg.png"/>

* Polynomial regression using ***scikit-learn pipeline feature*** ([check the article I wrote on *Towards Data Science*](https://towardsdatascience.com/machine-learning-with-python-easy-and-robust-method-to-fit-nonlinear-data-19e8a1ddbd49))
* Decision trees and Random Forest regression (showing how the Random Forest works as a robust/regularized meta-estimator rejecting overfitting)

* [Decision trees and Random Forest regression](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Regression/Random_Forest_Regression.ipynb) (showing how the Random Forest works as a robust/regularized meta-estimator rejecting overfitting)

* [Detailed visual analytics and goodness-of-fit diagnostic tests for a linear regression problem](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Regression/Regression_Diagnostics.ipynb)

* [Robust linear regression using `HuberRegressor` from Scikit-learn](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Regression/Robust%20Linear%20Regression.ipynb)

-----

### Classification
* Logistic regression/classification
* Logistic regression/classification ([Here is the Notebook](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Classification/Logistic_Regression_Classification.ipynb))
<img src="https://qph.fs.quoracdn.net/main-qimg-914b29e777e78b44b67246b66a4d6d71"/>

* _k_-nearest neighbor classification
* Decision trees and Random Forest Classification
* Support vector machine classification (**[check the article I wrote in Towards Data Science on SVM and sorting algorithm](https://towardsdatascience.com/how-the-good-old-sorting-algorithm-helps-a-great-machine-learning-technique-9e744020254b))**
* _k_-nearest neighbor classification ([Here is the Notebook](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Classification/KNN_Classification.ipynb))

* Decision trees and Random Forest Classification ([Here is the Notebook](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Classification/DecisionTrees_RandomForest_Classification.ipynb))

* Support vector machine classification ([Here is the Notebook](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Classification/Support_Vector_Machine_Classification.ipynb)) (**[check the article I wrote in Towards Data Science on SVM and sorting algorithm](https://towardsdatascience.com/how-the-good-old-sorting-algorithm-helps-a-great-machine-learning-technique-9e744020254b))**

<img src="https://docs.opencv.org/2.4/_images/optimal-hyperplane.png"/>

* Naive Bayes classification
* Naive Bayes classification ([Here is the Notebook](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Classification/Naive_Bayes_Classification.ipynb))

---

### Clustering
<img src="https://i.ytimg.com/vi/IJt62uaZR-M/maxresdefault.jpg" width="450" height="300"/>

* _K_-means clustering
* Affinity propagation (showing its time complexity and the effect of damping factor)
* Mean-shift technique (showing its time complexity and the effect of noise on cluster discovery)
* DBSCAN (showing how it can generically detect areas of high density irrespective of cluster shapes, which the k-means fails to do)
* Hierarchical clustering with Dendograms showing how to choose optimal number of clusters
* _K_-means clustering ([Here is the Notebook](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Clustering-Dimensionality-Reduction/K_Means_Clustering_Practice.ipynb))

* Affinity propagation (showing its time complexity and the effect of damping factor) ([Here is the Notebook](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Clustering-Dimensionality-Reduction/Affinity_Propagation.ipynb))

* Mean-shift technique (showing its time complexity and the effect of noise on cluster discovery) ([Here is the Notebook](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Clustering-Dimensionality-Reduction/Mean_Shift_Clustering.ipynb))

* DBSCAN (showing how it can generically detect areas of high density irrespective of cluster shapes, which the k-means fails to do) ([Here is the Notebook](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Clustering-Dimensionality-Reduction/DBScan_Clustering.ipynb))

* Hierarchical clustering with Dendograms showing how to choose optimal number of clusters ([Here is the Notebook](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Clustering-Dimensionality-Reduction/Hierarchical_Clustering.ipynb))

<img src="https://www.researchgate.net/profile/Carsten_Walther/publication/273456906/figure/fig3/AS:294866065084419@1447312956501/Example-of-hierarchical-clustering-clusters-are-consecutively-merged-with-the-most.png" width="700" height="400"/>

---
Expand All @@ -96,8 +121,12 @@ Jupyter notebooks covering a wide range of functions and operations on the topic
* How to use [Sympy package](https://www.sympy.org/en/index.html) to generate random datasets using symbolic mathematical expressions.

* Here is my article on Medium on this topic: [Random regression and classification problem generation with symbolic expression](https://towardsdatascience.com/random-regression-and-classification-problem-generation-with-symbolic-expression-a4e190e37b8d)

---

### Synthetic data generation techniques
* [Notebooks here](https://github.com/tirthajyoti/Machine-Learning-with-Python/tree/master/Synthetic_data_generation)

### Simple deployment examples (serving ML models on web API)
* [Serving a linear regression model through a simple HTTP server interface](https://github.com/tirthajyoti/Machine-Learning-with-Python/tree/master/Deployment/Linear_regression). User needs to request predictions by executing a Python script. Uses `Flask` and `Gunicorn`.

Expand All @@ -113,3 +142,6 @@ See my articles on Medium on this topic.
* [Object-oriented programming for data scientists: Build your ML estimator](https://towardsdatascience.com/object-oriented-programming-for-data-scientists-build-your-ml-estimator-7da416751f64)
* [How a simple mix of object-oriented programming can sharpen your deep learning prototype](https://towardsdatascience.com/how-a-simple-mix-of-object-oriented-programming-can-sharpen-your-deep-learning-prototype-19893bd969bd)

---
### Unit testing ML code with Pytest
Check the files and detailed instructions in the [Pytest](https://github.com/tirthajyoti/Machine-Learning-with-Python/tree/master/Pytest) directory to understand how one should write unit testing code/module for machine learning models
Loading