DURATION: 5 Days (40 hours).
Time Division (Breaks: 15 + 45 + 15 minutes).
Course Outcomes:
- Understand the fundamentals of Data Science and Machine Learning.
- Analyse and pre-process data proficiently using Python.
- Apply Supervised Machine Learning techniques for regression and classification.
- Apply Unsupervised Machine Learning for clustering and natural language processing.
- Get introduced to Deep Learning concepts.
Important Note:
- Courseware: reference material/PPT along with lab files/exercises will be provided.
Module 1: Introduction to Data Science & Machine Learning
- Need for Data Science and Machine Learning.
- Types of Analytics.
- Lifecycle of a Data Science project.
- Skills for a Data Scientist role.
- Types of Machine Learning.
Module 2: Python for Data Analysis & Pre-processing
Introduction to Python
- Python Libraries – NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, TensorFlow, Keras, PyTorch.
- Exploratory Data Analysis (EDA).
- Data Cleaning Techniques, Handling Missing Data, Handling Categorical Data.
- Introduction to EDA, 2D Scatter-plot, 3D Scatter-plot, Pair plots.
- Univariate, Bivariate, and Multivariate Analysis, Box-plot.
Data Pre-Processing
- Need for Data Pre-Processing.
- Handling Missing Values.
- Label-Encoding for Categorical Data.
- One-Hot Encoding for Categorical Data.
Data Transformation
- Need for Data Transformation.
- Concept of Data Normalization.
- Data Normalization Techniques – StandardScaler & MinMaxScaler.
- Train, Validation & Test Split of Data.
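Illustrative sketch for the normalization topics above: the two rescaling formulas written with only the Python standard library. The function names and the sample values are hypothetical and chosen for illustration. In the labs the scikit-learn scalers would normally be used instead. The population standard deviation is used here because that is what scikit-learn's StandardScaler computes.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: nothing to spread out.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

def standard_scale(values):
    """Shift values to mean 0 and scale to unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

ages = [20, 30, 40, 50, 60]  # made-up sample feature
scaled_minmax = min_max_scale(ages)
scaled_standard = standard_scale(ages)
print(scaled_minmax)
print(scaled_standard)
```

In practice the minimum, maximum, mean and standard deviation should be computed on the training split only and then reused on the validation and test splits, so that no information leaks from held-out data.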
Module 3: Supervised Machine Learning – Regression
Simple Linear Regression
- Concept of Linear Regression.
- Ordinary Least Squares and Regression Errors.
- Data Processing & Training and Testing of the Model.
- Model Evaluation Parameters like R-squared, Score, RMSE and their Interpretations.
- Prediction Plot & its Interpretation.
- Hands-on Problem.
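Illustrative sketch for the Simple Linear Regression topics above: an ordinary least squares fit of one feature, followed by R-squared and RMSE, using only the standard library. The helper names and the small dataset are hypothetical. The hands-on session may use scikit-learn instead.

```python
from math import sqrt
from statistics import mean

def fit_simple_ols(x, y):
    """Return (slope, intercept) minimising the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def r_squared(y, y_pred):
    """Share of the variance in y explained by the predictions."""
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def rmse(y, y_pred):
    """Root mean squared error, in the same units as y."""
    return sqrt(mean([(yi - pi) ** 2 for yi, pi in zip(y, y_pred)]))

hours = [1, 2, 3, 4, 5]    # made-up feature values
scores = [2, 4, 5, 4, 5]   # made-up target values

slope, intercept = fit_simple_ols(hours, scores)
predictions = [intercept + slope * h for h in hours]
print(slope, intercept)
print(r_squared(scores, predictions), rmse(scores, predictions))
```

For brevity the metrics are computed on the same data used for fitting. A proper evaluation would fit on a training split and report R-squared and RMSE on a held-out test split.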
Multiple Linear Regression
- Concept of Multiple Linear Regression.
- Degrees of Freedom.
- Adjusted R-Squared.
- Assumptions of Multiple Linear Regression – Linearity, Multicollinearity, Autocorrelation, Endogeneity, Normality of Residuals, Homoscedasticity, etc.
- Concept of time-lag data in Autocorrelation.
- Concept of Dummy variable trap.
- Hands-on Problem.
Module 4: Supervised Machine Learning – Classification
Logistic Regression
- Concept of Logistic Regression.
- Concept of Stratification.
- Concept of Confusion Matrix.
- Hands-on Problem.
Support Vector Machine (SVM)
- Common-sense Intuition of SVM.
- Mathematical Intuition of SVM.
- Different types of SVM Kernel Functions.
- Hands-on Problem (Preferred: Iris Classification Problem).
Decision Tree Classifier
- Decision Tree Classifier.
- Optimal Model Selection Criterion in Decision Tree.
- Hands-on Problem.
Random Forest Classifier
- Ensemble Learning and Random Forests.
- Bagging and Boosting.
- Hands-on Problem.
Evaluation Metrics for Classification Models
- Need for Evaluation and Accuracy Paradox.
- Different Measures for Classification Models – Accuracy, Precision, Recall, F1 Score, etc.
- Threshold and Adjusting Thresholds.
- AUC ROC Curve.
- Hands-on Problem.
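Illustrative sketch for the evaluation topics above: confusion matrix counts, the derived accuracy, precision, recall and F1 score, and the effect of moving the decision threshold. It uses only the standard library. The function names, labels and probabilities are hypothetical values chosen for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def apply_threshold(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                   # made-up labels
probs = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7, 0.8, 0.1]    # made-up model scores

default_metrics = classification_metrics(y_true, apply_threshold(probs, 0.5))
lenient_metrics = classification_metrics(y_true, apply_threshold(probs, 0.25))
print(default_metrics)
print(lenient_metrics)
```

Lowering the threshold in this toy example raises recall and lowers precision, which is the trade-off that the ROC curve summarises across all thresholds.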
Module 5: Feature Selection and Dimensionality Reduction
Univariate Feature Selection
- Feature Selection Importance.
- Concept of Univariate Feature Selection.
- F-Test for Regression and Classification.
- Hands-on F-test (p-value analysis).
- Chi-Squared for Classification.
- Feature Selection Techniques – SelectKBest, SelectPercentile & GenericUnivariateSelect.
- Hands-on Chi-squared (p-value analysis).
Recursive Feature Elimination (RFE)
- Concept of Recursive Feature Elimination (RFE).
- Feature Importance Score/Feature Ranking.
- Hands-on RFE.
Principal Component Analysis (PCA)
- Need to reduce dimensions and Importance of PCA.
- Mathematical Intuition of PCA & Steps to calculate PCA.
- Hands-on PCA (Model Comparisons with PCA & without PCA recommended).
Module 6: Cross Validation & Hyperparameter Tuning
Cross Validation
- Importance of Cross Validation.
- Parameters & Implementation of Cross Validation.
- Hands-on Problem (Drawing inference from results).
Hyperparameter Tuning
- Concept of Hyperparameter Tuning.
- Grid Search & Randomized Search.
- Hands-on GridSearchCV (analyse results).
Module 7: Unsupervised Machine Learning – Natural Language Processing
- Introduction to NLP.
- Basic Concepts of NLP: Tokenization, stop words, Stemming, Lemmatization, etc.
- TF-IDF Vectorization and its mathematical intuition.
- Recommendation system example.
Module 8: Unsupervised Machine Learning – Clustering
- Introduction to Clustering.
- Mathematical intuition behind cluster formation.
- Elbow method & its mathematical intuition.
- K-means Clustering Implementation (numerical).
- K-means Clustering Implementation (natural language processing).
Module 9: Introduction to Deep Learning
- Need & Applications of Deep Learning.
- Working of Artificial Neural Network.
- Backend (TensorFlow) & Frontend (Keras).
- Concept of Tensor.
- Keras Model Building Overview – Construct, Compile & Evaluate.
- Activation Function.
- Loss Functions.
- Optimization Techniques.
- Evaluation metrics for Deep Learning.