Data Science, R, Mahout Training - Combo Course - iPartner

41 hr / 626*
(* including all taxes.)

Course Agenda

  • Data Science Overview
  • Reasons to use Data Science
  • Project Lifecycle
  • Data Acquisition
  • Evaluation of Input Data
  • Transforming Data
  • Statistical and analytical methods to work with data
  • Machine Learning basics
  • Introduction to Recommender systems
  • Apache Mahout Overview
  • What is Data Science?
  • What Kinds of Problems Can You Solve?
  • Data Science Project Life Cycle
  • Data Science: Basic Principles
  • Data Acquisition
  • Data Collection
  • Understanding Data: Attributes in Data, Different Types of Variables
  • Building the Variable Type Hierarchy
  • Two Dimensional Problem
  • Correlation between Variables: Explained Using a Paint Tool
  • Outliers, Outlier Treatment
  • Boxplot, How to Draw a Boxplot
  • Discussion on Boxplot- also Explain
  • Example to understand variable Distributions
  • What Is a Percentile? Example Using RStudio
  • How do we identify outliers?
  • How do we handle outliers?
  • Outlier Treatment: General Capping/Flooring Method
  • Distributions: What Is the Normal Distribution?
  • Why Is the Normal Distribution So Popular?
  • Uniform Distribution
  • Skewed Distribution
  • Transformation
  • Discussion about Boxplot and Outlier
  • Goal: Increase Profits of a Store
  • Areas for Increasing Efficiency
  • Data Request
  • Business Problem: To maximize shop Profits
  • What Are Interlinked Variables?
  • What Is a Strategy?
  • Interaction between Variables
  • Univariate analysis
  • Multivariate analysis
  • Bivariate analysis
  • Relationship between Variables
  • Standardizing Variables
  • What Is a Hypothesis?
  • Interpret the Correlation
  • Negative Correlation
  • Machine Learning
  • Correlation between Nominal Variables
  • Contingency Table
  • What is Expected Value?
  • What is Mean?
  • How Expected Value Differs from the Mean
  • Experiments: Controlled and Uncontrolled
  • Degrees of Freedom
  • Dependency between Nominal and Continuous Variables
  • Linear Regression
  • Extrapolation and Interpolation
  • Univariate Analysis for Linear Regression
  • Building Model for Linear Regression
  • What Does a Pattern in Data Mean?
  • Data Processing Operation
  • What is sampling?
  • Sampling Distribution
  • Stratified Sampling Technique
  • Disproportionate Sampling Technique
  • Balanced Allocation (Part of Disproportionate Sampling)
  • Systematic Sampling
  • Cluster Sampling
  • Two Angles of Data Science: Statistical Learning and Machine Learning
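The outlier topics above (percentiles, boxplots, capping/flooring) can be illustrated with a minimal sketch. It is written in Python rather than R, assumes the common boxplot convention of fences at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, and uses hypothetical function names and made-up sales figures; some courses cap at fixed percentiles (for example the 1st and 99th) instead, which only changes how the bounds are chosen.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return the (lower, upper) boxplot fences: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" interpolates like R's default quantile() (type 7).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values lying outside the boxplot fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]

def cap_and_floor(values, k=1.5):
    """Clamp values to the fences: floor low outliers, cap high ones."""
    lower, upper = iqr_fences(values, k)
    return [min(max(v, lower), upper) for v in values]

if __name__ == "__main__":
    sales = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 95]  # hypothetical daily sales
    print(iqr_fences(sales))     # (8.5, 16.5)
    print(find_outliers(sales))  # [95]
    print(cap_and_floor(sales))  # 95 is capped to 16.5
```

Capping keeps every observation in the dataset while limiting the pull an extreme value has on the mean and on a regression fit, which is why it is often preferred over simply deleting outliers.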

  • Multivariable Analysis
  • Linear Regression
  • Simple Linear Regression
  • Hypothesis testing
  • Speculation vs. Claim (Query)
  • Sample
  • Steps to Test Your Hypothesis
  • Performance Measure
  • Generate the Null Hypothesis
  • Alternative Hypothesis
  • Testing the hypothesis
  • Threshold value
  • Hypothesis testing explanation by example
  • Null Hypothesis
  • Alternative Hypothesis
  • Probability
  • Histogram of mean value
  • Revisiting the Chi-Square Independence Test
  • Correlation between Nominal Variables
  • Machine Learning
  • Importance of Algorithms
  • Supervised and Unsupervised Learning
  • Various Algorithms on Business
  • Simple approaches to Prediction
  • Prediction Algorithms
  • Population data
  • Sampling
  • Disproportionate Sampling
  • Steps in Model Building
  • Sample the data
  • What is K?
  • Training Data
  • Test Data
  • Validation data
  • Model Building
  • Find the accuracy
  • Rules
  • Iteration
  • Deploy the model
  • Linear regression
  • Clustering
  • Cluster and Clustering with Example
  • Data Points, Grouping Data Points
  • Manual Profiling
  • Horizontal & Vertical Slicing
  • Clustering Algorithm
  • Criteria to Consider Before Clustering
  • Graphical Example
  • Clustering & Classification: Exclusive Clustering, Overlapping Clustering, Hierarchical Clustering
  • Simple Approaches to Prediction
  • Different Types of Distances: 1. Manhattan, 2. Euclidean, 3. Cosine Similarity
  • Clustering Algorithm in Mahout
  • Probabilistic Clustering
  • Pattern Learning
  • Nearest Neighbor Prediction
  • Nearest Neighbor Analysis
  • R introduction
  • How R is typically used
  • Features of R
  • Introduction to Big data
  • R+Hadoop
  • Ways to Connect R and Hadoop
  • Products
  • Case Study
  • Architecture
  • Steps for Installing RIMPALA
  • How to create IMPALA packages
  • Classification and Recommendation
  • Clustering in Mahout
  • Pattern Mining
  • Understanding Machine Learning
  • Using Model diagram to decide the approach
  • Data flow
  • Supervised and Unsupervised learning
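The distance measures listed in this module (Manhattan, Euclidean, cosine similarity) and nearest-neighbor prediction can be sketched in a few lines. This is plain Python, not Mahout's or R's API; the function names and the customer data in the usage example are hypothetical, and the predictor is the simplest 1-nearest-neighbor variant.

```python
import math

def _check_lengths(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    _check_lengths(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    _check_lengths(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal."""
    _check_lengths(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def nearest_neighbor_predict(train, query, distance=euclidean):
    """1-nearest-neighbor: return the label of the training point closest to query.

    train is a list of (features, label) pairs.
    """
    _features, label = min(train, key=lambda pair: distance(pair[0], query))
    return label

if __name__ == "__main__":
    print(manhattan((0, 0), (3, 4)))  # 7
    print(euclidean((0, 0), (3, 4)))  # 5.0
    # Hypothetical customers: (age, monthly spend) -> segment.
    customers = [((25, 40), "budget"), ((45, 120), "premium"), ((30, 50), "budget")]
    print(nearest_neighbor_predict(customers, (42, 110)))  # premium
```

Passing the distance function as a parameter makes it easy to compare how the choice of measure changes which neighbor is found; cosine similarity ignores vector length, so it is usually turned into a distance (for example 1 - similarity) before being used this way.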

  • Concept of Recommendation
  • Recommendations by E-commerce site
  • Comparison of User-Based and Item-Based Recommendations
  • Defining Recommenders and Classifiers
  • Process of Collaborative Filtering
  • Explaining the Pearson Correlation Coefficient
  • Euclidean distance measure
  • Implementing a Recommender Using MapReduce
  • What Is Statistics?
  • How Is It Useful?
  • Who Is This Course For?
  • Converting Data into Useful Information
    • Collecting the Data
    • Understanding the Data
    • Finding Useful Information in the Data
    • Interpreting the Data
    • Visualizing the Data
  • Descriptive statistics
  • Understanding Some Statistical Terms
  • Variable
  • Dot Plots
  • Histogram
  • Stemplots
  • Box and whisker plots
  • Outlier Detection from Box-and-Whisker Plots
  • What Is Probability?
  • Sets and Rules of Probability
  • Bayes' Theorem
  • Probability Distributions
  • Few Examples
  • Student's t-Distribution
  • Sampling Distribution
  • Poisson Distribution
  • Stratified Sampling
  • Proportionate Sampling
  • Systematic Sampling
  • P-Value
  • Cross Tables
  • Bivariate Analysis
  • Multivariate Analysis
  • Dependence and Independence Tests (Chi-Square)
  • Analysis of Variance
  • Correlation between Nominal variables
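Two calculations from this part of the course, the Pearson coefficient used to compare users in collaborative filtering and the chi-square independence test on a contingency table, can be sketched as follows. This is an illustrative Python version, not the course's R code; the function names, the ratings, and the promotion/purchase counts are hypothetical. The 3.841 threshold is the standard chi-square critical value for 1 degree of freedom at the 5% significance level and applies only to 2 × 2 tables.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)

def chi_square_statistic(table):
    """Chi-square independence statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

if __name__ == "__main__":
    # Hypothetical ratings two users gave the same five items.
    alice = [5, 3, 4, 4, 1]
    bob = [4, 2, 4, 5, 1]
    print(round(pearson(alice, bob), 3))  # 0.863
    # Rows: saw promotion / did not; columns: bought / did not buy.
    stat, dof = chi_square_statistic([[30, 10], [20, 40]])
    print(round(stat, 2), dof)  # 16.67 1
    print("dependent" if stat > 3.841 else "no evidence of dependence")
```

A Pearson value near 1 means the two users rate items in a similar pattern, which is why a user-based recommender treats them as neighbors; the chi-square statistic grows as the observed counts drift away from the counts expected if the two nominal variables were independent.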

Learn & Get

  • Learn the concept of Logistic Regression
  • Master Vector Creation and Assigning Values to Variables
  • Generate Repeats and Factor levels
  • Explore steps to install IMPALA
  • Get familiar with statistics concepts
  • Learn the rules of Probability and Bayes' Theorem

