Apache Spark - iPartner

Apache Spark, Scala, Storm Training

8/12 weeks / 611*
(* including all taxes.)

Course Agenda

  • Overview of Big Data
  • Characteristics of Big Data
  • Types of data
  • Sources of Big Data
  • Big Data examples
  • Scaling
  • Hadoop batch processing
  • Hadoop ecosystem
  • What is streaming data?
  • Batch vs streaming data processing
  • Real time analytics options
  • MapReduce limitations and the motivation for Spark
  • What is Spark?
  • Features
  • Spark unified platform
  • Spark in Hadoop ecosystem
  • Why in-memory processing?
  • TeraSort benchmark win
  • Most active project in Apache
  • Spark survey
  • Industries using Spark
  • Popular use cases across the industry
  • Spark components - Driver
  • Executor
  • Worker
  • Spark master
  • Significance of Spark context
  • Resilient distributed datasets
  • Properties of RDD
  • Creating RDDs
  • Transformations in RDD
  • Actions in RDD
  • Saving data through RDD
  • Key-value pair RDD
  • Installing Spark locally (Live)
  • Invoking Spark shell
  • Loading a file in shell
  • Hands-on word count program
  • Performing some basic operations on files in Spark shell
  • Spark application overview
  • Job scheduling process
  • DAG scheduler
  • RDD graph and lineage
  • Narrow and wide dependencies
  • Life cycle of a Spark application

  • RDD lineage
  • Caching overview
  • Caching and persistence
  • Data locality
  • How to choose between the different persistence levels for caching RDDs
  • Spark memory allocation
  • Broadcast variables
  • Accumulators
  • Word count example: explanation and development in 3 APIs
  • Code walk-through on translating Spark transformations to equivalent Java transformations
  • Spark packages
  • IDE integration
  • Building project with SBT
  • Building project with Maven
  • Running the application in cluster
  • Submit in cluster mode
  • Web UI - application monitoring
  • Log files
  • Important Spark configuration properties
  • Spark application execution on a cluster
  • Scheduling process
  • How a Spark application breaks down into jobs -> stages -> tasks
  • Cluster managers: Local mode
  • Standalone scheduler
  • YARN
  • Mesos
  • Serialization in Spark
  • How to implement custom input format
  • Partition transformations
  • Storing data in database

  • Mentees can select a project from a predefined set of iPartner projects or come up with their own project ideas
  • Best practices and common mistakes
  • Optimization techniques
  • General troubleshooting
  • Memory (RAM) management
  • Spark streaming overview and architecture
  • Example: Streaming word count demo
  • DStreams
  • Breakdown of DStreams to RDD batches
  • Spark streaming example program demo and code walk-through
  • Walkthrough of various Spark streaming sources
  • Custom receivers
  • Sliding window operations on DStreams
  • Streaming UI overview
  • Checkpointing
  • Multiple receivers and the Union transformation
  • Spark SQL overview
  • Spark SQL demo
  • Comparison of Apache Hive vs Spark SQL
  • SchemaRDD and DataFrames
  • Integration with Spark streaming
  • Spark SQL example program demo and code walk-through
  • Demo on tools learnt in the session
  • Overview of Spark MLlib basics
  • Walkthrough of various algorithms and examples
  • Overview of Spark GraphX
  • Mentees can select a project from a predefined set of iPartner projects or come up with their own project ideas
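
The hands-on word count program in the agenda follows a split, pair, and sum-by-key pattern. As a minimal sketch in plain Python (this is not Spark code; the function name `word_count` and the sample lines are illustrative assumptions), the three steps correspond to what Spark's `flatMap`, `map`, and `reduceByKey` transformations do on an RDD:

```python
from collections import defaultdict


def word_count(lines):
    """Count word occurrences across an iterable of text lines."""
    # "flatMap" step: split every line into individual lowercase words.
    words = (word for line in lines for word in line.lower().split())
    # "map" step: pair each word with an initial count of 1.
    pairs = ((word, 1) for word in words)
    # "reduceByKey" step: sum the counts that share the same word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample input, standing in for a file loaded in the shell.
    sample = ["spark makes big data simple", "big data needs spark"]
    for word, n in sorted(word_count(sample).items()):
        print(word, n)
```

In Spark itself the same steps run in parallel across partitions, which is why the sum-by-key step involves a shuffle (a wide dependency) while the split and pair steps do not.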

Learn & Get

  • Understand basic distributed concepts and Storm Architecture
  • Learn Hadoop distributed computing, Big Data features, and the legacy architecture of real-time systems
  • Know the Logic Dynamics, Components and Topology in Storm
  • Understand the difference between Apache Spark and Hadoop
  • Learn Scala and its programming implementation
  • Implement Spark on a cluster
  • Write Spark Applications using Python, Java and Scala
  • Get deep insights into the functioning of Scala
  • Implement Trident Spouts and understand Trident Filter, Function and Aggregator
  • Learn Twitter bootstrapping
  • Work on minor and major projects, applying Scala programming techniques to Spark applications

Payment Method

Payment is made through PayPal. We accept both debit and credit cards.
We subsidize our fees by 10% for military personnel and for college students with exceptional records. To apply for a scholarship, email
In our iPartner self-paced training program, you will receive training assessments, recorded sessions, course materials, quizzes, related software, and assignments. The courses are designed to give you real-world exposure and a solid understanding of every concept, so that you get the most from the online training experience and can apply the information and skills in the workplace. After successfully completing your training program, you can take quizzes to check your level of knowledge. They also help you clear the relevant certification with higher marks and work on the technologies independently.
In self-paced courses, learners conduct hands-on exercises and produce learning deliverables entirely on their own, at any convenient time, without a facilitator. In online training courses, a facilitator is available to answer queries at a specific time dedicated to learning. During self-paced learning, you learn more effectively when you interact with the content presented, and review questions and quizzes that strengthen key concepts are a great way to do this. If you face any unexpected challenges while learning, we will arrange a live class with our trainer.
All courses from iPartner are highly interactive, giving learners good exposure and a real-time experience. You can choose to learn at times when there are no distractions, which leads to effective learning. Self-paced training costs 75% less than online training. You get lifetime access, so you can refer to the material anytime during your project work or job.
Yes, you can see sample videos at the top of the course details page.
As soon as you enroll in the course, your LMS (Learning Management System) access will be active. You will immediately get access to our course content in the form of a complete set of previous class recordings, PPTs, PDFs, and assignments, as well as access to our 24/7 support team. You can start learning right away.
You get 24/7 access to video tutorials and email support, along with online interactive session support from a trainer for resolving issues.
Yes, you can pay the difference between the online training and self-paced course fees and be enrolled in the next online training batch.
Please send an email. You can also join our live chat for an instant solution.
We will provide download links for open-source software; for proprietary tools, we will provide a trial version if available.
You will have to work on a training project towards the end of the course. This will help you understand how the different components of the course relate to each other.
Classes are conducted via live video streaming, where you can interact with the instructor by speaking, chatting, and sharing your screen. You will always have access to the videos and PPTs. This gives you a clear insight into how the classes are conducted, the quality of the instructors, and the level of interaction in the class.
Yes, we regularly launch offers to suit your needs. Please email us at: and we will get back to you with exciting offers.
We will help you with any issues and doubts regarding the course. You can attempt the quiz again.
Sure! Your feedback is greatly appreciated. Please connect with us through email support -