Big Data Analytics using Spark

Industries such as banking, finance and logistics process and analyse huge amounts of data every day. It is important to know how this is done using modern technologies involving Big Data and Spark. This course will take you through how to deal with huge volumes of data and perform analytics and machine learning on them.

Participants should know the basics of Python programming in order to take up this session.

It's time to level up!

What you'll learn

Big Data has evolved over the last few years and has become mainstream for many large organizations. This course covers the fundamentals of Big Data using Spark. Spark is a “fast cluster computing framework” for Big Data processing. It lets you run programs and operations up to 100x faster when data is processed in memory. You will be exposed to various libraries in PySpark for data processing and machine learning. You will have a chance to work with various datasets through guided hands-on training. At the end of this course, you will have gained an in-depth understanding of PySpark and its application to general Big Data analysis. This workshop will be conducted using Databricks, a tool used today to run big data workloads on Spark.

About this course

Big Data is at the forefront of digital transformation today. This course will help you take the next steps towards learning and understanding the technologies, algorithms and programming constructs involved in big data analysis and machine learning on big data.

Course Plan

The participants will be introduced to the basics of Big Data as well as the various concepts and different frameworks for processing Big Data.

The participants will be exposed to the basics of Spark with Python, including working with data and functional programming.

The participants will be exposed to the backbone of Spark, the resilient distributed dataset (RDD). We will learn how RDDs are created and executed, and how to apply various transformations and actions (map, reduce and collect, among others) to them.
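
The difference between transformations and actions can be previewed in plain Python before touching Spark. The sketch below is only an analogy and does not use the PySpark API: the built-in `map` and `filter` stand in for lazy transformations, while `list` and `functools.reduce` stand in for actions such as collect and reduce. The `readings` list and all variable names are made up for illustration.

```python
from functools import reduce

# Plain-Python analogy only; this does not use Spark or the PySpark API.
# The sample data below is hypothetical.
readings = [3, 8, 1, 5]

# "Transformations": map and filter return lazy iterators,
# so nothing has been computed at this point.
doubled = map(lambda x: x * 2, readings)
large = filter(lambda x: x > 5, doubled)

# "Action": materialising the results, comparable in spirit to collect.
collected = list(large)

# "Action": folding the values into a single result, comparable in spirit to reduce.
total = reduce(lambda a, b: a + b, collected)

print(collected)
print(total)
```

In Spark the same idea applies at cluster scale: transformations describe what should happen to the data, and the work is carried out only when an action asks for a result.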

Structured data processing is important when profiling and understanding data. Spark provides an elegant way to do this with Spark SQL. The participants will be exposed to dataframes, the distributed SQL query engine, various operations using Spark SQL and data visualization using PySpark.

Participants will be exposed to various machine learning methods and algorithms and will work with different datasets to perform regression, clustering and classification, among other operations. Participants will also learn to work through the process of model training and evaluation.

Participants will be exposed to real-time streaming data and to how Spark can be leveraged to handle it and perform real-time analytics.

Participants will have the opportunity to work with various datasets and practice all the operations and techniques learnt across the modules. Trainers and Assistant Trainers will help the participants through their exercises and practice sessions. This will give the participants an opportunity to get a thorough grasp of programming using PySpark.

Looking for something else?

Excel VBA for Beginners

This is a beginner's course that shows how Excel and VBA together make a strong combination for easily automating repetitive tasks.

Microsoft Azure for Beginners

The cloud has become the go-to place today for storage, computation, security and many other services. It is disrupting the analytics industry and taking processes and applications by storm! This course helps you get a first glimpse of the cloud and its services.
