+65 8133 0642 / +65 9138 9813 info@xaltius.tech

Big Data Analytics using Spark

Industries such as banking, finance and logistics process and analyse huge amounts of data every day. It is important to know how this is done using modern technologies involving Big Data and Spark. This course will take you through how you can deal with huge volumes of data and perform analytics and machine learning on it.

What You’ll Learn

Big Data has evolved over the last few years and has become mainstream for many big organizations. This course covers the fundamentals of Big Data using Spark. Spark is a “fast cluster computing framework” for Big Data processing. It lets you run programs and operations up to 100x faster in memory than disk-based processing such as Hadoop MapReduce. You will be exposed to various libraries in PySpark for data processing and machine learning. You will have a chance to work with various datasets through guided hands-on training. By the end of this course, you will have gained an in-depth understanding of PySpark and its application to general Big Data analysis. This workshop will be conducted using Databricks, a tool used today to run big data workloads on Spark.

Please click the event link in the timetable for the respective price.

Course Plan

MODULE 1: INTRODUCTION TO BIG DATA AND DATABRICKS

The participants will be introduced to the basics of Big Data as well as the various concepts and different frameworks for processing Big Data.

MODULE 2: INTRODUCTION TO BIG DATA ANALYSIS USING PYSPARK

The participants will be exposed to the basics of Spark with Python, including working with data and functional programming.

MODULE 3: PROGRAMMING IN PYSPARK

The participants will be exposed to the backbone of Spark: resilient distributed datasets (RDDs). We will learn how RDDs are created and executed, and cover various transformations and actions (map, reduce and collect, among others) on RDDs.
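
As a rough illustration of the map, reduce and collect operations named in this module, the sketch below mimics them in plain Python on a small local list. It is an analogy only, not PySpark code and not taken from the course material: the sample data and variable names are made up, and a real RDD would be partitioned across a cluster. In PySpark the corresponding RDD methods are rdd.map(...), rdd.collect() and rdd.reduce(...).

```python
from functools import reduce

# Hypothetical sample data; in Spark this would be an RDD split across workers.
numbers = [1, 2, 3, 4, 5]

# "Transformation": Python's map is lazy, so nothing is computed yet.
squares = map(lambda x: x * x, numbers)

# "Action" (like collect): materialize every element into a local list.
collected = list(squares)

# "Action" (like reduce): combine all elements into a single value.
total = reduce(lambda a, b: a + b, collected)

print(collected)  # [1, 4, 9, 16, 25]
print(total)      # 55
```

Python's built-in map is lazy in much the same spirit as a Spark transformation: no work happens until a result is actually requested.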

MODULE 4: PYSPARK SQL & DATAFRAMES

Structured data processing is important when profiling and understanding data, and Spark provides an elegant way to do it with Spark SQL. The participants will be exposed to DataFrames, the distributed SQL query engine, various operations using Spark SQL, and data visualization using PySpark.

MODULE 5: MACHINE LEARNING WITH PYSPARK

Participants will be exposed to various machine learning methods and algorithms and will work with different datasets to perform regression, clustering and classification, among other operations. Participants will also learn to work through the process of model training and evaluation.

MODULE 6: STREAMING ANALYTICS

Participants will be exposed to real-time streaming data and learn how Spark can be used to handle it and perform real-time analytics.

MODULE 7: PRACTICE & EXTRA HANDS-ON WORKSHOPS

Participants will have the opportunity to work with various datasets and practice all the operations and techniques learnt in the previous modules. Trainers and Assistant Trainers will help the participants through their exercises and practice sessions. This will give the participants an opportunity to get a thorough grasp of programming using PySpark.

Timetable

Please check back for available dates or subscribe to stay updated!

Prerequisites

Participants should know the basics of programming in order to take up this session.

About the Course

Big Data is at the forefront of digital transformation today. This course will help you take the next steps towards learning and understanding the technologies, algorithms and programming constructs involved in big data analysis and machine learning on big data.

Relevant Courses

Machine Learning with Python

This course will take you through the fundamentals of machine learning, the different algorithms involved in supervised and unsupervised learning, model evaluation and model optimization techniques.


Data Analytics using Python

Understanding and analyzing data is one of the key skills required in the industry today. This course focuses entirely on the various aspects of data analytics using Python. Participants will be taken through the key libraries for data ingestion and manipulation, exploratory data analysis, model building and data visualization, as well as the basic statistics needed to understand the concepts in the later courses.


Python for Data Science

The Basics of Python is an introductory course for beginners on the fundamentals of coding in Python, a powerful, modern language in high industry demand. Participants will learn to write programs, perform various operations, and manipulate and visualize data. Participants completing this course will be prepared to take up the advanced modules.
