LIVE – Fullstack Data Engineering – Azure | Databricks | AWS
About Course
Complete Data Engineering training with Big Data, Hadoop, and Spark. The course covers Big Data frameworks such as Hadoop and Spark, along with many tools in the Hadoop ecosystem, including Hive, Sqoop, Flume, Spark, and Kafka.
Course Content:
- Azure Data Engineering
- AWS Data Engineering
- Databricks Data Engineering
- 6 End to End Projects
- Spark Streaming
- Python Programming
- Apache Hadoop
- Apache Hive
- PySpark: 500 Hands-On Exercises
- Spark SQL
- Kafka
- NoSQL
What Will You Learn?
- Job interview preparation
- Covers most of the content for the "Databricks Certified Developer For Apache Spark 3.0" certification
- In-depth understanding of Hadoop ecosystem components.
- Resume support.
- Enhanced understanding through hands-on exercises.
Course Content
Starter Kit
-
Support and Contact Guide
-
Steps To Install PST Application
-
Course Materials Access Guide
-
Study Roadmap Access Guide
M1 – Course Introduction
-
Roles and Responsibilities of a Data Engineer | Demo 1
01:10:20 -
Introduction to Big data | Demo 2
01:16:10 -
Hands On | Distributed Storage | Distributed Processing | Introduction | Demo 3
01:16:37 -
Case Study to understand Data Engineering Domain
44:14 -
Your First 5 Years as a Data Engineer
16:13
M2 – Hadoop Ecosystem (HISTORY LESSONS)
To understand how data engineering practices have evolved, you may review the following legacy sessions.
For a modern, industry-aligned learning path, I recommend the sequence below:
SQL → Python → PySpark → PySpark Projects
Before beginning this path, I also suggest covering Hadoop fundamentals up to the YARN architecture, as it provides helpful context for distributed processing.
This sequence will give you a strong foundation for the upcoming modules.
The legacy sessions are included for those working with older systems who may still find them useful.
-
WARNING IMPORTANT !!!!!!!!!
03:07 -
Hadoop Session 1 – Introduction to Hadoop
01:06:10 -
Hadoop Session 2 – Hadoop Components and Daemons
01:30:57 -
Hadoop Session 3 – File Blocks | Replication | Rack Awareness
01:12:31 -
Hadoop Session 4 – Rack Awareness | HA and Federation
01:19:56 -
Hadoop Session 5 – YARN Architecture – I
01:26:57 -
Hadoop Session 6 – YARN Architecture Doubts and Seminars
01:14:47 -
Hadoop Session 7 – Necessary Setups and Discussion
01:30:26 -
Hadoop Session 8 – MR workflow [Deprecated]
01:27:24 -
Hadoop Session 9 – YARN and MR QnA and Revision | Safe Mode | Load Balancer
01:07:57 -
Hadoop Session 10 – MR | File Blocks vs Input Splits [Deprecated]
02:13:29 -
Hadoop Session 11 – MR Workflow Revision | WordCount Hands On [Deprecated]
02:14:08 -
Hadoop Session 12 – Combiner and Partitioner [Deprecated]
02:39:59 -
Hadoop Session 13 – Reduce side join 1 [Deprecated]
01:32:32 -
Hadoop Session 14 – Reduce side join 2 [Deprecated]
01:32:10 -
Hadoop Session 15 – Map Side Join [Deprecated]
01:36:28 -
[Optional] setting_up_single_node_hadoop_cluster
00:00 -
AWS EMR Session 1 – EC2 introduction
23:37 -
AWS EMR Session 2 – IAM Roles
24:27 -
AWS EMR Session 3 – Starting EMR Cluster | Connecting to EMR from your System
58:26 -
AWS EMR Session 4 – Accessing EMR Web UIs
58:26 -
Hive Session 1 – Hive Introduction and Architecture
01:25:03 -
Hive Session 2 – Hive Basic Commands
01:27:23 -
Hive Session 3 – Internal vs External Tables
58:35 -
Hive Session 4 – Design level optimizations (Partitioning)
01:06:31 -
Hive Session 5 – Design level optimizations (Bucketing) | Logical Joins
01:15:13 -
Hive Session 6 – Bucketing Scenarios | Hive SerDe
01:06:59 -
Hive Session 7 – SerDe Correction
02:55 -
Hive Session 8 – Join strategies in Hive | MR
01:13:03 -
Hive Session 9 – Hive Project
22:29 -
Hive Session 10 – Join Optimizations
01:34:31 -
Hive Session 11 – Hive Transactional Tables
01:19:36 -
Hive Session 12 – Hive Transactional Tables Materialized
01:15:29 -
Hive Session 13 – CBO | Vectorization | Resource level optimization | Materialized Views
01:44:59 -
Hive Session 14 – Vectorization in Hive
12:48 -
Hive Session 15 – MSCK repair
06:09 -
Hive Session 16 – UDF, UDAF, UDTF
01:30:37 -
Hive Exercise Documents and Data Files
00:00 -
Hive Notes
00:00 -
Sqoop Session 1 – Sqoop Introduction
01:19:28 -
Sqoop Session 2 – Sqoop Incremental Import
01:21:12 -
Flume Session 1 – Flume Introduction
01:39:31 -
Flume Session 2 – Flume Configuration
02:11:07
M3 – Python for PySpark
-
Setup For Python Exercises
17:34 -
Python Session 1 – Multiple Ways to Print ‘Hello World’ in Python
36:21 -
Python Session 2 – Understanding Variables in Python
51:59 -
Python Session 3 – Input, Escape Sequences & Value Passing in Python
35:55 -
Python Session 4 – Introduction to Conditional Statements in Python | Learn if-else with Examples
35:06 -
Python Session 5 – if else Explained with Hands On Exercises
22:05 -
Python Session 6 – Learn Python Lists, List Operations and Slicing
48:08 -
Python Session 7 – List Operations Hands On
26:19 -
Python Session 8 – Python Tuples Explained | Tuple vs List and Tuple Operations
32:55 -
Python Session 9 – Python Dictionaries Explained | Create, Access, Modify and Delete Data
28:57 -
Python Session 10 – Sets Explained – Create, Add, Remove, and Perform Set Operations
32:20 -
Python Session 11 – Loops in Python Explained | for loop, while loop, nested loops
41:15 -
Python Session 12 – Break vs Continue in Python Loops | Real-Time Examples and Differences
33:57 -
Python Session 13 – Build Rock Paper Scissors in Python using Loops | Offline Classroom Tutorial
16:12 -
Python Session 14
55:57 -
Python Session 15
58:26 -
Python Session 16
01:15:42 -
Python Session 17
01:03:01 -
Python Session 18
43:51 -
Python Session 19
01:07:31 -
Python Session 20
49:06 -
Session 21 | Class Methods vs Instance Methods vs Static Methods
01:04:13 -
Session 22 | Dunder methods | Operator overloading
01:04:13 -
Session 23 | Property Decorator
54:43 -
Session 24 | Encapsulation and Private attributes
54:39 -
Session 25 | Abstraction and Spark Introduction
01:47:44
M5 – Linux Mastery
-
Session 01 | Linux (24-06-2025)
01:41:46 -
Session 02 | Linux (25-06-2025)
01:04:30 -
Session 03 | Linux (26-06-2025)
01:20:36 -
Session 04 | Linux (27-06-2025)
01:30:48 -
Session 05 | Linux (28-06-2025)
01:51:05 -
Session 06 | Linux (29-06-2025)
01:50:19 -
Session 07 | Linux (01-07-2025)
01:29:16 -
Session 08 | Linux (02-07-2025)
01:24:37 -
Session 09 | Linux (04-07-2025)
14:10 -
Session 10 | Linux 02 (04-07-2025)
01:36:19 -
Session 11 | Linux (05-07-2025)
01:53:03 -
Session 12 | Linux (06-07-2025)
02:01:48 -
Session 13 | Linux (07-07-2025)
01:50:02 -
Session 14 | Linux (08-07-2025)
02:04:15 -
Session 15 | Linux (10-07-2025)
01:34:44 -
Session 16 | Linux (11-07-2025)
01:07:45 -
Session 17 | Linux (29-07-2025)
01:34:24 -
Session 18 | Linux (30-07-2025)
01:22:27 -
Session 19 | Linux (31-07-2025)
01:29:07
M4 – PySpark Essentials For Data Engineering
-
[Overview] Spark Session 1 – Introduction
16:37 -
[ClassRec] Spark Session 1 – Introduction
37:34 -
[Overview] Spark Session 2 – Spark Cluster vs Application Architecture
12:07 -
[ClassRec] Spark Session 2 – Spark Cluster vs Application Architecture
47:42 -
[Overview] Spark Session 3 – RDD Terminologies and Features
47:20 -
[ClassRec] Spark Session 3 – RDD Terminologies and Features
52:01 -
[Overview] Spark Session 4 – App vs Job vs Stage vs Task
33:41 -
[ClassRec] Spark Session 4 – App vs Job vs Stage vs Task
49:23 -
[Overview] Spark Session 5 – Spark Cluster vs Client Mode
12:47 -
[ClassRec] Spark Session 5 – Spark Cluster vs Client Mode
35:05 -
[Overview] Spark Session 6 – Spark Architecture
39:24 -
[ClassRec] Spark Session 6 – Spark Architecture
48:59 -
[Overview] Spark Session 7 – Spark Distributed Shared Variables
43:39 -
[ClassRec] Spark Session 7 – Spark Distributed Shared Variables
36:21 -
[Overview] Spark Session 8 – SparkSQL Introduction – RDD vs DF vs DS
40:58 -
[ClassRec] Spark Session 8 – SparkSQL Introduction – RDD vs DF vs DS
45:47 -
[Overview] Spark Session 9 – Spark Catalyst Optimizer
26:01 -
[ClassRec] Spark Session 9 – Spark Catalyst Optimizer
10:06 -
[Overview] Spark Session 10 – SparkContext vs SparkSession
39:32 -
[ClassRec] Spark Session 10 – SparkContext vs SparkSession
46:56 -
[Installation] Spark 3.5 Installation
20:35 -
[Overview] Spark Session 11 – Setup For Exercises
12:55 -
[ClassRec] Spark Session 12 – Ways to Create RDDs
16:34 -
[ClassRec] Spark Session 13 – RDD Creations Practice and Good Practices
16:34 -
[Overview] Spark Session 14 – map, mapPartitions, mapPartitionsWithIndex, glom
35:56 -
[ClassRec] Spark Session 14 – map, mapPartitions, mapPartitionsWithIndex, glom
27:08 -
[ClassRec] Spark Session 15 – map vs flatMap
13:11 -
[ClassRec] Spark Session 16 – groupByKey vs reduceByKey
40:50 -
[Overview] Spark Session 17 – Creating DFs from CSV files
38:11 -
[ClassRec] Spark Session 17 – Creating DFs from CSV files
27:18 -
[Overview] Spark Session 18 – Creating DF From JSON and XML Files
07:08 -
[ClassRec] Spark Session 18 – Creating DF from JSON, nested JSON, MultiChar and Custom Delimiter
22:15 -
[Overview] Spark Session 19 – Creating DFs from Binary files
11:42 -
[ClassRec] Spark Session 19 – Creating DFs from Binary files
33:36 -
[Overview] Spark Session 20 – Referring Columns, select, selectExpr, filter
15:50 -
[ClassRec] Spark Session 20 – Referring Columns, select, selectExpr, filter
17:28 -
[ClassRec] Spark Session 21 – sort / orderBy
17:38 -
[Overview] Spark Session 22 – groupBy and Aggregations
21:07 -
[ClassRec] Spark Session 22 – groupBy and Aggregations
19:23 -
[Overview] Spark Session 23 – Joins (inner, outer, left, right, left semi, left anti, cross, self)
18:58 -
[ClassRec] Spark Session 23 – Joins (inner, outer, left, right, left semi, left anti, cross, self)
39:20 -
[ClassRec] Spark Session 24 – Joins Revision
20:15 -
[Overview] Spark Session 25 – Window Functions | Ranking Functions
27:34 -
[ClassRec] Spark Session 25 – Window Functions | Ranking Functions
32:13 -
[Overview] Spark Session 26 – Window Analytical and Aggregate Functions
29:07 -
[ClassRec] Spark Session 26 – Window Aggregate Functions
21:56 -
[ClassRec] Spark Session 27 – Window Analytical Functions
17:10 -
[Overview] Spark Session 28 – Dealing With NULL Values
16:17 -
[Overview] Spark Session 29 – Dealing With Duplicate Records
05:18 -
[Overview] Spark Session 30 – Pivot and UnPivot
32:15 -
[Overview] Spark Session 31 – UDFs in PySpark
23:39 -
[ClassRec] Spark Session 31 – UDFs in Spark
26:28
M5 – Spark Advanced – Optimization Techniques – Industry Scenarios
-
[Overview] Spark Session 32 – Cache vs Persist
55:26 -
[ClassRec] Spark Session 32 – Cache vs Persist
22:32 -
[ClassRec] Spark Session 32 – Cache vs Persist S2
01:05:28 -
[Overview] Spark Session 33 – Executor Memory Architecture
32:36 -
[ClassRec] Spark Session 33 – Executor Memory Architecture S1
55:16 -
[ClassRec] Spark Session 33 – Executor Memory Architecture S2
01:12:00 -
[Overview] Spark Session 34 – Adaptive Query Execution
45:30 -
[ClassRec] Spark Session 34 – Adaptive Query Execution
01:10:32 -
[Overview] Spark Session 35 – Join Strategies in PySpark
52:34 -
[ClassRec] Spark Session 35 – Join Strategies – Broadcast Join
50:52 -
[ClassRec] Spark Session 35 – Join Strategies – Shuffle Hash Join
01:02:44 -
[ClassRec] Spark Session 35 – Join Strategies – Sort Merge Join and More
37:27 -
[ClassRec] Spark Session 36 – Resource Calculations For Spark Applications
01:13:12 -
[ClassRec] Spark Session 37 – Dynamic Resource Allocation
30:59 -
[ClassRec] Spark Session 38 – Garbage Collection Tuning
01:04:26 -
[ClassRec] Spark Session 39 – Handling Data Skew S1
44:36 -
[Overview] Spark Session 40 – Controlling Parallelism For Spark Applications
42:53 -
[ClassRec] Spark Session 40 – Controlling Parallelism For Spark Applications
56:38 -
[ClassRec] Spark Session 41 – Handling Data Skew S2
40:07 -
[ClassRec] Spark Session 42 – Design Level Optimizations
45:08 -
[ClassRec] Spark Session 43 – Out Of Memory Error – Speculative Execution – DPP
49:20
M6 – Full Stack Data Engineering using Azure Databricks | Part 1
-
DBX_001_2025-12-15_082818_S01_workspace-resource-groups-managed-resource-group.mp4
48:59 -
DBX_002_2025-12-16_083118_S02_networking-fundamentals-1.mp4
01:11:09 -
DBX_003_2025-12-17_084919_S03_networking-2.mp4
55:44 -
DBX_004_2025-12-19_083447_S03_deploy-databricks.mp4
34:51 -
DBX_006_2025-12-20_084710_S05_workspace-basics.mp4
48:59 -
DBX_007_2025-12-21_085139_S06_dbutils-storage-local-vs-dbfs.mp4
54:52 -
DBX_008_2025-12-22_084744_S07_hands-on-formalities.mp4
01:00:07 -
DBX_009_2025-12-23_084954_S08_(FLOP)_blob-vs-adls-dbfs-vs-volumes.mp4
57:25 -
DBX_010_2025-12-25_084527_S09_azure-storage-fundamentals.mp4
01:04:29 -
DBX_011_2025-12-26_084003_S10_storage-options-hands-on.mp4
54:53 -
DBX_012_2025-12-27_084545_S11_unity-catalog-overview.mp4
57:31 -
DBX_013_2025-12-29_083503_S12_sp-accessing-adls-from-databricks.mp4
01:01:53 -
DBX_014_2025-12-30_081837_S13_mi-accessing-adls-from-databricks.mp4
40:29 -
DBX_015_2025-12-31_080348_S14_ak-sp-mi-process-adls-access.mp4
54:10 -
DBX_016_2026-01-01_082113_S15_project-incremental-ingestion-pipeline-1.mp4
01:04:52 -
DBX_017_2026-01-02_082355_S15_project-incremental-ingestion-pipeline-2.mp4
01:11:03 -
DBX_018_2026-01-03_082101_S15_project-incremental-ingestion-pipeline-3.mp4
01:09:50 -
DBX_019_2026-01-05_081929_S16_project-scd-type-1-implement.mp4
30:32 -
DBX_020_2026-01-06_081429_S17_project-scd-type-2-implementation.mp4
01:13:04 -
DBX_021_2026-01-09_082106_S18_unity-catalog-purpose.mp4
58:20 -
DBX_022_2026-01-10_082743_S19_data-cleaning-standardization-terminologies.mp4
01:10:27 -
DBX_023_2026-01-12_082017_S20_uc-managed-vs-non-managed-tables.mp4
59:55 -
DBX_024_2026-01-13_082148_S21_delta-tables-overview.mp4
01:27:41 -
DBX_025_2026-01-14_083615_S22_uc-essentials-catalog-schema-table.mp4
47:32 -
DBX_026_2026-01-16_083754_S23_s1-managed-and-non-managed-tables.mp4
01:06:02 -
DBX_027_2026-01-17_000000_PRJ_databricks-azure-project_B12_S13.mp4
01:01:08 -
DBX_028_2026-01-19_082047_S25_delta-tables-overview.mp4
52:45 -
DBX_029_2026-01-20_083429_S26_delta-table-anatomy-1.mp4
01:19:54 -
DBX_030_2026-01-21_084023_S27_delta-tables-mvcc-si-occ.mp4
43:15 -
DBX_031_2026-01-22_083850_S28_optimize-and-vacuum_01.mp4
01:09:16 -
DBX_033_2026-01-23_083143_S29_time-travel-and-cloning.mp4
01:04:54 -
DBX_034_2026-01-26_125446_S30_partitioning.mp4
01:02:06
Industry Level PySpark | Scenarios and Databricks Certification Practice
-
Session : 1
01:07:53 -
Session : 2
01:02:28 -
Session : 3
01:16:19 -
Session : 4
51:23 -
Session : 5
47:50 -
Session : 6
35:49 -
Session : 7
40:39 -
Session : 8
29:37 -
Practice Exercises and Solutions Document
00:00
M6 – Kafka Essentials For Data Engineering
-
Kafka Installation
18:04 -
Session 1
48:44 -
Session 2
39:55 -
Session 3
15:00 -
Session 4
19:09 -
Session 5
01:04:18 -
Flume to Kafka
36:44
M7 – Industry Level PySpark | Spark Streaming
-
Spark Streaming – I
47:14 -
Spark Streaming – II
01:48:41 -
Spark Streaming – III
01:49:49
M8 – Data Modelling Essentials
-
Sessions will be updated as soon as they’re done in the Live Batch
M9 – Full Stack Data Engineering Using Databricks | Part 2
-
Sessions will be added as soon as they’re covered in the Live Batch
M10 – Azure Data Engineering Complete Course
-
Sessions will be updated as soon as they’re covered in Live Class
M11 – AWS Data Engineering Complete Course
-
Big Data With AWS || Demo Session
36:43 -
AWS IAM User, Group and Policies
01:26:35 -
Big Data With AWS | IAM Roles
01:23:47 -
AWS Cloud Infrastructure
31:39 -
Big Data With AWS | S3 Session 1
01:40:25 -
S3 Session 2
01:00:49 -
S3 Session 3
01:00:45 -
AWS Glue | Crawlers and Jobs
01:23:58 -
Glue Scenarios | Glue Workflows
01:24:01 -
Glue Scenarios
01:15:03 -
EMR Basics | EC2 Introduction
23:36 -
EMR Basics | IAM Role
24:26 -
Starting EMR cluster | Connecting to EMR Cluster
58:25 -
AWS EMR | Starting and Deploying Spark Application on EMR
58:04 -
AWS EMR | Cluster Mode Deployment | Accessing Web UIs | Steps Introduction
58:48 -
BDA | Deploying Spark Application in Cluster Mode on EMR
11:15 -
Deploying Spark Application using Steps on AWS EMR
01:02:03 -
Athena Basics
01:23:48 -
Athena on Command Line
59:43 -
Using Athena through Python Code
41:27 -
Redshift And Data Warehousing Introduction
41:34 -
Redshift Clusters | Snapshots | S3 Copy
59:16 -
Creating a Redshift Cluster and Making It Publicly Accessible
13:58 -
Connecting to Redshift Using SQL Workbench and a Python Script
38:55 -
DIST Keys and Sort Keys in Redshift
24:30 -
DIST Keys Hands-On
30:38 -
Redshift Federated Queries
40:58 -
What is Streaming Data | Streaming Data Terminologies
23:24 -
Kinesis Data Streams | Kinesis Architecture and terminologies
40:05 -
Streaming Data using Console Producer | Python Producer and Python Consumer
39:48
M12 – MongoDB NoSQL For Data Engineering
-
Introduction to NoSQL | Use Cases | Types
58:09 -
MongoDB Essential Elements | CRUD Operations
37:06 -
MongoDB Indexing
01:27:02
M13 – Complete Airflow For Data Engineering
-
SL | Airflow | Session 1
-
SL | Airflow | Session 2
-
SL | Airflow | Session 3
-
Coming Soon
M14 – DevOps in DE | Version Control System Essentials
M15 – CI/CD for Data Engineering Pipelines
Course End Projects | Live Projects
-
Sessions will be added as soon as they’re covered in Live Batch
Course Material
-
tg_vm Setup
20:56 -
Presentation Slides 22 July
00:00 -
Presentation Slides 7 Jan
00:00