Self Paced – Databricks and PySpark Full-Stack Data Engineering

About Course

150+ Hours • Hands-On • Project-Driven • Certification-Oriented Training

Mode of Delivery: Self-Paced

Module 1: Fundamentals (Foundation for All Data Engineers)

Build a rock-solid base in programming, databases, and PySpark essentials.

Topics Covered

  • Python for Data Engineering
    Variables, data types, loops, functions, file handling, modules, error handling
  • SQL Fundamentals to Advanced
    Joins, subqueries, aggregations, CTEs, window functions, performance tuning
  • PySpark Comprehensive Tour
    DataFrames, RDDs, SQL, transformations, actions, UDFs, window functions (see the sketch after this list)
  • Spark Introduction & Core Architecture
    Driver, executors, cluster manager, DAG, jobs, stages, tasks
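
To ground the PySpark tour above, here is a minimal sketch of the DataFrame API and a window function. It assumes a local SparkSession and a small, hypothetical sales dataset standing in for the course's exercise data:

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    # Local session for practice; on a cluster the platform provides one
    spark = SparkSession.builder.appName("fundamentals-demo").getOrCreate()

    # Hypothetical sales data used only for illustration
    sales = spark.createDataFrame(
        [("north", "2024-01-01", 100.0),
         ("north", "2024-01-02", 150.0),
         ("south", "2024-01-01", 80.0)],
        ["region", "sale_date", "amount"],
    )

    # Transformations are lazy: nothing runs until an action is called
    totals = (sales.filter(F.col("amount") > 50)
                   .groupBy("region")
                   .agg(F.sum("amount").alias("total_amount")))

    # Window function: running total per region, ordered by date
    w = Window.partitionBy("region").orderBy("sale_date")
    running = sales.withColumn("running_total", F.sum("amount").over(w))

    totals.show()   # actions trigger the DAG of jobs, stages, and tasks
    running.show()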

Hands-On Practice

  • 300+ Practical Exercises 
  • Realistic datasets for industry-level training

Spark Optimization Techniques

Projects (3 Real-Time Spark Projects)

Module 2: Databricks Engineering (Lakehouse Mastery)

Master the Databricks platform with hands-on, production-level experience.

Topics Covered

Platform Fundamentals

  • Databricks workspace, cluster types, compute, notebooks
  • Lakehouse architecture & Delta Lake internals

Data Ingestion & Data Formats

  • File-based ingestion (CSV, JSON, Parquet, Avro, ORC)
  • Streaming ingestion (Kafka, Auto Loader)
  • Handling structured & unstructured data
  • Delta Lake fundamentals, schema evolution & time travel (see the sketch below)
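
As a minimal sketch of the ingestion patterns above (assuming a Databricks notebook, where spark is already provided, and hypothetical storage paths):

    # Streaming ingestion with Auto Loader (the Databricks "cloudFiles" source);
    # all paths below are hypothetical placeholders
    raw = (spark.readStream
           .format("cloudFiles")
           .option("cloudFiles.format", "json")
           .option("cloudFiles.schemaLocation", "/tmp/schemas/events")
           .load("/mnt/landing/events/"))

    (raw.writeStream
        .format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/events")
        .option("mergeSchema", "true")   # allow schema evolution on write
        .start("/mnt/bronze/events/"))

    # Delta time travel: read an earlier snapshot of the same table
    snapshot = (spark.read.format("delta")
                .option("versionAsOf", 3)   # or timestampAsOf
                .load("/mnt/bronze/events/"))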

Data Processing & Transformation

  • Spark SQL & DataFrame API
  • UDFs, complex transformations, window functions
  • Joins, partitioning strategies, performance basics (see the sketch below)
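
For the join and partitioning topics above, a short sketch using hypothetical orders and customers tables:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical fact and dimension tables
    orders = spark.createDataFrame(
        [(1, 101, "2024-01-01", 50.0), (2, 102, "2024-01-02", 75.0)],
        ["order_id", "customer_id", "order_date", "amount"],
    )
    customers = spark.createDataFrame(
        [(101, "Asha"), (102, "Ravi")], ["customer_id", "name"]
    )

    # Broadcast the small dimension table so the join avoids a full shuffle
    enriched = orders.join(broadcast(customers), "customer_id", "left")

    # Repartition on the key before shuffle-heavy work downstream
    enriched = enriched.repartition(8, "customer_id")
    enriched.show()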

Scheduling & Orchestration

  • Databricks Jobs & Workflows
  • Task orchestration, chaining dependencies (see the REST API sketch below)
  • Monitoring & job failure recovery
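
As one concrete flavor of the orchestration above, here is a sketch that creates a two-task job with a dependency through the Jobs 2.1 REST API. The workspace URL, token variable, notebook paths, and cluster settings are all hypothetical:

    import os
    import requests

    host = "https://<your-workspace>.cloud.databricks.com"  # hypothetical URL
    token = os.environ["DATABRICKS_TOKEN"]  # assumes a PAT in the environment

    job_spec = {
        "name": "nightly-etl",
        "tasks": [
            {
                "task_key": "ingest",
                "notebook_task": {"notebook_path": "/Repos/team/etl/ingest"},
                "job_cluster_key": "etl_cluster",
            },
            {
                "task_key": "transform",
                "depends_on": [{"task_key": "ingest"}],  # chained dependency
                "notebook_task": {"notebook_path": "/Repos/team/etl/transform"},
                "job_cluster_key": "etl_cluster",
            },
        ],
        "job_clusters": [
            {
                "job_cluster_key": "etl_cluster",
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 2,
                },
            }
        ],
    }

    resp = requests.post(
        f"{host}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {token}"},
        json=job_spec,
    )
    resp.raise_for_status()
    print("Created job:", resp.json()["job_id"])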

Governance & Quality (Enterprise Grade)

  • Unity Catalog
  • Role-based access control
  • Table management, versioning, quality checks (see the SQL sketch below)
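
A sketch of these governance primitives as Unity Catalog SQL, run from a notebook (catalog, schema, table, and group names are hypothetical):

    # 'spark' is the session a Databricks notebook provides.
    # Unity Catalog uses a three-level namespace: catalog.schema.table
    spark.sql("CREATE CATALOG IF NOT EXISTS main_demo")
    spark.sql("CREATE SCHEMA IF NOT EXISTS main_demo.sales")

    # Role-based access: grant read access to a hypothetical 'analysts' group
    spark.sql("GRANT USE CATALOG ON CATALOG main_demo TO `analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA main_demo.sales TO `analysts`")
    spark.sql("GRANT SELECT ON TABLE main_demo.sales.orders TO `analysts`")

    # Versioning: inspect the change history of a Delta table
    spark.sql("DESCRIBE HISTORY main_demo.sales.orders").show()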

Advanced Performance & Cost Optimization

  • Cluster sizing & autoscaling
  • Caching & indexing
  • Partitioning, Z-Order, OPTIMIZE (see the sketch below)
  • Streaming optimization best practices
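
A sketch of the file-layout optimizations above on a Delta table (table and column names are hypothetical; 'spark' comes from the notebook):

    # Compact small files and co-locate rows by a common filter column
    spark.sql("OPTIMIZE main_demo.sales.orders ZORDER BY (customer_id)")

    # Cache a hot DataFrame for repeated interactive queries
    hot = spark.table("main_demo.sales.orders").where("order_date >= '2024-01-01'")
    hot.cache()
    hot.count()   # an action materializes the cache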

Security & Compliance

  • Data governance & auditing
  • Secure data sharing & federation
  • Access policies & compliance best practices

Monitoring, Deployment, CI/CD

  • Databricks CLI & REST API
  • Git integration, dev → prod workflows
  • Alerts, logging, observability
  • Asset bundles & deployment automation

Projects (3 Real-Time Databricks Projects)

Certifications Covered

  • Databricks Certified Spark Developer
  • Databricks Certified Data Engineer

Duration: ~150 Hours

A perfectly balanced, intensive 150-hour training covering
Fundamentals → PySpark Mastery → Databricks Expertise → Certifications → Projects.

What You Will Achieve

  • Become a full-stack data engineer capable of working with PySpark, Databricks, SQL, and Lakehouse architecture
  • Build 3 end-to-end projects
  • Gain real-world skills in optimization, governance, and pipeline deployment
  • Prepare confidently for Databricks certifications
  • Become industry-ready for modern data engineering roles

What Will You Learn?

  • Deep Understanding of Apache Spark 3
  • Understanding of Real-Time Scenarios
  • Certification and Industry Training

Course Content

M1 – Understanding the Data Engineering Domain and Its Challenges

  • Case Study to understand Data Engineering Domain
    44:14
  • Your First 5 Years as a Data Engineer
    16:13

M2 – History Lessons

M3 – Python For PySpark

M4 – PySpark Essentials For Data Engineering | Conceptual + Hands-On (300 Exercises)

M5 – Spark Advanced – Optimization Techniques – Industry Scenarios

M6 – Full-Stack Data Engineering using Databricks | Part 1

PySpark DF Scenarios and Databricks Certification Practice

Spark SQL Advanced

Certification Dump Discussion

Deploying Spark Applications on AWS EMR

End To End PySpark Project

Course Material

Kafka Sessions

Student Ratings & Reviews

No Review Yet