Full Stack Data Engineering (Databricks and PySpark)
About Course
150+ Hours • Hands-on • Project-Driven • Certification-Oriented Training
Mode of Delivery: Self-Paced
Module 1: Fundamentals (Foundation for All Data Engineers)
Build a rock-solid base in programming, databases, and PySpark essentials.
Topics Covered
- Python for Data Engineering
Variables, data types, loops, functions, file handling, modules, error handling
- SQL Fundamentals to Advanced
Joins, subqueries, aggregations, CTEs, window functions, performance tuning
- PySpark Comprehensive Tour
DataFrames, RDDs, SQL, transformations, actions, UDFs, window functions
- Spark Introduction & Core Architecture
Driver, executors, cluster manager, DAG, jobs, stages, tasks
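For a flavour of what these topics look like in practice, here is a minimal, self-contained PySpark sketch of DataFrame creation, lazy transformations, and an action; the dataset and column names are illustrative only.

```python
# Minimal PySpark sketch: DataFrame creation, lazy transformations, and an action.
# Dataset and column names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fundamentals-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, "books", 120.0), (2, "books", 80.0), (3, "toys", 45.0)],
    ["order_id", "category", "amount"],
)

# Transformations are lazy: Spark only builds the execution plan (DAG) here.
summary = (
    orders.filter(F.col("amount") > 50)
          .groupBy("category")
          .agg(F.sum("amount").alias("total_amount"))
)

# An action triggers the job: the driver schedules stages and tasks on executors.
summary.show()
```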
Hands-On Practice
- 300+ Practical Exercises
- Realistic datasets for industry-level training
- Spark Optimization Techniques
Projects (3 Real-Time Spark Projects)
Module 2: Databricks Engineering (Lakehouse Mastery)
Master the Databricks platform with hands-on, production-level experience.
Topics Covered
Platform Fundamentals
- Databricks workspace, cluster types, compute, notebooks
- Lakehouse architecture & Delta Lake internals
Data Ingestion & Data Formats
- File-based ingestion (CSV, JSON, Parquet, Avro, ORC)
- Streaming ingestion (Kafka, Auto Loader)
- Handling structured & unstructured data
- Delta Lake fundamentals, schema evolution & time travel
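As a taste of the ingestion topics above, here is a hedged sketch of Auto Loader feeding a Delta table with schema evolution, plus a time-travel read. It assumes a Databricks workspace (Auto Loader is a Databricks feature), and all paths and table names are placeholders.

```python
# Hedged sketch: Auto Loader ingestion into Delta with schema evolution and time travel.
# Runs on Databricks; paths and table names below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Incremental file ingestion with Auto Loader (the cloudFiles source).
raw = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "/tmp/demo/_schemas")  # placeholder path
         .load("/tmp/demo/landing/")                                 # placeholder path
)

# Write to a Delta table, letting the schema evolve as new columns appear.
(
    raw.writeStream.format("delta")
       .option("checkpointLocation", "/tmp/demo/_checkpoints")       # placeholder path
       .option("mergeSchema", "true")
       .trigger(availableNow=True)
       .toTable("demo.bronze_events")                                # placeholder table
)

# Time travel: read the table as it existed at an earlier version.
v0 = spark.read.option("versionAsOf", 0).table("demo.bronze_events")
```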
Data Processing & Transformation
- Spark SQL & DataFrame API
- UDFs, complex transformations, window functions
- Joins, partitioning strategies, performance basics
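The window-function and join topics above can be illustrated with a short sketch; the broadcast hint stands in for the performance basics, and all column names are made up.

```python
# Illustrative sketch: a window function (running total) and a broadcast join hint.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame(
    [("2024-01-01", "east", 100), ("2024-01-02", "east", 150), ("2024-01-01", "west", 90)],
    ["sale_date", "region", "amount"],
)
regions = spark.createDataFrame([("east", "Asha"), ("west", "Ravi")], ["region", "manager"])

# Window function: running total of sales per region, ordered by date.
w = Window.partitionBy("region").orderBy("sale_date")
with_running_total = sales.withColumn("running_total", F.sum("amount").over(w))

# The dimension table is small: hint Spark to broadcast it instead of shuffling both sides.
enriched = with_running_total.join(F.broadcast(regions), on="region", how="left")
enriched.show()
```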
Scheduling & Orchestration
- Databricks Jobs & Workflows
- Task orchestration, chaining dependencies
- Monitoring & job failure recovery
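A hedged sketch of what job orchestration can look like outside the UI, using the Jobs 2.1 REST API to create a two-task job with a dependency and a schedule. Host, token, notebook paths, and cluster id are placeholders; in practice the same thing can be done through the Workflows UI, the CLI, or the SDK.

```python
# Hedged sketch: create a scheduled, two-task Databricks Job via the Jobs 2.1 REST API.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

job_spec = {
    "name": "daily-bronze-to-silver",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/demo/ingest"},    # placeholder
            "existing_cluster_id": "<cluster-id>",                        # placeholder
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],                       # chained dependency
            "notebook_task": {"notebook_path": "/Repos/demo/transform"},  # placeholder
            "existing_cluster_id": "<cluster-id>",
        },
    ],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```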
Governance & Quality (Enterprise Grade)
- Unity Catalog
- Role-based access control
- Table management, versioning, quality checks
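For illustration, Unity Catalog governance is largely expressed in SQL. The sketch below assumes a Unity Catalog-enabled workspace; the catalog, schema, table, and group names are placeholders.

```python
# Sketch: Unity Catalog objects and group-based grants, issued as SQL from PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined as `spark` in Databricks notebooks

spark.sql("CREATE CATALOG IF NOT EXISTS demo_catalog")
spark.sql("CREATE SCHEMA IF NOT EXISTS demo_catalog.sales")

# Role/group-based access control.
spark.sql("GRANT USE CATALOG ON CATALOG demo_catalog TO `data_analysts`")
spark.sql("GRANT SELECT ON SCHEMA demo_catalog.sales TO `data_analysts`")

# Delta table history doubles as a versioning / audit view.
spark.sql("DESCRIBE HISTORY demo_catalog.sales.orders").show(truncate=False)
```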
Advanced Performance & Cost Optimization
- Cluster sizing & autoscaling
- Caching & indexing
- Partitioning, Z-Order, OPTIMIZE
- Streaming optimization best practices
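A minimal sketch of the Delta maintenance and caching commands behind these optimization topics; table and column names are placeholders.

```python
# Sketch: Delta table maintenance (OPTIMIZE, Z-Order, VACUUM) and DataFrame caching.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate rows by a frequently filtered column.
spark.sql("OPTIMIZE demo.silver_orders ZORDER BY (customer_id)")

# Remove files no longer referenced by the table (honours the retention window).
spark.sql("VACUUM demo.silver_orders")

# Cache a hot slice of data for repeated interactive queries.
hot = spark.table("demo.silver_orders").filter("order_date >= '2024-01-01'")
hot.cache()
print(hot.count())  # the count action materializes the cache
```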
Security & Compliance
- Data governance & auditing
- Secure data sharing & federation
- Access policies & compliance best practices
Monitoring, Deployment, CI/CD
- Databricks CLI & REST API
- Git integration, dev → prod workflows
- Alerts, logging, observability
- Asset bundles & deployment automation
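As one possible illustration of the observability side of this section, the sketch below uses the Databricks SDK for Python to list jobs and recent runs. It assumes authentication is already configured (for example via DATABRICKS_HOST / DATABRICKS_TOKEN environment variables); treat it as a starting point rather than a prescribed workflow.

```python
# Sketch: inspect jobs and recent run states with the Databricks SDK for Python
# (pip install databricks-sdk). Assumes auth is configured in the environment.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List the jobs configured in the workspace.
for job in w.jobs.list():
    print(job.job_id, job.settings.name)

# Inspect the most recent runs and their states.
for run in w.jobs.list_runs(limit=5):
    print(run.run_id, run.state.life_cycle_state, run.state.result_state)
```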
Projects (3 Real-Time Databricks Projects)
Certifications Covered
- Databricks Certified Associate Developer for Apache Spark
- Databricks Certified Data Engineer Associate
Duration: ~150 Hours
A perfectly balanced, intensive 150-hour training covering
Fundamentals → PySpark Mastery → Databricks Expertise → Certifications → Projects.
What You Will Achieve
- Become a full-stack data engineer capable of working with PySpark, Databricks, SQL, and Lakehouse architecture
- Build 3 end-to-end projects
- Gain real-world skills in optimization, governance, and pipeline deployment
- Prepare confidently for Databricks certifications
- Become industry-ready for modern data engineering roles
Course Content
M1 – Understanding The Data Engineering Domain and The Challenges
- Case Study to Understand the Data Engineering Domain (44:14)
- Your First 5 Years as a Data Engineer (16:13)
M2 – History Lessons
M3 – Python For PySpark
M4 – PySpark Essentials For Data Engineering
Prerequisite 1: The Python Adventure
RDD Operations
SparkSQL
pyspark-practice-hands-on-200
PySpark DF Scenarios and Databricks Certification Practice
Spark SQL Advanced
Certification Dump Discussion
PySpark 500
Deploying Spark Applications on AWS EMR
End To End PySpark Project
Course Material
Kafka Sessions
Data Engineering using Databricks