Self-Paced – Databricks and PySpark Fullstack Data Engineering
About Course
150+ Hours • Hands-on • Project-Driven • Certification-Oriented Training
Mode of Delivery: Self-Paced
Module 1: Fundamentals (Foundation for All Data Engineers)
Build a rock-solid base in programming, databases, and PySpark essentials.
Topics Covered
- Python for Data Engineering
Variables, data types, loops, functions, file handling, modules, error handling
- SQL Fundamentals to Advanced
Joins, subqueries, aggregations, CTEs, window functions, performance tuning
- PySpark Comprehensive Tour
DataFrames, RDDs, SQL, transformations, actions, UDFs, window functions
- Spark Introduction & Core Architecture
Driver, executors, cluster manager, DAG, jobs, stages, tasks
Hands-On Practice
- 300+ Practical Exercises
- Realistic datasets for industry-level training
Spark Optimization Techniques
Projects (3 Real-Time Spark Projects)
Module 2: Databricks Engineering (Lakehouse Mastery)
Master the Databricks platform with hands-on, production-level experience.
Topics Covered
Platform Fundamentals
- Databricks workspace, cluster types, compute, notebooks
- Lakehouse architecture & Delta Lake internals
Data Ingestion & Data Formats
- File-based ingestion (CSV, JSON, Parquet, Avro, ORC)
- Streaming ingestion (Kafka, Auto Loader)
- Handling structured & unstructured data
- Delta Lake fundamentals, schema evolution & time travel
Data Processing & Transformation
- Spark SQL & DataFrame API
- UDFs, complex transformations, window functions
- Joins, partitioning strategies, performance basics
Scheduling & Orchestration
- Databricks Jobs & Workflows
- Task orchestration, chaining dependencies
- Monitoring & job failure recovery
Governance & Quality (Enterprise Grade)
- Unity Catalog
- Role-based access control
- Table management, versioning, quality checks
Advanced Performance & Cost Optimization
- Cluster sizing & autoscaling
- Caching & indexing
- Partitioning, Z-Order, OPTIMIZE
- Streaming optimization best practices
Security & Compliance
- Data governance & auditing
- Secure data sharing & federation
- Access policies & compliance best practices
Monitoring, Deployment, CI/CD
- Databricks CLI & REST API
- Git integration, dev → prod workflows
- Alerts, logging, observability
- Asset bundles & deployment automation
Projects (3 Real-Time Databricks Projects)
Certifications Covered
- Databricks Certified Spark Developer
- Databricks Certified Data Engineer
Duration: ~150 Hours
A perfectly balanced, intensive 150-hour training covering
Fundamentals → PySpark Mastery → Databricks Expertise → Certifications → Projects.
What You Will Achieve
- Become a full-stack data engineer capable of working with PySpark, Databricks, SQL, and Lakehouse architecture
- Build 3 end-to-end projects
- Gain real-world skills in optimization, governance, and pipeline deployment
- Prepare confidently for Databricks certifications
- Become industry-ready for modern data engineering roles
What Will You Learn?
- Deep Understanding of Apache Spark 3
- Understanding of real-time scenarios
- Certification and Industry Training
Course Content
M1 – Understanding The Data Engineering Domain and The Challenges
-
Case Study to understand Data Engineering Domain
44:14 -
Your First 5 Years as a Data Engineer
16:13
M2 – History Lessons
-
History Session 1 – Introduction to Big data
01:16:10 -
History Session 2 – Distributed Storage | Distributed Processing | Introduction
01:16:37 -
History Session 3 – Introduction to Hadoop
01:06:10 -
History Session 4 – Hadoop Components and Daemons
01:30:57 -
History Session 5 – File Blocks | Replication | Rack Awareness
01:12:31 -
History Session 6 – Rack Awareness | HA and Federation
01:19:56 -
History Session 7 – YARN Architecture
01:26:57 -
History Session 8 – YARN Architecture Doubts and Seminars
01:14:47 -
History Session 9 – YARN and MR QnA and Revision | Safe Mode | Load Balancer
01:07:57
M3 – Python For PySpark
-
Setup For Python Exercises
17:34 -
Python Session 1 – Multiple Ways to Print ‘Hello World’ in Python
36:21 -
Python Session 2 – Understanding Variables in Python
51:59 -
Python Session 3 – Input, Escape Sequences & Value Passing in Python
32:55 -
Python Session 4 – Introduction to Conditional Statements in Python
35:06 -
Python Session 5 – Python if else Explained with Hands On Exercises
22:05 -
Python Session 6 – Learn Python Lists, List Operations and Slicing
48:08 -
Python Session 7 – Python List Operations Hands On
26:19 -
Python Session 8 – Python Tuples Explained
32:55 -
Python Session 9 – Python Dictionaries Explained
28:57 -
Python Session 10 – Sets Explained | Create, Add, Remove
32:20 -
Python Session 11 – Loops in Python Explained | for loop, while loop, nested loops
41:15 -
Python Session 12 – Break vs Continue in Python Loops | Real-Time Examples
33:57 -
Python Session 13 – Build Rock Paper Scissors in Python using Loops
16:12 -
Python Session 14
55:57 -
Python Session 15
58:26 -
Python Session 16
01:15:42 -
Python Session 17
01:03:01 -
Python Session 18
43:51 -
Python Session 19
01:07:31 -
Python Session 20
49:06
M4 – PySpark Essentials For Data Engineering | Conceptual + Hands On 300 Exercises
-
[Overview] Spark Session 1 – Introduction
16:37 -
[ClassRec] Spark Session 1 – Introduction
37:34 -
[Overview] Spark Session 2 – Spark Cluster vs Application Architecture
12:07 -
[ClassRec] Spark Session 2 – Spark Cluster vs Application Architecture
47:42 -
[Overview] Spark Session 3 – RDD Terminologies and Features
47:20 -
[ClassRec] Spark Session 3 – RDD Terminologies and Features
52:01 -
[Overview] Spark Session 4 – App vs Job vs Stage vs Task
33:41 -
[ClassRec] Spark Session 4 – App vs Job vs Stage vs Task
49:23 -
[Overview] Spark Session 5 – Spark Cluster vs Client Mode
12:47 -
[ClassRec] Spark Session 5 – Spark Cluster vs Client Mode
35:05 -
[Overview] Spark Session 6 – Spark Architecture
39:24 -
[ClassRec] Spark Session 6 – Spark Architecture
48:59 -
[Overview] Spark Session 7 – Spark Distributed Shared Variables
43:39 -
[ClassRec] Spark Session 7 – Spark Distributed Shared Variables
36:21 -
[Overview] Spark Session 8 – SparkSQL Introduction – RDD vs DF vs DS
40:58 -
[ClassRec] Spark Session 8 – SparkSQL Introduction – RDD vs DF vs DS
45:47 -
[Overview] Spark Session 9 – Spark Catalyst Optimizer
26:01 -
[ClassRec] Spark Session 9 – Spark Catalyst Optimizer
10:06 -
[Overview] Spark Session 10 – SparkContext vs SparkSession
39:32 -
[ClassRec] Spark Session 10 – SparkContext vs SparkSession
46:56 -
[Installation] Spark 3.5 Installation
20:35 -
[Overview] Spark Session 11 – Setup For Exercises
12:55 -
[ClassRec] Spark Session 12 – Ways to Create RDDs
16:34 -
[ClassRec] Spark Session 13 – RDD Creations Practice and Good Practices
16:34 -
[Overview] Spark Session 14 – map, mapPartitions, mapPartitionsWithIndex, glom
35:56 -
[ClassRec] Spark Session 14 – map, mapPartitions, mapPartitionsWithIndex, glom
27:08 -
[ClassRec] Spark Session 15 – map vs flatMap
13:11 -
[ClassRec] Spark Session 16 – groupByKey vs reduceByKey
40:50 -
[Overview] Spark Session 17 – Creating DFs from CSV files
38:11 -
[ClassRec] Spark Session 17 – Creating DFs from CSV files
27:18 -
[Overview] Spark Session 18 – Creating DF From JSON and XML Files
07:08 -
[Overview] Spark Session 19 – Creating DFs from Binary files
11:42 -
[ClassRec] Spark Session 19 – Creating DFs from Binary files
33:36 -
[Overview] Spark Session 20 – Referring Columns, select, selectExpr, filter
15:50 -
[ClassRec] Spark Session 20 – Referring Columns, select, selectExpr, filter
17:28 -
[ClassRec] Spark Session 21 – sort / orderBy
17:38 -
[Overview] Spark Session 22 – groupBy and Aggregations
21:07 -
[ClassRec] Spark Session 22 – groupBy and Aggregations
19:23 -
[Overview] Spark Session 23 – Joins (inner, outer, left, right, left semi, left anti, cross, self)
18:58 -
[ClassRec] Spark Session 23 – Joins (inner, outer, left, right, left semi, left anti, cross, self)
39:20 -
[ClassRec] Spark Session 24 – Joins Revision
20:15 -
[Overview] Spark Session 25 – Window Functions | Ranking Functions
27:34 -
[ClassRec] Spark Session 25 – Window Functions | Ranking Functions
32:13 -
[Overview] Spark Session 26 – Window Analytical and Aggregate Functions
29:07 -
[ClassRec] Spark Session 26 – Window Aggregate Functions
21:56 -
[ClassRec] Spark Session 27 – Window Analytical Functions
17:10 -
[Overview] Spark Session 28 – Dealing With NULL Values
16:17 -
[Overview] Spark Session 29 – Dealing With Duplicate Records
05:18 -
[Overview] Spark Session 30 – Pivot and UnPivot
32:15 -
[Overview] Spark Session 31 – UDFs in PySpark
23:39 -
[ClassRec] Spark Session 31 – UDFs in PySpark
26:28
M5 – Spark Advanced – Optimization Techniques – Industry Scenarios
-
[Overview] Spark Session 32 – Cache vs Persist
55:26 -
[ClassRec] Spark Session 32 – Cache vs Persist
22:32 -
[ClassRec] Spark Session 32 – Cache vs Persist S2
01:05:28 -
[Overview] Spark Session 33 – Executor Memory Architecture
32:36 -
[ClassRec] Spark Session 33 – Executor Memory Architecture S1
55:16 -
[ClassRec] Spark Session 33 – Executor Memory Architecture S2
01:12:00 -
[Overview] Spark Session 34 – Adaptive Query Execution
45:30 -
[ClassRec] Spark Session 34 – Adaptive Query Execution
01:10:32 -
[Overview] Spark Session 35 – Join Strategies in PySpark
52:34 -
[ClassRec] Spark Session 35 – Join Strategies – Broadcast Join
50:52 -
[ClassRec] Spark Session 35 – Join Strategies – Shuffle Hash Join
01:02:44 -
[ClassRec] Spark Session 35 – Join Strategies – Sort Merge Join and More
37:27 -
[ClassRec] Spark Session 36 – Resource Calculations For Spark Applications
01:13:12 -
[ClassRec] Spark Session 37 – Dynamic Resource Allocation
30:59 -
[ClassRec] Spark Session 38 – Garbage Collection Tuning
01:04:26 -
[ClassRec] Spark Session 39 – Handling Data Skew S1
44:36 -
[Overview] Spark Session 40 – Controlling Parallelism For Spark Applications
42:53 -
[ClassRec] Spark Session 40 – Controlling Parallelism For Spark Applications
56:38 -
[ClassRec] Spark Session 41 – Handling Data Skew S2
40:07 -
[ClassRec] Spark Session 42 – Design Level Optimizations
45:08 -
[ClassRec] Spark Session 43 – Out Of Memory Error – Speculative Execution – DPP
49:20
M6 – Full Stack Data Engineering using Databricks | Part 1
-
DBX_001_2025-12-15_082818_S01_workspace-resource-groups-managed-resource-group.mp4
48:59 -
DBX_002_2025-12-16_083118_S02_networking-fundamentals-1.mp4
01:11:09 -
DBX_003_2025-12-17_084919_S03_networking-fundamentals-2.mp4
55:44 -
DBX_004_2025-12-19_083447_S03_deploy-databricks.mp4
34:51 -
DBX_006_2025-12-20_084710_S05_workspace-basics.mp4
48:59 -
DBX_007_2025-12-21_085139_S06_dbutils-storage-local-vs-dbfs.mp4
54:52 -
DBX_008_2025-12-22_084744_S07_hands-on-formalities.mp4
01:00:07 -
DBX_009_2025-12-23_084954_S08_(FLOP)_blob-vs-adls-dbfs-vs-volumes.mp4
57:25 -
DBX_010_2025-12-25_084527_S09_azure-storage-fundamentals.mp4
01:04:49 -
DBX_011_2025-12-26_084003_S10_storage-options-hands-on.mp4
54:53 -
DBX_012_2025-12-27_084545_S11_unity-catalog-overview.mp4
57:31 -
DBX_013_2025-12-29_083503_S12_sp-accessing-adls-from-databricks.mp4
01:01:53 -
DBX_014_2025-12-30_081837_S13_mi-accessing-adls-from-databricks.mp4
40:29 -
DBX_015_2025-12-31_080348_S14_ak-sp-mi-process-adls-access.mp4
54:10 -
DBX_016_2026-01-01_082113_S15_PROJECT-incremental-ingestion-pipeline-1.mp4
01:04:52 -
DBX_017_2026-01-02_082355_S15_PROJECT-incremental-ingestion-pipeline-2.mp4
01:11:03 -
DBX_018_2026-01-03_082101_S15_PROJECT-incremental-ingestion-pipeline-3.mp4
01:09:50 -
DBX_019_2026-01-05_081929_S16_PROJECT-scd-type-1-implement.mp4
30:32 -
DBX_020_2026-01-06_081429_S17_PROJECT-scd-type-2-implementation.mp4
01:13:04 -
DBX_021_2026-01-09_082106_S18_unity-catalog-purpose.mp4
58:20 -
DBX_022_2026-01-10_082743_S19_data-cleaning-standardization-terminologies.mp4
01:10:27 -
DBX_023_2026-01-12_082017_S20_uc-managed-vs-non-managed-tables.mp4
59:55 -
DBX_024_2026-01-13_082148_S21_delta-tables-overview.mp4
01:27:41 -
DBX_025_2026-01-14_083615_S22_uc-essentials-catalog-schema-table.mp4
47:32 -
DBX_026_2026-01-16_083754_S23_s1-managed-and-non-managed-tables.mp4
01:06:02 -
DBX_027_2026-01-17_000000_PRJ_databricks-azure-project_B12_S13.mp4
01:01:08 -
DBX_028_2026-01-19_082047_S25_delta-tables-overview.mp4
52:45 -
DBX_029_2026-01-20_083429_S26_delta-table-anatomy-1.mp4
01:19:54 -
DBX_030_2026-01-21_084023_S27_delta-tables-mvcc-si-occ.mp4
43:15 -
DBX_031_2026-01-22_083850_S28_optimize-and-vacuum_01.mp4
01:09:16 -
DBX_033_2026-01-23_083143_S29_time-travel-and-cloning.mp4
01:04:54 -
DBX_034_2026-01-26_125446_S30_partitioning.mp4
01:02:06
PySpark DF Scenarios and Databricks Certification Practice
-
Session : 1
01:07:53 -
Session : 2
01:02:28 -
Session : 3
01:16:19 -
Session : 4
51:23 -
Session : 5
47:50 -
Session : 6
35:49 -
Session : 7
40:39 -
Session : 8
29:37 -
Practice Exercises and Solutions Document
00:00
Spark SQL Advanced
-
AQE | Cache VS Persist
01:48:43 -
Caching Doubts | Serialization Deserialization
53:10 -
Coalesce vs Partition Scenarios
01:32:36 -
Resource Calculations for Spark Application | DRA
01:40:09 -
Garbage Collection Tuning
48:57
Certification Dump Discussion
Deploying Spark Applications on AWS EMR
-
Deploying Spark Application in Client Mode
58:05
End To End PySpark Project
Course Material
-
Presentation
00:00
Kafka Sessions
-
Kafka Installation
18:04 -
Session 1
48:44 -
Session 2
39:55 -
Session 3
15:00 -
Session 4
19:09 -
Session 5
01:04:18