PySpark for Data Science - Fundamentals

  • Dive into the world of big data processing with PySpark, the Python API for Apache Spark.
  • Learn how to process, analyze, and derive insights from massive datasets using Python’s user-friendly interface.
  • Elevate your data skills with PySpark: go deep into distributed data processing, machine learning, streaming, and more to navigate the vast oceans of big data.


Created by Selva Prabhakaran

  • English

  • English captions

Validity Period: 365 days

  • Course Certificate
  • Code Walkthroughs
  • Practice Data
  • Money Back if not satisfied
  • Algorithms explanation
  • 2h 26m of Self-paced videos
  • Downloadable resources
  • Q&A sessions with experts

What you will learn

  • 01 Introduction to PySpark
  • 02 PySpark Statistics
  • 03 PySpark Data Cleaning and Processing
  • 04 PySpark MLlib Models

Course Curriculum

5 Modules | 25 Sessions | 2 hours 5 min Total Time

  • Introduction to PySpark
    Sessions: 3 | Time: 17 min

  • The Spark session and Spark Dataframes
    Sessions: 5 | Time: 38 min

  • PySpark Data Wrangling Techniques
    Sessions: 5 | Time: 15 min

  • Aggregation and custom methods
    Sessions: 5 | Time: 34 min

  • Joins and Pivoting
    Sessions: 6 | Time: 14 min

Requirements

  • Basics of Python
  • Foundational knowledge of Data Science
  • High school maths

Who should attend this course?

  • Data Science Aspirants

  • Data Science Professionals

  • Professionals working with large datasets

  • Software/Data engineers interested in quantitative analysis

  • Data analysts, economists, researchers

Instructor

Selva Prabhakaran, Principal Data Scientist

My name is Selva, and I am super excited to mentor you on this project!
I head the Data Science team for a global Fortune 500 company, and over my 10 years in data science I’ve deployed 20+ global products. I’m also the Founder & Chief Author of Machine Learning Plus, which has over 4M annual readers.
I specialize in covering the in-depth intuition and maths behind any concept or algorithm. Based on requests from my existing students, I’ve put together this series of courses and projects with detailed explanations, much like on-the-job experience. Hope you love it!
  • 4.5+ Instructor rating

  • 200+ reviews

  • 10K+ students

  • 15+ Courses
