PySpark for Data Science - III: Data Cleaning and Analysis

  • Dive into the world of big data processing with PySpark, the Python library for Apache Spark.
  • Learn how to process, analyze, and derive insights from massive datasets using Python’s user-friendly interface.
  • Elevate your data skills with PySpark. Dive deep into distributed data processing, machine learning, streaming, and more to navigate the vast oceans of big data.

Created by Selva Prabhakaran

  • English

  • English captions

Validity Period: 365 days


  • Course Certificate
  • Code Walkthroughs
  • Practice Data
  • Money Back if not satisfied
  • Algorithms explanation
  • 2h 26m of Self-paced videos
  • Downloadable resources
  • Q&A sessions with experts

What you will learn

01 Identifying Variable Types

02 Outlier Detection and Treatment

03 Identifying and Removing Duplicates

04 Feature Encoding with PySpark

05 Missing Value Imputation with PySpark

06 Feature Scaling with PySpark

07 Feature Extraction / Dimensionality Reduction

Course Curriculum

6 Modules | 22 Sessions | 1 hour 15 min Total Time

Introduction to Data Preprocessing

Sessions: 4 | Time: 14 min

Outlier Detection and Treatment

Sessions: 6 | Time: 19 min

Missing Value Imputation

Sessions: 5 | Time: 15 min

Feature Encoding

Sessions: 2 | Time: 11 min

Feature Scaling

Sessions: 2 | Time: 5 min

Feature Extraction / Dimensionality Reduction

Sessions: 3 | Time: 11 min

Requirements

  • Basics of Python
  • Foundational knowledge of Data Science
  • High school maths

Who should attend this course?

  • Data Science Aspirants

  • Data Science Professionals

  • Professionals working with large datasets

  • Software/Data engineers interested in quantitative analysis

  • Data analysts, economists, researchers

Instructor

Selva Prabhakaran, Principal Data Scientist

My name is Selva, and I am super excited to mentor you on this project!
I head the Data Science team for a global Fortune 500 company, and over my 10 years in data science I’ve deployed 20+ global products. I’m also the Founder & Chief Author of Machine Learning Plus, which has over 4M annual readers.
I specialize in covering the in-depth intuition and maths behind any concept or algorithm. Based on requests from my existing students, I’ve put together this series of courses and projects with detailed explanations, much like an on-the-job experience. I hope you love it!
  • 4.5+ Instructor rating

  • 200+ reviews

  • 10K+ students

  • 15+ Courses
