PySpark for Data Science - V: ML Pipelines

  • Dive into the world of big data processing with PySpark, the Python library for Apache Spark.
  • Learn how to process, analyze, and derive insights from massive datasets using Python’s user-friendly interface.
  • Elevate your data skills with PySpark. Dive deep into distributed data processing, machine learning, streaming, and more to navigate the vast oceans of big data.


Created by Selva Prabhakaran

  • English

  • English captions

Validity Period: 365 days


  • Course Certificate
  • Code Walkthroughs
  • Practice Data
  • Money back if not satisfied
  • Algorithm explanations
  • 2h 26m of self-paced videos
  • Downloadable resources
  • Q&A sessions with experts

What you will learn

01 PySpark Decision Tree
02 PySpark Logistic Regression
03 PySpark Random Forest
04 PySpark Gradient Boost
05 PySpark XGBoost

Course Curriculum

5 Modules | 18 Sessions | 1 hour 10 min total time

Recap Decision Trees
Sessions: 3 | Time: 16 min

Build Decision Trees in PySpark
Sessions: 8 | Time: 19 min

Tuning the Tree with Pipelines
Sessions: 1 | Time: 12 min

Self Assessment
Sessions: 3 | Time: 5 min

XGBoost Model Using PySpark
Sessions: 3 | Time: 18 min

Requirements

  • Basics of Python
  • Foundational knowledge of Data Science
  • High school maths

Who should attend this course?

  • Data Science Aspirants

  • Data Science Professionals

  • Professionals working with large datasets

  • Software/Data engineers interested in quantitative analysis

  • Data analysts, economists, researchers

Instructor

Selva Prabhakaran, Principal Data Scientist

My name is Selva, and I am super excited to mentor you in this course!
I head the Data Science team at a global Fortune 500 company, and over my 10 years in data science I've deployed 20+ global products. I'm also the Founder & Chief Author of Machine Learning Plus, which has over 4M annual readers.
I specialize in explaining the in-depth intuition and maths behind each concept or algorithm. Based on requests from my students, I've put together this series of courses and projects with detailed explanations, much like on-the-job experience. Hope you love it!

  • 4.5+ instructor rating

  • 200+ reviews

  • 10K+ students

  • 15+ courses
