Ken Jee

Kaggle Project From Scratch Part 3

Notebook 1:

Notebook 2:

I do two main analyses in this video. First, I build more advanced graphs to compare skills and characteristics across data science roles using the Kaggle developer survey data. I then create a function to make this comparison more scalable.
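Wrapping the role comparison in a function can be sketched as below. This is a minimal pandas sketch, not the notebook's code. It assumes each multi-select answer is stored as its own column that holds a value when the option was selected and is empty otherwise. The function name `share_by_role` and all column names are hypothetical.

```python
import pandas as pd


def share_by_role(df: pd.DataFrame, role_col: str, option_cols: list) -> pd.DataFrame:
    """Percent of respondents in each role who selected each option.

    Assumes (hypothetically) one column per option, non-null when selected.
    """
    # Turn each option column into a 0/1 indicator of "selected".
    selected = df[option_cols].notna().astype(int)
    # Average within each role so roles of different sizes are comparable.
    return selected.groupby(df[role_col]).mean().mul(100).round(1)


# Hypothetical toy survey: two roles, three multi-select options.
survey = pd.DataFrame({
    "role": ["Data Scientist", "Data Scientist", "Data Analyst", "Data Analyst"],
    "Python": ["Python", "Python", None, "Python"],
    "SQL": [None, "SQL", "SQL", "SQL"],
    "R": ["R", None, None, None],
})
table = share_by_role(survey, "role", ["Python", "SQL", "R"])
print(table)
```

Taking the mean of the indicators divides by role size, which is what lets roles with very different respondent counts sit side by side. The resulting table could then be charted with `table.T.plot(kind="bar")` if matplotlib is installed.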

In the second part of the video, I use a few different techniques to determine whether there is gender bias in data science, particularly in salary.

In the notebook I:

- First visualize and normalize gender differences in the sample

- Run a multiple linear regression to understand which factors contribute most to earning potential

- Run a lasso regression to narrow the variable set and try to quantify the extent to which gender impacts earning potential

- Run a random forest on the same data to evaluate feature importance (a nonlinear model like this is a good check on the linear models)

- Compare models fit separately on the subsets of women and men to try to control for more variables
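Normalizing within each gender matters when the sample contains many more men than women, because raw counts then mostly reflect group size. A minimal pandas sketch, using hypothetical toy data and hypothetical column names (`gender`, `salary_band`), where `pd.crosstab(..., normalize="index")` turns counts into within-group shares:

```python
import pandas as pd

# Hypothetical toy sample with an unbalanced gender split.
respondents = pd.DataFrame({
    "gender": ["Man"] * 6 + ["Woman"] * 2,
    "salary_band": ["<50k", "<50k", "50-100k", "50-100k", "100k+", "100k+",
                    "<50k", "50-100k"],
})

# Raw counts: dominated by the larger group.
counts = pd.crosstab(respondents["gender"], respondents["salary_band"])
# Shares within each gender: every row sums to 1, so groups are comparable.
shares = pd.crosstab(respondents["gender"], respondents["salary_band"],
                     normalize="index")
print(counts, shares.round(2), sep="\n\n")
```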
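The modeling steps in the list can be sketched with scikit-learn. This is an illustrative sketch on synthetic data, not the author's notebook: the feature names, effect sizes, and the Lasso `alpha` are assumptions. Salary is generated with no gender effect, so the output shows what each method reports when the true effect is zero.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for cleaned survey features (hypothetical columns and effects).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "years_experience": rng.integers(0, 20, n),
    "is_woman": rng.integers(0, 2, n),
    "uses_cloud": rng.integers(0, 2, n),
    "noise_feature": rng.normal(size=n),
})
# By construction, salary depends only on experience and cloud use (no gender effect).
y = (40_000 + 5_000 * X["years_experience"] + 10_000 * X["uses_cloud"]
     + rng.normal(0, 5_000, n))

# 1) Multiple linear regression: coefficients are dollars per unit of each feature.
ols = LinearRegression().fit(X, y)
ols_coefs = pd.Series(ols.coef_, index=X.columns)

# 2) Lasso on standardized features: the L1 penalty shrinks weak coefficients toward 0.
lasso = make_pipeline(StandardScaler(), Lasso(alpha=1_000)).fit(X, y)
lasso_coefs = pd.Series(lasso[-1].coef_, index=X.columns)

# 3) Random forest: impurity-based importances as a nonlinear cross-check.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(forest.feature_importances_, index=X.columns)

# 4) Separate models per gender: compare the remaining coefficients across subsets.
subset_coefs = {}
for label, mask in {"men": X["is_woman"] == 0, "women": X["is_woman"] == 1}.items():
    features = X.loc[mask].drop(columns="is_woman")
    model = LinearRegression().fit(features, y[mask])
    subset_coefs[label] = pd.Series(model.coef_, index=features.columns)
subset_table = pd.DataFrame(subset_coefs)

print(ols_coefs.round(0), lasso_coefs.round(0), importances.round(3),
      subset_table.round(0), sep="\n\n")
```

Standardizing before the lasso puts all coefficients on one scale, so the penalty does not favor a feature just because of its units. Impurity-based random forest importances can be biased toward high-cardinality features such as a continuous noise column, so permutation importance (`sklearn.inspection.permutation_importance`) is a common alternative check.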

Part 1:

Part 2:
