Ken Jee

Kaggle Project From Scratch Part 3


Notebook 1: https://www.kaggle.com/kenjee/kaggle-project-from-scratch

Notebook 2: https://www.kaggle.com/kenjee/analyzing-gender-and-earning-potential-in-tech


I do two main analyses in this video. First, I build more advanced graphs to compare differences in skills and characteristics across data science roles using the Kaggle developer survey data. I then create a function to make this task more scalable.
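
As a rough illustration of what such a reusable function might look like (a minimal sketch, not the code from the notebook), the example below computes, for each role, the percentage of respondents in each category of another column. The function name `share_by_role`, the column names, and the toy data are all assumptions made up for this example.

```python
import pandas as pd


def share_by_role(df, role_col, feature_col):
    """Return the percentage of respondents per feature category within each role.

    Hypothetical helper: each row of the result sums to 100, so roles with
    very different sample sizes can be compared on the same scale.
    """
    return pd.crosstab(df[role_col], df[feature_col], normalize="index") * 100


# Toy data standing in for survey responses (made up, not real survey rows).
survey = pd.DataFrame(
    {
        "role": [
            "Data Scientist", "Data Scientist",
            "Data Analyst", "Data Analyst", "Data Analyst",
            "Data Engineer",
        ],
        "language": ["Python", "R", "SQL", "Python", "SQL", "Python"],
    }
)

shares = share_by_role(survey, "role", "language")
print(shares.round(1))
```

Because the result is an ordinary DataFrame, calling `shares.plot(kind="bar")` should draw one group of bars per role when matplotlib is installed, and the same function can be reused for any other pair of columns.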


In the second part of the video, I use a few different techniques to determine whether there is gender bias in data science, particularly relating to salary.


In the notebook I:

- First visualize and normalize gender differences in the sample

- Run a multiple linear regression to understand which factors contribute most to earning potential

- Run a lasso regression to narrow the variable set and try to quantify the extent to which gender impacts earning potential

- Run a random forest on the same data to evaluate feature importance (a nonlinear model like this is a good check)

- Compare models fit on just the subsets of women and men, to hopefully control for more variables
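
The modeling steps above can be sketched with scikit-learn roughly as follows. This is a minimal sketch on synthetic data, not the notebook's code or its results: the feature names, the planted coefficients, and the `alpha` value are all assumptions chosen for illustration. A binary `group` column stands in for a variable of interest (gender, in the notebook), and a `noise` column has no true effect, so it shows how each model treats an irrelevant feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data with planted effects (assumption: nothing here is survey data).
rng = np.random.default_rng(0)
n = 500
features = ["years_experience", "is_manager", "group", "noise"]
X = np.column_stack(
    [
        rng.uniform(0, 20, n),   # years of experience
        rng.integers(0, 2, n),   # manager flag (0 or 1)
        rng.integers(0, 2, n),   # binary group indicator (0 or 1)
        rng.normal(0, 1, n),     # irrelevant feature
    ]
)
y = 50 + 4 * X[:, 0] + 15 * X[:, 1] + 8 * X[:, 2] + rng.normal(0, 5, n)

# Standardize so coefficient sizes are comparable and the lasso penalty
# treats every feature on the same scale.
X_std = StandardScaler().fit_transform(X)

ols = LinearRegression().fit(X_std, y)
lasso = Lasso(alpha=1.0).fit(X_std, y)  # L1 penalty shrinks weak coefficients toward zero
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_std, y)

for name, b_ols, b_lasso, imp in zip(
    features, ols.coef_, lasso.coef_, forest.feature_importances_
):
    print(f"{name:>18}  ols={b_ols:6.2f}  lasso={b_lasso:6.2f}  rf_importance={imp:.3f}")

# Fit a separate linear model per group on the remaining features (raw scale),
# so the other coefficients can be compared between the two subsets.
other_cols = [0, 1, 3]
subset_models = {}
for g in (0, 1):
    mask = X[:, 2] == g
    subset_models[g] = LinearRegression().fit(X[mask][:, other_cols], y[mask])
    print(f"group={g}  coefficients={subset_models[g].coef_.round(2)}")
```

Comparing the three columns of output side by side is the point of the exercise: if a linear model, a penalized linear model, and a nonlinear model all rank a feature similarly, that is some reassurance that the ranking is not an artifact of one model's assumptions.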


Part 1: https://www.youtube.com/watch?v=r-DR9HBaipU&ab_channel=KenJee

Part 2: https://www.youtube.com/watch?v=KQ80oD_boBM&feature=youtu.be&ab_channel=KenJee
