Adjustments to the Model (Nov 6, 2020)

After a tough week of upsets, I decided to make a few more improvements to the model.


So far this season, the model has not accounted for each team's coach. Coaching obviously matters, but I hadn't thought to add it because I was focused on gameplay stats and what happens on the field each game.


However, I had a call with a data-minded friend, and she asked whether I was controlling for the coach. My answer was... "No, but that's actually a super reasonable factor to add."


So I looked through some football sites and found that pro-football-reference lists each season's coaches in a format that is easy to scrape.


Check out the video below to see how I scraped the 2020 season; it shows how easy scraping can be. If there is enough interest in web scraping, I'm happy to make a more in-depth tutorial series on it (it's not as hard as it might seem).
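For readers who prefer code to video, here is a minimal sketch of the parsing half of the job, using only Python's standard-library `html.parser`. It assumes the season page holds a table whose `<tbody>` rows list the coach's name, team, and games coached, in that order. The column order, the class and function names, and the sample HTML in the usage line are all hypothetical, so check them against the real page before relying on this.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row, inside <tbody>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_body = False
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self._in_body = True
        elif tag == "tr" and self._in_body:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "tbody":
            self._in_body = False

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def parse_coaches(html):
    """Return (coach, team, games) tuples from one season's coaches table.

    ASSUMPTION: the first three columns are coach, team, games coached.
    """
    parser = TableRowParser()
    parser.feed(html)
    return [(r[0], r[1], int(r[2])) for r in parser.rows if len(r) >= 3]
```

Header rows in `<thead>` are skipped, so `parse_coaches("<table><tbody><tr><th>Coach A</th><td>TMA</td><td>4</td></tr></tbody></table>")` yields a single `("Coach A", "TMA", 4)` row.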


Once I could scrape the 2020 data, it was simple to alter the code to scrape every season since 1970.
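Going from one season to many is mostly a loop over years. The sketch below assumes a per-season URL pattern on pro-football-reference (`URL_TEMPLATE` is an assumption to verify, not a documented API) and takes the fetch and parse steps as parameters, so any one-season parser can be dropped in unchanged. `fetch_html`, `scrape_all_seasons`, and the three-second delay are hypothetical choices for illustration.

```python
import time
from urllib.request import Request, urlopen

# ASSUMPTION: per-season URL pattern for the coaches page; verify it first.
URL_TEMPLATE = "https://www.pro-football-reference.com/years/{year}/coaches.htm"


def fetch_html(url):
    """Download one page as text (some sites reject Python's default User-Agent)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def scrape_all_seasons(first, last, fetch, parse, delay=3.0):
    """Fetch and parse every season from `first` through `last`, inclusive.

    `fetch` maps a URL to HTML text and `parse` maps HTML text to rows.
    Returns a dict of {season: parsed rows}.
    """
    seasons = {}
    for year in range(first, last + 1):
        html = fetch(URL_TEMPLATE.format(year=year))
        seasons[year] = parse(html)
        if delay:
            time.sleep(delay)  # pause between requests to avoid hammering the site
    return seasons
```

A real run would look like `scrape_all_seasons(1970, 2020, fetch_html, parse)`, where `parse` is whatever function turns one season's page into rows. Passing `fetch` in as an argument also makes the loop easy to test without touching the network.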


With all the coaching data in hand, I then had to engineer the actual rows I would use; this is the data cleaning step. The raw data showed each coach's record for the season, and for teams with multiple coaches it listed the first coach for the games they coached and then the replacement for the remaining games. After one to two hours of tweaking, I had the data in a format that attached the coach's name to each game number of the season that they coached.
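A minimal sketch of that reshaping, assuming each coaching stint arrives as a hypothetical `(season, team, coach, games_coached)` tuple and that stints are listed in the order the coaches held the job. It hands out consecutive game numbers per team-season, so a mid-season replacement picks up where the previous coach left off. The function name and tuple layout are illustrative, not the exact code behind the model.

```python
from collections import defaultdict


def expand_to_games(stints):
    """Turn ordered coaching stints into one (season, team, game_number, coach) row per game.

    ASSUMPTION: stints for the same (season, team) appear in chronological order.
    """
    next_game = defaultdict(lambda: 1)  # next unassigned game number per (season, team)
    rows = []
    for season, team, coach, games in stints:
        key = (season, team)
        start = next_game[key]
        for game_number in range(start, start + games):
            rows.append((season, team, game_number, coach))
        next_game[key] = start + games
    return rows
```

For example, a team whose first coach lasted 2 games and whose replacement coached 3 ends up with games 1 and 2 assigned to the first coach and games 3 through 5 to the second.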


With this new clean dataset, I merged it into the existing dataset and retrained the model with the coaching information, and the model's prediction accuracy improved by over 2%! That may seem like a small number, but it really is huge: every percent of prediction accuracy helps.
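A minimal sketch of the merge, assuming the existing dataset is a list of per-game records with hypothetical `season`, `team`, and `game_number` fields, and the coach rows are `(season, team, game_number, coach)` tuples. It performs a left join: every game is kept, and games with no matching coach get `None` rather than being dropped.

```python
def attach_coaches(games, coach_rows):
    """Left-join coach names onto per-game records.

    games: list of dicts with 'season', 'team', 'game_number' keys
        (ASSUMPTION: hypothetical field names for the existing dataset).
    coach_rows: (season, team, game_number, coach) tuples.
    Returns new dicts with an added 'coach' key; the inputs are not modified.
    """
    lookup = {(s, t, g): c for s, t, g, c in coach_rows}
    merged = []
    for game in games:
        key = (game["season"], game["team"], game["game_number"])
        merged.append({**game, "coach": lookup.get(key)})
    return merged
```

Counting the `None` values after the merge is a quick sanity check; a miss often means the two sources spell a team differently or number games differently.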
