Ranking College Football Coaches
Background
I’m an avid college football fan with a keen interest in data science. For my first project in the field, I will use some basic statistics and data manipulation to evaluate coaching in college football.
Defining the Problem
Football is the crown jewel and breadwinner in any collegiate athletic department. A successful football program is not only a source of revenue for the university but also a powerful marketing tool: studies have shown winning to be associated with increased donations, a stronger academic reputation, and lower acceptance rates. It’s no surprise, then, that coaches wield significant power and command lofty salaries.
Coaches also carry the burden of high expectations and poor job security, with an average tenure of only 3.8 years. While the decision to hire or fire a coach depends on many factors, it is driven primarily by the coach’s record. Consequently, a coach’s value, particularly with donors and fans, is inextricably tied to performance on the field.
However, wins and losses are driven by several factors, many of which are outside a coach’s control, including conference affiliation, prestige, location, historical success, and financial investment from the university. As such, a coach’s record alone is not an objective measure of their value. In this project, we set out to define an approach for evaluating coaches that minimizes the impact of these confounding factors. Let’s get started!
Continue reading on Medium: Using Data Science to Evaluate Recruiting and Player Development in College Football
Code can be downloaded at: https://github.com/arsakhar/NCAAF
Technical Skills
- Data Visualization
  - Histograms, scatterplots, Q-Q plots, boxplots
- Data Scraping
  - Scraping data from four different websites
- Data Cleaning
  - Removing empty rows in a dataframe
  - Removing non-numeric / NaN rows in a dataframe
  - Filtering a dataframe by a specific keyword or attribute
- Data Manipulation
  - Joining dataframes
  - Transforming values in a dataframe column
  - Applying a function elementwise to a dataframe column
  - Filtering a dataframe based on a specific attribute or keyword
  - Aggregating on a dataframe column (standard deviation, mean, min, max, sum, count)
- Statistical Analysis
  - Shapiro-Wilk test for normality
  - Kruskal-Wallis non-parametric test for group differences
  - Dunn post-hoc testing for pairwise comparisons
  - Box-Cox transformation for reshaping a skewed distribution toward a Gaussian
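To illustrate the statistical tests listed above, here is a minimal sketch on synthetic data using SciPy. It is not taken from the project code: the group names and values are made up for the example, and the data is generated to be right-skewed and strictly positive (a requirement of Box-Cox). Dunn's post-hoc test is left out because SciPy does not provide it; a third-party package such as scikit-posthocs offers a `posthoc_dunn` function for that step.

```python
import numpy as np
from scipy import stats

# Synthetic data only: three hypothetical groups of right-skewed,
# strictly positive values (these are NOT the project's real data).
rng = np.random.default_rng(0)
groups = {
    "group_a": rng.lognormal(mean=0.0, sigma=0.5, size=40),
    "group_b": rng.lognormal(mean=0.3, sigma=0.5, size=40),
    "group_c": rng.lognormal(mean=0.6, sigma=0.5, size=40),
}

# Shapiro-Wilk: a small p-value suggests the sample is not normally distributed.
for name, values in groups.items():
    w_stat, p_value = stats.shapiro(values)
    print(f"{name}: Shapiro-Wilk W={w_stat:.3f}, p={p_value:.4f}")

# Kruskal-Wallis: rank-based test for differences between groups,
# which does not assume normality.
h_stat, kw_p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis H={h_stat:.3f}, p={kw_p:.4f}")

# Box-Cox: power transform that pulls a skewed, positive sample toward a Gaussian.
# With no lambda supplied, SciPy estimates it by maximum likelihood.
pooled = np.concatenate(list(groups.values()))
transformed, fitted_lambda = stats.boxcox(pooled)
print(f"Box-Cox lambda={fitted_lambda:.3f}")
print(f"skew before={stats.skew(pooled):.3f}, after={stats.skew(transformed):.3f}")
```

If the Kruskal-Wallis test indicates a difference between groups, pairwise post-hoc comparisons (such as Dunn's test) can then show which groups differ.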
Packages
- Beautiful Soup
- SciPy
- Matplotlib
- Pandas
- NumPy
- Scikit
- StatsModels