# The data and the model

Let’s say you want to predict the number of points the 49ers will score next week. As a sports fan, what do you do?

You start thinking about their opponent, whether it’s a home game, their record, the QB rating, power rankings, etc. A computer does much the same thing - the difference is that a computer can often pick up on subtle relationships between these kinds of factors that humans just can’t process. Perhaps the number of rushing yards last week is directly proportional to the number of passing yards next week - or playing an away game directly after a bye week is actually a recipe for success! These two relationships are made up for the sake of example, but given enough data, computers can often find relationships like these more reliably than we can as humans.

The foundation of these predictions is the quality and quantity of the data, as well as the statistics and probability working behind the scenes, which we call a model. Let’s define these terms more formally:

Data is simply facts, statistics, or information. It can be numeric, categorical, text, images, or video - almost everything has data attached to it!

A model is simply a standardized set of patterns and relationships between data. For example, if you look at the relationship between height and weight, typically the taller someone is, the more they weigh [I stress “typically” because, though not always true, the relationship between these two variables is a sort of model to follow].

So, in order to create a model that works well, we need to “teach” a computer about the relationships that exist between our variables. There are a lot of ways to do this in machine learning, including regression, decision trees, neural networks, etc. At the end of the day, they all try to accomplish the same thing: to predict an outcome given inputs.

For our height and weight example, let’s say we have two data points:

• Amber is 5 feet tall and weighs 100 pounds

• Bob is 6 feet tall and weighs 200 pounds

A computer might look at this data and say - “okay, when someone is 1 foot taller, they weigh 100 pounds more!” Great, that is a reasonable relationship the computer came up with given the data. So when you ask the computer, “how much will someone weigh if they are 7 feet tall?” the computer will tell you that they will weigh 300 pounds. Not a bad prediction.
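To make that reasoning concrete, here is a minimal Python sketch of the same arithmetic. It assumes the simplest possible model, a straight line through the two data points; the function names `fit_line` and `predict_weight` are made up for illustration and are not part of any library.

```python
# Two data points from the example: (height in feet, weight in pounds)
amber = (5.0, 100.0)
bob = (6.0, 200.0)


def fit_line(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # pounds gained per extra foot of height
    intercept = y1 - slope * x1     # weight the line implies at height 0
    return slope, intercept


def predict_weight(height, slope, intercept):
    """Apply the fitted line to a new height."""
    return slope * height + intercept


slope, intercept = fit_line(amber, bob)
prediction = predict_weight(7.0, slope, intercept)

print(f"Each extra foot adds {slope:.0f} pounds")
print(f"Predicted weight at 7 feet: {prediction:.0f} pounds")
```

With only two points, the line passes through both of them exactly, which is why the computer’s rule looks so tidy here.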

However, let’s introduce another data point:

• Connor is 5 feet tall and weighs 150 pounds.

Now the computer has to change its guess - it now has data on two people that are 5 feet tall with different weights, and this could throw off the accuracy. One way to help the computer create better predictions is to give it more data! If we tell the computer that Amber is a woman and both Bob and Connor are men, the computer now has additional data to try to predict new weights based on these two variables.
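Here is a rough sketch of what that could look like in Python, using only the three people from the example. It assumes a linear model fitted by least squares, which is just one of many possible choices, and the helper names (`fit_height_only`, `solve`) and the 0/1 encoding of the man/woman variable are illustrative assumptions.

```python
from fractions import Fraction

# (name, height in feet, is_male as 0/1, weight in pounds) from the example
people = [
    ("Amber", 5, 0, 100),
    ("Bob", 6, 1, 200),
    ("Connor", 5, 1, 150),
]


def fit_height_only(rows):
    """Least-squares line: weight = slope * height + intercept."""
    n = len(rows)
    mean_h = Fraction(sum(r[1] for r in rows), n)
    mean_w = Fraction(sum(r[3] for r in rows), n)
    sxy = sum((r[1] - mean_h) * (r[3] - mean_w) for r in rows)
    sxx = sum((r[1] - mean_h) ** 2 for r in rows)
    slope = sxy / sxx
    return slope, mean_w - slope * mean_h


def solve(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Model 1: height only. It has to compromise between Amber and Connor.
slope, intercept = fit_height_only(people)
guess_at_5ft = slope * 5 + intercept

# Model 2: weight = base + per_foot * height + male_bonus * is_male
design = [[1, height, is_male] for _, height, is_male, _ in people]
weights = [weight for *_, weight in people]
base, per_foot, male_bonus = solve(design, weights)

print(f"Height only: a 5-foot person is predicted at {guess_at_5ft} pounds")
print(f"Height and sex: {per_foot} lb per foot, plus {male_bonus} lb for men")
```

With height alone, the model splits the difference and guesses 125 pounds for anyone who is 5 feet tall, missing both Amber and Connor by 25 pounds. With the second variable, it can tell them apart. Keep in mind that three people and three unknowns will always fit perfectly, so this shows the idea rather than a model you should trust.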

This example can easily be extended to our football example. We are trying to predict NFL scores using team statistics, so we teach the computer the relationships between statistics and score outcomes so that the model can give its best guess for the score.

That is the foundation of machine learning and statistical analysis - input data and get the best prediction possible. The better the predictions, the better the model, and the more value it can create!

In the example above, when we gave the model more data about Amber, Bob and Connor, the computer actually learned and improved its predictions. This is analogous to the model being fed the new team statistics that I pull each week. In theory, as the model is given new data each week, it can learn more from this information and create more accurate predictions.

Of course, this is all based on statistics and probability! There is always a chance that a set of predictions is almost exactly right or wildly off. At the end of the day, there is so much randomness, and there are so many factors that contribute to the outcomes of NFL games, that they cannot be predicted with 100% accuracy. But it’s about finding that slight edge and making something happen with it.