The Signal and the Noise: Breaking Down NBA Box Scores (Part 3)

Now it is time to conclude this three-part story arc with an analysis of NBA centers.  I looked at all centers (some of whom are also listed as “Forward-Centers”) who averaged at least 20 minutes per game during the 2013 regular season, a sample of 36 players.

Understanding Key Relationships

As I started exploring the correlations between key player stats (e.g., field goal percentage, points), I decided to first build some intuition about how strongly each stat influences plus/minus scores for centers.  I started with a multiple linear regression, using plus/minus score as the response variable and all of the key player stats as explanatory variables that could potentially influence a player’s plus/minus score.  For those interested in the specific details of this analysis, please click here.
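For readers who want to experiment with this kind of model, here is a minimal sketch of an ordinary least squares fit in plain Python.  It is not the code behind the analysis above: the player numbers are hypothetical, the function names are my own, and it only estimates coefficients (it does not compute the significance tests discussed in this post).

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one list of explanatory values per player; y: the responses.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("inputs are collinear; X'X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residual_sum_of_squares(rows, y, beta):
    """Sum of squared differences between observed and fitted responses."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    return sum((yi - p) ** 2 for yi, p in zip(y, preds))


# Hypothetical per-game numbers: (blocks, seasons with team, turnovers).
stats = [(2.4, 6, 3.0), (0.6, 1, 1.1), (1.5, 3, 2.2),
         (1.0, 7, 2.9), (2.0, 2, 1.4), (0.4, 4, 1.8)]
plus_minus = [4.1, -2.0, 1.2, 2.5, 0.8, -1.6]  # hypothetical responses

beta = fit_ols(stats, plus_minus)
print("coefficients:", beta)
print("RSS:", residual_sum_of_squares(stats, plus_minus, beta))
```

In practice a statistics package would also report standard errors and p-values for each coefficient, which is what the significance discussion in this post relies on.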

From this model, I found that blocks, number of seasons with current team, and turnovers were the only statistically significant inputs for the regression.  In other words, these three stats carry most of the weight in determining plus/minus scores for centers.

Given that there were only three statistically significant inputs for the multiple regression model used to predict plus/minus scores, I was curious how a model with only blocks, number of seasons with current team, and turnovers as inputs would fare in comparison.  Let’s call the original model (with 12 inputs) the full model, and the other model (with just 3 inputs) the reduced model.  These two models are nested: every term in the reduced model also appears in the full model, and the full model has at least one additional term (in this case, 9 more).

So how do we decide whether the more complex (full) model contributes additional information about the relationship between plus/minus scores and the player stat inputs?  Luckily, since we are dealing with two “nested” models with the same response variable (plus/minus score), we can use an F-test.  The p-value from this comparison was 0.12 (larger than our previously determined alpha-level of 0.10), so we cannot conclude that the 9 extra inputs improve the fit.  In other words, the reduced model is NOT significantly worse than the full model.
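To make the comparison concrete, here is a rough sketch of how the F statistic for two nested regression models is computed from their residual sums of squares (RSS).  The sample size (36) and input counts (3 and 12) come from this post, but the RSS values below are hypothetical placeholders, not the numbers from my models, and the function name is my own.

```python
def nested_f_statistic(rss_reduced, rss_full, n_obs, k_reduced, k_full):
    """F statistic for comparing nested OLS models that both have an intercept.

    k_reduced and k_full count explanatory variables, excluding the intercept.
    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    df_num = k_full - k_reduced      # number of extra terms being tested
    df_den = n_obs - k_full - 1      # residual df of the full model
    if df_num <= 0 or df_den <= 0:
        raise ValueError("full model needs more terms and n_obs > k_full + 1")
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den


# 36 players, 3 vs. 12 inputs (from the post); RSS values are hypothetical.
f_stat, df_num, df_den = nested_f_statistic(
    rss_reduced=250.0, rss_full=150.0, n_obs=36, k_reduced=3, k_full=12
)
print(f_stat, df_num, df_den)
```

The p-value is the upper-tail probability of this statistic under an F distribution with those degrees of freedom.  If SciPy is installed, `scipy.stats.f.sf(f_stat, df_num, df_den)` should return it.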

Now, let’s try to understand why blocks, tenure with current team, and turnovers seem to have such a strong impact on plus/minus scores, while other key stats like field goal percentage and rebounds do not.

1. Blocks are out of this world. 

Though blocks have a moderately strong correlation with plus/minus scores (+0.41), they do not seem to trend strongly with other key box score stats.  Surprisingly, blocks have almost no correlation with points (+0.01), rebounds (+0.05), and turnovers (+0.03).  In addition, blocks have a weak negative correlation with assists (-0.21).
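The correlations quoted here are Pearson correlation coefficients.  As a minimal sketch (the per-game numbers are hypothetical, not the actual 2013 data), the calculation looks like this:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical blocks per game and plus/minus scores for five centers.
blocks = [2.4, 0.6, 1.5, 1.0, 2.0]
plus_minus = [4.1, -2.0, 1.2, 2.5, 0.8]
print(pearson_r(blocks, plus_minus))
```

A value near +1 or -1 means the two stats move together almost linearly, while a value near 0 (like blocks v. points above) means there is little linear relationship.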

This suggests that blocks seem to capture either a specific style of play or characteristic of a player that is absent in other core box score stats.  In addition, blocks and personal fouls have a slight positive correlation (+0.29).  Anecdotally, I have noticed that those who rack up more blocks tend to be more aggressive players, so this is not entirely surprising.  Here is a player profile looking at blocks v. plus/minus scores among big men in the league:


2. Turnovers and number of seasons with current team suggest a high-volume style of play.

Turnovers and team tenure are relatively strong signals for points, with correlations of +0.62 and +0.38, respectively.  Turnovers also trend strongly with assists (+0.58), average minutes played (+0.67), and fouls (+0.43).  Number of seasons with current team has a strong correlation with assists (+0.55) and offensive rebounds (+0.46).

This makes sense intuitively because those who have a long tenure with a team tend to be stars locked up in longer contracts.  This metric is also strongly correlated with the number of years that a player has been in the league, so experience could play a role.  It is also more likely for that player to have stronger chemistry with his teammates and coach.  In addition, players who play more minutes and get a large number of touches over the course of a game (generally the stars of a team) are more susceptible to turnovers.  Thus, these two inputs do not in themselves suggest a player’s value but they trend so closely with a combination of key stats that they end up being useful (though potentially confounding) factors.  Here are two player profiles looking at turnovers and team tenure v. plus/minus scores among big men in the league:

[Figure: turnovers v. plus/minus scores]    [Figure: team tenure v. plus/minus scores]

Visualizing Player Performance 

Given these interesting relationships (or lack thereof) between key box score stats, it is difficult to visualize individual player performance in an objective way.  Earlier in this post, I focused on understanding how key stats were moving plus/minus scores.  However, from my previous analyses on guards and forwards, it was clear that plus/minus scores in the NBA do have some clear shortcomings (think Mario Chalmers).

From the previous section of this post, we know that blocks, number of seasons with current team, and turnovers are the leading indicators for plus/minus scores among centers.  So, why don’t we focus on just these three player features and try to extrapolate player performance, independent of plus/minus scores?

Since these stats are highly variable from player to player, I decided to “cluster,” or put players into groups based on how similar they are across the three aforementioned dimensions.  (For a quick primer on the k-means clustering approach I used, click here.)

I ended up with four clusters (shown below).  I color-coded each player according to whether he was an All-Star in 2013 (green), played in the 2013 Playoffs (beige), or fell into neither group (“others,” in red).





Cluster A is a group of players who tend to have: HIGH blocks, HIGH turnovers, HIGH team tenure.  This cluster has a majority of the centers who were 2013 All-Stars and the rest are strong players on playoff teams.  This is a set of some of the most talked about big men in the league.

Cluster B is a group of players who tend to have: LOW blocks, HIGH turnovers, HIGH team tenure.  LaMarcus Aldridge is the only 2013 All-Star in this cluster, but there are a handful of players who were in the 2013 Playoffs.  The “others” (red), for instance Kevin Love and DeMarcus Cousins, are pretty solid players who were not on playoff-contending teams last season.

Cluster C is a group of players who tend to have: MED blocks, LOW turnovers, MED team tenure.  Most of the players in this cluster are either up-and-coming stars or solid players who get decent minutes but are not stars.

Cluster D is a group of players who tend to have: LOW blocks, LOW turnovers, LOW team tenure.  With the exception of Tyson Chandler, who is a monster on the boards and an above-average shot blocker, most of the big men in this group were under-performers last season.

It is interesting that the clusters described above seem to put players into groups that are consistent with the conventional wisdom of an average fan.  What is even more surprising is that the inputs for the clustering were three unintuitive features.  The output from my clustering analysis gives me some more confidence about the importance of blocks, number of seasons with current team, and turnovers when evaluating the overall performance of a center.  These key features seem to provide a strong signal that is worth paying attention to.  They also provide a stroke of insight that can often be lost in all of the noise of a standard box score.


3 thoughts on “The Signal and the Noise: Breaking Down NBA Box Scores (Part 3)”

  1. Cool stuff. I am surprised that points did not make it in the top 3. Number of seasons a player is on a team is a hard one to wrap my head around. It is an interesting tracking variable for a descriptive analysis such as this, though.

  2. Really great analysis! I would not have thought to include a non-traditional metric like team tenure. My one hesitation is that team tenure is one of those things that is pretty confounding, as you mentioned, and it would probably not be useful to find undervalued players. I would be interested to see how you would pivot this to be predictive rather than descriptive.
