Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Let’s look at the data. There is no comprehensive data source on the socioeconomics of NBA players. But by being data detectives, by utilizing data from a whole bunch of sources—basketball-reference.com, ancestry.com, the U.S. Census, and others—we can figure out what family background is actually most conducive to making the NBA. This study, you will note, uses a variety of data sources, some of them bigger, some of them smaller, some of them online, and some of them offline. As exciting as some of the new digital sources are, a good data scientist is not above consulting old-fashioned sources if they can help. The best way to get the right answer to a question is to combine all available data.

The first relevant data is the birthplace of every player. For every county in the United States, I recorded how many black and white men were born in the 1980s. I then recorded how many of them reached the NBA. I compared this to a county’s average household income. I also controlled for the racial demographics of a county, since—and this is a subject for a whole other book—black men are about forty times more likely than white men to reach the NBA.

The data tells us that a man has a substantially better chance of reaching the NBA if he was born in a wealthy county. A black kid born in one of the wealthiest counties in the United States, for example, is more than twice as likely to make the NBA as a black kid born in one of the poorest counties. For a white kid, the advantage of being born in one of the wealthiest counties rather than one of the poorest is 60 percent.
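To make the arithmetic behind a phrase like "an advantage of 60 percent" concrete, here is a minimal sketch of how two groups' rates of reaching the NBA can be compared. The function names and the counts in the example are hypothetical placeholders chosen only so the ratio comes out to 1.6; they are not figures from the study.

```python
def rate_per_million(nba_players: int, births: int) -> float:
    """NBA players per one million men born in a group of counties."""
    if births <= 0:
        raise ValueError("births must be positive")
    return 1_000_000 * nba_players / births


def percent_advantage(rate_a: float, rate_b: float) -> float:
    """How much higher rate_a is than rate_b, in percent."""
    return (rate_a / rate_b - 1.0) * 100.0


# Hypothetical counts, for illustration only (not data from the study).
wealthy = rate_per_million(nba_players=16, births=1_000_000)
poor = rate_per_million(nba_players=10, births=1_000_000)

print(f"ratio: {wealthy / poor:.2f}")
print(f"advantage: {percent_advantage(wealthy, poor):.0f} percent")
```

A ratio of 1.6 is what "a 60 percent advantage" means here, and "more than twice as likely" corresponds to a ratio above 2. Computing the rates separately for black and white men is one simple way to keep the comparison from being distorted by a county's racial makeup.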

This suggests, contrary to conventional wisdom, that poor men are actually underrepresented in the NBA. However, this data is not perfect, since many wealthy counties in the United States, such as New York County (Manhattan), also include poor neighborhoods, such as Harlem. So it’s still possible that a difficult childhood helps you make the NBA. We still need more clues, more data.

So I investigated the family backgrounds of NBA players. This information was found in news stories and on social networks. This methodology was quite time-consuming, so I limited the analysis to the one hundred African-American NBA players born in the 1980s who scored the most points. Compared to the average black man in the United States, NBA superstars were about 30 percent less likely to have been born to a teenage mother or an unwed mother. In other words, the family backgrounds of the best black NBA players also suggest that a comfortable background is a big advantage for achieving success.

That said, neither the county-level birth data nor the family background of a limited sample of players gives perfect information on the childhoods of all NBA players. So I was still not entirely convinced that two-parent, middle-class families produce more NBA stars than single-parent, poor families. The more data we can throw at this question, the better.

Then I remembered one more data point that can provide telling clues to a man’s background. It was suggested in a paper by two economists, Roland Fryer and Steven Levitt, that a black person’s first name is an indication of his socioeconomic background. Fryer and Levitt studied birth certificates in California in the 1980s and found that, among African-Americans, poor, uneducated, and single moms tend to give their kids different names than do middle-class, educated, and married parents.

Kids from better-off backgrounds are more likely to be given common names, such as Kevin, Chris, and John. Kids from difficult homes in the projects are more likely to be given unique names, such as Knowshon, Uneek, and Breionshay. African-American kids born into poverty are nearly twice as likely to have a name that is given to no other child born in that same year.
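The notion of a "unique name" used here, a name given to no other child born in the same year, is easy to state in code. The sketch below is only an illustration: the function name is made up, and the tiny list of names is a hypothetical cohort built from the examples above, not the California birth-certificate data.

```python
from collections import Counter


def unique_name_share(names: list[str]) -> float:
    """Share of children in one birth year whose first name no other child shares."""
    if not names:
        raise ValueError("names must not be empty")
    counts = Counter(names)
    unique_children = sum(1 for name in names if counts[name] == 1)
    return unique_children / len(names)


# Hypothetical one-year cohort, for illustration only.
cohort = ["Kevin", "Kevin", "Chris", "Chris", "John", "John", "Uneek", "Knowshon"]
print(unique_name_share(cohort))  # 2 of the 8 children have a one-of-a-kind name
```

Running the same calculation on two groups, such as children born into poverty and children from better-off families, and comparing the two shares is the kind of comparison behind a claim like "nearly twice as likely."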

So what about the first names of black NBA players? Do they sound more like middle-class or poor blacks? Looking at the same time period, California-born NBA players were half as likely to have unique names as the average black male, a statistically significant difference.

Know someone who thinks the NBA is a league for kids from the ghetto? Tell him to just listen closely to the next game on the radio. Tell him to note how frequently Russell dribbles past Dwight and then tries to slip the ball past the outstretched arms of Josh and into the waiting hands of Kevin. If the NBA really were a league filled with poor black men, it would sound quite different. There would be a lot more men with names like LeBron.

Now, we have gathered three different pieces of evidence—the county of birth, the marital status of the mothers of the top scorers, and the first names of players. No source is perfect. But all three support the same story. Better socioeconomic status means a higher chance of making the NBA. The conventional wisdom, in other words, is wrong.



Among all African-Americans born in the 1980s, about 60 percent had unmarried parents. But I estimate that among African-Americans born in that decade who reached the NBA, a significant majority had married parents. In other words, the NBA is not composed primarily of men with backgrounds like that of LeBron James. There are more men like Chris Bosh, raised by two parents in Texas who cultivated his interest in electronic gadgets, or Chris Paul, the second son of middle-class parents in Lewisville, North Carolina, whose family joined him on an episode of Family Feud in 2011.

The goal of a data scientist is to understand the world. Once we find the counterintuitive result, we can use more data science to help us explain why the world is not as it seems. Why, for example, do middle-class men have an edge in basketball relative to poor men? There are at least two explanations.

First, because poor men tend to end up shorter. Scholars have long known that childhood health care and nutrition play a large role in adult health. This is why the average man in the developed world is now four inches taller than a century and a half ago. Data suggests that Americans from poor backgrounds, due to weaker early-life health care and nutrition, are shorter.

Data can also tell us the effect of height on reaching the NBA. You undoubtedly intuited that being tall can be of assistance to an aspiring basketball player. Just contrast the height of the typical ballplayer on the court to the typical fan in the stands. (The average NBA player is 6’7”; the average American man is 5’9”.)

How much does height matter? NBA players sometimes fib a little about their height, and there is no listing of the complete height distribution of American males. But working with a rough mathematical estimate of what this distribution might look like and the NBA’s own numbers, it is easy to confirm that the effects of height are enormous—maybe even more than we might have suspected. I estimate that each additional inch roughly doubles your odds of making it to the NBA. And this is true throughout the height distribution. A 5’11” man has twice the odds of reaching the NBA as a 5’10” man. A 6’11” man has twice the odds of reaching the NBA as a 6’10” man. It appears that, among men less than six feet tall, only about one in two million reach the NBA. Among those over seven feet tall, I and others have estimated, something like one in five reach the NBA.
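The doubling rule compounds quickly, which is why the gap between short and tall men ends up so large. The sketch below simply applies "each additional inch roughly doubles your odds" across a range of heights. It assumes the factor of two holds exactly at every height, which is a simplification of the rough estimate above, and the function names are illustrative.

```python
def to_inches(feet: int, inches: int) -> int:
    """Convert a height such as 5'10" to total inches."""
    return feet * 12 + inches


def relative_odds(shorter_in: int, taller_in: int, factor_per_inch: float = 2.0) -> float:
    """Odds of the taller man reaching the NBA relative to the shorter man,
    assuming the odds multiply by factor_per_inch for every extra inch."""
    return factor_per_inch ** (taller_in - shorter_in)


print(relative_odds(to_inches(5, 10), to_inches(5, 11)))  # one inch: 2.0
print(relative_odds(to_inches(5, 10), to_inches(6, 10)))  # twelve inches: 2**12 = 4096.0
```

Under this assumption, a full foot of extra height multiplies the odds by about four thousand. The group figures quoted above (one in two million, one in five) average over many heights, so they should not be expected to follow from this formula exactly.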

Seth Stephens-Davidowitz's books