NBA Data Wants To Be Public

The NBA is making my life difficult. Yet again, they’re making it harder to programmatically access their data, rendering Vorped kinda useless.

If you’re curious how everything works, Vorped sources its data from nba.com/stats, periodically scraping data from the site as games complete. Underneath the fancy and unintuitive nba.com site are “links” to raw data files, and the collection of “links” is typically called an API, or application programming interface. APIs are a little more complicated than that, but think of it as a common language that allows computer programs anywhere in the world to share and update data with each other.
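To make the “links to raw data files” idea a bit more concrete, here is a rough sketch of what a program does with one of those files once it has been downloaded. The payload below is invented for illustration: the field names (`headers`, `rows`), the stat columns, and the player names are all assumptions, and the real nba.com/stats responses are undocumented and may be shaped differently.

```python
import json

# Hypothetical response body, standing in for what one of the "links" returns.
# Field names and values are made up for this example.
SAMPLE_RESPONSE = """
{
  "headers": ["PLAYER", "PTS", "REB", "AST"],
  "rows": [
    ["Player A", 31, 8, 5],
    ["Player B", 24, 11, 2]
  ]
}
"""


def rows_as_dicts(raw_json):
    """Pair each row with the column headers so stats can be looked up by name."""
    payload = json.loads(raw_json)
    headers = payload["headers"]
    return [dict(zip(headers, row)) for row in payload["rows"]]


stat_lines = rows_as_dicts(SAMPLE_RESPONSE)
for line in stat_lines:
    print(line["PLAYER"], line["PTS"])
```

The point is that the data arrives already structured, so a script can read a stat line by column name instead of picking it out of a rendered web page.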

In mid-March, the NBA decided to make it more difficult for people like me to programmatically gather data from these undocumented APIs. I’m not the only one, as you can see from the comments on this conversation on GitHub.

Why would the NBA do this? I don’t know for sure, but I can speculate on some reasons why:

  • It’s expensive to serve up all that data, specifically paying for the bandwidth. The NBA website is likely accessed by millions of people per day, and paying the bandwidth costs to support all those requests adds up.
  • Perhaps it’s actually not that expensive to serve up the data, but third parties like me comprise the vast majority of requests, making those bandwidth costs a lot more expensive than they should be. Because scripts and bots can make orders of magnitude more requests for data than the typical human physically clicking around a site, the NBA’s bandwidth costs might be inflated well beyond what human traffic alone would cost.
  • Protecting ad revenue. Allowing others to replicate data found on nba.com means people can consume that data outside of nba.com. And if you don’t visit nba.com, you don’t notice all the advertisements for SAP, who seems to have paid to sponsor the site. It wouldn’t be great for SAP if they could no longer get exposed to all those eyeballs visiting nba.com.

From a short-term accounting perspective, this all makes sense for the NBA’s business. Reduce unnecessary costs, and protect existing revenue streams. The economist in me applauds the efficiency.

That’s great for the NBA, but terrible for me. Because instead of spending my time immersing myself in the data and understanding what’s going on with the league (aka being a fan of the league), I’m playing pointless cat and mouse games with the nba.com programmers. And it’s getting tiresome.

What’s most frustrating is there seems to be an obvious, straightforward technical solution that would make both me and the NBA happy: publicize the API, and rate limit it. Let people register with nba.com as a consumer of data, and limit how many data requests can be submitted over some length of time, say 10 requests per minute. All of which would help regulate these presumed runaway bandwidth costs while enabling the NBA to partner with entities that can extend the NBA’s reach into nontraditional areas of the population, now and in the future.
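For the curious, the rate-limiting half of that idea is not much code. Below is a minimal sketch of a sliding-window limiter that allows each registered consumer 10 requests per minute. Everything in it is an assumption made for illustration: the class name, the `api_key` string handed out at registration, and the choice of a sliding window over other common schemes such as a token bucket. It is not how nba.com works or would necessarily work.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each API key."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.history = defaultdict(deque)  # api_key -> timestamps of accepted requests

    def allow(self, api_key):
        """Return True if this request fits in the key's quota, else False."""
        now = self.clock()
        stamps = self.history[api_key]
        # Forget accepted requests that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False  # a real server would typically answer HTTP 429 here
        stamps.append(now)
        return True
```

A server would call `allow(api_key)` before doing any work for a request. Because each key has its own history, one runaway script uses up only its own quota rather than everyone’s.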

APIs are pretty common nowadays. Whatever tech company or product you can think of, they almost certainly have an API. Twitter, Facebook, Slack, Pinterest, Instagram, etc. While there are certainly limitations around how a person can use these APIs, companies understand the real value is in their data, and that these API developers act as partners/champions/ambassadors of their company, mavens who extend the reach and influence of their company’s data far beyond the boundaries of the company’s products, creating virtuous loops back to those products and benefiting companies in the long run.

For example, Twitter allows bloggers to embed tweets in their own blog posts, and it’s really simple. You don’t need a computer science degree to do it. Today, Twitter plays a central role in global public discourse, and their public APIs have no doubt helped them ascend to that standing.

Haven’t you ever wondered why we basketball fans couldn’t do the same simple embedding with box scores, or player stat lines, or shot charts? The nba.com site tries to provide this functionality, but it’s so unintuitive that it FEELS like you need a computer science degree to use it competently.

Twitter understands that you don’t have to be on a Twitter app or website to consume information residing in Twitter. Similarly, you don’t need to visit nba.com to consume NBA data.

And if you think about it, going to the league itself has never been the only way to get NBA information. In the past we would get our NBA fix from ESPN, radio, newspapers, or the local TV station. Today, in addition to those traditional media channels, we can also choose to consume NBA information from blogs like SB Nation, discussion forums like Reddit, or any of our preferred social networks.

Public APIs are nothing more than a modern, programmatic manifestation of this same idea. While NBA data live in a central place, public APIs allow that data to be consumed not just directly by human beings, but by other applications, websites, and computer programs, all of which ultimately get consumed by even more human beings, who aren’t necessarily the same people as those who consume the data directly.

Consumers today have so many choices where they consume information, so it makes sense to bring the data where they already are, rather than assume they’ll always come to where you are. Which the NBA has always done with highlights and analysis, just not with their raw data.

Consumers also have so many more choices than before in how they entertain themselves, and a public API allows the NBA to remain adaptable to whatever disruption comes next in media, whether that be evolutions in consumer tastes or new ways of interacting with content or people. And chances are good that disruption will have something to do with synthesizing large volumes of data, as the rest of the world becomes increasingly inundated with and driven by data and algorithms (in some cases, literally driven). APIs will likely play a key role in such a world.

Like all other consumers, I also have many choices in how I spend my time. Because of this increased, never-ending friction in accessing NBA data, my interest in the NBA has waned, and I’m considering spending my time with other data sets in other domains of knowledge, where the utility and opportunity and challenges seem plentiful. It would be great to continue analyzing basketball data along with the rest of this awesome basketball statistics ecosystem, but the NBA’s current choices around data transparency are making the decision to move on from basketball look more and more obvious.