I created the first iteration of Vorped 6 years ago. At the time, basketball data seemed underutilized: shot chart data existed in pockets, and play-by-play text was widely available but offered little real insight into what happened in a game. Units data was also hard to come by.
Nowadays, many resources exist to find this data. The NBA’s stats website now provides all this information (but dear god is it difficult to use). Other awesome efforts, like Nylon Calculus, NBA Savant, and the variety of parsing libraries for the hidden nba.com API, all enable the average NBA fan to be more informed about the league. I began wondering whether Vorped needed to exist anymore.
Within that same time period, my career has evolved, my challenges have changed, and my interests have shifted. And to be honest, Vorped hasn’t been as useful as I imagined, even if it’s just my nightly side project. Feeling a lack of accomplishment, I considered shutting this site down.
But here’s the thing: I still have fun running Vorped, and I’m still learning a TON in the process.
About 4-5 months ago, I decided not to shut the site down, but instead invest a little more effort into it. With that said, I want to share what I’ve been working on, and what I hope to accomplish with this website going forward.
A short history of Vorped, and what’s wrong with it today
I’m a data analyst, not a programmer.
When assessing what’s bad (and good) about Vorped today, I come back to this basic fact. Vorped was optimized for performing data analysis, not for scale or reliability or anything that a competent software developer would care about.
When I first started out, my choices were driven by naive logic. This was the exact logic:
- I need data. I’ll take a bunch of Python scripts I found on the internet, adapt them to scrape basketball data websites, and run them at 11pm Pacific every night.
- I need access to data. I know Excel and some PHP, but Excel is kinda annoying. I’ll make a PHP app on my laptop.
- Wow, this is sorta useful… I bet other people can use this app. How can I give them access? I’ll buy (cheap) shared web hosting, and port my Python scrapers and PHP app onto those servers.
To my surprise, this worked pretty well. It was a good minimum viable product (or MVP, to annoying Silicon Valley types like me), where a few people told me they found it useful, and where I didn’t invest inordinate amounts of time or effort.
Yet, products rarely remain minimum: people always want more features, me included.
To satisfy that hunger for more, I began tacking additional features onto this simple application. Over time, I built features like feeds, game flows, game loggers, and automated recaps. All interesting ideas. But not many of those ideas proved useful, and all of the code supporting them was terribly written and a huge pain to maintain or improve.
As I struggled to produce features that could keep up with my growing aspirations, my rate of analysis fell off a cliff (as of this writing, I have not written a new blog analysis in over 3 years). Increasingly, I was solving software problems, not analytical problems. This made me feel conflicted. If I couldn’t produce analysis myself, while also promoting a tool that supposedly enables others to do analysis, would I have any credibility? Was I becoming a fraud of an analyst, the type of faux intellectual that this website aimed to challenge and discredit?
No. The real problem was that I was still building only for me, not for others. I expected that if I repeated the 3-step MVP process over and over again, based solely on what I needed, things would work fine. I began to realize that an MVP is necessary but not sufficient, and that there are at least 2 more questions to answer when delivering a solid tool for others:
- How do I keep this app running day-to-day?
- How quickly can the app recover when something unexpected happens?
Note how none of this concerned basketball analytics. If I wanted to truly create a useful tool, I would need to commit to solving problems I didn’t initially set out to tackle. Software problems, not basketball analytics problems. Others’ problems, our problems, not necessarily my problems. To move forward, I needed to let go of my needs. So I put aside basketball analytics, and decided to directly address these software problems.
Vorped and the cloud
I want to make a better basketball analysis tool. But first, it’s important to define what “better” means. And for me, “better” means:
- Able to capture and analyze data, for any basketball league
- Able to ask quick, deep, ad hoc questions of the data, and share the answers easily with the world
- Not too expensive
- Flexible enough to adapt to product features I haven’t yet imagined
- Minimizes ops drudgery (e.g. provisioning machines), but enables a quick response to process failures
The biggest hurdle was point 2. It turns out, cheap $10/month shared web hosting machines weren’t optimized for supporting my ad hoc, memory-intensive SQL queries. If I needed the ability to execute these kinds of queries, I required a different solution.
To achieve this, I decided to move my database to Amazon Web Services (AWS). As a test, I pointed Vorped at AWS’s Relational Database Service (RDS) during the NBA offseason. And to my surprise, it worked admirably. Moving to RDS helped me achieve points 2) and 5) above: RDS minimized ops drudgery (fewer failures, automated backups, scheduled updates), taking care of areas of expertise I had no interest in developing. Ad hoc questions also became less painful to answer. With automatic snapshots of the database, I could quickly restore a copy onto a new database instance and hammer the living hell out of it.
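As a rough illustration of that snapshot-to-scratch-copy workflow (a sketch, not Vorped’s actual code), the Python below picks the newest available automated snapshot and restores it into a throwaway instance. It assumes a boto3 RDS client is passed in; the function names, the instance identifiers, and the default instance class are all hypothetical, and only the first page of snapshot results is inspected.

```python
def latest_snapshot_id(snapshots):
    """Return the identifier of the newest 'available' snapshot, or None."""
    ready = [s for s in snapshots if s.get("Status") == "available"]
    if not ready:
        return None
    newest = max(ready, key=lambda s: s["SnapshotCreateTime"])
    return newest["DBSnapshotIdentifier"]


def restore_scratch_copy(rds_client, source_instance, scratch_instance,
                         instance_class="db.t3.medium"):
    """Restore the newest automated snapshot into a throwaway instance.

    rds_client is expected to behave like boto3.client("rds").
    The instance class default is an assumption; pick whatever fits the query load.
    """
    # List the automated snapshots taken of the source instance (first page only).
    response = rds_client.describe_db_snapshots(
        DBInstanceIdentifier=source_instance,
        SnapshotType="automated",
    )
    snapshot_id = latest_snapshot_id(response["DBSnapshots"])
    if snapshot_id is None:
        raise RuntimeError("no available snapshot for " + source_instance)

    # Provision a brand-new instance from that snapshot; production is untouched.
    rds_client.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=scratch_instance,
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceClass=instance_class,
    )
    return snapshot_id
```

With boto3 installed and credentials configured, a call might look like `restore_scratch_copy(boto3.client("rds"), "prod-db", "scratch-db")`, where both identifiers are placeholders. The scratch instance keeps billing until it is deleted, so it is worth tearing down once the heavy queries are done.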
And when I dug deeper into AWS and all the other tools it provides, I realized I could accomplish a lot more with AWS if I re-thought the architecture of my scripts and apps.
After taking a step back and examining the intricate tapestry of interconnected logic and other crap that made up my scripts, web apps, and databases, I realized I could break down Vorped into a handful of logical applications, each with a clearly defined purpose and clearly defined relationships with the other apps.
If you’re curious, the 4 major applications are:
- League Manager (metadata about leagues, teams, and schedules)
- Scraper (manages getting and conforming data from external sources)
- Core (your typical “data warehouse” where aggregates are calculated and against which ad hoc queries are executed)
- Consumer (web app, the thing you’re probably looking at right now)
The past 4 months have been a standard exercise in decoupling logic and fitting it into the AWS ecosystem. Much of this work relied heavily on Lambda, a relatively new tool that enables you to process data in response to an event (like a game ending), without having to worry about managing the machines that run those processes. With this, Vorped now updates game data and shot charts about 30 minutes after a game ends. And the best thing about Lambda: it’s pretty inexpensive (I’m a cheap person).
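To make the event-driven idea concrete, here is a minimal sketch of a Python Lambda handler that reacts to a "game ended" event. The `handler(event, context)` signature is the standard one for Python on Lambda; everything else (the event fields `league`, `game_id`, and `status`, plus the `refresh_game` placeholder) is a hypothetical illustration rather than how Vorped is actually wired.

```python
def refresh_game(league, game_id):
    """Placeholder for the real work. Returns the names of the steps it would run."""
    # Hypothetical steps: re-scrape the game, recompute aggregates, rebuild shot charts.
    return ["scrape", "aggregate", "shot_chart"]


def handler(event, context):
    """Entry point that Lambda invokes with the triggering event.

    Assumed event shape (not from any real feed):
        {"league": "nba", "game_id": "0021600001", "status": "final"}
    """
    league = event["league"]
    game_id = event["game_id"]

    # Ignore events for games that have not finished yet.
    if event.get("status") != "final":
        return {"game_id": game_id, "processed": False}

    steps = refresh_game(league, game_id)
    return {"game_id": game_id, "league": league, "processed": True, "steps": steps}
```

Because the handler is a plain function, it can be exercised locally by calling `handler({...}, None)` with a sample event before it is ever deployed, which softens some of the "can’t run it on my laptop" friction that comes with this setup.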
This decoupling ultimately helps me achieve points 1) and 4) above. Data processing across different leagues (with different rule sets, court dimensions, and period counts) is now much easier to manage and maintain.
This flexibility also makes it easier to iterate on new statistics, new visualizations, and even integrate new datasets to the website. For example, nba.com redesigned their website (again) in early December 2016, changing how they serve certain types of data. I was able to adapt to the new data within a few nights, all in my spare time after my day job.
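One way to picture handling leagues with different rule sets, court dimensions, and period counts is to keep the rules as data and write the processing code once. The sketch below is an assumption about how that could look, not Vorped’s schema: the `LeagueRules` class, the league keys, and `seconds_elapsed` are invented for illustration, while the numbers are the standard regulation values for each league.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeagueRules:
    periods: int            # regulation periods per game
    period_minutes: int     # length of a regulation period
    overtime_minutes: int   # length of each overtime period
    court_length: float
    court_width: float
    units: str              # units for the court dimensions


# Hypothetical registry keyed by league code.
RULES = {
    "nba": LeagueRules(4, 12, 5, 94.0, 50.0, "ft"),
    "fiba": LeagueRules(4, 10, 5, 28.0, 15.0, "m"),
    "ncaa_mens": LeagueRules(2, 20, 5, 94.0, 50.0, "ft"),
}


def seconds_elapsed(league, period, clock_remaining_seconds):
    """Convert (period, game clock) into seconds since tip-off.

    Periods beyond the regulation count are treated as overtimes.
    """
    rules = RULES[league]
    if period <= rules.periods:
        completed = (period - 1) * rules.period_minutes * 60
        current_length = rules.period_minutes * 60
    else:
        regulation = rules.periods * rules.period_minutes * 60
        overtimes_done = period - rules.periods - 1
        completed = regulation + overtimes_done * rules.overtime_minutes * 60
        current_length = rules.overtime_minutes * 60
    return completed + (current_length - clock_remaining_seconds)
```

With this shape, adding a league means adding one entry to `RULES` rather than touching the play-by-play logic; for example, `seconds_elapsed("nba", 2, 360)` and `seconds_elapsed("fiba", 2, 360)` place the same clock reading at different points in their respective games.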
The one downside: this all costs more than I’d prefer to spend (point 3). It’s not that expensive, really (it costs less than the gym membership that I rarely, er, never use), but again, I’m cheap.
Is this better?
Technically and egotistically speaking, one could describe this updated system as a service-oriented architecture utilizing serverless computing to process streaming big data in the cloud to enable machine learning scenarios. Sounds impressive, so it has to be objectively better, right?
Well, it’s definitely more complex than before. On the downside, I can’t just run the same scripts on my laptop anymore. I have to think a lot more about the meta-processes that glue everything together. The dev workflow still feels weird. And dealing with Lambda without having direct control over the computing has its own quirks. Things are more scalable, more fault-tolerant, and I have more out-of-the-box monitoring, so there are a ton of benefits. But better?
I suppose the definition of “better” depends on what your goals are. If I were trying to perform analysis for myself only, this system would be unnecessarily complex and costly, definitely not “better.” But in letting go of my own analytical aspirations, I hope I’ve set up a system that can be useful for you today, and be adaptable enough to be useful for you months from now. From that perspective, yeah, it’s probably better.
I’ve come to understand that making a tool for yourself is much, much, much simpler than making a tool for everyone, by at least an order of magnitude. In any product, very little effort goes into the things the users see and touch. Rather, most of the effort goes into the sub-tools and sub-systems behind the scenes that ensure the tool can run, and keep running. Put another way, the tools that build the tool are far more important than the tool itself. If you don’t believe me, maybe you’ll believe this guy. He stole that idea from me /s.