Data Laziness

Ridership on the New York City subways declined last year because, well, they’re not sure, really:

The [Metropolitan Transportation] authority’s acting chairman, Fernando Ferrer, said on Thursday that several factors could be contributing to the decline: rising subway delays, the popularity of Uber and other apps, and weekend maintenance work that disrupts service.

“It may be all of the above,” Mr. Ferrer told reporters after an authority board meeting. “I’m very glad that our ridership is at historic highs. If it declines a little bit — and I’ve seen those numbers, and it’s a little bit — there is no reason for alarm.”

You want “reason for alarm”? I’ll give you reason for alarm: the MTA’s chairman can’t be bothered to build a simple Excel spreadsheet. Let’s call this “data laziness,” and let me show you how easy it would be to get a more definitive answer.

Actually, Excel is a much more useful tool here

The greatest obstacle to organizations’ use of data lies not in the math; most of the math requires maybe eighth-grade competency. Rather, it lies in knowing a data opportunity when you see one.

Case in point: the MTA’s ridership data. Although this gets into the weeds of urban transportation, you could easily apply the same thinking to marketing.

While ridership increased during the week, it declined 3% on weekends, when NYC’s subways have more construction and hence more delays. Weekends also feature more late-night parties, when people take Uber rides. So, what caused the decline?

I don’t have the data, but here’s what I’d do if I did:

  1. Create a simple Excel spreadsheet with weekend ridership for each of the system’s 469 stations for 2015 and 2016, one row per station and one column per year for easier scrolling.
  2. Add a simple conditional formatting rule to highlight cells in the 2016 column where ridership went down from 2015 to 2016.
  3. Add two more columns: one for the number of delays (or percent of trips delayed) by station and one for construction days (or percent of time under construction) by station.
  4. Eyeball the sheet to see whether the highlighted cells correspond to stations with lots of delays, lots of construction days, or both; if you’ve got a better idea for a function that would measure the correlation more accurately, please add it in the comments below.
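
For anyone who would rather script it than eyeball it, the four steps above could be sketched in a few lines of Python. Treat this as an illustration only: I don’t have the MTA’s data, so the station names and every number below are made up, and `pearson` is a small helper written for this example. It flags the stations where weekend ridership fell, then computes a plain Pearson correlation between the percent change in ridership and each candidate cause.

```python
from math import sqrt

# Hypothetical rows: (station, weekend riders 2015, weekend riders 2016,
#                     percent of trips delayed, construction days).
# Every value here is invented for illustration; it is not MTA data.
stations = [
    ("Station A", 10000, 9500, 12.0, 30),
    ("Station B", 8000, 8100, 4.0, 2),
    ("Station C", 12000, 11000, 15.0, 40),
    ("Station D", 5000, 5050, 5.0, 0),
    ("Station E", 7000, 6800, 9.0, 18),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Step 2: the scripted version of "highlight the cells that went down".
declined = [name for name, r15, r16, _, _ in stations if r16 < r15]

# Steps 3 and 4: relate the percent change to delays and to construction.
pct_change = [(r16 - r15) / r15 * 100 for _, r15, r16, _, _ in stations]
delays = [row[3] for row in stations]
construction = [row[4] for row in stations]

r_delays = pearson(pct_change, delays)
r_construction = pearson(pct_change, construction)

print("Stations with declining weekend ridership:", declined)
print(f"Ridership change vs. delays:       r = {r_delays:.2f}")
print(f"Ridership change vs. construction: r = {r_construction:.2f}")
```

A strongly negative r means ridership fell most where delays (or construction) were worst. If you stay in Excel, the built-in CORREL function computes the same coefficient from two columns.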

From there, it’s basic sleuthing. If the highlights correspond to delays or to construction, then that’s probably your answer. If the highlights correspond to both delays and construction, then you could probably surmise that construction causes delays, which in turn drive people out of the subway. However, if the highlights correspond to neither, then you could go with the Uber thesis.
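
That sleuthing logic can be written down as a small decision rule. This is a rough sketch, not a statistical test: the function name `likely_cause` and the cutoff of -0.5 are my own assumptions, and the inputs are assumed to be correlations between each station’s ridership change and its delays or construction days (negative meaning ridership fell where the problem was worse).

```python
def likely_cause(r_delays, r_construction, threshold=-0.5):
    """Map two correlation coefficients to the post's four conclusions.

    The threshold is an arbitrary illustrative cutoff: a correlation at or
    below it counts as "the highlights correspond to this factor".
    """
    delays_match = r_delays <= threshold
    construction_match = r_construction <= threshold

    if delays_match and construction_match:
        return "construction causing delays"
    if delays_match:
        return "delays"
    if construction_match:
        return "construction"
    # Neither factor lines up with the declines, so Uber is what is left.
    return "Uber (by elimination)"


print(likely_cause(-0.9, -0.8))
print(likely_cause(-0.7, -0.1))
print(likely_cause(0.1, -0.2))
```

Note that the Uber conclusion is reached only by elimination, which is part of why the answer stays less than definitive.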

This analysis wouldn’t give a definitive answer, of course. However, it would be a lot better than “it may be all of the above.”
