Here I’ll show some of the data analysis stuff I’ve done. I’m still pretty amateur, but fortunately the field of data journalism is young, so I’ve been able to hack it so far. Hopefully I’ll get better at programming and we’ll see some pretty cool, sophisticated stuff appear here.

I mainly work in Excel VBA and Python. I’ve done stuff in R, Stata, SPSS, and ArcGIS before, but I’ve gotten rusty.


Stolen Cars in Delhi

For my first major data-mining project, I used Excel VBA to scrape Delhi police servers and compile data on cars stolen in Delhi in 2014, including neighborhood, make, model, time of theft, and all that jazz. Pretty interesting stuff. Even cooler: when my reporting partner Atul and I went to Meerut, a nearby city notorious as a hub for stolen car parts, we met a former police chief there who asked for access to the data for his own investigative work!

[Image: Fast Statistics: Stolen Cars in Delhi]



I made five staTOIstics, which are neat little inserts in the Times of India that graphically depict useful stats for our readers. Here are a few samples.

[Image: staTOIstics, 2014.12.6: A Smoother Ride]
[Image: staTOIstics, 2014.12.3: The Air We Breathe]
[Image: staTOIstics, 2014.12.4: Size in the City]



Pollution Comparison in India

After I got interested in air pollution data in India, I was sifting through some global figures from the WHO and realized there was enough data to compare cities within India alone (some 125 data points). The results were startling, and pretty interesting: North Indian cities are far more polluted than South Indian cities. Naturally, I had to do a little digging, and this article was the result! Through some stroke of dumb luck, it landed on the front page and was the lead story for the entire country that day.
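At its core, a North-versus-South comparison like this means grouping cities by region and averaging a pollution reading for each group. Here is a minimal Python sketch of that step. It assumes the WHO figures have already been loaded as (city, value) pairs, for example annual mean PM10, and that you supply your own city-to-region mapping. The function name and the mapping are hypothetical, and this is not necessarily the exact method behind the article.

```python
from collections import defaultdict
from statistics import mean


def mean_by_region(readings, region_of):
    """Average a pollution metric per region (hypothetical helper).

    readings:  iterable of (city, value) pairs, e.g. annual mean PM10 in µg/m³
    region_of: dict mapping city name -> region label such as "North" or "South"

    Cities missing from region_of are skipped rather than guessed.
    """
    groups = defaultdict(list)
    for city, value in readings:
        region = region_of.get(city)
        if region is not None:
            groups[region].append(value)
    # One average per region that has at least one city
    return {region: mean(values) for region, values in groups.items()}
```

Because unlabeled cities are dropped rather than assigned a default, the region mapping decides which data points count toward each average. It's worth checking that mapping before trusting the comparison.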




Check out my ongoing project text-mining Islamic State documents on my blog!

[Image: Word frequencies, eBooks]
[Image: Word frequencies, Dabiq]
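Word-frequency charts like these come from counting how often each word appears in a body of text. Here is a minimal Python sketch of that idea. The tiny stop-word list and the function names are illustrative assumptions, and this is not the exact code behind the charts above.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); a real analysis would use a fuller one.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "that", "it", "for"}


def word_frequencies(text, top_n=10):
    """Return the top_n most common words in text as (word, count) pairs.

    Text is lowercased, split into alphabetic words (keeping internal
    apostrophes, e.g. "don't"), and stop words are dropped before counting.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


def frequencies_from_file(path, top_n=10):
    """Read a UTF-8 text file (e.g. one extracted document) and count its words."""
    with open(path, encoding="utf-8") as f:
        return word_frequencies(f.read(), top_n)
```

Filtering stop words matters more than it looks. Without it, "the" and "and" dominate almost any English text and bury the words that actually distinguish one set of documents from another.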