All assignments are due by 12 noon on Wednesday (the day of class). E-mail all assignments directly to the professor with “Homework Week X” in the subject line, where X is the number of the week. If we can’t find your homework because of an incorrect subject line, you won’t get credit for it.

Data Festival Schedule:

pilar.desha 10/14/2015
nuria.saldanha 10/21/2015
rajashree.chakravarty 10/28/2015
aliza.chasan 11/4/2015
maura.ewing 11/11/2015
kathryn.long 11/18/2015
anthony.kane 11/25/2015
christina.jensen 12/9/2015
maria.arcel 12/16/2015

Every week one student will choose a data-driven story to present in class. Prepare to discuss the strengths and weaknesses of the story and the authors’ use of data and interactivity, and to identify the underlying technology.

Participation: 20%
Your grade is determined by three factors: participation, successful completion of all solo homework assignments, and successful completion of the two team stories and the one solo story. Your participation includes attending all classes, being active in discussions, workshops and critiques, presenting your story for the Data Festival, and participating in all in-class hands-on activities. Your assignments will be evaluated in terms of use of data, story and context, interactivity, and design.

Grades for your two team stories are further broken down as follows:

Grades for your individual stories are further broken down as follows:

This means that if you complete a brilliant story but don’t put real effort into the initial pitch or rough draft, you can’t get better than a C on the story.

What do I mean when I say “Pitch” or “Rough draft”?
This is what I mean:
Pitches: A complete pitch should tell us who cares, why we care now, and what pre-reporting you’ve done. You must include:
a proposed title or headline
a news hook, or explanation of why this story matters now. Why should I care?
your nut graf, which captures the essence of the story. 1-2 sentences only.
a description of and link to the data (which means you must already have your data)
one source you have already spoken with or at least three potential expert sources and your plans for reaching them
Storyboards: A storyboard organizes your content conceptually and spatially. This semester, when you turn in storyboards, you should also include a revised pitch. We use “wireframe” and “storyboard” interchangeably here. We’re looking for a simple sketch (on paper, or in Word, PowerPoint, Illustrator, or any number of online storyboarding tools) that shows us how you intend to integrate your visualizations, words, and navigation elements. Use simple boxes to tell us where your different elements will be positioned in a design, and how a user will navigate through the content. Check out Mark Luckie’s thoughts on sketching/storyboarding, with examples, from 10,000 Words.
Rough Drafts: A rough draft does not have the polish of a final project, but it should be close. You should have created all the visualizations that you plan to use. Your classmates should be able to evaluate a rough draft on its merits, without a guided tour of forthcoming features. A complete rough draft includes:

Final Story: Your story must be posted to the class blog. The blog will be kept private to keep rough drafts private. If you wish to host your final story elsewhere, you may, but you still need to post a headline, excerpt, image and link to the class blog.


WEEK 1: Finding Data

WEEK 2: Spreadsheets and Pivot Tables

Part 1: From the NYPD’s traffic data portal, the Motor Vehicle Collision Report set gives total collision numbers citywide and by borough. The collision reports are further broken down by contributing factors, type of vehicle, brand of vehicle and severity of injuries sustained.

Download the Excel spreadsheet here for December 2014 data:

Answer the following questions:

  1. How many total motor vehicle collisions occurred citywide in December 2014?
  2. Which borough had the most vehicle collisions? How many collisions?
  3. Which borough had the fewest vehicle collisions? How many?
  4. How many taxis were involved in vehicle collisions in Queens?
  5. What percentage of all vehicle collisions in Queens involved taxis?
  6. What percentage of all vehicle collisions in Manhattan involved taxis?
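
If you want to double-check your Pivot Table answers outside Excel, the same kind of tally can be sketched in a few lines of Python. This is only an illustration: the column names (`borough`, `vehicle_type`) and the sample rows are made up, not taken from the NYPD spreadsheet, and each row here stands for one collision record. Adjust everything to match the real file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real NYPD file has different columns and values.
SAMPLE_CSV = """borough,vehicle_type
QUEENS,TAXI
QUEENS,PASSENGER VEHICLE
QUEENS,PASSENGER VEHICLE
QUEENS,BUS
MANHATTAN,TAXI
MANHATTAN,TAXI
MANHATTAN,PASSENGER VEHICLE
BRONX,PASSENGER VEHICLE
"""

def taxi_share(rows, borough):
    """Return (taxi count, total count, percent) for one borough."""
    in_borough = [r for r in rows if r["borough"] == borough]
    taxis = sum(1 for r in in_borough if r["vehicle_type"] == "TAXI")
    total = len(in_borough)
    percent = 100.0 * taxis / total if total else 0.0
    return taxis, total, percent

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Count records per borough, like a Pivot Table with borough in the Rows area.
by_borough = Counter(r["borough"] for r in rows)

print(by_borough.most_common())   # boroughs sorted from most to fewest records
print(taxi_share(rows, "QUEENS"))
print(taxi_share(rows, "MANHATTAN"))
```

The percentage is always the taxi count divided by that borough’s own total, not by the citywide total.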

Part 2: The National Center for Injury Prevention and Control at the Centers for Disease Control maintains a database of fatal injuries in the U.S. Search their database by the following criteria:

You’ll find a link to “Download Results in a Spreadsheet (CSV) File” at the bottom of the page. Explore the data, find something interesting and tell us what you found. What sorts of differences (or similarities) are there when comparing ages, gender, or race in the past 11 years? (Warning: there are some “blank” cells that will throw your data off; see if you can figure out what’s causing those blanks and how they affect your Pivot Table analysis.) Also, remember that you cannot simply combine crude rates by averaging the numbers.
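
To see why averaging crude rates goes wrong, here is a small worked sketch in Python. A crude rate is deaths per 100,000 people, so groups of different sizes have to be combined by adding up deaths and populations first. The figures are invented for illustration and do not come from the CDC database.

```python
# Invented figures for two groups of very different sizes.
groups = [
    {"name": "Group A", "deaths": 50, "population": 1_000_000},
    {"name": "Group B", "deaths": 30, "population": 100_000},
]

def crude_rate(deaths, population):
    """Deaths per 100,000 people."""
    return deaths * 100_000 / population

rates = [crude_rate(g["deaths"], g["population"]) for g in groups]

# Wrong: a simple average treats both groups as if they were the same size.
naive_average = sum(rates) / len(rates)

# Right: add up deaths and populations, then compute a single rate.
total_deaths = sum(g["deaths"] for g in groups)
total_population = sum(g["population"] for g in groups)
combined_rate = crude_rate(total_deaths, total_population)

print(rates, naive_average, combined_rate)
```

With these invented numbers the naive average is 17.5 per 100,000, while the correct combined rate is about 7.3, because the much larger Group A dominates the combined population.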

Look at how Race and Ethnicity are handled differently. You’re examining All Races, Non-Hispanic. What does that mean?

Your findings should be in the form of a short paragraph (1-3 sentences) that tells us what you found and why it is interesting. Imagine this text as the nut graf of your data story on your findings. Good luck!


WEEK 3: Cleaning and Exploring Data


  1. The NY State Board of Elections Campaign Financial Disclosure site lists all contributors to candidates for elected office. Andrew Cuomo has already won his second term as Governor, but we’ll use his campaign finance data for this exercise.
  2. To save you time and hassle, and to make sure everyone is working from the same dataset, I’m providing a direct link to a Google spreadsheet that has all the records from 7/16/2013 to 8/29/2014. The contributor information for Andrew Cuomo’s 2014 race is here. Choose File > Download As > CSV.
  3. There are 3026 records. The data won’t need cleaning, but I want you to use the dataset to answer the following questions related to out-of-state influences: What percentage of Cuomo’s total donations (from this period) came from out-of-state? Which state, other than New York, gave the most, and how much? Who was the top donor from California? Hint: you’ll need to rely on the Excel functions that I talked about to parse out the state codes from the “Contributor” column. Then you can analyze the geographic distribution of the donations. Remember there is MID(), LEFT(), and RIGHT(), which can grab characters from the middle, left, or the right of a piece of text.
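
The same left/middle/right idea carries over to other tools. Below is a rough Python sketch of pulling a two-letter state code out of a text field and totalling donations by state. It assumes, purely for illustration, that each contributor string ends with a state code followed by a ZIP code. The names, addresses and amounts are made up, and the real “Contributor” column may be laid out differently.

```python
import re
from collections import defaultdict

# Made-up records; the layout "..., ST 12345" is an assumption, not the real format.
records = [
    ("DOE, JANE 1 MAIN ST ALBANY, NY 12207", 1000.00),
    ("ROE, RICHARD 22 OAK AVE LOS ANGELES, CA 90001", 2500.00),
    ("ACME LLC 5 ELM RD NEWARK, NJ 07102", 500.00),
    ("POE, ALEX 9 PINE ST SAN DIEGO, CA 92101", 750.00),
]

# Two capital letters followed by a ZIP code at the very end of the text.
STATE_ZIP = re.compile(r"\b([A-Z]{2})\s+\d{5}(?:-\d{4})?\s*$")

def state_code(contributor):
    """Return the two-letter state code, or None if the pattern is not found."""
    match = STATE_ZIP.search(contributor)
    return match.group(1) if match else None

# Sum donation amounts per state, like a Pivot Table on the new state column.
totals = defaultdict(float)
for contributor, amount in records:
    totals[state_code(contributor)] += amount

grand_total = sum(totals.values())
# Rows where no state could be parsed (key None) are left out of this sum.
out_of_state = sum(v for k, v in totals.items() if k not in ("NY", None))
out_of_state_pct = 100.0 * out_of_state / grand_total

print(dict(totals), round(out_of_state_pct, 1))
```

Any record that comes back as `None` did not fit the assumed pattern and should be checked by hand before you trust the totals.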


  1. The website maintains a database of casualties from the wars in Iraq and Afghanistan. The site is run by one person, and its data has been cited and used by other news organizations. Here is a link to the data for US casualties as of August 2013.
  2. Choose File > Download As > CSV, and open the dataset in OpenRefine. The data is messy and needs to be cleaned.
  3. Use Facet > Text Facet on the “Branch” column, and choose “Cluster” to see if there are any variations of labels. Do the same with the “Rank” column. How many variations of “Sergeant” are there?
  4. Continue cleaning the dataset and explore the data with Refine, Excel, and Pivot Tables. Find something interesting in this data and put it in the form of a nut graf.
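
For a sense of what “Cluster” is doing, here is a simplified Python sketch of the key-collision “fingerprint” idea: normalize each label, then group the labels whose normalized keys match. It is an approximation for illustration, not OpenRefine’s actual code, and the sample labels are invented rather than taken from the casualty data.

```python
import string
from collections import defaultdict

def fingerprint(label):
    """Simplified key: trim, lowercase, drop punctuation, sort unique words."""
    cleaned = label.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

def cluster(labels):
    """Group labels that share a fingerprint; keep only groups with variants."""
    groups = defaultdict(set)
    for label in labels:
        groups[fingerprint(label)].add(label)
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}

# Invented examples of the kind of variation a "Rank" column might contain.
sample_ranks = [
    "Sergeant", "sergeant ", "SERGEANT.",
    "Staff Sergeant", "Sergeant, Staff", "Sgt.",
]

clusters = cluster(sample_ranks)
for key, variants in clusters.items():
    print(key, "->", variants)
```

Note that an abbreviation such as “Sgt.” does not share a key with “Sergeant” in this sketch, so variations like that still have to be merged by hand.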

E-mail me your answers to Part 1 and your nut graf for Part 2 with the subject line “Homework Week 4” before class on Wednesday.

WEEK 4: Pitch a story

WEEK 5: Rough drafts

WEEK 6: Finals