All assignments are due by 12 noon on Wednesday (the day of class). E-mail all assignments directly to the professor with “Homework Week X” in the subject line, where X is the number of the week. If we can’t find your homework because of an incorrect subject line, you won’t get credit for it.
Data Festival Schedule:
Every week, one student will choose a data-driven story to present in class. Prepare to discuss the strengths and weaknesses of the story, the authors’ use of data as well as their use of interactivity, and to identify the underlying technology.
- Participation: 20%
- Homework assignments: 20%
- Story 1 (team): 20%
- Story 2 (individual): 20%
- Story 3 (team): 20%
Grades for your two team stories are further broken down as follows:
- Pitch (25%)
- Storyboard (12.5%)
- Draft (25%)
- Final (25%)
- Revision (12.5%)
Grades for your individual stories are further broken down as follows:
- Pitch (20%)
- Final (40%)
- Revision (40%)
This means that if you complete a brilliant story but don’t put real effort into the initial pitch or rough draft, you can’t get better than a C on the story.
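To see how the weights play out, here is a minimal sketch of the arithmetic for a team story. The weights come from the breakdown above; the component scores (out of 100) are made-up illustrations, and the mapping from a percentage to a letter grade is not defined here.

```python
# Weights for a team story, taken from the breakdown above.
TEAM_WEIGHTS = {
    "pitch": 0.25,
    "storyboard": 0.125,
    "draft": 0.25,
    "final": 0.25,
    "revision": 0.125,
}

def story_score(scores, weights):
    """Weighted total for one story; each score is on a 0-100 scale."""
    return sum(weights[part] * scores[part] for part in weights)

# Hypothetical student: brilliant final story, half-hearted pitch and draft.
example_scores = {
    "pitch": 50,
    "storyboard": 100,
    "draft": 50,
    "final": 100,
    "revision": 100,
}

total = story_score(example_scores, TEAM_WEIGHTS)
print(f"Story score: {total:.1f} out of 100")
```

Because the pitch and draft together carry half the weight, losing half the points on both caps this hypothetical story at 75 out of 100, no matter how strong the final version is.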
What do I mean when I say “Pitch” or “Rough draft”?
This is what I mean:
Pitches: A complete pitch should tell us who cares, why we care now, and what pre-reporting you’ve done. You must include:
- a proposed title or headline
- a news hook, or explanation of why this story matters now. Why should I care?
- your nut graf that captures the essence of the story. 1-2 sentences only.
- a description of and link to the data (which means you must already have your data)
- one source you have already spoken with, or at least three potential expert sources and your plans for reaching them
Storyboards: A storyboard organizes your content conceptually and spatially. This semester, when you turn in storyboards, you should also include a revised pitch. We use “wireframe” and “storyboard” interchangeably here. We’re looking for a simple sketch (on paper, in Word, PowerPoint, Illustrator, or any number of online storyboarding tools) that shows us how you intend to integrate your visualizations, words, and navigation elements. Use simple boxes to tell us where your different elements will be positioned in a design, and how a user will navigate through the content. Check out Mark Luckie’s thoughts on sketching/storyboarding, with examples, from 10,000 Words.
Rough Drafts: A rough draft does not have the polish of a final project, but it should be close. You should have created all the visualizations that you plan to use. Your classmates should be able to evaluate a rough draft on its merits, without a guided tour of forthcoming features. A complete rough draft includes:
- Clean data in spreadsheets, already normalized, sorted, manipulated
- Visualizations of the data with proper labels, keys, and/or legends
- A headline
- At least three hyperlinks to other reporting, integrated in the body of your text, that put your story in a broader context.
- Text that incorporates reporting from at least one human source. You’re not required to quote your source, but you do need to be able to tell the class what insights your human source provided.
Final Story: Your story must be posted to the class blog. The blog will be kept private to keep rough drafts private. If you wish to host your final story elsewhere, you may, but you still need to post a headline, excerpt, image and link to the class blog.
WEEK 1: Finding Data
- Identify 2 datasets, tell us why they’re interesting
- Cairo: The Functional Art, Reading part 1: pages 25-31, 36-44, on thinking through a visualization as a tool for the reader; what graphical form best serves the goal?
- Send me a link to one of your FMS Interactive chart or map stories
WEEK 2: Spreadsheets and Pivot Tables
Part 1: From the NYPD’s traffic data portal, the Motor Vehicle Collision Report set gives total collision numbers citywide and by borough. The collision reports are further broken down into the contributing factors, type of vehicle, brand of vehicle and severity of injuries sustained (http://www.nyc.gov/html/nypd/html/traffic_reports/motor_vehicle_collision_data.shtml).
Download the Excel spreadsheet here for December 2014 data: http://www.nyc.gov/html/nypd/downloads/zip/traffic_data/2014_12_acc_excel.zip
Answer the following questions:
- How many total motor vehicle collisions occurred citywide in December 2014?
- Which borough had the most vehicle collisions? How many collisions?
- Which borough had the fewest vehicle collisions? How many?
- How many taxis were involved in vehicle collisions in Queens?
- What percentage of all vehicle collisions in Queens involved taxis?
- What percentage of all vehicle collisions in Manhattan involved taxis?
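The questions above boil down to counting rows by borough and then dividing one count by another, which is what a Pivot Table does for you. As a rough sketch of that logic in Python: the rows and the field names `borough` and `involves_taxi` below are invented for illustration and do not match the layout of the NYPD spreadsheet.

```python
from collections import Counter

# Hypothetical, made-up rows; the real spreadsheet is laid out differently.
collisions = [
    {"borough": "Queens", "involves_taxi": True},
    {"borough": "Queens", "involves_taxi": False},
    {"borough": "Queens", "involves_taxi": False},
    {"borough": "Queens", "involves_taxi": False},
    {"borough": "Manhattan", "involves_taxi": True},
    {"borough": "Manhattan", "involves_taxi": False},
    {"borough": "Brooklyn", "involves_taxi": False},
]

# Equivalent of a pivot table with borough as the row label and a count as the value.
totals = Counter(row["borough"] for row in collisions)
taxi_totals = Counter(row["borough"] for row in collisions if row["involves_taxi"])

def taxi_share(borough):
    """Percentage of a borough's collisions that involved a taxi."""
    return 100 * taxi_totals[borough] / totals[borough]

most_borough, most_count = totals.most_common(1)[0]
print(most_borough, most_count)
print(f"Queens taxi share: {taxi_share('Queens'):.1f}%")
print(f"Manhattan taxi share: {taxi_share('Manhattan'):.1f}%")
```

Note that the percentage uses the borough's own total as the denominator, not the citywide total.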
Part 2: The National Center for Injury Prevention and Control at the Centers for Disease Control maintains a database of fatal injuries in the U.S. Search their database by the following criteria:
Cause or Mechanism: Firearm
Census Region: United States (default)
Race: All Races (default)
Sex: Both Sexes (default)
Hispanic Origin: Non-Hispanics
Output: Standard Output (default)
All Ages (default)
Use 2000 as the Standard Year (default)
You’ll find a link to “Download Results in a Spreadsheet (CSV) File” at the bottom of the page. Explore the data, find something interesting and tell us what you found. What sorts of differences (or similarities) are there when comparing ages, gender, or race in the past 11 years? (Warning: there are some “blank” cells that will throw your data off; see if you can figure out what’s causing those blanks and how they affect your Pivot Table analysis.) Also, remember that you cannot simply combine crude rates by averaging the numbers.
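A short worked example shows why averaging crude rates goes wrong. A crude rate is deaths divided by population, scaled to 100,000 people, so groups of different sizes have to be combined by adding up deaths and populations first. The two groups and their numbers below are invented for illustration; they are not taken from the CDC data.

```python
def crude_rate(deaths, population, per=100_000):
    """Deaths per `per` people."""
    return deaths * per / population

# Hypothetical groups with very different population sizes.
groups = [
    {"name": "Group A", "deaths": 100, "population": 1_000_000},
    {"name": "Group B", "deaths": 900, "population": 3_000_000},
]

rates = [crude_rate(g["deaths"], g["population"]) for g in groups]

# Wrong: a plain average treats both groups as if they were the same size.
naive_average = sum(rates) / len(rates)

# Right: add up deaths and populations, then compute one rate.
combined_rate = crude_rate(
    sum(g["deaths"] for g in groups),
    sum(g["population"] for g in groups),
)

print(rates)
print(naive_average)
print(combined_rate)
```

With these made-up numbers the individual rates are 10 and 30 per 100,000. The plain average is 20, but the true combined rate is 25, because the larger group counts for more.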
Look at how Race and Ethnicity are handled differently. You’re examining All Races, Non-Hispanic. What does that mean?
Your findings should be in the form of a short paragraph (1-3 sentences) that tells us why your data or finding is interesting and what you found. Imagine this text as the nut graf of your data story on your findings. Good luck!
WEEK 3: Cleaning and Exploring Data
- The NY State Board of Elections Campaign Financial Disclosure site (http://www.elections.ny.gov/recipientstext.html) lists all contributors to candidates for elected office. Andrew Cuomo already won his second term for Governor, but we’ll use his campaign finance data for this exercise.
- I’ll save you the time and hassle and provide you with a direct link to a Google spreadsheet that has all the records from 7/16/2013 to 8/29/2014. I also want to make sure everyone is working from the same dataset. The contributor information for Andrew Cuomo’s 2014 race is here (https://docs.google.com/spreadsheets/d/1R_q2JjtADQvPI94Q5p5XlFitjqBE99uFaQuxFaAKOis/edit?usp=sharing). Choose File>Download As>CSV.
- There are 3026 records. The data won’t need cleaning, but I want you to use the dataset to answer the following questions related to out-of-state influences: What percentage of Cuomo’s total donations (from this period) came from out-of-state? Which state, other than New York, gave the most, and how much? Who was the top donor from California? Hint: you’ll need to rely on the Excel functions that I talked about to parse out the state codes from the “Contributor” column. Then you can analyze the geographic distribution of the donations. Remember there is MID(), LEFT(), and RIGHT(), which can grab characters from the middle, left, or the right of a piece of text.
- The website icasualties.org maintains a database of casualties from the wars in Iraq and Afghanistan. The site is run by one person, and its data has been cited and used by other news organizations. Here is a link to the data for US casualties as of August 2013 (https://docs.google.com/spreadsheet/ccc?key=0AkQfGqQEgO0gdFUzMTNqRWdrci1XeGM4QnVyeWV0X2c&usp=sharing).
- Choose File > Download As > CSV, and open the dataset in OpenRefine. The data is messy and needs to be cleaned.
- Use Facet > Text Facet on the “Branch” column, and choose “Cluster” to see if there are any variations of labels. Do the same with the “Rank” column. How many variations of “Sergeant” are there?
- Continue cleaning the data set and explore the data with Refine, Excel, and Pivot Tables. Find something interesting in this data and put it in the form of a nut graf.
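To get a feel for what “Cluster” is doing, here is a simplified sketch of a key-collision approach: each label is reduced to a normalized “fingerprint,” and labels that share a fingerprint are grouped together. This is an approximation written for illustration, not OpenRefine’s actual code, and the sample rank labels are invented rather than taken from the icasualties data.

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Normalize a label: trim, lowercase, drop punctuation, sort unique words."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

def cluster(values):
    """Group labels by fingerprint; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}

# Hypothetical messy labels.
ranks = ["Sergeant", "sergeant ", "SERGEANT.", "Staff Sergeant", "Sergeant, Staff", "Sgt."]

clusters = cluster(ranks)
for key, variants in clusters.items():
    print(key, "->", variants)
```

Notice that “Sgt.” is left out of every cluster: an abbreviation does not share a fingerprint with the spelled-out word, so variants like that still need a manual look.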
E-mail me your answers to part 1 and your nut graf for part 2 with the subject line “Homework Week 4” before class on Wednesday.
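For the campaign finance questions, the Excel text functions map directly onto string slicing. The sketch below mirrors LEFT(), RIGHT(), and MID() in Python and uses them to pull a state code and compute an out-of-state share. It assumes each contributor string ends with a two-letter state code, a space, and a five-digit ZIP code; that layout, the names, and the amounts are all assumptions made for illustration, so check the real “Contributor” column before relying on it.

```python
def left(text, n):
    """Like Excel LEFT(): the first n characters."""
    return text[:n]

def right(text, n):
    """Like Excel RIGHT(): the last n characters."""
    return text[-n:] if n > 0 else ""

def mid(text, start, n):
    """Like Excel MID(): n characters beginning at 1-based position start."""
    return text[start - 1:start - 1 + n]

def state_code(contributor):
    """Assumes the string ends with 'ST 12345' (state, space, 5-digit ZIP)."""
    return left(right(contributor, 8), 2)

# Hypothetical records: (contributor text, donation amount).
donations = [
    ("DOE, JOHN 1 MAIN ST ALBANY, NY 12207", 1000.0),
    ("ROE, JANE 2 OAK AVE SAN FRANCISCO, CA 94105", 500.0),
    ("POE, SAM 3 ELM ST HOBOKEN, NJ 07030", 500.0),
]

total = sum(amount for _, amount in donations)
out_of_state = sum(amount for text, amount in donations if state_code(text) != "NY")
out_of_state_pct = 100 * out_of_state / total

print(state_code(donations[1][0]))
print(f"{out_of_state_pct:.1f}% of donations came from outside New York")
```

The same extraction can be written with MID() alone, for example `mid(text, len(text) - 7, 2)`, which corresponds to `=MID(A2, LEN(A2)-7, 2)` in a spreadsheet under the same layout assumption.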
WEEK 4: Pitch a story
- Team – pitches for your first story. A complete pitch should tell us who cares, why we care now, and what pre-reporting you’ve done. Review the section in the syllabus on what is expected in a pitch.
- Pitches must be posted to the class blog, in the “story 1 pitches” category by Wednesday before class.
- Read Chapter 4: Choose Appropriate Visual Encodings in Designing Data Visualizations by Steele and Iliinsky (in Library)
- Read Cairo: The Functional Art, Reading part 2: pages 118-129, on Cleveland & McGill’s perceptual accuracy
WEEK 5: Rough drafts
- Bring rough drafts to class (post on blog by Wednesday morning) for feedback.
- Read Cairo: The Functional Art, Reading part 3: pages 73-86, on presentation
WEEK 6: Finals
- Final stories due. Link your Bootstrap story presentation from the blog by Wednesday before class.
- Read selections from Tufte, The Visual Display of Quantitative Information, on e-reserve in the Library: pages 91-105, 176-190.