JOUR 72312 Data-Driven Interactive Journalism

Fall 2015
Wednesdays, 2:00-5:15PM
Room 442

It isn’t hyperbole: journalists today have access to more data than ever before, as well as better tools to understand that data and expose the stories buried in the numbers. From election results, budgets, and census reports to Facebook updates and image uploads, journalists need to know how to find stories in data and shape them in compelling ways. This hands-on course teaches you to gather, analyze, and visualize data to tell interactive, data-driven stories. This burgeoning discipline touches on information and interactivity design, mapping, graphing, data analysis, and a bit of code. Participants are expected to pitch, report, and produce stories working alone and in teams. You’ll learn to use spreadsheets and online tools such as CartoDB, Refine, and HighCharts, and to integrate them in a development environment that is not code-intensive. Familiarity with HTML/CSS is helpful, but not required. This is not a course in programming, but we will be dealing with some code.

Course objectives

This three-credit course explores complex storytelling using data. Students will pitch, report, conceptualize, design, and produce informative and compelling data-driven pieces. The course emphasizes:

Course outcomes


Jonathan Stray,
Jonathan Stray is a freelance journalist and a computer scientist. He teaches computational journalism at the Tow Center for Digital Journalism at Columbia University, and leads the Overview Project, an open source visualization system to help investigative journalists make sense of very large document sets. He was formerly the Interactive Technology Editor at the Associated Press, a freelance reporter in Hong Kong, and an algorithm designer for Adobe Systems.

This syllabus was developed by Russel Chun, and used with his kind permission.

WordPress, Etherpad

You will upload your stories to this blog. You can set your posts to private if you like, so that only the instructor and other students can see them. Students are encouraged to submit superior and/or timely work for publication elsewhere, including school outlets such as the New York City News Service.

Links to everything we look at in class:

Software Requirements


Readings are available on e-reserve from the Research Center. We will be reading from:

Readings: and look for instructor “Stray”
Password: datadrivef15


Your grade is determined by three factors: participation, successful completion of all solo homework assignments, and successful completion of the two team stories and the one solo story. Your participation includes attending all classes, being active in discussions, workshops and critiques, presenting your story for the Data Festival, and participating in all in-class hands-on activities. Your assignments will be evaluated in terms of use of data, story and context, interactivity, and design.

Participation: 20%
Homework assignments: 20%
Story 1 (team): 20%
Story 2 (solo): 20%
Story 3 (team): 20%

Grades for your two team stories are further broken down as follows:

Pitch (25%)
Storyboard (12.5%)
Draft (25%)
Final (25%)
Revision (12.5%)

The grade for your individual (solo) story is further broken down as follows:

Pitch (20%)
Final (40%)
Revision (40%)
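The arithmetic behind these breakdowns can be sketched in a few lines of Python. This is only an illustration: the weights are copied from the lists above, but the function name, the dictionary keys, and the 0-100 scoring scale are assumptions, not an official grading tool. The example scores a hypothetical team story that is perfect in every component except a skipped pitch.

```python
# Weights copied from the syllabus; names and the 0-100 scale are illustrative assumptions.
TEAM_STORY_WEIGHTS = {
    "pitch": 0.25,
    "storyboard": 0.125,
    "draft": 0.25,
    "final": 0.25,
    "revision": 0.125,
}
SOLO_STORY_WEIGHTS = {"pitch": 0.20, "final": 0.40, "revision": 0.40}


def weighted_score(scores, weights):
    """Combine component scores (0-100) using weights that sum to 1.

    Components missing from `scores` count as zero.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores.get(name, 0.0) for name in weights)


# Hypothetical team story: full marks everywhere, but no pitch was turned in.
no_pitch = {"storyboard": 100, "draft": 100, "final": 100, "revision": 100}
team_story_score = weighted_score(no_pitch, TEAM_STORY_WEIGHTS)
print(team_story_score)  # 75.0
```

Under these assumptions, skipping the pitch alone caps a team story at 75 percent of the available credit. How that percentage maps to a letter grade is up to the instructor.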

This means that if you complete a brilliant story but don’t put real effort into the initial pitch or rough draft, you can’t get better than a C on the story.

All assignments are due at the beginning of class. E-mail all assignments directly to the professor with “Homework Week X” in the subject line, where X is the number of the week. If we can’t find your homework because of an incorrect subject line, you won’t get credit for it.

What do we mean when we say “Pitch” or “Rough draft”? This is what we mean:

Pitches: A complete pitch should tell us who cares, why we care now, and what pre-reporting you’ve done. You must include:

Storyboards: A storyboard organizes your content conceptually and spatially. This semester, when you turn in storyboards, you should also include a revised pitch. We use “wireframe” and “storyboard” interchangeably here. We’re looking for a simple sketch (on paper, in Word, PowerPoint, Illustrator, or any number of online storyboarding tools) that shows us how you intend to integrate your visualizations, words, and navigation elements. Use simple boxes to tell us where your different elements will be positioned in a design, and how a user will navigate through the content. Check out Mark Luckie’s thoughts on sketching/storyboarding, with examples, from 10,000 Words.

Rough Drafts: A rough draft does not have the polish of a final project, but it should be close. You should have created all the visualizations that you plan to use. Your classmates should be able to evaluate a rough draft on its merits, without a guided tour of forthcoming features. A complete rough draft includes:

Final Story: Your story must be posted to the class blog. The blog will be kept private to keep rough drafts private. If you wish to host your final story elsewhere, you may, but you still need to post a headline, excerpt, image and link to the class blog.

Plagiarism

It is a serious ethical violation to take any material created by another person and represent it as your own original work. Any such plagiarism will result in serious disciplinary action, possibly including dismissal from the CUNY J-School. Plagiarism may involve copying text from a book or magazine without attributing the source, or lifting words, code, photographs, videos, or other materials from the Internet and attempting to pass them off as your own. Please ask the instructor if you have any questions about how to distinguish between acceptable research and plagiarism.

Copyright

In addition to being a serious academic issue, copyright is a serious legal issue. Never “lift” or “borrow” or “appropriate” or “repurpose” graphics, audio, or code without both permission and attribution. This applies to scripts, audio, video clips, programs, photos, drawings, and other images, and it includes images found online and in books. Create your own graphics, seek out images that are in the public domain or shared via a Creative Commons license that allows derivative works, or use images from the AP Photo Bank or images for which the school has obtained licensing. If you’re repurposing code, be sure to keep the original licensing intact. If you’re not sure how to credit code, ask. The exception to this rule is fair use: if your story is about the image itself, it is often acceptable to reproduce the image. If you want to better understand fair use, the Citizen Media Law Project is an excellent resource. As with plagiarism, when in doubt: ask.

Deadlines

Deadlines on assignments – as in any newsroom – are sacrosanct and should not be missed without exceptionally good reason, and only when your instructors have been notified in advance. If you are taking the course for credit, late assignments will be assessed a one-half grade penalty for every day overdue.

Absences and Tardiness

Participation and attendance are important ingredients of your success in the class, especially in this course, where your major assignments are team-based. Please be on time for class and back to class from breaks. Repeated tardiness will result in a reduced participation grade. Notify the instructors of any absences before class, or as soon as you know you will be out.


Week | Date | Lecture: What you can expect from us | Homework: What we expect from you (due)
1 | 9/16 | Course intro. What is data? | Data journalism pre-reading/viewing
2 | 9/30 | Numeracy, spreadsheet review | Datasets
3 | 10/7 | Cleaning data, adv spreadsheets | Spreadsheet exercise
4 | 10/14 | Graphical encodings: charting | Clean a dataset
5 | 10/21 | Geographic encoding: mapping | Pitch 1 team project
6 | 10/28 | Presentation, Information design, Ethics | Rough draft 1
7 | 11/4 | Critique of Story 1 | Story 1 final
8 | 11/11 | Open Workshop | Revisions 1, Pitch 2 solo project
9 | 11/18 | Adv tools: SQL, Interactive tables | Story 2 draft
10 | 11/25 | Open workshop | Story 2 final
11 | 12/2 | Critique Story 2, decide on class project | Pitch 3 class project
12 | 12/9 | Pitches for class project | Rough draft 3
13 | 12/16 | Critique of Story 3, wrap-up | Story 3 final


Festival of Data: Every week one student will choose a data-driven story to present in class. Prepare to discuss the strengths and weaknesses of the story, the authors’ use of data as well as their use of interactivity, and to identify the underlying technology.


Suggested readings:

Due on the first day of class: Watch Geoff McGhee’s Knight Fellowship Report on Data Journalism at

1 | Defining and Finding Data
  • Course introduction (expectations, syllabus review)
  • What is data, what are data stories?
  • Reactions to McGhee’s data journalism video report.
  • Data Viz Pre-test.
  • Discussion: work in groups to evaluate recent data-driven stories.
  • Discussion: Looking for data, where to look and how to look?

HOMEWORK:
  • Find two datasets that interest you. For each one, tell us who maintains it, where the data can be found (the URL), and, in 1-2 sentences, why the data is interesting.
  • Read Cairo: The Functional Art, Reading part 1: pages 25-31, 36-44, on thinking through a visualization as a tool for the reader: what graphical form best serves the goal? On e-reserve in the Library.

2 | Finding the Story in Your Data
  • Discuss homework: Problems, challenges, solutions
  • Discuss: provenance and staying organized
  • Spreadsheet review: data types, rows and columns, sorting, copy and paste, selections, formulas.
  • Review Pivot tables. Conditional formatting.
  • In-Class Exercise: Using spreadsheets and pivot tables on the Titanic data.
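For students curious about what a pivot table does under the hood, here is a minimal Python sketch of the same idea: group rows by two fields and average a third. The rows below are made-up toy values for illustration, not the real Titanic records, and the column names (`pclass`, `sex`, `survived`) are assumptions about how such a file might be laid out. In class we will do this in a spreadsheet, not in code.

```python
from collections import defaultdict

# Hypothetical toy rows for illustration only; these are NOT the real Titanic records.
passengers = [
    {"pclass": "1st", "sex": "female", "survived": 1},
    {"pclass": "1st", "sex": "male", "survived": 0},
    {"pclass": "3rd", "sex": "female", "survived": 1},
    {"pclass": "3rd", "sex": "female", "survived": 0},
    {"pclass": "3rd", "sex": "male", "survived": 0},
    {"pclass": "3rd", "sex": "male", "survived": 0},
]


def pivot_mean(rows, row_key, col_key, value_key):
    """Return {row: {col: mean of value_key}}, like a pivot table summarized by AVERAGE."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for r in rows:
        totals[r[row_key]][r[col_key]] += r[value_key]
        counts[r[row_key]][r[col_key]] += 1
    return {
        row: {col: totals[row][col] / counts[row][col] for col in cols}
        for row, cols in counts.items()
    }


# Rows of the pivot are passenger classes, columns are sexes, cells are survival rates.
table = pivot_mean(passengers, "pclass", "sex", "survived")
for pclass in sorted(table):
    print(pclass, table[pclass])
```

Each cell answers one reporting question, such as "what share of third-class women in this toy sample survived?", which is the same question a spreadsheet pivot table answers with drag-and-drop.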

HOMEWORK: Spreadsheet assignment. See Homework page for details.

3 | Cleaning Data
  • Cleaning data and advanced spreadsheets
  • Open Refine and common spreadsheet formulas: split, concatenate, unique, countif, sum.
  • In-Class Exercise: working with Refine to clean data. Follow the steps in this handout on this data file: universityData.csv
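The five spreadsheet formulas named above each have a simple equivalent in code. The Python sketch below shows what each one does on a few made-up rows. The values are hypothetical and are not taken from universityData.csv, and the "City, ST" layout is an assumption chosen only to make the split step visible.

```python
# Hypothetical messy values for illustration; not taken from universityData.csv.
rows = [
    ("Albany, NY", 120),
    ("Buffalo, NY", 80),
    ("Albany, NY", 45),
    ("Newark, NJ", 60),
]

# SPLIT: break "City, ST" into separate city and state columns.
split_rows = [
    tuple(part.strip() for part in place.split(",")) + (count,)
    for place, count in rows
]

# CONCATENATE: build a single key such as "NY-Albany".
keys = [f"{state}-{city}" for city, state, _ in split_rows]

# UNIQUE: distinct states, in first-seen order.
unique_states = list(dict.fromkeys(state for _, state, _ in split_rows))

# COUNTIF: how many rows are in New York?
ny_rows = sum(1 for _, state, _ in split_rows if state == "NY")

# SUM: total of the numeric column.
total = sum(count for _, _, count in split_rows)

print(keys)
print(unique_states, ny_rows, total)
```

Seeing the operations side by side makes it easier to decide which formula you need when a column arrives in the wrong shape.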

HOMEWORK: Clean a dataset (I will provide you with one) with Refine and tell us your findings in a nutgraf. Email your cleaned dataset and nutgraf under the subject “Homework Week 3”. See the “Homework” page for more details.

4 | Graphical Encodings/Charting Data
  • Discuss homework
  • Discuss anatomy of a news-chart: all the little pieces
  • Chart types – what they’re good for, what they aren’t.
  • Cleveland and McGill’s findings on readability of chart types
  • Excel to Illustrator, SVG charts, Chartbuilder, Google Charts, Raw, Charted.
  • Pitching a story: what we expect, what you’re thinking.
  • Choose teams for your first story.

HOMEWORK:
  • Team – pitches for your first story. A complete pitch should tell us who cares, why we care now, and what pre-reporting you’ve done. Review the section (above) in the syllabus on what is expected in a pitch.
  • Pitches must be posted to the class blog, in the “story 1 pitches” category by Wednesday morning.
  • Read Chapter 4: Choose Appropriate Visual Encodings in Designing Data Visualizations by Steele and Iliinsky (in Library)
  • Read Cairo: The Functional Art, Reading part 2: pages 118-129, on Cleveland & McGill’s perceptual accuracy

5 | Geographic encoding/Mapping Data
  • Discussion: Looking at map examples
  • In-Class Exercise: Mapping with CartoDB. Geocoding, Shapefiles, Fusing two data sets, customizing infoboxes, colors, using filters. SQL for selectors.
  • Workshop: Pairs of teams work with each other to discuss pitches.

HOMEWORK:
  • Read Cairo: The Functional Art, Reading part 3: pages 73-86, on presentation
  • Team – Rough drafts of your first story. A rough draft does not have the polish of a final project, but it should be close. You should have created all the visualizations that you plan to use. Your classmates should be able to evaluate a rough draft on its merits, without a guided tour of forthcoming features. Post the draft on the blog. A complete rough draft includes:
    • Clean data in spreadsheets, already normalized, sorted, manipulated
    • Visualizations of the data with proper labels, keys, and/or legends
    • Captions
    • Credits
    • A headline
    • At least three hyperlinks to other reporting, integrated in the body of your text, that put your story in a broader context.
    • Text that incorporates reporting from at least one human source. You’re not required to quote your source, but you do need to be able to tell the class what insights your human source provided.

6 | Presentation, Information design, ethics
  • Integrating the presentation: Annotating the data, Design, and Interactivity
  • Discussion: storyboards. Intentional use of space
  • Discussion: Principles of design – grids, hierarchies, color, typography, white space, scale, repetition, consistency.
  • Discussion: Ethics, avoiding distortion, responsible presentation of data
  • What do we expect in a “rough draft”?

HOMEWORK:
  • Team – post final story on the class blog in the category “Story 1 Final”. See the rest of the checklist to ensure that your story is complete (See Checklist page)
  • Read selections from Tufte, The Visual Display of Quantitative Information, on e-reserve in the Library: pages 91-105, 176-190.

7 | Critiques
  • Critique our first finished data stories.
  • Assign Teams for Second Story.
  • Discuss solo story ideas.

HOMEWORK:
  • Story 2, solo – pitch a story to be completed in one week as individuals.
  • Team – Revise your first story.

8 | Advanced tools
  • Discussion: Revisions and solo pitches
  • In-class exercise: working with CartoDB and styling maps with CSS.
  • HighCharts, Mr. DataConverter, and understanding different data formats.
  • Using Odyssey.js for storytelling with maps.

HOMEWORK: Story 2, solo – Rough drafts.

9 | Critique and Interactive Tables

HOMEWORK: Story 2, solo – final story.

10 | Pitches for class project

HOMEWORK: Story 3, team – pitches

11 | Open Workshop

HOMEWORK: Story 3, team – rough drafts

12 | Open Workshop

HOMEWORK: Story 3, team – post final story on the class blog in the category “Story 3 Final”. See the rest of the checklist to ensure that your story is complete.

13 | Critique of final story
Fill out student evaluations