Hacks / Hackers #8

Thomas Preusse
February 19, 2015

Transcript

  1. Table of Contents
    • Part I: Workflows in the Newsroom
    • History of data journalism at NZZ
    • Part II: Two Examples of Working with Big Text Collections
    • 2014.nzz.ch: Chronicle of Major News Events
    • Kazakhstan Connection: Searching through Mail Messages
    • Outlook
    • Best practice in a perfect world
    • Q & A
  2. How it Works at The Guardian
    • We locate the data or receive it from a variety of sources: breaking news stories, government data, journalists' research and so on
    • Start looking at what we can do with the data
    • Tidy up spreadsheets
    • Perform calculations – is there a story or not?
    • Process the output
  3. In the Beginning...
    • Sylke Gruhnwald was the sole data journalist at NZZ
    • Responsible for all things modern on the internet
    • Data journalism, cooperations (e.g. IXT, Open Data City), multimedia reports (Fukushima, Iouri, Texas)
  4. Which led to...
    • The team grew to two people
    • Still responsible for all things flashy and new
    • More newspaper editors became aware of what we do, and more ideas started spreading (“can you make this interactive for me?”)
  5. This resulted in...
    • The creation of an interactive team
    • Visibility of Data/Interactive within the editorial team
    • Introduction of the Art Direction / Production Team to multimedia storytelling (Grenzwächter Naef)
  6. Chronicle 2014
    • Brief: A year-end retrospective. Do something cool for online. Cover major events.
    • Idea: We should investigate what we reported on. What does a year in NZZ articles look like? Are there hidden themes? Let's investigate our own data.
    • Thesis: If we plot all NZZ articles over time, grouped by topic, clusters will form around major news events.
  7. “In natural language processing, latent Dirichlet allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics.” (Wikipedia)
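
The generative story in the quote can be made concrete with a small sketch. This is not the pipeline used for the Chronicle; it only simulates how LDA assumes a document comes about: draw a topic mixture from a Dirichlet distribution, then draw a topic and a word for every position. The topics, word probabilities and `alpha` values are made-up toy values.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_words, alpha, length, rng):
    """Follow LDA's generative story for a single document.

    Returns the document's topic mixture and a list of (topic_index, word).
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    tokens = []
    for _ in range(length):
        # Pick the topic responsible for this word position.
        z = rng.choices(range(len(topic_words)), weights=theta)[0]
        # Pick a word from that topic's word distribution.
        words, probs = zip(*topic_words[z].items())
        w = rng.choices(words, weights=probs)[0]
        tokens.append((z, w))
    return theta, tokens


# Hypothetical toy topics: word -> probability within the topic.
TOPICS = [
    {"election": 0.5, "parliament": 0.3, "vote": 0.2},
    {"match": 0.4, "goal": 0.4, "league": 0.2},
]

rng = random.Random(42)
theta, tokens = generate_document(TOPICS, alpha=[0.5, 0.5], length=12, rng=rng)
print("topic mixture:", [round(t, 3) for t in theta])
print("words:", " ".join(w for _, w in tokens))
```

Fitting LDA runs this story in reverse: given only the words, it estimates the topic mixtures and the word distributions that most plausibly produced them.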
  8. Plot encoding, per article
    • x = topics
    • y = date
    • area = relevance to topic
    • uniformly low opacity (~0.05)
  9. Plot encoding, aggregated per day
    • x = topics
    • y = date
    • area = daily publication volume (sum of article relevance)
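
The daily publication volume described on this slide is a plain group-and-sum. The following sketch assumes each article carries a date and a relevance score per topic; the record layout, the topic labels and the numbers are hypothetical and do not come from the NZZ data.

```python
from collections import defaultdict


def daily_topic_volume(articles):
    """Sum article relevance per (date, topic) pair."""
    volume = defaultdict(float)
    for article in articles:
        for topic, relevance in article["topics"].items():
            volume[(article["date"], topic)] += relevance
    return dict(volume)


# Hypothetical toy records: one dict per article.
ARTICLES = [
    {"date": "2014-03-18", "topics": {"topic_a": 0.75, "topic_b": 0.25}},
    {"date": "2014-03-18", "topics": {"topic_a": 0.5, "topic_c": 0.5}},
    {"date": "2014-07-13", "topics": {"topic_c": 1.0}},
]

volume = daily_topic_volume(ARTICLES)
for (date, topic), total in sorted(volume.items()):
    print(date, topic, total)
```

Each resulting value would then set the area of one mark at that topic (x) and date (y).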
  10. Special Thanks
    • Susanna Rusterholz: coordinating content, 3 AM email correspondence
    • Lynn Cherny: introducing me to LDA, project advice and consultation
    • Peer Teuwsen: trusting us to experiment with his project
    • Volodymyr Fertak: proofreading my SQL queries
    • NZZ Interactive: design and support
    • NZZ Editorial Staff: all the text
  11. Kazakhstan Connection
    • Brief: There are lots of emails on this website. We need to search through them!
  12. Ready-Made Tools
    • NUIX: commercial; does extraction; ProPublica uses it
    • Overview: open source; does analysis; backed by AP and the Knight Foundation
  13. Advice
    • Extract everything: meta, attachments, etc.
    • Make everything searchable: convert everything to plain text, convert alphabet, include meta
    • Track and snapshot your source websites
    • Secure your communication (internally as well): Signal on iOS (big update soon), RedPhone and TextSecure on Android, PGP for mails and files
    • Visualize
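
The extraction advice above can be sketched with Python's standard library. This is an illustration, not the tooling used for the Kazakhstan Connection: it pulls headers, the plain-text body and attachment names out of one message and builds a folded search string. Stripping diacritics is only one possible reading of "convert alphabet"; transliterating between scripts would need a mapping table that is not shown here. The sample message and the `fold` and `extract` helpers are hypothetical.

```python
import unicodedata
from email import message_from_bytes
from email.policy import default


def fold(text):
    """Casefold and strip diacritics so that 'Zürich' matches 'zurich'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def extract(raw_bytes):
    """Turn one raw message into a flat record that is easy to index."""
    msg = message_from_bytes(raw_bytes, policy=default)
    # Keep the metadata: who wrote to whom, when, and about what.
    meta = {k: str(msg[k]) for k in ("From", "To", "Date", "Subject") if msg[k]}
    # Prefer the plain-text body part if there is one.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    # Record attachment file names so they are searchable too.
    attachments = [p.get_filename() for p in msg.iter_attachments()]
    pieces = [*meta.values(), body, *(a for a in attachments if a)]
    return {
        "meta": meta,
        "body": body,
        "attachments": attachments,
        "searchable": fold(" ".join(pieces)),
    }


# Hypothetical sample message, encoded as UTF-8 bytes.
RAW = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Meeting notes\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Die Verträge liegen bei.\n"
).encode("utf-8")

record = extract(RAW)
print(record["meta"])
print(record["searchable"])
```

Searching the folded string with a folded query, for example `fold("Verträge") in record["searchable"]`, makes matches independent of case and accents.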
  14. How it Works at NZZ
    • Two people locate the data or receive it from a variety of sources: breaking news stories, government data and the ideas of 200+ journalists
    • Two designers and one developer try to implement multimedia reports, data-driven stories and visualizations
  15. The Challenges
    • Too little manpower leads to a lot of frustration
    • Data journalism only works as a team effort
    • Editors can't get their stories told
    • In an environment of expert knowledge, the role of data journalists needs to be defined
    • The editors need to select stories and angles thoroughly
    • Stories need to be visible to act as a reference for whistleblowers and other informants
  16. The Future
    • In the future, there might be a skilled data journalist in every department. Or better yet: there will be no more departments, and every story is examined from different angles, always including a data angle.
    • “I am convinced that the term data journalism will disappear again in a few years, because its techniques will be integrated into every branch of journalism. The methods and tools of data journalism, however, will change journalism just as the invention of the telephone did. Nobody talks about telephone journalism today, yet everyone works with the telephone. Including the data journalist, by the way.” (Julian Schmidli, SRG Insider)