2.6. Follow the protocol: words, from manual curation to Tableau

Duration: 45 min

Overview tuto 2.6

Goals

  • Curate a list of words in a Google Spreadsheet
  • Check a new notebook to harvest the edit history from a semantic angle
  • Visualize the data you have harvested in Tableau
  • Check how the curation impacts the final result

Case

The topic is energy conversion, using the same two Wikipedia articles as in tutorial 1.2:

Data

Download these two CSV files:


wikipedia-articles-tuto-2.6.csv


This file contains only the list of the two articles.


words.csv


This file contains only the list of 2,488 words mentioned in either article. It was extracted by passing the text-enriched version of the article list into this 🍕 notebook. The text content of the articles was previously harvested by this other 🍾 notebook.

Protocol

We ask you to enact this protocol (notice that parts of it have already been completed to produce the list of words you downloaded above):


  • Make a selection of 5 to 20 words from the list, with the following criteria:
    • Each word must be represented in both articles
    • The words should be as varied as possible (so that they make differences appear)
    • If there are too many words to choose from, prioritize those mentioned the most
  • Put these words in a small CSV file. It may look like this.
  • Extract the revisions of either article that contain any of your words.
  • Visualize that data in Tableau (no need to annotate). Use the same approach as in the tutorial 1.4.
  • Bonus: do you see a data-driven story in this visualization?
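
The selection criteria above can be pre-screened with a few lines of Python before the manual curation. This is only a minimal sketch under stated assumptions, not the method used by the course notebooks: the two article texts are toy strings, and the helper names `count_words`, `pick_candidates` and `write_word_csv`, as well as the `word` column header, are hypothetical. It keeps the words present in both texts, ranks them by total mentions, and serializes a shortlist as a one-column CSV.

```python
import csv
import io
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+")


def count_words(text):
    """Count lowercase alphabetic tokens in a text."""
    return Counter(TOKEN.findall(text.lower()))


def pick_candidates(text_a, text_b, limit=20):
    """Rank the words found in BOTH texts, most mentioned first."""
    counts_a, counts_b = count_words(text_a), count_words(text_b)
    shared = counts_a.keys() & counts_b.keys()
    # Sort by total mentions (descending), then alphabetically for a stable order.
    ranked = sorted(shared, key=lambda w: (-(counts_a[w] + counts_b[w]), w))
    return ranked[:limit]


def write_word_csv(words, column="word"):
    """Serialize the words as a one-column CSV string (hypothetical header)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    writer.writerows([w] for w in words)
    return buffer.getvalue()


# Toy texts standing in for the two articles (assumption, not the real content).
article_a = "Solar energy conversion turns light into electricity. Energy is lost as heat."
article_b = "Wind energy conversion turns motion into electricity. Some energy becomes heat."

candidates = pick_candidates(article_a, article_b, limit=5)
# Manual curation still applies: here a function word is dropped by hand.
curated = [w for w in candidates if w != "into"]
print(write_word_csv(curated))
```

The ranking only narrows the list. Judging whether the words are varied enough to make differences appear remains a manual decision.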

It might look like this:

Tableau

Remark: The curation of terms is different from that of tutorial 1.4, so even if the case and the visualization approach are the same, the data-driven narrative is not.
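
To get a feel for how the curation changes what is harvested, the revision filter can be sketched on toy data. This is an illustrative assumption, not the notebook's actual implementation: the revision records, their `rev_id` and `text` fields, and the helper `revisions_mentioning` are all hypothetical. The sketch keeps the revisions whose text contains at least one curated word, matched as a whole word and ignoring case, and compares two different curations.

```python
import re


def revisions_mentioning(revisions, words):
    """Return the revisions whose text contains at least one of the words."""
    if not words:
        return []
    # Whole-word, case-insensitive match on any of the curated words.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return [rev for rev in revisions if pattern.search(rev["text"])]


# Toy revision records (assumption: real revisions carry more fields).
revisions = [
    {"rev_id": 1, "text": "Heat engines convert thermal energy."},
    {"rev_id": 2, "text": "Fixed a typo in the infobox."},
    {"rev_id": 3, "text": "Added a section on turbine efficiency."},
]

curation_a = ["energy", "heat"]
curation_b = ["turbine", "efficiency"]

ids_a = [rev["rev_id"] for rev in revisions_mentioning(revisions, curation_a)]
ids_b = [rev["rev_id"] for rev in revisions_mentioning(revisions, curation_b)]
print(ids_a)  # [1]
print(ids_b)  # [3]
```

The two curations retain disjoint sets of revisions from the same history, which is why the resulting visualizations can tell different stories.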

Documents produced

Keep somewhere, for sharing, the following document:

  • The (unannotated) visualization (JPEG or PNG)

Next tutorial

Take a break before this:

 2.7. Extend the protocol: natural language processing (45 min)


Relation to the course readings

  • Thoughts and principles on query design are covered in Rogers, Richard. (2017). Foundations of Digital Methods: Query Design. In The Datafied Society: Studying Culture through Data, eds. M. Schaefer and K. van Es.
  • The process of getting data through scraping, crawling and calling APIs is covered in Chapter 6: Collecting and curating digital records of Venturini, T. & Munk, A.K. (2021). Controversy Mapping: A Field Guide.
  • The intricacies of Wikipedia and the different ways in which the platform may be reappropriated for controversy analysis are covered in Weltevrede, E., & Borra, E. (2016). Platform affordances and data practices: The value of dispute on Wikipedia. Big Data & Society, 3(1).