2.6. Follow the protocol: words, from manual curation to Tableau

Duration: 45 min

Goals

Curate a list of words in a Google Spreadsheet
Check a new notebook to harvest the edit history from a semantic angle
Visualize the data you have harvested in Tableau
Check how the curation impacts the final result

Case

The topic of energy conversion, using the same two Wikipedia articles as in tutorial 1.2.

Data

Download these two CSV files:

wikipedia-articles-tuto-2.6.csv

This file just contains the list of the two articles.

words.csv

It just contains the list of 2,488 words mentioned in either article. It was extracted from the text-enriched version of the article list passed into this 🍕 notebook. The text content of the articles was previously harvested by this other 🍾 notebook.
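For orientation, a word list of this kind could be built by counting word occurrences per article. The sketch below is not the code of the notebooks mentioned above. The tokenization rule, the function name `count_words_per_article`, and the sample texts are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_words_per_article(texts):
    """Map each word to a dict {article_title: number_of_mentions}.

    `texts` is a dict {article_title: plain_text}. The tokenization is a
    simplifying assumption: lowercase runs of at least 3 letters.
    """
    counts = {}
    for title, text in texts.items():
        tokens = re.findall(r"[a-z]{3,}", text.lower())
        for word, n in Counter(tokens).items():
            counts.setdefault(word, {})[title] = n
    return counts


# Made-up sample texts, NOT the real article contents
sample = {
    "Article A": "Wind power turns wind energy into electricity.",
    "Article B": "Solar energy is converted into electricity by panels.",
}
word_counts = count_words_per_article(sample)
print(word_counts["energy"])  # mentions of "energy" in each article
```

A table of this shape makes it easy to see, for each word, whether it appears in one article or in both, which matters for the selection criteria in the protocol.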

Protocol

We ask you to enact this protocol (notice that parts of it have already been completed to produce the list of words you downloaded above):

Make a selection of 5 to 20 words from the list, with the following criteria:
- Each word must be represented in both articles
- The words should be as varied as possible (so that they make differences appear)
- If there are too many words to pick from, prioritize those mentioned the most
Put these words in a small CSV file. It may look like this.
Extract the revisions of either article that contain any of your words.
- Use this notebook: 🍱 Wikipedia words and articles to edit list with words.
- One input is the CSV article list you downloaded above
- The other input is your small list of words
- You should obtain a file like this one.
Visualize that data in Tableau (no need to annotate). Use the same approach as in the tutorial 1.4.
Bonus: do you see a data-driven story in this visualization?

It might look like this:

Remark: The curation of terms is different from that of tutorial 1.4, so even if the case and the visualization approach are the same, the data-driven narrative is not the same.
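The word selection and revision matching steps of the protocol can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the 🍱 notebook: the function names, the `word` column name, the `id` and `text` keys, and the sample data are all hypothetical. The criterion that words be "as varied as possible" is a curatorial judgment and is deliberately not automated here.

```python
import csv
import io
import re


def select_words(word_counts, articles, max_words=20):
    """Keep words present in every article, most mentioned first."""
    shared = {}
    for word, per_article in word_counts.items():
        if all(per_article.get(a, 0) > 0 for a in articles):
            shared[word] = sum(per_article[a] for a in articles)
    # Ties are broken alphabetically so the result is stable
    return sorted(shared, key=lambda w: (-shared[w], w))[:max_words]


def write_word_list(words, handle, column="word"):
    """Write one word per row under a single header (assumed column name)."""
    writer = csv.writer(handle)
    writer.writerow([column])
    writer.writerows([w] for w in words)


def revisions_mentioning(revisions, words):
    """Return (revision_id, word) pairs, matching whole words case-insensitively."""
    patterns = {
        w: re.compile(rf"\b{re.escape(w)}\b", re.IGNORECASE) for w in words
    }
    return [
        (rev["id"], w)
        for rev in revisions
        for w, pattern in patterns.items()
        if pattern.search(rev["text"])
    ]


# Made-up data, NOT taken from the real articles
counts = {
    "energy": {"A": 12, "B": 30},
    "wind": {"A": 25},  # only in one article, so it is excluded
    "grid": {"A": 4, "B": 6},
}
selected = select_words(counts, articles=["A", "B"])

buffer = io.StringIO()  # stands in for a small CSV file on disk
write_word_list(selected, buffer)

revisions = [
    {"id": 1, "text": "Energy storage and the grid."},
    {"id": 2, "text": "Minor typo fix."},
]
matches = revisions_mentioning(revisions, selected)
print(selected)
print(matches)
```

Changing the curated list changes which revisions match, which is why a different curation leads to a different data-driven narrative even when the articles stay the same.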

Documents produced

Keep somewhere, for sharing, the following document:

The (unannotated) visualization (JPEG or PNG)

Next tutorial

Take a break before this:

2.7. Extend the protocol: natural language processing (45 min)

Relation to the course readings

Thoughts and principles on query design are covered in Rogers, R. (2017). Foundations of Digital Methods: Query Design. In M. Schaefer & K. van Es (Eds.), The Datafied Society: Studying Culture through Data.
The process of getting data through scraping, crawling and calling APIs is covered in Chapter 6, Collecting and curating digital records, of Venturini, T. & Munk, A.K. (2021). Controversy Mapping: A Field Guide.
The intricacies of Wikipedia and the different ways in which the platform may be reappropriated for controversy analysis are covered in Weltevrede, E., & Borra, E. (2016). Platform affordances and data practices: The value of dispute on Wikipedia. Big Data & Society, 3(1).

Anders K. Munk & Mathieu Jacomy