This is the introductory unit in a practical learning series on using text as data for socialist purposes. In this tutorial, I will introduce readers to three tidy methods of scraping, cleaning, and processing text from common sources and, in the process, build a tidy corpus of machine-readable Marxist texts based on the three volumes of Karl Marx’s Capital.
Welcome to Unit 2 of Using Text as Data with R to Advance the Cause of Socialism! In the first unit, we covered the basics of tidy text scraping, cleaning, and processing with the tidytext package in order to put together a machine-readable text corpus of all three volumes of Marx’s Capital from the Marxists Internet Archive.
Creative use of text-based data for socialist data science is a main area of interest in my personal research. This is the first in a series of tutorials on using topic models to explore and categorize text-based data.