Digital Services Manager
Montana Historical Society
Datasets are a treasure trove of information for historians and social scientists who draw on relatively new methods of historical analysis to support theories, develop new interpretations, and think deeply about the implications of patterns. While a blog post is too short to delve deeply into this topic, the MHS would like to draw attention to the datasets in our collections, and we encourage the use and analysis of big data.
MHS Datasets
MHS recently shared three datasets on the Socrata data portal currently supported by the State Information Technology Services Division (SITSD). Since the software, people, and commitments are outside the organizational control of the MHS, researchers should assume links may change and should prepare citations that state that the dataset is held by the MHS. The MHS will maintain copies of the datasets and commit to ensuring access; we will also provide data accuracy and integrity statements. Datasets are presented under a public domain license, which permits researchers to export, use, and append the dataset.
Current datasets
Preparing the dataset
Historical datasets can be complicated to develop since
historical data is not always structured consistently and handwritten data can
be difficult to read. When data is structured for machine readability, it is
fairly easy to map data into new fields, parse information, or aggregate data.
Standardized information sets such as a handwritten table are also fairly easy
to structure, but unstructured data must be hand-entered and the dataset
creator must make decisions about field names, content standards, and
normalization. In practical terms, this means that the dataset of a handwritten
ledger (Figure 1) will easily map to a table or XML file (Figure 2). However,
the dataset creator of military enlistment cards (Figure 3) will need to make
the following decisions:
- Field names – i.e., metadata terms, either local terms or terms drawn from a professional authority.
- Data content standards – if none are present, a standard will need to be defined or developed. Content standards are simply the rules for data entry which ensure consistency.
- Data normalization – the process of organizing and cleaning data in order to reduce redundancy.
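The three decisions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MHS workflow: the field names in `FIELD_NAMES`, the sample card values, and the rules inside `standardize` (trimmed text, ISO 8601 dates) are all hypothetical.

```python
from datetime import datetime

# Hypothetical field names: raw labels as transcribed from a card,
# mapped to the terms the dataset creator decided to use.
FIELD_NAMES = {
    "Name": "name",
    "Place of Enlistment": "enlistment_place",
    "Date": "enlistment_date",
}

def standardize(raw):
    """Apply an assumed content standard: trimmed text, ISO 8601 dates."""
    record = {}
    for label, value in raw.items():
        field = FIELD_NAMES[label]
        value = " ".join(value.split())  # collapse stray whitespace
        if field == "enlistment_date":
            # Assumed transcription format, e.g. "June 5, 1917"
            value = datetime.strptime(value, "%B %d, %Y").date().isoformat()
        record[field] = value
    return record

# Invented sample values, for illustration only.
card = {
    "Name": "  Doe,  John ",
    "Place of Enlistment": "Helena",
    "Date": "June 5, 1917",
}
record = standardize(card)
print(record)
```

Writing the rules down as code, or simply as a data-entry guide, is what keeps thousands of hand-entered cards consistent with one another.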
Figure 1. Enlistment records from Fort Assiniboine (identified as Assinniboine in original ledger) in a structured table, from MC 46.
Figure 2. Table of data in Excel (left) and structured data in xml format (right)
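As a rough sketch of how a structured ledger maps to XML, the snippet below turns table rows into nested elements with Python's standard library. The rows, column names, and the `ledger` and `entry` element names are invented for illustration and are not taken from MC 46.

```python
import xml.etree.ElementTree as ET

# Invented ledger rows; the column names are hypothetical.
rows = [
    {"name": "John Doe", "rank": "Private", "enlisted": "1885-04-02"},
    {"name": "Richard Roe", "rank": "Corporal", "enlisted": "1886-09-15"},
]

def rows_to_xml(rows):
    """Map each table row to an <entry> element, one child per column."""
    root = ET.Element("ledger")
    for row in rows:
        entry = ET.SubElement(root, "entry")
        for column, value in row.items():
            ET.SubElement(entry, column).text = value
    return root

xml_text = ET.tostring(rows_to_xml(rows), encoding="unicode")
print(xml_text)
```

Because every row already has the same columns, the mapping needs no decisions beyond choosing element names.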
Example Methodology – State Prison Records
The State Prison Records dataset is drawn from digitized
prison records which are presented on the Montana Memory Project in the
collection Montana State Prison Records, 1869-1974.
A team of stalwart volunteers—Marie McAlear and Anthony Schrillo—led by staff member
Caitlin Patterson spent eight years digitizing, collecting metadata, and
uploading the materials from the highly used public documents. Information
about intriguing and unusual cases is recorded elsewhere on this blog. In order to understand larger patterns, though,
researchers need access to the dataset created through metadata development.
We normalized the dataset by reducing more than 28,000 lines of
metadata to about 15,000 unique records, standardized the content, parsed
columnar data, and quantified some of the information. Because the
metadata is presented as a dataset, researchers may filter fields (Crime, Location, Gender,
Descent, Occupation, and Religion) and may look for spatial or temporal patterns
using Location or Incarceration Date.
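A minimal sketch of this kind of cleanup and filtering, using only Python's standard library, might look like the following. The sample rows are invented, the column layout is an assumption, and the rule used here (collapse rows that match exactly after cleaning) is a simplification; the actual reduction to unique records may have involved different matching rules.

```python
import csv
import io

# Invented sample rows; the column names echo fields named in the post,
# but the values and the exact layout are assumptions.
SAMPLE = """Name,Crime,Location,Gender
John Doe,Burglary,Silver Bow,Male
john doe ,burglary,Silver Bow,Male
Jane Roe,Forgery,Missoula,Female
"""

def normalize(row):
    """Trim whitespace and apply consistent capitalization to every value."""
    return {key: " ".join(value.split()).title() for key, value in row.items()}

def unique_records(rows):
    """Keep the first occurrence of each normalized row."""
    seen = set()
    unique = []
    for row in rows:
        clean = normalize(row)
        key = tuple(clean.items())
        if key not in seen:
            seen.add(key)
            unique.append(clean)
    return unique

records = unique_records(csv.DictReader(io.StringIO(SAMPLE)))

# Filter on one field, as a researcher might do on the data portal.
burglaries = [r for r in records if r["Crime"] == "Burglary"]
print(len(records), len(burglaries))
```

Standardizing before comparing matters: without it, the two spellings of the same entry would be counted as separate records.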
However, simply filtering for a crime or demonstrating a
pattern will result in flabby analysis. Trends identified in datasets need to
be comparatively analyzed using state and local demographics, labor and culture
statistics, and/or national crime data. Broad patterns of movement and human
activity must be known and taken into consideration. Secondary sources should be read
to understand historical context, original records reviewed, military enlistment cards searched, and newspaper accounts studied. Researchers might even visit the Old Montana Prison
and Montana towns to reflect on the social, economic, cultural, and
environmental conditions which lead to crime and incarceration. It’s also
important to look for the consequences of incarceration, perhaps using network
analysis to trace generational trends, recidivism, and its haunting social
impact.
Big data analysis is a powerful tool for historical
research, but it is not an end. Look at the numbers, but feel for the pulse.
Please contact Tammy Troup at ttroup@mt.gov for more information.