Dataset Assignment and Blog

This assignment asks you to apply what you are learning about spreadsheets, datasets and databases by transforming a digitized primary source into a sample dataset about the early Americas topic you’ve selected for your final project. Then you will explain your process of dataset creation in a blog post.

Part 1, due Tuesday 3/26 at midnight:

Select an appropriate digitized primary source (or small group of sources) about the early Americas topic you’ve selected for your final project. You may use a source you already found for your Primary and Secondary Source assignment or find a different source using SMU’s library catalog and/or databases. As we’ve discussed in class, the source(s) must contain some categorical or quantitative (statistical) information. Consult with the instructor if you are having difficulty finding a source or are unsure whether your source is appropriate for this assignment.
Using Google Sheets, create a simple spreadsheet for your data. First, decide on the variables (categories of data) that you will record, and then use these variables to create column headings. Make sure to include at least 4 variables. Then, transcribe your data. Use a new row for each “observation” or item. Transcribe at least 5 rows of data from your digitized source.
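For those who find it easier to see the structure in code, the layout described above can be sketched in Python. This is only an illustration: the variable names (`name`, `occupation`, `place`, `year`) and every value in the rows are invented placeholders, not data from any real source, and the helper functions `to_csv` and `meets_minimum` are hypothetical. Your own variables will depend on your source.

```python
import csv
import io

# Hypothetical variables (column headings); choose ones that fit your source.
VARIABLES = ["name", "occupation", "place", "year"]

# Placeholder observations, one dictionary per spreadsheet row.
# These values are invented for illustration only.
observations = [
    {"name": "Person A", "occupation": "carpenter", "place": "Town X", "year": 1750},
    {"name": "Person B", "occupation": "sailor", "place": "Town Y", "year": 1751},
    {"name": "Person C", "occupation": "farmer", "place": "Town X", "year": 1751},
    {"name": "Person D", "occupation": "weaver", "place": "Town Z", "year": 1752},
    {"name": "Person E", "occupation": "farmer", "place": "Town Y", "year": 1753},
]


def to_csv(rows, fieldnames):
    """Write a header row followed by one line per observation."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def meets_minimum(rows, fieldnames, min_variables=4, min_rows=5):
    """Check the assignment minimums: at least 4 variables and 5 rows."""
    return len(fieldnames) >= min_variables and len(rows) >= min_rows


if __name__ == "__main__":
    print(to_csv(observations, VARIABLES))
    print("Meets minimum:", meets_minimum(observations, VARIABLES))
```

The point of the sketch is that every row has a value (or a deliberate blank) for the same set of variables, which is what makes the data sortable and countable later.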
Write a short report (150-200 words) about your preliminary work. Be sure to explain:
- what source(s) you selected, what it is about, and why you selected it
- which variables you selected and why you think they are important to record
- any difficulties you’ve run into so far
Submit 3 items in Canvas by Tuesday 3/26 at midnight:
1) Sample image of your digitized source
2) A public link to your Google spreadsheet
3) Short written report of your work so far (see directions above)

Part 2, due Thursday 3/28 at midnight:

As we practiced in class, transform the data from your Google spreadsheet into an Airtable database. Decide on a schema, or structure, for your database. Your database must have at least 3 tables: two that store information (e.g., people, places, sources) and one that shows discrete instances from your dataset by linking to data in your other tables.
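The three-table structure can be sketched in plain Python to show how linking works. This is a conceptual illustration under stated assumptions, not Airtable itself: the table names (`people`, `places`, `records`), the ID values, and all of the data are invented placeholders, and `resolve` is a hypothetical helper that follows the links in the way a linked field does.

```python
# Two tables that store information, keyed by an ID.
# All names and values are invented placeholders.
people = {
    "p1": {"name": "Person A", "occupation": "carpenter"},
    "p2": {"name": "Person B", "occupation": "sailor"},
}

places = {
    "pl1": {"name": "Town X"},
    "pl2": {"name": "Town Y"},
}

# A third table of discrete instances. Each row stores links (IDs)
# to the other tables instead of retyping names.
records = [
    {"person_id": "p1", "place_id": "pl1", "year": 1750},
    {"person_id": "p2", "place_id": "pl2", "year": 1751},
    {"person_id": "p1", "place_id": "pl2", "year": 1752},
]


def resolve(record):
    """Follow a record's links and return a readable row."""
    return {
        "person": people[record["person_id"]]["name"],
        "place": places[record["place_id"]]["name"],
        "year": record["year"],
    }


if __name__ == "__main__":
    for record in records:
        print(resolve(record))
```

Because each person and place is stored once, correcting a spelling in `people` or `places` updates every record that links to it, which is one practical difference between a database and a flat spreadsheet.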
Write a blog post about your sample dataset and post it to your website. To earn full points, your blog must include:
- A citation of the primary source(s) that your data comes from, following Chicago Manual of Style conventions.
- A link to your publicly available Airtable database.
- 1 screenshot of your Google spreadsheet AND at least 1 screenshot of one table from your Airtable database.
- A 500-word reflection on your work with spreadsheets and databases. It should use full sentences and paragraphs with topic sentences. Be sure to write about:
  - What source did you select and why? What topic, period, and place of history does this source tell us about? How does this relate to the topic of your final project?
  - Describe the process of creating your initial Google spreadsheet and the decisions you made (e.g., variables you selected, process of transcription). How does transcribing structured data differ from transcribing a text document?
  - Discuss the process of transforming your spreadsheet into a database. How are spreadsheets different from databases (in purpose, in configuration)? What decisions did you make in developing your database schema (e.g., number of tables, which tables would hold which content, field types, the relationships between tables)? How did you standardize any anomalies?
  - What challenges did you face in creating your dataset?
Submit your blog post URL on Canvas by Thursday 3/28 at midnight.
