What Can You Do With Crowdsourced Digitalization?

As mentioned in a previous post, crowdsourcing is basically the idea that the general public can create or edit content online. Crowdsourcing has been put to good use in the creation of sites like Wikipedia. Now libraries, archives, museums and educational institutions are getting in on the game by finding new ways to harness the potential of an engaged audience. In this digital age, many institutions are trying to create digital copies of the documents in their collections that can be viewed in a searchable database online. If the text in these documents is typed or printed, then OCR (Optical Character Recognition) technology can be used to convert the image of the text into searchable data. But what happens if the documents are handwritten (and many older documents are) and each writer’s script is unique and cannot be read by OCR? Or what if the OCR makes a mistake when converting an image to data? Normally, editors have to spend countless hours fixing the OCR mistakes and transcribing handwritten documents. But crowdsourcing is changing that.

Now, instead of relying on internal manpower (and trying to find funding to pay the editors) to carry out these transcription and correction tasks, institutions are increasingly reaching out to the general public and asking for its help. While some have complained of the exploitative nature of these projects, if institutions are doing a good job, crowdsourced projects can benefit both the public and the institution. The institution reaps the benefit of the “many hands make light work” concept and members of the public feel like they are making a civic contribution. Many questionnaires from users participating in crowdsourced projects indicate that they find their work highly meaningful, even though they are providing free labor. Another benefit for both institutions and the public is increased engagement with cultural heritage, a goal of many institutions. Participants learn about something unique and institutions are able to draw in different users who might not otherwise interact with their materials.

There are a variety of tasks that can be completed by members of the public. Some of the most basic work is correcting OCR text. For example, through the National Library of Australia’s Trove project, you can search through newspaper articles and fix any mistakes you find in a window on the left-hand side of the screen. These corrections can then be submitted and are updated in Trove’s full-text searchable collection. Another common crowdsourcing task is document transcription. Examples of this would be the Papers of the War Department project or Transcribe Bentham. These let the user view a large image of the original document and provide a text box for typing in the transcribed text. Most transcription submissions are monitored by an editor who decides when the transcription is good enough to be added to the document’s file. Besides these two basic crowdsourcing tasks, institutions are coming up with other creative ways for the public to help. For example, the New York Public Library’s Building Inspector project asks contributors to help map building outlines on old NYC maps by fixing the outlines created by a computer program. By looking at an overlay on the map, one can click and drag to create a polygon that matches the footprint of the building. These different types of tasks vary in difficulty and time commitment. In my experience, completing full text transcription is the most demanding of these tasks, simply because the handwriting is sometimes hard to read and the work takes longer. On the other hand, “Checking Footprints” on the Building Inspector site was easy to do (click one of three buttons) and could be done in a matter of seconds.

Members of the public decide to become contributors to these projects for various reasons. First, many people who contribute have particular research interests that complement the subject of the project. For example, contributors to the Papers of the War Department are reportedly interested in early US politics or Indian affairs, or are current or former military members. Contributors to Transcribe Bentham are generally interested in philosophy or history, or specifically in Jeremy Bentham’s writings. A lot of people who contribute to Building Inspector are interested in NYC or like old maps. Secondly, many participants are interested in genealogical research. For example, Trove draws many who are interested in local and family history. Finally, as mentioned before, many contributors are intrinsically motivated. They find the work relaxing and enjoyable. They like being a part of something bigger. They like giving back. They feel like they are contributing to the greater good of society. And most importantly, they feel like their work is valuable. For me, transcription is the most rewarding. When I complete a transcription, I feel like I have really contributed something. Even though I have labored through it, I have a product to submit at the end. On the other hand, correcting building footprints leaves me feeling like I haven’t accomplished anything or that I have not really made any difference.

In order to keep contributors engaged in a project, there must be a high quality user experience. An important part of this is the user interface. Contributors are likely to continue with their work if the interface is engaging and easy to use. In my own experience, this is true. For example, when working with the Papers of the War Department, the user is given a horizontally split interface instead of the normal vertical split screen. The horizontal split is very well suited to transcription work. It is nice to have the document on top across the page, so you can see a whole line of text at one time, while simultaneously typing your transcription in the text box at the bottom of the screen. Another essential part of a transcription project’s interface is the toolbar. The one for the Papers of the War Department project worked well, allowing you to put in superscripts and line breaks. While I wouldn’t call the interface particularly “engaging,” it helped me do what I was asked to do.

The second part of the user experience is the interaction between contributors and the project managers. Feedback and moderation are necessary for retention of contributors. For example, after each transcription in the Transcribe Bentham project is completed, it is reviewed and edited by moderators and editorial staff. The TB_Editor communicates with the volunteer transcribers by leaving messages on their user pages. In the Papers of the War Department project, each document has a discussion page attached to it. Contributors can leave comments or questions for other contributors or for the project staff. The more that staff can monitor these pages and answer the contributors’ questions, the more likely the project is to retain its contributors. Communication from the project staff also acts as an acknowledgement that the contributor’s work is valued.

In general, crowdsourcing done right can be a positive experience for both the institution and the members of the public serving as contributors. Once a positive relationship has been formed between the two groups, transcription and correction work gets done at a surprising pace. Contributors learn new information and get the satisfaction of making a difference. Institutions engage with a wider audience and gain valuable data. That sounds like a win-win situation to me!
