Transcribing between the lines: crowd-sourcing historic data collection

Nicole Kearney, Museum Victoria, Australia, Elycia Wallis, Museums Victoria, Australia

Abstract

Archival field diaries are an invaluable source of scientific and historic data, providing insights into species’ past abundance and distribution, references to significant people and events, and personal descriptions of historic expeditions. Despite the wealth of information they contain, they are an underutilised resource because they are inaccessible in their original state. As hand-written documents they are hard to read, and they are often uncatalogued. This means that neither their contents nor their very existence is searchable. In this paper, we will explore the evolving field of online transcription, with a particular emphasis on archival field diaries. Using Museum Victoria’s recent transcription projects as key case studies, we will discuss the transcription platforms available, the standards required for success, and, most importantly, what we are doing to capture all the data.

Keywords: field books, crowd-sourcing, digitization, biodiversity, transcription, taxonomic referencing

Introduction

For hundreds of years, naturalists, scientists and collectors have explored the natural world and documented their discoveries. This documentation has often taken the form of diaries written in the field during expeditions, or as an adjunct to other work or adventures. Many of these diaries have since found their way into museum collections, while others are held in archives and libraries.

These diaries contain descriptions of new discoveries and frontiers, personal accounts of the trials and wonders experienced, and references to historically-significant people, places and events. They are also filled with scientific data: historic observations of animals, plants, rocks and fossils. Field diaries also provide rich contextual information, such as animal behaviour, interactions between individuals and species, habitat descriptions, weather conditions and past field techniques and collecting methods. Most importantly, diarists are often meticulous in their recordings and ensure that each observation is accompanied by a date and location, the two key pieces of information that make an observation useful to science.

The data locked away in historic field diaries is of great use to modern scientists. It can provide invaluable insights into species’ past abundance and distribution. This information can be used to more effectively plan future biological surveys and to inform threatened species management (e.g. Rowe, 2014). Data in diaries from collecting expeditions can be used to clarify specimens’ provenance and to re-find localities (Thomer, 2014). Historic occurrence records, such as those found in field diaries, are also starting to be recognised as a critical baseline for climate change studies (e.g. Hill et al., 2012; Moritz et al., 2008; Tingley et al., 2012). The value of these primary source materials, produced by our earliest field naturalists, cannot be overestimated.

Despite the wealth of information they contain, field books are an underutilised resource (Sheffield, 2011). Written as personal records, they exist only as physical documents, as a single hard copy, stored in a single location. Anecdotally, very few museums have a centralised procedure for storing, preserving or cataloguing field diaries. Many field diaries have simply remained associated with the science collections they refer to. These are often uncatalogued, their very existence known only to a few staff members. Some field diaries make their way into history or library departments, and many end up in museum archives. Scattered across departments, the field book collections of most museums remain undiscovered.

Historic field books are not just hard to find, they are hard to read. Many are penned in scripts no longer used today. Their legibility is further reduced by the fact that these documents were not neatly written at a desk; they were scribbled in the field, often in less-than-favourable conditions. As handwritten documents, their contents can’t be searched. Even when digitised, handwriting is unreadable by computers. Even the neatest, most consistent handwriting resists OCR (optical character recognition). With their treasure troves of data hidden within unsearchable text, field diaries are often simply ignored and forgotten.

Developing an efficient method for digitising and transcribing field notes is critical for capturing the data they contain, and not just for historic field notes. While a few scientists are beginning to record their field data digitally, most still use pencil and paper. In 2015, all scientific curators at the American Museum of Natural History expressed a desire to digitize and preserve their field notebooks, reporting that these documents contain the most important data from their work (Steeves, 2015).

Case Study: The field diaries of A. Graham Brown

Ten years ago, a Curator in Museum Victoria’s history collection discovered a box of diaries within the museum’s bird mount store. The box was labelled “Estate of A. Graham Brown – note books.” It contained five hand-written field diaries and six folders of sightings records. Arthur Graham Brown is one of Victoria’s most eminent ornithologists, a past president of the Royal Australasian Ornithologists Union and an Honorary Associate of Museum Victoria.

Soon after their discovery, the significance of Graham Brown’s diaries was recognised by both our science and history curators. Brown kept meticulous records of his bird observations in his diaries and made repeated expeditions to locations of great conservation interest to our museum, such as the Grampians National Park. The historic occurrence records contained in Brown’s diaries could be compared with our recent survey data to study changes in species distribution and abundance over time. In order for this research to proceed, however, the diaries (and the data they contained) needed to be made more accessible. Thus began a project to digitise and transcribe the Graham Brown field diaries.

Cataloguing

The first step in the Graham Brown field diary project was to catalogue the contents of the box. We created an object record for each diary in our collection management database and linked these to the biographical record for the author. These records contain numerous searchable fields, making the records of the diaries discoverable by all who use Museum Victoria’s internal database. We would eventually make the diaries discoverable by a much wider audience, but first we needed to digitise them.

Digitisation

Museum Victoria has been digitising published material in our library collection since 2010, when we began contributing to the Biodiversity Heritage Library (BHL) (Chmiel, 2011; Wallis & Matthews, 2014). Digitising unpublished material was new for us, but the method was the same. Our digitisation equipment consists of two digital SLR cameras (Canon 6D) mounted on a book scanning platform (Atiz Bookdrive Pro). We use three lenses (focal lengths of 50 mm, 85 mm and 100 mm), which can be manually changed to allow for imaging documents up to A2 size. Larger items, including fold-outs, are photographed separately on a flat-bed platform. The built-in lighting consists of four LED floodlights, giving light with a near-daylight colour temperature. The pages are turned manually to minimise damage. The images of each page are then cropped and colour-matched to the original. For the Graham Brown diaries, all of this work was completed by our in-house digitisation volunteers.

The high-quality scans were then uploaded into Museum Victoria’s image management database and, once every page had an image record, the images were shared across into our collection database and linked to the new object records. We now had a complete digitised and discoverable version of each diary. The next step was to transcribe the diaries and make the contents searchable.

Choosing a transcription tool

A number of organisations around the world have had great success using crowd-sourced volunteers to transcribe hand-written material online. Transcription projects using online volunteers include, among others, the Field Book Project (http://www.mnh.si.edu/rc/fieldbooks/), Transcribe Bentham (http://blogs.ucl.ac.uk/transcribe-bentham/), What’s on the Menu? (http://menus.nypl.org/), Citizen Archivist (http://www.archives.gov/citizen-archivist/transcribe/), Pitch in! (http://www.slq.qld.gov.au/about-us/pitch-in/transcribe), Notes from Nature (http://www.notesfromnature.org/), Operation War Diary (http://www.operationwardiary.org/) and Old Weather (http://www.oldweather.org/).

The transcription platforms used for these projects are almost as varied as the projects themselves. Most are now more complex than a simple free text field for transcribers to type into. Some provide formatting tools for volunteers to mark up their transcribed text. For example, Transcribe Bentham uses TEI (Text Encoding Initiative) XML tags, while WikiSource uses a special syntax called Wikitext or Wiki markup. Some transcription platforms also provide annotation tools, which allow transcribers to use wiki-style syntax to tag key words within the text, such as people’s names, place names, or names of species. Examples include FromThePage (http://beta.fromthepage.com/), WikiSource (https://en.wikisource.org/wiki/Main_Page) and the Public Record Office Victoria (PROV) semantic wiki (http://wiki.prov.vic.gov.au/index.php/Transcribe). The tags can be labelled to separate different types of data, e.g. [taxon], [date], [location], and the tagged terms can then be extracted to form a data set.
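As a rough illustration of how tagged terms might be pulled out of an annotated transcription, the following Python sketch assumes a hypothetical inline markup of the form [label: text]. This syntax, the function name and the sample sentence are all invented for the example; none of the platforms named above necessarily uses this form.

```python
import re
from collections import defaultdict

# Hypothetical inline markup: [label: tagged text], e.g. [taxon: Emu].
# Real annotation platforms each define their own syntax.
TAG_PATTERN = re.compile(r"\[(taxon|date|location):\s*([^\]]+)\]")


def extract_tags(transcription):
    """Group the tagged terms by label, in order of appearance."""
    found = defaultdict(list)
    for label, term in TAG_PATTERN.findall(transcription):
        found[label].append(term.strip())
    return dict(found)


# Invented sample text, not taken from any diary.
sample = (
    "[date: 3 Oct 1950] Walked to [location: Halls Gap]. "
    "Saw two [taxon: Emu] and one [taxon: Galah] near the creek."
)
tags = extract_tags(sample)
```

The result is a set of lists of terms, one per label. Nothing in it records which date or location belongs with which taxon; that link still has to be reconstructed.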

This method of data mining can be very effective when the connected pieces of data are close together in the text, such as on a specimen label. It becomes problematic when that connected data is spread across several pages of a document, such as in a long diary entry. This can be complicated further when, as is often the case in diary entries, a single mention of a date and location applies to multiple sightings. It seemed to us that, for our project, data mining using annotation would require a great deal of manual input and validation. Surely it would be more efficient to simply compile the data into a table during the transcription process.
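The difficulty can be made concrete with a naive "carry forward" rule, sketched here in Python under the assumption that tagged terms arrive as an ordered list of (label, value) pairs. The function name and the sample values are invented for illustration; this is not a method used by any of the projects discussed.

```python
def build_sightings(tagged_items):
    """Turn an ordered stream of (label, value) pairs into sighting rows.

    Each taxon inherits the most recent date and location seen before it.
    """
    current = {"date": None, "location": None}
    rows = []
    for label, value in tagged_items:
        if label in current:
            # A new date or location replaces the previous one.
            current[label] = value
        elif label == "taxon":
            rows.append({"taxon": value, **current})
    return rows


# Invented sample: one date applies to three sightings at two locations.
items = [
    ("date", "3 Oct 1950"),
    ("location", "Halls Gap"),
    ("taxon", "Emu"),
    ("taxon", "Galah"),
    ("location", "Mt William"),
    ("taxon", "Emu"),
]
rows = build_sightings(items)
```

A rule like this misassigns sightings whenever the author mentions a place or date in passing, or records the location after the sighting, which is why every extracted row would still need to be checked by a person.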

Transcribing data into a table with pre-determined fields is exactly what is done by organisations involved in the mass digitisation and transcription of specimen labels, such as iDigBio (https://www.idigbio.org/) and Notes from Nature (http://www.notesfromnature.org/#/). In these projects, specimens are photographed and their hand-written labels are transcribed into a table, usually in the form of a user-friendly online template. The Atlas of Living Australia (ALA), in conjunction with the Australian Museum, launched their volunteer transcription portal (now called DigiVol) in 2010 (Flemons & Berents, 2012). It was built specifically for transcribing specimen labels and is used for this purpose by the Smithsonian, the US National Herbarium, the South African National Biodiversity Institute, the Vermont Center for Ecostudies, the Australian National Insect Collection and the University of Melbourne Herbarium. Like records of sightings, records of specimens held in museum collections are scientifically useless unless the identification of the species is accompanied by a date and location for the collecting event. DigiVol’s templates are designed to collect this data: the same data we wanted to collect for each sighting in our field diaries.

DigiVol has now expanded their transcription service, creating templates for transcribing collection registers, survey sheets and field diaries. Their field diary template consists of two sections: a box for transcribing the verbatim text, and a table for collecting sightings data (scientific name, common name, date, location). The DigiVol staff kindly added an additional field for our project: a field for collecting mentions of people and organisations (name, date, location). This customised template allows volunteers to transcribe each page of the diary, while simultaneously recording all elements of the data required by both our history and science curators.
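The shape of the data captured for each page can be sketched as a small set of Python record types. The field names follow the template described above; the class names, the page_number field and the sample values are assumptions made for illustration and do not reflect DigiVol's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sighting:
    scientific_name: str  # as written by the diarist; may be empty
    common_name: str
    date: str             # kept as text so the original wording survives
    location: str


@dataclass
class Mention:
    name: str             # a person or organisation
    date: str
    location: str


@dataclass
class PageTranscription:
    page_number: int      # hypothetical identifier for the diary page
    verbatim_text: str
    sightings: List[Sighting] = field(default_factory=list)
    mentions: List[Mention] = field(default_factory=list)


# Invented sample values, not taken from any diary.
page = PageTranscription(page_number=1, verbatim_text="(verbatim page text)")
page.sightings.append(Sighting("", "Emu", "3 Oct 1950", "Halls Gap"))
```

Keeping the verbatim text and the tables in the same page-level record means a validator can compare every row against the passage it was taken from.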

Establishing transcription standards

The most successful transcription projects are those that provide their transcribers with clear and detailed instructions. For online transcription projects, where face-to-face training is not an option, instructions usually take the form of written guidelines or tutorials. As transcription projects vary enormously in their content, format and style, a specific transcription tutorial is usually produced for each project.

There is no single right method of transcribing; how a document is transcribed will depend on the intended audience and purpose of the transcription. For example, the aim of the New York Public Library’s What’s on the Menu? is to create a “database of dishes” (http://menus.nypl.org/). Volunteers are instructed to transcribe only items that can be eaten, drunk or smoked, to ignore all other text (menu headings, restaurant names, etc.) and to convert text into individual dishes (“Clams, stewed or fried” becomes “Stewed clams, Fried clams”). Similarly, the aim of Zooniverse’s Operation War Diary is to create an index of the people mentioned in First World War documents (http://www.operationwardiary.org/#/). Volunteers place tags throughout the pages (names, dates, times, locations) and transcribe only this information.

The aim of our project was to preserve the integrity of our field diaries. Our curators wanted a transcription that would faithfully represent every textual aspect of the manuscript. This is called a “diplomatic transcription”. Most transcription projects follow “semi-diplomatic” principles, varying in the extent to which they follow an author’s formatting, correct spelling and grammatical errors and spell out abbreviations.

We asked our transcribers to produce an exact copy of the original, following the author’s formatting (line breaks, page breaks, etc). In the absence of encoding tags on DigiVol, the volunteers had to manually mark up words that were underlined or crossed out. While some of DigiVol’s templates included buttons for adding symbols (e.g. ♀, ♂), these were not available on their field book templates at the time. The volunteers were therefore encouraged to use Alt codes (Alt Numpad input method) to input these and other symbols. We also asked that our volunteers transcribe the exact common and scientific names used by the author, rather than replacing out-dated names with currently accepted ones. The importance of capturing the original wording of an occurrence record (the verbatim name term) was highlighted by Thomer et al. (2012).

Anything added by a transcriber in the interest of clarity and readability (corrections, clarifications, valid species names, etc) was added between square brackets. Transcribers were also asked to enter the verbatim names and locations used by the author into the data collection table. Transcribers could add notes about the pages they had transcribed in a separate comments field. This ensured that the transcribers’ additions were clearly distinguishable from the author’s original text.

Building an online community

We uploaded the first Graham Brown diary onto DigiVol on a Friday afternoon. We had arranged for two of our in-house volunteers to start the transcription the following week. Once they had tested the process using our online tutorial (and we had ironed out any issues), we planned to promote the project externally in the hope of attracting some online transcribers.

By Monday morning, however, the diary was already 20% transcribed. Existing volunteer transcribers already registered with DigiVol were racing through it. As we looked through their work, we noticed that the volunteers had not just been transcribing, they had been communicating. They had asked questions, provided answers and shared ideas. An online community had already formed around our little project.

One feature of DigiVol is that volunteers can contact each other directly or start online forums about particular topics. These topics range from requests for help deciphering an individual word, to suggestions about how to improve the online template. As administrators, we received notifications of forum posts and were able to view the transcribed pages online. This meant that issues and errors were picked up quickly. Being able to directly communicate with our transcribers allowed us to keep them updated about the project’s progress, encourage their participation on the next diary, and, most importantly, personally thank them for their work.

Not all crowd-sourcing platforms facilitate communication between volunteers, and many online projects struggle to build a functioning community. For example, Thomer et al. (2012) used WikiSource as their online platform. While they were able to take advantage of its existing wiki-community, their attempts to engage with their volunteers fell on deaf ears. They were not even able to discover the names of their “mystery annotators” for the purpose of acknowledgment. DigiVol volunteers must enter an email address when they sign up. This may dissuade some potential volunteers from contributing, but for others the thriving online community is an attractant in itself (Flemons, et al., 2015; Prater, 2015).

Validation

Our communication with the online community helped us with another key part of the project: finding a volunteer able, and willing, to check the work of his/her peers. DigiVol transcriptions benefit from a two-step process: once a volunteer has transcribed a task, they submit it for validation. It is then checked by a validator, someone who double checks the transcriptions and ensures that the data has been correctly copied into the sightings/mentions table.

Validation (sometimes called reviewing) is part of the process of many online transcription portals, but who does this validation work varies between projects. DigiVol validators must be selected and assigned to a project by the project administrator. They are usually experienced volunteer transcribers who have shown great attention to detail. To ensure consistency we chose to have only one validator per project. Other transcription projects do not have such a rigorous validation process. Anyone can validate transcribed pages on the Smithsonian Transcription Center (https://transcription.si.edu/), but only once they have set up an account and provided their email address (account-holding is optional for transcribers). This means that while each page is only checked by one person, there can be many people validating a single diary. There is no validation process in the Operation War Diary project (http://www.operationwardiary.org/#/). Rather, numerous volunteers annotate/transcribe each page, with each new pair of eyes hopefully picking up something the others have missed.

The validator we selected for the Graham Brown diaries was a volunteer transcriber who stood out early on in our project: her work was excellent and she was an active participant in the online forums, answering questions raised by others and offering insightful and considered solutions to issues we hadn’t even thought of. We were very fortunate that she agreed to work as the validator for our first diaries, and that she has continued to work in this role on our subsequent projects.

Preparing the transcriptions for display

It took five months for 46 online transcribers and our single validator to complete the transcriptions of the five Graham Brown field diaries. It was then a matter of extracting the verbatim transcripts from DigiVol and creating a clean version in Word. We then replaced the line breaks, page breaks and other formatting tags inserted by the volunteers with the actual formatting. The five final documents, each an exact page-by-page match of the original, were then uploaded into our collection database and linked to the object records for the diaries. The diaries were now discoverable and searchable within our internal systems, accessible by any staff member. But in order for the diaries to be truly accessible, they needed to be online.

Bringing the field diaries to the world

The Biodiversity Heritage Library (BHL) is the world’s largest online repository of biodiversity literature and archival materials. The Australian component of BHL (BHL-Au) is managed by Museum Victoria and funded by the Atlas of Living Australia (http://www.ala.org.au/). To date, BHL-Au has digitised and uploaded over 500 rare books and historic journals. This represents an addition of over 133,000 pages of Australia’s published biological heritage to the massive BHL corpus of nearly 47 million pages. If our unpublished field diaries were to be accessible to the global biodiversity community, BHL is where they needed to be.

The process of uploading digitised items onto BHL involves first passing them through a custom-built metadata collection tool produced by the Smithsonian Institution Libraries, called Macaw. Macaw allows users to manage the scanned pages of “book-like things” via a web browser and to input page-level metadata. The result is a complete digital version of the item that can then be exported to other systems, such as the Internet Archive and the BHL.

It is only over the past couple of years that the BHL has started to accept field notes and other unpublished material into its system. Unpublished material presents a problem, as the upload process requires a MARC record (MAchine-Readable Cataloguing record, the uniquely-labelled computer-readable record containing the bibliographic data elements) for the item. In order to get around this problem, our librarians created a record for the unpublished diaries that had enough MARC elements filled to make a valid record, which they then suppressed in their system so that it would not be shared into catalogues viewable by external parties.

The original field diaries and the transcripts were now ready to be shared with the world. The completed items, along with their attached metadata, were first uploaded into the Internet Archive (the online repository and scanning partner for the BHL). They were then harvested across into BHL, along with several derivatives produced by the Internet Archive, including the OCR produced for each page file (more on this below).

Museum Victoria’s first digitised field diaries and their transcriptions appeared in BHL in August 2015. Their contents are now discoverable, accessible and searchable by anyone, be they a scientist studying the impact of climate change on Victorian birds, a historian researching past presidents of the Royal Australasian Ornithologists Union, or an online volunteer proud to see their work online.

Future Directions

How to best display transcriptions online?

There is no doubt that the Biodiversity Heritage Library (BHL) is where our digitised field diaries belong. As well as providing intuitive search tools, various view options, and access to a massive online community, the BHL website provides a beautiful user interface for viewing historic literature. However, BHL does not (yet) provide the option of viewing a hand-written document alongside its transcription. Being able to simultaneously scroll through the two documents would greatly enhance the user experience. The romance of the originals – historic scripts, whimsical sketches, fingerprints, ink smudges, printed photographs, pressed flowers and feathers – is lost in the typed, black and white (albeit legible and searchable) transcriptions.

Viewing an original document alongside its transcribed text is possible in a number of transcription portals, including the Smithsonian Transcription Center (https://transcription.si.edu/). This is also possible on the DigiVol website, but the pages are only viewable by DigiVol account holders. However, the audience of these websites is the online volunteers undertaking transcription, not the larger biodiversity community who seek to use the content.

In the absence of a side-by-side viewing option in BHL or in the Internet Archive, we have uploaded each transcription as a second volume of its original. However, there may be another possibility for displaying the transcribed text and original document concurrently, one that would use existing functionality on the BHL website. We propose that each page of the transcription be copied into the existing OCR field for the corresponding page of the original. “View/hide OCR” is an existing option above the page view for every digitised page in BHL. Selecting it opens a side bar that is populated with automatically-generated text. It is well accepted that OCR generated from handwriting is poor. “When you try to feed handwriting to OCR, you get a lot of gibberish” (Brumfield, 2013). This is certainly the case for the OCR generated from the relatively neat hand of Graham Brown. This side bar could certainly be of more use if it were populated with our human-generated transcriptions. Our request for this change in functionality was favourably received by BHL and we hope that the option will be available in the near future.

Making the most of the data

Another outcome we are working toward is the taxonomic referencing of the thousands of legacy occurrence records extracted from the field diaries: the matching of the species names used in the diaries with the currently accepted scientific names. A scientific name is the name uniquely assigned to a species. Despite there being only a single scientific name per species, these names can change over time as new information about a species is uncovered. Modern genetic work has resulted in dramatic changes in our understanding of phylogenetic relationships, with many species being merged, split and renamed. This means that the names used by our historic authors can be quite different from the names we use today.

Taxonomic referencing ensures that data attached to out-dated names can be connected to the modern name. This crucial next step in our project will ensure that the information in the diaries is discoverable and accessible. And now that the field diaries are catalogued in our collection database, taxonomic referencing will allow us to link historic contextual data to specimens in our collection. Graham Brown’s five field diaries yielded 5611 animal sightings, complete with date and location data. These occurrence records will increase our understanding of past species’ distribution and abundance, and inform future survey work and management efforts.
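In outline, taxonomic referencing is a lookup from the verbatim name to the currently accepted name, with the verbatim name always retained. The Python sketch below assumes a tiny hand-made synonym table and invented field names; it is an illustration only, and a real project would draw its names from an authoritative, versioned checklist rather than from this table.

```python
# Tiny hand-made synonym table, for illustration only. Entries should be
# checked against an authoritative checklist before any real use.
ACCEPTED_NAME = {
    "cacatua roseicapilla": "Eolophus roseicapilla",
    "ardea novaehollandiae": "Egretta novaehollandiae",
}


def normalise(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def reference_record(record):
    """Return a copy of the record with an accepted name added.

    The verbatim name is never overwritten. Names missing from the table
    are flagged for human review rather than guessed.
    """
    accepted = ACCEPTED_NAME.get(normalise(record["verbatim_name"]))
    return {**record, "accepted_name": accepted, "needs_review": accepted is None}


# Invented sample record, not taken from any diary.
matched = reference_record({"verbatim_name": "Cacatua  roseicapilla"})
unmatched = reference_record({"verbatim_name": "Unknown name"})
```

Because this table lists only out-dated names, a name that is already current would also be flagged for review; a fuller version would include accepted names mapped to themselves.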

Brown also made mention of 547 people and organisations. Many, such as ornithologists Graham Pizzey and Frank Knight, are historically significant in their own right. Their names in the diaries can be linked to the biographical records for their names in our collection database, which will ensure that this information will be available for future research. These sorts of links invariably lead to future discoveries.

A Word about Online Volunteers

We uploaded our first field diary onto DigiVol in November 2014. It was the first of many. At the time of writing (August 2015), we had uploaded fourteen diaries and the volunteers had completely transcribed ten of these. A total of 65 online volunteers have contributed to our transcriptions. Of these, only one third have worked on more than one diary, with some completing only a single page of transcription. A small number of dedicated volunteers have worked on all ten, with some diaries being over 50% transcribed by a single volunteer.

The contribution made by our online volunteers is invaluable, but it is extremely difficult to quantify. This is because field diaries are highly variable in their length, format and legibility. Measures such as the number of words transcribed per hour are diary-specific and can only be used to estimate the time it will take to transcribe a similar work by the same author. Estimates for future projects will also need to take into account the transcribing rates of individual volunteers, which vary enormously based on their experience, time available, attention to detail, and interest in the project.

In our experience the two most reliable factors in determining the (relative) rate at which a transcription project will proceed are 1) legibility of the handwriting, and 2) the volunteers’ interest in the content. Graham Brown had beautiful handwriting and his diaries were a joy to read. He wrote the dates in the margins, underlined all species names and provided an index of locations at the end of each diary. His diary entries were filled with fascinating descriptions of his travels and the birds he encountered, interspersed with photographs and details of his own life. By the time they had finished transcribing his diaries (in record time), the volunteers had become quite attached to the (then) young ornithologist, and so had we.

Starting with such an easy-to-transcribe project allowed us to gain experience and develop procedures, while the online transcription seemed to run itself. Our subsequent transcription projects differed greatly in their legibility and content, and initially the transcription process was excruciatingly slow. A few volunteers struggled through scrawled notes and dry content, but most dropped off. We soon found, however, that by putting a little effort into promotion and support, we could make the diaries both easier to transcribe and more interesting.

People work best when they feel that their work has value. This is particularly true for volunteers. We made sure to put effort into writing an introduction to each diary, highlighting the reputation of the author, the significance of the particular diary to be transcribed, and the value of the content to both scientists and historians. We also used mailing lists and other contacts to promote each new project to volunteers who had worked on our previous projects. Finally, we wrote blog posts about each new author, including scanned images from the diaries and links to the transcription portal, and shared these posts via our social media channels (Kearney, 2015a,b). In each of these communications we emphasised that this crucial work would not be possible without the contribution of volunteers.

Throughout this process, we kept a close eye on the transcriptions, contacting volunteers individually to welcome them to the project and to provide them with feedback. We updated the tutorials to include newly-encountered issues and common errors, produced lists of frequently-used names and places, and kept the volunteers informed of these developments. While this level of support was time consuming, it greatly increased the numbers of volunteers recruited and retained, the rate of the transcription, and the quality of the final transcription. We also found that the additional effort was only necessary when uploading the first diary by a new and difficult-to-read author.

The DigiVol administration, based at the Australian Museum, seeks to recognise and reward the hard work of its volunteers in a number of ways. These include appreciation awards, public honour boards, personal achievement pages, virtual reward badges, and “how you’re making a difference” reports. However, in a recent DigiVol survey most volunteers stated that the honour board was not important to them. Rather, the strongest motivational drivers given for their volunteering on DigiVol were an interest in “natural history and cultural museum collections”, “doing something worthwhile” and “making a contribution in the field of biodiversity” (Flemons et al., 2015).

These drivers are certainly strong. DigiVol has just attracted its 1000th volunteer transcriber (Faber, 2015) and their current recruitment rate is about fifty new volunteers per month (DigiVol Admin Newsletter, July 2015). These volunteers are highly skilled, with a great deal of professional and life experience. Fifty-three percent are over fifty and forty-one percent have a postgraduate degree. In order to appeal to this growing volunteer workforce, the value of the work and the impact it will have on our current understanding of biodiversity cannot be overstated.

Conclusion

Now that we have established a successful and efficient procedure for digitising, transcribing and data-mining field diaries, and for making these diaries and their transcriptions available online, we plan to continue this work. As news of our project spreads around our museum (and beyond), historic field diaries are coming out of the woodwork. These include uncatalogued diaries from our own museum as well as surprises from outside: after we started transcribing the diaries of Allan McEvey, one of his retired assistants made a donation containing photographs from the expeditions we had just transcribed and one of her own field diaries. These diaries will soon start their journey towards being discoverable and accessible. Who knows what insights we will glean from these treasures?

Acknowledgements

Rebecca Carland was the Curator who rediscovered the box of Graham Brown field diaries and sent us tumbling into the world of transcription. Her dedication and passion for these historic documents have fuelled this project since day one. She and Dr Karen Rowe provided us with essential background information and assisted with the transcription tutorials and templates.

The digitisation side of this project ran seamlessly because of the super-efficient BHL team at Museum Victoria: Hayley Webster (Manager, Museum Victoria Library), Cerise Howard (Digitisation Coordinator), Jim Healey (Technical Support Officer), and our wonderful digitisation volunteers: Bob Griffith, Heidi Griffith, Susan Halliwell, Jade Koekoe, Alan Nankervis and Tiziana Tizian.

The team at DigiVol were extremely helpful and generous with their time, particularly Rhiannon Stephens and Paul Flemons at the Australian Museum. We also thank the Atlas of Living Australia for funding both BHL-Australia and DigiVol.

Finally, we are immensely grateful to the 65 online volunteers, particularly Teresa Van Der Heul and Catherine Cowan, who did a significant portion of the transcription work. Erin Headon, our volunteer validator, single-handedly reviewed every page of this project. It is her meticulous attention to detail that has given us such a consistent and high-quality transcription. She worked tirelessly, completing the validation of the final diary only a week after the volunteers finished transcribing it. We cannot thank her enough.

References

Brumfield, B. (2013). Improving OCR Inputs from OCR Outputs? Collaborative Manuscript Transcription Blog. February 14, 2013. Consulted August 3, 2015. http://manuscripttranscription.blogspot.com.au/2013/02/improving-ocr-inputs-from-ocr-outputs.html

Chmiel, K. (2011). BHL launch. MV Blog. July 14, 2011. Consulted August 3, 2015. http://museumvictoria.com.au/about/mv-blog/jul-2011/bhl-launch/

Faber, M. (2015). DigiVol reaches 1000 volunteers and more! ALA Blog. July 13, 2015. Consulted August 3, 2015. http://www.ala.org.au/blogs-news/digivol-reaches-1000-volunteers-and-more/

Flemons, P. & P. Berents. (2012). “Image based Digitisation of Entomology Collections: Leveraging volunteers to increase digitization capacity.” In V. Blagoderov & V.S. Smith (eds). No specimen left behind: mass digitization of natural history collections. ZooKeys 209: 203–217.

Flemons, P. et al. (2015). DigiVol: A new way of volunteering. Presentation for Ignite Volunteering Conference, June 1, 2015. Consulted August 12, 2015. http://www.volunteering.com.au/wp-content/uploads/2015/06/Paul-Flemons-DigiVol-a-new-way-of-volunteering-CFV-Conference-2015.pdf

Hill, A. et al. (2012). “The notes from nature tool for unlocking biodiversity records from museum records through citizen science.” ZooKeys 209, 219–233. Consulted August 4, 2015. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3406478/

Kearney, N. (2015a). Transcribing field diaries. MV Blog. March 19, 2015. Consulted August 4, 2015. http://museumvictoria.com.au/about/mv-blog/mar-2015/transcribing-historic-field-diaries-/

Kearney, N. (2015b). Read our historic field diaries online. MV Blog. August 14, 2015. Consulted August 14, 2015. http://museumvictoria.com.au/about/mv-blog/aug-2015/read-our-historic-field-diaries-online/

Moritz, C. et al. (2008). “Impact of a century of climate change on small-mammal communities in Yosemite National Park, USA.” Science 322 (5899), 261–264.

Prater, L. (2015). DigiVol: Volunteers making a difference. AM Blog. June 25, 2015. Consulted August 3, 2015. http://australianmuseum.net.au/blogpost/museullaneous/digivol-volunteers-making-a-difference

Rowe, K. C. et al. (2014). “Spatially heterogeneous impact of climate change on small mammals of montane California.” Proceedings of the Royal Society B: Biological Sciences 282 (1799).

Sheffield, C. et al. (2011). “Merging Metadata: Building on Existing Standards to Create a Field Book Registry.” Libreas: Library Ideas 7, 66–74. Consulted August 4, 2015. http://www.libreas.eu/ausgabe18/texte/08sheffield.htm

Steeves, V. (2015). The Next Frontier of Stewardship: the Value of Field Books in a Digital Age. Field Book Project Blog, Smithsonian Institution. March 19, 2015. Consulted August 14, 2015. http://nmnh.typepad.com/fieldbooks/beyond-the-field-book-project/

Tingley, M.W. et al. (2012). “The push and pull of climate change causes heterogeneous shifts in avian elevational ranges.” Global Change Biology 18 (11), 3279–3290.

Thomer, A. (2014). Sourcing Primary Materials: Notes from A Workshop. So You Think You Can Digitize Blog. April 1, 2014. Consulted August 3, 2015. https://soyouthinkyoucandigitize.wordpress.com/2014/04/01/resourcing-primary-materials-notes-from-a-workshop/

Thomer, A. et al. (2012). “From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks.” ZooKeys 209, 235–253.

Wallis, E. & Matthews, D. (2014). Collaborating locally, contributing globally: the Biodiversity Heritage Library in Australia. Paper presented at VALA2012, Melbourne, February 6 – 9, 2012. Consulted August 3, 2015. http://www.vala.org.au/direct-download/vala2012-proceedings/441-vala2012-session-14-wallis-paper/file

Cite as:
Kearney, N. & E. Wallis. "Transcribing between the lines: crowd-sourcing historic data collection." MWA2015: Museums and the Web Asia 2015. Published August 14, 2015.
https://mwa2015.museumsandtheweb.com/paper/transcribing-between-the-lines-crowd-sourcing-historic-data-collection/