
English and American Literature

How do I get data from these sources?

Each data source has its own procedures for extracting data. Some use third-party programs, some offer a convenient data package you can download in one click, and some require going through an Application Programming Interface (API). More detailed steps for each source are listed below. Please contact AskDSP@udel.libanswers.com for assistance.

Data is delivered in a variety of formats.

Books and Primary Sources

Adam Matthew Collections

This spreadsheet lists primary source collections that the Library has access to. For text and data mining purposes, the data must be requested from Adam Matthew. Contact a librarian at AskDSP@udel.libanswers.com to get started. 

Gale Digital Scholar Lab

  • Gale DS Lab allows users to discover and create data sets from the content in the Gale Primary Sources purchased and licensed by the University of Delaware Library. The lab also includes tools for text analysis and visualization. 

HathiTrust Digital Library

  • HathiTrust (pronounced hah-tee) is a partnership of academic and research institutions, offering a collection of millions of titles digitized from libraries around the world. Many materials in HathiTrust are only accessible to member institutions (University of Delaware is a member). The Data API lets researchers access content in the digital library.  

JSTOR

  • JSTOR is a digital library of more than 2,000 journals and more than 25,000 books in the humanities, social sciences, and sciences. JSTOR Data for Research provides a portal through which to retrieve large amounts of data. Users must create a free JSTOR account to download datasets. 

Project Gutenberg

  • Project Gutenberg is a volunteer-driven digital library that offers over 56,000 free eBooks for public use. It offers works in many languages, but most books are in English. Please note that some works are still under copyright. See this page on automated access to the collection for directions on retrieving data.
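Project Gutenberg's plain-text files wrap each work in a license header and footer, which usually need to be removed before text analysis. The sketch below is a minimal illustration that assumes a file has already been downloaded according to the automated-access guidelines. The helper name `strip_gutenberg_boilerplate` is hypothetical, and because the marker wording varies between files ("THE" or "THIS PROJECT GUTENBERG EBOOK"), it only looks for lines beginning with `*** START OF` and `*** END OF`.

```python
import re

# Marker lines Project Gutenberg places around the body text. The exact wording
# varies between files, so the patterns only anchor on the common prefix.
START_RE = re.compile(r"^\*\*\*\s*START OF.*$", re.MULTILINE)
END_RE = re.compile(r"^\*\*\*\s*END OF.*$", re.MULTILINE)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the START and END markers.

    If either marker is missing (or they appear out of order), the input is
    returned unchanged apart from surrounding whitespace.
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    if start is None or end is None or end.start() < start.end():
        return text.strip()
    return text[start.end():end.start()].strip()
```

For example, `strip_gutenberg_boilerplate(open("pg1342.txt", encoding="utf-8").read())` (a hypothetical local filename) would return only the body of the work, ready for tokenizing or word counts.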

Digital Public Library of America

  • DPLA provides digital access to many collections of “America’s libraries, archives, museums, and other cultural heritage institutions.” Materials include books, photos, audio and video recordings, and other media. Request an API key to gain access to the DPLA API. Data is delivered in JSON-LD format.
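The following is a minimal sketch of an item search against the DPLA API. It assumes the v2 `items` endpoint at `https://api.dp.la/v2/items` with `q`, `api_key`, and `page_size` parameters, and assumes each result's title sits under `sourceResource` as either a string or a list; check the current DPLA API documentation before relying on these details. The function names are hypothetical. Parsing is kept separate from the network call so it can be tested against a saved response.

```python
import json
import urllib.parse
import urllib.request

# Assumed DPLA API v2 item-search endpoint.
DPLA_ITEMS_URL = "https://api.dp.la/v2/items"


def build_dpla_url(query: str, api_key: str, page_size: int = 10) -> str:
    """Build an item-search URL with the researcher's API key."""
    params = {"q": query, "api_key": api_key, "page_size": page_size}
    return DPLA_ITEMS_URL + "?" + urllib.parse.urlencode(params)


def extract_titles(response: dict) -> list[str]:
    """Pull titles out of a parsed search response.

    The title field may be a single string or a list of strings,
    so both cases are flattened into one list here.
    """
    titles = []
    for doc in response.get("docs", []):
        title = doc.get("sourceResource", {}).get("title")
        if isinstance(title, list):
            titles.extend(title)
        elif isinstance(title, str):
            titles.append(title)
    return titles


def search_dpla(query: str, api_key: str) -> list[str]:
    """Fetch one page of results (requires network access and a valid key)."""
    with urllib.request.urlopen(build_dpla_url(query, api_key)) as resp:
        return extract_titles(json.load(resp))
```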

World Digital Library

  • The World Digital Library, sponsored in part by the Library of Congress, archives digitized historical materials, both texts and images, from across the globe. Access the WDL API to retrieve data. Data is delivered in XML format.

Biodiversity Heritage Library

  • The Biodiversity Heritage Library is an online collection of scientific texts focused on natural history, biology, botany, and other natural sciences. It contains both scholarly journal articles and books. Access the BHL API to retrieve data. Data is delivered in JSON or XML format.
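Because BHL can return either JSON or XML, the output format is chosen per request. The sketch below assumes the BHL API v3 base URL `https://www.biodiversitylibrary.org/api3` with `op`, `format`, and `apikey` parameters, an operation such as `PublicationSearch` taking a `searchterm` parameter, and a JSON envelope with `Status`, `ErrorMessage`, and `Result` fields. These details are assumptions to verify against BHL's API documentation, and the helper names are hypothetical.

```python
import json
import urllib.parse

# Assumed BHL API v3 base URL.
BHL_API_URL = "https://www.biodiversitylibrary.org/api3"


def build_bhl_url(op: str, api_key: str, fmt: str = "json", **params: str) -> str:
    """Build a request URL; `fmt` selects "json" or "xml" output."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    query = {"op": op, **params, "format": fmt, "apikey": api_key}
    return BHL_API_URL + "?" + urllib.parse.urlencode(query)


def bhl_results(body: str) -> list:
    """Return the "Result" list from a JSON response body.

    Raises RuntimeError when the (assumed) Status field is not "ok".
    """
    data = json.loads(body)
    if data.get("Status") != "ok":
        raise RuntimeError(data.get("ErrorMessage") or "BHL API error")
    return data.get("Result", [])
```

For example, `build_bhl_url("PublicationSearch", "YOUR_KEY", searchterm="Darwin")` builds a JSON search request; passing `fmt="xml"` requests the same data as XML.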

Women Writers Online

  • Women Writers Online is the digital library of the Women Writers Project at Northeastern University. The library contains texts of early women's writing in English from 1526 to 1850. Review the information on their text database, and email the team at wwp@neu.edu with a brief description of your research plans.

Newspapers

Chronicling America

  • Chronicling America is the website portal of the National Digital Newspaper Program and contains digitized American newspapers from 1789 to 1963.

Europeana 

  • Europeana is a digital library focused on European materials, including an extensive digitized newspaper collection. Europeana offers multiple APIs depending on the researcher's needs.
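As one example of the Europeana APIs, a minimal sketch of a Search API request might look like the following. It assumes the endpoint `https://api.europeana.eu/record/v2/search.json` with `query`, `wskey` (the API key), and `rows` parameters, and assumes each returned item carries its title as a list of strings. Verify these against Europeana's API documentation; the function names are hypothetical.

```python
import urllib.parse

# Assumed Europeana Search API endpoint.
EUROPEANA_SEARCH_URL = "https://api.europeana.eu/record/v2/search.json"


def build_europeana_url(query: str, wskey: str, rows: int = 10) -> str:
    """Build a search URL; `wskey` is the researcher's Europeana API key."""
    params = {"query": query, "wskey": wskey, "rows": rows}
    return EUROPEANA_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def europeana_titles(response: dict) -> list[str]:
    """Collect the first title of each item in a parsed search response.

    Items without a title are skipped.
    """
    titles = []
    for item in response.get("items", []):
        title = item.get("title") or []
        if title:
            titles.append(title[0])
    return titles
```

The response would be fetched and parsed with `json.load(urllib.request.urlopen(url))`, as in any other JSON API; keeping the parsing separate makes it easy to test against a saved response.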

Gale Digital Scholar Lab

  • Gale DS Lab allows users to discover and create data sets from the content in the Gale Primary Sources purchased and licensed by the University of Delaware Library. The lab also includes tools for text analysis and visualization. 

New York Times Archive

  • The New York Times keeps archives of the newspaper’s past issues dating back to 1851. Recent articles require an NYT subscription to access. Members of the UD community have access to a subscription through the library. See the Newspapers guide for more information. Use one of the NYT's APIs to gather text data.
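Of the NYT's APIs, the Article Search API is a common starting point for text data. The sketch below assumes its endpoint `https://api.nytimes.com/svc/search/v2/articlesearch.json`, the `q`, `begin_date`, `end_date` (YYYYMMDD), `page`, and `api-key` parameters, and a response whose `response.docs` entries include `pub_date` and `headline.main`. Check the NYT developer documentation for current details and rate limits; the function names are hypothetical.

```python
import urllib.parse
from datetime import date

# Assumed NYT Article Search API endpoint.
ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_article_search_url(query: str, api_key: str, begin: date, end: date,
                             page: int = 0) -> str:
    """Build a search URL restricted to a date range (dates sent as YYYYMMDD)."""
    params = {
        "q": query,
        "begin_date": begin.strftime("%Y%m%d"),
        "end_date": end.strftime("%Y%m%d"),
        "page": page,
        "api-key": api_key,
    }
    return ARTICLE_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def headlines(response: dict) -> list[tuple[str, str]]:
    """Return (publication date, main headline) pairs from a parsed response."""
    docs = (response.get("response") or {}).get("docs") or []
    return [(d.get("pub_date", ""), (d.get("headline") or {}).get("main", ""))
            for d in docs]
```

Results come back one page at a time, so larger collections are gathered by incrementing `page` between requests while staying within the API's rate limits.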

Social Media and More

Documenting the Now 

  • Documenting the Now collects tweet data (tweet IDs) and publishes them as open access datasets. It also maintains a tool called Hydrator that turns tweet IDs into full tweets.
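Hydrating tools look tweet IDs up through Twitter's API, whose lookup endpoints accept at most 100 IDs per request. A minimal sketch of the preparatory step, reading an ID file and grouping the IDs into batches, might look like this. It assumes a plain-text dataset with one ID per line (check each dataset's description). The function names are hypothetical, and the lookup call itself is omitted because it depends on the API access available to you.

```python
from collections.abc import Iterable, Iterator


def read_tweet_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield tweet IDs from an ID file, skipping blank or non-numeric lines."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id.isdigit():
            yield tweet_id


def batch(ids: Iterable[str], size: int = 100) -> Iterator[list[str]]:
    """Group IDs into lists of at most `size` (100 matches the lookup limit)."""
    chunk: list[str] = []
    for tweet_id in ids:
        chunk.append(tweet_id)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

Opening a dataset file and passing it to `read_tweet_ids` streams the IDs line by line, so even very large datasets can be batched without loading them into memory.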

TAGS (Twitter Archiving Google Sheet)

  • TAGS is a Google Sheets template that automates the retrieval of Twitter data. It also supports basic network analysis visualization.

Case.law

  • Case.law is a project that aims to make case law more publicly accessible. Over six million court documents have been digitized from the Harvard Law School Library's collections, covering cases from 1658 to 2018.

Genius

  • Genius, formerly Rap Genius, is a web source of song lyrics from all genres. It also publishes news, interviews with artists, and other content related to popular music.