Talk title: The HathiTrust Research Center: Creating New Big Data Scholarly Research Services
Speakers: Prof. Dr. J. Stephen Downie & Dr. Peter Organisciak
GSLIS, University of Illinois at Urbana-Champaign, USA
Host: 叶鹰 (Prof. Dr. Fred Y. Ye)
J. Stephen Downie is the associate dean for research and a professor at the School of Information Sciences at the University of Illinois at Urbana-Champaign. He is also the Illinois co-director of the HathiTrust Research Center. Professor Downie has been an active participant in the digital libraries and digital humanities research domains. He is best known for helping to establish a vibrant music information retrieval research community. Since 2005, he has directed the annual Music Information Retrieval Evaluation eXchange (MIREX). He was also a founder of the International Society for Music Information Retrieval (ISMIR) and served as its first president.
Peter Organisciak is a Postdoctoral Research Associate at the HathiTrust Research Center, with a focus on large-scale text analysis and crowdsourcing. His work has been featured in New Scientist and Slate, and has received paper awards from the Association for the Advancement of Artificial Intelligence (AAAI) and the Association for Information Science and Technology (ASIS&T). Peter's work in the digital humanities has won an Outstanding Contribution Award from the Canadian Society for Digital Humanities.
The HathiTrust Digital Library contains over 15 million volumes (over 5 billion pages). Unfortunately, legal restrictions make it difficult to share roughly 9 million of these volumes, and the sheer scale of the collection presents further hurdles to scholarly research. To overcome these problems, the HathiTrust Research Center (HTRC) is creating a set of "non-consumptive research" services to make these materials more open and more useful to scholars. This talk introduces several of these non-consumptive services, including "Data Capsules," "Extracted Features," and the "Bookworm + HathiTrust" tool. Each HTRC service is designed to open new points of access to otherwise closed data while still respecting all copyright limitations.