
image from The card catalogue: a practical manual for public and private libraries via Open Library
When people ask me what skills will be useful for the 21st Century Librarian, one of the things I frequently mention is being able to work with giant datasets. This is true for many professions, such as journalism, but the past few years, even the past few months, have really shown some exciting opportunities for people who work with libraries and people who love metadata. Harvard’s release of 12 million bibliographic records was only the most recent giant dataset made available. Interested data manipulators also have metadata from the University of Michigan, Cambridge University, the British Library, some records from the Library of Congress, University of North Carolina, and Toronto Public Library, and more from smaller libraries and archives can be found via the Internet Archive. Exciting times to be sure.
Apparently the Copyright Office is trying to figure out how to put its card catalog of pre-1978 records online: http://www.infodocket.com/2012/05/01/u-s-copyright-office-posts-request-for-information-to-build-a-virtual-card-catalog-of-records/
People in the English-speaking MARC-library world seem to focus on that world, and so overlook big open library data publications from other countries and in other formats. For example, the hbz (disclaimer: I work there) made the first larger open data publication in March 2010 (see the press release). Initially it was 5 million records; we are now up to more than 10 million (see the entry at the Data Hub) and hope to release all 18 million records in the coming months.
Another example from Germany: recently, the German library networks BVB and KOBV released their whole catalogs (more than 23 million records) in MARC format and in RDF, see here.
For all interested in open bibliographic data (according to the Principles on Open Bibliographic Data), there is a group at the open data registry “the Data Hub” which is maintained by the OKFN Working Group on Open Bibliographic Data: http://thedatahub.org/group/bibliographic.