big metadata sets that anyone can have


image from The card catalogue: a practical manual for public and private libraries via Open Library

When people ask me what skills will be useful for the 21st Century Librarian, one of the things I frequently mention is being able to work with giant datasets. This is true for many professions, such as journalism, but the past few years, even the past few months, have shown some exciting opportunities for people who work with libraries and people who love metadata. Harvard’s release of 12 million bibliographic records was only the most recent giant dataset made available. Interested data manipulators also have metadata from the University of Michigan, Cambridge University, the British Library, some records from the Library of Congress, University of North Carolina, Toronto Public Library and more, and records from smaller libraries and archives can be found via the Internet Archive. Exciting times to be sure.

why you can’t google a library book

The Guardian has a long article about the mechanisms that keep local library catalogs from being effectively spidered and Googleable. They dip into the complicated area that is policies around record-sharing and talk about OCLC’s changed policy concerning WorldCat data. This policy, if you’ve been keeping close track, was slated to take effect in February and, thanks in no small part to the groundswell of opposition, has been delayed until at least the third quarter of 2009.

file under: big big datasets

I’ve been chitchatting with Simon as he’s been compiling and data-cleaning his set of LoC authority records. He’s at ALA now, and the data has been released into the wild. There’s something that warms my little librarian heart about getting to read raw MARC on my own little laptop. Try it yourself!
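
If you want a feel for what reading raw MARC involves, here is a minimal sketch in plain Python. It assumes the file follows the standard MARC 21 transmission layout: a 24-byte leader, then a directory of 12-byte entries, then the field data. The function names and the sample record are made up for illustration and have nothing to do with Simon's dataset or any particular library. Real-world records also come in encodings such as MARC-8 that this sketch does not handle, so treat it as a toy rather than a proper parser.

```python
FIELD_END = b"\x1e"   # ends each field, and the directory
RECORD_END = b"\x1d"  # ends each record
SUBFIELD = "\x1f"     # precedes each subfield code


def split_records(data: bytes):
    """Yield each raw record from a blob of concatenated MARC records."""
    offset = 0
    while offset < len(data):
        # Leader positions 0-4 hold the total record length in ASCII digits.
        length = int(data[offset:offset + 5].decode("ascii"))
        yield data[offset:offset + length]
        offset += length


def parse_marc_record(raw: bytes):
    """Return a list of (tag, text) pairs for one raw record."""
    leader = raw[:24]
    # Leader positions 12-16 hold the base address of the field data.
    base = int(leader[12:17].decode("ascii"))
    # The directory sits between the leader and the terminator at base - 1.
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag = entry[:3]
        length = int(entry[3:7])   # field length, terminator included
        start = int(entry[7:12])   # offset relative to the base address
        data = raw[base + start:base + start + length - 1]
        text = data.decode("utf-8", errors="replace")
        # Show subfield delimiters as "$", the way catalogers usually write them.
        fields.append((tag, text.replace(SUBFIELD, "$")))
    return fields


def make_sample_record() -> bytes:
    """Build a tiny, made-up record so the sketch runs without a data file."""
    sample = [
        ("001", "demo0001"),
        ("245", "10" + SUBFIELD + "aBig metadata sets" + SUBFIELD + "cAnonymous"),
    ]
    directory = b""
    body = b""
    for tag, value in sample:
        data = value.encode("utf-8") + FIELD_END
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1
    total = base + len(body) + 1
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_END + body + RECORD_END


if __name__ == "__main__":
    for tag, text in parse_marc_record(make_sample_record()):
        print(tag, text)
```

To point this at a real download, read the file in binary mode and loop over `split_records(open(path, "rb").read())`, where `path` is wherever you saved the file. For anything beyond poking around, a maintained MARC library is the safer choice.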