I’ve mentioned Daniel Reetz’s DIY portable book scanner here before. It’s a great combination of an interesting thing to look at, an interesting project to contemplate and a bit of a gauntlet tossed down as far as bigger questions of why we leave scanning up to the big companies, etc. At the end of my Tiny Tech talks I usually mention it as something in the realm of the possible, even if in a Dream Big way. Daniel was at D is for Digitize last month — a conference I missed because I was in Nevada — and I noticed some interesting back and forth about his scanner project show up in the Library Law blog.
I was lucky enough to catch Brewster Kahle talking with Amy Goodman on Democracy Now on my drive home from NJLA. I feel like I’m pretty up on what’s going on with Google and the Internet Archive and book scanning. What I didn’t know is how Google’s agreements with libraries are hindering the IA’s access, not because of the contracts, but just because of differing priorities. The video and transcript are now available online.
AMY GOODMAN: Explain what you mean when you say it’s not legally required. You mean in the contract, what they have with Google? And so, if Google was here, they’d say, “We didn’t say they couldn’t give it to Internet Archive. That’s their prerogative.”
BREWSTER KAHLE: Correct, that basically Google didn’t put it in their contract. Yet from a library’s perspective, why have a book scanned twice? It’s wear and tear on the books. If they think that—and they wouldn’t have signed it if they didn’t think that the Google thing was a good idea. But now that they’ve signed this with Google, they don’t want it scanned again. And this is a problem, because the books, even the out-of-copyright books, are locked up perpetually.
I went to the Belmont Public Library this weekend because it’s my boyfriend’s local library and he is, as you might suspect, a heavy library user. The library is in an old building that is clearly reaching the end of its usefulness as a 21st century library, but they seem to do the best they can. They are part of the Minuteman Library Network which means they have access to a lot of consortium-level technology which can really help out when you’re working in an institutional-green building with furniture from the late seventies. I had a good time there in any case and I took some pictures including the one above.
What first got me about this book was that it hadn’t been taken out since 1963. Well, that’s not quite correct. We know it was checked out in 1963 and was possibly checked out after [whatever date the OPAC took over] a date I don’t know. What occurred to me later as I looked at this picture is how much else we know about this book simply by looking at this card.
- the date the book was acquired by the library
- the title of the book
- the last name of the author of the book
- The patron number of the person who checked the book out last
- the call number of the book
- the library the book is from
- the lending period of the book
- the date the book was last checked out (before the OPAC)
- the fact that the library card pocket was union made
That’s a lot of data. I can also, using that data, find the full text of this book both at the Internet Archive (a little messed up, for some reason) and as PDFs (with images) at the Google Books project which is searchable. In fact, there appear to be three versions of this book on Google Books (1, 2, 3) only one of which includes page two which has a photo of the author. Nothing much else to add, just finding this whole exploration process interesting.
In light of the recent Google Books/APA settlement, Harvard has examined the details and decided not to be part of the project after all.
Harvard’s university-library director, Robert C. Darnton, wrote in a letter to the library staff, “the settlement provides no assurance that the prices charged for access will be reasonable, especially since the subscription services will have no real competitors [and] the scope of access to the digitized books is in various ways both limited and uncertain.” He also expressed concern about the quality of the scanned books, which “in many cases will be missing photographs, illustrations, and other pictorial works, which will reduce their utility for research.”
Update: According to the comments, I had this sort of wrong. Harvard is deciding to not have Google scan their copyrighted books but the digitzation project proceeds apace. Thanks Jon.
photo originally from akseabird
I’m pretty skeptical when people call anything for sale “revolutionary.” However, a friend sent me this photo which was up on Flickr. It’s a tool called the Bookeye book scanner. It’s a library digitzation product, but if you look at the photo, it’s being used as a tool for the public — or University of Alaska at Anchorage students — to scan documents to PDF, JPG, TIFF or PNG and then save to USB drive, burn them to a CD, ftp them, save them to a network drive or email them to themselves. Their website even has usage stats that shows what people did with the first million pages they scanned. Good data, and it’s broken down by library type which is even more interesting to me, to see the differences in usage patterns. [thanks manuel]
The University of Michigan has hit the “one million books scanned” milestone. As far as I know Michigan is the first library to have one million books from its own collections digitized and available for search (and, when in the public domain, available for viewing.)
For more about the scanning project generally including some insight into why people call it controversial, there’s a good long article from Campus technology (link to printable version, all on one page) which gos into the logistics of the scanning program in some depth.
When it comes down to it, then, this brave new world of book search probably needs to be understood as Book Search 1.0. And maybe participants should not get so hung up on quality that they obstruct the flow of an astounding amount of information. Right now, say many, the conveyor belt is running and the goal is to manage quantity, knowing that with time the rest of what’s important will follow. Certainly, there’s little doubt that in five years or so, Book Search as defined by Google will be very different. The lawsuits will have been resolved, the copyright issues sorted out, the standards settled, the technologies more broadly available, the integration more transparent.