Cornell removes restrictions on public domain repros

An ongoing debate in the copyright wars is whether an institution that is making reproductions of public domain materials available should be allowed to dictate terms (usually involving payment) for use of those items. We all know that libraries need money. It’s also true that having digital copies of rare materials available helps preserve the original items. So, if I want to download a public domain book from Google Books — say John Cotton Dana’s book A Library Primer — I get usage guidelines from Google attached to the pdf I’ve downloaded.

Usage guidelines
Google is proud to partner with libraries to digitize public domain materials and make them widely accessible. Public domain books belong to the public and we are merely their custodians. Nevertheless, this work is expensive, so in order to keep providing this resource, we have taken steps to prevent abuse by commercial parties, including placing technical restrictions on automated querying.

We also ask that you:
+ Make non-commercial use of the files We designed Google Book Search for use by individuals, and we request that you use these files for personal, non-commercial purposes.
+ Refrain from automated querying Do not send automated queries of any sort to Google’s system: If you are conducting research on machine translation, optical character recognition or other areas where access to a large amount of text is helpful, please contact us. We encourage the use of public domain materials for these purposes and may be able to help.
+ Maintain attribution The Google “watermark” you see on each file is essential for informing people about this project and helping them find additional materials through Google Book Search. Please do not remove it.
+ Keep it legal Whatever your use, remember that you are responsible for ensuring that what you are doing is legal. Do not assume that just because we believe a book is in the public domain for users in the United States, that the work is also in the public domain for users in other countries. Whether a book is still in copyright varies from country to country, and we can’t offer guidance on whether any specific use of any specific book is allowed. Please do not assume that a book’s appearance in Google Book Search means it can be used in any manner anywhere in the world. Copyright infringement liability can be quite severe.

These are all “suggestions” as near as I can tell. As with the Chicken Coupon fiasco of a few days ago, the implied threat that comes along with this item puts a bit of a damper on the joy that is the public domain. Bleh. We’ve seen other big corporations and libraries doing this as well.

However, this post is mostly to say “Yay” about Cornell’s decision to remove all restrictions on the use of its public domain reproductions. Here’s their press release about it and here is the web page with the new policy. What’s their reasoning? Well, among other things, it’s hard to support a mission of open access and at the same time go out of your way to make materials more difficult to get ahold of and interact with. You can see some of Cornell’s 70,000 public domain items at the Internet Archive.

unintended consequences of Google Books project

I was lucky enough to catch Brewster Kahle talking with Amy Goodman on Democracy Now on my drive home from NJLA. I feel like I’m pretty up on what’s going on with Google and the Internet Archive and book scanning. What I didn’t know is how Google’s agreements with libraries are hindering the IA’s access, not because of the contracts, but just because of differing priorities. The video and transcript are now available online.

AMY GOODMAN: Explain what you mean when you say it’s not legally required. You mean in the contract, what they have with Google? And so, if Google was here, they’d say, “We didn’t say they couldn’t give it to Internet Archive. That’s their prerogative.”

BREWSTER KAHLE: Correct, that basically Google didn’t put it in their contract. Yet from a library’s perspective, why have a book scanned twice? It’s wear and tear on the books. If they think that—and they wouldn’t have signed it if they didn’t think that the Google thing was a good idea. But now that they’ve signed this with Google, they don’t want it scanned again. And this is a problem, because the books, even the out-of-copyright books, are locked up perpetually.

Libraries of the future – here for you now

One of the fun parts of the Symposium this weekend was seeing Brewster Kahle talk about stuff. He started out by talking about this book Libraries of the Future that he wanted to scan and put on the Internet Archive. He then talked further about how figuring out who owned the copyrights for it was a total pain in the ass. I’m not even sure if he ever did figure it out; he even had MIT’s librarians working on it. The book is online anyhow. I haven’t looked at books in the Open Library project in a while but how slick is this? Full and slightly messy text here which, amusingly, ends with: PLEASE DO NOT REMOVE CARDS OR SLIPS FROM THIS POCKET.

on metadata and the printed word

last checked out in 1963

I went to the Belmont Public Library this weekend because it’s my boyfriend’s local library and he is, as you might suspect, a heavy library user. The library is in an old building that is clearly reaching the end of its usefulness as a 21st century library, but they seem to do the best they can. They are part of the Minuteman Library Network which means they have access to a lot of consortium-level technology which can really help out when you’re working in an institutional-green building with furniture from the late seventies. I had a good time there in any case and I took some pictures including the one above.

What first got me about this book was that it hadn’t been taken out since 1963. Well, that’s not quite correct. We know it was checked out in 1963 and was possibly checked out after [whatever date the OPAC took over] a date I don’t know. What occurred to me later as I looked at this picture is how much else we know about this book simply by looking at this card.

  • the date the book was acquired by the library
  • the title of the book
  • the last name of the author of the book
  • the patron number of the person who checked the book out last
  • the call number of the book
  • the library the book is from
  • the lending period of the book
  • the date the book was last checked out (before the OPAC)
  • the fact that the library card pocket was union made

That’s a lot of data. I can also, using that data, find the full text of this book both at the Internet Archive (a little messed up, for some reason) and as PDFs (with images) at the Google Books project which is searchable. In fact, there appear to be three versions of this book on Google Books (1, 2, 3) only one of which includes page two which has a photo of the author. Nothing much else to add, just finding this whole exploration process interesting.
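For the metadata-minded, here is a minimal sketch of that card as a record. The class and field names are hypothetical, invented for illustration. Only the values actually stated in this post are filled in; everything else stays empty rather than guessed.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CheckoutCard:
    """One field per item readable off the card (names are made up for this sketch)."""
    library: Optional[str] = None
    title: Optional[str] = None
    author_last_name: Optional[str] = None
    call_number: Optional[str] = None
    date_acquired: Optional[str] = None
    lending_period: Optional[str] = None
    last_patron_number: Optional[str] = None
    last_checked_out_year: Optional[int] = None
    pocket_union_made: Optional[bool] = None


# Only the details mentioned in the post are recorded here.
card = CheckoutCard(
    library="Belmont Public Library",
    last_checked_out_year=1963,
    pocket_union_made=True,
)

# Keep just the fields that have a value.
known = {name: value for name, value in asdict(card).items() if value is not None}
print(known)
```

Nine fields from one slip of paper, and a title plus an author's last name is already enough to go searching for the full text elsewhere.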

Brewster Kahle at TED, discussing free digital libraries

“I’m a librarian. What I am trying to do is bring all of the world’s knowledge to as many people as want to read it. The idea of using technology is perfect for us.” Brewster Kahle gives a twenty-minute talk about free culture and libraries and digitization at TED.

Librarians on the Internet Bookmobile

Many of us have a bookmobile fetish. I know I do. I was heavy in negotiations with the Internet Archive to get to drive their bookmobile around NH/VT with Casey this Summer but life intervened and it didn’t happen. How happy was I, then, to see my friends James and Shinjoung from FreeGovInfo as well as Sarah from the September Project [and a colleague of mine from MaintainIT] driving the adorable van around Northern California. Steve Cisler wrote about the Internet Bookmobile for First Monday several years ago and it’s an article worth reading.

Sarah’s bookmobile posts are here, James and Shinjoung’s posts are here. (Hint for Drupal blog maintainers: you’ll get better results in Google if you change the URLs for your taxonomy to include the term, not just a number.) They’re still going, through September 15th. If you’re in Northern California, see if you can see them.
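To illustrate the URL hint, here is a rough sketch of turning a taxonomy term into a readable URL path. This is plain Python, not Drupal code; the function names and the `tags` prefix are assumptions made up for the example.

```python
import re


def slugify(term):
    """Lowercase the term and collapse anything non-alphanumeric into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", term.lower())
    return slug.strip("-")


def term_url(term, prefix="tags"):
    """Build a path that carries the term itself instead of a bare number."""
    return f"/{prefix}/{slugify(term)}"


print(term_url("Internet Bookmobile"))
```

A path that contains the words of the term gives a search engine something to match on, which a bare numeric ID does not.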

Announcing Open Library

Someone asked me during one of my talks if I knew of any projects that were actually trying to open source cataloging records and the idea of authority records. I said I didn’t, not really. It’s a weird juxtaposition, the idea of authority and the idea of a collaborative project that anyone can work on and modify. I knew there were some folks at the Internet Archive working on something along those lines, but the project was under wraps for quite some time. Now, it’s not. It’s called Open Library and it’s in demo mode. You can examine it and I encourage you to do that and give lots of feedback to the developers. Make sure to check the “about the librarianship” page:

Imagine a library that collected all the world’s information about all the world’s books and made it available for everyone to view and update. We’re building that library.

Open Library/Open Content Alliance announcement from Archive.org

Hi. This is the presentation that Andrea and I are watching right now in San Francisco. The Open Library. Brewster Kahle is talking now and doing a book scanning demonstration. I like how he says “librarians” a lot.

Vision of an Open Library

The Web is So post-1996, what about older content?

Everyone is part of it: Amazon helps “expand the bookstore” but we’re looking for inclusivity.

“A great library for the published works of humankind, accessible to all… everybody involved… libraries LIVE based on the publishing system, they will be involved.”

Of the 12 billion that libraries spend every year, 3 to 4 billion goes to publishing. Let’s have more of that go to fairly compensating everyone.

“For the near term, we’re making books from books.” It’s hard to make a digitized book that looks like the original; this is the proof that it can work.

1. Selection. librarians choose books. Start with out of copyright materials, work towards in print, orphans next. “we’re not going to run out”
2. Scanning. 500 dpi “scribe system” 30-60 min per book. “we can read a 2 pt typeface, straight on” metadata, saved to archive
3. Cataloging. Use library data and coordinate between scanning centers using MetaFetch. Groups like RLG are coordinating.
4. Copyright. Copyright law is “a little confusing” Evidence based interface allows a Q&A “is this book under copyright” interrogation. Many books not re-registered copyright-wise. Already scanned copyright renewal records into a searchable database. Larry Lessig is bringing a suit re: orphan works and whether they can be in the virtual library. Other for-profits are working back the other way. It’s “tricky but doable”
5. Storage. 6 GB per book, hard to scale. Built a petabyte-scale machine “petabox” [I saw it] low power, runs cool, “set top boxes” not full computers with OSes etc. Object is not to have one box in an earthquake zone, but distributed system in flood zones & elsewhere.
6. Readers. Software. Check it out at openlibrary.org. UC librarians chose early set of books already scanned. Also looking into PDFs for printing. Also working with lulu.com for print on demand. Also, you can listen to these books.
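A quick back-of-the-envelope sketch of the storage numbers in step 5, assuming decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and taking the quoted 6 GB per book at face value. The million-book collection is a hypothetical round number, not a figure from the talk.

```python
GB = 10**9   # bytes, decimal units (an assumption)
PB = 10**15  # bytes, decimal units (an assumption)

BYTES_PER_BOOK = 6 * GB  # the "6 GB per book" figure from the notes


def books_per_petabyte(bytes_per_book=BYTES_PER_BOOK):
    """Whole books that fit in one petabyte at the given size per book."""
    return PB // bytes_per_book


def petabytes_needed(num_books, bytes_per_book=BYTES_PER_BOOK):
    """Petabytes required to hold num_books at the given size per book."""
    return num_books * bytes_per_book / PB


print(books_per_petabyte())
print(petabytes_needed(1_000_000))  # hypothetical collection size
```

At that size a single petabyte holds a bit under 170,000 books, which gives a sense of why "hard to scale" made it into the notes.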

Other mentioned projects: ICDL, Internet Archive Bookmobile [buck a book!]. BookShare will use this content for access for the blind. $100 laptop will be integrating books from this project onto their laptops [big news!]. Open Content Alliance to create protocols and formats.

Brewster Kahle: “I don’t know what it will be like to have books from our libraries injected into our culture again, but I’d like to see it”

“Knowledge for the World” is the mantra of all the funders [on and off the podium, 30 seconds each: Smithsonian (museums/content), Yahoo, Sloan Foundation (funding), Johns Hopkins (content/tech), RLG (cataloging), Adobe (display/doc formatting), HP (scan), LizardTech (data compression), Lulu.com (printing), MSN Search (search/funding) etc]

Guy from Yahoo “Finally a library I won’t get thrown out of” and “Find, use, share, and expand all human knowledge”

Andrea has more, including some links that I missed.