Posted in 'puters | Friday, December 17th, 2010 | 2 Comments »
Tags: googlebooks, googlelabs, hitler, ocr

So hey this is interesting. I’ve skipped a lot of the Google Books ebookstore stuff lately because I’m honestly not sure what to make of it. And I don’t buy books anyhow. But a friend mentioned this Google Labs Ngram viewer, a fun tool that lets you search the full corpus of the Google Books databases. Here’s a New York Times article about it and data geeks should read the article Quantitative Analysis of Culture Using Millions of Digitized Books (free reg. required – click for PDF ILL) or nose around in the datasets. I did my own dopey search, pictured above – Hegel vs. Hitler. And here’s what’s interesting. The big jump in the late 1940s is fairly predictable, but who was talking about Hitler in 1620?
I clicked through and poked around some and here’s what I found. No one was talking about Hitler. OCR is, as you know, imperfect. So the words that Google Books’ optical character recognition thought of as “Hitler” were actually words like “Ruler” and “bitter” and “herbe.” How about that?
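If you want a rough feel for how close those misreads are, here is a minimal sketch that counts the single-character edits between "Hitler" and each of the words above. This is my own back-of-the-envelope proxy, not how Google's OCR works: real OCR confuses letter shapes on a worn page, not spellings, and `edit_distance` is a hypothetical helper written just for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-character inserts,
    deletes, or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the prefix of a seen so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


target = "hitler"
# The words from the scanned pages; lowercased so case is not counted as an edit.
for word in ["Ruler", "bitter", "herbe"]:
    print(word, edit_distance(target, word.lower()))
```

The smaller the number, the less the OCR had to get wrong for the word to land in the "Hitler" bucket.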
Posted in books | Thursday, February 25th, 2010 | 3 Comments »
Tags: bibliohoax, books, fortsas, googlebooks, hoax
“Jean Nepomucene Auguste Pichauld, Comte de Fortsas, was a man with a singular passion. He collected books of which only one copy was known to exist…. [W]hen he died on September 1, 1839 he possessed only fifty-two books, but each of them was absolutely unique. His heir, not sharing the old man’s passion for book collecting, arranged for an auction to sell off the library”
Compelling, no? The auction really happened; the rest of it is made up, the creation of a local antiquarian having a bit of a practical joke. Read more at blacksundae, or see the auction catalog, itself a rarity, on Google Books.
Posted in hi | Tuesday, November 3rd, 2009 | 2 Comments »
Tags: bookscanner, danreetz, diybookscanner, fakeproject, googlebooks, scanning
I’ve mentioned Daniel Reetz’s DIY portable book scanner here before. It’s a great combination of an interesting thing to look at, an interesting project to contemplate and a bit of a gauntlet tossed down as far as bigger questions of why we leave scanning up to the big companies, etc. At the end of my Tiny Tech talks I usually mention it as something in the realm of the possible, even if in a Dream Big way. Daniel was at D is for Digitize last month — a conference I missed because I was in Nevada — and I noticed some interesting back and forth about his scanner project show up in the Library Law blog.
Posted in blogz | Wednesday, August 26th, 2009 | 1 Comment »
Tags: berniemargolis, google, googlebooks, lfpl, lsw, wordpress
I’ve been scooting around a little bit lately and here are some things that have been crossing my virtual desk. I’ve also dealt with two WordPress issues [a hack! and an outdated sidebar navigation element] and I’ve upgraded to the latest version of WordPress. If you’re on a summer schedule, I’d suggest upgrading before things get hectic.
Posted in books | Wednesday, August 19th, 2009 | 2 Comments »
Tags: ala, alaoitp, copyright, copyrightadvisorynetwork, financialtimes, ft, google, googlebooks

With the Google Books settlement coming up, a lot of people have been talking about copyright. I think this is, generally speaking, a really good thing. Here are some useful visualizations that may help you get your head around it.
- From the Financial Times is this article about what the Google business model could mean for out of print books and orphan works. According to their graphic [above] there are a lot of books with unclear status in US libraries that we should be concerned about.
- From ALA’s Copyright Advisory Network (a project of the Office for Information Technology Policy) come a few helpful tools for looking at copyright as it pertains to libraries.
Posted in access | Tuesday, July 21st, 2009 | Comments Off
Tags: eff, google, googlebooks, laws, lawsuit, prvacy
Normally I’m not much of a joiner, but… “EFF is gathering a group of authors (or their heirs or assigns) who are concerned about the Google Book Search settlement and its effect on the privacy and anonymity of readers. This page provides basic information for authors and publishers who are considering whether to join our group.”
You can join too, if you’d like.
Posted in access | Tuesday, May 12th, 2009 | 2 Comments »
Tags: books, copyright, cornell, googlebooks, internetarchive, publicdomain
An ongoing debate in the copyright wars is whether an institution that is making reproductions of public domain materials available should be allowed to dictate terms (usually involving payment) for use of those items. We all know that libraries need money. It’s also true that having digital copies of rare materials available helps preserve the original items. So, if I want to download a public domain book from Google Books — say John Cotton Dana’s book A Library Primer — I get usage guidelines from Google attached to the pdf I’ve downloaded.
Usage guidelines
Google is proud to partner with libraries to digitize public domain materials and make them widely accessible. Public domain books belong to the public and we are merely their custodians. Nevertheless, this work is expensive, so in order to keep providing this resource, we have taken steps to prevent abuse by commercial parties, including placing technical restrictions on automated querying.
We also ask that you:
+ Make non-commercial use of the files: We designed Google Book Search for use by individuals, and we request that you use these files for personal, non-commercial purposes.
+ Refrain from automated querying: Do not send automated queries of any sort to Google’s system: If you are conducting research on machine translation, optical character recognition or other areas where access to a large amount of text is helpful, please contact us. We encourage the use of public domain materials for these purposes and may be able to help.
+ Maintain attribution: The Google “watermark” you see on each file is essential for informing people about this project and helping them find additional materials through Google Book Search. Please do not remove it.
+ Keep it legal: Whatever your use, remember that you are responsible for ensuring that what you are doing is legal. Do not assume that just because we believe a book is in the public domain for users in the United States, that the work is also in the public domain for users in other countries. Whether a book is still in copyright varies from country to country, and we can’t offer guidance on whether any specific use of any specific book is allowed. Please do not assume that a book’s appearance in Google Book Search means it can be used in any manner anywhere in the world. Copyright infringement liability can be quite severe.
These are all “suggestions” as near as I can tell. As with the Chicken Coupon fiasco of a few days ago, the implied threat that comes along with this item puts a bit of a damper on the joy that is the public domain. Bleh. We’ve seen other big corporations and libraries doing this as well.
However, this post is mostly to say “Yay” about Cornell’s decision to remove all restrictions on the use of its public domain reproductions. Here’s their press release about it and here is the web page with the new policy. What’s their reasoning? Well, among other things it’s hard to support a mission of open access and at the same time go out of your way to make materials more difficult to get ahold of and interact with. You can see some of Cornell’s 70,000 public domain items at the Internet Archive.
Posted in books | Friday, May 1st, 2009 | 2 Comments »
Tags: amygoodman, democracynow, digitzation, google, googlebooks, internetarchive, scanning
I was lucky enough to catch Brewster Kahle talking with Amy Goodman on Democracy Now on my drive home from NJLA. I feel like I’m pretty up on what’s going on with Google and the Internet Archive and book scanning. What I didn’t know is how Google’s agreements with libraries are hindering the IA’s access, not because of the contracts, but just because of differing priorities. The video and transcript are now available online.
AMY GOODMAN: Explain what you mean when you say it’s not legally required. You mean in the contract, what they have with Google? And so, if Google was here, they’d say, “We didn’t say they couldn’t give it to Internet Archive. That’s their prerogative.”
BREWSTER KAHLE: Correct, that basically Google didn’t put it in their contract. Yet from a library’s perspective, why have a book scanned twice? It’s wear and tear on the books. If they think that—and they wouldn’t have signed it if they didn’t think that the Google thing was a good idea. But now that they’ve signed this with Google, they don’t want it scanned again. And this is a problem, because the books, even the out-of-copyright books, are locked up perpetually.
Posted in 'puters | Monday, April 20th, 2009 | 2 Comments »
Tags: google, googlebooks, jamesgrimmelmann, ncls, njla, talks, twitter
I’ve been reading more, typing less. My super-bloggy friends told me last year sometime that a lot of their friends were blogging less and Twittering more. I was surprised to hear that since it hadn’t really trickled down to my neck of the woods yet, but lately it has. While I still stay on top of my RSS feeds, I suspect that I can only do that because people are blogging less. I don’t know if they’re twittering more, having babies, buying houses or doing something else. I know what I’ve been doing: reading.
I’ve also been travelling which is probably not a totally fun thing to read about [if I could delete everyone's tweets from airports, I would -- unless they're me looking for someone to hang out with when my flight has been delayed] but I go through periods of educating, followed by periods of learning, etc. I also made a resolution to myself for this year to write new talks (some similar slides okay, all similar slides against the rules) so when I give talks, they’re more work but also better, I think. I’ll be doing a 2.0 talk in upstate New York for NCLS and then a few talks at NJLA next week. Lots of writing, good stuff to pass on.
What’s been really on my mind lately is the Google Books settlement. I happen to be lucky that an old time friend of mine from the blogger days, James Grimmelmann, is one of the major players in the “explain this to everyone” field day that is going on. He’s also a keen legal mind and a great writer so it’s been a joy to read what he and others have been writing. Here are some links to essays that may help you understand things.
Posted in books | Monday, March 2nd, 2009 | 2 Comments »
Tags: books, googlebooks, internetarchive, metadata, publiclibrary, scanning

I went to the Belmont Public Library this weekend because it’s my boyfriend’s local library and he is, as you might suspect, a heavy library user. The library is in an old building that is clearly reaching the end of its usefulness as a 21st century library, but they seem to do the best they can. They are part of the Minuteman Library Network which means they have access to a lot of consortium-level technology which can really help out when you’re working in an institutional-green building with furniture from the late seventies. I had a good time there in any case and I took some pictures including the one above.
What first got me about this book was that it hadn’t been taken out since 1963. Well, that’s not quite correct. We know it was checked out in 1963, and it was possibly checked out again after the OPAC took over, on a date I don’t know. What occurred to me later as I looked at this picture is how much else we know about this book simply by looking at this card.
- the date the book was acquired by the library
- the title of the book
- the last name of the author of the book
- the patron number of the person who checked the book out last
- the call number of the book
- the library the book is from
- the lending period of the book
- the date the book was last checked out (before the OPAC)
- the fact that the library card pocket was union made
That’s a lot of data. I can also, using that data, find the full text of this book both at the Internet Archive (a little messed up, for some reason) and as PDFs (with images) at the Google Books project which is searchable. In fact, there appear to be three versions of this book on Google Books (1, 2, 3) only one of which includes page two which has a photo of the author. Nothing much else to add, just finding this whole exploration process interesting.