the nature of observing disturbs the observed

Danny Sullivan explains why librarians might care about what he calls “the biggest change that has ever happened in search engines”: Google’s Personalized Results. [juice]

stats vs. privacy – the techsoup take

TechSoup uses Google Analytics to track site visits and other statistics. I’ve said for a while now that the more data you can gather about the people using your websites, the more you can translate that data into requests for funding, staffing and other improvements at your institution. Elliot Harmon wrote a good article about the things to keep in mind as you start using these tools, and I gave him a few pull quotes for it: Site Statistics and User Privacy for Nonprofit Websites.

errors, on fixing

I read with interest this blog post over on Freedom to Tinker about how the Google Book Search folks are finding and fixing errors in their giant catalog, especially metadata errors. The conversation seems to have largely started with this post on Language Log and gotten more interesting with follow-up comments from folks at Google. One of the things we have all learned in libraryland is that being able to trawl through our data with computers means we can find errors that might otherwise have stayed buried for years, or perhaps forever. Of course, computers also help us create these errors in the first place.
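For the curious, here’s roughly what that kind of computer-assisted trawl can look like: a minimal Python sketch that flags catalog records with impossible or missing dates. Everything in it is a hypothetical assumption for illustration, not anything Google or the posts describe: the `BookRecord` fields, the `flag_suspect_dates` helper, and the 1450 cutoff (roughly when movable-type printing took off in Europe). Real catalog records are far richer than this.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BookRecord:
    # Hypothetical, heavily simplified metadata record (real records carry far more fields).
    title: str
    pub_year: Optional[int]
    author_birth_year: Optional[int] = None


def flag_suspect_dates(records, earliest=1450, latest=None):
    """Return (title, reason) pairs for records whose dates look impossible.

    `earliest` is an assumed cutoff; `latest` defaults to the current year.
    """
    latest = latest if latest is not None else date.today().year
    problems = []
    for rec in records:
        if rec.pub_year is None:
            problems.append((rec.title, "missing publication year"))
            continue  # no year to check further
        if not earliest <= rec.pub_year <= latest:
            problems.append((rec.title, f"publication year {rec.pub_year} out of range"))
        if rec.author_birth_year is not None and rec.pub_year < rec.author_birth_year:
            problems.append((rec.title, "published before the author was born"))
    return problems
```

A rule like this only surfaces candidates; a human still has to decide whether the publication year or the author’s birth year is the wrong one, which is exactly the work-in-progress view of a catalog discussed below.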

What’s most interesting to me is a seeming difference in mindset between critics like Nunberg on the one hand, and Google on the other. Nunberg thinks of Google’s metadata catalog as a fixed product that has some (unfortunately large) number of errors, whereas Google sees the catalog as a work in progress, subject to continual improvement. Even calling Google’s metadata a “catalog” seems to connote a level of completion and immutability that Google might not assert. An electronic “card catalog” can change every day — a good thing if the changes are strict improvements such as error fixes — in a way that a traditional card catalog wouldn’t.

Note: thanks to the people who let me know that one link was wrong, and that I managed to typo both “computers” and “interesting” in this post.

a few late summer links

I’ve been scooting around a little bit lately, and here are some things that have been crossing my virtual desk. I’ve also dealt with two WordPress issues [a hack! and an outdated sidebar navigation element] and upgraded to the latest version of WordPress. If you’re on a summer schedule, I’d suggest upgrading before things get hectic.

some copyright visualization

With the Google Books settlement coming up, a lot of people have been talking about copyright. I think this is, generally speaking, a really good thing. Here are some useful visualizations that may help you get your head around it.

- From the Financial Times comes this article about what the Google business model could mean for out-of-print books and orphan works. According to their graphic, there are a lot of books with unclear copyright status in US libraries that we should be concerned about.
- From ALA’s Copyright Advisory Network (a project of the Office for Information Technology Policy) come a few helpful tools for looking at copyright as it pertains to libraries.

why I don’t accept guest posts from spammers, or link to them

I get an email maybe once a week from someone with a human-sounding name saying they read my blog and think they have something my readers might be interested in, or offering to write a guest post for my blog. The link is usually to some sort of vaguely useful list of something library-related, but the website itself is not library-related. In fact, the URL is usually something like onlinenursepractitionerschools.com, searchenginecollege.com or collegedegree.com (which, if you’ll notice, is the top hit on Google for a search for “college degree”). I sometimes see other libloggers linking to sites like these, and I have a word of advice: don’t. When we link to low-content sites from our high-content sites, we are telling Google and everyone else that we think the site we are linking to is in some way authoritative, even if we’re saying they’re dirty scammers. We’re boosting their PageRank, and we’re slowly, almost infinitesimally, decreasing the value of Google and polluting the Internet pool in which we frequently swim. Don’t link to spammers.

This is a linkless post, for obvious reasons.

EFF takes on Google Books privacy issues

Normally I’m not much of a joiner, but… “EFF is gathering a group of authors (or their heirs or assigns) who are concerned about the Google Book Search settlement and its effect on the privacy and anonymity of readers. This page provides basic information for authors and publishers who are considering whether to join our group.”

You can join too, if you’d like.

just to make sure we’re all on the same page here

“A team from Google interviewed dozens of people in Times Square the other day, asking a simple question: What’s a browser? This was in an effort to understand and improve the customer experience of Google’s own browser, called Chrome.

Turns out that over 90% of the people interviewed could not describe what a Web browser is.”

Don’t believe me? Watch the video. Granted, this comes from Google, but while we’re all being “blah blah Firefox, etc.,” there are many people who just see what happens when they “click the e” and go forward from there.

unintended consequences of Google Books project

I was lucky enough to catch Brewster Kahle talking with Amy Goodman on Democracy Now! on my drive home from NJLA. I feel like I’m pretty up on what’s going on with Google, the Internet Archive and book scanning. What I didn’t know is that Google’s agreements with libraries are hindering the Internet Archive’s access to their books, not because of the contracts themselves but because of differing priorities. The video and transcript are now available online.

AMY GOODMAN: Explain what you mean when you say it’s not legally required. You mean in the contract, what they have with Google? And so, if Google was here, they’d say, “We didn’t say they couldn’t give it to Internet Archive. That’s their prerogative.”

BREWSTER KAHLE: Correct, that basically Google didn’t put it in their contract. Yet from a library’s perspective, why have a book scanned twice? It’s wear and tear on the books. If they think that—and they wouldn’t have signed it if they didn’t think that the Google thing was a good idea. But now that they’ve signed this with Google, they don’t want it scanned again. And this is a problem, because the books, even the out-of-copyright books, are locked up perpetually.

finger pointing when digital archives disappear

I really enjoyed this article about Google buying up the Paper of Record digital newspaper archive and then “disappearing” it somehow. The timeline is a little unclear and it’s back online for now, but as Google figures out how to monetize it and researchers yowl about lack of access, it raises some pretty interesting issues about scholarship. As information ownership changes hands (and I think if we weren’t talking about Google here we’d be talking about someone else, so it’s not really about them), data can literally disappear, either behind a paywall or altogether. Particularly poignant in this case is Bob Huggins’s comment (sorry, no permalink) on the Inside Higher Ed story. Huggins, the original founder and creator of the archive, discusses what’s happening with it now.

When exactly does the cat fight end? It slays me to see the great American Us versus Them debate rage on (I comment as a Canadian). As person who pioneered the digitization of newspapers in the world with our company, Cold North Wind, I fail to see how this acrimony between Academics and Google helps ‘joe public’ access the public record. I have stated on numerous occasions that the newspaper represents ‘our’ only record of daily public life for the past 500 years with a special emphasis on the word “public”… I have been through the grinding wheels of both Google and many public institutions whose goal it seems is to preserve and present history from Newspapers. Both have let me down.