OCLC, Google, LC and you

There’s a lot of talk going on lately about whether cataloging as it has been done really matters in the age of Google and keyword searching. I’ve been reading about it a lot, both online and in the print materials sent to me by the Sanford Berman postal express, including his back-and-forth letters to the head cataloger at LC. Sometimes it seems that everyone starts with the same data point but still arrives at different conclusions. So, the OCLC team [who have a dog in this fight] tell us that “Ordinary people do not search subject headings, Berman or LCSH. They search key words.” I think many people agree with that. Then we read Thomas Mann [another dog-holder], who has a longish article in Library Journal about scholarly research and the ancillary functions of subject headings as more than just entry points to the information held in a catalog.

Keyword search algorithms, no matter how sophisticated their “relevance ranking” capabilities, cannot turn exactly specified words into conceptual categories. They cannot provide the linkages and webs of relationships to other terms (in a variety of languages, too), nor map out in any systematic manner the range of unanticipated aspects of a subject. Keyword searches cannot segregate the desired terms in relevant contexts distinct from the same terms used in irrelevant contexts.

In contrast, LC cataloging and classification—done by professional librarians rather than computer programs—accomplish exactly these functions that are so critical to scholarship. The search mechanisms created by librarians enable systematic searching, not merely desultory information seeking.

We all know Google is useful and is changing the way the average person searches for information. However, when we start to discuss whether Google is changing the way the average researcher does scholarship, then I think we have to be a lot more careful about understanding its [proprietary] mechanisms and thinking about what Google’s goals for Google are as well.

revisiting relevance

I was poking around on Amazon.com today and noticed two things:

  1. They changed my name from Jessamyn Charity West to Jessamyn West, which meant that clicking on my name got you all the books by the other Jessamyn West. I can only guess why this happened. To be fair, I complained and they changed it back, but not before telling me that this sort of munging of author names was “a feature” of their system. The change is recent; the Google cache still contains my full name.
  2. Amazon’s Statistically Improbable Phrases, which are a whole new approach to the sticky issue of “aboutness.” Add to this the existing tools of concordance and readability and you’ve got two things: 1) strong “keeping up with the Joneses” pressure to submit to the Inside the Book program, and 2) the beginnings of cataloging by robots.

This all came to me a day after getting a fat envelope from Sandy Berman, which included, among other things, some articles he had written about “bibliocide by cataloging,” where subject headings assigned by OCLC or LoC or OCLC member libraries and passed down to thousands of libraries via copy cataloging are so vague as to be essentially useless as finding aids. Do these Amazon features solve this problem or compound it? Eli also expands a bit on what I said about Google a few days ago; these issues are not disconnected.

“Why catalog in-house? Why catalog locally? And why not outsource the whole operation? Because critical, creative catalogers within individual systems are the last and only bulwarks against the often error-laden, access-limiting, and alienating records produced by giant, distant, and essentially unaccountable networks and vendors.”