Why sourcing photos matters – how misattribution is amplified on the web

I wrote an article for Computers in Libraries last week about the PicPedant account on Twitter and the odd preponderance of unsourced images flying around the internet. This is just a true thing about how the internet works; people have been misattributing things since forever. However, there’s a new wrinkle in this process: the combination of popular blogs and Twitter accounts with some of the “secret sauce” aspects of how Google works creates an odd phenomenon that can amplify misinformation more than you might expect. Here’s my example.

Hans Langseth

This man is Hans Langseth. I know this because I was a kid who read the Guinness Book of World Records a lot, and I recognized him from other pictures. He had the longest beard in the world. The image on the right is a clever Photoshop. However, if you Google Image search Hans Steininger, you will also find many versions of this photo. This is curious because Hans Steininger (another hirsute gentleman) died in 1567, pre-photography. His beard was also only about four feet long, whereas Langseth’s beard was more like 18+ feet long.

What happened? Many websites have written little lulzy clickbait articles about Steininger (sourcing other articles that themselves source actual articles at reputable-ish places like Time magazine, which are inaccessible because of paywalls) and how he supposedly, ironically, died tripping over his own beard. They all link to the image of Langseth and don’t really mention that the guy in the photograph is a different guy. The image and the name get hand-wavily semantically linked, and search engines can’t really do a reality check and say “Hey, we use this image for a different guy” or “Hey, we can’t have a photograph of this guy because he lived in the 1500s.”
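
To make the mechanism concrete, here’s a toy sketch in Python, with invented page data, of how a naive co-occurrence indexer would confidently attach the wrong name to this photo. Nothing here reflects how Google actually works; it just illustrates why three repetitions of a wrong caption can beat one correct one.

```python
from collections import Counter, defaultdict

# Toy sketch (invented data, not a real crawler): a naive indexer that
# labels an image with whichever name co-occurs with it most often
# across the pages that embed it. Real search engines are far more
# sophisticated, but the failure mode is the same: the popularity of a
# name/image pairing stands in for its accuracy.
pages = [
    # (name mentioned in the page text, image embedded on the page)
    ("Hans Steininger", "langseth_beard.jpg"),  # clickbait article
    ("Hans Steininger", "langseth_beard.jpg"),  # reblog of the above
    ("Hans Steininger", "langseth_beard.jpg"),  # listicle citing the reblog
    ("Hans Langseth", "langseth_beard.jpg"),    # the one accurate page
]

name_counts = defaultdict(Counter)
for name, image in pages:
    name_counts[image][name] += 1

for image, counts in name_counts.items():
    best_name, _ = counts.most_common(1)[0]
    print(f"{image} -> {best_name}")  # langseth_beard.jpg -> Hans Steininger
```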

Google results for Hans Steininger

Not a huge deal; the world isn’t ending, and I don’t think the heirs of Langseth are up in arms about this. However, as more and more people presume that the search engine and “hive mind” approach to this sort of thing produces the correct answer, it’s good to have handy counterexamples to explain why we still need human eyeballs even as “everything” is on the web.

Fair! Google Books case dismissed.

original Ferris wheel - from the Open Library

Karen Coyle has done an excellent write-up of this, so I will refer you there.

The full impact of this ruling is impossible (for me) to predict, but there are many among us who are breathing a great sigh of relief today. This opens the door for us to rethink digital scholarship based on materials produced before information was in digital form.

Folks can read the actual ruling (pdf) if they’d like. This is a very big deal. Thanks to folks who worked so hard on getting us to this place. I’ll add a few links here as they come in.

  • Kenneth Crews, Columbia Copyright Advisory Office: “This ruling joins court decisions about HathiTrust and electronic reserves in demonstrating that even extensive digitization can be within fair use where the social benefits are strong and the harm to rightsholders is constrained. There will be more to come as we transition into a new era of copyright, technology, and even reading.”
  • Brandon Butler, ARL Policy Notes blog: “The decision is a victory not only for transformative, non-consumptive search, but also for serving “traditionally underserved” libraries and their users, including disabled patrons.”
  • Paul Alan Levy: “This ruling provides a road map that allows any other entity to follow in Google’s path.”
  • Timothy Lee, Washington Post: “Many innovative media technologies involve aggregating or indexing copyrighted content. Today’s ruling is the clearest statement yet that such projects fall on the right side of the fair use line.”
  • Mike Masnick at Techdirt: “It all comes together in making a very strong argument that Google’s book scanning promotes the progress of the arts and sciences just like copyright is supposed to do.”
  • InfoDocket also has an updating list of links to discussion of the decision.

How to solve impossible problems with Google… by Google

Like many library people, I get annoyed when I tell people I can’t find something on their website and they tell me how to search for it. That said, I know there are things I still don’t know about searching, and I like learning what they are. Greg Notess’ Search Engine Showdown is always a first stop. I also enjoyed this post, How to Solve Impossible Problems, about Google research scientist Daniel Russell’s presentation to a group of investigative journalists last week. It’s got two great parts:

1. The impossible problem, which is a fun sleuthing puzzle about how to identify a randomish photo (though not so random, as it turns out; the solution is explained)
2. Even more tips about Google that I hadn’t known, including the Public Data Explorer and using the word “diagram” when looking for schematic-type stuff (see the sketch below). Makes sense once you think about it; I just hadn’t thought about it much before.
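
Since the “diagram” tip is really just query construction, here’s a trivial sketch of it as code. The helper below is my own invention, not anything from Russell’s talk; the /search endpoint and q parameter are ordinary Google URL pieces, not an official API.

```python
from urllib.parse import urlencode

def diagram_search_url(topic: str) -> str:
    """Apply the 'search with the word diagram' tip to a topic.

    Hypothetical convenience helper: it just tacks the keyword onto
    the query and builds a standard Google search URL.
    """
    return "https://www.google.com/search?" + urlencode({"q": f"{topic} diagram"})

print(diagram_search_url("toilet tank fill valve"))
# https://www.google.com/search?q=toilet+tank+fill+valve+diagram
```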

you can’t be neutral on a moving search – skepticism about search neutrality

My inbox is full of little library links, and it’s a snow day, so I’m settling down to read some longer pieces that I haven’t had time for. James Grimmelmann is a friend and one of the more readable writers on technology and law and the muddy areas where they overlap. He’s written a nice essay on search engine neutrality: what it is, why you might care, who is working on it, and how attainable a goal it may or may not be. Specifically, what does it really mean to be neutral, and who decides, and who legislates? Quite relevant to all information seeking and finding professionals.

Good reading for a snowy weekday: Some Skepticism About Search Neutrality.

Search neutrality gets one thing very right: Search is about user autonomy. A good search engine is more exquisitely sensitive to a user’s interests than any other communications technology. Search helps her find whatever she wants, whatever she needs to live a self-directed life. It turns passive media recipients into active seekers and participants. If search did not exist, then for the sake of human freedom it would be necessary to invent it. Search neutrality properly seeks to make sure that search is living up to its liberating potential.

Having asked the right question—are structural forces thwarting search’s ability to promote user autonomy?—search neutrality advocates give answers concerned with protecting websites rather than users. With disturbing frequency, though, websites are not users’ friends. Sometimes they are, but often, the websites want visitors, and will be willing to do what it takes to grab them.

Copyright is killing sound archiving and fair use isn’t doing so well either


Fair Use poster image by Timothy Vollmer

The Library of Congress just released its 181-page report “The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age” about the challenges of digitally archiving sound recordings. BoingBoing gives a nice summary: “[T]he copyright laws that the recording industry demanded are so onerous that libraries inevitably have to choose whether to be law-breakers or whether to abandon their duty to preserve and archive audio.” More analysis from OSNews.

And if anyone’s wondering where I’ve been this week, the answer is “mired in getting copyright permissions for the intellectual property in my book, thanks for asking.” I have a pretty firm grasp of Fair Use and have been trying to follow the guidelines for Fair Use in Media Literacy Education. I signed a book contract that specifically says I am responsible for ensuring that my materials are used with permission. Despite this, my publisher (whom I am otherwise quite fond of) is risk-averse and wants me to get permission anyhow, permission that I assert I don’t need for small screenshots of, say, Google search results or an ALA nested menu.

This gets even more confusing when some of the organizations involved claim that I need permission when I don’t. Since Fair Use, like the Americans with Disabilities Act, is mostly something that gets hammered out through litigation, there is no strict set of guidelines as to what Fair Use is. So big companies with a lot to lose err on the side of compliance with other big companies’ requests, requests that may be extralegal. Google can’t legally tell you to use only the public domain offerings from Google Books (which they admit), but they make a polite request, a polite request that sounds a lot like terms of service.

So right now I’m waiting to hear back from Facebook after filling out a form on their website asking for permission to use a screenshot. They say it will take 1-2 weeks. I am confident that my screenshot is fair use. My editor also thinks it is fair use. However, they’re not willing to risk it. And so we wait.

librarians’ search for neutrality a precursor to debate over Google rankings

“The idea that search engines can, or should, be neutral can be traced back to a movement of leftist librarians in the 1970s. Led by Sanford Berman, one of the first to bring social rebellion into the library, radical librarians argued that the system used to organize books was inherently biased and racist because it reflected a Western perspective.”

the nature of observing disturbs the observed

Danny Sullivan explains why librarians might care about what he calls “the biggest change that has ever happened in search engines”: Google’s Personalized Results. [juice]

stats vs. privacy – the techsoup take

TechSoup uses Google Analytics to track site visits and other statistics. I’ve said for a while now that the more data you can get about people using your websites, the more you can translate it into requests for funding, staffing, and other improvements at your institution. Elliot Harmon wrote a good article about the things to keep in mind as you start using these tools. I gave a few pullquotes for it: Site Statistics and User Privacy for Nonprofit Websites.
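
Relatedly, if handing visitor data to a third-party tracker is the sticking point, some of the same headline numbers can be pulled from your own server logs. Here’s a minimal sketch, assuming a standard Apache/nginx-style access log at an invented path, that counts page hits while blunting the privacy cost by zeroing the last octet of each IP address before storing it.

```python
import re
from collections import Counter

# Matches the start of a common/combined-format log line, e.g.:
# 127.0.0.1 - - [10/Mar/2010:13:55:36 -0500] "GET /about/ HTTP/1.1" ...
LOG_LINE = re.compile(r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (?P<path>\S+)')

def anonymize(ip: str) -> str:
    """Zero the last octet so counts cannot be tied to one machine."""
    parts = ip.split(".")
    return ".".join(parts[:3] + ["0"]) if len(parts) == 4 else "anon"

hits = Counter()
visitors = set()
with open("access.log") as log:  # hypothetical path
    for line in log:
        m = LOG_LINE.match(line)
        if m:
            hits[m.group("path")] += 1
            visitors.add(anonymize(m.group("ip")))

print(hits.most_common(10))
print(f"approximate unique visitors: {len(visitors)}")
```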

errors, on fixing

I read with interest this blog post over on Freedom to Tinker about the Google Book Search folks talking about finding and fixing errors in their giant catalog, metadata errors especially. The conversation seems to have largely started at this post on LanguageLog and gotten more interesting with follow-up comments from folks at Google. One of the things we have all learned in libraryland is that the ability to trawl through our data with computers means we can find errors that might otherwise have stayed buried for years, or perhaps forever. Of course, computers also help us create these errors in the first place.
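
As a small illustration of the kind of bulk sanity check computers make cheap, here’s a hedged sketch over invented records; a real pass would run over a MARC export or an API dump. Date anomalies, like the suspicious pile-up of books dated 1899 that critics pointed to in the Google Books discussion, are exactly the sort of thing a few lines of code can surface.

```python
from datetime import date

# Invented records standing in for a catalog export; any real check
# would run over MARC data or an API dump, not a hardcoded list.
records = [
    {"title": "Internet Marketing Secrets", "pub_year": 1899},
    {"title": "Moby-Dick", "pub_year": 1851},
    {"title": "Proceedings, Vol. 12", "pub_year": 2097},
]

def suspicious(rec):
    """Return a reason this record looks wrong, or None if it passes."""
    year = rec["pub_year"]
    if year > date.today().year:
        return "publication date is in the future"
    if year == 1899:
        return "1899 was reportedly a common default/placeholder year"
    return None

for rec in records:
    reason = suspicious(rec)
    if reason:
        print(f"{rec['title']!r}: {reason}")
```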

What’s most interesting to me is a seeming difference in mindset between critics like Nunberg on the one hand, and Google on the other. Nunberg thinks of Google’s metadata catalog as a fixed product that has some (unfortunately large) number of errors, whereas Google sees the catalog as a work in progress, subject to continual improvement. Even calling Google’s metadata a “catalog” seems to connote a level of completion and immutability that Google might not assert. An electronic “card catalog” can change every day — a good thing if the changes are strict improvements such as error fixes — in a way that a traditional card catalog wouldn’t.

Note: thanks to people who let me know that one link was wrong, and that I managed to typo both “computers” and “interesting” in this post.

a few late summer links

I’ve been scooting around a little bit lately, and here are some things that have been crossing my virtual desk. I’ve also dealt with two WordPress issues [a hack! and an outdated sidebar navigation element] and upgraded to the latest version of WordPress. If you’re on a summer schedule, I’d suggest upgrading before things get hectic.