jstor

There has been a lot of great writing about copyright and access to our cultural and intellectual history in the weeks since Aaron Swartz’s death. I have been revisiting some of my old favorite haunts to see whether there was anything I didn’t know about the status of online access to information, especially material from the public domain era (pre-1923 in the US).

I sound like a broken record on this, but I think the best thing that libraries can do, academic libraries in particular, is to make sure that their public domain content is as freely accessible as possible. Cornell University made this affirmative decision in 2009. I think it was the right decision at the right time, and more libraries should do the same. Some backstory:

Cornell press release announcing this decision
Librarian and Policy Advisor Peter Hirtle’s article for Research Library Issues about this decision, Removing All Restrictions: Cornell’s New Policy on Use of Public Domain Reproductions, and the thinking that went into it.
Cornell’s 70,000 items accessible via the Internet Archive
Cornell’s guidelines for use of these materials

So, if I want to share an image from a book that Cornell has made available, I check the guidelines linked above and then I can link to the image. You can go see it, link to it yourself, and do whatever you want with it, including sell it. This is public domain. The time and money that went into making a digital copy of this image have been borne by the Internet Archive and Cornell University. The rights page on the item itself (which I can download in a variety of formats) is clear and easy to understand.

Compare and contrast JSTOR. Now let me be clear: I am aware that JSTOR is a (non-profit) business and Cornell is a university, and I am not saying that JSTOR should just make all of their public domain things free for everyone (though that would be nice). I am just outlining the differences, as I see them, in accessing content there. I had heard that there were a lot of journals on JSTOR that were freely available even to unaffiliated people like myself, so I decided to go looking for them. I found two different programs: the Register and Read program (where registered users can access a certain number of JSTOR documents for free) and the Early Journal Content (EJC) program. There’s no front door to the EJC program that I saw; you have to search JSTOR first and then limit your search to “only content I can access.” Not super-intuitive, but okay. And I’m not trying to be a pill, but a search on the about.jstor.org site for “public domain” gets you zero results. The same is true when searching for “early journal content” and also for “librarian.” Actually, I get the same results when I search their site for “JSTOR.” Something is broken, so I have emailed them. [update: they fixed it!]

So I go to JSTOR, do a similar search limited to “content I can access,” and pick the first pre-1923 result, which is an article about Aboriginal fire making from American Anthropologist in 1890. I click through and agree to the Terms of Service, which is almost 9000 words long. Only the last 260 words really apply to EJC. Basically, I’ve agreed to use the content non-commercially (librarian.net accepts no advertising, so I am in the clear) and not to scrape it with bots or other devices. I’ve also seemingly agreed to credit JSTOR and to use the stable URL, though that doesn’t let me deep-link to the page with the image on it, so I’ve crossed my fingers and deep-linked anyhow. I’m still not sure what I would do (contact JSTOR, I guess) if I wanted to use this document in a for-profit project. Being curious, I poked around to see if I could find this public domain document elsewhere, and sure enough, I could.

Wiley, the original publisher, has the article available with no JSTOR preconditions.
Google Books has it both available and not available, depending on how you look for it: readable online, downloadable if you really fish around for it. Google has nearly identical language to JSTOR: use this non-commercially, don’t scrape content or use bots, leave Google’s watermark intact. HathiTrust has a discovery layer for this material as well, and it provides this concise explanation of how “public domain, Google digitized” is different from the public domain. Seventeen different types of rights, whoo-wee!
WorldCat shows me how to get this from a participating library
The Internet Archive has a copy, though it was a little tough to find. It’s an OCRed version of the Google Books document that’s been ported to their interface, but this one says NO_KNOWN_COPYRIGHT (all caps not mine).

At that point, I quit looking. I had found a copy that was free to use. Getting there, however, meant that I had to be good at searching, quite persistent, and unwilling to take “maybe” as an answer to “Can I use this content?” I know that when I was writing my book, my publishers would not have taken maybe for an answer; they were not even that thrilled to accept Wikimedia Commons’ public domain assertions.

I feel that as librarians we have to be prepared to find content that is freely usable for our patrons, not just content that is mostly freely usable or content where people are unlikely to come after you. As much as I’m personally okay with being a test case (“Yeah, I didn’t read all 9000 words of the JSTOR terms and conditions, please feel free to take me to jail”), realistically that will not happen. Realistically, the threat of jail is scary and terrible and expensive. Realistically, people bend and decide it’s not so bad because they think it’s the best they can do. I think we can probably do better than that.

I saw this post circulating around Facebook and, of course, the word “library” caught my eye. The Boston Globe has a longer explanation of what all the kerfuffle is about, but still uses words like “hacking.” Demand Progress, the organization that Aaron directs, has this statement and some additional posts on its blog. The New York Times seems to have the most comprehensive explanation of what happened when, and has the text of the indictment.

What we do know is that the US Government has indicted Aaron Swartz [who you may know around the internet for any number of things] for, apparently and allegedly, downloading 4 million articles from JSTOR without (I think?) the proper credentials. Aaron turned himself in. At issue are many points of JSTOR’s terms of service and what sort of access is given to guests of the university. Since Aaron is a net activist, I’m certain this was at some level an intentional move on his part, and I’m quite curious to see where it goes.

Update: JSTOR’s official statement; Wired article with more details


on public domain and “public domain”

a shot over the bow – Aaron Swartz indicted for … downloading articles from JSTOR?