Yet another report on digital preservation
It must be Digital Preservation Week.
Just a few days ago, I wrote about the Library of Congress’ new report on digital preservation (which itself followed the report of the Section 108 Study Group issued last March). Now, the Commission of the European Communities has released a green paper entitled Copyright in the Knowledge Economy, which discusses, among other things, digital preservation, the making available of digitized works, and orphan works.
Hat tip: LibraryJournal.com
New report on copyright and digital preservation
A joint report on the problems of copyright and digital preservation, International Study on the Impact of Copyright Law on Digital Preservation, was released this month by the Library of Congress National Digital Information Infrastructure and Preservation Program (“NDIIPP”), the Joint Information Systems Committee, the Open Access to Knowledge (OAK) Law Project, and the SURFfoundation.
The report studies problems of digital preservation by looking at the copyright laws of four countries, including the United States. It finds:
Digital preservation is vital to ensure that works created and distributed in digital form will continue to be available over time to researchers, scholars and other users. Digital works are ephemeral, and unless preservation efforts are begun soon after such works are created, they will be lost to future generations. Although copyright and related laws are not the only obstacle to digital preservation activities, there is no question that those laws present significant challenges.
See also the Section 108 Study Group Report, issued earlier this year, which discusses copyright law and digital preservation.
GAO report and pending bill on federal e-mail retention
The GAO has released a report entitled Federal Records: National Archives and Selected Agencies Need to Strengthen E-Mail Management. The report found that “[w]ithout periodic evaluations of recordkeeping practices or other controls to ensure that staff are trained and carry out their responsibilities, agencies have little assurance that e-mail records are properly identified, stored, and preserved.” It also stated:
Although NARA [i.e., the National Archives and Records Administration] has responsibilities for oversight of agencies’ records and records management programs and practices, including conducting inspections or surveys, performing studies, and reporting results to the Congress and the Office of Management and Budget (OMB), in recent years NARA’s oversight activities have been primarily limited to performing studies. NARA has conducted no inspections of agency records management programs since 2000, because it uses inspections only to address cases of the highest risk, and no recent cases have met its criteria. In addition, NARA has not consistently reported details on records management problems or recommended practices that were discovered as a result of its studies. Without more comprehensive evaluations of agency records management, NARA has limited assurance that agencies are appropriately managing the records in their custody and that important records are not lost.
Meanwhile, the White House is threatening a veto of the Electronic Message Preservation Act. According to the National Coalition for History, the bill “would direct the National Archives and Records Administration (NARA) to establish standards for the capture, management, preservation and retrieval of federal agency and presidential electronic messages that are records in an electronic format.” (Further info on the bill here, and on issues concerning the White House’s e-mail retention practices here.)
Hat tip regarding the GAO report to David Mattison at the Ten-Thousand Year Blog. Hat tip regarding the bill to BNA’s Privacy Law Watch.
BoingBoing “unpublishing” blog posts
When is it ok to delete a blog post? Dan Solove wrote about this a few years back at Concurring Opinions, where he points to additional posts at Prawfsblawg (here, here, and here). More recently, BoingBoing faced public scrutiny when one of its authors removed posts related to blogger and sex columnist Violet Blue, although nobody noticed the removals for about a year. A message board dedicated to the issue has generated over 1600 messages since July 1, some very heated. The moderator for the board writes:
It’s our blog and so we made an editorial decision, like we do every single day. We didn’t attempt to silence Violet. We unpublished our own work. There’s a big difference between that and censorship.
We hope you’ll respect our choice to keep the reasons behind this private. We do understand the confusion this caused for some, especially since we fight hard for openness and transparency. We were trying to do the right thing quietly and respectfully, without embarrassing the parties involved.
Clearly, that didn’t work out. In attempting to defuse drama, we inadvertently ignited more. Mind you, we weren’t the ones splashing gasoline around; but we did make the fire possible. We’re sorry about that. In the meantime, Boing Boing’s past content is indexed on the Wayback Machine, a basic Internet resource; so the material should still be available for those who would like to read it.
Oddly, BoingBoing speaks in terms of “unpublishing” rather than deletion. (Their policy page states “We reserve the right to unpublish or refuse to unpublish anything for any or no reason.”) Sure, “unpublishing” sounds less big-brothery than deletion, but I don’t really see the difference.
Moreover, “unpublishing” isn’t quite accurate: BoingBoing doesn’t mean “unpublished” in the sense of a book (or blog posting) that has yet to be published. They mean disabling public access to something that has already been posted, as in the DMCA 512(c) sense where material is removed or access to it is disabled. (WordPress does have an “unpublishing” function, but that’s still a misnomer.) A more accurate term might be deposting, depublishing, or good ol’ deletion.
Nevertheless, it’s useful to explore a potential distinction between deletion and depublishing, and other questions raised when a blogger wants to remove posted materials:
- As a starting point, what is the meaning of “publication” in an age where materials can be changed or removed?
- Under what circumstances is depublication justified?
- What practices are needed to distinguish “depublication” from “deletion”? Is a reservation of rights declaring a right of depublication sufficient? Should a notice be posted where the materials used to be (as Dan Markel suggests)?
- BoingBoing notes that the removed materials remain on the Wayback Machine web archive. Do web archives help to justify depublication?
- Does depublication serve an important social function by severing the association between author and depublished content?
Hat tip to Noam Cohen. And a disclaimer: I did make some edits to this post after posting.
Archiving Independence Day
The National Archives and Records Administration maintains a great site called Charters of Freedom that offers high-quality scans of key documents such as the Declaration of Independence, the Constitution, and the Bill of Rights.
By the way, the picture in the sidebar is the National Archives building being built way back when.
Happy Independence Day!
ADDENDUM: Wired has posted a short, interesting piece on the early mistreatment of the document, and more recent efforts to preserve it.

Advice for new law students, part III: avoiding your own Universal Studios fire
In an op-ed in the New York Times, UCLA film professor Jonathan Kuntz writes about the recent fire at Universal Studios. After describing the destruction of the courthouse square from To Kill a Mockingbird and Back to the Future, Kuntz notes:
More serious may be the loss of the circulating 35-millimeter theatrical prints. While not original masters, these are the copies made for screenings at repertory theaters, art museum retrospectives and in college classes. . . .
. . . .
This latest fire, I hope, will prompt Universal and its fellow majors to better preserve not just key titles like “Duck Soup,” “Dracula” or “Vertigo” — which will surely be reprinted and return to circulation — but also the other 90 percent of their inventories, the less famous and therefore more vulnerable titles that the studio may not feel justify spending thousands to save. These are exquisite samples of 20th-century American culture and deserve to always be seen in their extravagant, sensual, big-screen glory.
It sounds like, after the fire, some of Universal’s assets no longer exist beyond a single remaining master copy. That’s troubling for several reasons. First, should the masters be destroyed, the best (and in some cases, only) copies will be lost. Second, for cultural use to be made of the materials, new copies must be made.
What does this have to do with law students? The same thing: the importance of archiving and the dangers of failing to do so. Every term, students suffer data catastrophes — hard drive crashes, stolen laptops, etc. — leading to lost class notes, outlines, paper drafts, etc. Law school is stressful enough without the added strain of losing a 100-page outline two days before the final exam. But sadly, it seems to happen every term.
Back up your essential files, do so regularly, and keep them in secure and geographically distinct places, such as multiple computers, external hard drives kept elsewhere, network storage, and/or online storage. Or do simple and quick backups: periodically email your essential files to yourself.
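For readers comfortable with a little scripting, here is a minimal sketch of the advice above: copy each essential file to more than one location, then confirm that every copy matches the original. The function names (sha256_of, back_up) are my own, and the folders you pass in are whatever backup locations you actually have. Treat it as an illustration under those assumptions, not a complete backup system.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(files, destinations):
    """Copy each file into every destination folder and verify each copy.

    Returns a list of (source, copy) pairs whose checksums did not match.
    An empty list means every copy was verified.
    """
    sources = [Path(f) for f in files]
    failures = []
    for dest in (Path(d) for d in destinations):
        dest.mkdir(parents=True, exist_ok=True)
        for source in sources:
            copy = dest / source.name
            shutil.copy2(source, copy)  # copy2 also preserves timestamps
            if sha256_of(source) != sha256_of(copy):
                failures.append((source, copy))
    return failures
```

A call might look like back_up(["torts_outline.docx"], ["/Volumes/ExternalDrive/lawschool", "/mnt/network/lawschool"]), where all three paths are hypothetical. Note that files with the same name overwrite one another within a destination folder, so this only suits a flat set of uniquely named files.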
Advice for new law students, part I here.
Advice for new law students, part II here.
This posting will self-destruct in five seconds
As the Internet Archive shows, there is great value in preserving digital information for posterity. But sometimes, there is greater value in destroying information and doing so quickly. Information Week recounts the 2001 incident when an American spy plane was forced to land in Chinese territory after the plane collided with a Chinese fighter jet. The article notes that the U.S. crew was unable to erase the hard drives in time to protect the security of sensitive information. “Since then,” the article states, “researchers have been looking for a way to quickly erase computer hard drives to deny access to sensitive intelligence data.”
According to the article, researchers have developed an effective technique to erase hard drives in minutes rather than hours:
The researchers concluded that permanent magnets are the best solution. Other methods, including burning disks with heat-generating thermite, crushing drives in presses, chemically destroying the media or frying them with microwaves all proved susceptible to sensitive, patient, recovery efforts.
The military need for such technology is obvious. But also interesting are the potential commercial and consumer applications. According to the article, the researchers claimed the magnetic eraser could be used to quickly erase VHS tapes, floppy drives, data cassettes and hard drives. Maybe someday soon, it will be unacceptable and even illegal for corporations and government agencies to keep sensitive information (like your social security number) on easily stolen laptops, unless those machines are equipped with an effective auto-erasure mechanism.
Inheritability of blogs: You take Aunt Esther’s silverware, I’ll take her blog…
Over at the user forums on Wordpress.com, there’s an interesting thread on “web logs and wills.” Forum user timethief writes:
What happens to . . . web logs if a person dies and their executor notifies [the weblog's host] of their demise. Can one leave their account, username, password and API key number to another person in their will?
What a great question! It reminds me of the case last year of Lance Corporal Justin Ellsworth, who died in Iraq. After his death, his family asked Yahoo for access to his emails. Yahoo refused. After a court ordered Yahoo to hand over the contents of the account, Yahoo complied. But the parallel to Ellsworth has its limits. With emails, there are significant concerns over privacy: it just cannot be assumed that every deceased person wants his or her executors and heirs poring through their private and potentially embarrassing emails.
In contrast, blogs are intended for some level of public consumption and the privacy issues generally don’t run as high. (Though even with blogs, privacy concerns can exist, such as with David Lat, the formerly anonymous “Article III Groupie” who writes Underneath Their Robes.) Indeed, although many blogs are quickly abandoned, others are intended to serve as lasting statements of authorship, whether professional or personal (or both). As timethief noted in a later post, “Blogging is now and will remain part of what defined me as a unique individual.” But blogs aren’t books or magazines. After we’re gone, existing copies of books we wrote can continue to exist without additional effort on the part of our estates or heirs. And our estates and heirs can’t force consumers to return legally acquired copies of books.
But the book analogy is hard to apply to blogs. Blogs aren’t material objects and they’ll disappear without maintenance or preservation. But long-term maintenance isn’t really practical, at least not yet, for blogs whose owners have passed away. If hosting accounts aren’t kept active, or applicable payments stop, or hosting providers go out of business, or computers fail, or blogging code and databases become incompatible with future technologies, our blogs, like other web-only publications, may disappear or break. Plus, a blog might be shut down by an author’s estate or heirs, unless perhaps some sort of enforceable provisions can be made by the author that the blog be maintained posthumously.
Communal blogs like The Volokh Conspiracy stand a better chance of lengthy lives, since maintenance tasks can be undertaken as new members arrive. But most other sites, even highly successful ones like Howard Bashman’s How Appealing, are run by only one person. For an estate or heir, long-term maintenance after an author’s demise is not necessarily simple or — excuse the pun — appealing. In a rare case, successful blogs like Bashman’s could be valuable estate assets that would encourage continued maintenance and even eventual profitable transfer, but most blogs will utterly lack any such kind of maintenance incentive. (Of course, this is all illustrative, and Eugene and Howard should be blogging for many decades to come!)
This raises the question of digital preservation. Because long-term maintenance may not always be feasible, digital preservation of old sites becomes really important, and the utility of the Internet Archive’s Wayback Machine can’t be overstated. But I think that the Wayback Machine is just the beginning of a dialogue over how, and when, to preserve web-only materials. Putting copyright issues to the side for the moment, the Internet Archive doesn’t archive all sites, and when it does, it archives some sites more often than others. Plus, it’s not entirely clear whether the Wayback Machine is currently capable of properly archiving all types of blogs: the Internet Archive states that sites that are database-driven or that generate dynamic web pages can’t be archived. I’d think this limitation could apply to at least some blogs (such as this WordPress blog, which is driven by PHP and a MySQL database).
But a quick review of the Wayback Machine suggests that, despite the disclaimer, the Internet Archive may be improving its ability to archive blogs: here are links to a WordPress-run site that was archived incorrectly in March 2004, but appears to be much better represented in an archive from November 2004. Hopefully, the Internet Archive is continuing to improve its capability to archive different kinds of webpages. Needless to say, as web publishing technologies evolve, it will remain a struggle to find ways to accurately and authoritatively preserve such materials. My quick review of a number of blawgs suggests that some appear to have been pretty nicely archived, whereas others have not. I’ll address this more in a future post.
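For anyone who wants to run a similar spot check on their own blog, here is a rough sketch. It assumes the Internet Archive’s public availability endpoint at archive.org/wayback/available, which takes a url parameter and an optional timestamp and answers in JSON. The function names are mine, and the sketch only reports whether a snapshot exists near a given date. It can’t tell you whether the page was archived correctly, which still takes a human look.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's snapshot availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(site_url, timestamp=None):
    """Build the lookup URL; timestamp is digits such as YYYYMMDD."""
    params = {"url": site_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (timestamp, snapshot_url) from a JSON reply, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def look_up(site_url, timestamp=None):
    """Ask the archive for the snapshot closest to the given date."""
    with urlopen(build_query(site_url, timestamp), timeout=30) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

Calling look_up with a blog’s address and a date such as "20041101" should return the nearest snapshot’s timestamp and address, or None if the archive reports nothing available.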
Thus, I think that timethief’s question — a really good one — leads to additional questions about whether web-only materials should be kept online, and if so, to even more questions about how, where, and by whom they should be maintained or preserved. I don’t think the answers to these questions are easy or obvious.
