A presidential “legacy” via rewritten history

Web archiving is a topic of great interest to me and the subject of an article I’m writing.  Part of the paper addresses the Bush administration’s questionable conduct regarding the content of the White House website.  For example, the White House website’s robots exclusion file, a mechanism that can be used to ask search engine and web archive spiders to stay away, is nearly 2300 lines long.  2300 lines?  Simply absurd.  (Click here for a copy of the White House robots file that I downloaded on Nov. 25, 2008.)
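For readers who have never looked inside a robots exclusion file, here is a minimal sketch of how a well-behaved spider reads one, using Python's standard `urllib.robotparser`. The rules, the `ExampleArchiveBot` user agent, and the `example.gov` URLs are made up for illustration; they are not taken from the actual White House file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT copied from the White House file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/text/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# A compliant crawler asks before fetching each URL.
blocked = parser.can_fetch("ExampleArchiveBot", "http://www.example.gov/private/memo.html")
allowed = parser.can_fetch("ExampleArchiveBot", "http://www.example.gov/news/index.html")
print(blocked, allowed)
```

Each `Disallow` line is only a request: the file works because spiders choose to honor it, which is why a 2300-line version can keep so much out of search engines and archives.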

Today, researchers at the University of Illinois released a study showing how the White House has deleted or modified portions of its website.  Their findings are, sadly, unsurprising:

Legacies are in the air as President Bush prepares to leave the White House. How future historians will judge the president remains to be seen, but one thing is certain: future historians won’t have all the facts needed to make that judgment. One legacy at risk of being forgotten is the way the Bush White House has quietly deleted or modified key documents in the public record that are maintained under its direct control.

Remember the “Coalition of the Willing” that sided with the United States during the 2003 invasion of Iraq? If you search the White House web site today you’ll find a press release dated March 27, 2003 listing 49 countries forming the coalition. A key piece of evidence in the historical record, but also a troubling one. It is an impostor.

And although there were only 45 coalition members on the eve of the Iraq invasion, later deletions and revisions to key documents make it seem that there were always 49.

The study is a disturbing read.  Rightly or not, a primary source of history for many researchers is the web.  And any effort by the government to modify or delete historical records is appalling.  As the authors note:

Updating lists to keep up with the times is one thing. Deleting original documents from the White House archives is another. Back-dating later documents and using them to replace the originals goes beyond irresponsible stewardship of the public record. It is rewriting history.
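The kind of comparison behind findings like these, checking an earlier saved copy of a page against a later one, can be sketched with Python's standard `difflib`. This is only an illustration and not the researchers' actual method; the function name and the placeholder entries ("Country A" and so on) are my own assumptions.

```python
import difflib


def added_and_removed(original, revised):
    """Return (added, removed) lines between two versions of a document."""
    added, removed = [], []
    for line in difflib.ndiff(original, revised):
        if line.startswith("+ "):
            added.append(line[2:])    # present only in the revised copy
        elif line.startswith("- "):
            removed.append(line[2:])  # present only in the original copy
    return added, removed


# Placeholder data: an earlier snapshot and a later, silently extended one.
earlier_snapshot = ["Country A", "Country B", "Country C"]
later_snapshot = ["Country A", "Country B", "Country C", "Country D"]

added, removed = added_and_removed(earlier_snapshot, later_snapshot)
print("added:", added)
print("removed:", removed)
```

A comparison like this only works if someone kept the earlier copy, which is exactly why independent web archives matter.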

H/T: New York Times.

Is Zoetrope the next-gen Internet Archive?

Although the Internet Archive’s Wayback Machine is a great research tool, its utility is hampered by a lack of basic search mechanisms.  One can search by URL and follow archived links, but Google-style boolean searching isn’t available.  The Archive once offered a beta boolean search tool, but it never worked and it was later withdrawn.
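To make "search by URL" concrete: an archived copy is addressed by the original URL plus a capture timestamp. The sketch below builds such an address. It assumes the `https://web.archive.org/web/<timestamp>/<url>` pattern and a `YYYYMMDDhhmmss` timestamp that may be shortened to a prefix; the function name and example URL are hypothetical.

```python
def snapshot_url(original_url: str, timestamp: str) -> str:
    """Build a Wayback Machine address for a URL near a given capture time.

    Assumes the web.archive.org/web/<timestamp>/<url> pattern; the timestamp
    is YYYYMMDDhhmmss or a shorter prefix of it (for example, just YYYYMMDD).
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits (YYYY[MMDDhhmmss])")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Hypothetical example URL and date.
print(snapshot_url("http://www.example.com/", "20081125"))
```

Note what is missing: nothing here lets you ask which archived pages contain a given phrase, which is the gap described above.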

However, a new application may significantly expand our ability to data-mine archived webdata. Reports give a sneak peek at Zoetrope, an application being developed by researchers at Adobe and the University of Washington.  As put by the researchers:

The Web is ephemeral. Pages change frequently, and it is nearly impossible to find data or follow a link after the underlying page evolves. We present Zoetrope, a system that enables interaction with the historical Web (pages, links, and embedded data) that would otherwise be lost to time. Using a number of novel interactions, the temporal Web can be manipulated, queried, and analyzed from the context of familar [sic] pages. Zoetrope is based on a set of operators for manipulating content streams. We describe these primitives and the associated indexing strategies for handling temporal Web data. They form the basis of Zoetrope and enable our construction of new temporal interactions and visualizations.

The demo video shows how historical webdata could be manipulated and compared, as the authors note, in a variety of “novel” ways.  Even more significantly, researcher Eytan Adar “hopes to eventually incorporate information from the Internet Archive’s nearly 14 years of records.” Such a combination would massively increase the utility of web archives, but would also — as discussed in a paper I’m writing — exacerbate concerns over informational autonomy.

The research paper can be found here.

BoingBoing “unpublishing” blog posts

When is it ok to delete a blog post?  Dan Solove wrote about this a few years back at Concurring Opinions, where he points to additional posts at Prawfsblawg (here, here, and here). More recently, BoingBoing faced public scrutiny when one of its authors removed posts related to blogger and sex columnist Violet Blue, although nobody noticed the removals for about a year.  A message board dedicated to the issue has generated over 1600 messages since July 1, some very heated.  The moderator for the board writes:

It’s our blog and so we made an editorial decision, like we do every single day. We didn’t attempt to silence Violet. We unpublished our own work. There’s a big difference between that and censorship.

We hope you’ll respect our choice to keep the reasons behind this private. We do understand the confusion this caused for some, especially since we fight hard for openness and transparency. We were trying to do the right thing quietly and respectfully, without embarrassing the parties involved.

Clearly, that didn’t work out. In attempting to defuse drama, we inadvertently ignited more. Mind you, we weren’t the ones splashing gasoline around; but we did make the fire possible. We’re sorry about that. In the meantime, Boing Boing’s past content is indexed on the Wayback Machine, a basic Internet resource; so the material should still be available for those who would like to read it.

Oddly, BoingBoing speaks in terms of “unpublishing” rather than deletion.   (Their policy page states “We reserve the right to unpublish or refuse to unpublish anything for any or no reason.”)  Sure, “unpublishing” sounds less big-brothery than deletion, but I don’t really see the difference.

Moreover, “unpublishing” isn’t quite accurate: BoingBoing doesn’t mean “unpublished” in the sense of a book (or blog posting) that has yet to be published.  They mean disabling public access to something that has already been posted, as in DMCA § 512(c), where material is removed or access to it is disabled.  (WordPress does have an “unpublishing” function, but that’s still a misnomer.)  A more accurate term might be deposting, depublishing, or good ol’ deletion.

Nevertheless, it’s useful to explore a potential distinction between deletion and depublishing, and other questions raised when a blogger wants to remove posted materials:

  • As a starting point, what is the meaning of “publication” in an age where materials can be changed or removed?
  • Under what circumstances is depublication justified?
  • What practices are needed to distinguish “depublication” from “deletion?”  Is a reservation of rights declaring a right of depublication sufficient?  Should a notice be posted where the materials used to be (as Dan Markel suggests)?
  • BoingBoing notes that the removed materials remain on the Wayback Machine web archive.  Do web archives help to justify depublication?
  • Does depublication serve an important social function by severing the association between author and depublished content?

Hat tip to Noam Cohen.  And a disclaimer: I did make some edits to this post after posting.

This posting will self-destruct in five seconds

As the Internet Archive shows, there is great value in preserving digital information for posterity. But sometimes, there is greater value in destroying information and doing so quickly. Information Week recounts the 2001 incident when an American spy plane was forced to land in Chinese territory after the plane collided with a Chinese fighter jet. The article notes that the U.S. crew was unable to erase the hard drives in time to protect the security of sensitive information. “Since then,” the article states, “researchers have been looking for a way to quickly erase computer hard drives to deny access to sensitive intelligence data.”

According to the article, researchers have developed an effective technique to erase hard drives in minutes rather than hours:

The researchers concluded that permanent magnets are the best solution. Other methods, including burning disks with heat-generating thermite, crushing drives in presses, chemically destroying the media or frying them with microwaves all proved susceptible to sensitive, patient, recovery efforts.

The military need for such technology is obvious. Also interesting are its potential commercial and consumer applications. According to the article, the researchers claimed the magnetic eraser could be used to quickly erase VHS tapes, floppy drives, data cassettes and hard drives. Maybe someday soon, it will be unacceptable and even illegal for corporations and government agencies to keep sensitive information, like your Social Security number, on easily stolen laptops unless those machines are equipped with an effective auto-erasure mechanism.