NARA hosting “lite” Bush website archive

There are plenty of good changes in the new whitehouse.gov site, such as a better copyright policy that more clearly permits copying and remixing, and a much shorter robots.txt file, which makes it easier for search engines and archivists to index and archive the site.  (Compare the current 4-line Obama robots file to a 2300+ line version, apparently from late in the Bush era.)

But what about Bush’s old website?  Shouldn’t that be preserved?  (Well, yeah!)  But when President Obama took the oath of office, things switched over and the Bush site was gone from public view.  Did anybody keep a copy?  Well, yes, kind of.  The Internet Archive archives the whitehouse.gov site, but I have deep concerns about the completeness of its archive.  See below for a screen cap of the Internet Archive’s database of http://www.whitehouse.gov.

Internet Archive captures of whitehouse.gov

I think it can be taken as an axiom that in a free society, it’s vital that governmental sites are archived frequently, deeply, and accurately, and made available for scrutiny quickly.  But the depth of the Internet Archive’s archive of whitehouse.gov is unclear.  First, to the extent that the Bush administration’s robots.txt file told search engines and archives to stay away, did the Internet Archive fail to archive governmental content?  (Maybe not, but how can we be sure?)  Second, the Internet Archive is not up-to-date: as of this writing, the most recent public archive of whitehouse.gov is dated Mar. 25, 2008.  Finally and even more disturbingly, the Internet Archive’s capture frequency is poor.  It contains only 53 captures of the main whitehouse.gov page for 2007, and only 15 have yet been posted from calendar year 2008.  We can do better.
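To put the capture counts in perspective, here is a rough Python sketch of the arithmetic.  The function names and the sample dates are my own placeholders, not real Internet Archive data; the only figure taken from above is the 53 captures for 2007.

```python
from collections import Counter
from datetime import date


def captures_per_year(capture_dates):
    """Count how many captures fall in each calendar year."""
    return Counter(d.year for d in capture_dates)


def mean_gap_days(capture_count, days_in_year=365):
    """Average number of days between captures over one year."""
    return days_in_year / capture_count


# Hypothetical capture dates, for illustration only.
sample_dates = [date(2007, 1, 5), date(2007, 3, 1), date(2008, 2, 2)]
print(captures_per_year(sample_dates))

# 53 captures in 2007 works out to roughly one capture per week.
print(round(mean_gap_days(53), 1))  # 6.9
```

In other words, at 2007's pace a page could change and change back within a week and the archive might never record it.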

Interestingly, it appears that government archivists are now dipping their toes in the water.  At least part of the Bush website, as it stood in 2009, is now being hosted by the National Archives and Records Administration (“NARA”), which administers the George W. Bush Presidential Library.  According to the site:

To preserve the historical record of the George W. Bush administration’s presence on the web, the White House took a “snapshot” of the Whitehouse.gov web site. This is historical material, “frozen in time.” The web site is no longer updated and links to external web sites and some internal pages will not work.

Having NARA archivists maintain an archive is a good start.  (Though there should always be archives maintained by disinterested third parties as well.)  But it’s not enough to have a “snapshot” of a presidential website.  Not only does the archive lack temporal depth (it’s only from materials existing in January 2009), but it appears to be incomplete as well, as even some internal links are admitted not to function.  Plus, as the site indicates, the “White House” took the snapshot.  I take this to mean that it was taken by interested White House insiders rather than by (hopefully) disinterested professional archivists at NARA.

H/T on Bush Archive to BushLegacy via Twitter.

A presidential “legacy” via rewritten history

Web archiving is a topic of great interest to me and the subject of an article I’m writing.  Part of the paper addresses the Bush administration’s questionable conduct regarding the content of the White House website.  For example, the White House website’s robots exclusion file (a mechanism that can be used to ask search engine and web archive spiders to stay away) is nearly 2300 lines long.  2300 lines?  Simply absurd.  (Click here for a copy of the White House robots file that I downloaded on Nov. 25, 2008.)
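For readers who haven’t seen how a robots exclusion file works in practice, here is a minimal Python sketch using the standard library’s urllib.robotparser.  The rules, the example.gov URLs, and the helper name is_allowed are made up for illustration; this is not the actual White House file, and "ia_archiver" is used only as an example crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, NOT the real White House file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/text/
"""


def is_allowed(robots_text, agent, url):
    """Return True if the rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


# A well-behaved crawler would skip the second URL.
print(is_allowed(SAMPLE_ROBOTS, "ia_archiver", "http://www.example.gov/news/"))
print(is_allowed(SAMPLE_ROBOTS, "ia_archiver", "http://www.example.gov/private/memo.html"))
```

Keep in mind that the file is only a request: a compliant crawler honors each Disallow line, so every added line is another corner of the site that polite archives will not preserve.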

Today, researchers at the University of Illinois released a study showing how the White House has deleted or modified portions of its website.  Their findings are, sadly, unsurprising:

Legacies are in the air as President Bush prepares to leave the White House. How future historians will judge the president remains to be seen, but one thing is certain: future historians won’t have all the facts needed to make that judgment. One legacy at risk of being forgotten is the way the Bush White House has quietly deleted or modified key documents in the public record that are maintained under its direct control.

Remember the “Coalition of the Willing” that sided with the United States during the 2003 invasion of Iraq? If you search the White House web site today you’ll find a press release dated March 27, 2003 listing 49 countries forming the coalition. A key piece of evidence in the historical record, but also a troubling one. It is an impostor.

And although there were only 45 coalition members on the eve of the Iraq invasion, later deletions and revisions to key documents make it seem that there were always 49.

The study is a disturbing read.  Rightly or not, a primary source of history for many researchers is the web.  And any effort by the government to modify or delete historical records is appalling.  As the authors note:

Updating lists to keep up with the times is one thing. Deleting original documents from the White House archives is another. Back-dating later documents and using them to replace the originals goes beyond irresponsible stewardship of the public record. It is rewriting history.

H/T: New York Times.

GAO report and pending bill on federal e-mail retention

The GAO has released a report entitled Federal Records: National Archives and Selected Agencies Need to Strengthen E-Mail Management. The report found that “[w]ithout periodic evaluations of recordkeeping practices or other controls to ensure that staff are trained and carry out their responsibilities, agencies have little assurance that e-mail records are properly identified, stored, and preserved.”  It also stated:

Although NARA [i.e., the National Archives and Records Administration] has responsibilities for oversight of agencies’ records and records management programs and practices, including conducting inspections or surveys, performing studies, and reporting results to the Congress and the Office of Management and Budget (OMB), in recent years NARA’s oversight activities have been primarily limited to performing studies. NARA has conducted no inspections of agency records management programs since 2000, because it uses inspections only to address cases of the highest risk, and no recent cases have met its criteria. In addition, NARA has not consistently reported details on records management problems or recommended practices that were discovered as a result of its studies. Without more comprehensive evaluations of agency records management, NARA has limited assurance that agencies are appropriately managing the records in their custody and that important records are not lost.

Meanwhile, the White House is threatening a veto of the Electronic Message Preservation Act.  According to the National Coalition for History, the bill “would direct the National Archives and Records Administration (NARA) to establish standards for the capture, management, preservation and retrieval of federal agency and presidential electronic messages that are records in an electronic format.”  (Further info on the bill here, and on issues concerning the White House’s e-mail retention practices here.)

Hat tip regarding the GAO report to David Mattison at the Ten-Thousand Year Blog.  Hat tip regarding the bill to BNA’s Privacy Law Watch.