Is Zoetrope the next-gen Internet Archive?

Although the Internet Archive’s Wayback Machine is a great research tool, its utility is hampered by a lack of basic search mechanisms. One can search by URL and archived links, but basic Google-style Boolean searching isn’t available. The Archive once offered a beta Boolean search tool, but it never worked, and it was later withdrawn.

However, a new application may significantly expand our ability to data-mine archived web data. Reports give a sneak peek at Zoetrope, an application being developed by researchers at Adobe and the University of Washington. As the researchers put it:

The Web is ephemeral. Pages change frequently, and it is nearly impossible to find data or follow a link after the underlying page evolves. We present Zoetrope, a system that enables interaction with the historical Web (pages, links, and embedded data) that would otherwise be lost to time. Using a number of novel interactions, the temporal Web can be manipulated, queried, and analyzed from the context of familar [sic] pages. Zoetrope is based on a set of operators for manipulating content streams. We describe these primitives and the associated indexing strategies for handling temporal Web data. They form the basis of Zoetrope and enable our construction of new temporal interactions and visualizations.

The demo video shows how historical web data can be manipulated and compared in a variety of what the authors call “novel” ways. Even more significantly, researcher Eytan Adar “hopes to eventually incorporate information from the Internet Archive’s nearly 14 years of records.” Such a combination would massively increase the utility of web archives, but it would also, as I discuss in a paper I’m writing, exacerbate concerns over informational autonomy.

The research paper can be found here.

Yet another report on digital preservation

It must be Digital Preservation Week.

Just a few days ago, I wrote about the Library of Congress’ new report on digital preservation (which itself followed the report of the Section 108 Study Group issued last March).  Now, the Commission of the European Communities has released a green paper entitled Copyright in the Knowledge Economy, which discusses, among other things, digital preservation, the making available of digitized works, and orphan works.

Hat tip: LibraryJournal.com

New report on copyright and digital preservation

A joint report on the problems of copyright and digital preservation, the International Study on the Impact of Copyright Law on Digital Preservation, was released this month by the Library of Congress’s National Digital Information Infrastructure and Preservation Program (“NDIIPP”), the Joint Information Systems Committee, the Open Access to Knowledge (OAK) Law Project, and the SURFfoundation.

The report studies problems of digital preservation by looking at the copyright laws of four countries, including the United States. It finds:

Digital preservation is vital to ensure that works created and distributed in digital form will continue to be available over time to researchers, scholars and other users. Digital works are ephemeral, and unless preservation efforts are begun soon after such works are created, they will be lost to future generations. Although copyright and related laws are not the only obstacle to digital preservation activities, there is no question that those laws present significant challenges.

See also the Section 108 Study Group Report, issued earlier this year, which discusses copyright law and digital preservation.

Google and Viacom reach partial YouTube data agreement

The NY Times reports that Google and Viacom have reached a partial agreement regarding production of YouTube user data:

Google said it had now agreed to provide lawyers for Viacom and a class-action group led by the Football Association of England, a large viewership database that blanks out YouTube username and Internet address data that could be used to identify individual video watchers.

The parties are still working towards a separate agreement concerning YouTube employee data, an issue I wrote about yesterday.