Chinese censorship and the infoglut

Yesterday, New York Times columnist Nicholas Kristof wrote about his experiments in testing Chinese censorship of the internet. (See “In China It’s ******* vs. Netizens,” June 20, 2006, subscription required.) Kristof started two Chinese-language blogs and filled them with politically charged postings. He was surprised that the posts were quickly available online, with only an occasional (and, I would think, automated) substitution of asterisks for certain Chinese characters.

Commenting on the quick availability of his blogs, Kristof observes that it’s impossible for China’s 30,000 censors to keep up with 120 million Chinese netizens. This might be correct: the sheer quantity of internet information makes absolute control pretty much impossible. But Kristof further concludes that “the Web is beginning to assume the watchdog role filled by the news media in freer countries.” Ethan Leib, writing at PrawfsBlawg, is not as optimistic as Kristof, and I agree. The fact that Kristof’s postings went online mostly unscathed likely says more about the ineffectiveness of filtering programs than about governmental permissiveness. Getting things on the web and keeping them there are not the same thing.

To his credit, Kristof recognized that his postings might not last long, predicting that “[w]hen State Security reads this, it may finally order my blogs closed.” His prediction was proven correct, and quickly. Though the blogs were online last night, when I checked this afternoon they were gone. One, http://jisidao.blog.sohu.com/, now apparently says that the user does not exist. (Caveat: I don’t read Chinese and used Babelfish to translate.) The other, http://blog.sina.com.cn/u/1238333873, now redirects the user to the main page at http://blog.sina.com.cn/main/. Almost certainly it was humans — and not programs — that removed the sites. Automated and human censorship in China apparently work hand in hand.

Kristof’s observations do contain some seeds of optimism that Chinese censorship can be circumvented by technological and human countermeasures. He writes that young people use proxy software to reach forbidden sites and Skype to make phone calls. He also writes about Chinese blogger Li Xinde, “who travels around China with his laptop, reporting on corruption and human-rights abuses.” Li’s sites are closed down constantly, but “the moment a site is censored he replaces it with a new one.” Li uses an overseas site, http://www.lixinde.com, to inform readers of the best current internet address.

Nonetheless, I have to wonder how many Chinese citizens engage in these activities or risk imprisonment to blog about politically charged subjects. Even though automated and human censorship might be circumvented by technological and human countermeasures, the will to take such risks must exist as well. As Ethan Leib notes, “it is hard to blog from a Chinese prison.” How does one counteract fear?

Facebook: job-hunting, non-invisibility, and the creepiness factor

Note to job applicants: your potential employers aren’t just looking at Google and Yahoo.

Sunday’s New York Times includes a really interesting article by Alan Finder on how some companies now investigate job applicants on social networking sites such as Facebook, MySpace, Xanga, and Friendster. See “For Some, Online Persona Undermines a Résumé.”

The article underscores a simple but important fact: users of social network sites shouldn’t assume that their postings are private. Although names like “MySpace” paint an image of personal spaces, personal doesn’t mean private. It’s not difficult to get into these sites – as the article notes, for some sites such as MySpace, you generally only need to register. For Facebook, to view entries for a particular college, you only need an e-mail address from that college.

That means an awful lot of people can view Facebook entries: alumni with email addresses (which could include potential employers), professors, even campus police. Despite this, at an emotional level, many people assume that their personal websites, blogs, and social network postings are relatively personal spaces that won’t be noticed or invaded by others. These assumptions are wrong in at least two ways.


Inheritability of blogs: You take Aunt Esther’s silverware, I’ll take her blog…

Over at the user forums on WordPress.com, there’s an interesting thread on “web logs and wills.” Forum user timethief writes:

What happens to . . . web logs if a person dies and their executor notifies [the weblog’s host] of their demise. Can one leave their account, username, password and API key number to another person in their will?

What a great question! It reminds me of the case last year of Lance Corporal Justin Ellsworth, who died in Iraq. After his death, his family asked Yahoo for access to his emails. Yahoo refused, and complied only after a court ordered it to hand over the contents of the account. But the parallel to Ellsworth has its limits. With emails, there are significant concerns over privacy: it just cannot be assumed that every deceased person wants executors and heirs poring through his or her private and potentially embarrassing emails.

In contrast, blogs are intended for some level of public consumption and the privacy issues generally don’t run as high. (Though even with blogs, privacy concerns can exist, such as with David Lat, the formerly anonymous “Article III Groupie” who writes Underneath Their Robes.) Indeed, although many blogs are quickly abandoned, others are intended to serve as lasting statements of authorship, whether professional or personal (or both). As timethief noted in a later post, “Blogging is now and will remain part of what defined me as a unique individual.” But blogs aren’t books or magazines. After we’re gone, existing copies of books we wrote can continue to exist without additional effort on the part of our estates or heirs. And our estates and heirs can’t force consumers to return legally acquired copies of books.

But the book analogy is hard to apply to blogs. Blogs aren’t material objects, and they’ll disappear without maintenance or preservation. Yet long-term maintenance isn’t really practical, at least not yet, for blogs whose owners have passed away. If hosting accounts aren’t kept active, or applicable payments stop, or hosting providers go out of business, or computers fail, or blogging code and databases become incompatible with future technologies, our blogs (like other web-only publications) may disappear or break. Plus, a blog might be shut down by an author’s estate or heirs, unless perhaps the author can make some sort of enforceable provision that the blog be maintained posthumously.

Communal blogs like The Volokh Conspiracy stand a better chance of lengthy lives, since maintenance tasks can be undertaken as new members arrive. But most other sites, even highly successful ones like Howard Bashman’s How Appealing, are run by only one person. For an estate or heir, long-term maintenance after an author’s demise is not necessarily simple or — excuse the pun — appealing. In a rare case, successful blogs like Bashman’s could be valuable estate assets that would encourage continued maintenance and even eventual profitable transfer, but most blogs will utterly lack any such kind of maintenance incentive. (Of course, this is all illustrative, and Eugene and Howard should be blogging for many decades to come!)

This raises the question of digital preservation. Because long-term maintenance may not always be feasible, digital preservation of old sites becomes really important, and the utility of the Internet Archive’s Wayback Machine can’t be overstated. But I think that the Wayback Machine is just the beginning of a dialogue over how, and when, to preserve web-only materials. Putting copyright issues to the side for the moment, the Internet Archive doesn’t archive all sites, and when it does, it archives some sites more often than others. Plus, it’s not entirely clear whether the Wayback Machine is currently capable of properly archiving all types of blogs: the Internet Archive states that sites that are database-driven or that generate dynamic web pages can’t be archived. I’d think this limitation could apply to at least some blogs (such as this WordPress blog, which is driven by PHP and a MySQL database).

But a quick review of the Wayback Machine suggests that, despite the disclaimer, the Internet Archive may be improving its ability to archive blogs: here are links to a WordPress-run site that was archived incorrectly in March 2004, but appears to be much better represented in an archive from November 2004. Hopefully, the Internet Archive is continuing to improve its capability to archive different kinds of webpages. Needless to say, as web publishing technologies evolve, it will remain a struggle to find ways to accurately and authoritatively preserve such materials. My quick review of a number of blawgs suggests that some appear to have been pretty nicely archived, whereas others have not. I’ll address this more in a future post.

Thus, I think that timethief’s question — a really good one — leads to additional questions about whether web-only materials should be kept online, and if so, to even more questions about how, where, and by whom they should be maintained or preserved. I don’t think the answers to these questions are easy or obvious.

Shakespeare & serendipity

Why use a chunk from Shakespeare’s first sonnet as my first posting?

Quick answer #1: Because he wrote so much more beautifully than I ever will.

Quick answer #2: Because I wanted a placeholder.

Not-so-quick answer #3: When working on the blog’s design, I wanted something — anything — to serve as a placeholder. Shakespeare seemed like a good idea: because I’m interested in the technical, policy, and legal problems of preserving information, Shakespeare’s works seemed a textbook example of what should be preserved.

So I found a Shakespeare website and gleefully exercised my right to copy, clip, and paste from the public domain. Sidebar: it would have been even more interesting if I had clipped from a DRM’d CD-ROM of Shakespeare’s works, but that’s another post and another day . . . .

And an admission: Although I was an English & philosophy major in my undergraduate days, it’s been a very, very long time since I thought about Shakespeare. (Notwithstanding Shakespeare in Love, which was great.) Having absolutely no idea what might be relevant or useful, I simply looked at the first thing I found, Shakespeare’s first sonnet.

But serendipity is a funny thing. Considering that I’m currently writing about digital preservation, and further considering that so much of what we electronically preserve is forgettable noise and infoglut — or digital garbage! — I thought Shakespeare’s language was a keeper. Which, of course, it is.