<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>nathenson&#039;s digital garbage &#187; Data Mining</title>
	<atom:link href="http://digitalgarbage.net/tag/data-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://digitalgarbage.net</link>
	<description>dumpster-diving for bits about law, info, tech, and culture</description>
	<lastBuildDate>Wed, 16 Nov 2011 05:00:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Is Zoetrope the next-gen Internet Archive?</title>
		<link>http://digitalgarbage.net/2008/11/22/is-zoetrope-the-next-gen-internet-archive/</link>
		<comments>http://digitalgarbage.net/2008/11/22/is-zoetrope-the-next-gen-internet-archive/#comments</comments>
		<pubDate>Sat, 22 Nov 2008 15:06:43 +0000</pubDate>
		<dc:creator>Ira Nathenson</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Digital Preservation]]></category>
		<category><![CDATA[Internet Archive]]></category>
		<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Searching]]></category>
		<category><![CDATA[Wayback Machine]]></category>

		<guid isPermaLink="false">http://digitalgarbage.net/?p=456</guid>
		<description><![CDATA[Although the Internet Archive&#8217;s Wayback Machine is a great research tool, its utility is hampered by a lack of basic search mechanisms.  One can search by URL and archived links, but basic Google-style boolean searching isn&#8217;t available.  The Archive once &#8230; <a href="http://digitalgarbage.net/2008/11/22/is-zoetrope-the-next-gen-internet-archive/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Although the Internet Archive&#8217;s <a href="http://web.archive.org">Wayback Machine</a> is a great research tool, its utility is hampered by a lack of basic search mechanisms.  One can search by URL and archived links, but basic Google-style boolean searching isn&#8217;t available.  The Archive once offered a beta boolean search tool, but it never worked and it was later withdrawn.</p>
<p>However, a new application may significantly expand our ability to data-mine archived web data. Reports give a sneak peek at Zoetrope, an application being developed by researchers at Adobe and the <a href="http://uwnews.washington.edu/ni/article.asp?articleID=45255">University of Washington</a>.  As the researchers put it:</p>
<blockquote><p>The Web is ephemeral. Pages change frequently, and it is nearly impossible to find data or follow a link after the underlying page evolves. We present Zoetrope, a system that enables interaction with the historical Web (pages, links, and embedded data) that would otherwise be lost to time. Using a number of novel interactions, the temporal Web can be manipulated, queried, and analyzed from the context of familar [sic] pages. Zoetrope is based on a set of operators for manipulating <em>content streams</em>. We describe these primitives and the associated indexing strategies for handling temporal Web data. They form the basis of Zoetrope and enable our construction of new temporal interactions and visualizations.</p></blockquote>
<p>The demo video shows how historical web data could be manipulated and compared, as the authors note, in a variety of &#8220;novel&#8221; ways.  Even more significantly, researcher <a href="http://uwnews.washington.edu/ni/article.asp?articleID=45255">Eytan Adar</a> &#8220;hopes to eventually incorporate information from the Internet Archive&#8217;s nearly 14 years of records.&#8221; Such a combination would massively increase the utility of web archives, but would also &#8212; as discussed in a paper I&#8217;m writing &#8212; exacerbate concerns over informational autonomy.</p>
<p><a href="http://digitalgarbage.net/2008/11/22/is-zoetrope-the-next-gen-internet-archive/"><em>Click here to view the embedded video.</em></a></p>
<p>The research paper can be found <a href="http://www.cond.org/z.pdf">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://digitalgarbage.net/2008/11/22/is-zoetrope-the-next-gen-internet-archive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Odysseus and tax day</title>
		<link>http://digitalgarbage.net/2008/06/25/odysseus-and-tax-day/</link>
		<comments>http://digitalgarbage.net/2008/06/25/odysseus-and-tax-day/#comments</comments>
		<pubDate>Wed, 25 Jun 2008 19:48:02 +0000</pubDate>
		<dc:creator>Ira Nathenson</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Literature]]></category>
		<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://digitalgarbage.net/?p=102</guid>
		<description><![CDATA[Nature.com reports that two researchers have combined astronomical data with events in Homer&#8217;s Odyssey to pinpoint the exact date on which a returning Odysseus executed his wife&#8217;s suitors. Marcelo Magnasco and Constantino Baikouzis identified four astronomical events in the epic &#8230; <a href="http://digitalgarbage.net/2008/06/25/odysseus-and-tax-day/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Nature.com <a href="http://blogs.nature.com/news/thegreatbeyond/2008/06/look_to_the_ancient_skies.html">reports</a> that two researchers have combined astronomical data with events in Homer&#8217;s <em>Odyssey</em> to pinpoint the exact date on which a returning Odysseus executed his wife&#8217;s suitors.</p>
<blockquote><p>Marcelo Magnasco and Constantino Baikouzis identified four astronomical events in the epic poem and calculated dates within 100 years of the fall of Troy that would fit in with the events described around Odysseus’s return home and the ensuing slaughter of men propositioning his wife.</p></blockquote>
<p>According to the researchers, the date was April 16, 1178 BCE.  That&#8217;s also the day after Tax Day, though I&#8217;m pretty sure the IRS didn&#8217;t exist back then.</p>
<p>(Abstract and paper <a href="http://www.pnas.org/cgi/content/abstract/0803317105v2">here</a>; press release <a href="http://newswire.rockefeller.edu/?page=engine&amp;id=777">here</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://digitalgarbage.net/2008/06/25/odysseus-and-tax-day/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What about mail surveillance?</title>
		<link>http://digitalgarbage.net/2008/06/06/what-about-mail-surveillance/</link>
		<comments>http://digitalgarbage.net/2008/06/06/what-about-mail-surveillance/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 18:19:12 +0000</pubDate>
		<dc:creator>Ira Nathenson</dc:creator>
				<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Surveillance]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[NSA]]></category>

		<guid isPermaLink="false">http://digitalgarbage.net/?p=42</guid>
		<description><![CDATA[Yesterday&#8217;s posting on unconsented cell phone surveillance reminded me of an excellent column that Peter Shane wrote a while back in Jurist where he pointed out that any technical legality of the NSA surveillance program is beside the point.  Shane &#8230; <a href="http://digitalgarbage.net/2008/06/06/what-about-mail-surveillance/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Yesterday&#8217;s <a href="http://digitalgarbage.net/2008/06/05/87/">posting</a> on unconsented cell phone surveillance reminded me of an excellent column that Peter Shane wrote a while back in <a href="http://jurist.law.pitt.edu/forumy/2006/06/executive-power-and-breathing-space.php">Jurist</a> where he pointed out that any technical legality of the NSA surveillance program is beside the point.  Shane asked: what if the Post Office created a database of the addresses on every piece of mail it handles?  Even if, hypothetically, such a program were legal:</p>
<blockquote><p>An America in which ordinary citizens have their mail “surveilled” would be a different America from the country in which virtually all of us think we live.  Our freedom would be lost not because a law was broken, but because of the breakdown in respect for the norms of liberty and government self-restraint.</p></blockquote>
<p>I think much the same could be said of the ends-justify-the-means thinking of the Northeastern University researchers who got a European cell phone provider to give them individualized location information on 100,000 unknowing customers. Just because you <em>can</em> do something doesn&#8217;t mean that you <em>should</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://digitalgarbage.net/2008/06/06/what-about-mail-surveillance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The electronic leash: whatever happened to trusting your kids?</title>
		<link>http://digitalgarbage.net/2006/06/12/kid_tracking/</link>
		<comments>http://digitalgarbage.net/2006/06/12/kid_tracking/#comments</comments>
		<pubDate>Tue, 13 Jun 2006 01:24:47 +0000</pubDate>
		<dc:creator>Ira Nathenson</dc:creator>
				<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Surveillance]]></category>
		<category><![CDATA[Cell phones]]></category>
		<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://digitalgarbage.net/2006/06/12/kid_tracking/</guid>
		<description><![CDATA[Verizon Wireless now offers a service that allows parents to track their kids&#8217; movements through cellphones. According to News.com: Parents can use the service to set up geographic limits and receive text alerts if their children, who also carry phones, &#8230; <a href="http://digitalgarbage.net/2006/06/12/kid_tracking/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Verizon Wireless now offers a service that allows parents to track their kids&#8217; movements through cellphones.  According to <a href="http://news.com.com/Verizontolaunchmobilechaperoneservice/2100-1037_3-6082472.html?tag=nefd.top">News.com</a>:</p>
<blockquote><p>Parents can use the service to set up geographic limits and receive text alerts if their children, who also carry phones, go too far from home.  The service also lets parents check where their offspring are via a map on their cell phone or computer.</p></blockquote>
<p>The service &#8212; &#8220;Chaperone&#8221; for location tracking, and &#8220;Child Zone&#8221; for a boundary-setting add-on &#8212; is available for now only on a four-button phone designed for young kids, such as 5- to 9-year-olds.  (Who buys a phone for a 5-year-old?)  But News.com indicates that Verizon Wireless might develop a version of the program for older kids, with more sophisticated phones.</p>
<p>What a great way to train kids for a lifetime of submitting to technological surveillance from authorities.  If you really want to be creeped out, go <a href="http://www.verizonwireless.com/b2c/splash/chaperone/splash.jsp">here</a> on the Verizon Wireless site and watch the animated cartoon family whose kids cheerily acquiesce to their parents spying on them.</p>
<p>Whatever happened to trusting kids and letting them make decisions (and letting them learn to live with the consequences)?</p>
<p>And to be clear, I <em>am</em> a parent.</p>
<p>Thanks to <a href="http://yro.slashdot.org/yro/06/06/12/1716212.shtml">Slashdot</a>, where I first read about this.</p>
]]></content:encoded>
			<wfw:commentRss>http://digitalgarbage.net/2006/06/12/kid_tracking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Facebook: job-hunting, non-invisibility, and the creepiness factor</title>
		<link>http://digitalgarbage.net/2006/06/12/facebook/</link>
		<comments>http://digitalgarbage.net/2006/06/12/facebook/#comments</comments>
		<pubDate>Mon, 12 Jun 2006 18:24:55 +0000</pubDate>
		<dc:creator>Ira Nathenson</dc:creator>
				<category><![CDATA[Reputation]]></category>
		<category><![CDATA[Social Networking]]></category>
		<category><![CDATA[Blogging]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Infoglut]]></category>
		<category><![CDATA[Information]]></category>
		<category><![CDATA[Law School]]></category>
		<category><![CDATA[NSA]]></category>
		<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Searching]]></category>
		<category><![CDATA[Surveillance]]></category>

		<guid isPermaLink="false">http://digitalgarbage.net/2006/06/12/facebook-job-hunting-non-invisibility-and-the-creepiness-factor/</guid>
		<description><![CDATA[Note to job applicants: your potential employers aren&#8217;t just looking at Google and Yahoo. Sunday&#8217;s New York Times includes a really interesting article by Alan Finder on how some companies now investigate job applicants on social networking sites such as &#8230; <a href="http://digitalgarbage.net/2006/06/12/facebook/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><span>Note to job applicants: your potential employers aren&#8217;t just looking at Google and Yahoo. </span><span> </span></p>
<p><span>Sunday&#8217;s New York Times includes a really interesting article by Alan Finder </span><span>on how some companies now investigate job applicants on social networking sites such as Facebook, MySpace, Xanga, and Friendster.  See <span>&#8220;<a href="http://www.nytimes.com/2006/06/11/us/11recruit.html"><span style="color: #800080;">For Some, Online Persona Undermines a Résumé</span></a>.&#8221; </span></span></p>
<p><span>The article underscores a simple but important fact: users of social network sites shouldn&#8217;t assume that their postings are private.  Although names like &#8220;MySpace&#8221; paint an image of personal spaces, personal doesn&#8217;t mean private.  It&#8217;s not difficult to get into these sites – as the article notes, for some sites such as MySpace, you generally only need to register.  For Facebook, to view entries for a particular college, you only need an e-mail address from that college.</span></p>
<p><span>That means an awful lot of people can view Facebook entries: alumni with email addresses (which could include potential employers), professors, even campus police.  Despite this, at an emotional level, many people assume that their personal websites, blogs, and social network postings are relatively personal spaces that won&#8217;t be noticed or invaded by others.  These assumptions are wrong in at least two ways.</span></p>
<p><span id="more-32"></span>First, people might assume – incorrectly – that they&#8217;re not going to be noticed.  True, most postings to personal websites, blogs, and social networking sites are probably viewed by hardly anyone, and at best by only a few of the poster&#8217;s friends.  Because of this, people get a false sense of security, believing they&#8217;re broadcasting only to their personal crowd.  That&#8217;s probably true for the most part, unless somebody&#8217;s looking you up.  As <a href="http://scrawford.blogware.com/blog/_archives/2006/4/6/1866674.html">Susan Crawford</a> said in an excellent posting on social networking, &#8220;Oddly, people using these spaces may feel that they’re just having a conversation with their friends, not thinking about large-scale, perhaps automated searches/hunts about them carried out.  This is like being on a live TV interview, and seeing only the guy across from you, and not realizing that anyone can see you in the world.&#8221;</p>
<p>Susan&#8217;s right.  Many posters assume that internet infoglut makes them invisible; after all, how will they stand out from the millions of other postings?  But infoglut doesn&#8217;t create invisibility.  At best, posters are <em>relatively invisible</em>.  But when you combine social networking sites with indexing and searching capacities, relative invisibility can be fleeting.</p>
<p>Second, posters seem to expect – dangerously – that outsiders shouldn&#8217;t and therefore won&#8217;t intrude into their spaces.  In the blogging context, <a href="http://madisonian.net/archives/2006/04/14/end-of-the-semester-thoughts/#more-617">Mike Madison</a> recounts an instance where he forwarded to a Pitt Law colleague a link to a student&#8217;s blog posting about that colleague and another faculty member.  One of them then casually mentioned to the student blogger that he or she had read the post.  As Mike says, &#8220;The student was a bit surprised, I think; students generally expect that their blogging is their &#8216;space,&#8217; and faculty (and others) shouldn’t intrude.&#8221;</p>
<p>But outsiders do intrude, and they might include law enforcement authorities.  <a href="http://www.freedom-to-tinker.com/?p=994">Ed Felten</a> has described the use of social network sites by Princeton&#8217;s Public Safety officers (i.e., the Princeton campus police) in investigations into alcohol use and campus building-climbing.  Particularly interesting is the <a href="http://www.dailyprincetonian.com/archives/2006/03/17/opinion/14912.shtml">controversy that ensued</a> after it was revealed that Facebook was used in the investigations.  In the end, Ed reports that &#8220;Public Safety stated that it would not hunt around randomly on Facebook, but it would continue to use Facebook as a tool in specific investigations.  Many people consider this a reasonable compromise.&#8221;  Ed further noted, &#8220;It feels right to me, though I can’t quite articulate why.&#8221;</p>
<p>Mike&#8217;s and Ed&#8217;s postings both touch upon a sense among some, perhaps many, students that outsiders – professors, campus authorities, etc. – are not particularly welcome at student sites.  That&#8217;s somewhat understandable: think of the family reunion where an older, uncool uncle hangs around a bit too long with the younger folks.  I’d call this the creepiness factor.  The creepiness factor is amplified when it&#8217;s law enforcement authorities who come visiting.  But expectations that outsiders will stay away are dangerous.  Considering the relative anonymity of web surfing, it’s doubtful that social norms will emerge to deter others from browsing student sites.  If anything, the tremendous attention being given to social networking guarantees that more people will check these sites out.</p>
<p>Nonetheless, Ed&#8217;s posting suggests at least one way in which <em>institutions</em> might be pressured into adopting norms that limit their review of social networking sites.  As Ed notes, after student outrage, the Princeton Public Safety director promised to use Facebook only in specific investigations.  <a href="http://www.dailyprincetonian.com/archives/2006/03/15/news/14871.shtml">The Daily Princetonian</a> reports that under new guidelines, &#8220;Officers can continue to use Facebook as a supplementary source for investigations, but cannot scour the site for parties or other activities.  In addition, officers are prohibited from identifying themselves as students in their Facebook accounts.&#8221;  In discussing the compromise, Ed notes the difficulty in trying to articulate why it&#8217;s reasonable for campus police to use Facebook as part of a specific investigation but not as a tool for random hunting.</p>
<p>Ed&#8217;s right that it&#8217;s difficult to articulate what&#8217;s reasonable and what isn&#8217;t.  Maybe the distinction goes back, at least in part, to the creepiness factor noted above.  Even if social network sites are public or semi-public, it&#8217;s creepy to think that law-enforcement authorities are trolling student sites on a general fishing expedition for inappropriate behavior.  (And the creepiness is magnified a thousandfold-plus when the materials being perused are private.  NSA, anyone?)</p>
<p>But it&#8217;s hard to conclude that it&#8217;s equally creepy for authorities to look up public materials as part of a specific investigation.  (This does, however, raise the question of just what counts as a &#8220;specific&#8221; investigation&#8230;)  And the same can probably be said, I think, about employment recruiters who use social networking sites to research specific applicants.</p>
<p>Thanks very much to <a href="http://www.robhyndman.com/2006/06/11/red-flags-from-an-online-persona">Robhyndman.com</a>, where I discovered the link to the Times article.</p>
]]></content:encoded>
			<wfw:commentRss>http://digitalgarbage.net/2006/06/12/facebook/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

