<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Signal/Noise &#187; Big Data</title>
	<atom:link href="http://billpetti.com/tag/big-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://billpetti.com</link>
	<description>Trying to separate the signal from the noise, one post at a time.</description>
	<lastBuildDate>Fri, 30 Sep 2011 21:49:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='billpetti.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/6cef924e9e2296437300917a41fb5f9c?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Signal/Noise &#187; Big Data</title>
		<link>http://billpetti.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://billpetti.com/osd.xml" title="Signal/Noise" />
	<atom:link rel='hub' href='http://billpetti.com/?pushpress=hub'/>
		<item>
		<title>The Danger of Data without Theory</title>
		<link>http://billpetti.com/2010/08/13/the-danger-of-data-without-theory/</link>
		<comments>http://billpetti.com/2010/08/13/the-danger-of-data-without-theory/#comments</comments>
		<pubDate>Fri, 13 Aug 2010 12:40:15 +0000</pubDate>
		<dc:creator>Bill Petti</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[research methodology]]></category>
		<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://billpetti.com/?p=2569</guid>
		<description><![CDATA[I came across this Chris Anderson piece from a 2008 issue of Wired via Ana Andjelic.  Anderson argues that in the era of Big Data we no longer need to rely on theory and the scientific method to achieve advances in knowledge: Google&#8217;s founding philosophy is that we don&#8217;t know why this page is better [...]]]></description>
			<content:encoded><![CDATA[<p>I came across <a href="http://www.wired.com/science/discoveries/magazine/16-07/pb_theory" target="_blank">this Chris Anderson piece</a> from a 2008 issue of Wired via <a href="http://anaandjelic.typepad.com/" target="_blank">Ana Andjelic</a>.  Anderson argues that in the era of Big Data we no longer need to rely on theory and the scientific method to achieve advances in knowledge:</p>
<blockquote><p>Google&#8217;s founding philosophy is that we don&#8217;t know why this page is better than that one: If the statistics of incoming links say it is, that&#8217;s good enough. No semantic or causal analysis is required. That&#8217;s why Google can translate languages without actually &#8220;knowing&#8221; them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.</p>
<p>Speaking at the O&#8217;Reilly Emerging Technology Conference this past March, Peter Norvig, Google&#8217;s research director, offered an update to George Box&#8217;s maxim: &#8220;All models are wrong, and increasingly you can succeed without them.&#8221;</p>
<p>&#8230;faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.</p>
<p>There is now a better way. Petabytes allow us to say: &#8220;Correlation is enough.&#8221; We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.</p></blockquote>
<p><a href="http://www.nature.com/nature/journal/v462/n7274/full/462722a.html"><img class="alignleft" src="http://www.nature.com/nature/journal/v462/n7274/images/462722a-i1.0.jpg" alt="" width="168" height="197" /></a>There is certainly value in sophisticated data mining and an inductive approach to research, but to dismiss the deductive approach (construct theory&gt;deduce testable hypotheses&gt;empirically verify or falsify hypotheses) would be shortsighted.  Modern data mining may be enough to authoritatively establish a non-random relationship, and in some cases (translation and advertising) it more than suffices for useful application.  However, even the largest data sets still represent only a sample&#8211;and, therefore, an approximation&#8211;of reality.  Moreover, establishing correlation still doesn&#8217;t get you to the underlying <a href="http://www-personal.umd.umich.edu/~delittle/resources/causal%20mechanism.pdf" target="_blank">causal mechanisms</a> that drive the relationship.  Even if Google, with enough data and advanced statistical techniques, can claim that a causal relationship exists, it can&#8217;t tell you <em>why</em> it exists.</p>
<p>For some subjects, &#8220;why&#8221; may not matter&#8211;do we care why Google&#8217;s program is able to accurately translate between languages, or is the practical effect enough for us?  But for others it is crucial when thinking about how to construct an intervention to alter some state of being (e.g. a medical condition, poverty, or civil war).  Understanding causal mechanisms can also help us think through the consequences of an intervention&#8211;what are some potential side effects?  Are there other, seemingly unrelated, areas that might be affected by the intervention in a negative way?  When we are dealing with more interconnected, complex systems (like human physiology or society), it behooves us to go beyond relationships and understand what levers are being pulled.</p>
<p><strong>Update: </strong>My friend Drew pointed me to <a href="http://www.drewconway.com/zia/?p=209" target="_blank">his reaction</a> to Anderson&#8217;s piece when it originally came out&#8211;worth a read.</p>
<br /> Tagged: <a href='http://billpetti.com/tag/big-data/'>Big Data</a>, <a href='http://billpetti.com/tag/research-methodology/'>research methodology</a>, <a href='http://billpetti.com/tag/science/'>Science</a>]]></content:encoded>
			<wfw:commentRss>http://billpetti.com/2010/08/13/the-danger-of-data-without-theory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/83d0c69bc078d64ebe36a701cbf755b2?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">billpetti</media:title>
		</media:content>

		<media:content url="http://www.nature.com/nature/journal/v462/n7274/images/462722a-i1.0.jpg" medium="image" />
	</item>
		<item>
		<title>&#8220;Statistics is the New Grammar&#8221;</title>
		<link>http://billpetti.com/2010/04/20/statistics-is-the-new-grammar/</link>
		<comments>http://billpetti.com/2010/04/20/statistics-is-the-new-grammar/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 17:14:16 +0000</pubDate>
		<dc:creator>Bill Petti</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[data-driven world]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://billpetti.com/?p=2078</guid>
		<description><![CDATA[In the latest issue of WIRED, Clive Thompson pens a great piece which echoes a sentiment I&#8217;ve touched on before: in a data-driven world it is critical that all citizens have at least a basic literacy in statistics. Now and in the future, we will have unprecedented access to voluminous amounts of data.  The analysis of this [...]]]></description>
			<content:encoded><![CDATA[<p>In the latest issue of WIRED, <a href="http://www.wired.com/magazine/2010/04/st_thompson_statistics/" target="_blank">Clive Thompson pens a great piece</a> which echoes a sentiment I&#8217;ve touched on before: in a data-driven world it is critical that all citizens have at least a basic literacy in statistics.</p>
<p>Now and in the future, we will have unprecedented access to voluminous amounts of data.  The analysis of this data and the conclusions drawn from it will have a major impact on public policy, business, and personal decisions.  The net effect of this could go either way&#8211;it can usher in a period of unprecedented efficiency, novelty, and positive decision making or it can precipitate deleterious actions.  Data does not speak for itself.  How we analyze and interpret that data matters a great deal, which puts a premium on statistical literacy for everyone&#8211;not just PhDs and policy wonks.</p>
<p>Thompson notes a number of statistical fallacies that many, including members of the media, fall prey to.  Using a single event to prove or disprove a general property is a spectacular example that we see all the time, particularly with large, macro-level events.  Regardless of which side of the climate change debate you are on, a single snowstorm or record-breaking heat wave does not rise to the level of hypothesis-nullifying or -verifying evidence.</p>
<blockquote><p>There are oodles of other examples of how our inability to grasp statistics&#8211;and the mother of it all, probability&#8211;makes us believe stupid things.  Gamblers think their number is more likely to come up this time because it didn&#8217;t come up last time.  Political polls are touted by the media even when their samples are laughably skewed.</p></blockquote>
<p>Take correlation and causation.  The cartoon below nicely illustrates the common fallacy that the correlation of two events is enough to prove that one causes the other:</p>
<div class="wp-caption aligncenter" style="width: 453px"><a href="http://www.few.vu.nl/~wrvhage/images/pavlov.gif"><img src="http://www.few.vu.nl/~wrvhage/images/pavlov.gif" alt="" width="443" height="352" /></a><p class="wp-caption-text">Correlation and Causation</p></div>
<p>Bottom line: the importance of statistical literacy will only increase.  Statistics will come to permeate our lives, more so than ever before.  And if we want to truly take advantage of this we had better learn to speak the language.</p>
<br /> Tagged: <a href='http://billpetti.com/tag/big-data/'>Big Data</a>, <a href='http://billpetti.com/tag/data-driven-world/'>data-driven world</a>, <a href='http://billpetti.com/tag/education/'>education</a>, <a href='http://billpetti.com/tag/statistics/'>statistics</a>]]></content:encoded>
			<wfw:commentRss>http://billpetti.com/2010/04/20/statistics-is-the-new-grammar/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/83d0c69bc078d64ebe36a701cbf755b2?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">billpetti</media:title>
		</media:content>

		<media:content url="http://www.few.vu.nl/~wrvhage/images/pavlov.gif" medium="image" />
	</item>
		<item>
		<title>The Era of Big Data: IBM Gets It</title>
		<link>http://billpetti.com/2010/04/19/the-era-of-big-data-ibm-gets-it/</link>
		<comments>http://billpetti.com/2010/04/19/the-era-of-big-data-ibm-gets-it/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 12:15:48 +0000</pubDate>
		<dc:creator>Bill Petti</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[Big Data]]></category>

		<guid isPermaLink="false">http://billpetti.com/?p=1959</guid>
		<description><![CDATA[I&#8217;ve written before about how IBM dove headfirst into the world of Big Data.  They&#8217;ve made a big bet on the revolutionary possibilities available to business, governments, and individuals given the revolution in data capture and analytics we are entering.  At this point you&#8217;ve all seen this point made in various ways through IBM&#8217;s Smarter Planet [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve written before about how <a href="http://billpetti.com/2009/11/16/cloud-analytics-from-big-blue/" target="_blank">IBM dove headfirst</a> into the world of <a href="http://billpetti.com/2009/08/07/profiting-from-an-analytically-driven-world/" target="_blank">Big Data</a>.  They&#8217;ve made a big bet on the possibilities open to businesses, governments, and individuals as we enter a revolution in data capture and analytics.  By now you&#8217;ve all seen this argument made in various ways through IBM&#8217;s <a href="http://www.ibm.com/smarterplanet/us/en/" target="_blank">Smarter Planet campaign</a>.  Here&#8217;s one of the ads where they make their case for why data matters:</p>
<span style="text-align:center; display: block;"><a href="http://billpetti.com/2010/04/19/the-era-of-big-data-ibm-gets-it/"><img src="http://img.youtube.com/vi/AnL98lQdqa8/2.jpg" alt="" /></a></span>
<p>Combine sophisticated data analytics with the fact that <a href="http://www.aolnews.com/science/article/scientists-make-it-official-people-are-so-predictable/19364257" target="_blank">people, it turns out, are actually rather predictable</a> in their routines and you end up with a myriad of possibilities for business and public policy.</p>
<p>Via <a href="http://twitter.com/alexlundry/status/10735720137" target="_blank">Alex Lundry</a></p>
<p>[Note: I will be traveling over the next few days, so posting will be light-to-nonexistent]</p>
<br /> Tagged: <a href='http://billpetti.com/tag/analytics/'>analytics</a>, <a href='http://billpetti.com/tag/big-data/'>Big Data</a>]]></content:encoded>
			<wfw:commentRss>http://billpetti.com/2010/04/19/the-era-of-big-data-ibm-gets-it/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/83d0c69bc078d64ebe36a701cbf755b2?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">billpetti</media:title>
		</media:content>
	</item>
		<item>
		<title>Cloud Analytics from Big Blue</title>
		<link>http://billpetti.com/2009/11/16/cloud-analytics-from-big-blue/</link>
		<comments>http://billpetti.com/2009/11/16/cloud-analytics-from-big-blue/#comments</comments>
		<pubDate>Mon, 16 Nov 2009 13:46:20 +0000</pubDate>
		<dc:creator>Bill Petti</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://billpetti.com/?p=1225</guid>
		<description><![CDATA[Music to analytically-driven ears: [...] IBM is unveiling a new internal analytics product that the company is touting as the “largest private cloud computing environment for business analytics in the world,” which launches internally with more than a petabyte of information. Along with this internal product, IBM will launch a companion product for clients to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="This is music to my analytically-driven ears" target="_blank">Music to analytically-driven ears</a>:</p>
<blockquote><p>[...] IBM is unveiling a new internal analytics product that the company is touting as the “largest private cloud computing environment for business analytics in the world,” which launches internally with more than a petabyte of information. Along with this internal product, IBM will launch a companion product for clients to build upon this cloud-based architecture, called IBM Smart Analytics Cloud.</p>
<p>The internal product, dubbed Blue Insight, will provide 200,000 employees in IBM’s sales and development department with the ability to extract data and information to make decisions and gain further insight at the point of sale. Blue Insight will gather information from nearly 100 different information warehouses and data stores, providing analytics on more than a petabyte (1,000 terabytes or 1,000,000 gigabytes) of data. For example, sales execs may use customizable queries of real time data to understand revenue opportunities and how many sales in their region are closing to help improve prediction. Or a manufacturing process engineer can evaluate real-time data on the plant floor to identify trends and data to improve yield and reduce shipment delivery times.</p>
<p>IBM Smart Analytics Cloud offering for clients will similarly deliver powerful business intelligence via the scalable, private cloud. The product lets the client import data and then transform this information into insights to develop strategies and decisions. The service will offer the ability to create reports, analysis, dashboards, and scorecards to monitor business performance and measure results.</p></blockquote>
<p>The key to good analytics is not just the accumulation of relevant data, but also the ability to manipulate and visualize the data in meaningful ways.  The ability to create custom dashboards and similar views is an important part of making analytics a useful and profitable tool.  IBM has certainly dedicated itself to expanding its service and software offerings relative to its traditional hardware business, and analytics has emerged as a huge part of that strategy.  In doing so, it should be well positioned to profit from the coming <a href="2009/08/07/profiting-from-an-analytically-driven-world/" target="_blank">analytically-driven</a> <a href="2009/08/14/more-on-a-data-driven-world/" target="_blank">world</a>.</p>
<br /> Tagged: analytics, Big Data, statistics]]></content:encoded>
			<wfw:commentRss>http://billpetti.com/2009/11/16/cloud-analytics-from-big-blue/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/83d0c69bc078d64ebe36a701cbf755b2?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">billpetti</media:title>
		</media:content>
	</item>
		<item>
		<title>We are all creatives now (or, at least, will be by 2013)</title>
		<link>http://billpetti.com/2009/10/23/we-are-all-creatives-or-at-least-will-be-by-2013/</link>
		<comments>http://billpetti.com/2009/10/23/we-are-all-creatives-or-at-least-will-be-by-2013/#comments</comments>
		<pubDate>Fri, 23 Oct 2009 11:05:49 +0000</pubDate>
		<dc:creator>Bill Petti</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[creatives]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Nate Silver]]></category>
		<category><![CDATA[noise]]></category>
		<category><![CDATA[signals]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[sophisticated aggregation]]></category>

		<guid isPermaLink="false">http://billpetti.com/?p=889</guid>
		<description><![CDATA[SEED published an article the other day that discussed the coming impact of near total authorship.  The gist of the article is that at some point, nearly everyone will be able to publish content and that this will have profound implications for society in much the same way that near universal literacy has. So what [...]]]></description>
			<content:encoded><![CDATA[<p>SEED <a href="http://seedmagazine.com/content/article/a_writing_revolution/" target="_blank">published an article the other day</a> that discussed the coming impact of near total authorship.  The gist of the article is that at some point, nearly everyone will be able to publish content and that this will have profound implications for society in much the same way that near universal literacy has.</p>
<p style="text-align:left;">
<div class="wp-caption aligncenter" style="width: 471px"><a href="http://seedmagazine.com/images/uploads/authors-per-year_inline_640x262.jpg" target="_blank"><img class="   " title="Authorship over time" src="http://seedmagazine.com/images/uploads/authors-per-year_inline_640x262.jpg" alt="Authorship over time" width="461" height="189" /></a><p class="wp-caption-text">Authorship over time</p></div>
<p>So what are the implications for universal authorship?  SEED mentions one and I have another that comes to mind.</p>
<p>The first implication is in the article: as more and more people become creators of content, it will become increasingly difficult for organizations of all kinds to control their messaging and their brand.  Provide a bad customer experience and it will be on Facebook, immediately broadcast to hundreds if not thousands of people.  Discriminate against someone because of their race or sexual orientation and they will tweet about it, likely prompting dozens of re-tweets that allow the issue to reach exponentially more people, who then might blog about it, and so on, and so on.  Organizations are already finding it hard to control their brand; imagine when the number of authors increases tenfold (which, according to SEED, will now happen yearly).  Theoretically, firms and technologies that can monitor content related to an organization will be big winners.  Additionally, organizations will need to invest more heavily in their own networks and crowds to help combat negative content (whether true or false).<span id="more-889"></span></p>
<p>Another implication that immediately came to my mind is the increased difficulty in separating signals from noise.  As the cost of entry into the market for content heads towards zero and the tools of creativity are fully democratized, there will be an even greater explosion in content from which individuals and organizations will have to separate relevant, accurate pieces of information.  There is already a flood of information to wade through and it is becoming more difficult to do so.  Exponentially increase the amount of content and the variety of sources and you&#8217;ve taken the problem to an even greater level.  All things being equal, more content and a more fractured supply source will only increase the amount of noise and make identifying the signals (the accurate pieces of information) more difficult.  What we will need, and what will become valuable, are services that don&#8217;t simply aggregate content, but also assess its accuracy and credibility.</p>
<div class="wp-caption alignleft" style="width: 226px"><a href="http://timetoeatthedogs.files.wordpress.com/2008/11/nate-silver.jpg"><img class="  " title="Via Ron Kaplan" src="http://timetoeatthedogs.files.wordpress.com/2008/11/nate-silver.jpg?w=216&h=133" alt="Nate Silver" width="216" height="133" /></a><p class="wp-caption-text">Nate Silver</p></div>
<p>I am thinking here of services that mimic the approach of Nate Silver at <a href="What's Wrong with the World's Leading Media Companies&quot;, by Ava Seave, Jonathan Knee, and Bruce Greenwald." target="_blank">FiveThirtyEight</a>.  The world didn&#8217;t lack political polls, but it did lack a methodology for cutting through the noise created by dozens of polls, many providing contradictory predictions.  Silver came up with a way to not just aggregate polls, but to increase the ratio of signal to noise, allowing for a more accurate portrayal of public opinion and ultimately a prediction of Presidential elections.  Silver made polls better by developing what I would call a sophisticated aggregation methodology (<em>note: I will be writing more on this soon</em>).  The key will be to develop a scalable, replicable approach that can be applied to a variety of domains.</p>
<p>Soon enough, we may all be creatives.   And that means there will be a heck of a lot more chaff to wade through.</p>
<br /> Tagged: Big Data, creatives, data, Nate Silver, noise, signals, social media, sophisticated aggregation]]></content:encoded>
			<wfw:commentRss>http://billpetti.com/2009/10/23/we-are-all-creatives-or-at-least-will-be-by-2013/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/83d0c69bc078d64ebe36a701cbf755b2?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">billpetti</media:title>
		</media:content>

		<media:content url="http://seedmagazine.com/images/uploads/authors-per-year_inline_640x262.jpg" medium="image">
			<media:title type="html">Authorship over time</media:title>
		</media:content>

		<media:content url="http://timetoeatthedogs.files.wordpress.com/2008/11/nate-silver.jpg" medium="image">
			<media:title type="html">Via Ron Kaplan</media:title>
		</media:content>

		<media:content url="http://img.zemanta.com/reblog_e.png?x-id=ed50202c-fe11-48e1-8246-edfc9207cfdf" medium="image">
			<media:title type="html">Reblog this post [with Zemanta]</media:title>
		</media:content>
	</item>
		<item>
		<title>“Science these days has basically turned into a data-management problem&#8221;</title>
		<link>http://billpetti.com/2009/10/14/%e2%80%9cscience-these-days-has-basically-turned-into-a-data-management-problem/</link>
		<comments>http://billpetti.com/2009/10/14/%e2%80%9cscience-these-days-has-basically-turned-into-a-data-management-problem/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 11:17:59 +0000</pubDate>
		<dc:creator>Bill Petti</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data visualization]]></category>

		<guid isPermaLink="false">http://billpetti.com/?p=786</guid>
		<description><![CDATA[So says Professor Jimmy Lin at the University of Maryland in a recent NYT Technology article about the shortfall in &#8220;Big Data-competent&#8221; university students.  The article points out that the kind of data we are now dealing with (which will only continue to increase exponentially) requires a different perspective and experience than most currently have.  [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=billpetti.com&#038;blog=8839193&#038;post=786&#038;subd=billpetti&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>So says Professor Jimmy Lin at the University of Maryland in a recent <a href="heading to NYC, listening to John Mayer Trio's &quot;Another Kind of Green&quot;" target="_blank">NYT Technology article</a> about the shortfall in &#8220;Big Data-competent&#8221; university students.  The article points out that the kind of data we are now dealing with (which will only continue to increase exponentially) requires a different perspective and experience than most currently have.  Firms that have a vested interest in workers with these skills, such as Google and I.B.M., have partnered with universities in an effort to change the frame of reference for students.</p>
<p>This underscores the <a href="http://billpetti.com/2009/08/07/profiting-from-an-analytically-driven-world/" target="_blank">comparative</a> <a href="http://billpetti.com/2009/08/14/more-on-a-data-driven-world/" target="_blank">advantage</a> of <a href="http://billpetti.com/2009/08/22/challenges-of-consuming-real-time-data/" target="_blank">individuals</a> with <a href="http://billpetti.com/2009/09/11/the-soft-sciences-to-get-their-day/" target="_blank">skills</a> amenable to collecting, coding, manipulating, and visualizing Big Data in the current labor market.  Additionally, as not all data will be easily collected and coded via computer programs, <a href="http://billpetti.com/2009/09/16/crowdflower-live-from-techcrunch50/" target="_blank">services</a> that can efficiently harness <a href="http://www.google.com/url?q=https://www.mturk.com/mturk/welcome&amp;ei=w2rUSsH_GNPd8Qa0kPSHDQ&amp;sa=X&amp;oi=spellmeleon_result&amp;resnum=1&amp;ct=result&amp;ved=0CAkQhgIwAA&amp;usg=AFQjCNEX45pwKPZxFM7TogCuzTVWBdmEDg" target="_blank">the crowd</a> in support of Big Data will also be critically important.</p>
<p style="text-align:center;">
<div class="wp-caption aligncenter" style="width: 388px"><a href="http://www.wired.com/science/discoveries/magazine/16-07/pb_visualizing"><img class=" " src="http://www.wired.com/images/article/magazine/1607/pb_visualizing_f.jpg" alt="A visualization of thousands of Wikipedia edits that were made by a single software bot. Each color corresponds to a different page." width="378" height="352" /></a><p class="wp-caption-text">A visualization of thousands of Wikipedia edits that were made by a single software bot. Each color corresponds to a different page.</p></div>
<p><em>Photo via <a href="http://www.wired.com/science/discoveries/magazine/16-07/pb_visualizing" target="_blank">Wired</a></em></p>
<br /> Tagged: analytics, Big Data, crowdsourcing, data, data visualization]]></content:encoded>
			<wfw:commentRss>http://billpetti.com/2009/10/14/%e2%80%9cscience-these-days-has-basically-turned-into-a-data-management-problem/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/83d0c69bc078d64ebe36a701cbf755b2?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">billpetti</media:title>
		</media:content>

		<media:content url="http://www.wired.com/images/article/magazine/1607/pb_visualizing_f.jpg" medium="image">
			<media:title type="html">A visualization of thousands of Wikipedia edits that were made by a single software bot. Each color corresponds to a different page.</media:title>
		</media:content>
	</item>
		<item>
		<title>Crowdsourcing Data Coding</title>
		<link>http://billpetti.com/2009/09/16/crowdflower-live-from-techcrunch50/</link>
		<comments>http://billpetti.com/2009/09/16/crowdflower-live-from-techcrunch50/#comments</comments>
		<pubDate>Wed, 16 Sep 2009 10:17:02 +0000</pubDate>
		<dc:creator>Bill Petti</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[social science]]></category>

		<guid isPermaLink="false">http://billpetti.com/2009/09/15/crowdflower-live-from-techcrunch50/</guid>
		<description><![CDATA[I just finished watching the video below of CrowdFlower&#8217;s presentation at the TechCrunch50 conference.  CrowdFlower is a platform that allows firms to crowdsource various tasks, such as populating a spreadsheet with email addresses or selecting stills from thousands of videos that have particular qualities.  The examples in the video include very labor-intensive tasks, but [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=billpetti.com&#038;blog=8839193&#038;post=608&#038;subd=billpetti&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I just finished watching the video below of CrowdFlower&#8217;s presentation at the TechCrunch50 conference.  <a href="http://crowdflower.com/" target="_blank">CrowdFlower</a> is a platform that allows firms to crowdsource various tasks, such as populating a spreadsheet with email addresses or selecting stills from thousands of videos that have particular qualities.  The examples in the video include very labor-intensive tasks, but ones that a firm is unlikely either to need again or to consider worth dedicating staff to.</p>
<p><span style="display:block;width:425px;margin:0 auto;"> <embed src='http://widgets.vodpod.com/w/video_embed/ExternalVideo.873131' type='application/x-shockwave-flash' AllowScriptAccess='sameDomain' pluginspage='http://www.macromedia.com/go/getflashplayer' wmode='transparent' flashvars='loc=%2F&autoplay=false&vid=2167086' width='425' height='350' /></span></p>
<div style="font-size:10px;">more about &#8220;<a href="http://vodpod.com/watch/2196384-crowdflower-live-from-techcrunch50?pod=">CrowdFlower, Live From TechCrunch50</a>&#8221;, posted with <a href="http://vodpod.com?r=wp">vodpod</a></div>
<p>As I was watching the video I thought about the potential to leverage such a platform for large-scale coding of qualitative data.<span id="more-608"></span>  Coming from the social sciences, we often find the need in large-scale research for the massive coding of data, whether it is language from a speech, the tenor or sentiment of quotations (or newspaper articles in media studies), the nature of cases (e.g. did country A make a threat to country B, and did country B back down as a result), or the responses from an open-ended survey.  Coding is an issue whether you are conducting qualitative or quantitative analysis&#8211;especially where you have captured large amounts of data.  Often the data is not inherently numerical and needs to be translated so that quantitative analysis can be conducted.  Likewise, with a qualitative approach one still needs to categorize various data points to allow for meaningful comparisons.</p>
<p>The interesting thing about a service like CrowdFlower is that it can leverage a ready group of workers globally who are willing to conduct the coding at a reasonable price.  Additionally, CrowdFlower utilizes various real-time methods to ensure the quality of the coding.  In part, this is achieved by scoring coders on their past performance, on how they fare on tasks that are &#8220;planted&#8221; by CrowdFlower (i.e. salting the work with tasks where the correct answer is known ahead of time), and on how much agreement there is between coders on various items.</p>
<p>The final method comes up quite a bit in social science research when you have to determine how to categorize a given piece of data.  The level of agreement is crucial to confidently coding a particular case.  I would imagine that a platform such as CrowdFlower could make that task easier and more robust by quickly tapping into a larger pool of coders.</p>
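<p>To make the two quality checks concrete, here is a minimal sketch of how planted tasks and inter-coder agreement could be scored.  It is only an illustration under my own assumptions: the function names, the label values, and the choice of Cohen&#8217;s kappa as the agreement measure are hypothetical and are not taken from CrowdFlower&#8217;s platform.</p>

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of items on which two coders assigned the same category."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement between two coders, corrected for agreement expected by chance."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability both coders pick the same category at random,
    # given how often each coder used each category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


def gold_accuracy(coder_labels, gold_labels):
    """Share of planted ("gold") items that a coder answered correctly.

    Both arguments are dicts mapping an item id to a category.
    Returns None if the coder saw none of the planted items.
    """
    scored = [item for item in gold_labels if item in coder_labels]
    if not scored:
        return None
    correct = sum(coder_labels[item] == gold_labels[item] for item in scored)
    return correct / len(scored)


# Hypothetical example: two coders label four cases as "threat" or "none".
coder_a = ["threat", "threat", "none", "none"]
coder_b = ["threat", "none", "none", "none"]
print(percent_agreement(coder_a, coder_b))  # 0.75
print(cohens_kappa(coder_a, coder_b))       # 0.5
```

<p>Raw agreement looks high here (three of four cases), but kappa is lower because two coders choosing between only two categories would agree fairly often by chance alone.  That gap is why chance-corrected measures are common when deciding whether a coding scheme is reliable.</p>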
<p>Has anyone used a service like CrowdFlower in this way (i.e. coding data from qualitative research)?  Would be interested in your perspective.</p>
<br /> Tagged: Big Data, coding, crowdsourcing, data, social science]]></content:encoded>
			<wfw:commentRss>http://billpetti.com/2009/09/16/crowdflower-live-from-techcrunch50/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/83d0c69bc078d64ebe36a701cbf755b2?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">billpetti</media:title>
		</media:content>
	</item>
		<item>
		<title>The &#8216;Soft Sciences&#8217; to get their Day?</title>
		<link>http://billpetti.com/2009/09/11/the-soft-sciences-to-get-their-day/</link>
		<comments>http://billpetti.com/2009/09/11/the-soft-sciences-to-get-their-day/#comments</comments>
		<pubDate>Sat, 12 Sep 2009 01:56:25 +0000</pubDate>
		<dc:creator>Bill Petti</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[social science]]></category>

		<guid isPermaLink="false">http://billpetti.com/?p=583</guid>
		<description><![CDATA[In a recent report, Gartner proposes that as corporations try to benefit from the growth of social media they will come to rely more and more on employees with formal, advanced training in the social sciences. Gartner Vice President Kathy Harris discusses in some detail four areas of jobs needed in the near future. Though [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=billpetti.com&#038;blog=8839193&#038;post=583&#038;subd=billpetti&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.eweek.com/c/a/IT-Management/There-Will-Be-Web-Jobs-for-Social-Scientists-138503/?kc=EWKNLCSM09012009STR" target="_blank">a recent report</a>, Gartner proposes that as corporations try to benefit from the growth of social media they will come to rely more and more on employees with formal, advanced training in the social sciences.</p>
<blockquote>
<div class="wp-caption alignright" style="width: 135px"><img class="  " src="http://www3.niu.edu/acad/psych/Millis/History/2003/Milgram_head.gif" alt="Stanley Milgram" width="125" height="139" /><p class="wp-caption-text">Stanley Milgram</p></div>
<p>Gartner Vice President Kathy Harris discusses in some detail four areas of jobs needed in the near future. Though she never really uses the words &#8220;social networks&#8221; the implication is that most companies aren&#8217;t really geared toward taking advantage of the impact of these online communities, and that the numbers will be too large to ignore, regardless of the business you are in.</p>
<p>“Many of the needed technical capabilities originate in the social sciences and are aimed at usability and adoption of technology-related business services,” Harris said in a release. “These capabilities embody the notion of ‘action at the interface’ between the enterprise and its markets or between business management and technology management. Therefore, organizations are likely to shift the responsibility for leveraging technology outside centralized IT organizations and into the business units responsible for growth and innovation of revenue, products and services.”</p></blockquote>
<div class="wp-caption alignleft" style="width: 143px"><img class=" " src="http://www.nndb.com/people/682/000117331/erving-goffman-2-sized.jpg" alt="Erving Goffman" width="133" height="151" /><p class="wp-caption-text">Erving Goffman</p></div>
<p>To me, combining the <a href="http://billpetti.com/2009/08/22/challenges-of-…real-time-data/" target="_blank">plethora</a> <a href="http://dataspora.com/blog/the-rise-of-the-data-web/" target="_blank">of data being generated</a> by Web 2.0 technologies with the inherent social and behavioral aspects of these technologies <a href="http://wp.me/pB5tD-2d" target="_blank">screams for individuals who have training in sophisticated research methodologies</a> (both quantitative and qualitative) as well as in substantive subjects that relate to sociology, psychology, and behavioral economics.  It may be creating a perfect storm where individuals with this particular skill set finally find themselves in high demand outside of the Ivory Tower.  As a trained social scientist myself, I also hope it puts to bed, once and for all, the short-sighted notion that the social sciences don&#8217;t really belong in the category of &#8216;science&#8217; compared to their physical cousins.</p>
<p>(Via <a href="http://www.jasonspector.com/" target="_blank">Jason Spector</a>)</p>
<br /> Tagged: Big Data, social media, social networks, social science]]></content:encoded>
			<wfw:commentRss>http://billpetti.com/2009/09/11/the-soft-sciences-to-get-their-day/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/83d0c69bc078d64ebe36a701cbf755b2?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">billpetti</media:title>
		</media:content>

		<media:content url="http://www3.niu.edu/acad/psych/Millis/History/2003/Milgram_head.gif" medium="image">
			<media:title type="html">Stanley Milgram</media:title>
		</media:content>

		<media:content url="http://www.nndb.com/people/682/000117331/erving-goffman-2-sized.jpg" medium="image">
			<media:title type="html">Erving Goffman</media:title>
		</media:content>
	</item>
		<item>
		<title>Challenges of Consuming Real-time Data</title>
		<link>http://billpetti.com/2009/08/22/challenges-of-consuming-real-time-data/</link>
		<comments>http://billpetti.com/2009/08/22/challenges-of-consuming-real-time-data/#comments</comments>
		<pubDate>Sat, 22 Aug 2009 12:22:42 +0000</pubDate>
		<dc:creator>Bill Petti</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[data]]></category>

		<guid isPermaLink="false">http://billpetti.wordpress.com/?p=351</guid>
		<description><![CDATA[I&#8217;ve run across quite a few stories lately discussing 1) the revolution in data production we are living through and 2) the challenges we face in being able to sift through and view that data in a meaningful way through the web. The first comes from GigaOM, where Jennifer Martinez looks at the emerging [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=billpetti.com&#038;blog=8839193&#038;post=351&#038;subd=billpetti&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve run across quite a few stories lately discussing 1) the revolution in data production we are living through and 2) the challenges we face in being able to sift through and view that data in a meaningful way through the web.</p>
<p>The first comes from GigaOM, where Jennifer Martinez <a href="http://gigaom.com/2009/08/19/the-real-time-web-sifting-required/" target="_blank">looks at the emerging problem of trying to keep up with the constant flow of data</a> via status updates.  As our networks grow, and our use of various social networks increases, we are inundated with updates, which often leads to missing the particular updates that we may be most interested in.  Additionally, she notes that besides causing you to miss information you care about, this stream overload can lead to &#8220;<a href="http://gigaom.com/2009/08/13/the-evolution-of-blogging/">disjointed conversations</a> that <a href="http://gigaom.com/2008/11/28/with-twitter-a-desperate-need-for-context/">lack context</a>, making it hard to piece together and decipher what it all means&#8221;.  I can relate to this problem, and my &#8216;immersion&#8217; in social networks is average to above average.  I haven&#8217;t figured out an optimal way to keep up.  I try to utilize a few useful tools (e.g. Seesmic), but between social networks and Google Reader I find myself constantly playing catch-up.</p>
<p>Michael Driscoll at Dataspora follows up on this theme with a higher-level discussion of <a href="http://dataspora.com/blog/the-rise-of-the-data-web/">how the rise of data (vs. documents) conflicts with the architecture that underlies the web today</a>.  Current mark-up languages (e.g. HTML and XML) are geared towards, and ideal for, documents, not the kind of streaming data that will come to dominate content.  To explain this point he provides a comparison of metaphors where documents=trees and data=streams:<br />
<span id="more-351"></span></p>
<blockquote><p>Trees are rooted and finite: you can’t chop up a tree and easily put it back together again (while XML has made concessions to <a href="http://www.w3.org/TR/xml-fragment">document fragments</a>, it is not a natural fit).</p>
<p>Streams can be split, sampled, and filtered. The divisibility of data streams lends itself to parallelism in a way that document trees do not. The stream paradigm conceives of data as extending infinitely forward in time. The Twitter data stream has no end: it ought have no end tag.</p>
<p>Conceiving of data as streams moves us out of the realm of static objects and into the <a href="http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-24.html#%_sec_3.5">realm of signal processing</a>.  This is the domain of the living: where the web is not an archive but an organism.</p></blockquote>
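<p>The stream side of that metaphor can be sketched in a few lines.  This is only an illustration of the idea, not anything from Driscoll&#8217;s post: the <code>status_updates</code> generator and its fields are hypothetical stand-ins for a real feed, and the helpers simply wrap Python&#8217;s standard <code>itertools</code> module.</p>

```python
import itertools


def status_updates():
    """Hypothetical endless stream of status updates (stand-in for a real feed)."""
    for i in itertools.count():
        yield {"id": i, "user": f"user{i % 3}", "text": f"update {i}"}


def filter_stream(stream, predicate):
    """Lazily keep only the items for which predicate(item) is true."""
    return (item for item in stream if predicate(item))


def sample_stream(stream, every):
    """Lazily keep every n-th item, starting with the first."""
    return itertools.islice(stream, 0, None, every)


def split_stream(stream, n):
    """Split one stream into n independent iterators over the same items."""
    return itertools.tee(stream, n)


# The source never ends, so we only ever take a finite slice of the result.
from_user0 = filter_stream(status_updates(), lambda u: u["user"] == "user0")
sampled = sample_stream(from_user0, 2)
first_three = list(itertools.islice(sampled, 3))
print([u["id"] for u in first_three])  # [0, 6, 12]
```

<p>Nothing here ever needs the whole stream at once or a closing tag to know it is complete; each step consumes items as they arrive, which is the property the quoted passage contrasts with a document tree.</p>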
<p>Finally, Ben Lorica at O&#8217;Reilly Radar discusses <a href="http://radar.oreilly.com/2009/08/big-data-and-real-time-structured-data-analytics.html">the challenges with trying to analyze large amounts of data in near real-time</a>.  While there are a number of potential solutions for the structured data that we are generating, the way to deal with the immense volume of unstructured data is less obvious.  He notes recent work by a team at UC Berkeley that was able to take unstructured data and, leveraging <a href="http://en.wikipedia.org/wiki/Named_entity_recognition">entity extraction</a>, turn it into structured data for a SQL database.</p>
<br /> Tagged: analytics, Big Data, data]]></content:encoded>
			<wfw:commentRss>http://billpetti.com/2009/08/22/challenges-of-consuming-real-time-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/83d0c69bc078d64ebe36a701cbf755b2?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">billpetti</media:title>
		</media:content>
	</item>
		<item>
		<title>More on a Data-driven World: Links &amp; Commentary</title>
		<link>http://billpetti.com/2009/08/14/more-on-a-data-driven-world/</link>
		<comments>http://billpetti.com/2009/08/14/more-on-a-data-driven-world/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 23:03:40 +0000</pubDate>
		<dc:creator>Bill Petti</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data visualization]]></category>

		<guid isPermaLink="false">http://billpetti.wordpress.com/?p=191</guid>
		<description><![CDATA[Last week I wrote about the increasing demand for analytically-skilled, sophisticated statisticians by all sorts of companies looking to take advantage of our increasingly data-driven world.  This past Wednesday, the New York Times published another piece yet again highlighting this trend: As suggested by Daniel Pink’s assertions on the rise of a right-brained working elite, [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=billpetti.com&#038;blog=8839193&#038;post=191&#038;subd=billpetti&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Last week <a href="http://wp.me/pB5tD-2d" target="_blank">I wrote about</a> the increasing demand for analytically-skilled, sophisticated statisticians by all sorts of companies looking to take advantage of our increasingly data-driven world.  This past Wednesday, the New York Times <a href="http://nytimes.com/external/gigaom/2009/08/12/12gigaom-the-future-of-work-its-data-baby-50432.html" target="_blank">published another piece</a> yet again highlighting this trend:</p>
<blockquote><p>As suggested <a rel="nofollow" href="http://webworkerdaily.com/?p=15761" target="_blank">by Daniel Pink</a>’s assertions on the rise of a right-brained working elite, <em><strong>the ability to extract <span>stories </span>from a world of increasing and abundant data </strong></em>will be increasingly critical to many industries. Indeed, the opening of U.S. federal government data at <a rel="nofollow" href="http://www.data.gov/" target="_blank">data.gov</a> (and the <a rel="nofollow" href="http://www.guardian.co.uk/technology/2009/jun/10/berners-lee-downing-street-web-open" target="_blank">appointment of Sir Tim Berners-Lee</a> to similarly open the UK’s data archives) implies a new societal and cultural importance for data wranglers. (my emphasis)</p></blockquote>
<p>The article also included some great links for those looking to get started examining this new trend.  They include:</p>
<ol>
<li>The recently published book “<a href="http://oreilly.com/catalog/9780596157111/" target="_blank">Beautiful Data</a>” brings together essays by some of the world’s most cutting-edge data practitioners — such as <a href="http://stamen.com/" target="_blank">Stamen Design</a> — on subjects as diverse as DNA analysis, crime maps and crowdsourcing.</li>
<li>Ben Fry’s PhD thesis “<a href="http://benfry.com/phd/" target="_blank">Computational Information Design</a>,” which outlines the need for a new field based on multiple disciplines.</li>
<li>The post “<a href="http://dataspora.com/blog/sexy-data-geeks/" target="_blank">Three Sexy Skills Of Data Geeks</a>,” which explains statistics, data munging and visualization — or studying, suffering and storytelling, as the author jokingly suggests.</li>
<li>Blogs such as <a href="http://dataspora.com/blog/sexy-data-geeks/" target="_blank">Dataspora</a> and <a rel="nofollow" href="http://flowingdata.com/" target="_blank">Flowing Data</a>.</li>
</ol>
<p>Some people may be asking what the big deal is.  Statisticians have been around forever and their techniques have become more sophisticated over time.  The big deal is that it isn&#8217;t just about statistics and crunching numbers.  It is about combining multiple disciplines&#8211;such as statistics and graphic design&#8211;at a time of unprecedented data accumulation so as to glean better insights through the collection, analysis, and <a href="http://www.fastcompany.com/blog/michael-cannell/cannell/visualization-new-frontier-design?1250338853" target="_blank">visualization of data</a>.  Most companies claim to have a focus on &#8216;analytics&#8217;, but in my experience the meaning of this term and the sophistication with which it is applied vary widely from one business to the next.  Getting the most out of data requires leadership to think deeply and strategically about what kinds of data would be most useful, what kinds of measures would be most illuminating, and how potential insights gleaned from that data might change their go-to-market strategy as well as R&amp;D.  This should be coupled with a serious commitment to creating the necessary infrastructure (i.e. processes, systems) for collecting, analyzing, and visualizing the relevant data.  Like most things, it is a question of whether data and analytics are viewed as simply a nice feature or as critical to growing and maintaining a business.  Will and vision, not just resources, are crucial.</p>
<p>For those who are interested in the new frontier of data and analytics I would also recommend the following:</p>
<ul>
<li><a href="http://anyall.org/blog/" target="_blank">AI and Social Science</a></li>
<li><a href="http://blog.jonudell.net/" target="_blank">Jon Udell&#8217;s Blog</a></li>
<li><a href="http://www.datawrangling.com/" target="_blank">Data Wrangling</a></li>
<li><a href="http://radar.oreilly.com/tim/" target="_blank">O&#8217;Reilly Radar</a></li>
<li><a href="http://flowingdata.com/2008/08/04/beginners-guide-to-flowingdata-a-guided-tour/" target="_blank">FlowingData Beginner&#8217;s Guide</a></li>
<li><a href="http://www.visualcomplexity.com/vc/blog/" target="_blank">Visual Complexity</a></li>
</ul>
<p>If anyone has additional links or recommended reading, feel free to leave them in the comments section or <a href="mailto:wpetti@alumni.upenn.edu">email me</a>.</p>
<br /> Tagged: analytics, Big Data, data, data visualization]]></content:encoded>
			<wfw:commentRss>http://billpetti.com/2009/08/14/more-on-a-data-driven-world/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/83d0c69bc078d64ebe36a701cbf755b2?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">billpetti</media:title>
		</media:content>
	</item>
	</channel>
</rss>
