Signal/Noise

Entries tagged as ‘Big Data’

Cloud Analytics from Big Blue

November 16, 2009

Music to analytically-driven ears:

[...] IBM is unveiling a new internal analytics product that the company is touting as the “largest private cloud computing environment for business analytics in the world,” which launches internally with more than a petabyte of information. Along with this internal product, IBM will launch a companion product for clients to build upon this cloud-based architecture, called IBM Smart Analytics Cloud.

The internal product, dubbed Blue Insight, will provide 200,000 employees in IBM’s sales and development department with the ability to extract data and information to make decisions and gain further insight at the point of sale. Blue Insight will gather information from nearly 100 different information warehouses and data stores, providing analytics on more than a petabyte (1,000 terabytes or 1,000,000 gigabytes) of data. For example, sales execs may use customizable queries of real time data to understand revenue opportunities and how many sales in their region are closing to help improve prediction. Or a manufacturing process engineer can evaluate real-time data on the plant floor to identify trends and data to improve yield and reduce shipment delivery times.

IBM Smart Analytics Cloud offering for clients will similarly deliver powerful business intelligence via the scalable, private cloud. The product lets the client import data and then transform this information into insights to develop strategies and decisions. The service will offer the ability to create reports, analysis, dashboards, and scorecards to monitor business performance and measure results.

The key to good analytics is not just the accumulation of relevant data, but also the ability to manipulate and visualize the data in meaningful ways.  The ability to create custom dashboards and similar views is an important part of making analytics a useful and profitable tool.  IBM has certainly dedicated itself to expanding its service and software offerings relative to its traditional hardware business, and analytics has emerged as a huge part of that strategy.  In doing so, it should be well positioned to profit from the coming analytically-driven world.

Categories: Uncategorized

We are all creatives now (or, at least, will be by 2013)

October 23, 2009

SEED published an article the other day that discussed the coming impact of near-total authorship.  The gist of the article is that at some point nearly everyone will be able to publish content, and that this will have profound implications for society in much the same way that near-universal literacy has.

Authorship over time

So what are the implications of universal authorship?  SEED mentions one, and I have another that comes to mind.

The first implication is in the article: as more and more people become creators of content, it will become increasingly difficult for organizations of all kinds to control their messaging and their brand.  Provide a bad customer experience and it will be on Facebook, immediately broadcast to hundreds if not thousands of people.  Discriminate against someone because of their race or sexual orientation and they will tweet about it, likely prompting dozens of re-tweets, which allow the issue to reach exponentially more people, who then might blog about it, and so on, and so on.  Organizations are already finding it hard to control their brand; imagine when the number of authors increases tenfold (which, according to SEED, will now happen yearly).  Theoretically, firms and technologies that can monitor content related to an organization will be big winners.  Additionally, organizations will need to invest more heavily in their own networks and crowds to help combat negative content (whether true or false).

Categories: Uncategorized

“Science these days has basically turned into a data-management problem”

October 14, 2009

So says Professor Jimmy Lin at the University of Maryland in a recent NYT Technology article about the shortfall in “Big Data-competent” university students.  The article points out that the kind of data we are now dealing with (which will only continue to increase exponentially) requires a different perspective and experience than most currently have.  Firms that have a vested interest in workers with these skills, such as Google and I.B.M., have partnered with universities in an effort to change the frame of reference for students.

This underscores the comparative advantage of individuals with skills amenable to collecting, coding, manipulating, and visualizing Big Data in the current labor market.  Additionally, as not all data will be easily collected and coded via computer programs, services that can efficiently harness the crowd in support of Big Data will also be critically important.

A visualization of thousands of Wikipedia edits that were made by a single software bot. Each color corresponds to a different page.

Photo via Wired

Categories: Uncategorized

Crowdsourcing Data Coding

September 16, 2009 · 1 Comment

I just finished watching the video below of CrowdFlower’s presentation at the TechCrunch50 conference.  CrowdFlower is a platform that allows firms to crowdsource various tasks, such as populating a spreadsheet with email addresses or selecting stills with particular qualities from thousands of videos.  The examples in the video are very labor-intensive tasks, but ones that a firm is unlikely either to need again or to consider worth dedicating staff to.

Video: “CrowdFlower, Live From TechCrunch50” (via vodpod)

As I was watching the video I thought about the potential to leverage such a platform for large-scale coding of qualitative data.

Categories: Uncategorized

The ‘Soft Sciences’ to get their Day?

September 11, 2009 · 1 Comment

In a recent report, Gartner proposes that as corporations try to benefit from the growth of social media they will come to rely more and more on employees with formal, advanced training in the social sciences.

Stanley Milgram

Gartner Vice President Kathy Harris discusses in some detail four areas of jobs needed in the near future. Though she never really uses the words “social networks” the implication is that most companies aren’t really geared toward taking advantage of the impact of these online communities, and that the numbers will be too large to ignore, regardless of the business you are in.

“Many of the needed technical capabilities originate in the social sciences and are aimed at usability and adoption of technology-related business services,” Harris said in a release. “These capabilities embody the notion of ‘action at the interface’ between the enterprise and its markets or between business management and technology management. Therefore, organizations are likely to shift the responsibility for leveraging technology outside centralized IT organizations and into the business units responsible for growth and innovation of revenue, products and services.”

Erving Goffman

To me, if you combine the plethora of data being generated by Web 2.0 technologies with the inherent social and behavioral aspects of these technologies, it screams for individuals who have training in sophisticated research methodologies (both quantitative and qualitative) as well as in substantive subjects that relate to sociology, psychology, and behavioral economics.  It may be creating a perfect storm where individuals with this particular skill set finally find themselves in high demand outside of the Ivory Tower.  As a trained social scientist myself, I also hope it puts to bed, once and for all, the short-sighted notion that the social sciences don’t really belong in the category of ‘science’ compared to their physical cousins.

(Via Jason Spector)

Categories: Uncategorized

Challenges of Consuming Real-time Data

August 22, 2009

I’ve run across quite a few stories lately discussing 1) the revolution in data production we are living through and 2) the challenges we face in being able to sift through and view that data in a meaningful way through the web.

The first comes from GigaOM, where Jennifer Martinez looks at the emerging problem of trying to keep up with the constant flow of data via status updates.  As our networks grow, and our use of various social networks increases, we are inundated with updates, which often means missing the particular updates we are most interested in.  Additionally, she notes that besides missing out on information you care about, this stream overload can lead to “disjointed conversations that lack context, making it hard to piece together and decipher what it all means”.  I can relate to this problem, and my ‘immersion’ in social networks is average to above average.  I haven’t figured out an optimal way to keep up.  I try to utilize a few useful tools (e.g. Seesmic), but between social networks and Google Reader I find myself constantly playing catch-up.

Michael Driscoll at Dataspora follows up on this theme, providing a more high-level discussion of how the rise of data (vs. documents) conflicts with the architecture that underlies the web today.  Current mark-up languages (e.g. HTML and XML) are geared towards, and ideal for, documents, not the kind of streaming data that will come to dominate content.  To explain this point he provides a comparison of metaphors where documents=trees and data=streams.

Categories: Uncategorized

More on a Data-driven World: Links & Commentary

August 14, 2009 · 1 Comment

Last week I wrote about the increasing demand for analytically-skilled, sophisticated statisticians by all sorts of companies looking to take advantage of our increasingly data-driven world.  This past Wednesday, the New York Times published another piece highlighting this trend:

As suggested by Daniel Pink’s assertions on the rise of a right-brained working elite, the ability to extract stories from a world of increasing and abundant data will be increasingly critical to many industries. Indeed, the opening of U.S. federal government data at data.gov (and the appointment of Sir Tim Berners-Lee to similarly open the UK’s data archives) implies a new societal and cultural importance for data wranglers. (my emphasis)

The article also included some great links for those looking to get started examining this new trend.  They include:

  1. The recently published book “Beautiful Data” brings together essays by some of the world’s most cutting-edge data practitioners — such as Stamen Design — on subjects as diverse as DNA analysis, crime maps and crowdsourcing.
  2. Ben Fry’s PhD thesis “Computational Information Design,” which outlines the need for a new field based on multiple disciplines.
  3. The post “Three Sexy Skills Of Data Geeks,” which explains statistics, data munging and visualization — or studying, suffering and storytelling, as the author jokingly suggests.
  4. Blogs such as Dataspora and Flowing Data.

Some people may be asking what the big deal is.  Statisticians have been around forever and their techniques have become more sophisticated over time.  The big deal is that it isn’t just about statistics and crunching numbers.  It is about combining multiple disciplines–such as statistics and graphic design–at a time of unprecedented data accumulation so as to glean better insights through the collection, analysis, and visualization of data.  Most companies claim to have a focus on ‘analytics’, but in my experience the meaning of this term, and the sophistication behind it, varies widely from one business to the next.  Getting the most out of data requires leadership to think deeply and strategically about what kinds of data would be most useful, what kinds of measures would be most illuminating, and how potential insights gleaned from that data might change their go-to-market strategy as well as R&D.  This should be paired with a serious commitment to creating the necessary infrastructure (i.e. processes, systems) for collecting, analyzing, and visualizing the relevant data.  Like most things, it is a question of whether data and analytics are viewed as simply a nice feature or as critical to growing and maintaining a business.  Will and vision, not just resources, are crucial.

For those that are interested in the new frontier of data and analytics I would also recommend the following:

If anyone has additional links or recommended reading feel free to leave it in the comments section or email me.

Categories: Uncategorized

Profiting from an Analytically Driven World

August 7, 2009 · 1 Comment

The NY Times had a great article yesterday profiling the increasing fortunes for advanced statisticians.  As the world has become more data-driven and flush with raw numbers, the need to derive sophisticated insights from all that data has increased. Data does not speak for itself:

The new breed of statisticians tackle that problem. They use powerful computers and sophisticated mathematical models to hunt for meaningful patterns and insights in vast troves of data. The applications are as diverse as improving Internet search and online advertising, culling gene sequencing information for cancer research and analyzing sensor and location data to optimize the handling of food shipments.

With the rise in data also comes the opportunity to extract profits if one can identify the right insights and patterns.  For example, I.B.M. recently launched a new group that will focus on business analytics and optimization.  They plan to grow the group aggressively.

With this shift towards a data-driven world has come a corresponding shift in the value of certain skills.  In this case, sophisticated statisticians and the analytically-minded find themselves in a position where their skills now command both respect and high salaries.  It has also allowed people to pursue careers that weren’t necessarily available to them even just a few years ago. Two of my favorite examples are the rise of statisticians in professional athletics (e.g. the Moneyball approach to baseball) and Nate Silver, who went from sabermetric analysis of baseball to working as a political commentator and analyst.

While I am a fan of analytically-driven approaches, I also appreciate the potential pitfalls of relying on statistics.  More than once I have heard the quip, “lies, damn lies, and statistics”.  But it’s incumbent on consumers of data to be sophisticated consumers, such that they can call out the sloppy use or, worse, intentional misrepresentation of data.

In the world we live in, one remains ignorant of statistics at one’s own peril.

Categories: Uncategorized