Sigma Gamma

October 2012

The Future of Digital Video Ad Delivery

Imagine placing video ads through auction alongside niche-topic movies (such as political thrillers) on Google Play, in the same way we place video ads alongside topical content through auction today. During the next presidential election, that’s exactly how we’ll do it.

Oct 11, 2012

February 2012

Use of R as an embedded analytics engine → oracle.com

day0:

Oracle has made an interesting move by leveraging the open source language R to fill a gap in its portfolio when competing against the likes of IBM (which acquired SPSS) and ‘best of breed’ players like SAS.

Similar to SAS’s data manipulation extensions for common databases (which push data manipulation, filtering, etc. down to the database), though closer to Revolution Analytics’ integration with Netezza (now part of IBM), Oracle has converted base R scripts for execution within the database. Though this is really a first release (it lacks, for example, fine-grained workload management), it is a great move in burnishing Oracle’s credentials in a space that is gaining prominence in organizations.
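To make the “push down” idea concrete, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in database. It is not Oracle and not Oracle’s R integration; the `sales` table and its columns are hypothetical. The first function pulls every row to the client and aggregates there; the second sends the filter and aggregation to the database and gets back only the summary.

```python
import sqlite3
from collections import defaultdict


def make_db() -> sqlite3.Connection:
    # Hypothetical sales table in an in-memory SQLite database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [("east", 120.0), ("east", -5.0), ("west", 80.0),
            ("west", 40.0), ("north", 15.0)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def totals_client_side(conn: sqlite3.Connection) -> dict:
    # Pull every row out of the database, then filter and aggregate locally.
    totals = defaultdict(float)
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        if amount > 0:
            totals[region] += amount
    return dict(totals)


def totals_pushed_down(conn: sqlite3.Connection) -> dict:
    # Push the filter and the aggregation into the database; only the
    # per-region summary rows come back.
    query = ("SELECT region, SUM(amount) FROM sales "
             "WHERE amount > 0 GROUP BY region")
    return {region: total for region, total in conn.execute(query)}


conn = make_db()
assert totals_client_side(conn) == totals_pushed_down(conn)
```

Both functions return the same totals; the difference is where the work happens and how much data moves, which starts to matter once the table no longer fits comfortably on an analyst’s desktop.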

Some thoughts to ponder:

  1. Use of R eases adoption by the upcoming generation of statisticians and actuaries (generally referred to as ‘data scientists’). This is a great advantage in dislodging incumbents such as SAS and SPSS and increasing Oracle’s share of software license fees. In this model, greater use of R-based analytics in operations results in a larger number of processors (technically, cores) licensed from Oracle, while organizations remain free to use open source R for desktop-based modeling and development.
  2. This addresses the biggest shortcoming of R to date: its lack of scalability. Though there are commercial implementations such as those from Revolution Analytics, the ubiquity of Oracle in enterprise computing environments makes it much easier to scale ideas from the innovation lab into commercial reality.
  3. Though there is some integration with Oracle’s reporting suites, it is a bit of a kludge. Longer term, it would not be surprising to see the CRAN suite of visualizations implemented natively in OBIEE and leveraged by default when accessing R script output. Tighter integration would aid visualization development, which today resembles business presentation prep in the early 1990s (anyone remember Harvard Graphics on DOS?).
  4. Commercial support for R should aid its adoption and usage. The latest RedMonk language survey indicates it is the sixth fastest growing language (currently placed in the Tier 2 cluster along with MATLAB and Scala). It will be interesting to see whether growth accelerates enough for R to lead the pack among high-level languages for analysis and general computation.

Next on my wish list is an IDE that aids R scripting the way IntelliJ aids Java and Scala (RStudio is still very much first-gen, IMHO). Thoughts?

Feb 13, 2012

January 2012

chartsnthings: Before, During and After: The Richest 1 Percent → chartsnthings.tumblr.com

chartsnthings:

This weekend the NYT published Shaila Dewan and Robert Gebeloff’s story about the richest 1 percent of Americans (a more diverse bunch than you’d think). The graphics department published a lot of work in print and online to accompany the article. Online, there was an interactive map that…

Jan 26, 2012

November 2011

Organizational Data Posture

Most of what I know about databases and data management I taught myself while working with (or thinking about the problems of) political campaigns. The rest I learned in grad school while thinking about thinking about politics. Either way, most of my professional life has been spent organizing data and trying to figure out what, if anything, we could actually know from looking at data. If nothing else, I’ve figured out one thing for sure: organizing the data is inextricably linked to learning anything from it.

I have always been frustrated by the lack of careful thought about the relationship between how data is stored and modeled and its intended use. You can think of this relationship as the organizational posture of data—the state of data when you are not looking at it determines how easy it is to get down to understanding data when the time comes. Awareness of posture matters just as much for business as it does in academic research and political campaigns.

Most small organizations get their data posture wrong because getting data management right used to be expensive and difficult. Worse still, if you got it wrong, you could turn trivial access to mission-critical data into a nightmare. Those circumstances created some bad habits.

Now data management infrastructure has matured to the point where tools for managing and collecting transactional data are readily and cheaply available. Whether through browser-based database applications, e-commerce tools, point-of-sale customer identification, email marketing services, QR codes in printed communications, or mobile apps, we have plenty of ways to measure the mechanisms of audience engagement.

Likewise, tools and techniques for understanding and analyzing data have reached the point where high-value insights are available for the simple cost of collecting good data and managing it well. Business Intelligence (BI) tools—built on the premise that ad hoc data analysis should be easy to implement without hand-coding and complicated programming—have broken out of the Enterprise-only realm and are accessible to organizations of even modest scale.

Inexpensive transactional data management coupled with accessible BI tools may sound like data paradise, but there’s a missing piece of the puzzle: the tools for managing and collecting transactional data don’t necessarily store and present data optimally for analytical tools, and they probably shouldn’t.

In technical jargon, this is the difference between OLTP (online transaction processing) and OLAP (online analytical processing). Most small-to-medium organizations (especially undercapitalized ones) try to get by running analytical functions from within their transactional databases. This tactic is often manageable at small volumes, but it does not scale well. The tipping point comes when non-mission-critical data has enough analytic potential that it makes sense for the organization to start collecting it. That the analytical data is valuable is a given; the problems develop when the organization pools it with mission-critical transactional data in what is likely its only data management tool.

First, as live transactional data becomes a smaller and smaller proportion of the overall dataset, transactional performance suffers: processing power, or worse, users, must sort through irrelevant data to get to current customer information. Second, analytic power suffers because the organizational schema is built to optimize the insertion of individual transactions rather than the extraction and summarization of data for large-scale analytics.
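One common way to avoid pooling analytical data with live transactions is a periodic extract step that copies summarized transactional data into a separate, analysis-oriented store. The sketch below uses two in-memory SQLite databases as stand-ins; the table names (`orders`, `daily_sales`) and the daily grain are assumptions for illustration, not a prescription.

```python
import sqlite3


def make_oltp() -> sqlite3.Connection:
    # Transactional store: one narrow row per order, shaped for fast inserts.
    oltp = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "order_date TEXT, amount REAL)"
    )
    oltp.executemany(
        "INSERT INTO orders (customer_id, order_date, amount) VALUES (?, ?, ?)",
        [(1, "2011-11-01", 20.0), (2, "2011-11-01", 35.0),
         (1, "2011-11-02", 15.0), (3, "2011-11-02", 50.0)],
    )
    return oltp


def extract_daily_sales(oltp: sqlite3.Connection, olap: sqlite3.Connection) -> None:
    # Summarize in the source, then load the results into a separate
    # analytical store so reporting queries never touch the live tables.
    olap.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "order_date TEXT PRIMARY KEY, orders INTEGER, revenue REAL)"
    )
    summary = oltp.execute(
        "SELECT order_date, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY order_date"
    ).fetchall()
    # INSERT OR REPLACE makes re-running the extract idempotent per day.
    olap.executemany("INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?)", summary)
    olap.commit()


oltp = make_oltp()
olap = sqlite3.connect(":memory:")
extract_daily_sales(oltp, olap)
print(olap.execute("SELECT * FROM daily_sales ORDER BY order_date").fetchall())
# [('2011-11-01', 2, 55.0), ('2011-11-02', 2, 65.0)]
```

In practice the extract would run on a schedule and the analytical side would usually carry more dimensions (customer, product, channel), but the separation is the point: the transactional schema stays lean for inserts while the analytical one is shaped for summarization.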

The art of striking the right organizational data posture is to recognize when you are at the point of realizing serious value from data analytics. There are two things you want to avoid: (1) you don’t want to be playing catch-up later against competitors who got it right and (2) you don’t want to have to disentangle an intertwined mess of OLTP/OLAP databases. 

Nov 28, 2011

August 2011

Aug 04, 2011

July 2011

Using Statistical Inference in the Practice of Politics

Most people interested in politics have at least some exposure to statistical inference at a very basic level: at the very least, most would accept that it works and know of specialists more adept at using it than themselves. Polls are the stock-in-trade of most political news coverage, so every entry-level political junkie quickly develops some facility with the jargon needed for trading suppositions based on horse-race polling.

Of course, a little experience teaches the ever-so-slightly-seasoned political operative that the real value of polling lies not in measuring where we are but in guiding resource allocation. The most important inferences (educated guesses about facts that cannot be directly observed) a campaign can make concern the differential effect that issue messages, or various framings of contestable issues, have on persuadable voters; the key information here is which messages move just enough voters at the lowest cost. In a campaign with scarce resources (and I’ve never seen any other kind), investing in information that helps you spend your communications budget wisely is almost always a good choice.
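The “lowest cost” comparison reduces to simple arithmetic once a test has estimated how much each message moves persuadable voters. In the sketch below, the message names, per-contact costs, and lift figures are hypothetical placeholders, not polling results. Lift here means the estimated share of contacted voters who move toward the candidate, so the cost per persuaded voter is the cost per contact divided by the lift.

```python
from dataclasses import dataclass


@dataclass
class Message:
    name: str
    cost_per_contact: float  # dollars per voter reached (hypothetical)
    lift: float              # estimated share of contacts persuaded (hypothetical)

    def cost_per_persuaded_voter(self) -> float:
        if self.lift <= 0:
            return float("inf")  # a message that moves no one has no finite cost
        return self.cost_per_contact / self.lift


messages = [
    Message("jobs (mail)", cost_per_contact=0.60, lift=0.01),
    Message("schools (phone)", cost_per_contact=1.20, lift=0.04),
    Message("taxes (door)", cost_per_contact=4.00, lift=0.10),
]

# Rank messages by what it costs to actually move one voter.
for m in sorted(messages, key=Message.cost_per_persuaded_voter):
    print(f"{m.name}: ${m.cost_per_persuaded_voter():.2f} per persuaded voter")
```

With these made-up numbers the cheapest message per contact (mail) is the most expensive per persuaded voter, which is why the ranking that matters for the communications budget is the second one.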

Micro-targeting is a deeper use of statistical inference that is becoming increasingly available to campaigns. It started with the largest (nationwide, presidential) campaigns, but the method has become more accessible to smaller (statewide, nationally targeted congressional, and statewide-targeted legislative) campaigns. The practice is revolutionary for campaigns and works like this:

  1. Start with a voter file.
  2. Improve that voter file with commercially available data about consumer behavior (especially consumer behavior reasonably linked to political interests).
  3. Draw a large sample from the improved voter file.
  4. Poll the sample on a battery of political interest questions.
  5. Regress the poll results on the sample’s consumer data to derive a model that relates political interests to consumer behavior.
  6. Apply that model to the broader voter file to predict electoral behavior.
  7. Use the predictions to constrain persuasion and mobilization communication expenditures and reduce wasted communication.
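The chain above can be sketched end to end in a few dozen lines. This is a toy under loud assumptions: the voter file, its two consumer-data features, and the poll answers are all synthetic (generated with a fixed random seed), and a plain logistic regression fit by gradient descent stands in for whatever model a real micro-targeting shop would use. It fits on the polled sample, scores the full file, and keeps a middle band of predicted support as a hypothetical persuasion universe.

```python
import math
import random

random.seed(7)  # deterministic synthetic data


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic "improved" voter file: two hypothetical consumer features in [0, 1].
voter_file = [{"id": i, "x": [random.random(), random.random()]} for i in range(5000)]

# Hidden relationship used only to simulate poll answers (1 = supports).
TRUE_W, TRUE_B = [3.0, -2.0], -0.5


def simulate_poll(voter) -> int:
    p = sigmoid(sum(w * x for w, x in zip(TRUE_W, voter["x"])) + TRUE_B)
    return 1 if random.random() < p else 0


# Draw a large sample from the file and "poll" it.
sample = random.sample(voter_file, 800)
polled = [(v["x"], simulate_poll(v)) for v in sample]


def fit_logistic(data, lr=0.5, epochs=300):
    # Batch gradient descent on the average log-loss.
    w, b = [0.0] * len(data[0][0]), 0.0
    n = len(data)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


w, b = fit_logistic(polled)


def score(voter) -> float:
    # Apply the fitted model to any voter in the broader file.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, voter["x"])) + b)


# Keep the persuadable middle band; the 0.4-0.6 cutoffs are arbitrary.
persuasion_universe = [v["id"] for v in voter_file if 0.4 <= score(v) <= 0.6]
print(f"weights={[round(x, 2) for x in w]}, universe size={len(persuasion_universe)}")
```

A real campaign would set the cutoffs from budget and check the model against held-out poll respondents, both of which this toy skips.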

Now, the casual observer might notice that there are quite a few steps of inference and supposition in that chain of reasoning, but these are sensible when performed by someone who fully understands the statistical pitfalls.

A deeper way to apply statistical inference to improve targeting and resource allocation is to combine micro-targeting predictions (and/or traditional polling results) with tactical field campaign results to further pare down target universes or to highlight mobilization segments.
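A minimal sketch of that combination follows, assuming a five-point field support code (1 = strong support through 5 = strong opposition) and a modeled support score between 0 and 1 for each voter; both codings and all cutoffs are hypothetical. Hard field IDs override the model, and only voters the field has not resolved stay in the persuasion universe.

```python
from typing import Optional


def classify(model_score: float, field_support: Optional[int]) -> str:
    """Assign a voter to a universe using field IDs first, model second."""
    if field_support is not None:
        if field_support <= 2:
            return "mobilize"  # field-identified supporter: turnout work only
        if field_support >= 4:
            return "exclude"   # field-identified opponent: spend nothing
        return "persuade"      # field-identified undecided (code 3)
    # No field contact yet: fall back on the micro-targeting prediction.
    if model_score >= 0.7:
        return "mobilize"
    if model_score <= 0.3:
        return "exclude"
    return "persuade"


voters = [
    ("A", 0.55, 1),     # model says toss-up, field says strong support
    ("B", 0.80, None),  # uncontacted, modeled supporter
    ("C", 0.50, None),  # uncontacted, modeled toss-up
    ("D", 0.65, 5),     # model leans support, field says strong opposition
]
for voter_id, score, field_id in voters:
    print(voter_id, classify(score, field_id))
# A mobilize
# B mobilize
# C persuade
# D exclude
```

Letting field IDs win is a design choice, on the reasoning that a direct conversation is stronger evidence than a model score; a campaign could just as well blend the two.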

Micro-targeting is revolutionary as a first application of large datasets to inferences about electoral behavior. But the state of the art has already begun to pass this technique by. The obvious innovation of combining micro-targeting predictions with tactical field data suggests a more powerful methodological move. Field campaign data represents only the tip of the iceberg when it comes to measurable campaign activity. Generally, field efforts in well-organized campaigns collect two dimensions of data (turnout intent and candidate preference), each coded on a five- to seven-point scale. The development of statewide shared voter files through browser-based data applications has fostered the spread of best practices, improved coding, and made it possible to collect, store, and accumulate such data across election cycles.
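Accumulating field IDs across cycles mostly requires a consistent record shape. The sketch below assumes a five-point coding for both dimensions and uses hypothetical field names; it validates codes on entry and keeps each voter’s full history while making the most recent ID easy to retrieve.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SCALE = range(1, 6)  # assumed five-point scale: 1 = most favorable, 5 = least


@dataclass(frozen=True)
class FieldId:
    voter_id: str
    cycle: int    # election year of the contact
    support: int  # candidate preference code
    turnout: int  # turnout-intent code

    def __post_init__(self):
        # Reject codes outside the agreed scale at entry time.
        if self.support not in SCALE or self.turnout not in SCALE:
            raise ValueError(f"codes must be in 1..5, got {self.support}, {self.turnout}")


@dataclass
class VoterHistory:
    ids: List[FieldId] = field(default_factory=list)

    def latest(self) -> FieldId:
        return max(self.ids, key=lambda r: r.cycle)


def accumulate(records: List[FieldId]) -> Dict[str, VoterHistory]:
    # Group every ID by voter so history carries forward across cycles.
    histories: Dict[str, VoterHistory] = {}
    for r in records:
        histories.setdefault(r.voter_id, VoterHistory()).ids.append(r)
    return histories


records = [
    FieldId("V1", 2008, support=3, turnout=2),
    FieldId("V1", 2010, support=2, turnout=1),
    FieldId("V2", 2010, support=5, turnout=4),
]
histories = accumulate(records)
print(histories["V1"].latest())  # FieldId(voter_id='V1', cycle=2010, support=2, turnout=1)
print(len(histories["V1"].ids))  # 2
```

Validating at entry is what makes cross-cycle accumulation trustworthy: a code that drifts from the shared scale in one cycle would otherwise silently corrupt every later comparison.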

But vote choice and vote intent (while highly instrumental variables) offer only a very narrow view of political behavior. Given the wide array of tools available to deliver calls-to-action across a number of media, and the corresponding tools to measure response, we have the capacity to build call-and-response filters that define multi-dimensional segmentation for political audiences. This gives us an opportunity, if we will take it, to walk non-voters through a succession of calls-to-action and measured, iterative communication to determine which paths lead which sub-segments to become voters, to become our voters, or to adopt a shared identity.
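At its simplest, measuring which paths lead which sub-segments to convert is a conversion rate tallied by segment and path. The segment names, path labels, and response records below are hypothetical. A real program would also need control groups and enough volume per cell before trusting any difference.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each record: (segment, sequence of calls-to-action, converted?)
Response = Tuple[str, Tuple[str, ...], bool]


def conversion_rates(records: List[Response]) -> Dict[Tuple[str, Tuple[str, ...]], float]:
    counts = defaultdict(lambda: [0, 0])  # (segment, path) -> [conversions, attempts]
    for segment, path, converted in records:
        cell = counts[(segment, path)]
        cell[0] += int(converted)
        cell[1] += 1
    return {key: conv / n for key, (conv, n) in counts.items()}


def best_path_by_segment(records: List[Response]) -> Dict[str, Tuple[Tuple[str, ...], float]]:
    # Keep the highest-converting path seen for each segment.
    best: Dict[str, Tuple[Tuple[str, ...], float]] = {}
    for (segment, path), rate in conversion_rates(records).items():
        if segment not in best or rate > best[segment][1]:
            best[segment] = (path, rate)
    return best


records = [
    ("young renters", ("petition", "event"), True),
    ("young renters", ("petition", "event"), False),
    ("young renters", ("email", "volunteer"), False),
    ("young renters", ("email", "volunteer"), False),
    ("new citizens", ("event", "register"), True),
    ("new citizens", ("event", "register"), True),
    ("new citizens", ("email", "register"), False),
]
for segment, (path, rate) in best_path_by_segment(records).items():
    print(f"{segment}: {' -> '.join(path)} ({rate:.0%})")
# young renters: petition -> event (50%)
# new citizens: event -> register (100%)
```

Each round of communication would append new records, so the best path per segment can shift as evidence accumulates, which is the iterative loop the paragraph describes.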

If you are interested to learn how to implement these techniques in your campaign, we’d love to hear from you.

Jul 21, 2011