Thoughts on the INFORMS Business Analytics Conference

This post, from DC2 President Harlan Harris, was originally published on his blog. Harlan was on the board of WINFORMS, the local chapter of INFORMS, the operations research professional society, from 2012 until this summer.

Earlier this year, I attended the INFORMS Conference on Business Analytics & Operations Research, in Boston. I was asked beforehand if I wanted to be a conference blogger, and for some reason I said I would. This meant that I was able to publish posts on the conference's WordPress website, and was also obliged to do so!

Here are the five posts that I wrote, along with an excerpt from each. Please click through to read the full pieces:

Operations Research, from the point of view of Data Science

  • more insight, less action — deliverables tend towards predictions and storytelling, versus formal optimization
  • more openness, less big iron — open source software leads to a low-cost, highly flexible approach
  • more scruffy, less neat — data science technologies often come from black-box statistical models, vs. domain-based theory
  • more velocity, smaller projects — a hundred $10K projects beats one $1M project
  • more science, less engineering — both practitioners and methods have different backgrounds
  • more hipsters, less suits — stronger connections to the tech industry than to the boardroom
  • more rockstars, less teams — one person can now (roughly) do everything, in simple cases, for better or worse

What is a “Data Product”?

DJ Patil says “a data product is a product that facilitates an end goal through the use of data.” So, it’s not just an analysis, or a recommendation to executives, or an insight that leads to an improvement to a business process. It’s a visible component of a system. LinkedIn’s People You May Know is viewed by many millions of customers, and it’s based on the complex interactions of the customers themselves.

Healthcare (and not Education) at INFORMS Analytics

[A]s DC residents, we often hear of “Healthcare and Education” as a linked pair of industries. Both are systems focused on social good, with intertwined government, nonprofit, and for-profit entities, highly distributed management, and (reportedly) huge opportunities for improvement. Aside from MIT Leaders for Global Operations winning the Smith Prize (and a number of shoutouts to academic partners and mentors), there was not a peep from the education sector at tonight’s awards ceremony. Is education, and particularly K-12 and postsecondary education, not amenable to OR techniques or solutions?

What’s Changed at the Practice/Analytics Conference?

In 2011, almost every talk seemed to me to be from a Fortune 500 company, or a large nonprofit, or a consulting firm advising a Fortune 500 company or a large nonprofit. Entrepreneurship around analytics was barely to be seen. This year, there are at least a few talks about Hadoop and iPhone apps and more. Has the cost of deploying advanced analytics substantially dropped?

Why OR/Analytics People Need to Know About Database Technology

It’s worthwhile learning a bit about databases, even if you have no decision-making authority in your organization, and don’t feel like becoming a database administrator (good call). But by getting involved early in the data-collection process, when IT folks are sitting around a table arguing about platform questions, you can get a word in occasionally about the things that matter for analytics — collecting all the data, storing it in a way friendly to later analytics, and so forth.

All in all, I enjoyed blogging the conference, and recommend the practice to others! It's a great way to organize your thoughts and to summarize and synthesize your experiences.

Elements of an Analytics "Education"

This is a guest post by Wen Phan, who will be completing a Master of Science in Business at George Washington University (GWU) School of Business. Wen is the recipient of the GWU Business Analytics Award for Excellence and Chair of the Business Analytics Symposium, a full-day symposium on business analytics on Friday, May 30th; all are invited to attend. Follow Wen on Twitter @wenphan.

GWU Business Analytics Symposium, 5/30/14, Marvin Center

We have read the infamous McKinsey report. There is the estimated 140,000- to 190,000-person shortage of deep analytic talent by 2018, and an even bigger need - 1.5 million professionals - for those who can manage and consume analytical content. Justin Timberlake brought sexy back in 2006, but it’ll be the data scientist that will bring sexy to the 21st century. While data scientists are arguably the poster child of this most recent data hype, savvy data professionals are really required across many levels and functions of an organization. Consequently, a number of new and specialized advanced degree programs in data and analytics have emerged over the past several years – many of which are not housed in the traditional analytical departments, such as statistics, computer science, or math. These programs are becoming increasingly competitive, and graduates of these programs are skilled and in demand. For many just completing their undergraduate degrees or with just a few years of experience, these data degrees have become a viable option for developing skills and connections in a burgeoning industry. For others with several years of experience in adjacent fields, such as myself, such educational opportunities provide a way to help with career transitions and advancement.

I came back to school after having worked for a little over a decade. My undergraduate degree is in electrical engineering, and at one point in my career I worked on some of the most advanced microchips in the world. But I also have experience in operations, software engineering, product management, and marketing. Through it all, I have learned about the art and science of designing and delivering technology and products from the ground up, from both technical and business perspectives. I left a comfortable, well-paid job and returned to school to leverage my technical and business experience in new ways, and to gain new skills and experiences that would increase my ability to make an impact in organizations.

There are many opinions regarding what is important in an analytics education, and just as many options for pursuing one, each with its own merits. Given that, I do believe there are a few competencies that should be developed no matter what educational path one takes, whether it is graduate school, MOOCs, or self-learning. What I offer here are some personal thoughts on these competencies, based on my own background, previous professional experiences, and recent educational endeavors in analytics and, more broadly, in using technology and problem solving to advance organizational goals.

Not just stats.

For many, analytics is about statistics, and a data degree is just slightly different from a statistics degree. There is no doubt that statistics plays a major role in analytics, but it is still just one of the technical skills involved. If you handle data of any kind directly and seriously, it will quickly become obvious that programming chops are almost a must. For more customized and sophisticated processing, substantial computer science knowledge (data structures, algorithms, and design patterns) will be required. Of course, even this idea has become fairly mainstream and is nicely captured by Drew Conway’s Data Science Venn Diagram. Other areas whose connection to data competency is less obvious are data storage theory and implementation (e.g., relational databases and data warehouses), operations research, and decision analysis. The computer science and statistics portions really focus on the sexy predictive modeling aspects of data. That said, knowing how to effectively collect and store data upstream is tremendously valuable. After all, data often outlives any single analysis or model, and data begets more data (the idea behind "data gravity"). Many of the underlying statistical methods, such as maximum likelihood estimation (MLE), neural networks, and support vector machines, rely on the principles and techniques of operations research, in particular optimization. Further, operations research offers a prescriptive perspective on analytics. Last, it is obvious that analytics can help identify trends, understand customers, and forecast the future. However, in and of themselves those activities do not add any value; it is the decisions made and the actions taken as a result that deliver value. Often, these decisions must be made in the face of substantial uncertainty and risk, hence the importance of critical decision analysis.
The level of expertise required in various technical domains must align with your professional goals, but a basic knowledge of the above should give you adequate fluency across analytics activities.
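The point that maximum likelihood estimation is, at heart, an optimization problem can be made concrete with a toy case. The sketch below is a minimal illustration, assuming i.i.d. exponentially distributed observations and made-up waiting times; the function names are hypothetical. It maximizes the log-likelihood by plain gradient ascent on the log of the rate, and because this particular model also has a closed-form answer (the reciprocal of the sample mean), the result is easy to check.

```python
import math


def exponential_log_likelihood(rate, data):
    """Log-likelihood of i.i.d. exponential observations for a given rate."""
    return len(data) * math.log(rate) - rate * sum(data)


def fit_exponential_rate(data, step=0.1, tol=1e-10, max_iter=10_000):
    """Estimate the exponential rate by gradient ascent on theta = log(rate).

    Optimizing over log(rate) keeps the rate positive without explicit constraints.
    """
    mean = sum(data) / len(data)
    theta = 0.0  # start at rate = 1
    for _ in range(max_iter):
        # Gradient of the average log-likelihood with respect to theta.
        grad = 1.0 - mean * math.exp(theta)
        if abs(grad) < tol:
            break
        theta += step * grad
    return math.exp(theta)


# Made-up waiting times, for illustration only.
waiting_times = [0.8, 2.1, 0.3, 1.7, 1.1]
rate_hat = fit_exponential_rate(waiting_times)
closed_form = len(waiting_times) / sum(waiting_times)
print(f"gradient ascent: {rate_hat:.6f}, closed form: {closed_form:.6f}")
```

Most MLE problems met in practice (logistic regression, for example) have no closed form, which is exactly where optimization techniques earn their keep.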


I consider analytics an applied discipline, much as engineering is. Engineering applies math and science to solve problems, and analytics does the same. Applied fields are where the rubber of theory meets the road of reality. Data is not always normally distributed. In fact, data is not always valid or even consistent. Formal education offers rigor in developing strong foundational knowledge and skills. However, the skills to deal with reality are just as important. It is no myth that roughly 80% of analytics is just pre-processing the data; I call it dealing with reality. It is important to understand the theory behind the models, and frankly, it’s pretty fun to indulge in the intricacies of machine learning and convex optimization. In the end, though, those things have been made relatively straightforward to implement with computers. What hasn’t (yet) been nicely encapsulated in software is the judgment and skill required to handle the ugliness of real-world data. You know what else is reality? Teammates, communication, and project management constraints. All this is to say that much of an analytics education lies outside the theory, and I would argue that the success of many analytics endeavors is limited not by theoretical knowledge, but by the practicalities of implementation, whether with data, machines, or people. My personal recommendation to aspiring or budding data geeks is to cut your teeth as much as possible on dealing with reality. Do projects. As many of them as possible. With real data. And real stakeholders. And, for those of you manager types, give it a try; it’ll give you the empathy and perspective to work effectively with hardcore data scientists and manage the analytics process.

Working with complexity and ambiguity.

The funny thing about data is that you have problems both when you have too little of it and when you have too much. With too little data, you are often making inferences and assessing your confidence in those inferences. With too much data, you are trying not to get confused. In the best-case scenario, your objectives in mining the data are straightforward and crystal clear. Often, however, that is not the case, and exploration is required. Navigating this process of exploration and value discovery can be complex and ambiguous. There are the questions of “where do I start?” and “how far do I go?” This really speaks to the art of working with data. You pick up best practices along the way and develop some of your own. Initial exploration tactics may be as simple as profiling all attributes and computing correlations among a few of them, seeing if anything looks promising or sticks. The process is further complicated by “big data”, where computation time is non-negligible and lengthens the feedback loop during any kind of exploratory data analysis.
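As a minimal sketch of the first-pass exploration described above, the Python below profiles every attribute of a small table (count, missing values, distinct values, and range and mean for numeric columns) and then computes pairwise Pearson correlations on complete cases. The rows, column names, and helper functions are hypothetical; in practice a data-frame library would do the same job, but a standard-library version keeps each step visible.

```python
import math
from itertools import combinations


def profile(rows):
    """Summarize each attribute: present/missing counts, distinct values, numeric range and mean."""
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        stats = {
            "count": len(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        if present and all(isinstance(v, (int, float)) for v in present):
            stats.update(min=min(present), max=max(present), mean=sum(present) / len(present))
        summary[col] = stats
    return summary


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")


def pairwise_correlations(rows, columns, min_pairs=3):
    """Correlate each pair of columns using only rows where both values are present."""
    result = {}
    for a, b in combinations(columns, 2):
        pairs = [(r[a], r[b]) for r in rows if r.get(a) is not None and r.get(b) is not None]
        if len(pairs) >= min_pairs:
            xs, ys = zip(*pairs)
            result[(a, b)] = pearson(xs, ys)
    return result


# Hypothetical customer records with a few gaps.
rows = [
    {"age": 34, "visits": 5, "spend": 120.0, "region": "DC"},
    {"age": 28, "visits": 2, "spend": 40.0, "region": "MD"},
    {"age": 45, "visits": None, "spend": 200.0, "region": "VA"},
    {"age": 39, "visits": 7, "spend": 180.0, "region": "DC"},
    {"age": 23, "visits": 1, "spend": None, "region": "DC"},
]
summary = profile(rows)
numeric = [col for col, stats in summary.items() if "mean" in stats]
correlations = pairwise_correlations(rows, numeric)
for col, stats in summary.items():
    print(col, stats)
for (a, b), r in sorted(correlations.items(), key=lambda item: -abs(item[1])):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Sorting the correlations by magnitude is one simple way to surface the pairs that might "stick"; with only a handful of complete rows, of course, none of these coefficients means much on its own.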

You can search the web for all kinds of advice on skills to develop for a data career. The few tidbits I include above are just my perspectives on some of the higher-order bits in developing solid data skills. Advanced degree programs offer compelling environments to build these skills and gain exposure in an efficient way, including a professional network, resources, and opportunities. However, they are not the only way. As with all professional endeavors, one needs to assess his or her goals, background, and situation to ultimately determine the educational path that makes sense.


[1] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, Angela Hung Byers. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute. June 2011.

[2] Thomas H. Davenport, D.J. Patil. “Data Scientist: The Sexiest Job of the 21st Century.” Harvard Business Review. October 2012.

[3] Quentin Hardy. “What Are the Odds That Stats Would Be This Popular?” The New York Times. January 26, 2012.

[4] Patrick Thibodeau. “Career alert: A Master of analytics degree is the ticket – if you can get into class.” Computerworld. April 24, 2014.

[5] Drew Conway. “The Data Science Venn Diagram.” 

[6] Kristin P. Bennett, Emilio Parrado-Hernandez. “The Interplay of Optimization and Machine Learning Research.” Journal of Machine Learning Research 7. 2006. 

[7] Mousumi Ghosh. “7 Key Skills of Effective Data Scientists.” Data Science Central. March 14, 2014.

[8] Anmol Rajpurohit. “Is Data Scientist the right career path for you? Candid advice.” KDnuggets. March 27, 2014. 


Weekly Round-Up: Hadoop, Big Data vs. Analytics, Process Management, and Palantir

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Hadoop to business process management. In this week's round-up:

  • To Hadoop or Not to Hadoop?
  • What’s the Difference Between Big Data and Business Analytics?
  • What Big Data Means to BPM
  • How A Deviant Philosopher Built Palantir

To Hadoop or Not to Hadoop?

Our first piece this week is an interesting blog post about what sorts of data operations Hadoop is and isn't good for. The post can serve as a useful guide when trying to figure out whether or not you should use Hadoop to do what you're thinking of doing with your data. It is organized into 5 categories of things you should consider and contains a series of questions you can ask yourself for each of the categories to help with your decision-making.

What’s the Difference Between Big Data and Business Analytics?

This is an excellent post on Cathy O'Neil's Mathbabe blog about how she distinguishes big data from business analytics. Cathy argues that what most people consider big data is really business analytics (on arguably large data sets) and that big data, in her opinion, consists of automated intelligent systems that algorithmically know what to do and need very little human intervention. She goes into more detail about the differences between the two, including some examples to drive home her point.

What Big Data Means to BPM

Continuing on the subject of intelligent systems performing business processes, our third piece this week is a Data Informed article about big data's effect on business process management. The article is an interview with Nathaniel Palmer, a BPM veteran practitioner and author. In the interview, Palmer answers questions about what kinds of trends are emerging in business process management, how big data is affecting its practices, and what changes are being brought about because of it.

How A Deviant Philosopher Built Palantir

Our last piece this week is a Forbes article about Palantir, an analytics software company that works with federal intelligence agencies and is funded by In-Q-Tel, the CIA's investment fund. The article profiles the company's CEO, describes what the company does and whom it does it for, and delves into some of Palantir's history. Overall, the article provides an interesting look at a fascinating company.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Read Our Other Round-Ups

Weekly Round-Up: WibiData, Big Data Trends, Analytics Processes, and Human Trafficking

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Big Data trends to using data to fight human trafficking. In this week's round-up:

  • WibiData Gets $15M to Help It Become the Hadoop Application Company
  • 7 Big Data Trends That Will Impact Your Business
  • Want Better Analytics? Fix Your Processes
  • How Big Data is Being Used to Target Human Trafficking

WibiData Gets $15M to Help It Become the Hadoop Application Company

It was announced this week that Cloudera co-founder Christophe Bisciglia's new company, WibiData, has raised $15 million in a Series B round of financing. WibiData is looking to become a dominant player in the market by selling software that lets companies build consumer-facing applications on Hadoop. This article has additional details about the company and what they are trying to do.

7 Big Data Trends That Will Impact Your Business

We're all interested in seeing what the future holds for data science and Big Data, and this article identifies 7 trends that the author thinks will continue to develop in the years ahead. Some general themes of the trends listed include predictions about platforms, structure, and programming languages.

Want Better Analytics? Fix Your Processes

In order to succeed in running a data-driven organization, you must have the proper analytical business processes in place so that any insights derived from your efforts can be applied to improving operations. In this article, the author proposes 5 principles to ensure analytics are used correctly and deliver the results the organization wants.

How Big Data is Being Used to Target Human Trafficking

Our last article this week is about Google's recent announcement that it will partner with other organizations to leverage data analytics in the fight against human trafficking. Part of the effort will involve aggregating previously dispersed data, and another part will consist of developing algorithms to identify patterns and better predict trafficking trends. This article lists additional details about the project.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Weekly Round-Up: Big Data Value, Education, Social Data Analysis, and Saving the Planet

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Big Data's impact on education to using data to reduce global violence. In this week's round-up:

  • The Value of Big Data Isn't the Data
  • Big Data Will Revolutionize Learning
  • Data Analysis Should Be a Social Event
  • Using Big Data to Save the Planet

The Value of Big Data Isn't the Data

This is a Harvard Business Review blog post by Kris Hammond, CTO of Narrative Science and a Northwestern faculty member, about where he believes the value of Big Data lies. Hammond proposes that the value is in getting machines to perform the data analysis we need and to communicate their findings in an intuitive way. In the post, he describes in more detail why he believes this is so valuable and provides explanations and diagrams outlining the steps that can be taken to put these processes in place.

Big Data Will Revolutionize Learning

This interesting Smart Data Collective article is about how technology now allows us to capture information about virtually everything that happens in education, and what this means for the future of education. The possibilities include customizing content for individual students, reducing drop-out rates, and enhancing the overall learning experience, all resulting in improved student outcomes. The article talks a little about each of these and describes how they are, and will continue to be, implemented.

Data Analysis Should Be a Social Event

This is another interesting HBR article, advocating a more social approach to solving data analysis problems. The authors urge us to use an approach familiar to those who have attended data dives or hackathons: get a group of people with different perspectives together to brainstorm ideas about how best to solve the problem at hand. The article points out that this approach doesn't just work well at hackathons; it has also been used with great success at companies.

Using Big Data to Save the Planet

Our final article this week is a Slashdot piece about how the U.S. State Department is partnering with groups from around the world and using data analytics to help reduce violence in countries where it is a major problem. According to the article, they are using an analytics tool named Senturion to track information from social networks, economic data, and other sources, producing output that can help determine what types of resources are needed on the ground in those troubled countries. The article mentions some of the countries where this analytics system is helping to identify conflict trends and also gives examples of specific initiatives it is supporting.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Guys Vs Girls - Check out the New Data Blog from Local Startup Hinge

Today we are reblogging (with permission, of course) a new local data blog, Hingesights, from local dating startup Hinge. In their own words:

Hinge is a better way to meet dates. Simply rate your interest in your friends' Facebook friends, then Hinge lets you know when you're both interested. It’s a fun way to see who’s out there, and to connect with the friends of friends you may never have met otherwise. Matches always share a social connection, so you’re always meeting through friends.

This week we look at the difference in how men and women use Hinge, and how it contributes to the rarely understood mating rituals of one of the world’s most mysterious and interesting specimens: the Single Washingtonian.

Hingesights: Guys vs. Girls!

Let the battle of the sexes begin! Our nerds have been at it again, and it turns out guys and girls do not play Hinge the same way. Ladder theorists and evolutionary biologists step aside - Hinge is here to shed a little light on the great mysteries of courtship.

Who makes the cut?

It’s common knowledge that girls are pickier than guys. If you want proof, we suggest taking a girl and a guy to a restaurant and seeing which one asks to completely restructure their salad. The big question is: how much pickier is the fairer sex? All in all, girls favorite only 16% of their daily potential matches. Remember, ladies, you don’t have to hoard your favorites. They’re unlimited. It doesn’t mean the guy is your soulmate, just that you’d be open to starting a conversation.

On the other side, guys favorite a solid 34% of their daily potentials. Chivalry, perhaps? Equal-opportunity daters? Maybe it’s video game tendencies, and their thumbs just instinctively favorite girls because of the relative location of the buttons on an Xbox controller.

Whatever the reason, our data confirms that girls are pretty darn choosy with their potential dates, and guys are a bit more “open-minded.” Or whatever word you’d like to use there.

The Clooney Effect

Another trend we noticed is that as age increases, the likelihood of favoriting potentials slightly increases for women, but actually decreases for men. Are men just losing their motivation, or are they suddenly slammed with dating options as they enter their Clooney years? Either way, we're certainly glad to see women pursuing a solid dating life, regardless of age. Get it, girls.

Our closing takeaways? Both sides need to keep saving favorites! It’s good for you, regardless of where you are on the ladder of life. And ladies of Hinge: live a little! You never know-- your next spontaneous favorite could be your next great date.



Big Data Week 2013

There's a lot more to the data, statistics, and analytics community than "Big Data." I've argued that focusing on the scale of certain modern data sets can distract from key innovations in statistical and machine learning techniques, visualization and exploration tools, and predictive applications that have revolutionized all of our work in recent years. But there's no question that the ability to collect, manage, explore, and productize terabyte- and petabyte-scale data sets has been revolutionary for those industries that actually have that quantity of information, and has driven broad interest in the value of data and statistical modeling. So, we at Data Community DC are very pleased to be the local organizing partners for Big Data Week 2013.

Big Data Week is a global, loosely coordinated festival of events, April 22nd-28th, organized by folks in London, with participants in at least 18 cities, including Washington, DC. Locally, we at DC2 decided to use this opportunity as a great excuse to work more closely with other planners of data-related events in the region. In addition to events run by DC2-affiliated Meetups (Data Business DC, Data Science DC, Data Science MD, Data Visualization DC, and R Users DC), we're very pleased to be coordinating with Big Data DC, Open Analytics DC, WINFORMS, and INFORMS MD. Together, we'll be bringing you about ten events over eleven days, all themed in some way around big data!

Here's what you need to know:

The Weird Dynamics of Viral Marketing in a Growing Market

This is the third part of a four-part series of blog posts on viral marketing. In part 1, I discuss the faulty assumptions in the current models of viral marketing. In part 2, I present a better mathematical model of viral marketing. In part 4, I’ll discuss the effects of returning customers.

If the market is static, strong viral sharing can lead to rapid early growth, but once a peak is reached, the number of customers falls to zero unless your product has 100% retention. So how can a business use viral marketing to grow their customer base for the long term? Which factors (i.e. sharing rate, churn, market size, market growth) matter? In this blog post, I’ll adapt the mathematical model of viral marketing from part 2 of this series to examine how changing market size affects viral growth.

What is “the Market”:

What comprises “the market” depends on the nature of the product. If it is an iPhone game, then the market is people who own iPhones and play games on them. If it is a YouTube video of a pug climbing the stairs like a boss, then the market is people with internet-connected devices who find that kind of video humorous. For the first example, entering the market could mean that you’ve bought your first iPhone, or started playing games on it. Leaving the market could mean that you’ve stopped playing games or swapped your iPhone for a different type of device. Note that leaving the market is different from becoming a former customer. In the case of an iPhone game, becoming a former customer means that you’ve stopped using the game and perhaps removed it from your phone; you are still part of the market for iPhone games.

The Model:

If new potential customers can be added to the market and members of any subpopulation can leave the market, the parameters are now:

  • [latex]\beta[/latex] - The infection rate (sharing rate)
  • [latex]\gamma[/latex] - The recovery rate (churn rate)
  • [latex]\alpha[/latex] - The birth rate (rate at which potential customers enter the market)
  • [latex]\mu[/latex] - The death rate (rate at which people leave the market)

In part 2, I described how the population transitions from potential customers to current customers to former customers. Here, I’ll add terms to the differential equations to model how people are entering and leaving the market. Note that the total market size, [latex]N = S + I + R[/latex], is no longer constant, but will grow or shrink with time.

Assume that the population entering the market joins as part of the ‘potential customer’ population. The number of new potential customers joining the market per unit time is [latex]\alpha (S + I + R)[/latex]. The populations leaving the market can come from any of the three subpopulations. The numbers of people leaving the potential customer, current customer, and former customer groups per unit time are [latex]\mu S[/latex], [latex]\mu I[/latex], and [latex]\mu R[/latex], respectively.

The equations become:

  • [latex]dS/dt = -\beta SI + \alpha (S + I + R) - \mu S[/latex]
  • [latex]dI/dt = \beta SI - \gamma I - \mu I[/latex]
  • [latex]dR/dt = \gamma I - \mu R[/latex]
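Transcribed directly into Python, the three equations above become a single right-hand-side function. This is a sketch rather than the author's own code; the function name and the example state are illustrative assumptions. Summing the three derivatives for any state shows that every sharing and churn term cancels, leaving only market entry minus market exit.

```python
def viral_market_rhs(S, I, R, beta, gamma, alpha, mu):
    """Return (dS/dt, dI/dt, dR/dt) for the viral marketing model with market entry and exit."""
    N = S + I + R
    dS = -beta * S * I + alpha * N - mu * S  # sharing moves S to I; newcomers join S
    dI = beta * S * I - gamma * I - mu * I   # new customers minus churn and market exit
    dR = gamma * I - mu * R                  # churned customers, minus market exit
    return dS, dI, dR


# Illustrative state and parameter values (assumptions chosen for the example).
S, I, R = 900_000.0, 50_000.0, 50_000.0
beta, gamma, alpha, mu = 1e-6, 0.7, 0.014, 0.008
dS, dI, dR = viral_market_rhs(S, I, R, beta, gamma, alpha, mu)
N = S + I + R
print(dS + dI + dR, (alpha - mu) * N)  # the two values agree up to floating-point rounding
```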

Examining the Equations:

Since [latex]N = S + I + R[/latex],

[latex]dN/dt = (\alpha - \mu )N[/latex]

which is an equation that we can solve. Thus

[latex]N(t) = N(0) * e^{(\alpha - \mu ) t}[/latex].

The market grows exponentially if [latex]\alpha > \mu[/latex] and shrinks exponentially if [latex]\alpha < \mu[/latex]. If [latex]\alpha = \mu[/latex], the total market size stays the same with people entering and leaving the market at equal rates.
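The closed-form market size is easy to evaluate directly. The sketch below, with hypothetical helper names, classifies the market from the sign of [latex]\alpha - \mu[/latex] and computes [latex]N(t)[/latex] and, for a growing market, its doubling time [latex]\ln 2 / (\alpha - \mu)[/latex]. The example rates are the ones used in the simulations later in this post (14 and 8 per 1000 people), under the assumption, not stated in the post, that they are per-day rates like the churn rate.

```python
import math


def market_size(t, N0, alpha, mu):
    """Closed-form market size N(t) = N(0) * exp((alpha - mu) * t)."""
    return N0 * math.exp((alpha - mu) * t)


def market_regime(alpha, mu):
    """Describe the market from the sign of the net growth rate alpha - mu."""
    if alpha > mu:
        return "growing"
    if alpha < mu:
        return "shrinking"
    return "constant"


N0 = 1_000_000
alpha, mu = 0.014, 0.008  # assumed per-day entry and exit rates
print(market_regime(alpha, mu))
for day in (0, 30, 365):
    print(day, round(market_size(day, N0, alpha, mu)))
if alpha > mu:
    print("doubling time (days):", math.log(2) / (alpha - mu))
```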

We can learn some more about the dynamics by examining where [latex]dS/dt[/latex] and [latex]dI/dt[/latex] are zero (that is, where the potential customer base and current customer base don't change):

[latex]dI/dt = 0[/latex] if [latex]S = \frac{\gamma + \mu}{\beta}[/latex] or [latex]I = 0[/latex]

[latex]dS/dt = 0[/latex] if [latex]S = \frac{\alpha N(t)}{\beta I + \mu}[/latex]
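The two nullcline formulas can be checked by substituting them back into the rate equations. In this sketch (hypothetical function names and illustrative parameter values), the market size is passed in as a plain number, since in the [latex]S-I[/latex] plane it acts as a slowly moving parameter. Both derivatives vanish on their respective curves, and doubling the market size doubles the [latex]dS/dt = 0[/latex] curve.

```python
def dS_dt(S, I, N, beta, alpha, mu):
    """Rate of change of potential customers."""
    return -beta * S * I + alpha * N - mu * S


def dI_dt(S, I, beta, gamma, mu):
    """Rate of change of current customers."""
    return beta * S * I - gamma * I - mu * I


def i_nullcline(beta, gamma, mu):
    """Value of S on the vertical line where dI/dt = 0 (for I > 0)."""
    return (gamma + mu) / beta


def s_nullcline(I, N, beta, alpha, mu):
    """Value of S on the curve where dS/dt = 0, for a given I and market size N."""
    return alpha * N / (beta * I + mu)


beta, gamma, alpha, mu = 1e-6, 0.7, 0.014, 0.008  # illustrative values
N = 1_000_000
for I in (10.0, 1_000.0, 50_000.0):
    S_i = i_nullcline(beta, gamma, mu)
    S_s = s_nullcline(I, N, beta, alpha, mu)
    print(I, dI_dt(S_i, I, beta, gamma, mu), dS_dt(S_s, I, N, beta, alpha, mu))
```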

Plotting these lines in the [latex]S[/latex] vs [latex]I[/latex] plane divides it into up to four regions, depending on the relative values of the parameters:

The blue lines represent where [latex]dI/dt = 0[/latex]. The green line represents where [latex]dS/dt = 0[/latex]. Note that since the [latex]dS/dt = 0[/latex] curve is proportional to [latex]N(t)[/latex], the green line will rise or fall with time depending on whether the market is growing or shrinking. The red arrows indicate the general direction in which the [latex]S-I[/latex] trajectory moves while it is in each region. This suggests that if

[latex]S(0) > \frac{\gamma + \mu}{\beta}[/latex]

then the number of customers grows, initially. In real terms, that means that initial growth depends on having a large enough sharing rate and number of potential customers compared to the churn and "death" rates.

If

[latex]\frac{\alpha N(0)}{\mu} > \frac{\gamma + \mu}{\beta}[/latex]

that is, if the market size, growth rate, and sharing rate are large enough, then the number of customers may fluctuate cyclically. In either case, the number of customers asymptotically approaches the point where both [latex]dI/dt[/latex] and [latex]dS/dt[/latex] are zero, [latex]\frac{\alpha}{\gamma + \mu}N(t)[/latex] or

[latex]\frac{\alpha}{\gamma + \mu}N(0) * e^{(\alpha - \mu) t}[/latex].

Notice that in this case, the long-term behavior does not depend on the viral sharing rate, [latex]\beta[/latex]!

Instead, how the number of customers grows or shrinks long term depends entirely on whether the market is growing or shrinking.
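To see why [latex]\beta[/latex] drops out in the long run, one can compute where the two nullclines cross for a given market size. Setting [latex]dI/dt = 0[/latex] with [latex]I > 0[/latex] gives [latex]S^* = (\gamma + \mu)/\beta[/latex], and substituting into [latex]dS/dt = 0[/latex] gives [latex]I^* = (\alpha N - \mu S^*)/(\gamma + \mu)[/latex]. The sketch below (hypothetical function name; churn and market rates taken from the examples later in the post and assumed to be per day) shows [latex]I^*/N[/latex] depending on the sharing rate for a small market but converging to [latex]\alpha/(\gamma + \mu)[/latex] for every sharing rate as the market grows.

```python
def interior_equilibrium(N, beta, gamma, alpha, mu):
    """Crossing point of the dI/dt = 0 and dS/dt = 0 nullclines for market size N."""
    S_star = (gamma + mu) / beta                       # dI/dt = 0 with I > 0
    I_star = (alpha * N - mu * S_star) / (gamma + mu)  # dS/dt = 0 evaluated at S_star
    return S_star, I_star


gamma, alpha, mu = 0.70, 0.014, 0.008  # assumed per-day rates
limit = alpha / (gamma + mu)           # long-run share of the market from the formula above
shares = {}
for N in (1e6, 1e8, 1e10):
    for invites in (5, 1, 0.8):        # beta * N(0), with N(0) = 1 million as in the post
        beta = invites / 1e6
        _, I_star = interior_equilibrium(N, beta, gamma, alpha, mu)
        shares[(N, invites)] = I_star / N
        print(f"N = {N:.0e}, beta*N(0) = {invites}: I*/N = {I_star / N:.5f} (limit {limit:.5f})")
```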

If [latex]S(0) < \frac{\gamma + \mu}{\beta}[/latex], then the number of customers will only grow if the number of potential customers grows larger than [latex]\frac{\gamma + \mu}{\beta}[/latex] before the number of current customers drops to zero.

In less mathematical terms, what this all means is that, if your customer base doesn't die out in the beginning, your customer base will grow exponentially, as long as you have a growing market. How fast your customer base grows in the long term depends on the growth of the market and the churn rate, but not on the viral sharing rate!


As in Part 2, numerically integrating the differential equations allows for visualizing the effect each parameter has on viral growth.
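The integration code from Part 2 is not reproduced in this section, so the following is a minimal fixed-step fourth-order Runge-Kutta sketch of the assumed model, using only the standard library (an adaptive ODE solver would be a reasonable alternative). It tracks [latex]S[/latex], [latex]I[/latex] and the market size [latex]N[/latex], assumes all rates share one time unit, and assumes nobody has churned at [latex]t = 0[/latex], so [latex]S(0) = N(0) - I(0)[/latex]. The function names and step size are hypothetical.

```python
def rhs(state, alpha, beta, gamma, mu):
    """Assumed model: returns (dS/dt, dI/dt, dN/dt)."""
    S, I, N = state
    dS = alpha * N - beta * S * I - mu * S  # market births in; conversions and deaths out
    dI = beta * S * I - (gamma + mu) * I    # conversions in; churn and deaths out
    dN = (alpha - mu) * N                   # whole market grows or shrinks exponentially
    return (dS, dI, dN)


def rk4_step(state, dt, params):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = rhs(state, *params)
    k2 = rhs(shifted(k1, dt / 2), *params)
    k3 = rhs(shifted(k2, dt / 2), *params)
    k4 = rhs(shifted(k3, dt), *params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(N0, I0, alpha, beta, gamma, mu, days, dt=0.01):
    """Integrate from t = 0 to t = days; returns a list of (t, S, I, N) tuples.
    Assumes no former customers at t = 0, so S(0) = N0 - I0."""
    state = (float(N0 - I0), float(I0), float(N0))
    params = (alpha, beta, gamma, mu)
    history = [(0.0, *state)]
    for k in range(1, int(round(days / dt)) + 1):
        state = rk4_step(state, dt, params)
        history.append((k * dt, *state))
    return history
```

For example, `simulate(1_000_000, 10, 0.014, 0.8e-6, 0.70, 0.008, days=365)` corresponds to the [latex]\beta N(0) = 0.8[/latex] case of the first comparison below, with [latex]\beta[/latex] expressed per person rather than as invites per customer.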

First, compare different values of the sharing rate, β. With the parameters:

  • [latex]N(0)[/latex] = 1 million people in the market
  • [latex]γ[/latex] = 70% of customers lost per day
  • [latex]I(0)[/latex] = 10 current customers
  • [latex]α[/latex] = 14 new people in the market, per 1000 people
  • [latex]μ[/latex] = 8 people lost from the market, per 1000 people

compare [latex]βN(0)[/latex] = 5, 1, 0.8, 0.5 invites per customer per day. The condition for initial growth is:


[latex]S(0) > \frac{\gamma + \mu}{\beta}[/latex]

For higher values of the sharing rate, [latex]\beta[/latex], a higher initial peak in the number of customers is reached and the ups and downs in the number of current customers end sooner, but the values for which the growth condition is met all follow the same pattern of growth after several months. That is, they all asymptotically approach [latex]\frac{\alpha}{\gamma + \mu} N(0) e^{(\alpha - \mu) t}[/latex].
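A quick check of the growth condition for these four sharing rates, assuming [latex]S(0) = N(0) - I(0)[/latex] (no former customers at the start). The helper name is hypothetical.

```python
def growth_threshold(beta, gamma, mu):
    """Initial growth requires S(0) > (gamma + mu) / beta."""
    return (gamma + mu) / beta


N0, I0 = 1_000_000, 10
S0 = N0 - I0  # assumes no former customers at t = 0
gamma, mu = 0.70, 0.008

results = {}
for invites in (5, 1, 0.8, 0.5):  # beta * N(0), invites per customer per day
    beta = invites / N0
    threshold = growth_threshold(beta, gamma, mu)
    results[invites] = S0 > threshold
    print(f"beta*N(0) = {invites}: threshold {threshold:,.0f}, initial growth: {results[invites]}")
```

Because [latex]S(0)[/latex] is almost all of [latex]N(0)[/latex], the condition reduces to roughly [latex]\beta N(0) > \gamma + \mu = 0.708[/latex]. Of these four values, only [latex]\beta N(0) = 0.5[/latex] starts below the threshold (about 1,416,000 potential customers needed, versus 999,990 available).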

Now consider the effects of varying churn:

  • [latex]N(0)[/latex] = 1 million people in the market
  • [latex]βN(0)[/latex] = 0.8 invites per customer per day
  • [latex]I(0)[/latex] = 10 current customers
  • [latex]α[/latex] = 14 new people in the market, per 1000 people
  • [latex]μ[/latex] = 8 people lost from the market, per 1000 people

compare [latex]γ[/latex] = 90%, 70%, 40%, or 10% of customers lost per day. The condition for growth is


[latex]S(0) > \frac{\gamma + \mu}{\beta}[/latex]

Notice how the change in churn rate, [latex]γ[/latex], affects both the height of the initial peak and the long term growth. Lower levels of churn also lead to less of a roller coaster ride. For the case of 90% churn, the condition for growth, [latex]S(t) > \frac{γ + μ}{β}[/latex], is not met at [latex]t=0[/latex], but because of the growth of the market, [latex]S(t)[/latex] crosses the threshold before [latex]I(t)[/latex] goes to zero and viral growth is still achieved.
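The same threshold check for the four churn rates, together with the long-term share of the market [latex]\alpha/(\gamma + \mu)[/latex] taken from the asymptote above. As before, this is a sketch that assumes [latex]S(0) = N(0) - I(0)[/latex].

```python
N0, I0 = 1_000_000, 10
S0 = N0 - I0         # assumes no former customers at t = 0
beta = 0.8 / N0      # beta * N(0) = 0.8 invites per customer per day
alpha, mu = 0.014, 0.008

rows = []
for gamma in (0.90, 0.70, 0.40, 0.10):
    threshold = (gamma + mu) / beta
    share = alpha / (gamma + mu)  # long-term customers as a fraction of N(t)
    rows.append((gamma, S0 > threshold, share))
    print(f"gamma = {gamma:.0%}: threshold {threshold:,.0f}, "
          f"met at t=0: {S0 > threshold}, long-term share {share:.4f}")
```

With 90% churn the threshold (about 1,135,000) exceeds [latex]S(0)[/latex], which is the case described above where growth starts only after the expanding market pushes [latex]S(t)[/latex] past it. The long-term share is roughly eight times larger at 10% churn than at 90%.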

Finally, consider the effects of varying the growth rate of the market:

  • [latex]N(0)[/latex] = 1 million people in the market
  • [latex]βN(0)[/latex] = 2 invites per customer per day
  • [latex]I(0)[/latex] = 10 current customers
  • [latex]γ[/latex] = 50% of customers lost per day
  • [latex]μ[/latex] = 8 people lost from the market, per 1000 people

compare [latex]\alpha[/latex] = 4, 8, 14, 16 new people in the market, per 1000 people. The size of the initial peak is barely affected by the market growth rate, [latex]\alpha[/latex], but long term behavior is affected greatly. For a "birth" rate that is lower than the "death" rate, even with strong viral sharing, the customer base drops to zero within about a month. When [latex]\alpha = \mu[/latex], the customer base reaches a steady value within a year. Growing markets lead to growing customer bases, with small changes in market growth rates having a large effect in the long term.
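Since the long-term customer count scales with [latex]N(t) = N(0) e^{(\alpha - \mu) t}[/latex], the sign of [latex]\alpha - \mu[/latex] decides the long-term trend. The sketch below computes how much the market multiplies over a hypothetical horizon of 365 time units; it assumes [latex]\alpha[/latex] and [latex]\mu[/latex] share the time unit of [latex]t[/latex], which this section does not state explicitly.

```python
import math

mu = 0.008
horizon = 365  # hypothetical horizon, in the same time unit as alpha and mu

multipliers = {}
for alpha_per_1000 in (4, 8, 14, 16):
    alpha = alpha_per_1000 / 1000
    multipliers[alpha_per_1000] = math.exp((alpha - mu) * horizon)  # N(horizon) / N(0)
    print(f"alpha = {alpha_per_1000}/1000: market multiplier {multipliers[alpha_per_1000]:.2f}")
```

Under that assumption the market ends at about 0.23, 1, 8.9 and 18.5 times its starting size. Because the asymptote also carries a factor of [latex]\alpha[/latex], going from 14 to 16 per 1000 more than doubles the long-term customer estimate over that horizon.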

Conclusion: All this fancy math and book learning brings us to a conclusion that is nothing new: a large and growing market is still one of the most important factors in growing a customer base for the long term. Keeping more of the customers you have also has a strong effect on long term viral growth. However, the effects of a large viral sharing rate are only seen in the short term. Strong viral growth is important if your goal is to reach a large number of people quickly, as with a viral advertising campaign. But if your goal is to grow a customer base for a product for the long term, any amount of viral sharing can lead to long term growth as long as the market is growing and the churn in the customer base is low enough.

In Part 4, I'll look at the effect of returning customers.

TLDR: Differences in viral sharing rates only matter in the short term. For long term growth, the most important factors are low churn and a growing market.

Image Credit.

Mid Maryland Data Science Kickoff Event Review

On Tuesday, January 29th, nearly 90 academics, professionals, and data science enthusiasts gathered at JHU APL for the kick-off meetup of the new Mid-Maryland Data Science group. With samosas on their plates and sodas in hand, members filled the air with conversations about their work and interests. After their meal, members were ushered into the main auditorium and the presenters took their places at the front.


Greetings and Mission

by Jason Barbour & Matt Motyka

Jason and Matt kicked off the talks with an introduction of the group. Motivated by both the growth of data science and the vast opportunities made available by powerful free tools and open access to data, they described their interest in creating a local group that would help grow the Maryland data science community. As software developers with analytic experience, Jason and Matt next described their seven keys to a successful analytic: infrastructure, people, data, model, and presentation. Lastly, they presented metrics about the interests and experience of the members.

The Rise of Data Products

by Sean Murphy

With excitement and passion, Sean took the stage to show that now is the Gold Rush for data products. Laying out the definition of a data product and cycling through several well-known examples, Sean explained how these products are able to bring social, financial, or environmental value through the combination of data and algorithms. Consumers want data, and the tools and infrastructure needed to supply this demand are available either free or at extremely low cost. Data scientists are now able to harness this stack of tools to provide the data products that consumers crave. As Sean succinctly stated, it is a great time to work with data.

The article version of the talk can be found here.

The Variety of Data Scientists

by Harlan Harris

A full-fledged data scientist himself, Harlan followed Sean by presenting his research into what the title "data scientist" really means. Using the results of a data scientist survey, Harlan listed several skill groupings that provide a shorthand for the variety of skills that data scientists possess: programming, stats, math, business, and machine learning/big data. Next, Harlan discussed how data scientists, with their diverse backgrounds, can be more accurately categorized into four types: data businessperson, data creative, data researcher, and data engineer. With this breakdown, Harlan demonstrated that the data science community is actually composed of individuals with a variety of interests and skills.

Cloudera Impala - Closing the near real time gap in BIGDATA

by Wayne Wheeles

A true cyber security evangelist, Wayne Wheeles presented how Cloudera’s Impala was able to make near-real-time security analysis a reality. With his years of experience in the field of cyber security and his prior work using big data technologies, Wayne was given unique access to Cloudera’s latest tool. Through his testing and analysis, he concluded that Impala offered a significant improvement in performance and could become a vital tool in cyber security.

After the last presentation, more than a dozen members joined us at nearby Looney’s Pub to end the night with a few beers and snacks. To everyone's surprise, Donald Miner of EMC Greenplum offered to pick up the tab! You can follow him on Twitter or LinkedIn from this page.

If you missed this first event, don't worry: the next one is coming up on March 14th in Baltimore. Check it out here.


Weekly Round-Up: Long Data, Data Storytelling, Operational Intelligence, and Visualizing Inaugural Speeches

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from data storytelling to visualizing presidential inauguration speeches. In this week's round-up:

  • Forget Big Data, Think Long Data
  • Telling a Story with Data
  • Is Operational Intelligence the End Game for Big Data?
  • US Presidents Inaugural Speech Text Network Analysis

Forget Big Data, Think Long Data

While Big Data is all the rage these days, this Wired article trumpets the merits of Long Data - data sets that have massive historical sweeps. It points out that Big Data is usually from the present or from the recent past and so you often don't get the same perspective as you do from data sets that span very long timelines. These data sets let you observe how events unfold over time, which can provide valuable insights. The article goes on to describe more differences between Big and Long Data and cites examples of some of the ways Long Data is used today.

Telling a Story with Data

Deloitte University Press published an interesting post this week about how to tell a story with data. The post argues that unless decision-makers understand the data and its implications, they may not change their behavior and adopt analytical approaches while making decisions. This is where data storytelling - the art of communicating the insights that can be drawn from the data - comes in. The post goes on to describe some good and bad examples of this and also provides some useful guidelines for it.

Is Operational Intelligence the End Game for Big Data?

This post on the Inside Analysis blog argues that Business Intelligence is being taken to the next level, and that this level is Operational Intelligence. With the advancement of data science and big data technologies, organizations are starting to be able to take a deeper look into their data, draw insights that weren't visible previously, and start using predictive analytics to forecast more accurately. The post goes on to talk about Operational Intelligence and how these new insights can be transferred to a user or system that can make the appropriate business decisions and take the required actions.

US Presidents Inaugural Speech Text Network Analysis

This is a post from Nodus Labs showing off some of the interesting work they've done creating network visualizations out of US presidential inaugural addresses. The post describes their methodology, includes a video explaining the networks, and even embeds some examples that you can play around with and explore!

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Read Our Other Round-Ups