
Keep it Simple: How to Build a Successful Business Intelligence/Data Warehouse Architecture

Is your data warehouse architecture starting to look like a Rube Goldberg machine held together with duct tape instead of the elegant solution enabling data-driven decision making that you envisioned? If your organization is anything like the ones I’ve worked in, then I suspect it might. Many businesses say they recognize that data is an asset, but when it comes to implementing solutions, the focus on providing business value is quickly lost as technical complexities pile up.


How can you recognize if your data warehouse is getting too complicated?

Does it have multiple layers that capture the same data in just slightly different ways? An organization I worked with determined that it needed four database layers (staging, long-term staging, enterprise data warehouse, and data marts) with significant amounts of duplication. The duplication resulted from each layer not having a clear purpose, but even with more clarity of purpose, this architecture makes adding, changing, and maintaining data harder at every turn.

Are you using technologies just because you have used them in the past? Or thought they would be cool to try out? An organization I worked with implemented a fantastically simple data warehouse star schema (http://en.wikipedia.org/wiki/Star_schema) with well under 500 GB of data. Unfortunately, they decided to complicate the data warehouse by adding a semantic layer to support a BI tool and an OLAP cube (which in some ways was a second semantic layer to support BI tools). There is nothing wrong with semantic layers or OLAP cubes. In fact, there are many valid reasons to use them. But if you do not have such a valid reason, they become just another piece of the data architecture that requires maintenance.

Has someone asked for data that “should” be easy to get, but instead will take weeks of dedicated effort to pull together? I frequently encounter requests that sound simple, but the number of underlying systems involved and the lack of consistent data integration practices expand the scope exponentially.
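
Coming back to the star schema mentioned above: here is a minimal sketch of how little machinery one actually needs, using Python's built-in sqlite3. The table and column names are hypothetical, invented for illustration, and are not the schema of the organization described above.

```python
import sqlite3

# A star schema is just one fact table joined to a few dimension tables.
# All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
    CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);

    -- The fact table holds the measures plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        date_id     INTEGER REFERENCES dim_date(date_id),
        product_id  INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        quantity    INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_date     VALUES (1, '2014-01-15', 2014, 1);
    INSERT INTO dim_product  VALUES (1, 'Widget', 'Hardware');
    INSERT INTO dim_customer VALUES (1, 'Acme', 'East');
    INSERT INTO fact_sales   VALUES (1, 1, 1, 10, 250.0);
""")

# Analysts can query the star directly, with no extra layers in between.
for row in conn.execute("""
    SELECT d.year, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_date d    ON f.date_id = d.date_id
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY d.year, p.category
"""):
    print(row)  # (2014, 'Hardware', 250.0)
```

The point of the sketch is the shape, not the SQL dialect: measures live in one fact table, descriptive attributes live in the dimensions, and every analyst query is a straightforward set of joins.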

Before I bring up too many bad memories of technical complexities taking over a BI/DW project, I want to get into how to avoid making things overcomplicated. The most important thing is to find a way that works for your organization to stay focused on business value.

If you find yourself thinking…

“The data is in staging, but we need to transform it into the operational data store and the enterprise data warehouse, and then update 5 data marts, before anyone can access the data.”

or

“I am going to try this new technology because I want to learn more about it.”

or

“I keep having to pull together the customer data and it takes 2 weeks just to get an approved list of all customers.”

Stop, drop, and roll. Oh wait, you’re not technically on fire, so just stopping should do. Take some time to consider how to reset so that the focus is on providing business value. You might try an approach such as the 5 Whys, which was developed for root cause analysis by Sakichi Toyoda at Toyota. It forces reflection on a specific problem and helps you drill down to the underlying cause. Why not try it to see if you can find the root cause of complexity in a BI/DW project? It might just help you reduce or eliminate complexities that had no good reason to exist in the first place.
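
To make the drill-down concrete, here is a minimal sketch of what a recorded 5 Whys chain might look like for a hypothetical warehouse problem. The problem statement and every answer below are invented for illustration, not taken from a real project.

```python
# A hypothetical 5 Whys chain: each answer becomes the subject of the next "why".
five_whys_chain = [
    "Problem: the nightly load finishes too late for the morning reports.",
    "Why? The data mart refresh waits on the enterprise warehouse load.",
    "Why? The warehouse load reprocesses all of staging every night.",
    "Why? Staging has no way to identify which rows changed.",
    "Why? The source extract was built without change tracking.",
    "Why? No one defined a clear purpose for the staging layer.",
]

for step in five_whys_chain:
    print(step)
```

Notice that the chain ends at an organizational cause (no clear purpose for a layer), not a technical one, which is exactly the kind of root cause this article argues you should be hunting for.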

Another suggestion is to identify areas of complexity from a technical perspective, but don’t stop there. The crucial next step is to determine how the complex technical environment impacts business users. For example, say a technical team identifies two complex ETL processes, one loading sales data and one loading HR data. Both include one-off logic and processes that make it difficult to discern what is going on, so it takes hours to troubleshoot issues that arise. In addition, the performance of both ETL processes has significantly degraded. The business users don’t really care about all that, but they have been complaining more and more about delays in getting the latest sales figures. When you connect the growing complexity to the delays in getting important data, the business users can productively contribute to a discussion of priority and business value. In this case, sales data would take clear precedence over HR data. Both can be added to a backlog, along with any other areas of complexity identified, and addressed in priority order.
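
As a sketch of what that simplification can look like in practice, the snippet below breaks an opaque, one-off load into small, named, individually testable steps. All function and field names here are hypothetical; this illustrates the idea, not anyone's actual pipeline.

```python
from datetime import datetime

def parse_sale(raw: dict) -> dict:
    """Normalize one raw sales record from the source extract."""
    return {
        "sold_at": datetime.strptime(raw["sale_date"], "%Y-%m-%d"),
        "region": raw["region"].strip().upper(),
        "amount": float(raw["amount"]),
    }

def filter_valid(records: list) -> list:
    """Drop records that would previously fail deep inside the load step."""
    return [r for r in records if r["amount"] > 0]

def load(records: list, target: list) -> None:
    """Append clean records to the target table (stubbed here as a list)."""
    target.extend(records)

# Each stage can now be timed and tested on its own, so a slowdown or a
# bad record points at one small function instead of one opaque job.
warehouse = []
raw_rows = [{"sale_date": "2014-03-01", "region": " east ", "amount": "19.99"}]
load(filter_valid([parse_sale(r) for r in raw_rows]), warehouse)
print(warehouse)
```

The design choice is the point: when troubleshooting takes hours because "it's all one script," naming the steps is often the cheapest simplification available.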

Neither of these is a quick fix, but even slowly chipping away at overly complex areas will yield immediate benefits. Each simplification makes understanding, maintaining and extending the existing system easier.

Bio

Sara Handel, Co-founder of DC Business Intelligentsia, Business Intelligence Lead at Excella Consulting (www.excella.com) - I love working on projects that transform data into clear, meaningful information that can help business leaders shape their strategies and make better decisions. My systems engineering education coupled with my expertise in Business Intelligence (BI) allows me to help clients find ways to maximize their systems' data capabilities from requirements through long-term maintenance. I co-founded the DC Business Intelligentsia meetup to foster a community of people interested in contributing to the better use of data.

Political Tech: Predicting 2016 Headlines

Mark Stephenson is a Founding Partner at Cardinal Insights, a data analysis, modeling and strategy firm.  Cardinal Insights provides accessible and powerful data targeting tools to Republican campaigns and causes of all sizes.  Twitter:  @markjstephenson  http://www.CardinalInsights.com

The reliance on data in politics comes as no surprise to those who watch trends in technology. Business and corporate entities have been making major investments in data analysis, warehousing, and processing for decades, as have both major political parties. As the strategic, tactical, and demographic winds shift for political operatives, so too does the need to become more effective at building high-quality datasets backed by robust analysis efforts.

Recent efforts by both Republican and Democrat organizations to outpace each other in the analytical race to the top have been well documented by the press[1]. With the 2016 Presidential election cycle already underway (yes...really), I decided to make some headline predictions for what we will see after our next President is elected, as they relate to data, technology, and organizational shifts over the next three years.


"Data Crunchers Analyze Their Way Into the White House"


This is a version of the headlines we saw in 2012: the reliance on and seniority of data science staff will continue to grow. Senior members of both parties' Committee and Presidential campaign staffs will be technologists (this is already happening), and data science will be integrated into all aspects of those campaigns (e.g., fundraising, political, digital, etc.).


"Digital Targeting Dominates the Targeting Playbook"


Studies continue to show shifts in how voters consume advertising content, including political messaging. Television remains a core tactical tool, but voters of all ages are increasingly unplugged from traditional methods[2]. One-to-one data and digital targeting will grow in scope and budget, and both vendors and campaigns will shift tactical applications to respond to demand.


"Scaled Data: State Races Take Advantage of National Tools"


In 2016, not only will national, big-budget races use data and analytics to glean insights, but these tools will scale to lower-level, state-based campaigns. Along with more widely available, cheap (even free) technology, companies like Cardinal Insights and efforts like "Project Ivy"[3] are turning what used to be expensive and time-consuming data analysis into scalable, accessible products. These will have lasting effects on the profile of many state House and Senate legislatures and, as a result, on state and local political outcomes.


"Business Takes Notice: Political Data Wizards Shift Corporate Efforts"


Just as many political operatives took the skills they learned and applied during the 2012 election and focused them on entrepreneurship, the same will happen to a higher degree after 2016. Innovators in the political data, digital, and television spaces will prove the effectiveness of these new tools, and as a result, corporate marketing and advertising teams will seek them out.


"Shift from "The Gut" to "The Numbers" for Decision Making and Targeting"


Many decisions made by political operatives in the past came from the gut: their intuition told them that a certain choice was the right one, rather than a proven method backed by data. In 2016, the shift toward data-driven efforts throughout campaigns will continue, with an emphasis on testing, metrics, and fact-based decision making. This will permeate all divisions of campaigns, from fundraising to operations to political decisions.

Just as companies like Amazon, Coca-Cola, and Ford build massive data and analysis infrastructures to capitalize on sales opportunities, political campaigns will do the same to capitalize on persuading voters. As trends in data analysis, targeting, statistical modeling, and technology continue to reveal themselves, you will read many headlines in late November 2016 that are similar to the ones above. Keep an eye on the press to see what campaigns do in 2014, and watch a booming analytical industry continue to spread throughout American politics.



[1] http://www.washingtonpost.com/blogs/the-switch/wp/2014/03/13/how-the-gops-new-campaign-tech-helped-win-a-tight-race-in-florida/; http://www.motherjones.com/politics/2012/10/harper-reed-obama-campaign-microtargeting

[2] http://www.targetedvictory.com/2014/02/21/grid-national-survey/

[3] http://swampland.time.com/2014/02/24/project-ivy-democrats-taking-obama-technology-down-ballot/

Validation: The Market vs Peer Review

How do we know our methods are valid? In academia your customers are primarily other academics, and in business they're whoever has money to pay. However, it's a fallacy to think academics don't have to answer outside of their ivory tower, or that businesses need not be concerned with expert opinion. Academics need to pay their mortgages and buy milk, and businesses will find their products or services aren't selling if they clearly ignore well-established principles. Conversely, businesses don't need to push the boundaries of science for a product to be useful, and academics don't need to increase market share to push the boundaries of science. So how do we strike a balance? When do we need to seek these different forms of validation?

Academics Still Need Food

Some of us wish things were that simple, that academics need not worry about money nor businesses about peer review, but everything is not so well delineated. Eventually the most eccentric scientist has to buy food, and every business eventually faces questions about its products and services. Someone has to buy the result of your efforts; the only questions are how many are buying and at what price. In academia, without a grant you may just be an adjunct professor. Professors are effectively running mission-driven organizations that are non-profit in nature, and their investors are the grant review panels, who weigh the peer review process heavily when awarding funds.

Bridge the Consumer Gap

Nothing goes exactly as planned. Consumers may buy initially, but there will inevitably be questions, and business representatives cannot possibly address them all. A small "army" may be necessary to handle the questions, and armies need clear direction, so businesses are inevitably reviewed and accepted by their peers. That peer review helps bridge the gap between business and consumer.

Unlike academia, businesses often have privacy requirements, driven by competitive advantage, that preclude the open exposure of a particular solution. In these cases, credibility is demonstrated when your solution can provide answers to clients' particular use cases. This is a practical peer review in the jungle of business.

Incestuous Peer Review

You can get lost in this peer review process: each person has their thoughts, which they write about, affecting others' thoughts, which they write about, and so on and so forth. A small group of people can produce mountains of "peer reviewed" papers and convince themselves of whatever they like, much like any crazy person can shut out the outside world and become lost in their own thoughts. Gödel's incompleteness theorem can be loosely interpreted as, "For any system there will always be statements that are true, but that are unprovable within the system." Gödel was dealing specifically with natural numbers, but we inherently understand that you cannot always look inward for the answer; you have to engage the outside world.

Snake Oil Salesman

Conversely, without peer review or accountability, cursory acceptance (i.e. consumer purchases) can give a false sense of legitimacy. Some people will give anything a try at least once, and the snake oil salesman is the perfect example. Traveling from town to town, the salesman brings an oil people have never seen before and claims it can remedy all of their worst ailments; however, once people use the snake oil and realize it is no more effective than a sugar pill, the salesman has already moved on to another town. Experience with a business goes a long way in legitimizing the business.

Avoid Mob Rule

These two forms of legitimacy, looking inward versus outward, peer review versus purchase, can be extremely powerful, rewarding, and a positive force in society. Have you ever had an idea people didn't immediately accept? Did you look to your friends and colleagues for support before sharing your idea more widely? This is a type of peer review (though not a formal one), and something we use to develop and introduce new ideas.

The Matrix

Conversely, have you ever known something to be true but couldn't find the words to convince others it is true? In The Matrix, Morpheus tells Neo, "Unfortunately, no one can be told what the Matrix is. You have to see it for yourself." If people can be made to see what you know to be true, to experience the Matrix rather than be told about it, they have more grounds to believe and accept your claims. Sometimes in business you have to ignore the naysayers, build your vision, and let its adoption speak for itself. Ironically, there are those who would presume to teach birds to fly; businesses may watch the peer review process explain how their vision works, only to then be lectured on why they were successful.

Conclusion

In legitimizing our work, business or academic, when do we look to peer review and when do we look to engaging the world? This is a self-similar process: we may gather our own thoughts before speaking, or we may consult our friends and colleagues before publishing, but above all we must be aware of who is consuming our product, and review it accordingly before sharing it.

Selling Data Science: Validation

We are all familiar with the phrase "We cannot see the forest for the trees," and this certainly applies to us as data scientists. We can become so involved with what we're doing, what we're building, the details of our work, that we don't know what our work looks like to other people. Often we want others to understand just how hard it was to do what we've done, just how much work went into it, and sometimes we're vain enough to want people to know just how smart we are.

So what do we do? How do we validate one action over another? Do we build the trees so others can see the forest? Must others know the details to validate what we've built, or is it enough that they can make use of our work?

We are all made equal by our limit of 24 hours in a day, and we must choose what we listen to and what we don't, what we focus on and what we don't. The people who make use of our work must do the same. A famous philosophical thought experiment, often traced to George Berkeley, asks, "If a tree falls in the woods and no one is around to hear it, does it make a sound?" If we explain all the details of our work, and no one gives the time to listen, will anyone understand? To what will people give their time?

Let's suppose that we can successfully communicate all the challenges we faced and overcame in building our magnificent ideas (as if anyone would sit still that long). What then? Thomas Edison is famous for saying, "I have not failed. I've just found 10,000 ways that won't work." Yet today we buy lightbulbs that work; who remembers all the details of the different ways he failed? "It may be important for people who are studying the thermodynamic effects of electrical currents through materials." OK, it's important for that person to know the difference, but for the rest of us it's still not important. We experiment, we fail, we overcome, thereby validating our work so that others don't have to.

Better to teach a man to fish than to provide for him forever, but there are an infinite number of ways to successfully fish. Some approaches may be nuanced in their differences, but others may be so wildly different that they're unrecognizable, unbelievable, and invite incredulity. The catch (no pun intended) is that methods are valid because they yield measurable results.

It's important to catch fish, but success is neither consistent nor guaranteed, so groups of people may fish together and share their bounty so everyone is fed. What if someone starts using an unrecognizable and unbelievable method of fishing? Will the others accept this "risk" and share their fish with someone who won't use the "right" fishing technique, their technique? Even if it works the first time, that may simply be a fluke, they say, and we certainly can't waste any more resources "risking" hungry bellies, now can we?

So does validation lie in the method or the results?  If you're going hungry you might try a new technique, or you might have faith in what's worked until the bitter end.  If a few people can catch plenty of fish for the rest, let the others experiment.  Maybe you're better at making boats, so both you and the fishermen prosper.  Perhaps there's someone else willing to share the risk because they see your vision, your combined efforts giving you both a better chance at validation.

If we go along with what others are comfortable with, they'll provide fish. If we have enough fish for a while, we can experiment and potentially catch more fish in the long run. Others may see the value in our experiments and provide us fish for a while until we start catching our own. In the end you need fish, and if others aren't willing to give you fish, you have to get your own, using whatever method yields results.

Selling Data Science: Common Language

What do you think of when you say the word "data"? For data scientists it means SO MANY different things, from unstructured data like natural language and web crawls to perfectly square Excel spreadsheets. What do non-data scientists think of? Many times we might come up with a slick line for describing what we do with data, such as, "I help find meaning in data," but that doesn't help sell data science. Language is everything, and if people don't use a word on a regular basis it will not have any meaning for them. Many people aren't sure whether they even have data, let alone whether there's some deeper meaning, some insight, they would like to find. As with any language barrier, the goal is to find common ground and build from there.

You can't blame people; the word "data" is about as abstract as words get, perhaps because it can refer to so many different things. When discussing data casually, rather than mansplain what you believe data is or what it could be, it's much easier to find examples of data that they are familiar with, preferably ones integral to their work.

The most common data that everyone runs into is natural language; unfortunately, this unstructured data is also some of the most difficult to work with. In other words, they may know what it is, but showing how it's data may still be difficult. One solution: discuss a metric with a qualitative name, such as "similarity," "diversity," or "uniqueness." For example, we may use the Jaro algorithm to measure similarity, where we count the letters two strings have in common and the transpositions between them; other algorithms exist as well. When we discuss "similarity" with someone new, or any other word that measures relationships in natural language, we are exploring something we both accept, and we are building common ground.
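
To make that concrete, here is a small, generic Python implementation of the Jaro similarity just described: count the characters two strings share within a sliding window, then count how many of those matched characters are transposed. The score averages three ratios: matches over each string's length, plus the fraction of matches that are not transposed. This is an illustrative sketch, not code from any particular library.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity: combines common characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters "match" if equal and within this window of each other.
    window = max(len1, len2) // 2 - 1
    s1_matches = [False] * len1
    s2_matches = [False] * len2
    matches = 0
    for i, c1 in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not s2_matches[j] and s2[j] == c1:
                s1_matches[i] = s2_matches[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    k = transpositions = 0
    for i in range(len1):
        if s1_matches[i]:
            while not s2_matches[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3

print(jaro_similarity("MARTHA", "MARHTA"))  # ~0.944
```

A qualitative word like "similarity" now has a number behind it, which is exactly the kind of shared, inspectable ground the conversation needs.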

[Visualization: associations among the first 50 journalist deaths in the CPJ data]

Some data is obvious, like a neatly curated spreadsheet from the Committee to Protect Journalists. The visualization shown on the right, part of my larger presentation at Freedom Hack (thus the lack of labels), was only possible to build in short order because the data was already well organized. If we're lucky enough to have such an easy start to a conversation, we get to bring the conversation to the next level and maybe build something interesting that all parties can appreciate; in other words, we get to "geek out" professionally.

Selling Data Science

Data science is said to include statisticians, mathematicians, machine learning experts, algorithm experts, visualization ninjas, etc., and while these categories may be useful in recognizing necessary skills, selling our ideas is about execution. Ironically, there are plenty of sales theories and guidelines, such as SPIN Selling, the iconic ABC scene from Boiler Room, or my personal favorite from Glengarry Glen Ross, that tell us what we should be doing, what questions we should be asking, how a sale should progress, and of course how to close, but none of these address the thoughts we may be wrestling with as we navigate conversations. We don't necessarily mean to complicate things; we just become accustomed to working with other data science types, but we still must reconcile how we communicate with our peers versus people in other walks of life who are often geniuses in their own right.

We love to "Geek Out", we love exploring the root of ideas and discovering what's possible, but now we want to show people what we've discovered, what we've built, and just how useful it is. Who should we start with? How do we choose our network? What events should we attend? How do we balance business and professional relationships? Should I continue to wear a suit? Are flip-flops too casual? Are startup t-shirts a uniform? When is it appropriate to talk business? How can I summarize my latest project? Is that joke OK in this context? What is "lip service"? What is a "slow no"? Does being "partnered" on a project eventually lead to paying contracts? What should I blog about? How detailed should my proposal be? What can I offer that has meaning to those around me? Can we begin with something simple, or do we have to map out a complete long-term solution? Can I get along professionally with this person/team on a long-term project? Can I do everything that's being asked of me or should I pull a team together? Do I have the proper legal infrastructure in place to form a team? What is appropriate "in kind" support? Is it clear what I'm offering?

The one consistent element is people: who we would like to work with, and how. This post kicks off a new series that explores these issues and helps us balance between geeking out and selling the results, between creating and sharing.