The Data Week DC merch store is open for business!
I am writing this post for two reasons. The first is to share with you why I am working with Data Community DC to produce DC DATACON. The second is to ask for your help to ensure we have the right people involved in the conversation so we accomplish our objectives. The stakes are high for our regional economy and for the future of Data Science.
If you are not familiar with Data Community DC, it is a network of over 20,000 data scientists and other data professionals in the DC region. Nine different meet-ups engage in a “program” throughout the year to address a diverse range of data science topics at locations across the region (namely DC and close-in NoVA). By going to any one of the meet-up sites (best accessed via the DC2 website), you can get information about all of these great events.
As I got more involved with Data Community DC, first as a Community Organizer with the founding of the Full Stack Data Science meet-up (now 500+ data scientists strong) and later as a member of the Board of Directors, it became clear that something was missing: a unified conversation in which all the meet-ups come together not only to learn from one another but to create a larger conversation about the future of data science in the Mid-Atlantic region, led by data professionals.
DC DATACON is scheduled for October 3rd at The Marvin Center on the campus of George Washington University. Our registration site will be up soon; on the conference site, we will have all the information needed to register, sponsor and/or exhibit. DC DATACON certainly complements other “new” forums taking on data-related topics. However, we are looking to accomplish something a little different. We aim to drive conversations about how to get data science out of the lab and into the enterprise so professionals of all makes and models can begin using data as a strategic asset. I believe the benefits to organizations large and small, public and private, are well known, so I don’t need to restate them here. What is not so obvious is HOW you actually do it. That’s DC DATACON: a focus on the applied tools, technologies, and methodologies that make data science work.
DC DATACON is also about defining our region’s “brand” of data science. Which market sectors are we best suited to lead as data science becomes a more significant factor in the health and well-being of our regional economy? Another way of saying this is to put our stake in the ground for our areas of concentration to drive R&D, investment, workforce shaping, product development and so on. Think about it: what region is top of mind when you mention the financial sector? As a data scientist, you would say NYC. For consumer applications, you think Silicon Valley. That is not to say data scientists and businesses here in DC aren’t working in finance and consumer apps, but data science in DC will likely not be defined by either of these. Our focus, and what differentiates our region from others, is the opportunity to lead and shape the future of intelligent transportation, autonomous air operations, cybersecurity, and bioinformatics for improved quality of care, in addition to public policy, defense and intelligence. That’s DC DATACON: a focus on our region’s special concentration of public and private sector organizations.
With that as context, I hope you will consider supporting the event by spreading the word and planning to join us on October 3rd. If you’re in a position to sponsor or exhibit, this is a chance to get really hands-on. But, returning to the opening “ask,” we want to make sure we have the right people involved. Therefore, I welcome any thoughts you have about who you believe are the leading people, companies and technologies in data science in our region (public, private, not for profit). Responses viewed through the lens of transportation, cybersecurity, healthcare, defense, intelligence, and public policy would be particularly helpful.
You can respond to this post or message me privately with your thoughts. My e-mail is firstname.lastname@example.org. Thank you!
We're proud to introduce your new Data Community DC Agents
In a city that is built on who you know, your DC2 Agents fit in perfectly. We are your access point to a growing network of 20,000 local data practitioners, focusing on helping our members find each other across our weekly events and generally keeping our members’ interests in mind. Here are some of the scenarios we work with:
Member: I’d like to get into deep learning.
Agent: We’re hosting a talk on deep learning Tuesday, have experts on our advisory board, and organize a workshop twice a month. Where would you like to start?
Member: I work in Excel and would like to learn a little code. Where do I start?
Agent: We have a number of educational partners and we host free mini-hacks. What’s your budget?
Member: My company just won a contract and we need to hire. Can you help?
Agent: Our free events are filled with local talent, and if you need a little support we’d be happy to introduce you to one of our partners.
Member: We’d like to promote ourselves in the local data scene and get to know the key players. How can DC2 help?
Agent: Our network of 20,000 data practitioners is growing every day and reaches 500+ people each month. Our Sponsorship Network and Organizers Program are regularly invited to private, invitation-only events. Would you like to join us?
Member: I just finished my dissertation and would like to get involved in the community. Where do I start?
Agent: Would you like to speak at one of our events or help organize?
In the 5 years of DC2’s existence one thing has been consistent: people attend our events looking for great people and opportunities in data science. We tried so many things over the years: hackathons, themed events, coordinated events, joint events, meetings, committees, titles, volunteers, debating, defining, electing, partnering, sponsoring, hiring... (deep breath) - let’s talk about what happened.
This is how I think of DC networking events, especially tech events:
Over the past 6 years we’ve perfected hosting events and mentoring new organizers, but it takes months and years to build relationships and find your friends and core network. Even then, we don’t know where people are in their lives, what they are looking for, all their unique skills, or what they would like help with. When we meet new people at events, we could be talking to exactly the right person, but in our 5 or 6 conversations during a 45-minute networking session before the main event, will we get to our bottom lines and realize the person across from us is exactly who we’re looking for? Many people never have the chance to find out, and this is why we’ve created the DC2 Agents.
So the next time you’re looking to meet the right data practitioner, looking to build your data science team, looking to join a new team, looking to find a good mentor, looking for the right classes, looking to teach, volunteer, be a speaker, or anything else data, who ya gonna call?
Aaron Schumacher, one of the Data Science DC organizers and an employee of Arlington-based Deep Learning Analytics, wrote the article with the support of many local reviewers, including feedback from members of the DC Machine Learning Journal Club.
Aaron will be giving a talk on the material of Hello, TensorFlow! on Wednesday June 29 as part of the Deep Dive into TensorFlow meetup to be hosted at Sapient in Arlington. It should be a great opportunity to explore and discuss this new and exciting tool!
Data Community DC and District Data Labs are hosting a Supervised Machine Learning with R workshop on Saturday April 30th. Come out and learn about R's capabilities for regression and classification, how to perform inference with these models, and how to use out-of-sample evaluation methods for your models!
Data Community DC and District Data Labs are hosting a Natural Language Processing with Python workshop on Saturday April 9th from 9am - 5pm. Register before March 26th for an early bird discount!
Data Community DC and District Data Labs are hosting a Data Visualization with R workshop on Saturday April 2nd from 9am - 5pm. Register before March 19th for an early bird discount!
Data Community DC and District Data Labs are hosting a Graph Analytics with Python workshop on Saturday March 12th from 9am - 5pm. Register before February 27th for an early bird discount!
Data Community DC and District Data Labs are hosting a Machine Learning with Python workshop on Saturday February 20th from 9am - 5pm. Register before February 6th for an early bird discount!
Data Community DC and District Data Labs are hosting another session of their Building Data Apps with Python workshop on Saturday February 6th from 9am - 5pm. If you're interested in learning about the data science pipeline and how to build an end-to-end data product using Python, you won't want to miss it. Register before January 23rd for an early bird discount!
Data Community DC and District Data Labs are hosting a Web Scraping with Python workshop on Saturday January 30th from 9am - 5pm. Register before January 16th for an early bird discount!
For several years now, the DC Nightowls meetup has been a staple of after-hours coworking for entrepreneurs, startups, and self-starters doing interesting projects in the Metropolitan DC area. A number of our Data Science community members have been owls as well.
Recently, we decided to combine forces by re-focusing DC Nightowls through a new program called DC2 Digital Nomads. The new program's focus is on the gig economy, freelance knowledge workers, and remote working.
Troubling instances of the mosaic effect — in which different anonymized datasets are combined to reveal unintended details — include the tracking of celebrity cab trips and the identification of Netflix user profiles. Also concerning is the tremendous influence wielded by corporations and their massive data stores, most notoriously embodied by Facebook’s secret psychological experiments.
Data Community DC and District Data Labs are hosting a Natural Language Processing with R workshop on Saturday November 21st from 9am - 5pm. Register before November 7th for an early bird discount!
During an upcoming free workshop, Andrej Lapajne will be going in depth on the benefits of using IBCS to improve your data visualization practices and communication. Here is a brief introduction to what IBCS is and how it is helping businesses across the world visualize their data effectively and consistently.
Are you using data visualization to improve your reports, presentations and communications, or to unknowingly hinder them? All too often, reports fall somewhere between messy spreadsheets and dashboards, full of poorly labeled and inappropriate charts, that simply do not get the message across to the decision-makers.
Countless reports and presentations are created throughout organizations on a daily basis, all in different formats, lengths, shapes and colors, depending on the preferences of the person who prepares them. The result is often that managers cannot make their way through the data presented, time is wasted, and important decisions are not made.
The solution - International Business Communication Standards
In 2004 Dr. Rolf Hichert, the renowned German professor, took on the challenge of standardizing the way data visualizers present data in their reports, dashboards and presentations. His extremely successful work culminated in 2013 with the public release of the International Business Communication Standards (IBCS), the world’s first practical proposal for the standardized design of business communication.
The IBCS consistently define shapes and colors of actuals and budgets, variances, different KPIs, etc. Often referred to as the “traffic signs for management”, the IBCS are a set of best practices that went viral in Europe and have solved business communication problems in numerous companies such as SAP, Bayer, Lufthansa, Philips, Coca-Cola Bottlers, Swiss Post, etc.
Profit & Loss analysis (income statement) with waterfall charts and variances
How does it work?
Let’s take a look at a typical column chart, designed to help us compare actual sales figures vs. budget:
Is it efficient? The colors used are completely arbitrary, probably just an accidental default of the software tool. It is quite hard to estimate the variances to budget. Are we above or below the budget in a particular month? By how much?
Now let’s observe the same dataset, designed according to the IBCS:
The actuals are depicted as dark grey full columns, while the budget is an outline. This is called scenario coding: the budget is an empty frame that has to be filled up with the actuals.
The variances are explicitly calculated and visualized. Positive variance is green, negative is red. The user’s attention is guided to the variances, which are in this case the key element to understand the sales performance.
The values are explicitly labeled at the most appropriate position on the chart. All texts are standardized, exact, short and displayed horizontally.
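To make the idea of explicitly calculated variances concrete, here is a minimal Python sketch of the numbers behind such a chart. It assumes monthly actuals (AC) and budget (BU) for a sales line, where higher is better; the function name `variances_to_budget` and the sample figures are hypothetical, and the "grey" label for a zero variance is our own choice rather than an IBCS rule.

```python
def variances_to_budget(months, actuals, budgets):
    """Return (month, AC, BU, delta BU, color) rows for a sales column chart.

    Hypothetical helper: assumes "higher is better" (true for sales);
    for cost lines the green/red meaning would be inverted.
    """
    if not (len(months) == len(actuals) == len(budgets)):
        raise ValueError("months, actuals and budgets must have the same length")
    rows = []
    for month, ac, bu in zip(months, actuals, budgets):
        delta = ac - bu  # absolute variance to budget
        # Positive variance is green, negative is red; zero is left neutral.
        color = "green" if delta > 0 else "red" if delta < 0 else "grey"
        rows.append((month, ac, bu, delta, color))
    return rows


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the original chart.
    for month, ac, bu, delta, color in variances_to_budget(
        ["Jan", "Feb", "Mar"], [120, 95, 100], [100, 110, 100]
    ):
        print(f"{month}: AC={ac} BU={bu} ΔBU={delta:+} ({color})")
```

Calculating the variance explicitly, rather than leaving the reader to compare two column heights, is exactly what lets the chart guide attention to the months that over- or under-performed.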
Storyline, visual design and uniform notation
The IBCS standards are not just about charts. They comprise an extensive set of rules and recommendations for the design of business communication that help you:
- Organize and structure your content by using an appropriate storyline
- Present your content by using an appropriate visual design and
- Standardize the content by using a consistent, uniform notation.
After you apply the IBCS rules to your standard variance report, it will look something like this:
Sales variance report - Actual vs PY vs Budget
As you may have noticed, this report has several distinctive features:
- The key message (headline) at the top
- Title elements below the key message
- Clear structure of columns (first PY for previous year values, then AC for actual and at the end BU for budget; always in this order)
- Scenario markers below column headers (grey for PY, black for AC and outline for BU)
- Strictly no decorative elements, only a few horizontal lines
- Variances are visualized with red/green “plus-minus” charts and embedded into the table
- Absolute variances (ΔPY, ΔBU) are visualized as a bar chart, while relative variances (ΔPY%, ΔBU%) are visualized as “pin” charts (we prefer to call them “lollipop” charts)
- Semantic axis in charts: grey axis for variance to PY (grey = previous year), double line for variance to budget (outline = budget)
- Numbered explanatory comments that are integrated into the report.
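The table columns described above can be derived from three scenario values per row. Below is a minimal Python sketch, assuming each report row carries a PY, AC and BU value; the class name `VarianceRow`, the sample product lines, and the choice to divide by the absolute reference value (so the sign of a relative variance follows the absolute one) are our own assumptions, not part of the IBCS text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VarianceRow:
    """Hypothetical report row holding the three IBCS scenarios."""
    name: str
    py: float  # previous year
    ac: float  # actual
    bu: float  # budget

    @property
    def delta_py(self) -> float:
        return self.ac - self.py  # absolute variance to previous year (ΔPY)

    @property
    def delta_bu(self) -> float:
        return self.ac - self.bu  # absolute variance to budget (ΔBU)

    @staticmethod
    def _pct(delta: float, base: float) -> Optional[float]:
        # A relative variance is undefined when the reference value is zero.
        if base == 0:
            return None
        return 100.0 * delta / abs(base)

    @property
    def delta_py_pct(self) -> Optional[float]:
        return self._pct(self.delta_py, self.py)  # ΔPY%

    @property
    def delta_bu_pct(self) -> Optional[float]:
        return self._pct(self.delta_bu, self.bu)  # ΔBU%


def fmt_pct(p: Optional[float]) -> str:
    return "n/a" if p is None else f"{p:+.1f}%"


if __name__ == "__main__":
    rows = [VarianceRow("Product A", 80, 100, 125), VarianceRow("Product B", 50, 40, 40)]
    # Columns always in the order PY, AC, BU, followed by the variances.
    print("Name       PY   AC   BU   ΔPY  ΔBU  ΔPY%    ΔBU%")
    for r in rows:
        print(f"{r.name:<10}{r.py:>4}{r.ac:>5}{r.bu:>5}{r.delta_py:>+6}{r.delta_bu:>+5}"
              f"  {fmt_pct(r.delta_py_pct):<8}{fmt_pct(r.delta_bu_pct)}")
```

In a real report the ΔPY and ΔBU values would feed the bar charts and the ΔPY% and ΔBU% values the "pin" (lollipop) charts; this sketch only computes the numbers.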
A clear message, appropriate data visualization and accurate explanations: the story the numbers are telling, presented on a single page. That's what managers expect.
We will be going into much further detail on IBCS guidelines for visualizing data and business communications during a free workshop on Oct 29th at 10am. To learn more and register, please visit us at zebra.bi/dc2015.
We had a packed house on Saturday, Oct 10th for our State of Data Science Education event. The panelists were fantastic and the questions from the crowd were amazing. The tweets below represent just a fraction of the awesomeness that took place. Check out the twitter hashtag #de15 for the whole story.