Putting Maryland on the (Data) Map

With the formation of the newly renamed Data Science MD, it seemed appropriate to highlight some of the Maryland-based practitioners of data science. While New York, San Francisco, and even DC get all the attention, Baltimore and the surrounding areas are full of companies, researchers, startups, and government agencies that are actively developing data products and analytics.



Anyone who uses the Internet has heard of the social gaming company Zynga. Though headquartered in California, Zynga maintains a strong East Coast office in Timonium, MD. From Farmville to Words With Friends, Zynga has created numerous popular games that generate vast amounts of data. Data analytics is so important to Zynga that a vice president in charge of the data-analysis team was quoted as saying, "We're an analytics company masquerading as a games company." At the XLDB2012 conference, Daniel McCaffrey of Zynga gave this presentation on using big data to make games more fun.


In the field of finance, data science is typically associated with the work of quants, who are responsible for the analysis of a range of financial products. At T. Rowe Price, many quantitative analysts are actively building stock selection models, conducting time series analysis, and researching quantitative methods using tools such as R, MATLAB, and Java. Another finance firm using systematic, data-driven investment techniques is Campbell & Company in Baltimore. Alongside these companies are several top-tier business schools, including the Johns Hopkins Carey Business School and the Robert H. Smith School of Business. Professor William Agresti, Professor Jim K. Liew, and other faculty at Johns Hopkins teach a range of courses that include statistical analysis and financial modeling.


Maryland is host to 70 of the top 100 defense contracting firms and has over 9,000 businesses that provide goods and services for America’s national defense. As big data storage and analytics become more critical to defense, companies such as Booz Allen Hamilton, Varen Technologies, and Six3 Systems have developed internal data science and development teams focused on mastering many of the techniques and tools used in industry. Inside the government, data products such as Accumulo are being developed to fulfill the stringent requirements necessary to secure and analyze the mountains of defense-oriented data. Outside of government, at the nearby University of Maryland, Baltimore County, Professor Joshi and others in the Computer Science Department are applying data science to cybersecurity.


Data analytics is crucial to media and advertising. Through analysis, advertisements can target specific audiences. Millennial Media, headquartered in Baltimore, is actively providing creative solutions to advertising customers using tools such as Hadoop, Vertica, and Cassandra. With their own MYDAS Technology, they are aggregating user data and making real-time optimization decisions. Ad.com, an AOL company, is a leader in advertising across desktop, mobile, tablet, and connected TVs. Handling 4 terabytes of advertisement data daily, Ad.com is defining how to process true big data.


Maryland is well known for the healthcare work being conducted across the state. At the Johns Hopkins Bloomberg School of Public Health, Professors Jeff Leek and Roger Peng are actively using statistical analysis to offer new insights in health. Professor Leek, who is also instructing the Data Analysis course at Coursera, is focused on statistical methods for high-dimensional data and genomics. Professor Peng, who is instructing the Computing for Data Analysis course, works in environmental biostatistics, researching the health effects of air pollution and climate change.


Red Owl Analytics' Reveal product helps analyze data trails to uncover often-overlooked patterns. Using the software, companies are able to analyze individual communications data for illicit activity, such as financial crimes. Dealing with both structured and unstructured data can be tedious and frustrating, but the local company IKANOW provides an all-in-one big data solution for harvesting and analyzing both types of data with a customizable plug-and-play framework.


Making government data open and accessible continues to shape how citizens interact with their government. With their Data Catalog, Howard County has taken the lead in providing interactive access to a variety of records, spatial data, maps, and reports. From building permits to preservation easement maps, a range of data is available for citizens and developers to access. While not part of the local government, Baltimore Datamind has created an interactive map of neighborhood-level data to promote collaboration, advocacy, informed decisions, and effective policy making.

This is only a sample of the data-focused work being conducted in the area.


If I overlooked an interesting company, institution, or practitioner, I would love to hear about it. I can be reached at jason.barbour@gmail.com or @jtbarbour. Also, don't miss out on our latest event, Teaching Machines to Read: Processing Text with NLP, happening on March 14th up in Baltimore.


Jason Barbour is a software engineer at Varen Technologies, focused on developing analytics for large data sets using tools such as Hadoop, Pig, and Accumulo. Previously, Jason worked as a network analyst for computer network operations. Jason holds a Master of Computer Science degree from UMBC, where he concentrated on sensor network connectivity and security analysis. He is a co-organizer of the Data Science MD meetup.