Status, Professions, and Why We're Building a Data Science Community

The dozen or so of us who volunteer our time to run Meetups and serve on the Data Community DC board are occasionally asked -- why? Why is it important to us to create a regional community around data and statistical practitioners? This is an excellent question, one that I (Harlan Harris, DC2 President) have struggled with. It's related to questions such as "What is Data Science?" and "Is big data just hype?", which we have occasionally tackled on this blog and at Meetup events.

I recently read a seemingly unrelated blog post that suggests some possible new answers, and I'd like to share some thoughts here. As a warning, this post will be heavy on navel-gazing (one of my favorite things), and may feel slightly mercenary. Here's the punchline: I want us to provide the medium by which data and analytical professionals can establish a predictable and transparent professional currency.

Kevin Simler wrote a post a couple of weeks ago on Ribbonfarm entitled "The Economics of Social Status." In the article, he talks about how it can be clarifying to think about social status as a "good," in the way that money or health is a good -- something that can be traded for other things of value. Of particular interest to me were Simler's thoughts about community:

"Status is defined with respect to a community... let’s take status to be... the total amount of social influence a person has over the other members of his or her community." [bold in original]

At some level, those of us who run these events do this work for our personal status gains, sure. But I think there's something even deeper, and it has to do with the particular community we're involved with and the current point in time.

The community that attends our events and reads this blog is rather new and rather ill-defined. It includes data scientists (whatever those are), data journalists and designers, many statisticians, scientists, and engineers, programmers and database experts, and others. Terms are fluid, and categories overlap. People have widely different educational and professional backgrounds, yet are all meeting to talk about data and analytics. As a result, within this community, roles and status are currently relatively illegible (another concept I learned about from Ribbonfarm). Many people seem to be doing amazing things, but it's not clear how to determine what's typical vs. really exceptional, nor is it clear how to move up the professional ladder. To benefit from your status, and to change it, you need to know where you stand.

"Your status is based not just on how people react to you, but also on how people think everyone else will react to you... [S]tatus is most reliably measured in public. We tweak our private estimates during every pairwise interaction, but reconcile those estimates during interactions that take place in front of larger audiences — because that’s where we can observe the reactions of everyone else. This explains some of our desire to watch speeches, movies, events, etc. in large crowds."

So, we attend Meetup events (and professional events generally) to calibrate our understanding of the community and where we fit into it. What do our peers think of our work? What tools and techniques do high-status people use, and can we learn to use them too?

Unlike practitioners in other professional fields, the people who attend our events don't really have an academic professional society. Or rather, they have many. In our January survey of our community, people reported attending meetings sponsored by a variety of professional societies, including the American Statistical Association, ACM's SIGKDD, and INFORMS. Members of these societies generally share an academic experience and a career path, and their relative status is generally quite clear. But recent changes in how data work is performed in organizations, and in who performs that work, have left those professional societies only weakly aligned with the community as a whole.

And I think that's where we come in. By letting people present their work and interests publicly, meet others who do similar work (despite radically different backgrounds and industries), and align their understandings of professional and social status, we create some of the value that professional organizations have traditionally provided to their members.

This may raise the question of whether Data Community DC and related organizations (e.g., DataGotham, Strata) will end up mimicking the older professional organizations, with dues and local chapters and staid hierarchies. I don't think so. I think that in the way that LinkedIn has created at least some sense of professional connection for many people, without traditional organizational structure, the data community may end up with a more decentralized and informal system. Particularly for the people who do the sort of work that we do, less-formal local groups, combined with technology, may replace the traditional signaling mechanisms, continuing education, and networking that older structures offered. (See also this recent discussion of Data Science as a profession.)

Finally, another important idea from Simler's article is that the "strength of a community" is equivalent to "the level of agreement about how to measure status." This means that by spending the time and effort to build a community that understands itself and how people fit in, we are also building the strength of that community. As the data and analytics community becomes more coherent and legible to its members, that community also becomes able to act more coherently with regard to the external world. And I think that's a great reason to get involved in organizations like ours.

(Disclaimer: The above may or may not be the opinion of the author, and certainly is not the opinion or policy of Data Community DC.)