educational technology

General Assembly & DC2 Scholarship

The DC2 mission statement emphasises that "Data Community DC is an organization committed to connecting and promoting the work of data professionals...". Ultimately, we see DC2 becoming a hub for data scientists interested in exploring new material, advancing their skills, collaborating, starting a business with data, mentoring others, teaching classes, changing careers, etc. Education is clearly a large part of any of these interests, and while DC2 has held a few workshops and is sponsored by a number of organizations, we knew we could do more. So we partnered with General Assembly and created a GA & DC2 scholarship specifically for members of Data Community DC.

For our first scholarship we landed on Front End Web Development and User Experience, which we naturally announced first at Data Viz DC. How does this relate to data science? As I was happy to rebut Mr. Gelman in our DC2 blogpost reply, sometimes I would love to have a little sandbox where I get to play with algorithms all day. Then again, this is exactly what I ran away from in 2013 when I became an independent data science consultant: I don't want a business plan I'm not a part of dictating what I can play with. Enter Web Dev and UX. As Harlan Harris, organizer of DSDC, mentions in his Venn diagram on what makes a data scientist, which Tony Ojeda later emphasizes, programming is a natural and necessary part of being a data scientist. In other words, there's this thing called the interwebs that has more data than you can shake a stick at, and if you can't operate in that environment then as a data scientist you're asking someone else to do that heavy lifting for you.

Over the next month we'll be choosing the winners of the GA DC2 Scholarship, and if you'd like to see any other scholarships in the future please leave your thoughts in the comments below or tweet us.

Happy Thanksgiving!

2013 September DSMD Event: Teaching Data Science to the Masses

For Data Science MD's September meetup, we were very fortunate to have the very talented and very passionate Dr. Jeff Leek speak about his experiences teaching Data Science through the online learning platform Coursera. This was also a unique event for DSMD itself because it was the first meetup that featured only one speaker. Having one speaker talk for a whole hour can be a disaster if the speaker is unable to keep the attention of the audience. However, Dr. Leek is a very dynamic and engaging speaker and had no problem keeping the attention of everyone in the room, including a couple of middle school students.

For those of you who are not familiar with Dr. Leek, he is a biostatistician at Johns Hopkins University as well as an instructor in the JHU biostatistics program. His biostatistics work typically entails analyzing sequenced human genome data to provide insights to doctors and patients in the form of raw data and advanced visualizations. However, when he is not revolutionizing the medical world or teaching the great biostatisticians of tomorrow at JHU, you can find him teaching his course on Coursera or providing new content to his blog, Simply Statistics.

Now, on to the talk. Johns Hopkins and specifically Dr. Leek got involved in teaching a Coursera course because they have constantly been looking at ways to improve learning for their students. They had been "flipping the classroom" by taking lectures and posting them to YouTube so that students could review the lecture material before class and then use the classroom time to dig deeper into specific topics. Because online videos are such a vital component of Massive Open Online Courses (MOOCs), it is no surprise that they took the next leap.

But just in case you think that JHU and Dr. Leek are new to this whole data science "thing," check out their Kaggle team's results for the Heritage Health Prize.


Even though their team fell a few places when run on the private data, they still had a very impressive showing considering there were 1358 teams that entered and over 20,000 entries. But what exactly does data science mean to Dr. Leek? Check out his expanded components of data science chart, which differs from similar charts by other data scientists in that it also shows the root disciplines of each component.


But what does the course look like?


He covers topics such as types of analyses, how to organize a data analysis, and data munging, as well as others like:



One of the interesting things to note though is that he also shows examples of poor data analysis attempts. There is a core problem with the statistics example from above (pointed out by high school students). Below is an example of another:


This course, along with two other courses taught by other JHU faculty, Computing for Data Analysis and Mathematical Biostatistics Bootcamp, has had a very positive response.


But how do you teach that many people effectively? That is where the power of Coursera comes in; JHU could have chosen other providers like edX or Udacity but decided to go with Coursera. The videos make it easy to convey knowledge, and message boards provide a mechanism to ask questions. Dr. Leek even had students answering questions for other students so that all he had to do was validate the response. But he also pointed out that his class's message board was just like all other message boards and followed the 1/98/1 rule, where 1% of people respond in a mean way and are unhelpful, 1% of people are very nice and very helpful, and the other 98% don't care and don't really respond.

One of the most distinctive aspects of Coursera is that it scales to tens of thousands of students by using peer grading. Each person grades 4 different assignments, so everyone submits one answer and grades 4 others. The final score for each student is the median of the four scores from the other students. The rubric used in Dr. Leek's class is below.


The result of this grading policy, based on Dr. Leek's analysis, is that good students received good grades, poor students received poor grades, and middle students' grades fluctuated a fair amount. So it seems like the policy mostly works, but there is still room for improvement.
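To make the median-of-four scoring described above concrete, here is a minimal Python sketch. It is only an illustration, not Coursera's or Dr. Leek's actual implementation: the function names, the round-robin way of assigning graders, the student names, and the example scores are all assumptions made up for this example. The only details taken from the talk are that each person grades 4 submissions and that the final score is the median of the four peer scores.

```python
from statistics import median

REVIEWS_PER_STUDENT = 4  # from the talk: each person grades 4 submissions


def assign_reviews(students):
    """Hypothetical round-robin assignment: student i grades the next 4 students.

    Every submission ends up with exactly 4 graders and nobody grades
    their own work, provided there are at least 5 students.
    """
    n = len(students)
    if n <= REVIEWS_PER_STUDENT:
        raise ValueError("need more students than reviews per student")
    return {
        students[i]: [students[(i + k) % n] for k in range(1, REVIEWS_PER_STUDENT + 1)]
        for i in range(n)
    }


def final_score(peer_scores):
    """Final score for one submission: the median of its four peer scores."""
    if len(peer_scores) != REVIEWS_PER_STUDENT:
        raise ValueError("expected exactly 4 peer scores")
    return median(peer_scores)


if __name__ == "__main__":
    students = ["ana", "ben", "cy", "dee", "eli", "fay"]  # made-up names
    assignments = assign_reviews(students)
    print(assignments["ana"])  # ['ben', 'cy', 'dee', 'eli']

    # One harsh outlier (2) moves the median less than it would move the mean.
    print(final_score([8, 9, 9, 2]))  # 8.5 (the mean would be 7.0)
```

Using the median rather than the mean means a single unusually harsh or generous grader has limited influence on the final score, which fits the observation that clearly good and clearly poor work was graded consistently.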

But why do Johns Hopkins and Dr. Leek even support this model of learning? They do have full-time jobs that involve teaching, after all. Well, besides being huge supporters of open source technology and open learning, they also see many other reasons for supporting this paradigm.


Check out the video for the many other reasons why JHU further supports this paradigm. And, while you are at it, see if you can figure out if the x and y axes are related in some way. This was our data science/statistics problem for the evening. The answer can also be found in the video.


We also got a sneak peek at a new tool/component that integrates directly into R - swirl. Look for a meetup or blog post about this tool in the future.


Our next meetup is on October 9th in Baltimore, beginning at 6:30PM. We will have Don Miner speak about using Hadoop for Data Science. If you can make it, come out and join us.

Weekly Round-Up: Big Data Value, Education, Social Data Analysis, and Saving the Planet

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Big Data's impact on education to using data to reduce global violence. In this week's round-up:

  • The Value of Big Data Isn't the Data
  • Big Data Will Revolutionize Learning
  • Data Analysis Should Be a Social Event
  • Using Big Data to Save the Planet

The Value of Big Data Isn't the Data

This is a Harvard Business Review blog post by the CTO of Narrative Science and Northwestern faculty member, Kris Hammond, about where he believes the value is in Big Data. Hammond proposes that the value is in getting machines to conduct the data analysis we need and to communicate their findings in an intuitive way. In the post, he describes in more detail why he believes this is so valuable and provides explanations and diagrams outlining the steps that can be taken to put these processes in place.

Big Data Will Revolutionize Learning

This interesting Smart Data Collective article is about how technology now allows us to capture information about virtually everything that happens in education and what this means for the future of education. Some of the possibilities include customizing content for individual students, reducing drop-out rates, and enhancing the overall learning experience - all resulting in improved student outcomes. The article talks a little about each of these and describes how they are, and will continue to be, implemented.

Data Analysis Should Be a Social Event

This is another interesting HBR article advocating a more social approach to solving data analysis problems. The authors urge us to use an approach familiar to those who have attended data-dives or hackathons before - get a group of people with different perspectives together to brainstorm and come up with ideas about how best to solve the problem at hand. The article points out that this approach doesn't just work well at hackathons; it has also been implemented with great success at companies.

Using Big Data to Save the Planet

Our final article this week is a Slashdot piece about how the U.S. State Department is partnering with groups from around the world and using data analytics to help reduce violence in countries where it is a major problem. According to the article, they are using an analytics tool named Senturion to track data that can be obtained from social networks, economic data, and other sources to provide output that can help determine what types of resources are necessary on the ground in those troubled countries. The article mentions some of the countries where this analytics system is helping to identify conflict trends and also provides some examples of specific initiatives it is providing assistance with.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Data Visualization: Better Events & Education

There is a new approach to education in this country where data is used to accelerate learning; great examples include Apps4VA, EverFi, Unbound Concepts, and New York's Education Data Portal. Unfortunately this same trend is not being as widely applied to professional education (seminars, conferences, meetups, etc.). Why the double standard? Why can't data be used to accelerate professional education, especially at a time when job hopping is the new normal? BLUF: Data Visualization is a new tool to engage event attendees for more successful communication, education, and building of anti-fragile professional networks.

Education Evolution: From Lectures to Data

We are all familiar with the classic Elizabethan style educational format, where the teacher or professor lectures for some period of time, then tests us or gives us independent exercises. Many of the people reading this blog are also familiar with the recently emerged "Data Driven Education", where information is gathered about the student while studying in a number of formats (lecture, tutoring, peer-to-peer, independent, etc.), and that information is used in an algorithm to steer the student toward their most effective combination of educational formats. There are a number of good arguments for this new education format, such as the fact that lectures can be pre-recorded so students' time with educators can be more interactive and allow supervised learning independently or with peers. Data Driven Education also signifies a psychological shift by recognizing that not all students learn in the same way or at the same rate. Students need to discover and learn from their point of view.

Herding Cats

Despite the connectivity of our new technological and mobile world, most of the events we attend as professionals still feature an industry leader in a lecture format. Why is this? Under classic educational systems, gathering students' data is easier, primarily because the students are a captive audience: they are required to provide the data. As adults and as professionals, however, we choose when and where to give feedback, if any. In addition, there are privacy concerns, as those providing feedback don't necessarily want their feedback data to be widely associated with their name. We may want to migrate from classic lecture style education to data driven education, but collecting professionals' data means they must be continually convinced to volunteer it.

Passive vs Active Data Gathering

Making the argument to volunteer data puts the pressure on the educators, which can be a good thing. With a captive audience we can use the students' time to gather data through testing, whether the testing is appealing or not (usually not). With volunteered data, if the process isn't appealing to the professional/student, then the data will not be gathered; the best approach is to collect data as a natural course of events. Data can come naturally from web activity (HTTP logging, cookies, link traces, etc.), during events (mobile app 'like' button during presentations, use of accelerometers to capture 'restlessness', pedometers and/or GPS during an all day/week conference), or from business operations (survey feedback, investigations, reports, purchase orders, etc.). In any case, we must avoid data gathering distractions during valuable event time at all costs.

Data Viz?

Knowing that someone was restless during a boring presentation is not the same as testing subject proficiency, but the goal is not just to achieve a minimal test score; it is impossible to know everything at all times anyway. Today's open source approach requires an anti-fragile community that quickly supports outreach when it's sought (e.g. good coders are also good 'Googlers'). The goal is two-fold: to educate the group and to partner need with ability (i.e. demand with supply), and this is where Data Visualization is important. People communicate through objects, especially objects of common interest; the data collected describes those objects, and data visualization naturally presents those objects, clearing the way for discussion and discovery. Great examples include The London Cycle Hire Journeys, National Gun Flow, and Microsoft ViralSearch. Having data visualizations like these, specific to your topic and stationed throughout an event, creates pockets of conversation, allows people to migrate between stations, allows others to answer questions through the interactivity of the data, and frees the speaker to focus on conversations where they're most needed, democratizing the entire experience. Data Visualization is a new tool to engage event attendees for more successful communication, education, and building of anti-fragile professional networks.