This is a guest post by John Kaufhold. Dr. Kaufhold is a data scientist and managing partner of Deep Learning Analytics, a data science company based in Arlington, VA. He presented an introduction to Deep Learning at the March Data Science DC.
Why aren't there more Deep Learning talks, tutorials, or workshops in DC2?
It's been about two months since my Deep Learning talk at Artisphere for DC2. Again, thanks to the organizers (especially Harlan Harris and Sean Gonzalez) and the sponsors (especially Arlington Economic Development). We had a great turnout and a lot of good questions that night. Since the talk and at other Meetups since, I've been encouraged by the tidal wave of interest from teaching organizations and prospective students alike.
First some preemptive answers to the “FAQ” downstream of the talk:
- Mary Galvin wrote a blog review of this event.
- Yes, the slides are available.
- Yes, corresponding audio is also available (thanks Geoff Moes).
- A recently "reconstructed" talk combining the slides and audio is also now available!
- Where else can I learn more about Deep Learning as a data scientist? (This may be a request to teach, a question about how to do something in Deep Learning, a question about theory, or a request to do an internship. They're all basically the same thing.)
- It's this last question that's the focus of this blog post. Lots of people have asked and there are some answers out there already, but if people in the DC MSA are really interested, there could be more. At the end of this post is a survey—if you want more Deep Learning, let DC2 know what you want and together we'll figure out what we can make happen.
There actually was a class...
Aaron Schumacher and Tommy Shen invited me to come talk in April for General Assemb.ly's Data Science course. I did teach one Deep Learning module for them. That module was a slightly longer version of the talk I gave at Artisphere combined with one abbreviated “hands on” module on unsupervised feature learning based on Stanford's tutorial. It didn't help that the tutorial was written in Octave and the class had mostly been using Python up to that point. Though feedback was generally positive for the Deep Learning module, some students wondered if they could get a little more hands on and focus on specifics. And I empathize with them. I've spent real money on Deep Learning tutorials that I thought could have been much more useful if they were more hands on.
Though I've appreciated all the invitations to teach courses, workshops, or lectures, except for the General Assemb.ly course, I've turned down all the invitations to teach something more on Deep Learning. This is not because the data science community here in DC is already expert in Deep Learning or because it's not worth teaching. Quite the opposite. I've not committed to teach more Deep Learning mostly because of these three reasons:
- There are already significant Deep Learning Tutorial resources out there,
- There are significant front end investments that neophytes need to make for any workshop or tutorial to be valuable to both the class and instructor and,
- I haven't found a teaching model in the DC MSA that convinces me teaching a “traditional” class in the formal sense is a better investment of time than instruction through project-based learning on research work contracted through my company.
Resources to learn Deep Learning
There are already many freely available resources to learn the theory of Deep Learning, and it's made even more accessible by many of the very lucid authors who participate in this community. My talk was cherry-picked from a number of these materials and news stories. Here are some representative links that can connect you to much of the mainstream literature and discussion in Deep Learning:
- The tutorials link on the DeepLearning.net page
- NYU's Deep Learning course course material
- Yann LeCun's overview of Deep Learning with Marc'Aurelio Ranzato
- Geoff Hinton's Coursera course on Neural Networks
- A book on Deep Learning from the Microsoft Speech Group
- A reading list list from Carnegie Mellon with student notes on many of the papers
- A Google+ page on Deep Learning
This is the first reason I don't think it's all that valuable for DC to have more of its own Deep Learning “academic” tutorials. And by “academic” I mean tutorials that don't end with students leaving the class successfully implementing systems that learn representations to do amazing things with those learned features. I'm happy to give tutorials in that “academic” direction or shape them based on my own biases, but I doubt I'd improve on what's already out there. I've been doing machine learning for 15 years, so I start with some background to deeply appreciate Deep Learning, but I've only been doing Deep Learning for two years now. And my expertise is self-taught. And I never did a post-doc with Geoff Hinton, Yann LeCun or Yoshua Bengio. I'm still learning, myself.
The investments to go from 0 to Deep Learning
It's a joy to teach motivated students who come equipped with all the prerequisites for really mastering a subject. That said, teaching a less equipped, uninvested and/or unmotivated studentry is often an exercise in joint suffering for both students and instructor.
I believe the requests to have a Deep Learning course, tutorial, workshop or another talk are all well intentioned... Except for Sean Gonzalez—it creeps me out how much he wants a workshop. But I think most of this eager interest in tutorials overlooks just how much preparation a student needs to get a good return on their time and tuition. And if they're not getting a good return, what's the point? The last thing I want to do is give the DC2 community a tutorial on “the Past” of neural nets. Here are what I consider some practical prerequisites for folks to really get something out of a hands-on tutorial:
- An understanding of machine learning, including
- optimization and stochastic gradient descent
- hyperparameter tuning
- at least a passing understanding of neural nets
- A pretty good grasp of Python, including
- a working knowledge of how to configure different packages
- some appreciation for Theano (warts and all)
- a good understanding of data preparation
- Some recent CUDA-capable NVIDIA GPU hardware* configured for your machine
- CUDA drivers
- NVIDIA's CUDA examples compiled
*hardware isn't necessarily a prerequisite, but I don't know how you can get an understanding of any more than toy problems on a CPU
Resources like the ones above are great for getting a student up to speed on the “academic” issues of understanding deep learning, but that only scratches the surface. Once students know what can be done, if they’re anything like me, they want to be able to do it. And at that point, students need a pretty deep understanding of not just the theory, but of both hardware and software to really make some contributions in Deep Learning. Or even apply it to their problem.
Starting with the hardware, let's say, for sake of argument, that you work for the government or are for some other arbitrary reason forced to buy Dell hardware. You begin your journey justifying the $4000 purchase for a machine that might be semi-functional as a Deep Learning platform when there's a $2500 guideline in your department. Individual Dell workstations are like Deep Learning kryptonite, so even if someone in the n layers of approval bureaucracy somehow approved it, it's still the beginning of a frustrating story with an unhappy ending. Or let's say you build your own machine. Now add “building a machine” for a minimum of about $1500 to the prerequisites. But to really get a return in the sweet spot of those components, you probably want to spend at least $2500. Now the prerequisites include a dollar investment in addition to talent and tuition! Or let’s say you’re just going to build out your three-year-old machine you have for the new capability. Oh, you only have a 500W power supply? Lucky you! You’re going shopping! Oh, your machine has an ATI graphics card. I’m sure it’s just a little bit of glue code to repurpose CUDA calls to OpenCL calls for that hardware. Let's say you actually have an NVIDIA card (at least as recent as a GTX 580) and wanted to develop in virtual machines, so you need PCI pass-through to reach the CUDA cores. Lucky you! You have some more reading to do! Pray DenverCoder9's made a summary post in the past 11 years.
“But I run everything in the cloud on EC2,” you say! It's $0.65/hour for G2 instances. And those are the cheap GPU instances. Back of the envelope, it took a week of churning through 1.2 million training images with CUDA convnets (optimized for speed) to produce a breakthrough result. At $0.65/hour, you get maybe 20 or 30 tries doing that before it would have made more sense to have built your own machine. This isn't a crazy way to learn, but any psychological disincentive to experimentation, even $0.65/hour, seems like an unnecessary distraction. I also can't endorse the idea of “dabbling” in Deep Learning; it seems akin to “dabbling” in having children—you either make the commitment or you don't.
At this point, I’m not aware of an “import deeplearning” package in Python that can then fit a nine layer sparse autoencoder with invisible CUDA calls to your GPU on 10 million images at the ipython command line. Though people are trying. That's an extreme example, but in general, you need a flexible, stable codebase to even experiment at a useful scale—and that's really what we data scientists should be doing. Toys are fine and all, but if scale up means a qualitatively different solution, why learn the toy? And that means getting acquainted with the pros and cons of various codebases out there. Or writing your own, which... Good luck!
DC Metro-area teaching models
I start from the premise that no good teacher in the history of teaching has ever been rewarded appropriately with pay for their contributions and most teaching rewards are personal. I accept that premise. And this is all I really ever expect from teaching. I do, however, believe teaching is becoming even less attractive to good teachers every year at every stage of lifelong learning. Traditional post-secondary instructional models are clearly collapsing. Brick and mortar university degrees often trap graduates in debt at the same time the universities have already outsourced their actual teaching mission to low-cost adjunct staff and diverted funds to marketing curricula rather than teaching them. For-profit institutions are even worse. Compensation for a career in public education has never been particularly attractive, but still there have always been teachers who love to teach, are good at it, and do it anyway. However, new narrow metric-based approaches that hold teachers responsible for the students they're dealt rather than the quality of their teaching can be demoralizing for even the most self-possessed teachers. These developments threaten to reduce that pool of quality teachers to a sparse band of marginalized die-hards. But enough of my view of “teaching” the way most people typically blindly suggest I do it. The formal and informal teaching options in the DC MSA mirror these broader developments. I run a company with active contracts and however much I might love teaching and would like to see a well-trained crop of deep learning experts in the region, the investment doesn't add up. So I continue to mentor colleagues and partners through contracted research projects.
I don't know all the models for teaching and haven't spent a lot of time understanding them, but none seem to make sense to me in terms of time invested to teach students—partly because many of them really can't get at the hardware part of the list of prerequisites above. This is my vague understanding of compensation models generally available in the online space*:
- Udemy – produce and own a "digital asset" of the course content and sell tuition and advertising as a MOOC. I have no experience with Udemy, but some people seemed happy to have made $20,000 in a month. Thanks to Valerie at Feastie for suggesting this option.
- Statistics.com – Typically a few thousand for four sessions that Statistics.com then sells; I believe this must be a “work for hire” copyright model for the digital asset that Statistics.com buys from the instructor. I assume it's something akin to commissioned art, that once you create, you no longer own. [Editor’s note: Statistics.com is a sponsor of Data Science DC. The arrangement that John describes is similar to our understanding too.]
- Myngle – Sell lots of online lessons for typically less than a 30% share.
And this is my understanding of compensation models locally available in the DC MSA*:
- General Assemb.ly – Between 15-20% of tuition (where tuition may be $4000/student for a semester class).
- District Data Labs Workshop – Splits total workshop tuition or profit 50% with the instructor—which may be the best deal I've heard, but 50% is a lot to pay for advertising and logistics. [Editor's note: These are the workshops that Data Community DC runs with our partner DDL.]
- Give a lecture – typically a one time lecture with a modest honorarium ($100s) that may include travel. I've given these kinds of lectures at GMU and Marymount.
- Adjunct at a local university – This is often a very labor- and commute-intensive investment and pays no better (with no benefits) than a few thousand dollars. Georgetown will pay about $200 per contact hour with students. Assuming there are three hours of out of classroom commitment for every hour in class, this probably ends up somewhere in the $50 per hour range. All this said, this was the suggestion of a respected entrepreneur in the DC region.
- Tenure-track position at a local university – As an Assistant Professor, you will typically have to forego being anything but a glorified post-doc until your tenure review. And good luck convincing this crowd they need you enough to hire you with tenure.
*These are what I understand to be the approximate options and if you got a worse or better deal, please understand I might be wrong about these specific figures. I'm not wrong, though, that none of these are “market rate” for an experienced data scientist in the DC MSA.
Currently, all of my teaching happens through hands-on internships and project-based learning at my company, where I know the students (i.e. my colleagues, coworkers, subcontractors and partners) are motivated and I know they have sufficient resources to succeed (including hardware). When I “teach,” I typically do it for free, and I try hard to avoid organizations that create asymmetrical relationships with their instructors or sell instructor time as their primary “product” at a steep discount to the instructor compensation. Though polemic, Mike Selik summarized the same issue of cut rate data science in "The End of Kaggle." I'd love to hear of a good model where students could really get the three practical prerequisites for Deep Learning and how I could help make that happen here in DC2 short of making “teaching” my primary vocation. If there's a viable model for that out there, please let me know. If you still think you'd like to learn more about Deep Learning through DC2, please help us understand what you'd want out of it and whether you'd be able to bring your own hardware.