Data Community DC and District Data Labs are hosting a full-day Data Acquisition and Wrangling with Python workshop on Saturday, May 9th. More info and registration can be found here. Register before April 25th for an early-bird discount!
Eighty percent or more of the time on a data science project is spent acquiring data, cleaning it, and preparing it for analysis. That data can come from a variety of sources, including APIs and individual web pages. However, not all data is created equal: even once its acquisition is automated, much of it requires lengthy cleaning and formatting before it can be used. In this course, you will learn how to obtain data through web scraping and APIs, how to clean and consolidate it, and how to wrangle it into a database so that it is ready for analysis.
Our focus will be on achieving two goals:
- Understanding more about your customers from their social profiles.
- Pulling data off the web (screen scraping) for market research and getting it into a database.
In part one, you will learn how to use Python to pull data from the Meetup, LinkedIn, and Twitter APIs, handle JSON data, and create a single profile from all the data sources.
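As a rough illustration of the "single profile" idea, the sketch below parses three JSON payloads and merges them into one dictionary. The payloads, their field names, and the `merge_profiles` helper are hypothetical stand-ins, not the real Meetup, LinkedIn, or Twitter response formats; in practice the JSON would come from authenticated API calls.

```python
import json

# Hypothetical JSON payloads standing in for API responses. The field names
# are illustrative assumptions, not the real Meetup, LinkedIn, or Twitter schemas.
MEETUP_JSON = '{"name": "Jane Doe", "city": "Washington", "topics": ["python", "data science"]}'
TWITTER_JSON = '{"screen_name": "janedoe", "name": "Jane Doe", "followers_count": 1200}'
LINKEDIN_JSON = '{"firstName": "Jane", "lastName": "Doe", "headline": "Data Analyst"}'


def merge_profiles(*sources):
    """Merge (source_name, dict) pairs into one flat profile.

    Keys are prefixed with the source name so fields reported by more
    than one service never overwrite each other.
    """
    profile = {}
    for source_name, record in sources:
        for key, value in record.items():
            profile[f"{source_name}_{key}"] = value
    return profile


meetup = json.loads(MEETUP_JSON)
twitter = json.loads(TWITTER_JSON)
linkedin = json.loads(LINKEDIN_JSON)

profile = merge_profiles(("meetup", meetup), ("twitter", twitter), ("linkedin", linkedin))

# Pick one canonical name field explicitly after merging.
profile["name"] = f"{linkedin['firstName']} {linkedin['lastName']}"
print(json.dumps(profile, indent=2))
```

Prefixing each key with its source keeps conflicting fields (both Meetup and Twitter report a `name`) from silently overwriting one another; canonical fields can then be chosen deliberately.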
In part two, you will learn how to build a web scraper with Scrapy (a Python web scraping framework), use XPath to find and extract elements on the page, and do it all without getting blocked by websites.
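In a Scrapy spider, XPath queries run against the downloaded page through `response.xpath(...)`. To keep this sketch runnable without Scrapy installed, it swaps in Python's standard-library `xml.etree.ElementTree`, which understands only a limited subset of XPath and requires well-formed markup. The page content and class names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A small, well-formed XHTML snippet standing in for a downloaded page.
# Real-world HTML is often not valid XML, which is one reason scraping
# frameworks such as Scrapy are used instead of the standard library.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2>Widget</h2>
      <span class="price">19.99</span>
    </div>
    <div class="product">
      <h2>Gadget</h2>
      <span class="price">24.50</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())

products = []
# ElementTree supports XPath-style attribute predicates like [@class='...'].
for div in root.findall(".//div[@class='product']"):
    name = div.findtext("h2")
    price = div.findtext("span[@class='price']")
    products.append({"name": name, "price": float(price)})

print(products)
```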
What You Will Learn
This course will teach you how to organize your data and handle incoming data streams so that you can transform them into useful information and revenue. You will learn how to combine multiple data sources into a single profile and gather information from several websites. Whether you’re analyzing customers through their social media profiles or examining client data, you’ll be able to apply these methods to maximize your bottom line.
Part One: Understanding your customers
- How understanding customers and employees improves the bottom line of a business
- Overview of Python
- Types and examples of APIs
- Pulling data from APIs with Python
- Organizing the data we just received
- Aggregating data from different sources into a single profile
- Running a basic analysis and creating simple visualizations of the data
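A basic analysis can be as simple as counting. The sketch below assumes a list of already-merged profiles with hypothetical `city` and `topics` fields, tallies them with `collections.Counter`, and prints a plain-text bar chart as a stand-in for a real plotting library.

```python
from collections import Counter

# Hypothetical merged profiles; "city" and "topics" are assumed field names.
profiles = [
    {"name": "Jane Doe", "city": "Washington", "topics": ["python", "data science"]},
    {"name": "John Roe", "city": "Arlington", "topics": ["python", "startups"]},
    {"name": "Ann Poe", "city": "Washington", "topics": ["data science", "marketing"]},
]

# Count how often each topic and city appears across all profiles.
topic_counts = Counter(topic for p in profiles for topic in p["topics"])
city_counts = Counter(p["city"] for p in profiles)

# A minimal text "visualization": one '#' per profile mentioning the topic.
for topic, count in topic_counts.most_common():
    print(f"{topic:<14}{'#' * count}")

print(dict(city_counts))
```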
Part Two: Acquiring and storing web data
- Why we need to scrape the web
- Rules of the road for web scraping
- How to build a web scraper with Python and Scrapy
- Using XPath to find and extract data from a web page
- Safely and securely scraping the web
- Formatting the data for analysis
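One common rule of the road is honoring a site's robots.txt and pacing requests. The sketch below uses the standard library's `urllib.robotparser` on a hypothetical robots.txt parsed from a string (so it runs offline), skips disallowed URLs, and waits between requests. The user-agent string, URLs, and `fetch` callable are placeholders; a real scraper would make HTTP requests there. In Scrapy, the `ROBOTSTXT_OBEY` and `DOWNLOAD_DELAY` settings cover similar ground.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string so no network is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "workshop-scraper"  # placeholder user-agent string

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns None when no delay is specified; default to 1 second.
delay = parser.crawl_delay(USER_AGENT) or 1


def polite_crawl(urls, fetch, delay):
    """Fetch only URLs robots.txt allows, pausing between requests."""
    results = []
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # skip paths the site has asked crawlers to avoid
        if results:
            time.sleep(delay)  # wait before every request after the first
        results.append(fetch(url))
    return results


urls = [
    "https://example.com/products/1",
    "https://example.com/private/admin",
]
# A stand-in fetch function; a real scraper would download the page here.
pages = polite_crawl(urls, lambda url: f"<html for {url}>", delay)
print(pages)
```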
By the end of this class, you will be able to collect data from multiple web sources, such as social media and other sites, clean it, and organize it into a single data store where you'll be able to conduct further analysis.
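For the single-store step, one lightweight option is SQLite through Python's built-in `sqlite3` module. This is a sketch under assumptions: the table name, columns, and records are hypothetical, and an in-memory database keeps the example self-contained.

```python
import sqlite3

# Hypothetical cleaned records, e.g. the output of the scraping and API steps.
records = [
    {"name": "Widget", "price": 19.99, "source": "example.com"},
    {"name": "Gadget", "price": 24.50, "source": "example.com"},
]

# ":memory:" keeps the sketch self-contained; pass a filename to persist data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL, source TEXT)"
)
# Named placeholders read values straight from each dict, avoiding
# hand-built SQL strings.
conn.executemany(
    "INSERT INTO products (name, price, source) VALUES (:name, :price, :source)",
    records,
)
conn.commit()

rows = conn.execute("SELECT name, price FROM products ORDER BY price").fetchall()
print(rows)
```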
Instructor: Robert Dempsey
Robert Dempsey is a tested leader and technology professional delivering solutions and products to solve tough business challenges. His experience forming and leading agile teams, combined with more than 14 years of technology experience, enables him to solve complex problems while always keeping the bottom line in mind. He’s founded and built three startups in tech and marketing, developed and sold online applications, consulted for Fortune 500 and Inc. 500 companies, and spoken nationally and internationally on software development and agile project management.