Data Community DC and District Data Labs are hosting a Web Scraping with Python workshop on Saturday, January 30th, from 9am to 5pm. Register before January 16th for an early bird discount!
Perhaps the data you need to complete your analysis isn’t available from an API. Perhaps you’re writing a story on a company that is trying to hide information it is supposed to make public. Or perhaps you would like to augment your existing customer database to better understand your customers. In all of these cases, web scraping can help.
Web scraping is a method of extracting information from websites. We use web scraping to transform unstructured web data, typically in HTML format, into structured data we can store and analyze. Python provides a number of powerful tools to quickly create custom web scrapers.
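As a rough illustration of turning HTML into structured data, here is a minimal sketch that uses only Python's standard library `html.parser` module. The sample table, the `TableParser` class, and the `html_table_to_records` helper are hypothetical names made up for this example; they are not taken from the workshop materials, which may well use other libraries.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample page content (assumption: first row holds the headers).
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
  <tr><td>Shelbyville</td><td>25000</td></tr>
</table>
"""


def html_table_to_records(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = html_table_to_records(SAMPLE_HTML)
```

Each record is a plain dictionary, so the result can be written to CSV or loaded into a database without further parsing. Real pages are usually messier than this sample, which is one reason dedicated scraping libraries exist.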
WHAT YOU WILL LEARN
Gathering data from online sources is a critical tool in the toolbox of every data engineer, researcher, reporter, and analyst. The purpose of this workshop is to provide you with a solid understanding of web scraping concepts, and equip you with the tools and knowledge to create your own custom web scrapers using Python.
The workshop will cover the following:
- An overview of:
  - Basic HTML
  - The Document Object Model (DOM)
  - CSS Selectors and XPath
- Using CSS Selectors and XPath to identify the web page elements you want to extract
- Creating a web scraper to extract tabular data
- Creating a web scraper to extract complex data
- Looping your web scraper to extract data from multiple web pages
- Methods for safe web scraping
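The last two topics in the list can be sketched together. The example below assumes that "safe" scraping includes honoring a site's robots.txt rules and pausing between requests; that reading is an assumption, not a statement of the workshop's syllabus. The `polite_crawl` function, the `example-bot` user agent, the `fake_fetch` stand-in, and the example.com URLs are all hypothetical. The download function is passed in as a parameter so the sketch runs without network access.

```python
import time
from urllib.robotparser import RobotFileParser


def polite_crawl(urls, fetch, robots, user_agent="example-bot",
                 delay=1.0, sleep=time.sleep):
    """Fetch each allowed URL in order, pausing between requests.

    fetch  -- callable that takes a URL and returns the page content
    robots -- RobotFileParser already loaded with the site's rules
    """
    pages = {}
    for url in urls:
        # Skip anything the site's robots.txt rules disallow.
        if not robots.can_fetch(user_agent, url):
            continue
        # Wait before every request except the first one.
        if pages:
            sleep(delay)
        pages[url] = fetch(url)
    return pages


# Hypothetical robots.txt rules, parsed from lines instead of downloaded.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])


def fake_fetch(url):
    """Stand-in for a real HTTP request (assumption for this sketch)."""
    return "<html>" + url + "</html>"


urls = [
    "https://example.com/page/1",
    "https://example.com/private/2",
    "https://example.com/page/3",
]
pages = polite_crawl(urls, fake_fetch, robots, delay=0.0)
```

In a real scraper, `fake_fetch` would be replaced by an actual HTTP call and `delay` set to a value the target site can comfortably handle.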
By the end of this workshop you will be able to build and run your own custom web scrapers using Python.
For more info and registration, see the DDL course page.