Holden Karau presented a super fast introduction to PySpark – how to use Python and Spark together when you exceed the limitations of a single machine. Apache Spark is a fast and general engine for distributed computing & big data processing with APIs in Scala, Java, Python, and R. This tutorial will briefly introduce PySpark (the Python API for Spark) with some hands-on-exercises combined with a quick introduction to Spark’s core concepts. We will cover the obligatory wordcount example which comes in with every big-data tutorial, as well as discuss Spark’s unique methods for handling node failure and other relevant internals.
About the speakers:
Holden Karau is transgender Canadian, and an active open source contributor. When not in San Francisco working as a software development engineer at IBM’s Spark Technology Center, Holden talks internationally on Spark and holds office hours at coffee shops at home and abroad. Holden is a co-author of numerous books on Spark including High Performance Spark (which she believes is the gift of the season for those with expense accounts) & Learning Spark. She is a Spark committer and makes frequent contributions, specializing in PySpark and Machine Learning. Prior to IBM she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of software she enjoys playing with fire, welding, scooters, poutine, and dancing.
Michael Li founded The Data Incubator, a New York-based training program that turns talented PhDs from academia into workplace-ready data scientists and quants. The program is free to Fellows, employers engage with the Incubator as hiring partners. Previously, he worked as a data scientist (Foursquare), Wall Street quant (D.E. Shaw, J.P. Morgan), and a rocket scientist (NASA). He completed his PhD at Princeton as a Hertz fellow and read Part III Maths at Cambridge as a Marshall Scholar. At Foursquare, Michael discovered that his favorite part of the job was teaching and mentoring smart people about data science. He decided to build a startup to focus on what he really loves. Michael lives in New York, where he enjoys the Opera, rock climbing, and attending geeky data science events.
Visit our website to learn more about our offerings:
- Data Science Fellowship – a free, full-time, eight-week bootcamp program for PhD and master’s graduates looking to get hired as professional Data Scientists in New York City, Washington DC, San Francisco, and Boston.
- Hiring Data Scientists
- Corporate data science training
- Online data science courses: introductory part-time bootcamps – taught by our expert Data Scientists in residence, and based on our Fellowship curriculum – for busy professionals to boost their data science skills in their spare time.