Data are becoming the new raw material of business
The Economist


Data Science in 30 Minutes: Holden Karau – A Quick Introduction to PySpark


IBM’s Holden Karau joined The Data Incubator in June 2017 for our free online webinar series, Data Science in 30 Minutes. Sign up below for the full video!

Holden Karau presented a fast introduction to PySpark: how to use Python and Spark together when you exceed the limitations of a single machine. Apache Spark is a fast, general engine for distributed computing and big data processing with APIs in Scala, Java, Python, and R. The tutorial briefly introduces PySpark (the Python API for Spark) through hands-on exercises combined with a quick introduction to Spark’s core concepts. It covers the obligatory word count example that comes with every big data tutorial and discusses Spark’s methods for handling node failure and other relevant internals.
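For readers who have not seen the word count example mentioned above, here is a minimal sketch of the pattern in plain Python. It is not the code from the webinar and it does not use Spark; it only mirrors, on a single machine, the three steps a Spark word count usually chains together. The function name `word_count` and the sample sentences are made up for illustration.

```python
from collections import defaultdict


def word_count(lines):
    """Count words across lines using the three classic word-count steps."""
    # Step 1 ("flatMap" in Spark terms): split every line into words.
    words = [word for line in lines for word in line.lower().split()]

    # Step 2 ("map"): pair each word with an initial count of 1.
    pairs = [(word, 1) for word in words]

    # Step 3 ("reduceByKey"): sum the counts for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Hypothetical sample input, standing in for the lines of a text file.
sample_lines = ["spark makes big data simple", "big data needs big tools"]
counts = word_count(sample_lines)

# With PySpark's RDD API the same steps are commonly chained like this
# (not run here; `sc` is an existing SparkContext and `path` is a
# placeholder for an input file):
#   sc.textFile(path) \
#     .flatMap(lambda line: line.split()) \
#     .map(lambda word: (word, 1)) \
#     .reduceByKey(lambda a, b: a + b)
```

In Spark the same three steps run in parallel across the partitions of a dataset, which is what lets the pattern scale beyond a single machine.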



Data Science in 30 Minutes: Deep Learning to Detect Fake News with Uber ATG Head of Data Science, Mike Tamir

This FREE webinar will take place LIVE online on August 21st at 5:30PM ET. Register below now, space is limited!


Join The Data Incubator and Mike Tamir, Head of Data Science for Uber Advanced Technologies Group, for the August 2018 installment of our free monthly webinar series, Data Science in 30 minutes: Deep Learning to Detect Fake News.

Mike will discuss how he created FakerFact.org, an Artificial Intelligence tool that enables readers to detect when an article is focused on credible information sharing vs. when the focus is on manipulation. We will explore real-world use cases for automated “Fake News” evaluation using contemporary deep learning article vectorization and tagging. We begin with the use case and an evaluation of the contexts in which various deep learning approaches to fake news evaluation are appropriate. We will discuss several methodologies for article vectorization with classification pipelines, ranging from traditional techniques to advanced deep neural network architectures. We close with a discussion of troubleshooting and performance optimization when consolidating and evaluating these techniques on active data sets.


Data Science in 30 Minutes: The Accidental Data Scientist with Katrina Riehl, Director of Data Science for HomeAway.com

This FREE webinar will take place LIVE online on July 24th at 5:30PM ET. Register below now, space is limited!


Join The Data Incubator and Katrina Riehl, Director of Data Science for HomeAway.com, for the July 2018 installment of our free monthly webinar series, Data Science in 30 minutes: The Accidental Data Scientist.

Katrina will detail the journey her career has taken from researcher and software developer to Data Scientist. She will explain how her technology roles and skills have evolved as this new discipline emerged over the last decade. She started out as a young Python and Artificial Intelligence enthusiast and, after many years, finally embraced Data Science as a discipline and came to lead a strong and diverse Data Science team.


Data Science in 30 Minutes: Why Big Data Needs Thick Data with Tricia Wang


This FREE webinar took place on June 26th, 2018. Sign up below for the free video!

Tricia Wang, co-founder of Sudden Compass, joined The Data Incubator for the June 2018 episode of our free online webinar series, Data Science in 30 minutes: Why Big Data Needs Thick Data.

Why do so many companies make bad decisions, even with access to unprecedented amounts of data? Tricia has the answer: companies are implementing “big data” without what she calls the secret, missing ingredient, “thick data” – precious, unquantifiable insights from actual people – to make the right business decisions and thrive in the unknown. Tricia shared stories and lessons from how her company, Sudden Compass, advises and teaches organizations to unlock insights from big data and turn their big data projects from optimizing the bottom line to driving growth.


Data Science in 30 Minutes: Building Data Science Capabilities That Scale


This FREE webinar took place on May 17th, 2018. Sign up below for the full video!

DataScience.com CSO, William Merchan, joined The Data Incubator for the May installment of our free online webinar series, Data Science in 30 minutes: Building Data Science Capabilities That Scale.

Data scientists and machine learning engineers saw the highest job growth of any role last year, yet few companies have successfully turned their aggressive hiring into profitable, scalable data science capabilities. In this session, DataScience.com CSO, William Merchan, shares lessons learned from building a platform that supports collaborative data science for a variety of clients, from startups to Fortune 500 companies. Learn about the technology gaps, roadblocks to innovation and efficiency, and talent retention challenges that have proven to be detrimental to data science success in an enterprise environment — and how to mitigate them.


Data Science in 30 Minutes: Alan Schwarz, Former NYTimes Journalist, on Numbers-Based Journalism

Alan Schwarz, former NY Times journalist, joined The Data Incubator for the February 2018 installment of our free online webinar series, Data Science in 30 minutes: Numbers-Based Journalism.

Sign up below to get access to the video of this webinar for free!

Alan Schwarz, former N.Y. Times investigative reporter and Pulitzer finalist, discussed numbers-based journalism that shook industries from the National Football League to Big Pharma. Alan used data analysis to expose the NFL’s cover-up of concussions as well as issues in child psychiatry.


Data Science in 30 Minutes: Kirk Borne – A Fortuitous Career in Data Science



Booz Allen Hamilton’s Kirk Borne joined The Data Incubator in August for our FREE monthly webinar series, Data Science in 30 minutes!

Kirk Borne took us on a journey through his career in science and technology, explaining how the industry, and he himself, have evolved over the last four decades. From skipping lunches in high school to a systematic Twitter obsession, Kirk shed light on his road to success in the data science industry.


Data Science in 30 Minutes: Scikit-Learn with Core-Contributor Andreas Mueller


scikit-learn’s Andreas Mueller joined The Data Incubator in December 2017 for our FREE monthly webinar series, Data Science in 30 Minutes!

We talked about everything new in 0.19, which was released in July of this year, and the plans for 0.20, which will be released early next year. Highlights are multiple-metric grid search, faster t-SNE, and better handling of categorical and mixed data.


Data Science in 30 Minutes: A Conversation with Gregory Piatetsky-Shapiro, President of KDnuggets


KDnuggets’ Gregory Piatetsky-Shapiro, Ph.D., joined The Data Incubator in January for the first 2018 installment of our free online webinar series, Data Science in 30 minutes! Gregory discussed his career, from Data Mining to Data Science, and examined current trends in the field.

From Data Mining to Knowledge Discovery to Data Science: Gregory Piatetsky-Shapiro talked about his pioneering career in data science, including founding KDnuggets and co-founding the KDD Conferences and ACM SIGKDD, and examined current trends in the field, including Data Science automation, citizen Data Scientists, and the implications of AI.


Data Science in 30 Minutes: Infrastructure for Usable Machine Learning with Spark Creator and Stanford Professor, Matei Zaharia


Databricks co-founder Matei Zaharia, Ph.D., joined The Data Incubator for the April 2018 installment of our FREE monthly webinar series, Data Science in 30 minutes: Infrastructure for Usable Machine Learning.

Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering teams. This expense usually comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application development, from data preparation and labeling to productionization and monitoring. In the Stanford DAWN project, we are developing a set of tools to make these processes easier, from weak supervision approaches to dramatically reduce the need for labeled data, to query-specific model specialization to reduce serving cost, and end-to-end ML systems that encapsulate a complete task and greatly simplify the interface to the user.
