Data are becoming the new raw material of business
The Economist

The Benefits of Active Learning for Data Science Skills

Since the late 1970s, educators have promoted the adoption of active learning principles in teaching practices. Active learning is a method that involves students directly in the learning process. This contrasts with traditional learning methods, like lectures, where students passively receive information without taking measures to engage with the material and ensure they have sufficiently understood it. Active learning involves getting students to do activities and to think about the purpose behind these activities[1]. At The Data Incubator, we believe that active learning is the best way to approach data science training and education, and we’ve built our curricula based on these concepts. Below we discuss the benefits of active learning and how we’ve employed active learning methods in our data science training programs.

In the seminal text “A Taxonomy of Educational Objectives”[2], Bloom defines the six learning objectives of the cognitive domain, the area of mental skill acquisition.[3] They are remembering, understanding, applying, analyzing, evaluating, and creating.

Effective learning in the cognitive domain is achieved by activating all of these objectives. Everyone learns differently; you often hear people state “I’m a visual learner” or “I’m an auditory learner”. The problem with this thinking is that it oversimplifies how people learn into only two categories. Additionally, it adheres to the traditional passive learning techniques that rely on audio and visual cues only. With regard to the six objectives listed above, passive learning only addresses the “low order” thinking skills, remembering and understanding, without activating the “high order” thinking skills. During a one- or two-hour lecture, there’s no opportunity for students to effectively apply and analyze the information, let alone to evaluate or create something from the new information. While students may do so on their own time after lecture, the opportunity to solidify the concepts while the information is still fresh is lost.

Critics of active learning often decry it as just another fad. However, numerous research studies have refuted this claim. A review of active learning studies found support for various forms of active learning[4]. Given the various studies analyzed in the review, the author suggests that introducing activities during lecture and promoting student engagement will improve learning outcomes.

The success of active learning has led institutions of higher learning to implement active learning principles. For example, MIT has replaced its traditional passive learning introductory physics classes with what it refers to as TEAL, Technology-Enabled Active Learning. These changes were prompted by low lecture attendance and a high failure rate in the previous traditional lecture-style courses. A study on TEAL performance reveals improvements in conceptual understanding, class attendance, and passing rate[5]. The study shows the failure rate dropped from 13% to 5% and lecture attendance increased from 50% to 80%, compared to a control group.


Active Learning at The Data Incubator

We understand the benefits of active learning, and we built our curricula based on the evidence that active learning supports better outcomes for students than passive learning. Our data science training programs include various features that promote active learning.

Interactive Lectures: Lectures are presented via an interactive learning environment, where students can follow along and interact with the material on their own device. Students are encouraged to experiment with the variables in real time during lectures, to see how they affect results. Additionally, we demonstrate concepts using interactive figures and plots, allowing students to study the effect of changing parameters. One example lecture activity is to visualize how adjusting a hyperparameter affects the performance of a machine learning model. Students can engage with the visualization and confirm the effect we’re discussing in the lecture. Students are no longer merely remembering a fact; they’re analyzing and applying the concepts to actively engage in the learning process.
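To give a flavor of that kind of activity, here is a minimal sketch of a hyperparameter sweep. It is not taken from our course material: the synthetic two-class data, the function names, and the choice of k-nearest neighbors (with the neighbor count `k` as the hyperparameter) are all assumptions made for illustration.

```python
import random


def make_data(n, rng):
    """Two overlapping 1-D classes: label 0 centred at 0.0, label 1 centred at 2.0."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (use odd k to avoid ties)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_one > k else 0


def accuracy(train, valid, k):
    """Fraction of validation points whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def sweep(ks, n_train=200, n_valid=200, seed=0):
    """Validation accuracy for each candidate value of the hyperparameter k."""
    rng = random.Random(seed)
    train = make_data(n_train, rng)
    valid = make_data(n_valid, rng)
    return {k: accuracy(train, valid, k) for k in ks}


if __name__ == "__main__":
    # A crude text "plot": one bar per hyperparameter value.
    for k, acc in sweep([1, 3, 7, 15, 31, 63]).items():
        print(f"k={k:>2}  accuracy={acc:.3f}  " + "#" * round(acc * 40))
```

Changing the list of `k` values, the seed, or the class separation and re-running is the hands-on part: students see for themselves how the setting moves the result.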

Flexible Format: We avoid long lecture formats to encourage active learning. For this reason, a typical day of data science training will involve frequent breaks from lecture. During these breaks, students work on small exercises that reinforce the concepts that were just discussed. Breaks from lecture are important because people have limited attention spans. Additionally, they ensure students have a chance to apply “high order” thinking skills to essential concepts before moving on to more advanced material. If there’s not enough time to apply and analyze the material, students will not be able to effectively learn the new material presented.

Real-world Miniprojects: We include a miniproject as part of each teaching module we create. Miniprojects help students to meet all of the learning objectives outlined by Bloom’s taxonomy by having students apply the information they’ve just learned to a real-world problem, using real-world data. Students are challenged to go beyond remembering and understanding material, and to exercise “high order” thinking skills by evaluating lecture material against practical examples and creating solutions with hands-on practice. For example, students will evaluate different machine learning models to determine not only which approach would be best for a given application, but also what makes it better than other models in that particular instance.

Group Learning: Students are encouraged to work in groups; not only does this prevent students from falling behind (by externalizing accountability and encouraging collaboration), it enables them to exercise the “high order” thinking skills required to meet those learning objectives. Peer-to-peer engagement helps build confidence in students by creating more opportunities for reinforcing the course material. When reviewing or explaining a piece of information to a fellow student, that student is engaged in applying, analyzing, evaluating, and creating information based on the course material.

Active learning has been extensively explored and advocated by teaching experts because of the many benefits it offers over passive learning. It helps to maintain student concentration and deepens learning towards the “high order” thinking skills. It also helps to engage students who might otherwise struggle. Active learning is the guiding principle behind the creation of all of the data science training curricula at The Data Incubator because of these proven benefits. Data science is not a spectator sport: mastering data science skills requires engagement with the material.



1.) Bonwell, Charles C., and James A. Eison. (1991). Active Learning: Creating Excitement in the Classroom. ASHE-ERIC Higher Education Report No. 1. Washington D.C.: The George Washington University, School of Education and Human Development.
2.) Bloom, Benjamin Samuel. (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. New York: David McKay Company.
3.) Anderson, Lorin W.; Krathwohl, David R., eds. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. Allyn and Bacon.
4.) Prince, Michael. (2004). Does Active Learning Work? A Review of the Research. Journal of Engineering Education. 93. 223-231.
5.) Dori, Yehudit & Belcher, John. (2005). How Does Technology-Enabled Active Learning Affect Undergraduate Students’ Understanding of Electromagnetism Concepts? Journal of the Learning Sciences. 14(2), 243-279.


Visit our website to learn more about our offerings:


Here’s How to Survive the Rise of A.I. – Become a Data Facilitator

Front office jobs at investment banks are increasingly being taken over by intelligent machines. Many current front office employees are worried about being displaced by artificial intelligence, and their fears are not unfounded. Huy Nguyen Trieu, former head of macro structuring at Citigroup, has a positive message for traders who risk being replaced by automation: become a data facilitator.


Huy Nguyen Trieu left Citi in 2016. After 13 years in financial engineering at SocGen and RBS and a subsequent role as a Managing Director and head of macro structuring at Citi, he has shifted his focus to acting as a thought leader in the fintech space. Currently a fintech fellow at London’s Imperial College and a mentor at fintech accelerator Level 39, Nguyen Trieu is both fintech guru and entrepreneur. His current focus is the issue of long-term employability in investment banks.

Continue reading

Announcement – The Data Incubator Partnership with MRI Network

Today, we’re excited to announce …

The Data Incubator recently teamed up with MRI Network to increase its access to hiring partnerships worldwide. MRI Network is comprised of over 1,500 search professionals who specialize in hundreds of industries, many of whom came from the industries in which they now recruit. MRI recruiters combine their in-depth understanding of industry with the knowledge of who is who in almost any discipline in order to jump-start the search for an enterprise’s next impact player or an entire division.

The addition of MRI Network, and its network of existing clients, will add thousands of hiring partners on top of TDI’s existing 300+ hiring partnerships. As the need for data scientists has increased exponentially over the past few years, MRI provides TDI students with immediate access to new data science positions in geographies worldwide, as well as greater access to companies with a fundamental need for the data science talent required to harness the power of their data.

TDI’s partnership with MRI represents a key inflection point in the already rapid growth of our company. TDI will be increasing the size of its fellowship program, as well as its online programs, to ensure that we can keep up with the needs of MRI, one of the world’s premier staffing solutions providers. Placement services will now be open to those participating in our Data Science Foundations and Machine Learning online programs. If you are interested in a career in data science, please apply for one of our programs today.



Data Scientist Salaries

At The Data Incubator we’ve worked with hundreds of Fellows looking to enter industry and our alumni work at companies including LinkedIn, Palantir, Amazon, Capital One, and the NYTimes.

Starting salary is one of the most common concerns for professionals entering any field, but as we’ve only been using the job title “Data Scientist” for about eight years, it can be particularly challenging for prospective data scientists to find good information on their job market. LinkedIn and Facebook were the first to give employees on their data teams the title of data scientist, but now there are thousands of data scientists working across all industries alongside data engineers, data analysts, and quantitative analysts.

Continue reading

4 Data Science Projects That We Can’t Get Enough Of

At The Data Incubator we run a free advanced 8-week fellowship for PhDs looking to enter the industry as data scientists.

As part of the application process, we ask potential fellows to propose and begin working on a data science project to highlight their skills to employers.  Regardless of whether you’re selected to be a fellow, this project will be instrumental in attracting employer interest and highlighting your skills.  Here are some projects that we would love to see, and that we hope to see you take on as well.


Multi-Axial Political Analysis  

We often think of American politics in terms of a single axis: left versus right, Democrat versus Republican.  In reality, the parties are composed of varying factions with different identities and political priorities, and American politics is actually broken along multiple axes: foreign policy, social issues, regulation, social spending, education, and the Second Amendment, just to name a few.

Continue reading

JUST Capital and The Data Incubator Challenge



Today, we’re excited to announce that we’re teaming up with JUST Capital to help crowd-source data science for social good.  The Data Incubator offers a free eight-week data science fellowship for those with a PhD or a master’s degree looking to transition into data science.  As a part of the application process, students are asked to submit a data science capstone project and the best students are invited to work on them during the fellowship.  JUST Capital is helping provide data and project prompts to harness the collective brainpower of The Data Incubator fellows to solve these high-impact social problems.

  • These projects focus on applied data science techniques with tangible impacts on JUST Capital’s mission.
  • The projects are open-ended and creativity is encouraged. The documents provided below are suitable for analysis, but one should not shy away from seeking out additional sources of data.

JUST Capital is a nonprofit that provides information and rankings on how large corporations perform on issues that matter most to the public. We give individuals a voice on what really matters to them, and evaluate how companies perform on those issues. By providing the right knowledge and making it easy to access and understand, we believe capital will flow to corporations that are more JUST, ultimately leading to a balanced business world that takes into account human needs that are so often neglected today. The meaning of JUST is defined by the American public as fair, equitable and balanced. In 2016, JUST Capital surveyed nearly 4,000 Americans from all regions and walks of life, in its second annual Poll on Corporate America. The issues identified by the public form the basis of our benchmark — it is against these Drivers and Components that we measure corporate performance. The most important factors broadly relate to employees, customers, company leadership, the environment, communities and investors.

Continue reading

What is the probability of winning the Hamilton lottery?

People interested in seeing the Broadway musical Hamilton (and there are still many of them, with demand driving starting ticket prices to $600) can enter Broadway Direct’s daily lottery. Winners can receive up to 2 tickets (out of 21 available tickets) for a total of $10.

What’s the probability of winning?

How easy is it to win these coveted tickets? Members of NYC’s Data Incubator Team have collectively tried and failed 120 times. Given our data, we cannot simply divide the number of successes by the number of trials to calculate our chances of winning — we would get zero (and the odds, which are apparently small, are clearly non-zero).

This kind of situation comes up under many guises in business and big data, and because we are a data science corporate training company, we decided to use statistics to determine the answer. Say you are measuring the click-through rate (CTR) of a piece of organic or paid content, and out of 100 impressions, you have not observed any clicks. The measured CTR is zero but the true CTR is likely not zero. Alternatively, suppose you are measuring the rate of adverse side effects of a new drug. You have tested 40 patients and haven’t found any, but you know the chance is unlikely to be zero. So what are the odds of observing a click or a side effect?

Continue reading
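One common way to reason about zero observed successes, which is not necessarily the approach taken in the full post, is to ask for the largest success probability that is still compatible with seeing nothing in n independent tries. If the chance of zero successes, (1 - p)^n, must be at least 5%, then p can be no larger than 1 - 0.05^(1/n), which is close to the well-known "rule of three" shortcut 3/n. The sketch below assumes independent trials with a constant probability; the function names are ours, chosen for illustration.

```python
def zero_success_upper_bound(n_trials, confidence=0.95):
    """Upper bound on the success probability after 0 successes in n_trials.

    Solves (1 - p) ** n_trials = 1 - confidence for p, assuming independent
    trials that all share the same success probability.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


def rule_of_three(n_trials):
    """Quick approximation to the 95% upper bound: 3 / n."""
    return 3.0 / n_trials


if __name__ == "__main__":
    # Trial counts are the ones mentioned in the text above.
    scenarios = {
        "lottery entries": 120,
        "ad impressions": 100,
        "patients tested": 40,
    }
    for name, n in scenarios.items():
        exact = zero_success_upper_bound(n)
        approx = rule_of_three(n)
        print(f"{name:<16} n={n:<4} 95% upper bound={exact:.4f}  rule of three={approx:.4f}")
```

The bound says nothing about the most likely value of the probability, only how large it could plausibly be given the run of failures so far.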

The Iraq War by the Numbers: Extracting the Conflicts’ Staggering Costs

One of our fellows recently had a piece published about her unique and timely capstone project. The original piece is posted on Data Driven Journalism.

In her own words:

This war is not only important due to its staggering costs (both human and financial) but also on account of its publicly available and well-documented daily records from 2004 to 2010.

These documents provide a very high spatial and temporal resolution view of the conflict. For example, I extracted from these government memos the number of violent events per day in each county. Then, using latent factor analysis techniques, e.g. non-negative matrix factorization, I was able to cluster the top three principal war zones. Interestingly these principal conflict zones were areas populated by the three main ethno-religious groups in Iraq.
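As a rough illustration of the clustering step described in the quote, the sketch below applies non-negative matrix factorization, using the standard multiplicative update rules, to a small made-up matrix of event counts (rows are regions, columns are days). The `events` data, the function names, and the use of two components rather than three are all assumptions for illustration; this is not the fellow's data or code.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius distance between v and the product of w and h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factor non-negative v (rows x cols) into w (rows x rank) and h (rank x cols)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h: h <- h * (w^T v) / (w^T w h), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # Update w: w <- w * (v h^T) / (w h h^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return w, h


def zone_labels(w):
    """Assign each row (region) to the component with the largest weight."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in w]


# Hypothetical counts: 6 regions (rows) by 8 days (columns), two active periods.
events = [
    [5, 4, 6, 5, 0, 0, 0, 0],
    [3, 2, 3, 3, 0, 0, 0, 0],
    [6, 5, 7, 6, 0, 0, 0, 0],
    [0, 0, 0, 0, 2, 3, 2, 4],
    [0, 0, 0, 0, 4, 6, 5, 8],
    [0, 0, 0, 0, 1, 2, 1, 2],
]

if __name__ == "__main__":
    w, h = nmf(events, rank=2)
    print("zone per region:", zone_labels(w))
    print("reconstruction error:", round(squared_error(events, w, h), 3))
```

Each row of `w` says how strongly a region loads on each latent factor, so regions that share a pattern of activity over time end up with the same label.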

You can watch her explain it herself:


Editor’s Note: The Data Incubator is a data science education company.  We offer a free eight-week fellowship helping candidates with PhDs and masters degrees enter data science careers.  Companies can hire talented data scientists or enroll employees in our data science corporate training.

Polling and big data in the age of Trump, Brexit, and the Colombian Referendum

Our founder, Michael Li, recently collaborated with his colleague Raymond Perkins, a researcher and PhD candidate at Princeton University, on this piece about big data and polling. You can find the original article at Data Driven Journalism.

The recent presidential inauguration and the notably momentous election that preceded it have brought about numerous discussions surrounding the accuracy of polling and big data. The US election results, paired with those of Brexit and the Colombian Referendum, have left a number of people scratching their heads in confusion. Statisticians, however, understand the multitude of sampling biases and statistical errors that can ensue when your data involves human beings.

“Though big data has the potential to virtually eliminate statistical error, it unfortunately provides no protection against sampling bias and, as we’ve seen, may even compound the problem. This is not to say big data has no place in modern polling, in fact it may provide alternative means to predict election results. However, as we move forward we must consider the limitations of big data and our overconfidence in it as a polling panacea.”

At The Data Incubator, this central misconception about big data is one of the core lessons we try to impart to our students. Apply to be a Fellow today!



How Employers Judge Data Science Projects

One of the more commonly used screening devices for data science is the portfolio project.  Applicants submit a project showcasing a piece of data science that they’ve accomplished.  At The Data Incubator, we run a free eight-week fellowship helping train and transition people with master’s and PhD degrees for careers in data science.  One of the key components of the program is completing a capstone data science project to present to our (hundreds of) hiring employers.  In fact, a major part of the fellowship application process is proposing that very capstone project, with many successful candidates having projects that are substantially far along if not nearly completed.  Based on conversations with partners, here’s our sense of priorities for what makes a good project, ranked roughly in order of importance:

  1. Completion: While their potential is important, projects are assessed primarily based on the success of analysis performed rather than the promise of future work.  Working in any industry is about getting things done quickly, not perfectly, and projects with many gaps, “I wish I had time for” caveats, or “future steps” suggest the applicant may not be able to get things done at work.
  2. Practicality: High-impact problems of general interest are more interesting than theoretical discussions on academic research problems. If you solve the problem, will anyone care? Identifying interesting problems is half the challenge, especially for candidates leaving academia who must disprove an inherent “academic” bias.
  3. Creativity: Employers are looking for creative, original thinkers who can either (1) identify new datasets or (2) find novel questions to ask about a dataset. Employers do not want to see the tenth generic presentation on Citibike (or Chicago Crime, Yelp Restaurant Ratings data, NYC Restaurant Inspection Data, NYC Taxi, BTS Flight Delay, Amazon Review, Zillow home price, World Bank or other macroeconomic data, or beating the stock market) data. Similarly, projects that explain a non-obvious thesis supported by concise plots are more compelling than ones that present obvious conclusions (e.g. “more riders use Citibike during the day than at night”). Employers are looking for data scientists who can find trends in the data that they don’t already know.

Continue reading