We all want to be future-proof: not just prepared for unforeseen developments but positioned well to take advantage of them. Having a flexible, adaptable, and scalable technology stack is a great way to achieve that goal when it comes to leveraging data science effectively. Here are five ideas I think are crucial to keep in mind when building out your own capabilities:
1. Your pipeline is only as good as its weakest link.
It’s great that your predictive modelers have come up with a thousand new features to incorporate, but have you asked your data engineers how that will affect the performance of backend queries? What about your data collection and ingestion flow? Maybe your team is frothing at the mouth for an upgrade to Spark Streaming to run their clustering algorithms in real time, but your frontend will lose responsiveness if you try to display the results as fast as they come in. The key here is not to get sucked into the hype of “scaling up” without fully recognizing the implications across your entire organization and the new demands that will be placed on all those moving parts.
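One common way to keep a frontend responsive when results arrive faster than it can usefully render them is to decouple ingestion from display: buffer incoming results and flush them to the UI at a fixed refresh interval rather than redrawing per message. The sketch below is a minimal, hypothetical illustration of that pattern; the `ThrottledDisplay` class and `render` callback are assumptions for illustration, not part of any particular framework.

```python
import time
from collections import deque


class ThrottledDisplay:
    """Buffer fast-arriving streaming results and hand them to the
    frontend in batches at a fixed refresh interval, instead of
    triggering a redraw for every single message."""

    def __init__(self, render, refresh_seconds=1.0):
        self.render = render                  # callback that updates the frontend
        self.refresh_seconds = refresh_seconds
        self.buffer = deque()
        self._last_flush = time.monotonic()

    def on_result(self, result):
        # Called for every incoming result; just a cheap append
        # unless the refresh interval has elapsed.
        self.buffer.append(result)
        now = time.monotonic()
        if now - self._last_flush >= self.refresh_seconds:
            self.flush(now)

    def flush(self, now=None):
        # Hand the accumulated batch to the renderer in one call.
        batch = list(self.buffer)
        self.buffer.clear()
        self._last_flush = now if now is not None else time.monotonic()
        if batch:
            self.render(batch)
```

The point is that the stream can run as fast as the backend allows while the UI pays a bounded rendering cost per interval; the same idea applies whether the consumer is a dashboard, a websocket, or a logging sink.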