
Vin Vashishta



12 tweets

What does a data science experiment look like and how do you design the first round of experiments? 1/12 #DataScience #MachineLearning #DeepLearning

Data Science experiments start with a question: How does the business keep customers subscribed or maintain their current level of spending? Notice this is not the standard definition of churn, and there’s a good reason for that. 2/12

Take existing data, train models (multiple single models and ensembles), and compare their performance to that of the EXISTING customer retention process. This is your first experiment! Why haven’t we used a test dataset? 3/12
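Here is a minimal sketch of this first experiment. It assumes that each single model, each ensemble, and the existing retention heuristic can be reduced to a function that flags a customer as likely to churn. It also assumes that historical churn outcomes are available. Everything else is hypothetical: the feature names (`logins`, `spend`), the accuracy metric, the majority-vote ensemble, and every helper name. The heuristic's score serves as the baseline that each model has to beat.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical customer record: feature name -> value.
Record = Dict[str, float]
Predictor = Callable[[Record], bool]  # True means "predicted to churn"


def accuracy(predictions: Sequence[bool], outcomes: Sequence[bool]) -> float:
    """Fraction of customers whose churn outcome was predicted correctly."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


def majority_vote(members: List[Predictor]) -> Predictor:
    """A simple ensemble: predict churn when more than half of the members do."""
    return lambda record: sum(m(record) for m in members) > len(members) / 2


def lift_over_heuristic(
    records: List[Record],
    outcomes: List[bool],
    heuristic: Predictor,
    models: Dict[str, Predictor],
) -> Dict[str, float]:
    """Score every model on the same historical data as the heuristic and
    return each model's accuracy minus the heuristic's (the baseline)."""
    baseline = accuracy([heuristic(r) for r in records], outcomes)
    return {
        name: accuracy([model(r) for r in records], outcomes) - baseline
        for name, model in models.items()
    }


# Hypothetical historical data and rules.
records = [
    {"logins": 1, "spend": 10},
    {"logins": 9, "spend": 80},
    {"logins": 2, "spend": 70},
    {"logins": 8, "spend": 5},
]
outcomes = [True, False, False, True]  # observed churn

heuristic = lambda r: r["logins"] < 3  # stand-in for the existing retention rule
singles = {
    "low_spend": lambda r: r["spend"] < 20,
    "low_both": lambda r: r["logins"] < 3 and r["spend"] < 50,
    "low_either": lambda r: r["logins"] < 3 or r["spend"] < 20,
}
models = {**singles, "ensemble": majority_vote(list(singles.values()))}

print(lift_over_heuristic(records, outcomes, heuristic, models))
# {'low_spend': 0.5, 'low_both': 0.25, 'low_either': 0.25, 'ensemble': 0.5}
```

Positive numbers mean a model beat the existing process on the same data. Negative numbers mean the heuristic is still ahead.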

We have an oracle! The business has looked at existing data and built, then improved, a set of heuristic-based interventions. When it comes to learning a function, people’s heuristics beat early models. 4/12

In the experiment, I shadow-run models alongside the current heuristics: the heuristics still choose the intervention, and the models’ recommendations are only logged. When a heuristic fails but one or more models recommended a different intervention, I have a new avenue for exploration. 5/12
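Finding those avenues can be as simple as filtering the shadow-run log. This minimal sketch assumes each logged row holds three things: the intervention the heuristic applied, the interventions the models would have chosen (logged but never applied), and whether the customer was retained. The `ShadowLog` schema, the intervention names, and the model names (`gbm`, `logreg`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ShadowLog:
    """One customer's record from a shadow run (hypothetical schema)."""
    customer_id: str
    heuristic_action: str  # intervention actually applied
    retained: bool  # observed outcome after the heuristic's intervention
    model_actions: Dict[str, str] = field(default_factory=dict)  # logged, never applied


def exploration_candidates(logs: List[ShadowLog]) -> List[ShadowLog]:
    """Heuristic failures where at least one model recommended a different intervention."""
    return [
        log
        for log in logs
        if not log.retained
        and any(action != log.heuristic_action for action in log.model_actions.values())
    ]


logs = [
    ShadowLog("c1", "discount", retained=True, model_actions={"gbm": "outreach"}),
    ShadowLog("c2", "discount", retained=False, model_actions={"gbm": "discount"}),
    ShadowLog("c3", "discount", retained=False,
              model_actions={"gbm": "outreach", "logreg": "discount"}),
]
print([log.customer_id for log in exploration_candidates(logs)])  # ['c3']
```

In this toy log, `c1` is excluded because the heuristic worked. `c2` is excluded because the model agreed with the heuristic, so it offers nothing new to test.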

I can then run a second round of experiments to see whether my models’ interventions would be more successful than the heuristics’. This is where more complex experimental methods come into play. 6/12
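Here is one common way to run that comparison. It is swapped in as an example and is not taken from the thread. The idea is to randomly assign customers to one of two arms: arm A receives the heuristic's intervention and arm B receives a model's intervention. A two-proportion z-test then compares the retention rates of the two arms. This is a sketch under those assumptions, and the counts are hypothetical. A real design would typically also need a sample-size calculation and a significance level chosen in advance.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    retained_a: int, n_a: int, retained_b: int, n_b: int
) -> Tuple[float, float]:
    """Compare retention rates of arm A (heuristic) and arm B (model).

    Returns (z, two-sided p-value) using the pooled-proportion standard error.
    """
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    pooled = (retained_a + retained_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("retention is 0% or 100% in both arms; the test is undefined")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 400 of 1,000 retained under the heuristic, 460 of 1,000 under the model.
z, p = two_proportion_z_test(400, 1000, 460, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive `z` means the model's arm retained a larger share of customers. The p-value indicates how surprising that gap would be if the two interventions performed equally well.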

Has the model learned anything? I need to build a new experiment to find out. The first part of that process is figuring out what conditions caused the heuristic to fail. Why? 7/12

I need to know when to substitute the model’s intervention for the heuristic’s. To do that, I must clearly define the conditions of the experiment. Poorly defined experiments produce unreliable results. 8/12
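Writing the substitution rule down as code is one way to force the conditions to be explicit. This minimal sketch assumes the conditions can be expressed as a predicate over customer features. The condition itself, the feature names (`tenure_months`, `support_tickets`), and the intervention names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Customer = Dict[str, float]  # hypothetical feature dict


@dataclass(frozen=True)
class ExperimentCondition:
    """An explicit, testable definition of when the model's intervention replaces the heuristic's."""
    name: str
    applies: Callable[[Customer], bool]


def choose_intervention(
    customer: Customer,
    condition: ExperimentCondition,
    heuristic_action: Callable[[Customer], str],
    model_action: Callable[[Customer], str],
) -> str:
    """Use the model only inside the defined conditions; otherwise keep the heuristic."""
    if condition.applies(customer):
        return model_action(customer)
    return heuristic_action(customer)


# Hypothetical condition: new customers who have already filed several support tickets.
new_and_frustrated = ExperimentCondition(
    name="tenure < 6 months and >= 3 support tickets",
    applies=lambda c: c["tenure_months"] < 6 and c["support_tickets"] >= 3,
)
heuristic = lambda c: "discount"
model = lambda c: "support_outreach"

print(choose_intervention({"tenure_months": 2, "support_tickets": 4},
                          new_and_frustrated, heuristic, model))   # support_outreach
print(choose_intervention({"tenure_months": 24, "support_tickets": 4},
                          new_and_frustrated, heuristic, model))   # discount
```

Keeping the condition as a named object also makes it easy to log which rule routed each customer. Results can then be analyzed per condition.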

I start by exploring the problem space to find features that are consistent across heuristic failure scenarios. Sometimes I get lucky and can build a model that predicts heuristic failure. 9/12
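A simple first pass at that exploration is to compare, for each feature value, the heuristic's failure rate with its overall failure rate. This is swapped in here as one possible approach. The sketch assumes categorical features and a logged failure flag per customer. The feature names (`plan`, `region`) and the data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def failure_lift(rows: List[Dict[str, str]], failed: List[bool], feature: str) -> Dict[str, float]:
    """For each value of `feature`, the heuristic's failure rate divided by its overall failure rate.

    Values well above 1.0 mark conditions that show up consistently in failures;
    values near 1.0 for every feature suggest failures look random given the data.
    """
    overall = sum(failed) / len(failed)
    if overall == 0:
        raise ValueError("no heuristic failures to explore")
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # value -> [failures, total]
    for row, did_fail in zip(rows, failed):
        counts[row[feature]][0] += int(did_fail)
        counts[row[feature]][1] += 1
    return {value: (fails / total) / overall for value, (fails, total) in counts.items()}


# Hypothetical outcomes of the heuristic's interventions.
rows = [
    {"plan": "monthly", "region": "east"},
    {"plan": "monthly", "region": "west"},
    {"plan": "annual", "region": "east"},
    {"plan": "annual", "region": "west"},
]
failed = [True, True, False, False]

print(failure_lift(rows, failed, "plan"))    # {'monthly': 2.0, 'annual': 0.0}
print(failure_lift(rows, failed, "region"))  # {'east': 1.0, 'west': 1.0}
```

In this toy data, `plan` separates failures cleanly, so it would be a candidate input for a failure model. `region` carries no signal. If every available feature looks like `region`, failures will seem random, which is the situation the thread turns to next.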

Most of the time, our heuristics fail because we don’t have enough data about the problem space to design an intervention. Based on the available data, model and heuristic failures will seem random. What’s going on? 10/12

We are missing features. I need to work with a data engineer, leaning on the data catalog and on their ability to curate novel datasets. Within a few cycles, we can usually find what’s missing and build a decent failure model. 11/12

I retrain my initial models with the new feature(s), which improves their performance and gives me more options to explore. The iterative process delivers a clearer understanding of churn and a reliable model. 12/12
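Here is a toy sketch of that retraining loop. To keep the example dependency-free, the "model" is a single-threshold rule (a decision stump). The data and the `support_wait_days` feature, which stands in for a newly curated dataset, are hypothetical. For brevity, accuracy is measured on the training rows only. In practice, the retrained models would be compared against the heuristic baseline as in the first experiment.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]


def train_stump(
    rows: List[Row], labels: Sequence[bool], features: Sequence[str]
) -> Tuple[Optional[str], float, float]:
    """Fit a one-rule model, `churn if row[feature] >= threshold`.

    Tries every observed value of every listed feature as a threshold and returns
    (feature, threshold, training accuracy) for the best rule found.
    """
    best: Tuple[Optional[str], float, float] = (None, 0.0, -1.0)
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            preds = [row[feature] >= threshold for row in rows]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[2]:
                best = (feature, threshold, acc)
    return best


# Hypothetical data; `support_wait_days` stands in for a newly curated feature.
rows = [
    {"logins": 5, "support_wait_days": 1},
    {"logins": 3, "support_wait_days": 9},
    {"logins": 7, "support_wait_days": 8},
    {"logins": 2, "support_wait_days": 0},
]
churned = [False, True, True, False]

print(train_stump(rows, churned, ["logins"]))                       # ('logins', 3, 0.75)
print(train_stump(rows, churned, ["logins", "support_wait_days"]))  # ('support_wait_days', 8, 1.0)
```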
