Vin Vashishta
@v_vashishta

Sep 23, 2022
12 tweets

What does a data science experiment look like and how do you design the first round of experiments? 1/12 #DataScience #MachineLearning #DeepLearning

Data science experiments start with a question: how does the business keep customers subscribed or maintain their current level of spending? Notice this is not the standard definition of churn, and there’s a good reason for that. 2/12 #DataScience #MachineLearning #DeepLearning
Take existing data, train models, and compare them (multiple single models and ensembles) to the EXISTING customer retention process’s performance. This is your first experiment! Why haven’t we used a test dataset? 3/12 #DataScience #MachineLearning #DeepLearning
We have an oracle! The business has looked at existing data and built, then improved, a set of heuristic-based interventions. People’s heuristics > early models when it comes to learning a function. 4/12 #DataScience #MachineLearning #DeepLearning
In the experiment, I shadow run models against the current heuristics. When the heuristics fail but one or more models recommend a different intervention, I have a new avenue for exploration. 5/12 #DataScience #MachineLearning #DeepLearning
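A minimal sketch of how the output of such a shadow run could be filtered, assuming each logged record holds the intervention the heuristics applied, what each model would have recommended, and whether the customer was retained. The record layout, model names, and intervention labels are hypothetical, not taken from the thread.

```python
from dataclasses import dataclass


@dataclass
class ShadowRecord:
    customer_id: str
    heuristic_action: str  # intervention the existing process actually applied
    model_actions: dict    # model name -> intervention it would have recommended
    retained: bool         # observed outcome after the heuristic's intervention


def exploration_candidates(records):
    """Return heuristic failures where at least one shadow model disagreed."""
    candidates = []
    for rec in records:
        if rec.retained:
            continue  # the heuristic worked, so there is nothing new to explore
        dissenters = {
            name: action
            for name, action in rec.model_actions.items()
            if action != rec.heuristic_action
        }
        if dissenters:
            candidates.append((rec.customer_id, dissenters))
    return candidates


# Hypothetical shadow-run log: models only observe, they never act.
records = [
    ShadowRecord("c1", "discount", {"gbm": "discount", "ensemble": "outreach_call"}, retained=False),
    ShadowRecord("c2", "discount", {"gbm": "discount", "ensemble": "discount"}, retained=False),
    ShadowRecord("c3", "outreach_call", {"gbm": "discount", "ensemble": "discount"}, retained=True),
]
print(exploration_candidates(records))
```

Only `c1` qualifies here: the heuristic failed and one model proposed something different. `c2` failed too, but every model agreed with the heuristic, so it offers no alternative to test.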
I can run a second round of experiments to see if my models’ intervention would be more successful than the heuristics. This is where more complex experimental methods come into play. 6/12 #DataScience #MachineLearning #DeepLearning
Has the model learned anything? I need to build a new experiment to find out. The first part of that process is figuring out what conditions caused the heuristic to fail. Why? 7/12 #DataScience #MachineLearning #DeepLearning
I need to know when to substitute the model’s intervention for the heuristic’s. To do that, I must clearly define the conditions of the experiment. Poorly defined experiments produce unreliable results. 8/12 #DataScience #MachineLearning #DeepLearning
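One way to make those conditions explicit is to write them down as named checks, so the substitution rule can be read and audited. This is an illustrative sketch only: the condition names, thresholds, and customer fields are assumptions and do not come from the thread.

```python
def choose_intervention(customer, heuristic_action, model_action, conditions):
    """Pick the model's action only when every defined condition holds.

    `conditions` maps a readable name to a predicate over the customer record.
    Returns the chosen action and the names of any conditions that were not met.
    """
    unmet = [name for name, holds in conditions.items() if not holds(customer)]
    if unmet:
        return heuristic_action, unmet
    return model_action, []


# Hypothetical experiment scope: new customers whose spending is falling.
conditions = {
    "tenure_under_6_months": lambda c: c["tenure_months"] < 6,
    "spend_dropped": lambda c: c["spend_change_pct"] < 0,
}

in_scope = {"tenure_months": 3, "spend_change_pct": -10}
out_of_scope = {"tenure_months": 12, "spend_change_pct": -10}
print(choose_intervention(in_scope, "discount", "outreach_call", conditions))
print(choose_intervention(out_of_scope, "discount", "outreach_call", conditions))
```

Returning the unmet condition names alongside the action keeps a record of why the heuristic was kept for a given customer.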
I start by exploring the problem space to find features that are consistent across heuristic failure scenarios. Sometimes I get lucky, and I can build a model to predict heuristic failure. 9/12 #DataScience #MachineLearning #DeepLearning
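A first pass at that exploration can be as simple as comparing heuristic failure rates across the values of each available feature. The sketch below assumes rows are plain dictionaries with a boolean `heuristic_failed` flag; the field names and sample data are hypothetical.

```python
from collections import defaultdict


def failure_rate_by_value(rows, feature, failed_key="heuristic_failed"):
    """Return {feature value: share of rows where the heuristic failed}."""
    counts = defaultdict(lambda: [0, 0])  # value -> [failures, total]
    for row in rows:
        bucket = counts[row[feature]]
        bucket[0] += int(row[failed_key])
        bucket[1] += 1
    return {value: failures / total for value, (failures, total) in counts.items()}


# Hypothetical shadow-run rows.
rows = [
    {"plan": "monthly", "region": "east", "heuristic_failed": True},
    {"plan": "monthly", "region": "west", "heuristic_failed": True},
    {"plan": "annual", "region": "east", "heuristic_failed": False},
    {"plan": "annual", "region": "west", "heuristic_failed": False},
]
print(failure_rate_by_value(rows, "plan"))
print(failure_rate_by_value(rows, "region"))
```

In this toy data, `plan` separates failures cleanly while `region` shows the same rate for every value. A feature with a large spread is a candidate input for a failure model; if every available feature looks flat, failures will appear random with respect to the data on hand.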
Most of the time, our heuristics fail because we don’t have enough data about the problem space to come up with an intervention. Model and heuristic failures will seem random based on available data. What’s going on? 10/12 #DataScience #MachineLearning #DeepLearning
We are missing features. I need to work with a Data Engineer. I lean on the data catalog and their ability to curate novel datasets. With a few cycles, we can usually find what’s missing and build a decent failure model. 11/12 #DataScience #MachineLearning #DeepLearning
I retrain my initial models with the new feature(s), improving their performance and giving me more options to explore further. The iterative process delivers a clearer understanding of churn and a reliable model. 12/12 #DataScience #MachineLearning #DeepLearning
@v_vashishta
AI influencer since 2014. I teach strategy to data leaders & value-centric data science to ICs. Technical Strategy Advisor, preparing SMEs to profit from data.