# Models as Actions

2020-12-05

People typically categorize data science and machine learning work (as if there’s a difference -_-) in terms of supervised and unsupervised learning. But to me that’s a little too reductionist. Data science covers so much ground depending on the industry you’re working in and what your role is. For example, consider the following responsibilities…

- All **ETL** and feature engineering that has to be done to **transform** data.
- **Optimizing** loss functions to ensure predictive accuracy in supervised/unsupervised learning, deep learning, etc.
- **Optimizing** an objective function or solving systems of equations (e.g. operations research work).
- **Exploring** parameter or predictive distributions as you would in Bayesian statistics, probabilistic modeling, reliability engineering, etc.
- Stepping into **engineering** roles involving connecting infrastructure, creating data pipelines, developing libraries, and enforcing standards/best-practices.
- **Visualizing** data and results from the above responsibilities.

From my experience, it’s better to categorize data science work in terms of actions (identified above in bold). So instead of defining models as unsupervised or supervised, you have optimizing loss/objective functions and exploring distributions associated with the data generation process. So you can categorize these problems as optimization problems. This allows you to separate models from ETL and metric creation work, which would be categorized as data transformation problems.
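To make the "optimizing" category a bit more concrete, here is a minimal sketch (my own toy illustration, not a recipe) showing a supervised fit and an unsupervised fit going through the exact same action. The helper `minimize`, the two gradient functions, and the toy data are all hypothetical names and values made up for this example.

```python
from typing import Callable


def minimize(grad: Callable[[float], float], start: float = 0.0,
             lr: float = 0.1, steps: int = 500) -> float:
    """Plain gradient descent on a one-parameter loss (hypothetical helper)."""
    theta = start
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta


# "Supervised": fit the slope w in y ~ w * x by minimizing mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # outcome variable


def mse_grad(w: float) -> float:
    # Derivative of mean((w * x - y)^2) with respect to w.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# "Unsupervised": find one cluster center c minimizing mean squared distance.
points = [1.0, 2.0, 3.0, 6.0]  # no outcome variable here


def center_grad(c: float) -> float:
    # Derivative of mean((c - p)^2) with respect to c.
    return sum(2 * (c - p) for p in points) / len(points)


# Same action (optimize a loss), different data.
slope = minimize(mse_grad, lr=0.01)
center = minimize(center_grad)
```

The only thing that changes between the two calls is the loss being handed to the optimizer, which is why the action feels like the more useful label to me than whether an outcome variable is present.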

It’s a nicer way to classify the responsibilities of data science and machine learning work, all of which often get lumped into terminology like “models”. So rather than overloading the term “model”, this interpretation directly relates to what a data scientist is doing. Terms like unsupervised and supervised seem irrelevant to me since it’s usually pretty obvious if the data scientist is using an outcome variable or not.

To aggressively abstract this, I guess what I’m saying is to *classify models based on what you’re doing or the algorithm you’re using rather than the data itself*.
