# Makefiles for Workflow

2020-09-12

The make utility is typically used to make it easier to maintain source code that needs to be compiled. It can also help streamline other workflows that consist of repeatable command-line steps.

A makefile is made up of targets, each of which consists of dependencies and commands. Your target is typically a compiled binary, the dependencies are source code, and the commands are the instructions that compile your source code into the binary. If you modify any of the dependencies, then running `make` (which builds the first target in the file) or `make target-name` will rerun your commands to update the target. Each command line must be indented with a tab character.

```
target: dependencies
	commands
```
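To make the "rerun only when a dependency changed" rule concrete, here is a minimal Python sketch of the timestamp comparison that make performs. It is a simplification and not how make is implemented: the helper names `needs_rebuild` and `demo` are made up for this illustration, and real make also handles chained rules, phony targets, and more.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def demo():
    """Check the three cases (missing, up to date, stale) with throwaway files."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "generate.py")
        target = os.path.join(tmp, "data.csv")

        with open(source, "w") as f:
            f.write("# source\n")
        os.utime(source, (1000, 1000))  # pin the modification time

        # Case 1: the target does not exist yet.
        missing_target = needs_rebuild(target, [source])

        # Case 2: the target exists and is newer than its dependency.
        with open(target, "w") as f:
            f.write("x\n")
        os.utime(target, (2000, 2000))
        up_to_date = needs_rebuild(target, [source])

        # Case 3: the dependency was edited after the target was built.
        os.utime(source, (3000, 3000))
        edited_source = needs_rebuild(target, [source])

    return missing_target, up_to_date, edited_source


if __name__ == "__main__":
    print(demo())
```

The second value returned by `demo()` is the interesting one: when the target is newer than everything it depends on, nothing needs to run.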

We can still use makefiles in situations where we’re not compiling code. Below is a simplified example of how I’m using one in a data-science-like workflow.

We have some Python code that generates data.

```
# generate.py
import pandas as pd
import numpy as np
N = 10
data = pd.DataFrame({'x': np.random.normal(3, 1, N)})
data.to_csv('./data.csv', index=False)
```

And some code that computes the maximum likelihood estimate (MLE) from that data (assuming that the data is normally distributed).

```
# estimate.py
from datetime import datetime

import pandas as pd
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

print('Running simulation...')
data = pd.read_csv('data.csv')
x = data['x']


# MLE (maximization problem turned into a minimization problem)
def objective_function(params, x):
    # Negative log-likelihood of x under Normal(mean=params[0], sd=params[1])
    log_likelihood = 0.0
    for value in x:
        log_likelihood += np.log(norm.pdf(value, params[0], params[1]))
    return -log_likelihood


# No bounds on the mean; the standard deviation is kept at or above 0.5
bnds = ((None, None), (0.5, None))
result = minimize(fun=objective_function, x0=[0.5, 0.5], bounds=bnds, args=(x,))
result = dict(result)

with open('./mle_result.txt', 'w') as f:
    f.write('Timestamp: {0}\n'.format(datetime.now()))
    for key, value in result.items():
        f.write('{0}: {1}\n'.format(key, value))

print('Data: mean [{0}], sd [{1}]'.format(np.mean(x), np.std(x)))
print('MLE: mean [{0}], sd [{1}]'.format(result['x'][0], result['x'][1]))
print('... complete')
```
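As a sanity check on `estimate.py`: for normally distributed data the MLE has a closed form, namely the sample mean and the population standard deviation (dividing by N, not N - 1). That is what `np.mean` and `np.std` compute by default, so the two printed lines should roughly agree unless the optimizer hits the 0.5 lower bound on the standard deviation. The sketch below uses only the standard library; the function names are my own and are not part of the scripts above.

```python
import math
import random
import statistics


def normal_log_likelihood(mean, sd, xs):
    """Sum of log N(x | mean, sd) over the sample xs."""
    n = len(xs)
    squared_error = sum((x - mean) ** 2 for x in xs)
    return (-n * math.log(sd)
            - 0.5 * n * math.log(2 * math.pi)
            - squared_error / (2 * sd ** 2))


def closed_form_mle(xs):
    """Closed-form normal MLE: sample mean and population standard deviation."""
    return statistics.fmean(xs), statistics.pstdev(xs)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sample
    sample = [rng.gauss(3, 1) for _ in range(10)]
    mean_hat, sd_hat = closed_form_mle(sample)
    print('Closed form: mean [{0}], sd [{1}]'.format(mean_hat, sd_hat))
    print('Log-likelihood: {0}'.format(
        normal_log_likelihood(mean_hat, sd_hat, sample)))
```

Nudging either parameter away from the closed-form values should only lower the log-likelihood, which is a quick way to confirm that a numerical optimizer landed in the right place.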

In this case the workflow might involve:

- Simulating the model (generating data and finding the MLE).
- Querying the data that we generated.
- Cleaning up all the files (i.e. the output from the Python scripts).

We can wrap all of this up in a makefile.

```
# Makefile
# These targets are task names, not files, so mark them as phony
.PHONY: simulate preview clean
simulate:
	@python generate.py
	@python estimate.py
preview:
	@head -n "$(nrow)" data.csv
clean:
	@rm ./data.csv
	@rm ./mle_result.txt
```

So running `make simulate` will run the Python scripts and create the data and MLE result. Running `make preview nrow=6` will print the first 6 lines of the data file (the header plus five rows of data). And running `make clean` will remove the data file and MLE result. (The `@` prefix is specific to the make utility and prevents the command from being echoed to standard output.)
