Sometimes I find myself having to combine several data features into a single metric. It’s bad practice to combine features that are on different scales: you end up capturing differences in scale rather than differences in the underlying quantities. The best approach in these situations is to center the data (subtract the mean) and scale it (divide by the standard deviation). This eliminates the scale (it’s also part of the pre-processing you should do before clustering). Once you’ve done that, you can combine the features by taking row-wise means (or weighted means if equal weights don’t suit the business problem).
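The steps above can be sketched in NumPy. The feature names and weights here are hypothetical, just to illustrate columns on very different scales:

```python
import numpy as np

# Hypothetical example: three features on very different scales
rng = np.random.default_rng(42)
features = np.column_stack([
    rng.normal(50000, 15000, 100),  # e.g. income
    rng.normal(35, 10, 100),        # e.g. age
    rng.normal(0.5, 0.1, 100),      # e.g. some ratio
])

# Center (subtract the column mean) and scale (divide by the column std)
standardized = (features - features.mean(axis=0)) / features.std(axis=0)

# Combine into one metric: row-wise mean, or a weighted mean
metric = standardized.mean(axis=1)
weights = np.array([0.5, 0.3, 0.2])        # hypothetical business weights
weighted_metric = standardized @ weights
```

After standardizing, every column has mean 0 and standard deviation 1, so no single feature dominates the combined metric just because of its units.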

In some cases this type of re-scaling might not be enough. You may want to force a vector of data to be bounded between two values (e.g. 0 and 1, or -1 and 1). This is the approach I take in Python:

import numpy as np

x = np.random.normal(6, 3, 1000)

new_min = 0
new_max = 1
x_max = np.max(x)  # avoid shadowing the built-ins min/max
x_min = np.min(x)

# Linearly map [x_min, x_max] onto [new_min, new_max]
z = (new_max - new_min) / (x_max - x_min) * (x - x_max) + new_max
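A quick sanity check confirms the rescaling behaves as intended. This sketch uses a seeded generator (my assumption, for reproducibility) and an equivalent, more common arrangement of the same formula:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(6, 3, 1000)

new_min, new_max = 0.0, 1.0
x_min, x_max = x.min(), x.max()

# Equivalent form: shift to zero, scale to the new range, shift to new_min
z = (x - x_min) / (x_max - x_min) * (new_max - new_min) + new_min
```

Since the map is linear and increasing, the observed minimum lands exactly on new_min and the observed maximum on new_max; every other point falls strictly inside the bounds.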

