A recipe for relevant predictions


Imagine you desperately need to improve the trustworthiness of your company. Let’s say you work for Backblaze, an online data storage service, and you have batches of hard drive data in your fridge. Rather than let them go to waste, let’s use them to make a list of the 50 disks most at risk, so that technicians can watch them carefully and, if necessary, replace them before a failure occurs. Shall we start cooking the data?


Makes: 2 models

Prep time: 20 minutes

Cooking time: 2 x 30 seconds


Ingredients:

  • CSV files from the first two quarters (Q1 and Q2) of Backblaze hard drive data
  • 1 instance of PredicSis.ai
  • Your favourite data prep tool (Python, R, even bash…)

Method:

  • Pick all failure cases from the Q1 and Q2 CSV files and gather their serial number, model, capacity in bytes, date and outcome (failure, in this case)
  • Randomly pick non-failure cases from the Q1 and Q2 CSV files and gather the same fields: serial number, model, capacity in bytes, date and outcome (non-failure, in this case)
  • Mix all the selected cases by concatenating them into a CSV file that we will call Central
  • Next, concatenate all data from the previous 30 days (including the S.M.A.R.T. indicators) into a table that we will call Peripheral, which serves as a historical view of the state of the disks
  • Upload the two files into PredicSis.ai and bake for a little less than 30 seconds
  • Request 100 aggregates and cook again in PredicSis.ai for about 30 seconds
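If your favourite data prep tool is Python, the first four steps can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the author's actual preparation script: it assumes the daily rows have already been read (for example with csv.DictReader) from Backblaze CSV files with columns date, serial_number, model, capacity_bytes and failure ("1" on the day a drive fails, "0" otherwise), that dates are ISO YYYY-MM-DD strings, and that "the previous 30 days" means the 30 days strictly before each case's date. The function names, the n_healthy parameter and the case_id join key are all hypothetical.

```python
import csv
import random
from datetime import date, timedelta

# Fields kept for each case in the Central table.
KEEP = ("serial_number", "model", "capacity_bytes", "date")

def build_central(rows, n_healthy, seed=0):
    """Return every failure case plus a random sample of non-failure cases.

    Each case gets a hypothetical 'case_id' used to join the Peripheral table.
    """
    failures = [r for r in rows if r["failure"] == "1"]
    healthy = [r for r in rows if r["failure"] == "0"]
    rng = random.Random(seed)
    sampled = rng.sample(healthy, min(n_healthy, len(healthy)))
    central = []
    for i, r in enumerate(failures + sampled):
        case = {k: r[k] for k in KEEP}
        case["outcome"] = "failure" if r["failure"] == "1" else "non-failure"
        case["case_id"] = i
        central.append(case)
    rng.shuffle(central)  # "mix" the selected cases
    return central

def build_peripheral(rows, central, days=30):
    """Attach to each case its disk's daily records from the `days` days before the case date."""
    by_serial = {}
    for r in rows:
        by_serial.setdefault(r["serial_number"], []).append(r)
    peripheral = []
    for case in central:
        end = date.fromisoformat(case["date"])
        start = end - timedelta(days=days)
        for r in by_serial.get(case["serial_number"], []):
            if start <= date.fromisoformat(r["date"]) < end:
                peripheral.append({"case_id": case["case_id"], **r})
    return peripheral

def write_csv(path, records):
    """Write dicts to CSV; the header is the union of keys, missing values are left empty."""
    fieldnames = list(dict.fromkeys(k for rec in records for k in rec))
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

Usage would be along the lines of `central = build_central(rows, n_healthy=5000)`, then `write_csv("central.csv", central)` and `write_csv("peripheral.csv", build_peripheral(rows, central))`, with file names and sample size chosen for illustration only. Keeping an explicit case_id, rather than joining on serial_number alone, avoids ambiguity when the same disk is sampled on several dates. Note that loading two full quarters into memory this way can be heavy.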

You can now enjoy a great model, explore the feature report and predict the probability of failure for each disk on a daily basis.

My personal cooking tip: sort the prediction file by failure score, from highest to lowest, and keep an eye on the top of the list: those are the disks most likely to fail.
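The tip above amounts to a sort and a slice. A minimal sketch, assuming the exported prediction file is a CSV with a serial_number column and a numeric score column; the column name failure_score and the function names are hypothetical, so adapt them to the actual export.

```python
import csv
import heapq

def load_predictions(path):
    """Read the exported prediction file (hypothetical layout) into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def top_at_risk(rows, n=50, score_col="failure_score"):
    """Return the n rows with the highest failure score, most at risk first."""
    return heapq.nlargest(n, rows, key=lambda r: float(r[score_col]))
```

For example, `watch_list = top_at_risk(load_predictions("predictions.csv"))` would give the technicians their 50 disks for the day. heapq.nlargest returns its result already sorted in descending order and avoids sorting the whole file when only the head of the list matters.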


If you wish to be inspired by a video of this recipe, follow this link.

So we’ve looked at the main course of predictive maintenance. Now let’s move on to the dessert course: a tasty selection of insights and rules to tickle your taste buds!


Not only a high performance list generator, but also a great rule maker

Experts know their job: they know that “a drive which has had any reallocations at all is significantly more likely to fail in the immediate months” (that’s not me saying it, that’s Wikipedia). Insights, and therefore knowledge, do come from experience, but they can also be extracted by Automatic Machine Learning (AutoML) algorithms from data carefully cooked to the recipe you want. The AutoML tool reads examples of failures and examples of healthy disks, and builds as many rules as the PredicSis.ai user requests to detect future failures.


Here’s an example:

The mean of the indicator smart_7_normalized has been calculated over the previous 30 days and then analysed. From the values of this mean, three optimal groups have been built:

  • Dessert 1, the high-fat option: Disks for which this value is under 68.56451613 (very few, but highly at risk)
  • Dessert 2, for those looking for something a little less daring: Disks for which this value is between 68.56451613 and 90.1639785 (more than half the disks, slightly more at risk than the average disk)
  • Dessert 3, all of the taste with none of the fat: Disks for which this value is over 90.1639785 (less than half the disks, the safest disks)
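Once the cut points are known, applying this rule is straightforward. The sketch below does not reproduce how PredicSis.ai finds the optimal groups; it only applies the two thresholds quoted above to a disk's 30-day mean of smart_7_normalized and computes the observed failure rate per group. The input layout (a list of daily readings and a failed flag per disk) and the function names are assumptions, as is the handling of a value exactly equal to a cut point, which the text does not specify.

```python
from statistics import mean

# Cut points quoted in the text for the 30-day mean of smart_7_normalized.
LOW_CUT = 68.56451613
HIGH_CUT = 90.1639785

def dessert_group(daily_values):
    """Map a disk's last-30-days smart_7_normalized readings to group 1, 2 or 3.

    Assumption: a mean equal to a cut point falls into the higher group.
    """
    m = mean(daily_values)
    if m < LOW_CUT:
        return 1  # very few disks, highly at risk
    if m < HIGH_CUT:
        return 2  # more than half the disks, slightly above average risk
    return 3  # the safest disks

def failure_rate_by_group(disks):
    """disks: iterable of (daily_values, failed) pairs; returns {group: failure rate}."""
    counts, failures = {}, {}
    for values, failed in disks:
        g = dessert_group(values)
        counts[g] = counts.get(g, 0) + 1
        failures[g] = failures.get(g, 0) + int(failed)
    return {g: failures[g] / counts[g] for g in sorted(counts)}
```

Run on the Central and Peripheral tables, failure_rate_by_group is a quick way to check that group 1 really does concentrate the failures.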

Shakespeare challenged: “If you can look into the seeds of time, and say which grain will grow and which will not, speak then unto me.” We have shown above that, given the right data ingredients, PredicSis.ai can speak to you: it builds lists of the seeds that will grow in the future and provides early signs that those seeds will grow well.


P.S.: For those of you who are not yet full, have a go at interpreting the values on the following graph.


Intrigued? To get a sense of what we do at PredicSis.ai, please visit our demo page.