Everyone knows the saying "no risk, no fun" or "he who takes no risks has nothing". In production, business, and life, risk is always present, and with it comes opportunity.
In engineering, broadly understood, the situation is similar: risk is part of everyday life (whether we like it or not). The Maintenance Department naturally manages risk through:
Unfortunately, this "natural risk management" is not always sufficient. Especially when it is not done consciously, it can lead to failures.
A failure is the result of decisions (or rather their absence) that led to the materialization of risk. The risk was always there.
The question you should ask yourself is: was the risk visible and acceptable, and were you prepared for this failure?
Consider a scenario with 20 mobile transfer pumps, where only 10 are required in operation and the remaining units are fully available as standby. The switchover time is negligible at five minutes, repair cost is low, safety impact is none, and probability of failure is low as most pumps are mid-life. In this case, the overall risk is very low.
Now consider a different scenario: a single pump dedicated to a specific medium in a pharmaceutical process. A failure requires system opening, loss of a batch produced over two weeks, eight hours of replacement, additional cleaning time, and high repair costs. The probability of failure is high due to overdue overhaul. Here, the risk is very high.
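The contrast between the two scenarios can be sketched as a rough score. This is only an illustration: the 1 to 5 scales, the field names, the numeric values assigned to each pump, and the rule "worst consequence times probability" are all assumptions made for this example, not a method prescribed in this article.

```python
from dataclasses import dataclass


@dataclass
class PumpScenario:
    """One failure scenario scored on assumed 1 (negligible) to 5 (severe) scales."""
    name: str
    production_loss: int
    repair_cost: int
    safety_impact: int
    probability: int  # 1 = very unlikely, 5 = very likely (assumed scale)

    def risk_score(self) -> int:
        # Assumption: the single worst consequence drives the score.
        consequence = max(self.production_loss, self.repair_cost, self.safety_impact)
        return consequence * self.probability


def risk_level(score: int) -> str:
    """Map a score in the range 1..25 to a label; the band limits are illustrative."""
    if score <= 4:
        return "very low"
    if score <= 9:
        return "low"
    if score <= 14:
        return "medium"
    if score <= 19:
        return "high"
    return "very high"


# Values below are assumed readings of the two descriptions in the text.
transfer_pumps = PumpScenario(
    name="mobile transfer pumps with standby units",
    production_loss=1,  # five-minute switchover
    repair_cost=2,      # low
    safety_impact=1,    # none
    probability=2,      # low, pumps are mid-life
)
pharma_pump = PumpScenario(
    name="single dedicated pharmaceutical pump",
    production_loss=5,  # two-week batch lost, eight hours of replacement, extra cleaning
    repair_cost=4,      # high
    safety_impact=1,    # not stated in the text; assumed
    probability=4,      # high, overhaul is overdue
)

for scenario in (transfer_pumps, pharma_pump):
    score = scenario.risk_score()
    print(f"{scenario.name}: score {score} -> {risk_level(score)}")
```

The exact numbers matter less than the fact that both pumps are scored against the same visible factors, so the difference in risk can be explained and challenged.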
When you see this risk, meaning you are aware of all the surrounding factors, your decisions about maintaining these pumps, preventive actions, budget allocation, repair priorities, spare parts in stock, and the necessary competencies of the team are simple and logical. But what if this information is missing and you see 2 pumps and nothing else… or you do not even know you have 2 pumps to maintain because they are not in the CMMS?
Currently, many companies have implemented risk management systems at the business, financial, cybersecurity, health and safety levels, etc., but how many of them show what is actually happening at the level of technical assets? What is the REAL impact on how the assets are maintained?
What does the lack of information at this level lead to? Here are some consequences of a lack of risk awareness starting from the shortest time horizon:
Some of you may think that this is what people with knowledge, skills, and experience are for. I agree: people are the foundation of every organization. Unfortunately, foundations are not eternal and erode over time; in the case of people, this erosion takes the form of staff turnover, promotions, and limited memory. All this means that risks may not be managed at the level the technical system requires.
Below are some of the most practical tools, sources, and methods worth paying attention to in the context of risk management.
The first (basic) tool is, of course, the Asset Criticality Analysis (ACA/CA) – it simply shows the level of consequences resulting from the failures of individual devices. ACA is not an oracle, but it sheds new light on where, from an objective point of view, the greatest risk lies.
Unfortunately, in many cases, conducting a Criticality Analysis ends with placing a new binder on the shelf and pulling it out only for an audit. In such a situation, one cannot speak of risk management.
Since I have already written quite a bit about Criticality Analysis, I will not elaborate on this method in this article – I invite you to read earlier articles.
The second source of risk information is your CMMS/EAM. This is where you draw data for reports, KPIs, summaries, etc. Particularly noteworthy in this case is the Backlog: a compilation of uncompleted work orders, both reactive (usually materialized risks) and preventive (unmaterialized risks). Briefly, here is how to use this information in practice:
Some of you may argue that such an interpretation is general and that additional factors should be considered, e.g., the time since planned execution. Agreed; still, such an interpretation is better than none.
Analyzing the backlog in conjunction with Criticality Analysis can indicate the most significant risks in current work management and help in deciding where to allocate resources.
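Combining the backlog with Criticality Analysis can be sketched in a few lines. Everything here is hypothetical: the asset and work order identifiers, the A/B/C criticality classes, and the ordering rule (criticality class first, then reactive before preventive) are assumptions chosen for illustration, and a real CMMS export will look different.

```python
from collections import Counter

# Hypothetical criticality classes from a Criticality Analysis ("A" = highest consequence).
criticality = {"P-101": "A", "C-300": "B", "P-205": "C"}

# Hypothetical open work orders from a CMMS: (order id, asset id, order type).
backlog = [
    ("WO-1", "P-205", "preventive"),
    ("WO-2", "P-101", "preventive"),
    ("WO-3", "C-300", "reactive"),
    ("WO-4", "P-101", "reactive"),
    ("WO-5", "X-999", "preventive"),  # asset with no criticality class assigned
]

CLASS_ORDER = {"A": 0, "B": 1, "C": 2, "unclassified": 3}
TYPE_ORDER = {"reactive": 0, "preventive": 1}


def asset_class(asset_id, classes):
    """Return the criticality class, or 'unclassified' if the asset was never assessed."""
    return classes.get(asset_id, "unclassified")


def prioritise(orders, classes):
    """Sort open orders by criticality class, then reactive before preventive."""
    def sort_key(order):
        _, asset_id, order_type = order
        return (CLASS_ORDER[asset_class(asset_id, classes)], TYPE_ORDER[order_type])
    return sorted(orders, key=sort_key)


def summarise(orders, classes):
    """Count open orders per (criticality class, order type)."""
    return Counter(
        (asset_class(asset_id, classes), order_type) for _, asset_id, order_type in orders
    )


ranked = prioritise(backlog, criticality)
summary = summarise(backlog, criticality)

for order_id, asset_id, order_type in ranked:
    print(order_id, asset_id, asset_class(asset_id, criticality), order_type)
print(dict(summary))
```

Orders on unclassified assets are listed last here only for simplicity; in practice an asset missing from the analysis may deserve attention of its own.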
Another example of using CMMS data for risk purposes is LEADING-type KPIs. It is common to focus on LAGGING-type indicators, those that describe the recorded past, e.g., technical availability or average repair time. LEADING-type indicators are also based on past data but provide a perspective on the future. The best example is probably the percentage of planned preventive work completed: if this indicator is low or declining, it can be assumed that the risk of failure will increase and, consequently, technical availability will decrease.
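As a minimal sketch, the percentage of planned preventive work completed could be tracked as follows. The monthly figures, the 85% threshold, and the function names are assumptions for illustration; an appropriate threshold depends on the organization.

```python
def pm_completion_rate(planned: int, completed: int) -> float:
    """Percentage of planned preventive work orders that were completed."""
    if planned <= 0:
        raise ValueError("planned must be positive")
    return 100.0 * completed / planned


def is_declining(values) -> bool:
    """True if there are at least two values and each is lower than the one before."""
    return len(values) >= 2 and all(
        later < earlier for earlier, later in zip(values, values[1:])
    )


# Hypothetical monthly data: (month, planned preventive orders, completed preventive orders).
monthly = [("2025-01", 40, 38), ("2025-02", 40, 34), ("2025-03", 50, 37)]

THRESHOLD = 85.0  # assumed warning level, in percent

rates = [(month, pm_completion_rate(planned, completed)) for month, planned, completed in monthly]
values = [rate for _, rate in rates]

# Flag the indicator when the latest value is low or the trend is falling.
warning = values[-1] < THRESHOLD or is_declining(values)

for month, rate in rates:
    print(f"{month}: {rate:.1f}% of planned preventive work completed")
print("rising failure risk suspected" if warning else "no warning")
```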
In CMMS/EAM, there is much more information about risks, often contained in work order descriptions, failure codes, event recurrence on specific devices or types of devices, and even in the consumption of spare parts. Each piece of this information is another input to Root Cause Failure Analysis (RCFA), which also aims at a DECISION.
The third activity that provides significant information about operational risk is the Technical Condition Assessment. This assessment aggregates information about the age of the equipment, the results of diagnostic activities, and other information about aging equipment or infrastructure into one category, e.g., I, II, III (similar to Criticality Analysis). What does the Technical Condition Assessment tell us? Above all, it shows which devices have the lowest remaining operational potential, i.e., working time. This makes it easier to screen devices requiring investment or modernization. Together with Criticality Analysis, it indicates the risk of loss of function: Criticality Analysis shows the level of consequence, and the Technical Condition Assessment shows the level of probability.
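The pairing of consequence and probability can be sketched as a simple lookup. The A/B/C criticality classes, the assumption that condition category III is the worst, the risk labels in each cell, and the asset identifiers are all illustrative choices, not values taken from any standard or from this article.

```python
# Assumed matrix: criticality class (consequence) x condition category (probability).
# "A" = highest consequence; "III" = worst technical condition (assumption).
RISK_MATRIX = {
    ("A", "I"): "medium",   ("A", "II"): "high",   ("A", "III"): "very high",
    ("B", "I"): "low",      ("B", "II"): "medium", ("B", "III"): "high",
    ("C", "I"): "very low", ("C", "II"): "low",    ("C", "III"): "medium",
}


def loss_of_function_risk(criticality: str, condition: str) -> str:
    """Look up the risk label for a criticality class and a condition category."""
    return RISK_MATRIX[(criticality, condition)]


# Hypothetical assets: asset id -> (criticality class, condition category).
assets = {
    "P-101": ("A", "III"),
    "C-300": ("B", "I"),
    "P-205": ("C", "II"),
}

# Screen for assets that may need investment or modernization first.
candidates = [
    asset_id
    for asset_id, (criticality, condition) in assets.items()
    if loss_of_function_risk(criticality, condition) in ("high", "very high")
]

for asset_id, (criticality, condition) in assets.items():
    print(asset_id, criticality, condition, loss_of_function_risk(criticality, condition))
print("investment candidates:", candidates)
```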
One cannot overlook the full Risk Analysis, which involves identifying risks (undesirable events), assessing them (using the Risk Matrix), defining mitigating actions, and then re-evaluating the risk after mitigation. Full Risk Analysis in the area of technical asset management deserves a separate article (or a whole series), and I will not discuss it in detail here.
It is worth mentioning that an element of risk analysis is also built into many other popular and recognized methods used in maintenance, e.g., RCM (Reliability Centered Maintenance), FRACAS (Failure Reporting and Corrective Action System), and RBM/RBI (Risk Based Maintenance / Risk Based Inspection).
Risk management should not be another report, model, or standard that goes 'from a consultant to the shelf.' Risk awareness should lead to DECISIONS, both short- and long-term; only then does it make sense.
Here are some example decisions based on risk in the companies I have worked with:
Each of the above examples ultimately leads to a reduction in failure rates or their impacts, but none will eliminate them all. Moreover, in each of these cases, failure rates may actually increase in some areas! If that happens, it will be the result of conscious decisions, and the cumulative effect should be beneficial for the organization.
A failure resulting from conscious DECISIONS is better than a failure resulting from their absence or ignorance.
If risk analysis exists in your organization but does not translate into work planning and budgeting, that is the biggest gap to close.
Criticality Analysis - Part II: How to Implement It Practically?
Copyright @ Operivo 2025