
Technical risk management: failure does not come from nowhere

27 April 2026

Everyone knows the sayings "no risk, no fun" and "nothing ventured, nothing gained." In production, in business, and in life, risk is always present, and with it comes opportunity.

The same holds for engineering in the broad sense: risk is part of everyday life, whether we like it or not. The Maintenance Department naturally manages risk through:

repairing failures (eliminating the technical effects of materialized risks),
preventive and predictive actions (reducing the probability of occurrence),
upgrades, e.g., introducing redundancy (reducing the consequences of failures),
maintenance logistics – warehouses, inventories, etc. (reducing repair time).

Unfortunately, this "natural" risk management is not always sufficient. Especially when it is not done consciously, it can lead to failures.

A failure is the result of decisions (or rather their absence) that led to the materialization of risk. The risk was always there.

The question you should ask yourself is: was the risk visible and acceptable, and were you prepared for this failure?

When is a failure a problem?

Consider a scenario with 20 mobile transfer pumps, where only 10 are required in operation and the remaining units are fully available as standby. The switchover time is negligible (five minutes), repair costs are low, there is no safety impact, and the probability of failure is low because most pumps are mid-life. In this case, the overall risk is very low.

Now consider a different scenario: a single pump dedicated to a specific medium in a pharmaceutical process. A failure means opening the system, losing a batch that took two weeks to produce, eight hours of replacement work, additional cleaning time, and high repair costs. The probability of failure is high because the overhaul is overdue. Here, the risk is very high.

When you can see this risk, that is, when you are aware of all the surrounding factors, your decisions about maintaining these pumps, preventive actions, budget allocation, repair priorities, spare parts in stock, and the competencies your team needs are simple and logical. But what if this information is missing: you see two pumps and nothing else, or you don't even know you have these pumps to maintain because they are not in the CMMS?

Many companies today have implemented risk management systems at the business, financial, cybersecurity, and health and safety levels, but how many of them show what is actually happening at the level of technical assets? What REAL impact do they have on how those assets are maintained?

What does the lack of information at this level lead to? Here are some consequences of a lack of risk awareness, starting with the shortest time horizon:

Serious failures of critical machines.
"Burning" the time of the Maintenance Team on low-priority tasks.
Lack of spare parts.
Expenditures on prevention not aligned with technical availability.
Investments that do not meet real needs.

Some of you may think that this is what people with knowledge, skills, and experience are for. I agree: people are the foundation of every organization. Unfortunately, foundations are not eternal and erode over time; for people, that erosion comes from staff turnover, promotions, and limited memory. As a result, risks may not be managed at the level the technical system requires.

Where to obtain information about risks in Maintenance?

Below are some of the most practical tools, sources, and methods worth paying attention to in the context of risk management.

Criticality Analysis

The first and most basic tool is, of course, Asset Criticality Analysis (ACA/CA), which shows the level of consequences resulting from the failure of individual devices. ACA is not an oracle, but it sheds new light on where, from an objective point of view, the greatest risk lies.

Unfortunately, in many cases a Criticality Analysis ends up as a new binder on the shelf that is pulled out only for an "audit"; in that situation, one cannot speak of risk management.

Since I have already written quite a bit about Criticality Analysis, I will not elaborate on this method here; I invite you to read my earlier articles.

CMMS - data source

The second source of risk information is your CMMS/EAM, where you draw data for reports, KPIs, summaries, etc. Particularly noteworthy here is the Backlog: the list of uncompleted work orders, both reactive (usually materialized risks) and preventive (risks not yet materialized). Here, briefly, is how to use this information in practice:

Reactive backlog of faults (not failures!) indicates which machines have the highest probability of failure.
Predictive backlog (not always visible, as it may be recorded together with preventive work) shows where repair work based on technical condition (e.g., vibration or temperature measurements) has been planned but not yet carried out.
Preventive backlog (e.g., unfulfilled inspections) indicates where the probability of failure is increasing, but we may not be aware of it.

Some of you may argue that this interpretation is too general and that additional factors should be considered, e.g., the time elapsed since the planned execution date. Agreed, but even such a simple interpretation is better than none.

Analyzing the backlog in conjunction with Criticality Analysis can indicate the most significant risks in current work management and help in deciding where to allocate resources.
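As a rough illustration of combining the backlog with Criticality Analysis, here is a minimal Python sketch. The criticality classes (A/B/C), the weight per backlog type, the one-unit-per-month overdue factor, and the asset tags are all hypothetical assumptions chosen for the example, not a standard scoring method; in practice the inputs would come from your own CMMS/EAM export and Criticality Analysis.

```python
from dataclasses import dataclass

# Hypothetical criticality classes from a Criticality Analysis: A = most severe consequences.
CRITICALITY_WEIGHT = {"A": 3, "B": 2, "C": 1}

# Hypothetical weights per backlog type. Reactive faults and unfulfilled predictive or
# preventive work are all treated as signals of rising failure probability.
BACKLOG_WEIGHT = {"reactive": 3, "predictive": 2, "preventive": 2}


@dataclass
class WorkOrder:
    asset: str         # asset tag as recorded in the CMMS/EAM
    kind: str          # "reactive", "predictive" or "preventive"
    days_overdue: int  # days since the planned execution date


def rank_assets(backlog, criticality):
    """Return (asset, score) pairs, highest combined backlog risk first."""
    scores = {}
    for wo in backlog:
        # Assumption: each month of delay adds one unit to the order's weight.
        age_factor = 1 + wo.days_overdue / 30
        score = CRITICALITY_WEIGHT[criticality[wo.asset]] * BACKLOG_WEIGHT[wo.kind] * age_factor
        scores[wo.asset] = scores.get(wo.asset, 0) + score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Example data (hypothetical asset tags and backlog entries).
criticality = {"P-101": "A", "P-205": "C", "CV-12": "B"}
backlog = [
    WorkOrder("P-101", "preventive", 45),
    WorkOrder("P-205", "reactive", 10),
    WorkOrder("P-205", "reactive", 5),
    WorkOrder("CV-12", "predictive", 0),
]

ranking = rank_assets(backlog, criticality)
for asset, score in ranking:
    print(f"{asset}: {score:.1f}")
```

With these numbers, P-101 ranks first: a single overdue preventive order on a class A pump outweighs two reactive faults on a class C pump, which is the kind of priority shift the combined view is meant to reveal.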

Another example of using CMMS data for risk purposes is LEADING KPIs. It is common to focus on LAGGING indicators, which record the past, e.g., technical availability or mean time to repair. LEADING indicators are also based on past data, but they provide a perspective on the future. The best example is probably the percentage of planned preventive work completed: if this indicator is low or declining, it can be assumed that the risk of failure will increase and, consequently, technical availability will decrease.
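A minimal sketch of that leading indicator, assuming you can export the number of planned and completed preventive work orders per month from your CMMS/EAM. The monthly figures and the 90% target are hypothetical; the check simply flags months below target and a month-over-month decline.

```python
def pm_compliance(planned: int, completed: int) -> float:
    """Percentage of planned preventive work orders completed in a period."""
    if planned == 0:
        return 100.0  # assumption: a period with nothing planned counts as fully compliant
    return 100.0 * completed / planned


# Hypothetical monthly figures exported from a CMMS/EAM: (planned, completed).
monthly = {"Jan": (120, 114), "Feb": (118, 106), "Mar": (125, 100)}
TARGET = 90.0  # hypothetical target; use your own

compliance = {month: pm_compliance(p, c) for month, (p, c) in monthly.items()}
values = list(compliance.values())
# Declining = every month is lower than the one before it.
declining = all(later < earlier for earlier, later in zip(values, values[1:]))

for month, value in compliance.items():
    note = "  (below target)" if value < TARGET else ""
    print(f"{month}: {value:.1f}%{note}")
if declining:
    print("Compliance is declining: expect the risk of failure to rise.")
```

With these figures, February and March fall below target and the trend is declining: an early warning available before it shows up in lagging indicators such as technical availability.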

A CMMS/EAM holds much more information about risks, often in work order descriptions, failure codes, the recurrence of events on specific devices or device types, and even spare parts consumption. Each of these is another input to Root Cause Failure Analysis (RCFA), which also aims at a DECISION.
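For example, recurrence can be surfaced with a simple count over closed work orders. This minimal sketch assumes each record carries an asset tag and a failure code; the tags, codes, and the threshold of two occurrences are hypothetical. Failures that repeat on one asset, and codes that repeat across assets, are candidates for RCFA.

```python
from collections import Counter

# Hypothetical closed work orders exported from a CMMS/EAM: (asset tag, failure code).
history = [
    ("P-101", "SEAL_LEAK"),
    ("P-101", "SEAL_LEAK"),
    ("P-101", "BEARING"),
    ("P-205", "SEAL_LEAK"),
    ("P-101", "SEAL_LEAK"),
    ("M-03", "OVERHEAT"),
]


def recurring(counter: Counter, min_count: int = 2):
    """Items seen at least min_count times, most frequent first."""
    return [(item, n) for item, n in counter.most_common() if n >= min_count]


# Same failure repeating on the same device.
per_asset = recurring(Counter(history))
# Same failure code repeating across all assets
# (group by device type instead if your CMMS records it).
per_code = recurring(Counter(code for _, code in history))

print("Repeated on one asset:", per_asset)
print("Repeated across assets:", per_code)
```

Here the seal leak on P-101 (three occurrences) is the obvious first RCFA candidate, and the same code appearing on P-205 suggests checking whether the cause is shared.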

Technical Condition Assessment

The third source of significant information about operational risk is the Technical Condition Assessment. It aggregates information about the age of the equipment, the results of diagnostic activities, and other information about aging equipment or infrastructure into one category, e.g., I, II, III (similar to Criticality Analysis). What does the Technical Condition Assessment tell us? Above all, it shows which devices have the lowest remaining operational potential, i.e., remaining working time. This makes it easier to screen devices that require investment or modernization. Combined with Criticality Analysis, it indicates the risk of loss of function: Criticality Analysis shows the level of consequence, and Technical Condition Assessment shows the level of probability.
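Putting the two assessments together gives a simple risk matrix. In this minimal sketch, both the criticality class and the condition category are mapped to levels 1 to 3 (3 = most severe consequence or worst condition); the direction of the scales, the matrix cells, and the asset data are assumptions for illustration, so adjust them to your own class definitions.

```python
# Rows: consequence level from Criticality Analysis (1 = low, 3 = high).
# Columns: probability level from Technical Condition Assessment (1 = good, 3 = poor condition).
RISK_MATRIX = [
    ["low", "low", "medium"],     # consequence 1
    ["low", "medium", "high"],    # consequence 2
    ["medium", "high", "high"],   # consequence 3
]


def risk_level(consequence: int, probability: int) -> str:
    """Look up the risk level for one asset in the 3x3 matrix."""
    if consequence not in (1, 2, 3) or probability not in (1, 2, 3):
        raise ValueError("levels must be 1, 2 or 3")
    return RISK_MATRIX[consequence - 1][probability - 1]


# Hypothetical assets: (consequence level, probability level).
assets = {
    "transfer pump P-07": (1, 2),  # redundant, mid-life
    "pharma pump P-101": (3, 3),   # single unit, overhaul overdue
    "compressor C-02": (3, 1),     # critical, good condition
}

for name, (consequence, probability) in assets.items():
    print(f"{name}: {risk_level(consequence, probability)}")
```

A critical asset in good condition lands at medium, while the same consequence combined with poor condition is high: the two assessments together say more than either one alone.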

Full Risk Analysis

One cannot overlook full Risk Analysis, which involves identifying risks (undesirable events), assessing them (using a Risk Matrix), defining mitigating actions, and then re-evaluating the risk after mitigation. Full Risk Analysis in technical asset management deserves a separate article (or a whole series), so I will not discuss it in detail here.
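The cycle of identify, assess, mitigate, and re-evaluate can be sketched as a small risk register. This is a minimal illustration only: it assumes a 5x5 matrix scored as probability × consequence and a hypothetical acceptance threshold of 6, which is one common convention rather than the only one. The two entries loosely follow the pump scenarios described earlier, with made-up scores.

```python
from dataclasses import dataclass
from typing import Optional

ACCEPTABLE_SCORE = 6  # hypothetical threshold: higher scores need mitigation


@dataclass
class Risk:
    event: str        # identified undesirable event
    probability: int  # 1 (rare) .. 5 (almost certain)
    consequence: int  # 1 (negligible) .. 5 (severe)
    mitigation: str = ""
    residual_probability: Optional[int] = None  # re-evaluated after mitigation
    residual_consequence: Optional[int] = None

    @property
    def score(self) -> int:
        return self.probability * self.consequence

    @property
    def residual_score(self) -> Optional[int]:
        if self.residual_probability is None or self.residual_consequence is None:
            return None
        return self.residual_probability * self.residual_consequence

    def acceptable(self) -> bool:
        """Judge the residual risk if it has been re-evaluated, else the initial risk."""
        final = self.score if self.residual_score is None else self.residual_score
        return final <= ACCEPTABLE_SCORE


register = [
    Risk("Transfer pump failure (10 of 20 units on standby)", probability=2, consequence=1),
    Risk(
        "Dedicated pharma pump failure, batch lost",
        probability=4,
        consequence=5,
        mitigation="Perform overdue overhaul; stock a replacement pump",
        residual_probability=2,
        residual_consequence=3,
    ),
]

for risk in register:
    status = "OK" if risk.acceptable() else "ACTION NEEDED"
    print(risk.event, risk.score, risk.residual_score, status)
```

Keeping both the initial and the residual score in the register makes the effect of each mitigation visible, and anything still above the threshold stays on the list for further action.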

It is worth mentioning that risk analysis is also built into many other popular and recognized maintenance methods, e.g., RCM (Reliability Centered Maintenance), FRACAS (Failure Reporting and Corrective Action System), and RBM/RBI (Risk Based Maintenance/Risk Based Inspection).

How to manage risk in Maintenance and how does it relate to failures?

Risk management should not be yet another report, model, or standard that goes 'from the consultant straight to the shelf.' Risk awareness should lead to DECISIONS, both short- and long-term; only then does it make sense.

Here are some examples of risk-based decisions from companies I have worked with:

Prioritization of repair and investment initiatives – objectively determines what should be included in the repair and modernization plan for the next year (based on Criticality Analysis and Technical Condition Assessment).
Handling rejected initiatives – if initiatives with unacceptable risk were not included in the repair and modernization plan, a plan for further action must be established. Not all initiatives fit within the budget, and the risk of those not realized does not suddenly disappear upon rejection.
Operational strategy and preventive plan based on Criticality Analysis – simultaneously increasing engagement in critical machines and WISELY REDUCING the scope of prevention for machines with the lowest criticality.
Criticality of spare parts – machine Criticality Analysis combined with a well-defined BOM (Bill of Materials) is an excellent basis for indicating the risk of running out of specific stock items. Assessing part criticality requires more information, but this is one of the most significant inputs, and it is often overlooked because BOMs are missing.
Prioritization of planned work orders based on equipment criticality – first preventive actions for critical machines, then minor faults.
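The spare-parts idea from the list above can be sketched as follows: each stock item inherits the highest criticality among the machines whose BOM lists it, and critical items with zero stock are flagged. This is a minimal sketch with hypothetical machines, part numbers, and stock levels; as noted, a real part criticality assessment needs more information (e.g., supplier lead time), which this ignores.

```python
# Hypothetical machine criticality (3 = highest consequence of failure).
machine_criticality = {"P-101": 3, "P-205": 1, "M-03": 2}

# Hypothetical BOMs: machine -> stock indices of its spare parts.
bom = {
    "P-101": ["SEAL-KIT-40", "BRG-6205"],
    "P-205": ["SEAL-KIT-40", "IMP-200"],
    "M-03": ["BRG-6205", "FAN-11"],
}

# Hypothetical current stock levels per stock index.
stock = {"SEAL-KIT-40": 0, "BRG-6205": 4, "IMP-200": 0, "FAN-11": 1}


def part_criticality(machine_criticality, bom):
    """Give each stock index the highest criticality of any machine whose BOM lists it."""
    result = {}
    for machine, parts in bom.items():
        for part in parts:
            result[part] = max(result.get(part, 0), machine_criticality[machine])
    return result


parts = part_criticality(machine_criticality, bom)
# Stock indices needed by the most critical machines but currently out of stock.
gaps = sorted(part for part, level in parts.items() if level == 3 and stock.get(part, 0) == 0)

print(parts)
print("Critical parts out of stock:", gaps)
```

IMP-200 is also out of stock, but it is used only by a low-criticality machine, so it is not flagged; without the BOM link, both empty bins would look equally urgent.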

Each of the above examples ultimately leads to a reduction in failure rates or their impacts, but none will eliminate them all. Moreover, in each of these cases, failure rates may actually increase in some areas! If that happens, it will be the result of conscious decisions, and the cumulative effect should be beneficial for the organization.

A failure resulting from conscious DECISIONS is better than a failure resulting from their absence or ignorance.

If risk analysis exists in your organization but does not translate into work planning and budgeting, that is the biggest gap to close.


Operivo Sp. z o.o.

Aleja Jana Pawła II 27

00-867 Warsaw

+48 533 373 200

europe@operivo.com
