RELIABILITY CENTRED MAINTENANCE FOR INDUSTRIAL USE: SIGNIFICANT ADVANCES FOR THE NEW MILLENNIUM

Maintenance organisations have to obtain the correct strategic 'mix' to ensure success. This includes a strategically sound maintenance policy and managerial procedures, a well thought-through maintenance plan, a proper maintenance management and operational system, proper operational procedures, employment of the necessary technology and sensible management of human resources. Reliability Centred Maintenance (RCM) has enabled maintenance users to develop scientifically founded maintenance plans. However, because of shortcomings in the methodology and shortcuts taken by RCM practitioners, these benefits have mostly not been realised in industrial use. This paper addresses these limitations and develops an improved RCM methodology. The improved methodology was tested against 'classical' RCM in a typical industrial setting, and significant benefits were demonstrated.

http://sajie.journals.ac.za


INTRODUCTION
Reliability Centred Maintenance (RCM) started a new chapter in the history of preventive maintenance strategy setting. It became possible to develop a scientifically based, highly successful maintenance program for complex systems. RCM developed in response to the reliability problems and high maintenance costs of aircraft during the late 1950s and early 1960s. The result was a methodology called MSG-1, followed by the improved MSG-2. When MSG-2 was used contractually for the United States Department of Defense, it led to the present definition of RCM.
A growing dissatisfaction with the technique developed in academic circles (Pintelon et al [15]). Part of it stems from the watering down of its scientific basis to make RCM more marketable (Moubray [12]), while at least part is based on perceived inherent scientific weaknesses in the methodology itself.
The present article, which is based on the primary author's PhD thesis, addresses these limitations and makes several important contributions to the RCM methodology. The first is a method of concentrating the RCM analysis effort on the most important failure modes encountered by the organisation. Secondly, it introduces a Quality Improvement task in the RCM task selection tree, based on a limitation identified by Harris [6]. The third contribution is the addition of a formal task packaging methodology, following Gits [3]. The thesis also combines the use of RCM for the most important failure modes with conventional maintenance tasks for the remaining failure modes, to form a complete methodology for the typical industrial concern. It furthermore introduces the application of sound management principles in the implementation of RCM and, lastly, blends concepts from different RCM authors, together with the innovations listed above, into one logical whole. In summary, the proposed revised methodology can play a very important part in achieving world-class manufacturing standards, including ensuring that the organisation's maintenance effort is as proactive as possible.

OUTLINE
It is impossible to present the full results of extensive PhD research in a single paper. However, by assuming that the reader is conversant with RCM nomenclature and techniques, and by using a familiar structure, much of the result can still be conveyed. This paper thus follows the general structure of the methodology proposed by Nowlan and Heap [14], which most authors on the subject have followed (e.g. Moubray [11] and Coetzee [2]), but incorporates the changes proposed by the present work. This framework is introduced in figure 1.
In developing the components of the proposed model, use was made of the best work of sources such as Nowlan and Heap [14], Moubray [11], Smith [17] and MSG-3 [13]. Newer and related work, such as that of Gits [3], Harris [6] and Coetzee [2], was also taken into account.
The proposed methodology is integrated with the more intuitive methods of maintenance plan design, such as Business Centred Maintenance (Kelly [8]), equipment manufacturers' recommendations, statutory requirements, NOSA standards and HAZOP studies. It is also placed in context with the various maintenance task classifications, such as preventive vs. corrective vs. design-out maintenance, scheduled vs. unscheduled work and planned vs. unplanned work.
The proposed model was tested using a high-risk chemical pump system as test bed. The system's failure history was analysed in its full operational context using the improved methodology, which led to a proposed maintenance plan for the system. This plan was then critically compared against a 'classical' RCM analysis done previously for the same system. The proposed methodology was found to be superior to the classical approach, leading to a more focused, proactive and concise maintenance plan.

Figure 1: Framework for RCM
Finally, the result was critically assessed against the following five baseline references:
• 'Classical' RCM as embodied in the SAE Standard JA1011 [16]
• 'Classical' RCM as embodied in the various RCM texts¹
• MSG-3 [13], the latest version of the airlines' methodology.
The paper also recommends follow-up research and work that would further enhance and improve the RCM methodology.

THE PROBLEM
There are two major problems with the application of RCM. The first is that some maintenance people are strong proponents of the technique, whilst others are strongly anti-RCM. It is often difficult to establish what problems people in the second category have with the technique, apart from the cursory claim that 'the technique is unscientific'. Moreover, ask them what the alternative is and they will murmur 'use general reliability principles', without any reference to the difficulty of practising such principles in the average production concern without the structure afforded by RCM. The second problem is that a large number of unreliable 'consultants', while selling RCM programs, violate the very principles of the technique. This has given RCM a bad name in many parts of industry. In addition, many opponents of RCM object specifically to the unscientific approaches and methods of these so-called RCM experts.
Many of the opponents of RCM come from the broad Operations Research community, who make a living by developing mathematical models for maintenance strategy setting. For them, it is heresy to imply that maintenance strategy can be formulated without detailed mathematical analysis using some kind of mathematical model. Very often, this view stems from a lack of understanding of the maintenance problem. Owing to a lack of failure data, it is often not viable to apply mathematical modelling in maintenance strategy setting (with the consequence that it is impossible to specify a use-based maintenance task). Furthermore, the maintenance problem includes factors (such as the behaviour of people and the interfaces between systems, equipment and components) that cannot be adequately modelled using mathematics alone, and where responsible managerial discretion and synthesis play a major role.
The problems mentioned in the previous paragraph are, however, not indicative of unscientific methods. The scientific basis of any method or technique depends on whether its components are based on premises that were properly researched and tested, and were found to work. On both counts, RCM passes the test of being scientific. Referring to the results achieved by the airline industry, one cannot but conclude that the methodology produces excellent results (Smith [17], pp. 52-53). During the period from 1964 to 1987, the percentage of components allocated to time-based maintenance dropped from 58% to 9%. In the same period, the percentage of components left to fail before maintenance action increased from 2% to 51%. A study comparing the first 10 years of RCM use (1970 to 1980) with the last years of pre-RCM operation revealed that the maintenance cost per flight hour remained virtually constant. This is remarkable, taking into account the increase in sophistication and in carrying capacity per flight hour (the fuel cost per flight hour more than quadrupled over the same period). This is conclusive evidence that the maintenance strategies produced by RCM deliver the required results (in combination, of course, with improvements in design, which included many redundancy features that resulted in lower levels of preventive maintenance).
The original treatise on RCM (Nowlan and Heap [14]) can in a sense be regarded as incomplete, which may lead to the idea that the technique is unscientific. But incompleteness is true of many new developments. As one works through Nowlan and Heap [14] and grasps the authors' personal views on reliability modelling and its effect on maintenance programs, one cannot but conclude that they were serious reliability practitioners. They certainly lacked an understanding of some important issues that are understood today, for example the difference between wear-out (an increasing force of mortality, IFOM) and reliability degradation (an increasing rate of occurrence of failures, ROCOF). But then, most reliability practitioners do not understand this difference even today (Ascher and Feingold [1]). It is also true that there are gaps in RCM that should be filled. The objective of the present work is to make a meaningful contribution in this regard.
Nevertheless, many applications of RCM are certainly unscientific. Because the basic premises of RCM are not properly understood, fundamental changes are often made to the technique to make it simpler and more palatable. These changes undermine the scientific basis of RCM. Examples include Moubray's and Smith's insistence on not applying failure data analysis when choosing maintenance tasks (Moubray [11], pp. 218-223; Smith [17], pp. 102-103). In both cases they side-step the issue. Even MSG-3 [13] suffers from this. Another example is 'Streamlined RCM', which degrades the methodology to a mere decision tree approach (Moubray [12]).

IMPROVEMENTS TO THE RCM METHODOLOGY
The improvements to RCM proposed by the present research are considerable. However, many of them are evolutionary steps rather than quantum jumps. These smaller improvements are included in the diagrams and descriptions of the more noteworthy developments in this article, but are not described separately.
Referring to the framework presented in figure 1, the following sub-paragraphs will endeavour to describe the major improvements to the methodology, which are the result of the PhD research.

Selection of application areas
Three methods of selecting application areas are in general use: firstly partitioning (Nowlan and Heap [14]), secondly use of the plant register (Moubray [11]), and thirdly analysis at the systems level (Smith [17]). The most frequently used is partitioning (breaking down) the equipment to a level which ensures, on the one hand, that no failure mode is missed and, on the other hand, that the failure modes selected have an impact on the equipment function.
Which of these three methods to use depends on the technology involved and the business culture. In purely technical terms, the best approach is to combine elements of all three. In most businesses a plant register exists that can be used to identify the technical structure of the equipment and infrastructure of the business; at the least, it can be used to identify the top structure. Smith's idea of working at systems level is a good one for identifying the systems on which further analysis should be performed. Those systems can then be subjected to a partitioning process to identify the Maintenance Significant Items (MSI's).

Identification of major systems
Most businesses have an asset register (plant register) that can be used to identify the technical structure of the business. This register can at least be used to identify the top structure of the equipment and infrastructure (divisions and systems). The identification step thus entails some combination of using the asset register and identifying the major systems. However, as business structures differ, the way in which this identification is conducted will also differ. The identification process for four types of businesses is shown in figure 2.
The objective of this identification step is to identify systems (or units) at a level high enough to facilitate the next step, that of choosing the most important systems (or equipment types or assets) for further analysis. The resulting 'system' level must be high enough to easily determine the relative business impact of each such 'system', while being low enough to limit the RCM analysis workload to a manageable one. Following the identification of systems, a method must be devised to choose the most important of these for the application of RCM. Coetzee [2] suggests using the profit contribution of each unit to prioritise the units in order of their relative contribution. Although this method is an improvement on the standard RCM methodology, it is imperfect. The main problem is that it accentuates profit only, without due regard to other impact parameters such as safety and environmental effects. Nor does it allow comparison with these other impact parameters, because they are measured in different units.
Jones [7] proposed a risk method for quantifying the relative criticality of the various failure modes. This method is also suitable for quantifying the relative importance of the different units. It allows the simultaneous evaluation of five impact parameters in direct maintenance-related terms, using money as the common denominator. The parameters evaluated are lost production, lost quality and maintenance cost, as well as safety and environmental effects. A combined risk figure is then calculated for each unit, and these figures can be used to identify the "20"% of units with the highest maintenance risk impact for further RCM application.
The specific method used for deriving the figures for this risk calculation is very business-specific. The method proposed for failure modes in the section on prioritisation of failure modes (table 2) may provide some insights.
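A minimal Python sketch of the unit-level selection step follows, assuming that each unit's risk for every impact parameter has already been estimated in money terms per hour (R/h), so that the factor risks can simply be added. The `Unit` class, the unit names, the figures and the rule of rounding the "20"% cut upwards are illustrative assumptions, not part of Jones's [7] method as published.

```python
import math
from dataclasses import dataclass


@dataclass
class Unit:
    """A unit (system) with per-factor risk estimates in R/h (hypothetical data)."""

    name: str
    factor_risks: dict[str, float]  # e.g. {"lost production": 120.0, "safety": 15.0}

    @property
    def combined_risk(self) -> float:
        # Money is the common denominator, so the factor risks can be added directly.
        return sum(self.factor_risks.values())


def select_top_units(units: list[Unit], fraction: float = 0.2) -> list[Unit]:
    """Return roughly `fraction` of the units, those with the highest combined risk."""
    ranked = sorted(units, key=lambda u: u.combined_risk, reverse=True)
    count = max(1, math.ceil(fraction * len(ranked)))  # round up, keep at least one unit
    return ranked[:count]


units = [
    Unit("Raw water supply", {"lost production": 40.0, "maintenance cost": 5.0}),
    Unit("Chemical pumps", {"lost production": 90.0, "safety": 60.0, "environmental": 25.0}),
    Unit("Packaging line", {"lost production": 30.0, "lost quality": 20.0}),
    Unit("Workshop HVAC", {"maintenance cost": 3.0}),
    Unit("Boilers", {"lost production": 70.0, "safety": 30.0}),
]
print([u.name for u in select_top_units(units)])  # ['Chemical pumps']
```

Because every factor is expressed in the same money unit, the combined figure allows units with very different risk profiles (safety-dominated versus production-dominated) to be ranked on one scale.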

Prioritisation of MSI's
In line with Coetzee [2], it is expedient to add another selection process so that only the most important MSI's are subjected to detailed RCM analysis first. This prioritisation can be done using the risk approach outlined above, but that approach will probably be difficult to apply at this level because of the detailed analysis it requires. A more likely method is that of Coetzee [2], which uses the contribution of each MSI to the downtime of the equipment (or system) to identify, following the Pareto principle, the "20"% of MSI's that contribute most to that downtime.
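Coetzee's downtime-based selection could be sketched as follows, assuming a downtime log of (MSI, hours) records is available. The function name, the sample log and the 80% cumulative-downtime cut-off (one common reading of the Pareto principle) are assumptions made for illustration.

```python
from collections import defaultdict


def downtime_pareto(events: list[tuple[str, float]], cutoff: float = 0.8) -> list[str]:
    """Return the MSI's that together account for about `cutoff` of total downtime.

    `events` is a downtime log of (msi, hours) records. MSI's are ranked by
    their summed downtime and added, largest first, until the cumulative
    share reaches `cutoff`.
    """
    totals: dict[str, float] = defaultdict(float)
    for msi, hours in events:
        totals[msi] += hours
    grand_total = sum(totals.values())

    selected: list[str] = []
    covered = 0.0
    for msi in sorted(totals, key=totals.get, reverse=True):
        if covered >= cutoff * grand_total:
            break
        selected.append(msi)
        covered += totals[msi]
    return selected


log = [("mechanical seal", 25.0), ("impeller", 30.0), ("mechanical seal", 25.0),
       ("bearings", 10.0), ("coupling", 5.0), ("impeller", 5.0)]
print(downtime_pareto(log))  # ['mechanical seal', 'impeller']
```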
Downtime prioritisation is not suitable in all cases. For some equipment, safety could, for example, be more important than downtime, and that factor could then be used to prioritise the MSI's. The point is that the user of the RCM methodology should decide which single parameter or combination of parameters makes most sense for the prioritisation process.
These two prioritisation steps, together with the prioritisation process at the failure mode level (see the section on prioritisation of failure modes), constitute a prioritisation mechanism which serves as a means of achieving fast results from the RCM methodology. It acts as a 'funnelling process' to concentrate the RCM analysis on the more important units (prioritisation 1), MSI's (prioritisation 2) and failure modes (prioritisation 3). Each of these 'funnelling actions' is progressively more short-term in nature, because the impact is higher at each lower level due to the effect of the higher-level prioritisation. One would thus, as soon as possible after the initial RCM result, increase the failure mode funnel size from "20"% to 100%, after which the MSI funnel size would be increased, and lastly the systems funnel size.
The prioritisation mechanism, as described above, is illustrated in figure 3. It results in approximately "1" percent of the failure modes of the business (the most important failure modes) being addressed during the first phase of the RCM process.These 'funnels' are then progressively widened as described above.
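Assuming that each funnel retains roughly "20"% of its input, and that failure modes are spread fairly evenly over units and MSI's, the first-phase coverage of figure 3 works out as follows; the second line shows the effect of first widening the failure mode funnel to 100%:

```latex
\begin{aligned}
\text{first phase:}\quad & 0.2 \times 0.2 \times 0.2 = 0.008 \approx 1\% \text{ of all failure modes} \\
\text{failure mode funnel widened:}\quad & 0.2 \times 0.2 \times 1.0 = 0.04 = 4\% \text{ of all failure modes}
\end{aligned}
```

The "1"% quoted for the first phase is therefore a rounded 0.8%.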

Information Assembly
One of the most important steps in designing a maintenance plan for the organisation is the assembly of information regarding the business. This is because RCM is very context-specific: the analyst has to understand the business, the technology involved and the operating context in order to design a worthwhile plan.

Figure 3: RCM prioritisation processes (impact selection funnel: 100% of equipment, MSI's and failure modes; selection process 1: "20"% of equipment; selection process 2: "20"% of "20"% of MSI's = "4"% of MSI's and failure modes; selection process 3: "20"% of "4"% of failure modes = "1"% of failure modes, the critical failure modes)
The following table lists information that should be obtained to ensure that the RCM outcome is a scientifically valid maintenance plan. Each column of the table is a separate, category-specific list, so no relationship is implied between entries in the same row.
As part of this step, a thorough study of the information at hand should be performed. The process of information retrieval and analysis often leads to additional understanding of, and insight into, the business and the assets to be maintained. This adds further direct value through improved operational and maintenance procedures, early fault identification, and improvements in asset reliability, operability and maintainability through redesign.

A valid concern is that important failure modes may be missed through the application of the prioritisation processes. This is a further reason for the detailed study that this process step requires. Although the concern is real, it is unlikely that any important failure modes will be missed in such a thorough study and the analysis that follows. On the contrary, experience has shown that the most important failure modes tend to show up readily in this process of study and analysis.

Identification of Failure Modes
Once the Maintenance Significant Items (MSI's) to which RCM must be applied are known, the FMEA technique is used to identify the failure modes and their effects for each such MSI. The accepted FMEA structure used by most authors is item -> function -> functional failure -> failure mode. This structure is inherent in the methods of Nowlan and Heap [14], Moubray [11], Coetzee [2] and Smith [17]. It is used here as the de facto standard, with the addition of failure effects at the local, system and unit levels in line with MIL-STD-1629A [10] and Smith [17], but using slightly different terminology. A three-level effects structure is deemed important to ensure that all possible effects of the failure mode are taken into account during task selection. The other columns of the standard FMEA, and those of Moubray [11], are not deemed important and are not used, as they are very specific to certain classes of users and can be added as necessary. The only other data entities used are a component reference number and a line reference number, which allow later parts of the analysis to cross-reference the specific FMEA line. The resulting FMEA table is shown in figure 4.
The FMEA table itself has only two unusual features. The first is the separate effects columns, which provide space for descriptions of local, system and unit effects while still leaving enough analysis space in the main FMEA table. The second is the two reference columns: the first references backwards to an item number, while the second references forward to the effects columns and to the further parts of the RCM analysis. The item to which the table refers will typically be an MSI. As in standard RCM, each item can have one or more functions (primary and secondary), each function one or more functional failures⁵ and each functional failure one or more failure modes. In each case the resulting failure mode should be reviewed to make sure that it affects the main system function, thus keeping the analysis function-driven. For this purpose there is a functional check (FC) column, where a tick indicates that the failure mode has an adverse effect on the system function. The FRef column carries a reference number that identifies the failure mode uniquely within the system. It is used to identify the particular failure mode in the remainder of the analysis, and also to reference the special effects columns below the normal analysis.
Each effects column carries the failure mode reference (FRef) as heading and has three spaces for local, system and unit effects. The local effect is the effect that the failure mode has on the item's own function, the system effect is its effect at system level, and the unit effect is its effect on the total production unit. These effects are very important, as they contain most of the information on which the subsequent task selection analysis is based.
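The worksheet structure described above maps naturally onto nested records. The Python sketch below is one hypothetical representation (all class names, field names and sample entries are assumptions): it keeps the item -> function -> functional failure -> failure mode hierarchy, the FRef and FC columns and the three effect levels, and filters out failure modes that fail the functional check.

```python
from dataclasses import dataclass, field


@dataclass
class Effects:
    local: str   # effect on the item's own function
    system: str  # effect at system level
    unit: str    # effect on the total production unit


@dataclass
class FailureMode:
    fref: str               # FRef: unique failure mode reference within the system
    description: str
    effects: Effects
    functional_check: bool  # FC column: True if the mode adversely affects the system function


@dataclass
class FunctionalFailure:
    description: str
    failure_modes: list[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    description: str
    primary: bool = True    # primary or secondary function
    functional_failures: list[FunctionalFailure] = field(default_factory=list)


@dataclass
class Item:
    item_ref: str           # backward reference to the item (typically an MSI)
    name: str
    functions: list[Function] = field(default_factory=list)

    def function_driven_modes(self) -> list[FailureMode]:
        """Failure modes ticked in the FC column, in worksheet order."""
        return [
            fm
            for fn in self.functions
            for ff in fn.functional_failures
            for fm in ff.failure_modes
            if fm.functional_check
        ]


seal_leak = FailureMode("F1", "Mechanical seal leaks",
                        Effects("Seal leaks", "Loss of containment", "Unit shut down"), True)
paint = FailureMode("F2", "Casing paint flakes",
                    Effects("Cosmetic damage", "None", "None"), False)
pump = Item("MSI-01", "Chemical pump", [
    Function("Transfer liquid without leakage", True,
             [FunctionalFailure("Unable to contain liquid", [seal_leak, paint])]),
])
print([fm.fref for fm in pump.function_driven_modes()])  # ['F1']
```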

Prioritisation of Failure Modes
As was stated in the section on prioritisation of MSI's, a last prioritisation step is needed so that only about "1"% of the failure modes of the business (the most important ones) are addressed during the first phase of the RCM process. This step, together with the previous two prioritisation steps, constitutes a prioritisation mechanism which serves as a means toward fast results from the RCM methodology. It acts as a 'funnelling process' to concentrate the RCM analysis on the more important units (prioritisation 1), MSI's (prioritisation 2) and failure modes (prioritisation 3). The total prioritisation mechanism is shown in figure 3, where the prioritisation of failure modes is represented by the bottom-most 'funnelling' process.
As was explained in the section on prioritisation of MSI's, each of these 'funnelling actions' is progressively more short-term in nature, because the impact is higher at each lower level due to the effect of the higher-level prioritisation. One would thus soon, following the initial RCM result, increase the failure mode funnel size from "20"% to 100%, after which the MSI funnel size would be increased, and so forth.

⁵ Nowlan and Heap (1978) defined the three concepts failure, functional failure and potential failure as follows:
1. A failure is an unsatisfactory condition.
2. A functional failure is the inability of an item (or the system/sub-system in which it is installed) to meet a specified performance standard.
3. A potential failure is an identifiable physical condition which indicates that a functional failure is imminent.
While the second and third of these definitions are satisfactory, the first is not: it is far too wide and misleading. The following are better formulations of the three definitions:
1. A failure is any condition which results in unsatisfactory performance or indicates that the instant of such unsatisfactory performance is near.
2. A functional failure is the inability of an item (or the system/sub-system in which it is installed) to meet a specified functional performance standard.
3. A potential failure is the imminence of the instant of functional failure. The presence of such a potential failure is normally detected by measuring some physical parameter (detecting a deviation from its normal 'healthy' value).
This last prioritisation process can use any one, or a combination, of several methods, including the severity rating of the standard FMEA (MIL-STD-1629A [10]), the criticality rating of FMECA (MIL-STD-1629A [10]) and the risk profile method of Jones [7]. The FMECA method tends towards a complexity that would be beyond many industrial users, whereas Jones' method combines five diverse consequence factors into one risk figure, using only cost and probability estimates as its basis. A further benefit of Jones' method is that it does not reduce the prioritisation to a single consequence factor, whereas FMECA does exactly that (it measures risk based on operating time or number of cycles used).
Whether one calls the prioritisation result a criticality figure or a risk figure is a matter of personal choice. The term 'risk' is familiar to maintenance practitioners in industry (as they use the same concept to calculate safety risk) and is used here. The risk associated with any single failure mode is calculated by combining the various risk factors using the formula

$$R = \sum_{i=1}^{n} P_i C_i \qquad (1)$$

where $P_i$ represents the probability of the risk consequence factor $C_i$ occurring and $n$ is the number of risk factors. Jones [7] suggests safety, lost production, lost quality, environmental effects and maintenance as the five risk factors, but any combination of risk factors that is valid in the specific maintenance environment can be used for this purpose.
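As a small numerical illustration of equation 1 (the sum of P_i C_i over the n risk factors), the sketch below assumes that each P_i is expressed as a probability per operating hour and each C_i as a cost in rand, so that the total risk comes out in R/h. The factor names follow Jones's [7] five factors, but the figures are invented purely for illustration.

```python
def failure_mode_risk(factors: dict[str, tuple[float, float]]) -> float:
    """Equation 1: R = sum of P_i * C_i over the n risk factors.

    Each value is a (P_i, C_i) pair: P_i is assumed to be a probability per
    operating hour and C_i a consequence cost in rand, so R is in R/h.
    """
    return sum(p * c for p, c in factors.values())


# Hypothetical figures for a single failure mode (invented for illustration).
seal_leak = {
    "safety": (1e-5, 2_000_000.0),
    "lost production": (1e-3, 50_000.0),
    "lost quality": (0.0, 0.0),
    "environmental": (1e-4, 100_000.0),
    "maintenance": (1e-3, 8_000.0),
}
print(round(failure_mode_risk(seal_leak), 2))  # 88.0
```

Here the rare but costly safety consequence (20 R/h) contributes less than lost production (50 R/h), which is the kind of insight into the failure process that the combined figure is meant to expose.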
Coetzee [2] states the general objective of the maintenance function as follows: It is the task of the maintenance function to support the production process with adequate levels of availability, reliability and operability at an acceptable cost.
This objective statement has lately been modified to the following:

It is the task of the maintenance function to support the production process with adequate levels of availability, reliability, operability and quality at acceptable levels of safety, environmental effects and cost.
This sets the scene for the general application of risk principles to the failure modes of a system in the commercial maintenance world. The quantities representing real risk are unavailability, unreliability, inoperability, poor quality, safety risk, environmental risk and high maintenance cost. This gives seven risk factors, of which Jones [7] identified five (he used lost production to combine the effects of unavailability, unreliability and inoperability). His approach is a very practical one, as long as one keeps in mind that 'lost production' combines the effects of unavailability, unreliability and inoperability. A generalised method of calculating the risk for each of the seven risk factors is shown in table 2. These values are used in equation 1 to calculate the total risk (in R/h) for each failure mode. One could of course decide to use only some of these factors in the risk calculation, as some (e.g. safety and environmental effects) might not be relevant to a certain situation; likewise, for an asset that does not contribute to production, the first two factors might not be relevant. Nevertheless, the technique provides both a practical way of prioritising failure modes and useful insights into the process of failure and its effects.

To make the relative risk of the various failure modes easier to compare, it is also useful to normalise the risk. This involves defining a level of risk $R_{\max}$ (R/h) that is deemed to be a 100% (or totally unacceptable) level of risk. Each failure mode then has a percentage risk equal to

$$RR = \frac{R}{R_{\max}} \times 100\% \qquad (2)$$

It might be difficult to use the risk prioritisation presented above in some cases. For such equipment, a single parameter or combination of parameters might make more sense for the prioritisation process. Another option is to list the failure modes and rank them in order of importance based on one or more of the parameters listed in table 2, or otherwise using heuristics (based on the 'gut feel' of the user).
The task analysis sheet for the above results is shown in figure 11. The sheet starts with the failure mode reference FRef, described in the section on identification of failure modes, and a repeat of the failure mode column (for clarity). It then adds a column for the relative risk RR calculated using equation 2. This value is used to select the "20"% of failure modes that carry "80"% of the risk impact for further analysis, which can be done using standard Pareto analysis methods. Failure modes that will be analysed further receive a tick mark in the risk check (RC) column. This prioritisation constitutes the bottom-most 'funnel' in figure 3.
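The RR and RC columns could be filled in along the following lines. This is a sketch under stated assumptions: R_max and the risk figures are hypothetical, and the tick rule (rank by RR and tick failure modes until about 80% of the total relative risk is covered) is one way of applying the Pareto analysis referred to above.

```python
def relative_risk(risk: float, r_max: float) -> float:
    """Equation 2: RR = R / R_max x 100 (percent of the unacceptable risk level)."""
    return risk / r_max * 100.0


def risk_check(risks: dict[str, float], r_max: float,
               share: float = 0.8) -> dict[str, tuple[float, bool]]:
    """Return {FRef: (RR, RC)} for a task analysis sheet.

    RC is True for the highest-RR failure modes that together cover about
    `share` of the total relative risk (a simple Pareto cut).
    """
    rr = {fref: relative_risk(r, r_max) for fref, r in risks.items()}
    total = sum(rr.values())
    ticked: set[str] = set()
    covered = 0.0
    for fref in sorted(rr, key=rr.get, reverse=True):
        if covered >= share * total:
            break
        ticked.add(fref)
        covered += rr[fref]
    return {fref: (rr[fref], fref in ticked) for fref in risks}


# Hypothetical total risks (R/h) per failure mode and a hypothetical R_max.
sheet = risk_check({"F1": 78.0, "F2": 12.0, "F3": 6.0, "F4": 4.0}, r_max=200.0)
for fref, (rr, rc) in sheet.items():
    print(f"{fref}: RR = {rr:.1f}%  RC = {'tick' if rc else ''}")
```

Because RR is simply R scaled by a constant, the ranking, and therefore the RC ticks, are the same as those obtained from R directly; the normalisation mainly makes each failure mode's risk comparable against the unacceptable level R_max.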

Classification of Failure Modes
Before moving away from the area of failure mode selection (figure 1), we should deal with the only standard 'prioritisation' afforded by the original Nowlan and Heap [14] version of the methodology. Reading Matteson (1989), one soon realises that this classification process, together with the task selection process, was really the heart of the technique in MSG-1 and MSG-2.
This paper therefore proposes no fundamental changes to the consequence classification structure, apart from some small wording changes in line with best practice and the two changes proposed by Moubray [11] and MSG-3 [13]. The only other change is that, in line with flow diagram convention, the rectangular question boxes have been replaced with diamond-shaped ones. The resulting decision tree is shown in figure 5.
The documentation of the Failure Mode Classification results is done on the Task Analysis worksheet as shown in figure 11.The results are written into the column headed 'Conseq Type' using the abbreviations H (Hidden Safety and Environmental Consequence), HO (Hidden Operational Consequence), S (Safety and Environmental Consequence), O (Operational Consequence) and NO (Non-operational Consequence).
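The coding scheme can be illustrated with a simplified classifier. The three yes/no questions below are an assumed condensation of a conventional RCM consequence tree, not a transcription of figure 5, and the mapping of a hidden failure without safety or environmental effects to HO is likewise an assumption, since the worksheet abbreviations provide no separate hidden non-operational code.

```python
from enum import Enum


class Consequence(Enum):
    H = "Hidden Safety and Environmental Consequence"
    HO = "Hidden Operational Consequence"
    S = "Safety and Environmental Consequence"
    O = "Operational Consequence"
    NO = "Non-operational Consequence"


def classify(evident: bool, safety_or_environment: bool, operational: bool) -> Consequence:
    """Assign a 'Conseq Type' code from three simplified yes/no answers.

    Assumed questions: is the failure evident to the operators under normal
    conditions; does it (or, if hidden, the resulting multiple failure) have
    safety or environmental effects; does it affect operational capability?
    """
    if not evident:
        # Assumption: hidden failures without safety/environmental effects are coded HO.
        return Consequence.H if safety_or_environment else Consequence.HO
    if safety_or_environment:
        return Consequence.S
    if operational:
        return Consequence.O
    return Consequence.NO


print(classify(evident=True, safety_or_environment=False, operational=True).name)  # O
```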

Task Selection Process
For most users and authors of texts on RCM, the principles in the task selection tree of Nowlan and Heap [14] still hold today. These principles are, firstly, conservatism in the order in which tasks are selected and, secondly, that the decision process is truncated once a valid task is found.
Gits [4], Smith [17] and MSG-3 [13] all challenge the truncation principle. Smith applies task selection without truncation to all failure modes; MSG-3 truncates for economic consequences but not for safety consequences; and Gits practises a mixture of the two based on four qualities ('hiddenness', seriousness, shape of the F.O.M. (force of mortality) and the possibility of detecting a failure).
The questions that now present themselves are whether the principle of truncation should remain, and whether the answer should apply to all consequence categories. To the first question, one can categorically answer no. Because of the prioritisation process that precedes this analysis step, only important failure modes (those with a high impact) are handled in this step. For such failure modes it is obviously beneficial to consider all relevant maintenance options and then choose the best task or combination of tasks. As far as the second question is concerned, it makes sense to apply such a rigorous approach only to the more important failure consequence categories.
Many RCM texts retain the truncation principle across the board because they provide no suitable mechanism for prioritising failure possibilities (MSG-3 included), and thus have to limit the number of task selection steps to contain the scope of the RCM analysis. This approach does not make sense. The suggested principle of first selecting the "1%" most critical failure modes allows one to look into all possible task options and/or task combinations when deciding on the best maintenance strategy. One thus makes certain that the most important failure modes are recognised, and then spends enough time on their analysis. As the prioritisation 'funnel' is progressively widened, the less rigorous approach (truncation) can be used more extensively. For this reason, the conservatism in the task selection tree should be retained.
MSG-3 [13] added a non-truncated lubrication/servicing task at the top of the task ladder. The original version of RCM (Nowlan and Heap [14]) regarded this as a task to be added after the RCM analysis has been completed. However, as it is important to design the best combination of maintenance strategies, this task and its role should be considered together with the standard RCM task train. The same argument holds for the suggestion by Harris [6] that a Non-Maintenance Improvement task be added to the top of the task train (this time with truncation, if applicable). This Non-Maintenance Improvement task really implies a quality improvement action (of either a maintenance or an operational nature), and is thus named a Quality Improvement (QI) task. The two resulting standard tree structures are shown in figures 6 (rigorous tree without truncation) and 7 (tree with truncation). They are named 'Task Decision Tree 1' for the tree without truncation and 'Task Decision Tree 2' for the tree with truncation.
Wording is based on best practice. The scope of the servicing task is extended by the inclusion of adjustment.
When using these two decision trees for task selection in the case of hidden consequences, the failure finding task should be added after the lubrication task, resulting in the trees named 'RCM Task Decision Tree 1h' and 'RCM Task Decision Tree 2h'.These are shown in figures 8 and 9 respectively.
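The difference between the trees with and without truncation can be made concrete with a short sketch. It assumes a top-to-bottom task order of QI, LSA, (FF for hidden consequences), OC, Rec and Rep, inferred from the description above rather than read off figures 6 to 9, and it assumes the analyst supplies a yes/no judgement of whether each task is applicable and effective. Following MSG-3, the LSA task is treated as non-truncating in the truncated trees as well. All names are hypothetical.

```python
from typing import Callable, List

# ASSUMED top-to-bottom order of the task train (codes from the 'Task Type' column).
BASE_ORDER = ["QI", "LSA", "OC", "Rec", "Rep"]


def task_order(hidden: bool) -> List[str]:
    """Trees 1h and 2h add the failure finding (FF) task directly after LSA."""
    order = list(BASE_ORDER)
    if hidden:
        order.insert(order.index("LSA") + 1, "FF")
    return order


def select_tasks(is_valid: Callable[[str], bool], hidden: bool,
                 truncate: bool) -> List[str]:
    """Walk the task train and collect every valid task found.

    truncate=False mimics Tree 1/1h: every option is examined, and the best
    task combination is chosen afterwards from the returned list.
    truncate=True mimics Tree 2/2h: the walk stops at the first valid task,
    except that a valid LSA task never stops it (non-truncated, as in MSG-3).
    """
    valid_tasks = []
    for task in task_order(hidden):
        if is_valid(task):
            valid_tasks.append(task)
            if truncate and task != "LSA":
                break
    return valid_tasks


# Example: the analyst judges LSA, OC and Rep to be valid for a failure mode.
judged_valid = {"LSA", "OC", "Rep"}
print(select_tasks(lambda t: t in judged_valid, hidden=False, truncate=False))  # ['LSA', 'OC', 'Rep']
print(select_tasks(lambda t: t in judged_valid, hidden=False, truncate=True))   # ['LSA', 'OC']
```

The non-truncated walk leaves the analyst with a list to choose from, which is why the worksheet needs a separate task combination step.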

Default tasks
Three factors make a total rework of the default task options necessary:
a. The failure finding task in the case of hidden consequences is no longer a default option, but is amongst the first options considered.
b. The flexibility that the RCM tree now allows in the choice of task selection tree structure for non-safety items requires a more flexible approach to default tasks.
c. The need to challenge the corrective maintenance default outcome of the operational and non-operational consequence categories.
These three factors led to the following three sets of default actions, which are shown in figure 10:
• Hidden (tree 1h) and Safety (tree 1) Consequence categories: following the last step (choosing the best task combination), a check is made whether this task combination produces a solution that is 'both technically and economically feasible to the correct degree'. The wording 'correct degree' again allows flexibility to cope with various circumstances and situations. If the answer is yes, the task combination is used; otherwise design-out is mandatory.
• Hidden Operational (tree 1h), Operational (tree 1) and Non-operational (tree 1) Consequence categories: this is the case where the more conservative approach (without truncation) was chosen.
• Hidden Operational (tree 2h), Operational (tree 2) and Non-operational (tree 2) Consequence categories: this is the case where the less conservative approach (with truncation) was chosen. The default in this case would traditionally have been corrective maintenance, with design-out as an option. This is now modified to include, if deemed necessary, a cost trade-off study comparing the corrective strategy with design-out.
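The two default rules that are spelled out above can be sketched as follows. This is an illustration only, with hypothetical names: the feasibility judgement 'to the correct degree' is left to the analyst and passed in as a flag, the truncated-tree default is assumed to apply only when no valid task was found, and the optional trade-off study is reduced to comparing two cost estimates over the same period. The non-truncated operational case (trees 1h and 1) is not modelled, as its default is defined in figure 10.

```python
from typing import List, Optional


def default_hidden_or_safety(combination: List[str], feasible: bool) -> List[str]:
    """Trees 1h (H) and 1 (S): keep the best task combination if it is
    'both technically and economically feasible to the correct degree';
    otherwise design-out (DO) is mandatory."""
    return list(combination) if feasible else ["DO"]


def default_truncated(selected: List[str],
                      cm_cost: Optional[float] = None,
                      do_cost: Optional[float] = None) -> List[str]:
    """Trees 2h and 2 (HO, O, NO).

    ASSUMPTION: the default is reached only when the truncated walk found no
    valid task. Corrective maintenance (CM) is the traditional default; if a
    cost trade-off study was done (both costs given, same period and
    currency), design-out (DO) is chosen when it is cheaper.
    """
    if selected:
        return list(selected)
    if cm_cost is not None and do_cost is not None and do_cost < cm_cost:
        return ["DO"]
    return ["CM"]


print(default_hidden_or_safety(["OC", "Rep"], feasible=False))    # ['DO']
print(default_truncated([], cm_cost=120_000.0, do_cost=85_000.0))  # ['DO']
```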

Documenting the results
The tasks found are documented on the Task Analysis worksheet as shown in figure 11.
i. The column 'Task Type' is filled in using the abbreviations:
LSA - Lubrication, Servicing or Adjustment Task
FF - Failure Finding Task
QI - Quality Improvement Task
OC - On-Condition Task
Rec - Reconditioning Task
Rep - Replacement Task
CM - Corrective Maintenance Task
DO - Design-out Task
ii. The column headed 'TO' is used to document a cross-reference to the trade-off study, if applicable.
iii. The 'Task' and 'Task Detail' columns are self-explanatory.
One consequence of the task selection process is not obvious at first. Because the process no longer truncates once a valid task has been found, a single failure mode might have a whole list of tasks next to it on the analysis sheet. All of these will be valid tasks, but they will not necessarily all be used. During the last step of the decision tree (figures 6 and 8), the best task combination is chosen from the documented tasks.
The tasks making up this best task combination are then marked with a check mark in the TC (Task Combination check) column before the default part of the RCM decision process is handled. Following the default analysis, the task combination is confirmed by circling the check marks if it survived the 'feasible to the correct degree' question. Otherwise, a further task, which can be a corrective task or a redesign task, is listed with a circle next to it in the TC column to indicate that it was chosen.
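The paper does not prescribe how the 'best' combination is identified. One possible approach, shown here purely as a hypothetical sketch, is to enumerate every non-empty subset of the documented valid tasks and pick the one with the lowest expected annual cost. The cost figures, the effectiveness fractions and the assumption that task effects combine independently are all illustrative inputs, not part of the methodology.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass
class CandidateTask:
    code: str             # 'Task Type' abbreviation, e.g. "OC"
    annual_cost: float    # HYPOTHETICAL estimated cost of doing the task for a year (Rand)
    effectiveness: float  # HYPOTHETICAL fraction (0..1) of the failure cost the task removes


def best_combination(tasks: List[CandidateTask],
                     annual_failure_cost: float) -> Tuple[List[str], float]:
    """Return the task subset with the lowest task cost plus residual failure cost.

    ASSUMPTION: task effects are independent, so each task multiplies the
    remaining failure cost by (1 - effectiveness).
    """
    best_codes: List[str] = []
    best_total = float("inf")
    for size in range(1, len(tasks) + 1):
        for combo in combinations(tasks, size):
            residual = annual_failure_cost
            for task in combo:
                residual *= 1.0 - task.effectiveness
            total = sum(task.annual_cost for task in combo) + residual
            if total < best_total:
                best_codes, best_total = [task.code for task in combo], total
    return best_codes, best_total


valid = [CandidateTask("LSA", 1_000, 0.3),
         CandidateTask("OC", 5_000, 0.8),
         CandidateTask("Rep", 12_000, 0.9)]
codes, total = best_combination(valid, annual_failure_cost=40_000)
print(codes, round(total))  # ['LSA', 'OC'] 11600
```

The returned codes correspond to the entries that would be check-marked in the TC column; the default analysis then decides whether they are circled.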

Task Frequencies
Although it was not within the scope of the present research to investigate better methods for choosing task frequencies, some of the many techniques available for determining task frequencies are shown in table 3. Many engineers, statisticians and operational researchers are doing research to find better ways of determining task frequencies. These efforts are invaluable, as finding the correct task frequencies remains a challenge, given the scarcity of data in the typical maintenance environment.
It is recommended that a company using RCM keep a list of standardised frequencies that are acceptable to the organisation, either for the whole organisation or per workshop. This is in line with the third principle of Nowlan and Heap [14] and the second principle of Gits [4] (paragraph 3.2.9).
The results of the frequency analysis are documented by entering the appropriate standard task frequency symbol in the column headed 'F' of the Task Analysis worksheet (figure 11). At the same time, the trade involved is entered in the column headed 'T'. Possible trade abbreviations are 'B' (Boilermaker), 'E' (Electrician), 'F' (Fitter), 'H' (Helper), 'M' (Millwright), 'R' (Rigger), 'W' (Welder), and so forth.
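Keeping a list of standard frequencies turns the 'F' column into a lookup. The sketch below assumes a hypothetical organisational list of frequency symbols with intervals in days, and rounds a calculated interval down to the nearest standard interval so that a task is never scheduled less often than calculated; both the symbols and the round-down rule are assumptions.

```python
from typing import Dict

# HYPOTHETICAL standard frequency list: 'F' column symbol -> interval in days.
STANDARD_FREQUENCIES: Dict[str, int] = {
    "D": 1, "W": 7, "2W": 14, "M": 30, "3M": 91, "6M": 182, "Y": 365,
}


def standard_frequency(calculated_days: float) -> str:
    """Map a calculated task interval to the longest standard interval not exceeding it."""
    fitting = [(days, symbol) for symbol, days in STANDARD_FREQUENCIES.items()
               if days <= calculated_days]
    if not fitting:
        raise ValueError("interval is shorter than the shortest standard frequency")
    return max(fitting)[1]


print(standard_frequency(40))   # M
print(standard_frequency(400))  # Y
```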
A summary of the task selection process is shown in figure 12.

Compile Maintenance Plan
The principle of task packaging is simple: the individual tasks resulting from the RCM analysis must be put together in logical work packages, such that the work is grouped by: i. plant/system/machine ii. task frequency class iv. trade v. task timing (Kelly [8]).
The proposed RCM sheets (figures 4 and 11) are meant to be used per plant, system or machine, so that the tasks resulting from such an analysis are by definition grouped according to the first grouping above. The RCM Task Analysis worksheet shown in figure 11 includes columns for starting the task packaging process. Some of them have already been discussed, but are included in the following listing for completeness' sake:

Table 3: Listing of task frequency determination techniques
Conservative estimation (Smith [17])
The set-up type for the specific business has to be defined during the actual analysis process, as it differs from business to business. The production indicator, on the other hand, is reasonably standard and can be chosen from the following list (Kelly [8]):
P - Work that can be done during production
O - Opportunistic: minor work, to be done during production stoppages
SD - Major work to be done during shutdowns
The remainder of the task packaging process consists of considering these various classifications and grouping tasks into logical work packages based on the constraints they impose. This process is rather involved and does not lend itself to standardisation. The principles can, however, be set out:
(a) The task packaging process should normally be performed for the unit that was chosen for analysis, to limit the complexity of the packaging process.
(b) The tasks as listed in the Task Analysis worksheet should be sorted to obtain the logical work packages. This is easily done if a spreadsheet was used to document the analysis results. The typical sort order would be P | ST | F | T (production indicator, set-up type, frequency, trade). This differs from the conventional T | F.
(c) Different parts of the work done on a single unit, utilising different trades, often come from different operating units of the business (typically workshops) or even from outside contractors. This does not pose insurmountable difficulties, but it has implications for task packaging in terms of achieving an acceptable level of task co-ordination.
(d) The suggestion that the business keep a list of standardised frequencies acceptable to the organisation, either for the whole organisation or per workshop, embodies another important principle. Occasionally these standards may prove unacceptable, but then it is probably time to approach higher management with a request to register another standard task frequency.
(e) Task intervals that affect the production process should be spaced as far apart as possible (Nowlan and Heap [14], principles 1 and 2).
(f) Excessively large work packages have a major impact on maintenance resources. Whenever large, low-frequency work packages occur, an attempt should be made to spread this workload amongst smaller, higher-frequency work packages, so that the workload is equalised without jeopardising the end result (Nowlan and Heap [14], principle 4; Gits [4], principle 3).
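The mechanical part of principle (b), sorting by P | ST | F | T and grouping identical keys into candidate work packages, can be sketched as below. The grouping only produces candidates: the remaining principles still have to be applied by judgement, since the process does not lend itself to standardisation. The set-up type values and task descriptions are hypothetical, and frequencies are assumed to be expressed in days.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List, Tuple

# Production indicators (Kelly [8]), in the order used for sorting.
PRODUCTION_ORDER = {"P": 0, "O": 1, "SD": 2}


@dataclass
class WorksheetTask:
    description: str
    production: str   # 'P', 'O' or 'SD'
    setup_type: str   # business-specific set-up type (HYPOTHETICAL codes)
    freq_days: int    # standard task frequency, expressed in days
    trade: str        # 'T' column: 'B', 'E', 'F', 'H', 'M', 'R', 'W', ...


PackageKey = Tuple[str, str, int, str]


def work_packages(tasks: List[WorksheetTask]) -> Dict[PackageKey, List[str]]:
    """Sort tasks by P | ST | F | T and group equal keys into candidate packages."""
    def sort_key(t: WorksheetTask):
        return (PRODUCTION_ORDER[t.production], t.setup_type, t.freq_days, t.trade)

    packages: Dict[PackageKey, List[str]] = {}
    for _, group in groupby(sorted(tasks, key=sort_key), key=sort_key):
        members = list(group)
        first = members[0]
        key = (first.production, first.setup_type, first.freq_days, first.trade)
        packages[key] = [t.description for t in members]
    return packages


tasks = [
    WorksheetTask("Replace chute liner", "SD", "isolated", 365, "B"),
    WorksheetTask("Grease conveyor bearings", "P", "running", 7, "F"),
    WorksheetTask("Check belt tension", "P", "running", 7, "F"),
    WorksheetTask("Inspect motor terminals", "O", "isolated", 30, "E"),
]
for key, descriptions in work_packages(tasks).items():
    print(key, descriptions)
```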

Implement Maintenance Plan
The RCM process itself was redeveloped in some detail from paragraph 4.1 onwards. Although this was done fairly completely, some questions remain regarding the coherence of the total model, practical implementation issues and the integration of the methodology into the organisation.
One of the major contributions of paragraph 4.1 lies in limiting the number of failure modes to which the RCM task selection process is applied. This was done through the 'funnelling' concept (figure 3), which resulted in only "1 %" of the total number of failure modes being addressed during the first RCM analysis. However, this raises other potential problems. What happens to the remainder of the equipment? What if some of the equipment outside the 'funnel' has important statutory or safety implications? While RCM is used to design an excellent plan for "1 %" of the failure modes, something else still needs to be in place for the other "99 %".
The solution to this problem is to see RCM as a technique for optimising the most critical parts of the maintenance plan. This ensures that the old plan is not discarded immediately, which would cause a high level of instability, but is gradually improved by the use of the RCM methodology. The principle is depicted in figure 12. The figure is another way of presenting the 'funnelling' process of figure 3, but with the added feature that the other "99 %" is also shown. It shows that the "1 %" is analysed using the RCM methodology, while the other "99 %" is analysed using conventional methodologies. These could include the Business Centred Maintenance method (Kelly [8]), the equipment manufacturer's recommendations, statutory requirements, NOSA standards and HAZOP studies. The figure adds one further feature: the time deployment of the technique. It shows that the idea is to progressively widen the 'funnel' to further optimise the maintenance mix of the organisation.
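The division of the plan into an RCM part and a conventional part can be illustrated with a ranking sketch. It assumes that each failure mode already carries a numeric risk score from the prioritisation step (for example, a risk factor of the kind set out in table 2) and simply sends the top fraction to RCM; the function name and the default fraction of 1 % are illustrative.

```python
import math
from typing import Dict, List, Tuple


def split_plan(risk_by_mode: Dict[str, float],
               rcm_fraction: float = 0.01) -> Tuple[List[str], List[str]]:
    """Return (failure modes for RCM analysis, failure modes for conventional methods)."""
    ranked = sorted(risk_by_mode, key=risk_by_mode.get, reverse=True)
    if not ranked:
        return [], []
    # At least one failure mode is always analysed with RCM.
    n_rcm = max(1, math.ceil(rcm_fraction * len(ranked)))
    return ranked[:n_rcm], ranked[n_rcm:]


# Widening the 'funnel' over time then amounts to raising rcm_fraction.
risks = {f"FM{i:03d}": float(i) for i in range(200)}
rcm_part, conventional_part = split_plan(risks)
print(rcm_part, len(conventional_part))  # ['FM199', 'FM198'] 198
```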
The second main topic of this section relates to practical implementation issues: firstly, the continuation of the process after it has been worked through once; and secondly, the continuous improvement process that follows the full implementation of the first RCM maintenance plan. The first issue is addressed using figures 1 and 13 as a basis. This uses the 'funnel' of figure 3 to further describe the process shown as the 'RCM time deployment' in figure 12. It shows that after the initial use of RCM to design a maintenance plan for the "1 %" of failure modes, the bottom-most failure mode 'funnel' is progressively opened until RCM has been applied to all the failure modes of the "4 %" of MSI's. Thereafter the MSI 'funnel' is opened further, again using a failure mode 'funnel' to progressively apply RCM to the 'new' MSI's, and so forth. After all the MSI's have been handled for the "20 %" of equipment, that funnel can be progressively opened. The opening-up of the funnel is, of course, in the hands of the maintenance managers involved, who should stop the process when it no longer makes economic sense. The point at which to stop applying RCM is therefore an economic one: if RCM is over-applied, the marginal cost exceeds the gain achieved; if the process is stopped too early, the full benefit is not obtained.
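The economic stopping rule can be expressed as a marginal comparison. In this sketch, each further widening step of the funnel is assumed to come with an estimated gain and an estimated analysis cost, listed in funnel order (most critical first, so gains are expected to shrink); widening stops at the first step whose gain does not exceed its cost. The estimates and names are hypothetical.

```python
from typing import List


def widening_steps(step_gains: List[float], step_costs: List[float]) -> int:
    """Number of funnel-widening steps worth taking, in funnel order.

    Stops at the first step whose estimated gain does not exceed its
    estimated RCM analysis cost, i.e. where the marginal cost catches up
    with the marginal gain.
    """
    if len(step_gains) != len(step_costs):
        raise ValueError("one cost estimate is needed per gain estimate")
    steps = 0
    for gain, cost in zip(step_gains, step_costs):
        if gain <= cost:
            break
        steps += 1
    return steps


print(widening_steps([500_000, 220_000, 90_000, 30_000],
                     [60_000, 60_000, 60_000, 60_000]))  # 3
```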
The second feature of importance in figure 1 is the feedback loop named 'continuous improvement'. This is what Smith calls the living RCM program (Smith [17]). It indicates that the RCM analysis process never stops, because of the commitment made to continuous improvement of the maintenance plan. Once the RCM implementation project is complete (the 'funnels' have been opened up to their logical maximum) and all resultant tasks have been properly implemented, the Continuous Improvement loop is activated and a continued program of Maintenance Plan Improvement starts. This is normally driven by management activators such as a high ROCOF (rate of occurrence of failures), high cost and catastrophic failure incidents, which indicate that further optimisation of the Maintenance Plan is necessary.
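One of the activators mentioned, a high ROCOF, can be monitored with the simple estimate of failures per unit of operating time over a recent window. The sketch below combines it with the other two activators; the thresholds, the window and the function names are hypothetical, and in practice the failure history would come from the CMMS.

```python
from typing import List


def rocof(failure_times: List[float], start: float, end: float) -> float:
    """Estimate the rate of occurrence of failures in [start, end), per operating hour."""
    if end <= start:
        raise ValueError("the window must have positive length")
    failures = sum(1 for t in failure_times if start <= t < end)
    return failures / (end - start)


def review_needed(failure_times: List[float], start: float, end: float,
                  rocof_limit: float, annual_cost: float, cost_limit: float,
                  catastrophic_incident: bool) -> bool:
    """True if any activator calls for another pass through the improvement loop."""
    return (rocof(failure_times, start, end) > rocof_limit
            or annual_cost > cost_limit
            or catastrophic_incident)


history = [120.0, 950.0, 1_480.0, 2_210.0]  # operating hours at which failures occurred
print(rocof(history, 1_000.0, 3_000.0))  # 0.001
print(review_needed(history, 1_000.0, 3_000.0, rocof_limit=0.0005,
                    annual_cost=80_000.0, cost_limit=150_000.0,
                    catastrophic_incident=False))  # True
```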

Results achieved
The principles embodied in the improved RCM methodology described above proved themselves in application on a typical industrial system. The resulting maintenance plan was a marked improvement on the result of an analysis using 'classical' RCM. The main reasons for this improvement are deemed to be the following:

• The 'funnelling' approach: this is a major contribution of the present research to the knowledge base of the RCM methodology. It ensures that the RCM effort is concentrated on the most important failure modes of the organisation.

• The principle of progressive application (widening of the 'funnel'). This is a logical progression from the 'funnelling' approach, as it makes sense to build on the initial benefit obtained from RCM analyses.

• The second major contribution of this research lies in the inclusion of the Quality Improvement task in the task selection tree. When the analysis technique was applied, this task identified no fewer than four opportunities for improving operating and maintenance procedures to prevent failures proactively. The idea is not original to this thesis, as Harris [6] identified the need, but it will greatly increase the effectiveness and relevance of RCM analyses.

• Another significant contribution lies in the area of task packaging, which has been neglected by all authors except Gits [3] and has never been properly addressed in any RCM text. Gits [3] developed an elaborate, but virtually unintelligible, scheme for this purpose; nevertheless, his thoughts were used very profitably in developing the proposed task packaging procedure.

• The combination of the RCM maintenance plan (for '1%' of the failure modes) with the more intuitive conventional plan for the remaining 99% of failure modes, to achieve the best maintenance plan for the organisation.

• A very important contribution follows from the application of sound management principles in the implementation of RCM. These include understanding the position of RCM in the organisational context, proper structuring of RCM training, use of mechanisms such as a Steering Committee, a management champion, a well-trained facilitator and proper review of the resultant plan. Good failure information support will also be ensured if the requirements of RCM regarding failure data are incorporated into the company's CMMS database.

• The most important contribution of the present thesis to the RCM methodology probably lies in blending concepts from different RCM authors and from related techniques, together with the innovations listed above, into one logical whole.
RCM is a core methodology for ensuring that the organisation can achieve World Class results from its production equipment. The proposed new RCM approach (methodology) can play a very important part in achieving this goal. In particular, it will make a major contribution to ensuring that the organisation's maintenance effort is as proactive as possible.

Recommendations
In the approach and research of this thesis, the premise was to do a total study of the RCM methodology and, as far as possible, to propose a methodology without any inadequacies. This was largely achieved.
It was not within the scope of this thesis to investigate better methods for the choice of task frequencies; this is the field of Operations Researchers. Although much work is done in this area, it is often done from a theoretical angle, without any consideration of maintenance practicalities. One of the greatest concerns is that the maintenance function often has to work with a scarcity of data, while operations researchers tend to assume that ample data are available for the application of their models. Operations Researchers also tend to think simplistically or to make as many simplifying assumptions as possible, which does not serve the practical maintenance purpose. A wide range of decision-making models is needed to ensure optimal RCM task and frequency decisions. Refer to table 3 for a listing of the required models.
It is necessary that Computerised Maintenance Management Systems start adding RCM facilities to their functionality and especially to their databases.There are a number of RCM computer packages, but they are all standalone systems with little or no interfacing facilities with CMMS's.These systems also do not address the full complexity of the RCM process as set forth in this thesis.

Table 2: Risk factor calculation
Apart from $P_r$ (production rate in units/hour) and $G$ (gain in Rand/unit), all of the quantities in table 2 refer to the specific failure mode, i.e.:
• $L_{ts}$ = production time lost during a safety incident (hours)
• $L_{te}$ = production time lost during an environmental incident (hours)
• $L_{cs}$ = capital loss during a safety incident (Rand)
• $L_{ce}$ = capital loss during an environmental incident (Rand)
• $C_f$ = cost of repairing the failure (spares + manpower)
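Table 2 itself is not reproduced here, but the units of the quantities defined above already fix how lost production time is converted into money: hours multiplied by units/hour multiplied by Rand/unit gives Rand. The sketch below uses that conversion to total the monetary consequence of a single incident of a failure mode. It does not assume how table 2 combines these terms (or weights them by probability or frequency) into the risk factor; the function and argument names are hypothetical.

```python
def incident_cost(production_rate: float, gain: float, hours_lost: float,
                  capital_loss: float = 0.0, repair_cost: float = 0.0) -> float:
    """Monetary consequence of one incident, in Rand (illustrative, not table 2).

    production_rate: units/hour (P_r in the definitions above)
    gain: Rand/unit (G)
    hours_lost: production time lost during the incident (hours)
    capital_loss: capital loss during the incident (Rand)
    repair_cost: cost of repairing the failure, spares + manpower (C_f)
    """
    lost_gain = hours_lost * production_rate * gain  # hours x units/hour x Rand/unit
    return lost_gain + capital_loss + repair_cost


# A safety incident that stops a 50 unit/hour line (R20/unit) for 4 hours.
print(incident_cost(50, 20, 4, capital_loss=10_000, repair_cost=2_500))  # 16500
```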