ADVANCES IN RENEWAL DECISION-MAKING UTILISING THE PROPORTIONAL HAZARDS MODEL WITH VIBRATION COVARIATES

Increased competitiveness in the production world necessitates improved maintenance strategies to increase availability and drive down cost. The maintenance engineer is thus faced with the need to make more intelligent preventive renewal decisions. Two of the main techniques to achieve this are Condition Monitoring (such as vibration monitoring and oil analysis) and Statistical Failure Analysis (typically using probabilistic techniques). The present paper discusses these techniques, their uses and weaknesses, and then presents the Proportional Hazards Model as a solution to most of these weaknesses. It then goes on to compare the results of the different techniques in monetary terms, using a South African case study. This comparison shows clearly that the Proportional Hazards Model is superior to the present techniques and should be the preferred model for many actual maintenance situations.

OPSOMMING (SUMMARY)
Increased levels of competition in the production environment necessitate improved maintenance strategies to increase equipment availability and minimise cost. Maintenance engineers consequently have to make more intelligent preventive renewal decisions. Two prominent techniques to reach this goal are Condition Monitoring (such as vibration monitoring or oil analysis) and Statistical Failure Analysis (usually by means of probabilistic methods). In this article we consider both of these techniques, their uses and shortcomings, and then propose the Proportional Hazards Model as a solution to most of the shortcomings. The article also compares the different techniques in monetary terms by making use of a South African case study. This comparison shows clearly that the Proportional Hazards Model holds greater promise than the current techniques and that it should be the preferred solution in many real maintenance situations.

http://sajie.journals.ac.za


Introduction
Often in practice, total replacement or complete overhaul to the as-good-as-new condition is the only feasible maintenance action after the failure of a component or sub-assembly. This is the only maintenance alternative considered in this paper. After such total replacement or complete overhaul, an item is said to be renewed.
Renewal of items that are bound to fail can be performed according to one of two strategies: (1) After (unexpected) component failure, in an unplanned corrective manner; or (2) Before component failure, in a planned preventive manner. The renewal action can also be totally eliminated by redesign of the component, but this approach is not always technically and economically feasible.
The different approaches to renewal mostly lead to different levels of cost. A high corrective cost, $C_f$, is often experienced in the case of unexpected failure renewals because it usually results in unplanned production losses and costly unscheduled maintenance actions. The cost of preventive renewal, $C_p$, is in most cases lower than $C_f$, because of the more controlled nature of this approach.
For cases where $C_f \gg C_p$ and the risk of the item to fail is increasing, preventive renewal is very often the most economic strategy and it is left to the maintenance engineer to decide when preventive renewals should take place. Three methods to assist in this decision are considered in this paper: (1) Vibration monitoring* (assuming that the item under discussion lends itself to vibration analysis); (2) Conventional probabilistic failure analysis (such as Weibull analysis); and (3) Proportional Hazards Modelling (PHM) (originally proposed by Cox [6]) with vibration parameters as covariates. These techniques are discussed in this paper with specific reference to practices in South Africa. A data set obtained from a typical South African industry is also used to compare the different strategies.

Renewal Decisions Based on Vibration Monitoring
By analysing the vibration characteristics of a component, an enormous amount of information about the component's condition becomes available. This fact has been proved over and over in the past and it has encouraged researchers to develop the theory of vibration analysis up to a very high technological level. However, very little of this advanced theory and high level technology is found in the typical South African industrial organisation. Often only the most basic vibration techniques are used in condition monitoring programs.
An overview of techniques often used in preventive renewal based on vibration monitoring in South Africa is presented in this section. The term "vibration monitoring" as used here refers to the typical vibration monitoring practices found during this study and not necessarily to the total complex field.

Methodology
The methodology used in the practice of vibration monitoring principally comprises three main steps:
1. Identification of vibration analysis techniques which will warn the vibration analyst of an impending failure.
2. Setting appropriate benchmarks for the identified vibration parameters and deciding on an inspection frequency.
3. Monitoring the vibration levels on a regular basis and renewing the component if the specified benchmark levels are exceeded.

* Other Condition Monitoring techniques (such as oil analysis) could of course be used as well, but this paper specifically addresses the issue of vibration monitoring.

Steps 1 and 2 are considered to be the essence of vibration monitoring and will be considered in the paragraphs to follow.
Step 3 is more of a managerial concern than a vibration monitoring task and is not discussed in this paper.

Vibration Signal Analysis Techniques
With vibration signal analysis software readily available, it is seldom required to have a thorough knowledge of the mathematics involved in the analysis procedures. It is however important for any engineer involved in vibration analysis to interpret analyses correctly. The following discussion thus focuses on the typical vibration signal analyses most often performed in South Africa as well as their correct physical interpretation, without emphasising mathematical calculations.
Vibration analysis techniques can be divided into two distinct categories, which are discussed below.

Time Domain Analysis Techniques
Techniques of this type are mostly favored because of their mathematical simplicity. These techniques are:
• Peak signal values: This is a very good first line analysis to distinguish between acceptable and unsatisfactory conditions, especially where essentially harmonic motion is considered. See Broch [4].
• Root mean square (RMS): The RMS vibration level is used widely for general monitoring purposes in industry. Lui et al. [17] published a good case study on RMS. It has the disadvantages that it (1) does not show appreciable changes in the early stages of bearing failure; and (2) cannot detect small gear defects effectively. The RMS vibration level of a signal $x(t)$ over a period $T$ is calculated by:

$$x_{RMS} = \sqrt{\frac{1}{T}\int_0^T x^2(t)\,dt} \qquad (1)$$

• High frequency detection (HFD): This technique is predominantly used to detect early stages of bearing defects by trending the peak or RMS levels of the high frequency response of a machine. Kadushin [14] used HFD very successfully.
• Crest factors: The Crest Factor (CF) of a vibration signal is the ratio of the peak level to the RMS level and is a measure of impulsiveness in the signal. Norton [21] explains this concept clearly. Damaged bearings often have CF levels exceeding 3.5.
• Probability Density Functions (PDF) of acceleration signals: Vibration accelerations recorded from a "good" bearing will have a Gaussian distributed PDF. The amount by which the measured PDF deviates from the Gaussian PDF is indicative of a bearing's condition. See Tandon and Nakra [26] for a comprehensive explanation.
• Kurtosis: Kurtosis is the normalised fourth statistical moment of a distribution and is a measure of how heavy the distribution's tails are. The Kurtosis value for a Gaussian distribution is 3. Kurtosis values higher than 3 indicate heavier tails, i.e. a more impulsive signal, which is often the first sign of a bearing defect according to Heyns [10]. It is calculated by:

$$\kappa = \frac{1}{\sigma^4}\int_{-\infty}^{\infty}(x - \bar{x})^4\, p(x)\,dx \qquad (2)$$

with $\sigma$ being the standard deviation, $\bar{x}$ the mean and $p(x)$ the PDF of the signal.
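The time domain indicators listed above are simple to compute from a sampled signal. The following is a minimal sketch, assuming a uniformly sampled record held in a plain Python list; the function name `time_domain_features` and the test signal are illustrative and do not come from the paper.

```python
import math


def time_domain_features(x):
    """Return peak, RMS, crest factor and kurtosis of a sampled signal x."""
    n = len(x)
    mean = sum(x) / n
    peak = max(abs(v) for v in x)
    # Discrete counterpart of the RMS definition: mean of x^2, then square root.
    rms = math.sqrt(sum(v * v for v in x) / n)
    variance = sum((v - mean) ** 2 for v in x) / n
    # Normalised fourth central moment (equals 3 for Gaussian data).
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * variance ** 2)
    return {
        "peak": peak,
        "rms": rms,
        "crest_factor": peak / rms,
        "kurtosis": kurtosis,
    }


# Illustrative signal: a unit-amplitude 50 Hz sine sampled at 1 kHz for 1 s.
fs = 1000
sine = [math.sin(2.0 * math.pi * 50.0 * k / fs) for k in range(fs)]
features = time_domain_features(sine)
print(features)
```

For a pure sine the crest factor is about 1.41 and the kurtosis about 1.5, both well below the levels of 3.5 and 3 quoted above, which is what one would expect from a smooth, non-impulsive signal.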

Frequency Domain Analysis Techniques
Spectral analysis requires considerably more computational resources than time domain analysis, primarily because the Fast Fourier Transform (FFT) of time signals has to be calculated. Frequency domain techniques often encountered are:
• Power Spectral Density analysis (PSD): Spectral analysis is the technique used most often in vibration monitoring programs in South Africa because of its versatility. Almost any mechanical defect in rotating machinery can be identified by this type of analysis, as was shown by Drouiche et al. [8]. Spectral analysis is also the backbone of waterfall plots, which are often used for decision making purposes.
• Cepstrum analysis: Cepstrum analysis is in essence the spectrum of a logarithmic spectrum and it is used to identify periodicity in the frequency spectrum. It is very useful for echo detection and for the measurement of properties of reflecting surfaces.
• High Frequency Resonance Technique (HFRT): This is a very complex envelope detection type of technique which is often used to detect outer race defects on bearings, although it is not encountered frequently. See Tandon and Nakra [26] for more information.
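As an illustration of the first two frequency domain techniques, the sketch below estimates a one-sided PSD with a windowed periodogram and computes a real cepstrum as the inverse FFT of the log magnitude spectrum. It assumes NumPy is available; the function names, the Hann window and the synthetic signals are choices made here for illustration, not the procedure used in the paper.

```python
import numpy as np


def power_spectrum(x, fs):
    """One-sided periodogram PSD estimate of x sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    window = np.hanning(n)
    spectrum = np.fft.rfft(x * window)
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(window ** 2))
    # Fold the negative frequencies into the positive half.
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd


def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    x = np.asarray(x, dtype=float)
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)  # offset avoids log(0)
    return np.fft.ifft(log_mag).real


# Synthetic example 1: two harmonics, the stronger one at 120 Hz.
fs = 2048
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 300 * t)
freqs, psd = power_spectrum(x, fs)
dominant = freqs[np.argmax(psd)]

# Synthetic example 2: white noise plus an echo delayed by 100 samples.
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)
echoed = source + 0.5 * np.roll(source, 100)
ceps = real_cepstrum(echoed)
echo_delay = int(np.argmax(ceps[1:2048])) + 1
print(dominant, echo_delay)
```

The second example shows why the cepstrum is useful for echo detection: a delayed copy of the signal produces a ripple in the log spectrum, which appears as a peak at the delay (quefrency) in the cepstrum.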

Other
Several other techniques which can be considered hybrids between time domain and frequency domain techniques are also at the vibration analyst's disposal. The Shock Pulse Method (SPM) is such a technique. This technique is used to identify bearing defects by tuning the transducer's resonant frequency to the expected shock-frequency caused by the damaged bearing.
Other techniques which can be of enormous value are neural networks, self organising maps, fuzzy logic, time series analysis, coherence, frequency band energy methods, trending, correlation, envelope spectra analysis, propagation path identification using causality correlation techniques, frequency response functions and the recovery of temporal waveforms of source signals. These techniques are very rarely found in South African practice.

Typical Vibration Analysis Decision Making Techniques
The success of vibration monitoring, as a condition monitoring maintenance strategy, is almost entirely dependent on the appropriateness of benchmarks laid upon measured vibration parameters, since the benchmarks define renewal rules. These benchmarks should be specified in such a manner that the probability of an unexpected failure is very low without wasting useful remaining life of the component.
Original equipment manufacturers (OEMs) often give guidelines regarding safe vibration levels for their equipment, usually by means of vibration severity charts. These guidelines are more often than not very conservative to increase product turnover. Benchmark setting based on experience is the most reliable method to achieve optimal benchmarks. This can be a very expensive exercise however, because some failures will have to take place to gain experience.

Shortcomings of Vibration Monitoring Decision Techniques
The theory of vibration analysis is well developed and can only be advantageous to the maintenance engineer if implemented correctly. A number of general shortcomings of vibration monitoring as practised in the South African industry were identified by Vlok et al.

Lack of Comparative Means between Current Vibration Condition and Past Vibration Behavior
Very often only short term changes in vibration levels are considered when assessing component reliability, i.e. only the vibrations measured during a specific component's lifetime are used to predict useful remaining life. This is usually done with the aid of waterfall plots where different vibration levels are presented in a user-friendly, graphical manner such that it is easy to recognise trends in vibration behavior.
No verified or established means exist to consider long term vibration behavior in reliability estimations. Long term vibration behavior refers to vibration histories recorded from similar items that have failed under equivalent conditions in the past. Long term vibration behavior of items certainly holds extremely valuable information in terms of which current item reliability can be assessed, since vibration conditions during an item's life tend to repeat themselves in different components.

Significance of Vibration Parameters
Numerous vibration parameters or characteristics are usually measured and evaluated when monitoring the condition of a component, as discussed above. Only in very few cases are all of the measured parameters significant in the failure process, and renewal decisions are often based on the level of a parameter that is totally insignificant in the failure process.

Calculation of Optimal Renewal Instant
Vibration monitoring is definitely not perfect as a predictive preventive maintenance strategy. A perfect predictive preventive maintenance strategy would be able to determine the exact length of an item's remaining life. No such method exists. Unexpected failures of items still occur even though vibration levels are monitored, and unexpected failures are normally very expensive relative to preventive replacements. Yet renewal decisions based on vibration monitoring do not take into account the risk of an expensive unexpected failure or the possibility of losing useful remaining life through premature renewal.

Lack of Means to Determine Vibration Monitoring Efficiency
It is very difficult to determine the true efficiency of vibration monitoring because it is impossible to accurately predict the useful remaining life of a component which has been renewed preventively. In many cases expensive vibration monitoring programs are used to prevent unexpected failures while the cost of renewal with vibration monitoring is not much less than the cost of a run to failure strategy.

Lack of Commitment Towards Vibration Monitoring
Vlok [27] found that there is a general lack of commitment towards vibration monitoring in industry. In many cases expensive vibration monitoring equipment is used as the flagship of the maintenance department although inspections are done very irregularly and are not recorded properly. Often the information supplied by vibration monitoring is totally disregarded when a decision has to be made and experience or intuition is relied upon. Even when the vibration information is considered, the final decision is frequently left to the discretion of the vibration technician.
No matter how technologically advanced vibration monitoring is, meaningful results are impossible to obtain if it is not practised correctly.

Methodology
The probabilistic failure analysis approach strives to determine the long term minimum Life Cycle Cost (LCC) of a component by making use of a continuous statistical distribution that represents the renewal process within acceptable confidence limits. The renewal rule is to renew a particular item preventively at the instant¹ where the minimum LCC occurs or at failure, whichever comes first. The renewal rule balances the risk of having to spend $C_f$ against the advantage in the cost difference between $C_f$ and $C_p$, without wasting useful remaining life of the item.

Statistical Model
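The statistical distribution used most extensively to model maintenance failure data is the Weibull distribution because of its enormous flexibility. Its probability density function is given by (see Hastings [9]):

$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \qquad (3)$$

The survival function corresponding to (3) is determined by:

$$R(t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \qquad (4)$$

In (3), $t$ is the age of the item, $\beta$ is the shape parameter and $\eta$ is the scale parameter. The parameters are estimated by maximising the likelihood

$$L(\beta, \eta) = \prod_{i=1}^{n} f(T_i)^{c_i}\, R(T_i)^{1-c_i} \qquad (5)$$

for a data set with $n$ renewals each at time $T_i$, with $c_i = 1$ in case of failure and $c_i = 0$ for observed preventive renewals (censored observations). See Crowder et al. [7] for a detailed discussion on censored observations. The goodness-of-fit of such a fitted distribution can be evaluated with the $\chi^2$-test as explained by Hines and Montgomery [11].

¹ Time will be the only use parameter referred to when describing an item's age in this paper, although any other use parameter can be used, e.g. tonnes processed, mileage, etc.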
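To make the fitting step concrete, the sketch below finds the maximum likelihood Weibull parameters for a data set with censored observations. It relies on the standard result that, for a fixed shape parameter, the scale parameter has a closed form, so that only a one-dimensional equation in the shape parameter has to be solved (here by bisection). The renewal times, the function names and the search bracket are hypothetical and are not taken from the case study.

```python
import math


def weibull_log_likelihood(beta, eta, times, failed):
    """Log of the censored Weibull likelihood: failures contribute ln f,
    preventive renewals (censored observations) contribute ln R."""
    ll = 0.0
    for t, c in zip(times, failed):
        if c:
            ll += math.log(beta / eta) + (beta - 1.0) * math.log(t / eta)
        ll -= (t / eta) ** beta
    return ll


def fit_weibull(times, failed, lo=0.05, hi=20.0, tol=1e-10):
    """Maximum likelihood (beta, eta); the bracket [lo, hi] for beta is an
    assumption and must contain the solution."""
    r = sum(failed)
    if r == 0:
        raise ValueError("at least one failure is required")
    t_max = max(times)
    mean_log_fail = sum(c * math.log(t) for t, c in zip(times, failed)) / r

    def score(beta):
        # Profile score in beta. Times are scaled by t_max to avoid
        # overflow; the scale factor cancels in the ratio.
        w = [(t / t_max) ** beta for t in times]
        weighted = sum(wi * math.log(t) for wi, t in zip(w, times)) / sum(w)
        return weighted - 1.0 / beta - mean_log_fail

    # score() increases with beta, so plain bisection is sufficient.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = t_max * (sum((t / t_max) ** beta for t in times) / r) ** (1.0 / beta)
    return beta, eta


# Hypothetical renewal times in days; 1 = failure, 0 = preventive renewal.
times = [120.0, 340.0, 210.0, 400.0, 95.0, 280.0, 365.0, 150.0]
failed = [1, 1, 1, 0, 1, 0, 1, 1]
beta_hat, eta_hat = fit_weibull(times, failed)
print(beta_hat, eta_hat)
```

A general purpose optimiser applied to the log-likelihood would serve equally well; bisection is used here only to keep the sketch free of dependencies.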
From the fitted distribution the hazard rate of the specific item can be calculated. The hazard rate represents the instantaneous rate of failure of an item and is an indication of the item's risk to fail at a given time. It is calculated by:

$$h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \qquad (6)$$

The hazard rate is the most important function in probabilistic renewal theory because it gives guidelines regarding the choice of the most appropriate maintenance strategy. For a monotonic increasing hazard rate (increasing risk of failure with age), preventive maintenance would be a definite option although the costs involved would have the final say. The maintenance on an item with a constant or decreasing hazard rate would probably be dealt with most effectively using a corrective maintenance strategy. Condition monitoring or redesign is always a possibility however, regardless of the shape of the hazard rate.
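The link between the Weibull shape parameter and the trend of the hazard rate can be sketched in a few lines. The parameter values below are hypothetical and the helper names are introduced here only for illustration.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate; t must be positive when beta < 1."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def hazard_trend(beta):
    """Classify the hazard shape from the Weibull shape parameter."""
    if beta > 1.0:
        return "increasing"
    if beta < 1.0:
        return "decreasing"
    return "constant"


# Hypothetical parameters, not the values fitted in the case study.
for beta in (0.8, 1.0, 2.5):
    rates = [weibull_hazard(t, beta, 400.0) for t in (100.0, 200.0, 300.0)]
    print(beta, hazard_trend(beta), [round(r, 5) for r in rates])
```

Only the increasing case makes age based preventive renewal worth considering, in line with the discussion above.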

Calculation of the LCC per Unit Time
With the renewal process modeled by the Weibull (or some other appropriate) distribution it is possible to calculate the LCC if renewed preventively at any time $t_p$, provided that the costs involved, $C_f$ and $C_p$, are known. The value of $t_p$ which results in the lowest LCC per unit time defines the renewal rule, as was explained in the methodology of this approach.
The expected total LCC of an item, $C_t$, if renewed at $t_p$ or at failure during every cycle, is calculated by (see Coetzee [5]):

$$C_t = C_p R(t_p) + C_f\left[1 - R(t_p)\right] \qquad (7)$$

It is more convenient to express the LCC per unit time, therefore the expected cycle length, $L_e$, has to be calculated as well:

$$L_e = (t_p + T_p)\,R(t_p) + \int_0^{t_p} t f(t)\,dt + T_f\left[1 - R(t_p)\right] \qquad (8)$$

with $T_p$ and $T_f$ being the time needed for preventive and failure renewals respectively. The total LCC per unit time if renewed at $t_p$ is thus given by $C(t_p) = C_t / L_e$. The minimum value of $C(t_p)$ defines the renewal rule and is determined by graphical inspection or differentiation.
The LCC per unit time will not always have a distinct minimum. A distinct minimum is dependent on the shape of the hazard rate of the item and the ratio between $C_f$ and $C_p$. For a clear minimum, the hazard rate should be increasing ($\beta > 1$) and $C_f \gg C_p$.
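The search for the renewal age with the lowest LCC per unit time can be sketched numerically. The code below evaluates the expected cost per cycle divided by the expected cycle length on a grid of candidate ages, using the identity that the expected operating time per cycle equals the integral of the survival function from 0 to $t_p$. The Weibull parameters and the grid are hypothetical, the renewal durations are assumed to be zero, and only the two cost figures are the scaled costs quoted in the case study.

```python
import math


def survival(t, beta, eta):
    """Weibull survival function R(t)."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(t_p, beta, eta, c_f, c_p, dur_p=0.0, dur_f=0.0, steps=2000):
    """Expected LCC per unit time for preventive renewal at age t_p.

    dur_p and dur_f are the renewal durations (T_p and T_f in the text).
    """
    r_tp = survival(t_p, beta, eta)
    expected_cost = c_p * r_tp + c_f * (1.0 - r_tp)
    # Trapezoidal rule for the integral of R(t) over [0, t_p], which equals
    # t_p * R(t_p) + integral of t f(t) dt (integration by parts).
    h = t_p / steps
    area = 0.5 * (1.0 + r_tp)
    area += sum(survival(k * h, beta, eta) for k in range(1, steps))
    area *= h
    expected_length = area + dur_p * r_tp + dur_f * (1.0 - r_tp)
    return expected_cost / expected_length


def optimal_renewal_age(beta, eta, c_f, c_p, grid):
    """Grid search standing in for 'graphical inspection'."""
    return min(grid, key=lambda t: cost_rate(t, beta, eta, c_f, c_p))


C_F, C_P = 162_200.0, 25_000.0   # scaled costs from the case study
BETA, ETA = 2.5, 400.0           # hypothetical Weibull parameters
grid = list(range(10, 1201, 10))
t_star = optimal_renewal_age(BETA, ETA, C_F, C_P, grid)
print(t_star, round(cost_rate(t_star, BETA, ETA, C_F, C_P), 2))
```

With a constant hazard rate ($\beta = 1$) the same search runs to the end of the grid, which reflects the statement above that a distinct minimum requires an increasing hazard rate.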

Shortcomings of Renewal Decisions based on Probabilistic Failure Analysis
The single biggest shortcoming of probabilistic analyses as described above is the fact that they only use recorded renewal times to model the renewal process and do not take any circumstantial influences (referred to as covariates) on renewal times into account. The circumstances in which an item operates, e.g. pressure, temperature, foreign particles in lubricant, vibration levels, etc., most often have an influence on the eventual renewal time of the item, and it thus makes sense to include these factors in the respective analyses. By doing this, the circumstantial influences are used as supporting evidence to explain the length of renewal times, thereby producing a much more accurate model. Traditional probabilistic models are however not able to do this.

[...] [3]) in modelling the mortality of machines, i.e. failure of machines, [...]

Methodology
The PHM is a regression model that models the hazard rate of an item while including the effects of covariates. (Covariates can be any numerical value describing the circumstances under which an item operates. In this paper we only consider vibration parameters.) Because of this attribute of the model, it is ideal to overcome the mentioned shortcomings of renewal decisions based on vibration analysis and probabilistic failure analysis in that it includes both long term failure and vibration information, as well as the current vibration condition, in the model. By doing this, the PHM produces a much more accurate representation of the actual life situation at any given time.
Renewal decisions based on the PHM with vibration covariates are also based on the minimum LCC, except that the minimum is not defined in terms of time only (as was the case in subsection 3.1) but in terms of the improved proportional hazard rate, which is a function of time and covariates. Thus, an optimal risk level is calculated from recorded data of previous lives of similar items and then applied to an item currently in operation. If the working item's hazard rate, which is a function of time and the covariates measured at that time, exceeds the optimum, the item is renewed.

Statistical Model
The PHM consists of the product of a baseline hazard rate, dependent on time only, and a functional term, dependent on time and covariates.
$$h(t, z) = h_0(t)\,\lambda(\gamma \cdot z) \qquad (9)$$

In (9), $h_0(t)$ is the time-dependent baseline hazard rate, $\lambda(\gamma \cdot z)$ is the functional term, $z$ is a vector containing measured covariate values and $\gamma$ is a regression vector associated with a particular data set. In practice, the exponential form of the functional term is used most often, i.e. $\lambda(\gamma \cdot z) = \exp(\gamma \cdot z)$. The PHM is also used predominantly in its fully Weibull parameterised form for numerical convenience, then (9) becomes:

$$h(t, z(t)) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp\left(\gamma \cdot z(t)\right) \qquad (10)$$

Note that the covariate vector $z(t)$ in (10) is presented as a function of time for the sake of generality and because vibration covariates are generally time-dependent.
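The Weibull parameterised PHM is straightforward to evaluate once the parameters are known. The sketch below assumes the exponential functional term; the parameter values and the two covariate readings are hypothetical and serve only to show the proportionality property.

```python
import math


def phm_hazard(t, z, beta, eta, gamma):
    """Weibull PHM hazard: baseline hazard times exp(gamma . z)."""
    baseline = (beta / eta) * (t / eta) ** (beta - 1.0)
    return baseline * math.exp(sum(g * zi for g, zi in zip(gamma, z)))


# Hypothetical parameters and covariate readings (e.g. two spectral levels).
BETA, ETA = 2.0, 400.0
GAMMA = [0.5, 1.2]
healthy = phm_hazard(200.0, [0.0, 0.0], BETA, ETA, GAMMA)
degraded = phm_hazard(200.0, [1.0, 0.0], BETA, ETA, GAMMA)
print(healthy, degraded, degraded / healthy)
```

At a fixed age the ratio of the two hazard rates depends only on the covariates, here $\exp(0.5)$, which is the sense in which the hazards are proportional.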
The PHM as shown in (10) can be fitted to a data set by maximising the following expression for the likelihood:

$$L(\beta, \eta, \gamma) = \prod_{i} h\left(T_i, z(T_i)\right) \prod_{j} \exp\left[-\int_0^{T_j} h\left(t, z(t)\right)dt\right] \qquad (11)$$

where $i$ indexes failure times and $j$ indexes both failure and suspension times. This model also works on the minimum LCC concept as described earlier, except that it is expressed in terms of risk.
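Evaluating the likelihood requires the integral of the hazard rate over each history, which in turn requires an assumption about the covariates between inspections. The sketch below assumes that each covariate reading stays constant until the next inspection, so that the integral has a closed form per interval; this assumption, the data layout and all names are introduced here for illustration and are not taken from the paper.

```python
import math


def phm_log_likelihood(histories, beta, eta, gamma):
    """Log-likelihood of a Weibull PHM with piecewise-constant covariates.

    Each history is a dict with:
      'inspections': [(t_k, z_k), ...] sorted by time, first t_k = 0
      'end': age at renewal
      'failed': True for a failure, False for a suspension
    """
    ll = 0.0
    for hist in histories:
        insp = hist["inspections"]
        end = hist["end"]
        for k, (start, z) in enumerate(insp):
            stop = insp[k + 1][0] if k + 1 < len(insp) else end
            risk = math.exp(sum(g * zi for g, zi in zip(gamma, z)))
            # Cumulative hazard over [start, stop] with z held constant.
            ll -= risk * ((stop / eta) ** beta - (start / eta) ** beta)
        if hist["failed"]:
            # Failures also contribute the log hazard at the failure time,
            # using the last covariate reading before failure.
            z_last = insp[-1][1]
            ll += math.log(beta / eta) + (beta - 1.0) * math.log(end / eta)
            ll += sum(g * zi for g, zi in zip(gamma, z_last))
    return ll


# Hypothetical histories with two covariates each.
histories = [
    {"inspections": [(0.0, [0.1, 0.0]), (50.0, [0.4, 0.2])], "end": 100.0, "failed": True},
    {"inspections": [(0.0, [0.1, 0.1]), (60.0, [0.2, 0.1])], "end": 150.0, "failed": False},
]
print(phm_log_likelihood(histories, 2.0, 400.0, [0.5, 1.2]))
```

Maximising this function over $\beta$, $\eta$ and $\gamma$ with a numerical optimiser would give the fitted model; with $\gamma = 0$ it reduces to the ordinary censored Weibull log-likelihood.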
To be able to calculate the optimal cost risk, $d$, at renewal it is necessary to predict the future covariate behavior. Markovian chains (see Hines and Montgomery [11]) are used for this prediction, which results in transition probabilities of covariates from one state to the next. The expected LCC per unit time is a function of $d$ and is given by:

$$C(d) = \frac{C_p\left[1 - Q(d)\right] + C_f\,Q(d)}{W(d)} \qquad (12)$$

where $Q(d)$ represents the probability that a failure will occur, while $W(d)$ is the expected time until renewal, regardless of preventive action or failure. Makis and Jardine suggested algorithms with which the optimal value of $d$, $d^*$, can be calculated. The renewal rule (13) is to renew the item at the first instant at which its risk, expressed through its age and $\gamma \cdot z(t)$, reaches the level that corresponds to $d^*$. A warning rule (14) is also defined.

Case Study
Data suitable to compare the three decision making techniques as described above was found at SASOL's Twistdraai plant at Secunda. The data was recorded from September 1, 1996 to November 1, 1999 on 8 identical Warman pumps used to circulate a water and magnetite solution under equivalent conditions. A vibration monitoring maintenance strategy was historically used on the pumps as basis for renewal decisions over the 791 day data collection horizon. This vibration monitoring strategy led to several unexpected failure renewals of the pumps, which is ideal for a comparative study. In this case study the vibration monitoring strategy is evaluated and compared to conventional probabilistic failure data analysis and PHM analysis.

Data Summary
Data on a total of 27 histories (lifetimes) was collected, which includes 11 failures. Although the Twistdraai plant was willing to supply renewal data for this study, they were not able to provide the real costs involved due to company policy. They did however provide scaled costs based on the average over the data collection horizon of $C_f$ = R 162 200 and $C_p$ = R 25 000, which are proportionally correct.

Vibration Monitoring
Each pump has two roller element bearings between the driven pulley of the pump and the pump itself. Vibration analysts believe that by monitoring a pump's vibration levels on the bearings, all mechanical defects, whether they come from the pump's end or the pulley's end, will be detected. This makes good theoretical sense according to Broch [4], Kadushin [14], Norton [21] and Tandon and Nakra [26], and it is also the methodology that vibration analysts concerned with the pumps at the Twistdraai plant have followed.
Vibration signals recorded from the pumps' bearings were analysed spectrally as described in subsection 2.2. Six components in the vibration spectra of both bearings were of special concern to the vibration technicians. These components were all considered to be good predictors of failure by the vibration technicians based on their experience with the pumps. Benchmarks were set for the 6 predictors and renewals were performed as soon as the benchmarks were exceeded. Waterfall plots were used as aids in the decision making process.
The real policy resulted in 11 unexpected failures and 8 preventive renewals. This resulted in a total real cost of R 345.16 per day. Of the total cost, R 63.21 per day (18.3%) was due to preventive renewals and R 281.95 per day (81.7%) due to failures.
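The daily cost figures can be checked with a few lines of arithmetic. The sketch below assumes that "per day" means total cost divided by the total number of pump-days (8 pumps over the 791 day horizon); the paper does not state this explicitly, but the assumption reproduces the failure cost exactly.

```python
C_F = 162_200.0      # scaled cost of a failure renewal (R)
C_P = 25_000.0       # scaled cost of a preventive renewal (R)
PUMPS = 8
HORIZON_DAYS = 791
FAILURES = 11

# Assumption: costs are spread over all pump-days in the horizon.
pump_days = PUMPS * HORIZON_DAYS

failure_cost_per_day = FAILURES * C_F / pump_days
preventive_share = 63.21 / 345.16
# Number of renewals at C_P implied by the stated R 63.21 per day.
implied_preventive_renewals = 63.21 * pump_days / C_P

print(pump_days)
print(round(failure_cost_per_day, 2))
print(round(100.0 * preventive_share, 1))
print(round(implied_preventive_renewals, 2))
```

Under this assumption the stated preventive cost of R 63.21 per day corresponds to about 16 renewals costed at $C_p$, which equals the 27 recorded histories less the 11 failures. This is an inference from the arithmetic only; the text itself does not say how the remaining histories were costed.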

Probabilistic Failure Analysis
A statistically acceptable Weibull probability density function was obtained for the data set, which is shown graphically in Figure 1. It is clear from Figure 1 that $h(t)$ is a strictly increasing function, which makes scheduled preventive renewal a definite option since $C_f / C_p \approx 6$ (fairly high). With the Weibull function known it is also possible to calculate the LCC if replaced at any time $t_p$ using equations (7) and (8).

PHM with Vibration Covariates
For the PHM analysis, the components in the vibration spectra of importance to the vibration technicians, as described above, were all used as covariates. Thus, the PHM process was started off with a total of 12 covariates, of which the insignificant ones were then eliminated statistically. In the resulting model, equation (17), RF5 1/2 refers to 5×RF of bearing 1 and 5×RF of bearing 2 respectively. Equation (17) is a very surprising result in that only the two covariates associated with cavitation proved to be statistically significant in the failure process. Hence, the LCC was calculated in terms of the risk using equation (12). The results are shown in Figure 3. To test the model, the theoretical renewal policy was applied to the data set as if it was used in a real life situation. This exercise yielded a total LCC of R 214.03 per day, of which 47.0% was due to preventive renewals and 53.0% due to failures. A total of 80.0% of all renewals were done preventively and only 20.0% correctively. Clearly, the calculated policy is very realistic.

Conclusion
The possible economic benefit of preventive maintenance on items which fail according to a renewal process is clearly illustrated in the theory and case study of this paper. The magnitude of this benefit is determined by the quality of renewal decisions, which is often the responsibility of the maintenance engineer.
Vibration monitoring and probabilistic failure analysis are the most frequently encountered preventive renewal decision making techniques found in South Africa for predictive and use based preventive maintenance strategies. Both these techniques have had great industrial successes despite their shortcomings. The relatively new approach of utilising the PHM with vibration covariates is an answer to many of these shortcomings. This approach enables the maintenance engineer to experience the best of both worlds, thereby making much improved renewal decisions, resulting in a very positive impact on the company's maintenance budget.