IMPROVING PROJECT SCHEDULE ESTIMATES USING HISTORICAL DATA AND SIMULATION

Many projects are not completed on time or within the original budget. This is caused by uncertainty in project variables as well as the occurrence of risk events. A study was done to determine ways of measuring the risk in development projects executed by a mining company in South Africa. The main objective of the study was to determine whether historical project data would provide a more accurate means of estimating the total project duration. Original estimates and actual completion times for tasks of a number of projects were analysed and compared. The results of the study indicated that a more accurate total duration for a project could be obtained by making use of historical project data. The accuracy of estimates could be improved further by building and maintaining a comprehensive project schedule database within a specific industry.

1 This author was enrolled for the MEng (Project Management) at the Department of Engineering and Technology Management, University of Pretoria

http://sajie.journals.ac.za


INTRODUCTION

Background of company
The economic growth rate in South Africa forces many companies to embark on extension or upgrade projects regularly. Examples of such projects are found in electricity generation, fuel production, aviation and roads, to mention just a few. A mining company was recently faced with a similar need to increase capacity in order to satisfy the anticipated demand for base metals over the next five years. To satisfy this demand the company embarked on an extensive de-bottlenecking project.
The purpose of the de-bottlenecking project was to increase the nickel production of the base metals refinery. The company decided to install an extra coal-fired boiler to supplement the existing steam generation plant. The addition of a sixth boiler would increase the steam generation from a nominal 85 tons/h to 106 tons/h at a pressure of 1300 kPa. The additional boiler would reduce the load on the existing boilers and alleviate their environmental impact. The total project cost was estimated at R12,5 million.

Motivation for research project
During the initial planning for the de-bottlenecking project, the question was raised as to the accuracy of the schedule estimates for development type projects executed by the company. It was observed that many projects slipped in terms of schedule, with a corresponding increase in the project cost up to completion and closeout. If a project had to be completed within the original schedule, an injection of resources, typically labour and money, was often made to ensure compliance.
The ability to measure the schedule risks associated with a project accurately could allow a tighter schedule to be kept and ultimately lead to cost savings for the project. Accurate control of the project schedule would allow for better control of the overall project. An investigation was therefore performed to develop a method of measuring schedule risk, and to enhance the estimating of project schedules by using historical data on development type projects.

Methodology
Various methods of measuring risk were analysed by means of a literature study. Three phases of the project life cycle were identified, i.e. Administration, Construction and Commissioning. These phases were broken down further into 13 project activities, and the actual duration for each of these activities was determined and compared with the estimated duration. The risk in schedule was thus measured for each of the different life cycle phases of the project. Schedule data was obtained for seven projects that had been completed within the company in the last decade. This data was then used to develop a model for more accurate determination of the duration of the project activities and phases, and ultimately the total project.

Project risk management processes
A number of variations of project risk management (PRM) processes have been developed in the last decade. The best-known PRM process is the six-step process suggested in the PMBoK-2000 [1]. Recently Smith and Merritt [2] also formulated a six-step process involving the following phases:
• Identify risks
Reference [4] suggested a process consisting of two main phases: risk assessment, which includes identification, analysis and prioritisation, and risk control, which includes risk management planning, risk resolution and risk monitoring.
It is evident from these representative PRM processes that there is general agreement regarding what should be included in the process, with differences depending on variations in the level of detail and on the assignment of activities to steps and phases. Various tools and techniques that can be used in the application of a PRM process are discussed in detail by Raz [5].

Measuring project risk
McGrew [6] indicated that to investigate the effect and impact of measuring the risk involved in a project, it was necessary to establish the impact of risk analysis itself on the project.
According to Ward [7] a common problem in project risk management processes is the need to determine the relative significance of different sources of risk so as to guide subsequent risk management effort. Ward also notes that the most common approach to risk assessment and management is the use of a summary risk register, which lists the project risks and associated information in tabular format. A tabular form of a risk register has the attraction of simplicity and convenience, but has a number of shortcomings as indicated by Chapman and Ward [3], i.e.
• Individual risk drivers may not be described in sufficient detail to avoid ambiguity and misunderstandings about which risk is being described.
• A table of risk drivers, particularly a lengthy one, provides limited guidance on the relative importance of the individual risk drivers.
• Important inter-dependencies between risks are not readily highlighted.
Chapman and Ward [3] came to the conclusion that summation of the ratings for individual risks to indicate an overall amount of risk exposure on a project is of little value and any temptation to do so should be resisted. The desire to identify key risks in a project is a natural one, based on recognition that resources available to manage risk are limited, and that risk management efforts need to be cost-effective. It is important to include in a risk register information on the timing of risks and responses, the resources required by alternative responses, and the interdependencies of risks. The importance of a risk typically rests on other factors besides probability and impact, and different considerations can apply in different phases of the risk management process.
Since this investigation includes the practice and measurement of project risk, it is clear from the above discussion that a risk register would be a useful management tool, but that it would need greater substance. A method is therefore needed that does not simply sum individual risks but determines the total project risk in a meaningful way. It will be useful to develop a means that incorporates the interdependencies of risks, thus ensuring that the total effects of all the risks are accounted for.
The literature indicates that limited tools exist to measure total project risk. The measurement of risk can vary from a purely analytical method to a purely subjective method. This report investigates the utilisation of historical data to formulate a quantitative analytical method for determining the impact of risk events on schedule delays of the project.

The project schedule problem
When a project schedule is viewed, the standard practice within the industry is to aim for a specific end date. Thus, once the project schedule is established, the project manager usually sees the end date as invariant and makes use of every possible measure to ensure that the end date is met. If problems are experienced during execution of the project, delays could occur. The project manager then attempts to recover from the delays by reducing the duration of the activities that still have to be completed.
The duration of an activity can only be reduced by the application of additional resources (labour, material or funds). These resources usually have associated monetary values which increase the final cost of the project. Even once the injection of resources has been made, it is usually not sufficient to bring the project back to the original schedule, and some delay is still experienced.

Experimental data
With the installation of the sixth boiler at a metals refinery the usual procedures were followed to ensure that an accurate schedule was established. This procedure consisted of several steps. First, specialist and expert personnel were consulted to obtain three values for each activity within the project (an optimistic value, a most likely value, and a pessimistic value). The triangular distribution for each activity or phase of the project was then used to determine the most probable duration for the project from the activity durations. The @Risk [8] simulation software add-in for MS Excel was used to determine the expected value for each distribution.
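The three-point procedure described above can be illustrated with a minimal sketch. It is not the @Risk workflow used in the study; it only uses Python's standard library to show how the expected value of a triangular distribution follows from the three expert values, both analytically and by sampling. The three durations in the example are hypothetical and are not taken from the project data.

```python
import random
import statistics


def triangular_mean(optimistic, most_likely, pessimistic):
    """Analytical expected value of a triangular distribution."""
    return (optimistic + most_likely + pessimistic) / 3.0


def simulated_mean(optimistic, most_likely, pessimistic, n=20000, seed=1):
    """Estimate the expected value by sampling the triangular distribution."""
    rng = random.Random(seed)
    # random.triangular takes its arguments in the order (low, high, mode).
    samples = [
        rng.triangular(optimistic, pessimistic, most_likely) for _ in range(n)
    ]
    return statistics.fmean(samples)


# Hypothetical three-point estimate for one activity, in days.
estimate = (20, 30, 55)
analytical = triangular_mean(*estimate)
sampled = simulated_mean(*estimate)
print(analytical, sampled)
```

The sampled value should settle close to the analytical one as the number of samples grows, which is the behaviour a simulation add-in relies on when it reports an expected duration.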
The schedule information for a number of completed projects at the metals refinery was then analysed to determine whether a more accurate estimate for project schedule could be obtained. The total project cost for the selected projects varied from R1,3 million to R18,5 million.
Schedule data was obtained from historical records for seven projects of the company. Although each project is unique, e.g. the installation of the additional coal-fired boiler, some of the individual activities are similar for various projects. The schedule risk associated with the different activities of the project, such as the civil or electrical components, could be estimated more easily for smaller activities. Ultimately the evaluation of the historical schedule data for each activity should allow for a better understanding of the total project risk.

Introduction
A model needed to be developed from the collected data that would enable the prediction of the durations of activities of the project. Although great effort was initially placed in the development of the schedule for the installation of the sixth boiler, historical data and experience clearly indicated that even if these measures were followed, project delays would still be experienced and the original estimates for the schedule would remain inaccurate.
The overall development project was broken down into three main phases, and the phases were further broken down into activities. The percentage of over or under estimation of the duration was then determined for each activity and for all seven projects. An example of the schedule data captured is shown in Table 1 for the 'Autoclave Relining' project.

Comparison of actual vs. estimated durations
Making use of the percentage calculation caters for different projects within the industry, and it is foreseen that this method will ultimately compensate for differences experienced within the different phases and different projects. Table 2 summarises the results obtained for all seven projects analysed.
The percentage over or below estimate for each project as indicated in table 2 was calculated by comparing the original estimated duration for an activity with the actual duration for the activity. In the worst case the actual duration was 900% longer than the estimate. In general the duration was longer, but in a few cases the task was completed in a shorter time than the estimated duration. On some projects the data was not available for all the project activities. As mentioned earlier, project managers usually compensated for slippage early in the project by means of an injection of resources later in the project.
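The percentage calculation can be sketched as follows. The sign convention (positive when the activity took longer than estimated) and the function name are assumptions made for illustration; the two worked values use the 'Approval of Documentation' and 'Plating and Pipework' durations quoted for the 'Autoclave Relining' project.

```python
def percent_deviation(estimated_days, actual_days):
    """Percentage by which the actual duration exceeds the estimate.

    Positive values mean the activity took longer than estimated,
    negative values mean it finished early (assumed sign convention).
    """
    if estimated_days <= 0:
        raise ValueError("estimated duration must be positive")
    return (actual_days - estimated_days) / estimated_days * 100.0


# 'Approval of Documentation': estimated 30 days, actual 85 days.
approval = percent_deviation(30, 85)
# 'Plating and Pipework': estimated 21 days, actual 14 days.
plating = percent_deviation(21, 14)
print(round(approval, 1), round(plating, 1))
```

Expressing the deviation as a percentage of the estimate rather than in days is what allows activities from projects of very different size to be pooled in one data set.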
The data as given in table 2 was then fitted to probability distributions. It was found that the Beta General distribution was adequate for most of the activities of the projects. Since no data was captured for some activities of some projects, only 5 data points were available in certain cases. Obviously, the more data points that are available, the greater the confidence that can be placed in the selected distribution and its parameters. These distributions were then used to develop a model for improving schedule estimates for future projects.
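As a rough illustration of what fitting a Beta General distribution to five data points involves, the sketch below uses the method of moments. This is a substituted technique: the fitting procedure used by the Bestfit software is not described in the text. The lower and upper bounds are assumed to be known in advance, and the five percentage deviations are hypothetical.

```python
import statistics


def fit_beta_general(data, lower, upper):
    """Method-of-moments shape parameters for a beta on [lower, upper]."""
    if not (lower < min(data) and max(data) < upper):
        raise ValueError("all data points must lie strictly inside the bounds")
    # Rescale the data to the unit interval of the standard beta.
    scaled = [(x - lower) / (upper - lower) for x in data]
    m = statistics.fmean(scaled)
    v = statistics.variance(scaled)  # sample variance, needs >= 2 points
    common = m * (1.0 - m) / v - 1.0
    if common <= 0:
        raise ValueError("sample variance too large for a beta distribution")
    return m * common, (1.0 - m) * common


def beta_general_mean(alpha, beta, lower, upper):
    """Mean of a beta distribution rescaled to [lower, upper]."""
    return lower + (upper - lower) * alpha / (alpha + beta)


# Hypothetical percentage deviations for one activity (five projects).
deviations = [-10.0, 5.0, 20.0, 40.0, 120.0]
alpha, beta = fit_beta_general(deviations, lower=-50.0, upper=200.0)
print(alpha, beta, beta_general_mean(alpha, beta, -50.0, 200.0))
```

With so few points the shape parameters are sensitive to a single outlier, which is consistent with the remark that confidence in the fitted distribution grows with the number of data points.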

IMPROVEMENT OF SCHEDULE ESTIMATION
The information obtained from the fit of the data was used to test whether the technique could improve the estimates for the duration of project activities. Another project, i.e. the 'Boiler Installation', was therefore used to test the accuracy of the model.
For this investigation it was decided to use the mean values from the distributions for the activity durations. The new estimate was calculated by means of the following equation:

$$T_{est} = T_{initial}\left(1 + \frac{D}{100}\right) \qquad (1)$$

where:
$T_{est}$ is the new estimate for activity duration that includes a risk factor
$T_{initial}$ is the original estimate for the activity duration
$D$ is the percentage over or under estimation based on the distribution of the fitted data

The value of D will clearly vary from industry to industry, but should be reasonably accurate provided that a sufficient database within the specified industry exists. At completion of the 'Boiler Installation' project the actual activity durations were compared with the original estimates as well as the improved estimates using the probability distributions for each activity. The new estimates compared with the original estimates are shown in figure 1 below.
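A small sketch of how the adjustment could be applied is given here. It assumes that equation (1) has the multiplicative form T_est = T_initial (1 + D/100), with D expressed as a percentage as in table 2; that form is an assumption, and the activity values are hypothetical.

```python
def risk_adjusted_estimate(t_initial, d_percent):
    """Apply the assumed form of equation (1): T_est = T_initial * (1 + D/100)."""
    return t_initial * (1.0 + d_percent / 100.0)


# Hypothetical activity: 30 days estimated, historical mean overrun of 50%.
overrun_case = risk_adjusted_estimate(30, 50.0)
# Hypothetical activity that historically finished 25% early.
early_case = risk_adjusted_estimate(20, -25.0)
print(overrun_case, early_case)
```

A negative D shortens the estimate, so the same expression covers activities that were historically over-estimated as well as those that overran.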
Figure 1 indicates the more accurate estimates obtained using the fitted distributions in comparison with the original estimates. Additional data from completed projects should lead to increased accuracy, and the standard deviation of the fitted distributions should decrease. Using the fitted distribution from the historic schedule data of the seven projects, the improvement in estimating the duration of activities of a project is approximately 30%.
The "reliability" of the new estimates (and the underlying distributions) was determined by means of a visual representation on a graph. The standard deviation for the distribution of each project activity was used to calculate a mean plus standard deviation and a mean minus standard deviation. These data points were plotted on the same graph as the mean values and are shown in figure 2.
From figure 2 it can be seen that four of the project activities show a large standard deviation. These are (a) Compilation of documentation, (b) Approval of documentation, (c) Mechanicals, and (d) Hot commissioning. Further analysis of the duration of certain activities for some of the projects revealed some exceptional circumstances that would not normally be found on projects. These data points could be removed from the data, but that would leave too few data points for the required accuracy. More data points from historic projects would therefore be needed to improve the reliability of the model and the estimate of the activity duration. Except for these 4 activities, the standard deviation for all other activities is small enough to be useful in estimating the duration for future projects.
When (a) 'Compilation of documentation' and (b) 'Approval of documentation' were analysed in detail, it became apparent that the individual activities making up these two elements overlap to a certain extent because of the way the procedures are currently set up. The measurement of these two activities should become more accurate once the method of analysis described in this paper is adopted and the estimate is done with this in mind.

TOTAL PROJECT SCHEDULE
The distributions for all thirteen activities of a development project of the mine could be used as an input to develop a model for estimating the total project duration. A Monte Carlo simulation uses the probability distributions for the project activities and phases to evolve a probability distribution for the total project duration, as illustrated in figure 3. The time distributions for the 13 activities cannot simply be added to obtain the total project duration, since some of these activities could be performed in parallel. A network diagram is therefore also required to perform a schedule simulation. Once a total time distribution has been calculated for the project, various measures of the total time distribution could be used to determine an estimate for total project time, for example the mean value, the 80th percentile, or the 90th percentile. It is customary to use the 90% certainty point and determine the project duration at that point on the cumulative distribution function.
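The simulation idea can be sketched in a few lines. The four activities, their beta parameters, the percentage bounds and the network (one activity, then two parallel activities, then a final activity) are all hypothetical and chosen only to show why parallel paths need a maximum rather than a sum. The multiplicative adjustment of the initial estimate is likewise an assumption.

```python
import math
import random

# Hypothetical inputs per activity:
# (initial estimate in days, beta alpha, beta beta, lower %, upper %)
ACTIVITIES = {
    "documentation": (30, 2.0, 4.0, -20.0, 200.0),
    "civils": (40, 2.0, 3.0, -10.0, 80.0),
    "mechanicals": (50, 2.0, 3.0, -10.0, 120.0),
    "commissioning": (15, 2.0, 5.0, -20.0, 100.0),
}


def sample_duration(rng, t_initial, alpha, beta, lower, upper):
    """Draw one duration from a beta-distributed percentage deviation."""
    d_percent = lower + (upper - lower) * rng.betavariate(alpha, beta)
    return t_initial * (1.0 + d_percent / 100.0)


def simulate_total(rng):
    """One pass through the assumed network diagram."""
    d = {name: sample_duration(rng, *p) for name, p in ACTIVITIES.items()}
    # civils and mechanicals run in parallel, so the longer one governs.
    return d["documentation"] + max(d["civils"], d["mechanicals"]) + d["commissioning"]


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list."""
    k = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[k - 1]


rng = random.Random(42)
totals = sorted(simulate_total(rng) for _ in range(10000))
mean_total = sum(totals) / len(totals)
p80 = percentile(totals, 80)
p90 = percentile(totals, 90)
print(round(mean_total, 1), round(p80, 1), round(p90, 1))
```

Reading the duration at the 90th percentile of the sorted totals corresponds to taking the 90% certainty point on the cumulative distribution function.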

CONCLUSIONS AND RECOMMENDATIONS
The investigation and analysis of the actual duration and estimated duration of thirteen project activities has indicated that historic data is useful to develop a model for obtaining better estimates of duration. Even though data from only 7 projects was used to determine a probability distribution for the percentage over or under estimation, the improvement in duration estimates was in excess of 30%. This technique is therefore useful for improving duration estimates on development projects. The expansion of a project database should progressively improve the reliability of the model and therefore the duration estimates for future projects. It can therefore serve as a valuable tool for the project manager in controlling the project schedule.
It is recommended that detailed monitoring of project schedules should be done, and that at the completion of each project detailed schedule data should be compiled with the actual time spent on the various activities. These schedules should then be compared to the original schedules, and any variations should be clearly noted. This data could then be transferred to the database, where it can be incorporated into the duration estimate model. Maintaining the database should be the responsibility of the project manager, and it could be used as an analytical tool to determine the accuracy of the original time estimates.

Figure 3: Process to determine distribution for total project duration

• Define the key aspects of the project
• Focus on a strategic approach to risk management
• Identify where risk might arise
• Structure the information about risk assumptions
• Assign ownership of risk and responses
• Estimate the extent of uncertainty
• Evaluate the relative magnitude of the various risks

Table 1: Actual vs. estimated duration for the 'Autoclave Relining' project
The worst estimate in this project was for the 'Approval of Documentation' activity, which took 85 days whereas the estimate was only 30 days. Some of the actual durations were close to the estimates, but for the 'Plating and Pipework' activity the actual duration was less than the estimate, i.e. 14 days whereas the estimate was 21 days.

Table 2: Schedule difference for selected projects (values in %)
The data as given in table 2 was then used to fit a probability distribution that could represent the uncertainty in the activity duration. The fit of the data to a probability distribution was done by means of the Bestfit software add-in, which is part of the Decision Tools suite of the Palisade Corporation [8]. This software determines the best distribution that fits a given set of data. All the schedule data shown in table 2 was then analysed with Bestfit and the best distributions are given in table 3.