EXACT RUN LENGTH DISTRIBUTION OF THE DOUBLE SAMPLING X̄ CHART WITH ESTIMATED PROCESS PARAMETERS

Since the run length distribution is generally highly skewed, focusing too much on the average run length (ARL) criterion risks missing crucial information about a control chart's performance. It is therefore important to investigate the entire run length distribution of a control chart for an in-depth understanding before implementing the chart in process monitoring. In this paper, the percentiles of the run length distribution for the double sampling (DS) X̄ chart with estimated process parameters are computed. Knowledge of these percentiles provides a more comprehensive understanding of the expected behaviour of the run length, including early false alarms, the skewness of the run length distribution, and the median run length (MRL). A comparison of the run length distributions of the optimal ARL-based and MRL-based DS X̄ charts with estimated process parameters is presented. Examples of applications are given to help practitioners select the best design scheme for the DS X̄ chart with estimated process parameters, based on their specific purpose.


INTRODUCTION
Control charts are used to ascertain whether a business or manufacturing process is in a state of statistical control. The existing literature relies heavily on the average run length (ARL) as the performance measure of a control chart with estimated process parameters. Because the run length distribution is highly skewed, many researchers (e.g., [1], [2], [3]) criticise the use of the ARL as the sole criterion for measuring a chart's performance. Thaga [4] stated that only a fraction of a chart's behaviour is reflected in the size of the ARL. Accordingly, Chakraborti [2], Khoo and Quah [5] and Radson and Boyd [6] advocated using a more credible measure, the percentiles of the run length distribution, as an alternative way of assessing a chart's performance. The percentiles explain the run length properties and provide detailed information about the run length distribution, and thus about the chart's performance. Gan [7] claimed that knowing the run length properties, such as the early false out-of-control signal and the median run length (MRL) of a control chart, enables engineers to have a full understanding of the working of a control chart. Their confidence will then not be eroded when they encounter a few short run lengths with undiscovered assignable causes. It is therefore crucial to examine the run length distribution of a control chart before it is implemented in process monitoring.
In this paper, we consider Daudin's [8] double sampling (DS) X̄ chart, which is a modified Shewhart X̄ chart incorporating double sampling plans. Daudin [8] proposed an optimisation model to minimise the in-control average sample size (ASS₀), while Irianto and Shinozaki [9] extended this research by minimising the out-of-control ARL (ARL₁). The advantage of the DS X̄ chart is improved statistical efficiency without increasing the sample size [10]. Additionally, He and Grigoryan [11] pointed out that the DS chart is a good option when greater efficiency is required for small shifts, and when protection against large shifts is also vital. According to Daudin [8] and Costa [12], the DS X̄ chart is superior to the Shewhart X̄, variable sample size (VSS) X̄, variable sampling interval (VSI) X̄, exponentially weighted moving average (EWMA), and cumulative sum (CUSUM) charts in some cases. For example, the DS X̄ chart is preferred to the Shewhart X̄ chart for identifying small and moderate process mean shifts. The former also dramatically reduces the in-control sample size to nearly 50 per cent of the latter [8].
When detecting small and moderate process mean shifts, Costa [12] stated that the sample size of the DS X̄ chart is more economical than that of the VSS X̄ chart. These advantages lead to the conclusion that the DS scheme is an appropriate choice for process monitoring with higher inspection costs or destructive testing [13].
Given the motivations for and merits of using the DS chart, a number of researchers have studied DS charts extensively in recent years. Costa and Claro [14] applied the DS X̄ chart to monitor a process in which the measurements can be modelled as a first-order autoregressive moving average process. Torng and Lee [15] studied the DS X̄ chart when the observations follow non-normal distributions. The combined DS and VSI (DSVSI) X̄ chart was proposed by Carot et al. [16], who claimed that the DSVSI X̄ chart is more sensitive to small and moderate process mean shifts. Inspired by Carot et al. [16], Torng et al. [17] furthered this research and discussed the DSVSI X̄ chart under non-normality. Lee et al. [18] then proposed the economic design of the DSVSI X̄ chart. Recently, Khoo et al. [19] introduced a synthetic DS X̄ chart, which performs better than the standard DS X̄ and synthetic X̄ charts for all levels of shifts.
In real applications, a control chart is applied in a two-phase operation. In the Phase-I analysis, control charts are used to determine an in-control historical data set, while in the Phase-II monitoring, control charts are used to detect an out-of-control signal after an unknown time point. In practice, the process parameters are rarely known, and they are usually estimated from the Phase-I data set. Therefore, in this paper, we focus on the DS X̄ chart when the process parameters are estimated.
A number of researchers ([20], [21], [22]) have contributed to the field of control charts with estimated process parameters. A thorough literature review on the effect of parameter estimation on control charts' properties was done by Jensen et al. [23]. The DS X̄ chart with estimated process parameters, optimally designed based on the ARL, was first developed by Khoo et al. [24]. Teoh et al. [3] took the study of Khoo et al. [24] further and developed an optimisation model to minimise the out-of-control MRL (MRL₁). They concluded that the proposed optimal MRL-based DS X̄ chart with estimated process parameters has a lower false alarm rate and provides a more straightforward interpretation.
In this paper, we investigate the percentiles (5th, 10th, 25th, 50th, 75th and 95th) of the run length distribution of the DS X̄ chart with estimated process parameters, which gives a thorough and exact analysis of the entire run length distribution. These percentiles are computed for a given desired in-control ARL (ARL₀) or MRL (MRL₀) value. This analysis is not yet available for the DS X̄ chart with estimated process parameters. Since process parameters are usually estimated from a historical data set, our results provide added insight into the DS X̄ chart's performance. This should be of interest to practitioners, as further knowledge regarding the exact behaviour of the DS X̄ chart with estimated process parameters is now made available. For instance, the 5th and 95th percentiles give useful information about the spread or dispersion of the run length distribution.
The remainder of this paper is organised as follows. In Section 2, the operation of the DS X̄ chart is outlined. Section 3 studies the run length properties of the DS X̄ chart with known and estimated process parameters. The performance based on the percentiles of the run length distribution for the DS X̄ chart is examined in Section 4. In Section 5, examples of applications are presented to aid engineers in the selection of a suitable design scheme for the DS X̄ chart with estimated process parameters. Finally, conclusions are drawn in Section 6.

THE DS X̄ CHART
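Let Y be the random variable representing observations taken from a Phase-II process, where Y follows a normal distribution with in-control mean $\mu_0$ and standard deviation $\sigma_0$. The DS X̄ chart partitions the standardised first-sample mean into the intervals $I_1 = [-L_1, L_1]$, $I_2 = [-L, -L_1) \cup (L_1, L]$ and $I_3 = (-\infty, -L) \cup (L, +\infty)$, and compares the standardised combined-sample mean with $I_4 = [-L_2, L_2]$. Note that $L_1 > 0$ is the warning limit for the first-sample stage, while $L \ge L_1$ and $L_2 > 0$ are the control limits for the first-sample and combined-sample stages respectively. With the aid of Figure 1, the operating procedure for Daudin's [8] DS X̄ chart is as follows:

(a) After taking a first sample of size $n_1$, calculate the sample mean $\bar{Y}_{1i}$ at the $i$th sampling time.
(b) The process is in-control if $Z_{1i} = (\bar{Y}_{1i} - \mu_0)\sqrt{n_1}/\sigma_0 \in I_1$.
(c) The process is considered out-of-control if $Z_{1i} \in I_3$.
(d) If $Z_{1i} \in I_2$, take a second sample of size $n_2$ and calculate its sample mean $\bar{Y}_{2i}$.
(e) Calculate the combined sample mean $\bar{Y}_i = (n_1 \bar{Y}_{1i} + n_2 \bar{Y}_{2i})/(n_1 + n_2)$.
(f) The process is deemed in-control if $Z_i = (\bar{Y}_i - \mu_0)\sqrt{n_1 + n_2}/\sigma_0 \in I_4$; otherwise it is out-of-control.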
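The decision made at a single sampling time can be sketched in code. This is a minimal illustration, assuming the usual form of Daudin's procedure: the first-sample mean is standardised with $\sqrt{n_1}$, the combined mean with $\sqrt{n_1 + n_2}$, and the limits are symmetric. The function name `ds_xbar_step` and the callable `draw_second` are hypothetical names introduced here, not part of the paper.

```python
from math import sqrt
from statistics import fmean


def ds_xbar_step(first, draw_second, L1, L, L2, mu0, sigma0):
    """Classify one sampling time of a DS X-bar chart (illustrative sketch).

    first       -- list with the n1 observations of the first sample
    draw_second -- zero-argument callable returning the n2 observations of
                   the second sample; it is called only when needed
    Returns (in_control, observations_used).
    """
    n1 = len(first)
    z1 = (fmean(first) - mu0) * sqrt(n1) / sigma0
    if abs(z1) <= L1:            # first-sample mean inside the warning limits
        return True, n1
    if abs(z1) > L:              # first-sample mean beyond the control limits
        return False, n1
    second = draw_second()       # otherwise a second sample is required
    n = n1 + len(second)
    combined_mean = (sum(first) + sum(second)) / n
    z = (combined_mean - mu0) * sqrt(n) / sigma0
    return abs(z) <= L2, n       # combined-sample decision


if __name__ == "__main__":
    # Limits of the design (n1, n2, L1, L, L2) quoted in Example 3.
    limits = dict(L1=1.4232, L=4.4648, L2=2.8008, mu0=0.0, sigma0=1.0)
    print(ds_xbar_step([1.0, 1.0, 1.0, 1.0], lambda: [1.0] * 6, **limits))
```

Passing the second sample as a callable mirrors the practical benefit of the scheme: the extra $n_2$ observations are collected only when the first sample falls in the warning region.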

Figure 1: Graphical view of the operation of the DS X̄ chart (panels: first sample; combined samples)

THE RUN LENGTH PROPERTIES OF THE DS X̄ CHART
Let RL be the run length of a control chart. The percentiles of the run length distribution are obtained from the cumulative distribution function (cdf) of RL, as specified in Equation (1). For the DS X̄ chart with known process parameters, the cdf is

$$F_{RL}(\ell) = P(RL \le \ell) = 1 - P_a^{\ell}, \qquad (2)$$

where $\ell \in \{1, 2, 3, \ldots\}$ and $P_a = P_{a1} + P_{a2}$ is the probability that the process is in-control. Daudin [8] derived $P_{a1}$ and $P_{a2}$ for a given magnitude of standardised mean shift $\delta$ in terms of $\Phi(\cdot)$ and $\varphi(\cdot)$, the cdf and probability density function (pdf) of the standard normal random variable. The ARL, standard deviation of the run length (SDRL), and average sample size (ASS) at each sampling time follow from these probabilities ([8]).

When the process parameters are unknown, we need to take m samples, each of size n, from an in-control Phase-I process in order to estimate the mean $\mu_0$ and standard deviation $\sigma_0$ before starting the Phase-II process monitoring. For the DS X̄ chart with estimated process parameters, the cdf of the RL, Equation (9), is given in [3] for $\ell \in \{1, 2, 3, \ldots\}$ in terms of the conditional probability that the process is in-control given the parameter estimates. The random variables U and V in Equations (10), (12) and (13) are standardised versions of the estimated mean and the estimated standard deviation, respectively. The pdfs of U and V are given in [3] and involve the pdf of the gamma distribution.
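For the known-parameter case, the run length quantities can be computed directly once $P_a$ is available. The sketch below assumes that the run length is geometric with in-control probability `p_a` at each sampling time, and it takes `p_a` as an input rather than evaluating Daudin's expressions. The percentile convention used (the smallest $\ell$ whose cdf reaches $\theta$) is an assumption and may differ slightly from the inequality in Equation (1). All function names are illustrative.

```python
from math import ceil, log, sqrt


def rl_cdf(ell, p_a):
    """P(RL <= ell) for a geometric run length."""
    return 1.0 - p_a ** ell


def rl_percentile(theta, p_a):
    """Smallest ell with P(RL <= ell) >= theta (assumed convention)."""
    ell = max(1, ceil(log(1.0 - theta) / log(p_a)))
    # Guard against floating-point rounding near the boundary.
    while ell > 1 and rl_cdf(ell - 1, p_a) >= theta:
        ell -= 1
    while rl_cdf(ell, p_a) < theta:
        ell += 1
    return ell


def arl(p_a):
    """Mean of the geometric run length."""
    return 1.0 / (1.0 - p_a)


def sdrl(p_a):
    """Standard deviation of the geometric run length."""
    return sqrt(p_a) / (1.0 - p_a)


if __name__ == "__main__":
    p_a = 1.0 - 1.0 / 250.0   # in-control case with ARL0 = 250
    print("ARL", arl(p_a), "SDRL", sdrl(p_a))
    for theta in (0.05, 0.10, 0.25, 0.50, 0.75, 0.95):
        print(theta, rl_percentile(theta, p_a))
```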
Teoh et al. [3] also showed that the unconditional ARL, SDRL and ASS of the DS X̄ chart with estimated process parameters can be expressed as integrals over the joint distribution of U and V.
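As a rough cross-check of the unconditional quantities, the run length with estimated parameters can also be simulated. The following Monte Carlo sketch is not the exact method of [3]. It assumes a standard normal in-control process, a grand-mean estimator for the mean, and a pooled standard deviation without bias correction (the estimators used in the paper may differ). All function names are hypothetical.

```python
import random
from math import ceil, sqrt
from statistics import fmean, median


def phase1_estimates(m, n, rng):
    """Simulate m in-control Phase-I samples of size n; return (mean, sd) estimates."""
    samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    means = [fmean(s) for s in samples]
    # ASSUMPTION: pooled sample variance, no unbiasing constant.
    variances = [sum((y - mu) ** 2 for y in s) / (n - 1)
                 for s, mu in zip(samples, means)]
    return fmean(means), sqrt(fmean(variances))


def run_length(design, mu_hat, sigma_hat, delta, rng, cap=1_000_000):
    """Sampling times until the first signal; the true mean is shifted by delta."""
    n1, n2, L1, L, L2 = design
    for t in range(1, cap + 1):
        y1 = rng.gauss(delta, 1.0 / sqrt(n1))      # first-sample mean
        z1 = (y1 - mu_hat) * sqrt(n1) / sigma_hat
        if abs(z1) <= L1:
            continue                               # in-control at the first stage
        if abs(z1) > L:
            return t                               # signal at the first stage
        y2 = rng.gauss(delta, 1.0 / sqrt(n2))      # second-sample mean
        y = (n1 * y1 + n2 * y2) / (n1 + n2)
        z = (y - mu_hat) * sqrt(n1 + n2) / sigma_hat
        if abs(z) > L2:
            return t                               # signal at the combined stage
    return cap                                     # censored run


def simulate(design, m, n, delta, reps, seed=0):
    """Draw fresh Phase-I estimates for every replicate, then one run length."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        mu_hat, sigma_hat = phase1_estimates(m, n, rng)
        results.append(run_length(design, mu_hat, sigma_hat, delta, rng))
    return results


def empirical_percentile(run_lengths, theta):
    """Smallest observed value whose empirical cdf is at least theta."""
    ordered = sorted(run_lengths)
    return ordered[max(0, ceil(theta * len(ordered)) - 1)]


if __name__ == "__main__":
    design = (4, 6, 1.4232, 4.4648, 2.8008)   # design quoted in Example 3
    rls = simulate(design, m=20, n=5, delta=1.0, reps=2000)
    print(fmean(rls), median(rls), empirical_percentile(rls, 0.95))
```

Redrawing the Phase-I estimates in every replicate is what makes the result unconditional. Fixing one pair of estimates and repeating only the Phase-II part would instead give a conditional run length distribution.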

PERFORMANCES BASED ON PERCENTILES OF THE RUN LENGTH DISTRIBUTION
In this section we investigate and compare the percentiles of the run length distribution for the optimal ARL-based and MRL-based DS X̄ charts with estimated process parameters, proposed by Khoo et al. [24] and Teoh et al. [3] respectively. For ARL₀ = 250, ASS₀ = 5 and δ_opt ∈ {0.5, 1.5}, Tables 1 and 2 give the exact ARL, SDRL, ASS and percentiles for the ARL-based DS X̄ chart. Here, δ_opt represents the desired mean shift for which quick detection is intended. The optimal chart parameters (n₁, n₂, L₁, L, L₂) for the estimated-process-parameter case (m ∈ {10, 20, 40, 80}) and the known-process-parameter case (m = +∞) are obtained from the optimisation algorithms that minimise the ARL₁, suggested by Khoo et al. [24] and Irianto and Shinozaki [9] respectively. Using these optimal chart parameters, the ARL, SDRL, ASS and percentiles of the run length distribution for different shifts δ are computed from the formulae in Section 3. Note that Equation (1), together with Equations (2) and (9), is used to calculate the percentiles for the cases with known and estimated process parameters respectively.
The 25th and 75th percentiles reveal useful information, because the middle half of the distribution lies between these values. Referring to Table 2, when m = 40 and δ = 0.5, there is a probability of 0.25 that the run length of the DS X̄ chart is less than 4. Also, 25 per cent of the time, the run length of the chart is greater than 16. The shape or degree of skewness of the run length distribution is also of great interest to engineers. This skewness can be observed from the difference between the 5th and 95th percentiles or between the 25th and 75th percentiles. For example, when δ_opt = 0.5, m = 10 and δ = 0, the 5th percentile is small (i.e., 5) whereas the 95th percentile is far larger (i.e., 957). This indicates large variation in the run length and a distribution with a long right tail. The difference between these two percentiles, and hence the degree of skewness of the run length distribution, reduces as m, δ_opt and δ increase.
It is clear from Tables 1 and 2 that the 50th percentiles (MRL₀ values) are much smaller than their respective ARL₀ values when δ = 0. When δ_opt = 1.5, the MRL₀ values for m ∈ {10, +∞} are 166 and 173, which are remarkably less than ARL₀ = 250. This condition worsens when m and δ_opt are small. For example, when m = 10 and δ_opt = 0.5, although the ARL₀ value is still 250, the MRL₀ value drops to 88. For all the cases considered in Tables 1 and 2, despite the same ARL₀ = 250, the number of samples by which a false alarm is signalled 50 per cent of the time differs for each m considered. This shows that interpretations and conclusions based on the ARL alone can be misleading. Therefore, the percentiles of the run length distribution provide a more representative and reliable quantity for evaluating a control chart's performance. However, when m or δ increases, the ARL₁ becomes closer to the MRL₁. This shows that the skewness of the run length distribution decreases as m or δ increases.
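The gap between ARL₀ and MRL₀ in the known-parameter case can be checked by hand. The short calculation below assumes that the run length is geometric with signal probability $1 - P_a = 1/\mathrm{ARL}_0$ at each sampling time, and that the median is the smallest $\ell$ whose cdf reaches 0.5 (an assumed convention).

```latex
\begin{align*}
P(RL \le \ell) &= 1 - P_a^{\ell}, \qquad
P_a = 1 - \frac{1}{\mathrm{ARL}_0} = 1 - \frac{1}{250} = 0.996, \\
\mathrm{MRL}_0 &= \left\lceil \frac{\ln 0.5}{\ln 0.996} \right\rceil
               = \lceil 172.9 \rceil = 173, \\
\frac{\mathrm{MRL}_0}{\mathrm{ARL}_0} &\approx \ln 2 \approx 0.693
\quad \text{for large } \mathrm{ARL}_0 .
\end{align*}
```

This agrees with the value 173 quoted for m = +∞. The smaller MRL₀ values for finite m are presumably due to the extra skewness introduced by parameter estimation, since the geometric shape alone accounts only for the factor of about ln 2.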
An analysis of the early false alarms is viewed by engineers as vital. A high false alarm rate is undesirable in industry, as it leads to time- and cost-wasting corrective actions and unnecessary process adjustments. The lower percentiles of the run length distribution, such as the 5th, 10th and 25th percentiles when the process is in-control (i.e. δ = 0), provide information about the early false alarms. For example, in the case of δ_opt = 0.5 and m = 10, there is a 10 per cent chance that a false out-of-control signal will occur by the 10th sample. We notice that when process parameters are estimated, the run lengths at the 5th, 10th and 25th percentiles when δ = 0 are short, indicating very early false alarms with the specified probabilities of 0.05, 0.1 and 0.25. This situation improves when m or δ_opt increases, as the values associated with the 5th, 10th and 25th percentiles increase when δ = 0. From Tables 1 and 2, we observe that setting the ARL₀ at a desired value does not ensure an acceptable early false alarm behaviour. Therefore, an analysis of the early false alarms should be conducted in the design of a control chart.
Tables 3 and 4 give an overview of the ARL, SDRL, ASS and percentiles of the MRL-based DS X̄ chart when MRL₀ = 250, ASS₀ = 5 and δ_opt ∈ {0.5, 1.5}. The optimisation algorithms of the DS X̄ chart with known and estimated process parameters for minimising the MRL₁, proposed by Teoh et al. [3], are employed here to obtain the optimal (n₁, n₂, L₁, L, L₂) combinations. The ARL, SDRL, ASS and percentiles of the run length distribution shown in Tables 3 and 4 are calculated based on the formulae provided in Section 3.
All the cases considered in Tables 3 and 4 have the same MRL₀ = 250, irrespective of the value of m used. This tells practitioners that, 50 per cent of the time, a false alarm will occur by the 250th sample following process start-up. We wish to highlight that, for the optimal MRL-based chart with estimated process parameters and a fixed MRL₀ = 250, the chart has the same MRL₀ value for each m even though the ARL₀ values differ across m. However, these differences in ARL₀ values will not pose any practical problems to practitioners, because the MRL has a direct interpretation (50 per cent of the run lengths fall at or below it), whereas the ARL does not. For a right-skewed distribution, such as a run length distribution that follows a geometric distribution, the mean is greater than the median, and so the mean is not a suitable representation of the centre of the distribution. Thus the computation of the ARL for the optimal MRL-based chart does not provide enough practical information to practitioners. Note that a similar situation arises for the ARL-based chart; that is, when ARL₀ = 250 for any m, the MRL₀ values are different for each m (see Tables 1 and 2). As discussed above, since the ARL is not an intuitive representation of the run length distribution, the percentiles of the run length distribution need to be computed to supplement the ARL-based chart. An investigation of the percentiles of the run length distribution in Tables 1 to 4 when δ = 0 reveals that the false alarms for the ARL-based chart occur considerably earlier than for the MRL-based chart, for cases of both known and estimated process parameters (see the lower percentiles, e.g., the 5th, 10th and 25th). If the probability of an early false alarm is of main concern, the lower percentiles (say, the 5th percentile) can be used as an additional criterion in selecting a suitable scheme for a control chart.
Concerning the out-of-control average sample size (ASS₁), the value is markedly lower for the MRL-based chart than for the ARL-based chart. For instance, when δ_opt = 1.5, m = 20 and δ = 1, the ASS₁ decreases from 8.22 (ARL-based chart) to 5.69 (MRL-based chart). As expected, when the shift is small, the difference in the percentiles of the run length distribution between the MRL-based and ARL-based charts is noticeable. This difference, however, becomes negligible when the shift is moderate or large. Tables 1 to 4 show that this difference is larger when δ = 0 than when δ > 0.

EXAMPLES OF APPLICATIONS
Computing the percentiles of the run length distribution of the DS X̄ chart provides valuable information to help engineers in the selection of an appropriate control chart scheme. In the Phase-I analysis, suppose that an engineer would like to take 20 samples, each of five observations, in a certain manufacturing process. In the Phase-II process monitoring, four different situations are considered in the following four examples:

EXAMPLE 1
In many industrial processes, early false alarms are a major consideration. A smaller value of the lower percentiles (e.g., the 5th, 10th and 25th) when δ = 0 gives rise to earlier (or more frequent) false alarms, resulting in repeated process stops and start-ups within shorter time intervals. Based on the lower percentiles of the run length distribution displayed in Tables 1 to 4, we notice that the ARL-based chart produces shorter run lengths at the lower percentiles when δ = 0, compared with those of the MRL-based chart. For example, when δ_opt = 0.5, the run length at the 10th percentile (when δ = 0) for the ARL-based chart is 16 (see Table 1) as opposed to 30 (see Table 3) for the MRL-based chart. This run length nearly doubles when the MRL-based chart is used in place of the ARL-based chart. Accordingly, it is recommended to choose the MRL-based chart with estimated process parameters, as the ARL-based chart signals false alarms earlier than the MRL-based chart.

EXAMPLE 2
One of the meaningful pieces of information obtained from the computation of the percentiles of the run length distribution is the MRL. If management wants a good understanding of the control chart and wishes to avoid drawing inaccurate conclusions from the ARL alone, the design scheme based on the MRL helps to alleviate this problem. The MRL-based chart gives a clear picture that 50 per cent of all the run lengths are less than 250 when δ = 0. Meanwhile, for the ARL-based chart, ARL₀ = 250 only provides information about the expected run length; it does not indicate the likelihood, say 50 per cent of the time as in the case of the MRL₀, of getting a false alarm by a certain sample. Thus there is a risk that a practitioner incorrectly interprets ARL₀ = 250 as meaning that a false alarm occurs by the 250th sample with probability one half, whereas in fact a false alarm occurs with that probability significantly earlier, i.e. by the 123rd sample when δ_opt = 0.5 (see Table 1). Similar interpretation problems will be encountered for the out-of-control cases. If an engineer's confidence and understanding are viewed as crucial, the MRL-based chart with estimated process parameters is a more suitable option, as it provides more intuitive and critical information to the engineer.

EXAMPLE 3
In a manufacturing process, assume that an engineer determines that a mean shift in the range 1.0 ≤ δ ≤ 2.0 is not acceptable and must be identified as quickly as possible. The control chart scheme that is more sensitive in detecting an out-of-control condition at this shift will be the best choice. Since Tables 2 and 4 provide the results computed from the optimal chart parameters for δ_opt = 1.5, we select one of these two schemes. For example, when δ = 1, the design scheme of the ARL-based chart (see Table 2) gives the engineer 95 per cent confidence that an out-of-control signal is issued by the 4th sample, i.e. 10 samples earlier than the MRL-based chart (see Table 4). If the detection speed is the main concern, it would be better to opt for the design scheme of the ARL-based DS X̄ chart with estimated process parameters. Therefore the chart parameters (n₁, n₂, L₁, L, L₂) = (4, 6, 1.4232, 4.4648, 2.8008) will be selected for this case. Nevertheless, the engineer needs to realise that the ARL-based chart, although it detects changes quickly, signals false alarms earlier (see Example 1). Moreover, as discussed in Example 2, he or she may tend to relate the ARL value to the MRL, and make an inappropriate decision based on the ARL alone.

EXAMPLE 4
If making a large number of observations in a manufacturing process is not a problem, as in the case of mass production, an engineer may consider making more observations in each sample for the Phase-I and Phase-II processes, in order to increase the sensitivity of a control chart. Table 5 shows the ARL, SDRL, ASS and various percentiles of the run length distribution when n = ASS₀ = 10 is used in both the Phase-I and Phase-II processes. Suppose that an engineer plans to take 20 samples, each of size 10, in the Phase-I process, and intends to detect a mean shift in the range 1.0 ≤ δ ≤ 2.0. When n = 10, the engineer can claim with 90 per cent certainty that the DS X̄ chart with estimated process parameters will detect a shift of size δ = 1 by the 3rd sample (see Table 5), compared with the 10th sample (see Table 4) when n = 5. This implies that increasing n increases the detection speed of a control chart. Thus, when process parameters are estimated and larger samples are affordable, the DS X̄ chart designed with the larger sample size n is preferred.
