A simple and logical alternative for making PERT time estimates.

By Zhang, Yue
Publication: IIE Transactions
Date: Friday, March 1 1996

1. Introduction

The following PERT formulae are among the most basic and widely taught MS/OR techniques:

[[Mu].sub.e] = (a + 4m + b)/6, (1a)

[[Sigma].sub.e] = (b - a)/6, (1b)

where [[Mu].sub.e] and [[Sigma].sub.e] are, respectively, the estimated mean and standard deviation of a task's stochastic time T; and a, m and b are, respectively, the expert's estimates of the 'optimistic', 'most likely' and 'pessimistic' times.
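As a quick illustration of formulae (1a) and (1b), the following minimal Python sketch evaluates them for one task. The function name `classical_pert` and the sample estimates (2, 5, 14) are hypothetical and chosen only for illustration.

```python
def classical_pert(a, m, b):
    """Classical PERT estimates of a task time's mean and standard deviation."""
    mean = (a + 4 * m + b) / 6   # formula (1a)
    std_dev = (b - a) / 6        # formula (1b)
    return mean, std_dev


# Hypothetical optimistic, most likely and pessimistic times (e.g., in days).
mu_e, sigma_e = classical_pert(2, 5, 14)
print(mu_e, sigma_e)
```

Note that m affects only the mean; the standard deviation depends solely on the two outer estimates.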

Present and future MS/OR practitioners learn these formulae from standard textbooks and references, which often also indicate that formulae (1a) and (1b) result from assuming that T is beta distributed, and are therefore valid for a variable (i.e., T) that can have a wide variety of distribution shapes. Moreover, the applications of (1a) and (1b) are not limited to PERT project networks, because their relevance for estimating uncertain quantities in any stochastic model has long been recognized; see, for example, DeCoster (1964) in managerial accounting, Ostwald (1974) in engineering cost estimation, and Van Horne (1980) in capital budgeting and financial management.

In the research literature, many authors (e.g., Healy, 1961; Grubbs, 1962; MacCrimmon and Ryavec, 1964; Donaldson, 1965; Moder and Rodgers, 1968; Swanson and Pazer, 1971; Sasieni, 1986; Littlefield and Randolph, 1987; Gallagher, 1987) have pointed out that the assumption of beta-distributed T does not lead to (1a) and (1b); instead, (1a) and (1b) apply to only a very restricted subset of the beta distribution (Grubbs, 1962; Gallagher, 1987), whereas real-life T distributions are not similarly restricted. Nevertheless, the MS/OR community continues to use (1a) and (1b) as the standard, perhaps because many perceive that: (i) the formulae are not too incorrect logically and numerically; (ii) no simple alternative exists. In this paper, two other logical shortcomings of (1a) and (1b) are pointed out, and numerical errors resulting from using them are shown to be far from negligible. More importantly, we develop alternative formulae that are consistent with the basic objectives of (1a) and (1b), but are also logical, simple and considerably more accurate.

1.1. Overview

The basic objectives behind using something like (1a) and (1b) are:

Objective 1, elicit subjective time estimates from an 'expert';

Objective 2, convert these estimates into T's [Mu] (mean) and [Sigma] (standard deviation), recognizing that T's distribution can have a wide variety of shapes (as does the beta distribution).

Section 2 points out the shortcomings of (1a) and (1b) in handling Objective 1, and a more logical alternative is proposed. Section 3 develops alternative formulae for computing T's [Mu] and [Sigma] (Objective 2); it also demonstrates that these alternatives are much more accurate than (1a) and (1b). Our findings are summarized in Section 4.

2. On making subjective time estimates

2.1. The fractile method

Estimating a, m and b in PERT is the initial step for obtaining the 'subjective probability distribution' of the stochastic task time T. A huge literature exists on the elicitation of subjective probability distributions (see, for example, Hampton et al., 1973; Chesley, 1975; Spetzler and Stael von Holstein, 1975; Wallsten and Budescu, 1983). From this literature, it is apparent that the most straightforward and common method for eliciting T's subjective probability distribution is the 'fractile method' (a 'fractile' is also known as a 'quantile'):

Define [T.sub.[Alpha]] as T's [Alpha] fractile. For example, [T.sub.0.1] is the 0.1 fractile, i.e., Prob(T [less than] [T.sub.0.1]) = 0.1. In the fractile method, a number of required fractile levels [[Alpha].sub.i] are specified, and the expert is asked to estimate these fractiles. For example, if a set of [[Alpha].sub.i] are 0.05, 0.5, 0.95, then the expert is asked to estimate [T.sub.0.05], [T.sub.0.5] and [T.sub.0.95].

With reference to the subjective probability literature, the PERT procedure of estimating a, m and b has the following shortcomings.

2.2. Ambiguity of a and b

To the original PERT developers (Malcolm et al., 1959), a and b are the 'absolute endpoints' [T.sub.0] and [T.sub.1], respectively. As recently re-confirmed by Littlefield and Randolph (1987) and Gallagher (1987), (1a) and (1b) are valid for only a small subset of beta distributions and only if a = [T.sub.0] and b = [T.sub.1]. A good number of standard references (e.g., Cleland and King, 1975; Wiest and Levy, 1977; Hillier and Lieberman, 1986; Markland and Sweigart, 1987; Taha, 1987; Frankel, 1990) do not specify explicitly what fractiles a and b correspond to, but many of them imply in their statements and diagrams that a and b correspond to [T.sub.0] and [T.sub.1]. However, the probability elicitation literature (e.g., Alpert and Raiffa, 1969; Selvidge, 1980) as well as common sense indicate that it is difficult for a person to estimate accurately the absolute endpoints ([T.sub.0] and [T.sub.1]) of a stochastic quantity.

On the other hand, current undergraduate/graduate and professional textbooks in operations management (e.g., Buffa and Miller, 1979; Fogarty and Hoffmann, 1983; Chase and Aquilano, 1989) state that a and b should be T's 0.01 and 0.99 fractiles. Although the probability elicitation literature indicates that it is more appropriate to estimate the 0.01 and 0.99 fractiles than the absolute endpoints, these inner fractiles are inconsistent with the (already limited) justifications of (1a) and (1b) given in Malcolm et al. (1959), Littlefield and Randolph (1987) and Gallagher (1987). One might justify the substitution of these incorrect fractiles into (1a) and (1b) by claiming that [T.sub.0.01] (or [T.sub.0.99]) is very close to [T.sub.0] (or [T.sub.1]), and therefore the discrepancy is negligible. We will illustrate later that this claim can be very far from the truth.

Moder and Rodgers (1968) and Perry and Greig (1975) proposed that a and b should be T's 0.05 and 0.95 fractiles. Notably, they proposed modified formulae to take into account their modified definitions of a and b. Unfortunately, their proposals have been largely ignored by the textbooks. Their studies, however, motivated the current paper.

2.3. Shortcomings of estimating m

Asking a person to estimate m (a modal value) is not the same as asking one to estimate a fractile, because the mode corresponds to very different fractiles in different distributions. Although there appears to be no evidence that an expert can estimate a central fractile (e.g., the median) more accurately than a modal value, a large amount of empirical knowledge has been accumulated in the probability elicitation literature on fractile estimation; in contrast, little is known about modal estimation. Moreover, Trout (1989) and his reviewer raised a very plausible supposition: most managers are not clear about the distinction between a mode and a median. Therefore, when such a manager is asked to make three estimates (i.e., a, b, m) where two (i.e., a and b) are prescribed fractiles but one is not, there is little assurance that the manager will not end up estimating the median (a fractile) instead of the mode for m. Unfortunately, the actual difference between the median and the mode can be very substantial for the type of asymmetrical distributions that PERT and beta distributions are explicitly designed to handle.

2.4. Specification of a 'clean' fractile method

The preceding discussion suggests that one should use a 'clean' fractile method; i.e., only fractiles are to be estimated, and the fractile levels (i.e., the [[Alpha].sub.i]) should be clearly specified. The next question is: how many and which fractiles should be estimated?

The 'standardized' beta random variable 's' has range (0,1). Its density function, having only two parameters (p and q), is:

[f.sub.s[Beta]](s) = [s.sup.p - 1][(1 - s).sup.q - 1]/B(p, q), (0 [less than or equal to] s [less than or equal to] 1, p [greater than] 0, q [greater than] 0) (2a)

where B(p, q) is the beta function evaluated at (p, q).

However, the generalized beta variable 'x' used in classical PERT can have any range (U, V), and the density function is

[f.sub.[Beta]](x) = [(x - U).sup.p - 1][(V - x).sup.q - 1]/{B(p, q)[(V - U).sup.p + q - 1]}, (U [less than or equal to] x [less than or equal to] V, p [greater than] 0, q [greater than] 0). (2b)

The variables x and s in (2a) and (2b) are related by the linear transformation s = (x - U)/(V - U). [f.sub.[Beta]](x) has four parameters (U, V, p, q). Because the elicited fractiles will be used to fit this four-parameter distribution, it appears somewhat illogical to require fewer than four fractile estimates. For a given set of three fractiles ([T.sub.x], [T.sub.y], [T.sub.z]), it is theoretically possible that an infinite number of different beta distributions (i.e., with different U, V, p and q) will be found to have these same fractiles. Indeed, if T's distribution can always be defined by only three fractiles, one might as well use a three-parameter distribution to represent T.
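The relationship between (2a) and (2b) can be checked numerically. The sketch below, written under the assumption of a hypothetical task with range (U, V) = (10, 20) and shape (p, q) = (2, 3), evaluates both densities and confirms that the generalized density equals the standardized density divided by (V - U) after the linear transformation. The function names are illustrative only.

```python
import math


def std_beta_pdf(s, p, q):
    """Density (2a) of the standardized beta variable s on (0, 1)."""
    beta_fn = math.gamma(p) * math.gamma(q) / math.gamma(p + q)  # B(p, q)
    return s ** (p - 1) * (1 - s) ** (q - 1) / beta_fn


def general_beta_pdf(x, U, V, p, q):
    """Density (2b) of the generalized beta variable x on (U, V)."""
    beta_fn = math.gamma(p) * math.gamma(q) / math.gamma(p + q)  # B(p, q)
    return ((x - U) ** (p - 1) * (V - x) ** (q - 1)
            / (beta_fn * (V - U) ** (p + q - 1)))


# Hypothetical task time between 10 and 20 days with shape (p, q) = (2, 3).
U, V, p, q = 10.0, 20.0, 2.0, 3.0
x = 14.0
s = (x - U) / (V - U)  # linear transformation linking x and s

# The change of variable rescales the density by 1/(V - U).
print(general_beta_pdf(x, U, V, p, q), std_beta_pdf(s, p, q) / (V - U))
```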

After empirically comparing several fractile estimation schemes, Selvidge (1980) showed that the following fractile estimation procedure performed best:

1. Assess seven fractiles. That is, the three central fractiles: the 0.25, 0.50 and 0.75 fractiles; and the four extreme fractiles: the 0.01, 0.10, 0.90 and 0.99 fractiles.

2. Assess the central fractiles first.

Although Selvidge (1980) did not assume any specific 'parent' distribution for the subjective probabilities, the preceding scheme appears reasonable for the four-parameter beta parent distribution. The requirement of assessing the central fractiles first is also intuitively attractive because the estimated central fractiles can then act as 'anchors' for estimating the extreme fractiles. Several empirical studies (Alpert and Raiffa, 1969; Murphy and Winkler, 1974; Selvidge, 1980; Lichtenstein et al., 1982) have confirmed that people can estimate central fractiles more accurately than extreme fractiles.

Details on training managers to give good fractile estimates and on the actual wording of fractile-eliciting questions can be found in many papers and textbooks, including Winkler (1967), Hampton et al. (1973), Vatter et al. (1978), Solomon (1982) and Winterfeldt and Edwards (1986). For example, Solomon (1982) presented his 'fractile elicitation instrument' and also described how he trained public accountants to estimate seven fractiles (the 0.01, 0.10, 0.25, 0.50, 0.75, 0.90 and 0.99 fractiles) of an uncertain accounting item (the account balances); Vatter et al. (1978, p. 169) gave a detailed example of the steps that one might follow in determining the five fractiles (the 0.01, 0.25, 0.50, 0.75 and 0.99 fractiles) of one's own subjective probability distribution.

2.5. Recommendation

The preceding discussions suggest that Selvidge's (1980) seven-fractile procedure be used to estimate T in PERT. In an introductory POM/MS course where simplicity is most important, shorter formulae (see the next section) may be presented using only the five central fractiles. In practical applications where the fractile estimates will become inputs for planning large projects, the cost difference between making seven or five (or three) fractile estimates is hardly an issue worth mentioning.

3. Computing T's mean and standard deviation

3.1. Some basic properties of the beta distribution

For [f.sub.[Beta]](x) given in (2b), x's mean and standard deviation are:

[Mu] = U + (V - U)p/(p + q),

[Sigma] = (V - U) [square root of pq/[[(p + q).sup.2](p + q + 1)]]. (3)
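The two expressions in (3) translate directly into code. The sketch below is illustrative only (the function name `beta_mean_sd` is an assumption) and evaluates them for the standardized beta distribution with (p, q) = (61.98, 20.62) that serves as the worked example in Section 3.3.

```python
import math


def beta_mean_sd(U, V, p, q):
    """Mean and standard deviation of a beta variable on (U, V), from (3)."""
    mean = U + (V - U) * p / (p + q)
    sd = (V - U) * math.sqrt(p * q / ((p + q) ** 2 * (p + q + 1)))
    return mean, sd


# Standardized case (U, V) = (0, 1) with the (p, q) of the worked example.
mu, sigma = beta_mean_sd(0.0, 1.0, 61.98, 20.62)
print(mu, sigma)
```

The results agree with the values reported in (9a) to within rounding in the fourth significant figure.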

Detailed properties of [f.sub.[Beta]](x) are given in, e.g., Johnson and Kotz (1970). The parameters (U, V) in [f.sub.[Beta]](x) are the distribution's two endpoints, and the parameters (p, q) control the distribution's 'shape' (skewness and kurtosis). The distribution is symmetrical when p/q = 1; its skewness and kurtosis increase as p/q deviates from 1. If p [less than] 1 and/or q [less than] 1, the distribution is J- or U-shaped. As p and q increase from 1, the distribution evolves from a uniform distribution and tends to a normal distribution as p and q become large. Therefore for practical purposes we will consider only [f.sub.[Beta]](x) with 1 [less than] (p, q) [less than] 100 (say).

Note that the beta distribution's shape versatility is obtained by allowing p and q to vary: (i) independently, and (ii) over a wide range. In contrast, Grubbs (1962), Swanson and Pazer (1971), Littlefield and Randolph (1987) and Gallagher (1987) have shown that (1a) and (1b) not only require q to be completely determined by p, they also restrict the magnitudes of p and q to approximately p + q [less than] 9 (see Figs 2 and 3 in Swanson and Pazer (1971)). Furthermore, even for these restricted beta distributions, (1a) and (1b) are only approximations, not exact relationships (Gallagher, 1987).

3.2. Development of alternative formulae for [Mu] and [Sigma]

Define the following 'symmetrical inter-fractile sums and differences':

S01 = [T.sub.0.99] + [T.sub.0.01]; D01 = [T.sub.0.99] - [T.sub.0.01];

S10 = [T.sub.0.90] + [T.sub.0.10]; D10 = [T.sub.0.90] - [T.sub.0.10];

S25 = [T.sub.0.75] + [T.sub.0.25]; D25 = [T.sub.0.75] - [T.sub.0.25]. (4)
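The definitions in (4) amount to pairing each lower fractile with its symmetrical upper counterpart. A minimal sketch follows; the dictionary layout and the function name `inter_fractile_terms` are assumptions made for illustration, and the fractile values are those of the worked example for (p, q) = (61.98, 20.62) in Section 3.3.

```python
# Seven fractiles keyed by their level (values from the Section 3.3 example).
fractiles = {
    0.01: 0.6322, 0.10: 0.6882, 0.25: 0.7194, 0.50: 0.7524,
    0.75: 0.7835, 0.90: 0.8098, 0.99: 0.8507,
}


def inter_fractile_terms(t):
    """Symmetrical inter-fractile sums (S) and differences (D) defined in (4)."""
    pairs = {"01": (0.01, 0.99), "10": (0.10, 0.90), "25": (0.25, 0.75)}
    terms = {}
    for label, (low, high) in pairs.items():
        terms["S" + label] = t[high] + t[low]
        terms["D" + label] = t[high] - t[low]
    return terms


terms = inter_fractile_terms(fractiles)
for name in ("S01", "D01", "S10", "D10", "S25", "D25"):
    print(name, round(terms[name], 4))
```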

Pearson and Tukey's (1965) results suggest that a distribution's [Mu] and [Sigma] may be approximated by linear functions of the distribution's fractiles. We show in Appendix A that, for the seven (or five) fractiles proposed in the preceding section, the linear functions should have the form:

for seven fractiles: [Mu] = [k.sub.1](S01) + [k.sub.2](S10) + [k.sub.3](S25) + [k.sub.4]([T.sub.0.5]), (5a)

[Sigma] = [k.sub.5](D01) + [k.sub.6](D10) + [k.sub.7](D25); (5b)

for five fractiles: [Mu] = [c.sub.1](S10) + [c.sub.2](S25) + [c.sub.3]([T.sub.0.5]), (6a)

[Sigma] = [c.sub.4](D10) + [c.sub.5](D25); (6b)

where the [k.sub.i] and [c.sub.i] are constants to be determined.

3.3. Estimating the coefficients [k.sub.i] and [c.sub.i] in (5) and (6)

The objective is to determine the values of the [k.sub.i] and [c.sub.i] in (5) and (6) that will estimate [Mu] and [Sigma] accurately for all beta distributions (i.e., for all combinations of parameters p and q). We show in Appendix A that values of [k.sub.i] and [c.sub.i] applicable to one set of parameters (U, V) should also be applicable to all other sets of (U, V); therefore, we only need to consider 'standardized' beta distributions with U = 0 and V = 1. The following linear regression procedure is used to estimate the [k.sub.i] and [c.sub.i].

To construct the data set on which linear regression is performed, (p, q) values are generated randomly in the ranges 1 [less than] p [less than] 100 and 1 [less than] q [less than] 100. For each (p, q), the required fractiles for the standardized beta distribution with these (p, q) parameters are computed using subroutine BETIN in the IMSL (1987) Library. For example, assume that we randomly generated

(p, q) = (61.98, 20.62). (7)

The fractiles of a standardized beta distribution with these (p, q) parameters are then computed by BETIN as:

[T.sub.0.01] = 0.6322, [T.sub.0.1] = 0.6882, [T.sub.0.25] = 0.7194, [T.sub.0.5] = 0.7524, [T.sub.0.75] = 0.7835, [T.sub.0.9] = 0.8098, [T.sub.0.99] = 0.8507. (8)
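For readers without access to the IMSL library, fractiles of this kind can be approximated with standard-library Python. The sketch below is not the BETIN routine; it substitutes a composite Simpson's rule for the beta distribution function and bisection for its inverse, and it assumes p, q > 1 so that the density vanishes at both endpoints. All function names are illustrative.

```python
import math


def beta_cdf(x, p, q, n=2000):
    """Standardized beta CDF at x by composite Simpson's rule (assumes p, q > 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_b = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)  # log B(p, q)

    def pdf(s):
        if s <= 0.0 or s >= 1.0:
            return 0.0  # density is zero at both endpoints when p, q > 1
        return math.exp((p - 1) * math.log(s) + (q - 1) * math.log(1 - s) - log_b)

    h = x / n  # n must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3


def beta_fractile(alpha, p, q, tol=1e-7):
    """alpha fractile of the standardized beta distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, p, q) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


levels = (0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99)
fractiles = {a: beta_fractile(a, 61.98, 20.62) for a in levels}
for a in levels:
    print(a, round(fractiles[a], 4))
```

Bisection is slow but robust; a production implementation would more likely call a tested incomplete-beta routine.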

Substituting the values in (7) and (8) into (3) and (4) gives

dependent variables: [Mu] = 0.7503, [Sigma] = 0.04734; (9a)

independent variables: S01 = 1.4829, D01 = 0.2185, S10 = 1.4980, D10 = 0.1216, S25 = 1.5029, D25 = 0.0641. (9b)

[TABULAR DATA FOR TABLE 1 OMITTED]

Repeating (7), (8) and (9) 2000 times gives 2000 sets of values similar to those illustrated in (9). Straightforward linear regression using these 2000 'observations' and the models stated in (5) and (6) gives the required estimates of the [k.sub.i] and [c.sub.i]. For example, the estimates and standard errors (in parentheses) for (5a) are:

[Mathematical Expression Omitted]

The estimates are summarized as 'data set 1' in Tables 1 and 2. Repeating the regressions with another random set of 2000 'observations' gives the results reported as 'data set 2' in Table 1.
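The regression procedure can be sketched on a reduced scale with standard-library Python. The example below is an illustration under several assumptions, not a reproduction of the paper's computation: it fits only the two-coefficient model (6b), uses 200 rather than 2000 observations, replaces BETIN with a trapezoid-rule distribution function on a grid, and uses an arbitrary random seed. Its coefficients will therefore differ from those in Table 1.

```python
import math
import random


def beta_fractiles(p, q, levels, n=2000):
    """Fractiles of a standardized beta (p, q > 1) from a trapezoid-rule CDF on a grid."""
    log_b = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    dens = [0.0] * (n + 1)  # density is zero at both endpoints when p, q > 1
    for i in range(1, n):
        s = i / n
        dens[i] = math.exp((p - 1) * math.log(s) + (q - 1) * math.log(1 - s) - log_b)
    cdf = [0.0] * (n + 1)
    for i in range(1, n + 1):
        cdf[i] = cdf[i - 1] + 0.5 * (dens[i - 1] + dens[i]) / n
    result = []
    for alpha in levels:
        target = alpha * cdf[n]  # renormalize away the integration error
        i = next(k for k in range(1, n + 1) if cdf[k] >= target)
        frac = (target - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
        result.append((i - 1 + frac) / n)  # linear interpolation inside the cell
    return result


def fit_sigma_five_fractiles(n_obs=200, seed=1):
    """Least-squares fit of sigma = c4*D10 + c5*D25 (model (6b), no intercept)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_obs):
        p, q = rng.uniform(1, 100), rng.uniform(1, 100)
        t10, t25, t75, t90 = beta_fractiles(p, q, (0.10, 0.25, 0.75, 0.90))
        sigma = math.sqrt(p * q / ((p + q) ** 2 * (p + q + 1)))  # (3) with U=0, V=1
        rows.append((t90 - t10, t75 - t25, sigma))
    # Normal equations for two regressors, solved in closed form.
    s11 = sum(d10 * d10 for d10, _, _ in rows)
    s12 = sum(d10 * d25 for d10, d25, _ in rows)
    s22 = sum(d25 * d25 for _, d25, _ in rows)
    b1 = sum(d10 * y for d10, _, y in rows)
    b2 = sum(d25 * y for _, d25, y in rows)
    det = s11 * s22 - s12 * s12
    c4 = (b1 * s22 - b2 * s12) / det
    c5 = (b2 * s11 - b1 * s12) / det
    mean_y = sum(y for _, _, y in rows) / n_obs
    sse = sum((y - c4 * d10 - c5 * d25) ** 2 for d10, d25, y in rows)
    sst = sum((y - mean_y) ** 2 for _, _, y in rows)
    return c4, c5, 1 - sse / sst


c4, c5, r_squared = fit_sigma_five_fractiles()
print(c4, c5, r_squared)
```

Changing the seed or the sample size is a simple way to observe how much the fitted coefficients move while the fit quality stays high.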

The very high [R.sup.2] values in Table 1 confirm the earlier conjecture that a beta distribution's [Mu] and [Sigma] can be accurately estimated by linear functions of the distribution's fractiles. Note also that the regression-generated [Mu] functions always turn out to be a 'weighted average' of the fractiles; i.e., noting that each of S01, S10 and S25 consists of two fractiles, the 'sum of weights' {2([k.sub.1] + [k.sub.2] + [k.sub.3]) + [k.sub.4]} or {2([c.sub.1] + [c.sub.2]) + [c.sub.3]} always comes to 1 for both sets 1 and 2, even though no prior restriction was imposed in the regression procedure on the values of [k.sub.i] and [c.sub.i] in the regressions. This confirms the theoretical prediction made with (A8) and (A9) in Appendix A.

However, one seemingly disturbing factor in Table 1 is that data sets 1 and 2 have quite different [k.sub.i] and [c.sub.i]; e.g., [c.sub.2] is -0.4390 and -0.3429 for data sets 1 and 2, respectively. We used the [Mu] and [Sigma] functions developed with data set 1 to estimate the [Mu] and [Sigma] in data set 2 (and vice versa), and found that the 'switched' functions give the same high [R.sup.2] values with the other data set. We also repeated this experiment with several data sets with sample sizes of n = 4000 and 1000 (instead of n = 2000 in data sets 1 and 2) and found the same substantial variations in the [k.sub.i] and [c.sub.i]. Therefore the differences in the [k.sub.i] and [c.sub.i] between data sets 1 and 2 (and also among the various data sets with different n) are apparently due to the existence of wide bands of near-optimal values for the [k.sub.i] and [c.sub.i]. This property then encourages us to search for formulae with 'cleaner' coefficients (i.e., values of the [k.sub.i] and [c.sub.i]) by testing various round-off modifications of the values shown in Table 1. After many trials, the following formulae emerge:

for seven fractiles:

[[Mu].sub.e] = 0.04 x S01 + 0.11 x S10 + 0.23 x S25 + 0.24 x [T.sub.0.5], (11a)

[[Sigma].sub.e] = 0.2 x D01 - 0.6 x D10 + 1.2 x D25; (11b)

for five fractiles:

[[Mu].sub.e] = 0.4(S10 - S25) + [T.sub.0.5], (12a)

[[Sigma].sub.e] = 0.7 x D10 - 0.59 x D25. (12b)

A less accurate but simpler alternative to (11a) is

for seven fractiles:

[[Mu].sub.e] = 0.05 x S01 + 0.10 x S10 + 0.25 x S25 + 0.2 x [T.sub.0.5], (13a)

which can be written as:

[[Mu].sub.e] = (S01 + 2 x S10 + 5 x S25 + 4 x [T.sub.0.5])/20. (13b)
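Formulae (11)-(13) are simple enough to code in a few lines. The sketch below is one possible transcription; the function names and the dictionary keyed by fractile level are assumptions made for illustration. It applies the formulae to the fractiles of the worked example with (p, q) = (61.98, 20.62), for which the exact values are [Mu] = 0.7503 and [Sigma] = 0.04734.

```python
def mu_seven(t):
    """Seven-fractile mean estimate, formula (11a)."""
    return (0.04 * (t[0.99] + t[0.01]) + 0.11 * (t[0.90] + t[0.10])
            + 0.23 * (t[0.75] + t[0.25]) + 0.24 * t[0.50])


def sigma_seven(t):
    """Seven-fractile standard deviation estimate, formula (11b)."""
    return (0.2 * (t[0.99] - t[0.01]) - 0.6 * (t[0.90] - t[0.10])
            + 1.2 * (t[0.75] - t[0.25]))


def mu_five(t):
    """Five-fractile mean estimate, formula (12a)."""
    return 0.4 * ((t[0.90] + t[0.10]) - (t[0.75] + t[0.25])) + t[0.50]


def sigma_five(t):
    """Five-fractile standard deviation estimate, formula (12b)."""
    return 0.7 * (t[0.90] - t[0.10]) - 0.59 * (t[0.75] - t[0.25])


def mu_seven_simple(t):
    """Simpler seven-fractile mean estimate, formula (13b)."""
    return ((t[0.99] + t[0.01]) + 2 * (t[0.90] + t[0.10])
            + 5 * (t[0.75] + t[0.25]) + 4 * t[0.50]) / 20


# Fractiles of the worked example, keyed by level.
t = {0.01: 0.6322, 0.10: 0.6882, 0.25: 0.7194, 0.50: 0.7524,
     0.75: 0.7835, 0.90: 0.8098, 0.99: 0.8507}

print(mu_seven(t), mu_five(t), mu_seven_simple(t))
print(sigma_seven(t), sigma_five(t))
```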

We explain in Appendix B why the [[Mu].sub.e] and [[Sigma].sub.e] formulae given in (10)-(13) and Table 1 can give such high [R.sup.2] values over a wide band of [k.sub.i] and [c.sub.i] values; this explanation also reveals a 'statistical' shortcoming (in contrast to the shortcomings identified in Section 2 relating to definitions and human-estimation behavior) on the choice of parameters a, m and b in (1a) and (1b).

This section has developed [Mu] and [Sigma] formulae using fractiles at [Alpha] levels of 0.01, 0.10, 0.25, 0.50, 0.75, 0.90 and 0.99. Note that we are not aware of and are not implying that there is any mathematical-statistical evidence indicating that these are the best fractiles for constructing our type of [Mu] and [Sigma] formulae. These particular fractiles are used because the empirical behavioral literature recommends that these fractiles be elicited from human estimators.

3.4. Error analyses for (11)-(13)

To evaluate the accuracy of (11)-(13), we used (7), (8) and (9) to generate three new data sets (nos 3, 4 and 5), each with 2000 sets of fractiles and (actual) [Mu] and [Sigma]. Whereas data sets 1 and 2 were both generated with (p, q) in the range of 1 to 100, the (p, q) ranges in data sets 3, 4 and 5 are (1,100), (1,50), and (1,500), respectively; this is done to ensure that our conclusions are not dependent on the (p, q)-ranges of the data sets. The [[Mu].sub.e] and [[Sigma].sub.e] values computed with (11) and (12) are then compared with the actual [Mu] and [Sigma]. Two types of errors are considered:

Absolute Error (AE) = [absolute value of [[Mu].sub.e] - [Mu]] or [absolute value of [[Sigma].sub.e] - [Sigma]] (14a)

Absolute Percentage Error (APE) = 100AE/[Mu] or 100AE/[Sigma]. (14b)
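The two error measures in (14) can be written as small helper functions. The sketch below is illustrative; the function names are assumptions, and the sample inputs are the two [Sigma] estimates and the exact [Sigma] = 0.04734 of the numerical example in Section 3.5.

```python
def absolute_error(estimate, actual):
    """Absolute Error (AE), definition (14a)."""
    return abs(estimate - actual)


def absolute_percentage_error(estimate, actual):
    """Absolute Percentage Error (APE), definition (14b)."""
    return 100 * absolute_error(estimate, actual) / actual


actual_sigma = 0.04734
estimates = {"(11b)": 0.04766, "(12b)": 0.04730}

for formula, value in estimates.items():
    print(formula,
          absolute_error(value, actual_sigma),
          absolute_percentage_error(value, actual_sigma))

# Summary statistics of the kind reported per data set (here over two values only).
errors = [absolute_error(v, actual_sigma) for v in estimates.values()]
print(sum(errors) / len(errors), max(errors))
```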

Table 2 summarizes the statistics of the AEs and APEs. For example, the first row of entries in Table 2 indicates that the average [Mu] of the 2000 distributions in data set 3 is 0.504. In applying (11a) to compute the [[Mu].sub.e] values, the average of and maximum of the 2000 AEs are 0.000027 and 0.0001, respectively, and the 99th percentile of the 2000 AEs is 0.0001. It is evident from Table 2 that (11), (12) and even (13) are sufficiently accurate for most practical purposes. Two points need clarification:

1. First, Table 2 shows that, in computing [[Sigma].sub.e], (11b) with more explanatory variables has larger average errors than (12b) with fewer explanatory variables. This should not be disturbing when one remembers that these formulae are not the 'best' formulae as shown in Table 1; they are modifications with rounded-off coefficients. In order to obtain the cleaner coefficients in (11b) and (12b), it turns out that more accuracy is sacrificed in (11b). For the same reason, (13) using seven fractiles is less accurate than (12a) using five fractiles.

2. However, even with the 'best' coefficients, the [R.sup.2]-values in Table 1 indicate that there is hardly any difference in accuracy between using seven and five fractiles. Therefore, one may question why one should bother to use seven instead of five fractiles. The reason is that we need to consider two separate factors here for formulae such as (1), (11) and (12); i.e., Factor 1, eliciting the fractile estimations; and Factor 2, computing [Mu] and [Sigma] with the estimated fractiles. When dealing with Factor 2 (i.e., developing and justifying (1), (11), (12), etc.), one eliminates the confounding effect of Factor 1 by assuming that the estimated fractiles are error-free. Thus, the fractiles shown in (8) are the exact fractiles for the purportedly 'real' subjective probability distribution. However, fractiles usually cannot be estimated error-free. Selvidge's (1980) results suggest that when a person is required to estimate the seven 'Selvidge' fractiles instead of other sets of fractiles, the person tends to make more accurate estimations. Thus, the justification for using seven instead of five fractiles is in handling Factor 1, not Factor 2.

3.5. Some numerical examples

Consider the exact fractiles in (8) for a standardized beta distribution with (U, V) = (0, 1) and (p, q) = (61.98, 20.62). First, note that while [T.sub.0] = 0 and [T.sub.1] = 1, the figures in (8) show that [T.sub.0.01] = 0.6322 and [T.sub.0.99] = 0.8507. This illustrates our earlier statement: one cannot assume that [T.sub.0.01] (or [T.sub.0.99]) is usually close to [T.sub.0] (or [T.sub.1]).

Using (1a) and (1b), if one defines a = [T.sub.0.01] and b = [T.sub.0.99], then [[Sigma].sub.1] = (b - a)/6 = (0.8507 - 0.6322)/6 = 0.0364. If one defines a = [T.sub.0] and b = [T.sub.1], then [[Sigma].sub.2] = (1 - 0)/6 = 0.167, which differs from [[Sigma].sub.1] by 358%. Also, both [[Sigma].sub.1] and [[Sigma].sub.2] are poor estimates of the correct [Sigma] (= 0.04734, see (9a)). In contrast, using (11b) and (12b) with the figures in (8) gives

[[Sigma].sub.3] = 0.2 x 0.2185 - 0.6 x 0.1216 + 1.2 x 0.0641 = 0.04766, (15a)

and [[Sigma].sub.4] = 0.7 x 0.1216 - 0.59 x 0.0641 = 0.04730; (15b)

[[Sigma].sub.4] is practically identical to the exact [Sigma], while [[Sigma].sub.3] is within 1% of [Sigma].
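The comparison of the two interpretations of (1b) in this example is plain arithmetic, reproduced in the short sketch below. The variable names are illustrative, and the percentage errors against the exact [Sigma] are computed here from the figures in (8) and (9a) rather than quoted from the paper's tables.

```python
exact_sigma = 0.04734          # exact value from (9a)
t01, t99 = 0.6322, 0.8507      # 0.01 and 0.99 fractiles from (8)

sigma_1 = (t99 - t01) / 6      # (1b) with a, b taken as the 0.01 and 0.99 fractiles
sigma_2 = (1.0 - 0.0) / 6      # (1b) with a, b taken as the absolute endpoints

# Relative gap between the two interpretations of (1b).
gap_pct = 100 * (sigma_2 - sigma_1) / sigma_1

# Absolute percentage error of each interpretation against the exact sigma.
ape_1 = 100 * abs(sigma_1 - exact_sigma) / exact_sigma
ape_2 = 100 * abs(sigma_2 - exact_sigma) / exact_sigma

print(round(sigma_1, 4), round(sigma_2, 3), round(gap_pct))
print(round(ape_1, 1), round(ape_2, 1))
```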

3.6. Error analysis of (1b)

Although it may be intuitively obvious from the preceding discussions that (1a) and (1b) are considerably less accurate than (11)-(13), we present results of a systematic error analysis of (1b) in Table 3, which is a counterpart of Table 2 for (1b). Two versions of (1b) are considered: version A uses [T.sub.0] and [T.sub.1] for 'a' and 'b', whereas version B uses [T.sub.0.01] and [T.sub.0.99]. As expected, the errors of using either version are substantial. It is interesting to note, however, that version A is more inaccurate than version B, even though version A uses the 'correct' interpretation of 'a' and 'b'. This is because the 'correct' interpretation is only relevant in conjunction with the very restrictive subset of beta distribution required by (1b), but in Table 3 we have used (1b) to handle beta distributions in data sets 1 to 3, which contain (essentially) all bell-shaped beta distributions.

[TABULAR DATA FOR TABLE 2 OMITTED]

One may then argue that the above analysis is therefore 'unfair' to (1b), because (1b) has been used to handle distributions it is not supposed to handle. As clarified by Gallagher (1987), (1b) is applicable to two types of beta distribution: (1) those with [Sigma] = ([T.sub.1] - [T.sub.0])/6, with which (1b) is error-free by declaration; and (2) those with p + q = 6, with which (1b) is an approximation. To evaluate the accuracy of (1b) with distributions having p + q = 6, we consider standardized beta distributions with p from 1.01 to 4.99 in steps of 0.01 and q = 6 - p (note that both p and q must exceed 1 to give a bell-shaped curve); the resultant 399 distributions constitute data set 6. The performance of versions A and B of (1b) on data set 6 is summarized in the last two rows of Table 3. As expected, version A now performs better, but both versions are still much less accurate than (11)-(13), even for this very restricted subset of beta distributions where (1b) is supposed to be applicable.
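The construction of data set 6 is easy to reproduce for version A of (1b), which needs no fractiles because a = [T.sub.0] = 0 and b = [T.sub.1] = 1 for standardized distributions. The sketch below is an illustration under that restriction; it does not cover version B, and the error figures it prints come from this sketch, not from Table 3.

```python
import math


def exact_sigma(p, q):
    """Exact standard deviation of a standardized beta distribution, from (3)."""
    return math.sqrt(p * q / ((p + q) ** 2 * (p + q + 1)))


# Data set 6: p from 1.01 to 4.99 in steps of 0.01, with q = 6 - p.
p_values = [round(1.01 + 0.01 * i, 2) for i in range(399)]

version_a = (1.0 - 0.0) / 6    # (1b) with the absolute endpoints as a and b
apes = []
for p in p_values:
    sigma = exact_sigma(p, 6 - p)
    apes.append(100 * abs(version_a - sigma) / sigma)   # APE as in (14b)

print(len(p_values), p_values[0], p_values[-1])
print(sum(apes) / len(apes), max(apes))
```

With p + q = 6 the exact [Sigma] equals 1/6 only when pq = 7, which is why version A still carries a non-zero error over most of this subset.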

Interestingly, Table 3 also shows that, if one insists on using (1b) to estimate [Sigma], one might as well also use the 'wrong' definitions a = [T.sub.0.01] and b = [T.sub.0.99], since the accuracy of version A (but not version B) deteriorates considerably when it is applied to beta distributions outside the 'restricted' subset (e.g., to data sets 3 to 5).

[TABULAR DATA FOR TABLE 3 OMITTED]

4. Conclusion

In spite of the well-established logical flaws of the PERT formulae, most standard references still preach these formulae without any mention of their shortcomings. The implied justifications are: (i) they are good approximations; and (ii) no simple alternative exists. We have: (i) shown in Table 3 that formulae (1) are not good approximations; and (ii) developed reasonably simple formulae ((11)-(13)) that are logical and considerably more accurate than (1a) and (1b). Although (11)-(13) are not as simple as (1a) and (1b), they should be easy enough even for undergraduates. Furthermore, their associated fractile-estimation step is consistent with the probability-elicitation literature, whereas the estimation of a, m and b in (1a) and (1b) is not. The only disadvantage of our (11)-(13) is that they require the subjective estimation of five (or seven) values (fractiles), whereas (1a) and (1b) require only three estimated values. However, there is ample evidence that the estimation of five (or seven) fractiles can be routinely implemented, and the cost differential between making three versus five (or seven) subjective estimates should be trivial compared with the typical cost magnitudes associated with projects that use PERT.

Our proposed procedure was developed with the explicit intention of keeping intact as many of the original PERT assumptions as possible. One such assumption is that T is beta distributed. Although this assumption is reasonable, it is nevertheless arbitrary, because there are other density functions that can assume a wide variety of shapes (see, for example, Kendall and Stuart, 1969). It is unclear how well (1), (11), (12) and (13) will perform if T is assumed to follow other density functions.

References

Alpert, M. and Raiffa, H. (1969) A progress report on the training of probability assessors. Unpublished manuscript, Harvard University.

Buffa, E. and Miller, J. (1979) Production-Inventory Systems, Irwin, Illinois.

Chase, R. and Aquilano, N. (1989) Production and Operations Management, Irwin, Illinois.

Chesley, G.R. (1975) Elicitation of subjective probabilities: a review. The Accounting Review, 60 (2), 325-337.

Cleland, D. and King, W. (1975) Systems Analysis and Project Management, McGraw-Hill, New York.

DeCoster, D. (1974) The budget director and PERT, in Accounting for Managerial Decision Making, DeCoster, D. et al. (eds), Melville Publishing Company, California.

Donaldson, W.A. (1965) Estimation of the mean and variance of a PERT activity time. Operations Research, 13, 382-385.

Fogarty, D. and Hoffmann, T. (1983) Production and Inventory Management, South-western Publishing Company, Ohio.

Frankel, E. (1990) Project Management in Engineering Services and Development, Butterworth, London.

Gallagher, C. (1987) A note on PERT assumptions. Management Science, 33, 1360.

Grubbs, F.E. (1962) Attempts to validate some PERT statistics or 'picking on PERT'. Operations Research, 10, 912-915.

Hampton, J.M., Moore, P.G. and Thomas, H. (1973) Subjective probability and its measurements. Journal of the Royal Statistical Society, A136 (1), 21-42.

Healy, T.H. (1961) Activity subdivision and PERT probability statements. Operations Research, 9, 341-348.

Hillier, F. and Lieberman, G. (1986) Introduction to Operations Research, Holden-Day, California.

IMSL (1987) IMSL Library User's Manual, IMSL, Houston.

Johnson, N.L. and Kotz, S. (1970) Continuous Univariate Distributions, vols. 1 and 2, Houghton Mifflin, New York.

Kendall, M.G. and Stuart, A. (1969) The Advanced Theory of Statistics, Hafner Publishing Company, New York.

Lichtenstein, S., Fischhoff, B. and Phillips, L. (1982) Calibration of probabilities: the state of the art to 1980, in Judgement under Uncertainty: Heuristics and Biases, Kahneman, D., Slovic, P. and Tversky, A. (eds), Cambridge University Press, New York.

Littlefield, T. and Randolph, P. (1987) An answer to Sasieni's question on PERT times. Management Science, 33, 1357-1359.

MacCrimmon, K.R. and Ryavec, C.A. (1964) Analytical study of PERT assumptions. Operations Research, 12, 16-36.

Malcolm, D.G., Roseboom, J.H., Clark, C.E. and Fazar, W. (1959) Application of a technique for research and development program evaluation. Operations Research, 7, 646-669.

Markland, R.E. and Sweigart, J.R. (1987) Quantitative Methods: Applications to Managerial Decision Making, John Wiley, New York.

Moder, J.J. and Rodgers, E.G. (1968) Judgement estimates of the moments of PERT type distributions. Management Science, 15, B76-B83.

Murphy, A.H. and Winkler, R.L. (1974) Subjective probability forecasting experiments in meteorology: some preliminary results. Bulletin of the American Meteorological Society, 55, 1206-1216.

Ostwald, P. (1974) Cost Estimating for Engineering and Management, Prentice-Hall, New Jersey.

Pearson, E. and Tukey, J. (1965) Approximate means and standard deviations based on distances between percentage points of frequency curves. Biometrika, 52, 533-546.

Perry, C. and Greig, I.D. (1975) Estimating the mean and variance of subjective distributions in PERT and decision analysis. Management Science, 21, 1477-1480.

Sasieni, M.W. (1986) A note on PERT times. Management Science, 32, 1652-1653.

Selvidge, J.E. (1980) Assessing the extremes of probability distributions by the fractile method. Decision Sciences, 11, 493-502.

Solomon, I. (1982) Probability assessment by individual auditors and audit teams: an empirical investigation. Journal of Accounting Research, 20, 689-710.

Spetzler, C.S. and Stael von Holstein, C.-A.S. (1975) Probability encoding in decision analysis. Management Science, 22, 340-358.

Swanson, L.A. and Pazer, H.L. (1971) Implications of the underlying assumptions of PERT. Decision Sciences, 2, 461-480.

Taha, H. (1987) Operations Research: An Introduction, MacMillan, New York.

Trout, M. (1989) On the generality of the PERT average time formula. Decision Sciences, 20, 410-412.

Van Horne, J. (1980) Financial Management and Policy, Prentice-Hall, New Jersey.

Vatter, P., Bradley, S., Frey, S. and Jackson, B. (1978) Quantitative Methods in Management, Irwin, Illinois.

Wallsten, T. and Budescu, D. (1983) Encoding subjective probabilities: a psychological and psychometric review. Management Science, 29, 151-173.

Wiest, J. and Levy, F. (1977) A Management Guide to PERT/CPM, Prentice-Hall, New Jersey.

Winkler, R. (1967) The assessment of prior distributions in Bayesian analysis. Journal of the American Statistical Association, 62, 776-800.

Winterfeldt, D. and Edwards, W. (1986) Decision Analysis and Behavioral Research, Cambridge University Press, Cambridge.

Appendix A. Feasible linear functions of fractiles for approximating [Mu] and [Sigma]

All earlier empirical works on estimating subjective probability distributions recommend that, except for the median, fractiles should be estimated in symmetrical pairs. Therefore only the median and symmetrical fractiles will be considered here.

Let T be any beta variable, and t be the corresponding standardized variable; hence

T = U + (V - U)t = U + Wt, (A1)

where W = V - U and (U, V) are T's endpoints. For simplicity, assume that only the symmetrical fractile-pair [t.sub.0.1] and [t.sub.0.9] will be used to construct a linear function for estimating [Sigma](t). The general form is then (a, b and c are constants):

[Sigma](t) = a + b[t.sub.0.1] + c[t.sub.0.9]. (A2)

If (A2) is also valid for T, then [Sigma](T) can be computed as

[Sigma](T) = a + b[T.sub.0.1] + c[T.sub.0.9]. (A3)

Combining (A1) and (A3) gives

[Sigma](T) = a + (b + c)U + (b[t.sub.0.1] + c[t.sub.0.9])W. (A4)

However, [Sigma](T) = W[Sigma](t) must also hold. Substituting (A2) into this gives:

[Sigma](T) = aW + (b[t.sub.0.1] + c[t.sub.0.9])W. (A5)

Since (A4) and (A5) must always be equivalent, one must have a = 0 and b + c = 0. That is, the intercept a in (A2) must be 0, and we can combine the symmetrical fractiles [t.sub.0.1] and [t.sub.0.9] into a single term D10. Using straightforward extension, it can be shown that when more fractile pairs are to be used in (A2), the [Sigma]-functions should contain only the inter-fractile differences of each symmetrical fractile pair (see (4), (5b) and (6b)).

Apply now the preceding [Sigma]-function logic to the [Mu] functions. Assume that a desired [Mu] function is of the form

[Mu](t) = a + b[t.sub.0.1] + c[t.sub.0.9] + d[t.sub.0.5] (A6)

and

[Mu](T) = a + b[T.sub.0.1] + c[T.sub.0.9] + d[T.sub.0.5]. (A7)

Combining (A1) and (A6) gives

[Mu](T) = U + W[Mu](t) = U + aW + (b[t.sub.0.1] + c[t.sub.0.9] + d[t.sub.0.5])W. (A8)

However, combining (A1) and (A7) gives

[Mu](T) = a + (b + c + d)U + (b[t.sub.0.1] + c[t.sub.0.9] + d[t.sub.0.5])W. (A9)

Since (A8) and (A9) must always be equivalent, one must have a = 0 and (b + c + d) = 1. The latter equality means that the 'weights' attached to the various fractiles in a [Mu] function must add up to 1.

Consider now a symmetrical variable t with [Mu] = [t.sub.0.5] = 0. Knowing now that a = 0, [Mu](t) in (A6) can produce the correct answer of [Mu] = 0 only if b = c. That is, [Mu] functions should contain only the inter-fractile sums of each symmetrical fractile pair.
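The constraints derived in this appendix can be checked with simple arithmetic. The following is only an illustrative sketch: the standardized fractiles, the endpoints (U, V) and all coefficient values are made-up numbers, and the helper names (rescale, sigma_fn, mu_fn, sigma_gap, mu_gap) are hypothetical. It rescales the fractiles with (A1) and measures how far the T-scale formulas (A3) and (A7) depart from what the scaling relations [Sigma](T) = W[Sigma](t) and [Mu](T) = U + W[Mu](t) require.

```python
# Illustrative check of the Appendix A constraints (all numbers are made up).

def rescale(t, U, W):
    """Map a standardized fractile t to the original scale via (A1)."""
    return U + W * t

def sigma_fn(lo, hi, a, b, c):
    """Linear sigma-function of the form (A2)/(A3)."""
    return a + b * lo + c * hi

def mu_fn(lo, hi, med, a, b, c, d):
    """Linear mu-function of the form (A6)/(A7)."""
    return a + b * lo + c * hi + d * med

# Hypothetical standardized fractiles t_0.1, t_0.5, t_0.9 and endpoints (U, V).
t10, t50, t90 = 0.22, 0.48, 0.81
U, V = 5.0, 25.0
W = V - U
T10, T50, T90 = (rescale(t, U, W) for t in (t10, t50, t90))

def sigma_gap(coef):
    """(A3) applied to T minus W times (A2) applied to t; zero if feasible."""
    return sigma_fn(T10, T90, **coef) - W * sigma_fn(t10, t90, **coef)

def mu_gap(coef):
    """(A7) applied to T minus U + W times (A6) applied to t; zero if feasible."""
    return mu_fn(T10, T90, T50, **coef) - (U + W * mu_fn(t10, t90, t50, **coef))

good_sigma = dict(a=0.0, b=-0.4, c=0.4)       # a = 0 and b + c = 0
bad_sigma = dict(a=0.0, b=-0.3, c=0.4)        # b + c = 0.1, so a term 0.1*U leaks in
good_mu = dict(a=0.0, b=0.2, c=0.2, d=0.6)    # a = 0 and b + c + d = 1
bad_mu = dict(a=0.0, b=0.2, c=0.2, d=0.5)     # weights sum to 0.9 only

print(sigma_gap(good_sigma), sigma_gap(bad_sigma))
print(mu_gap(good_mu), mu_gap(bad_mu))
```

With the feasible coefficients both gaps vanish up to floating-point rounding; with the infeasible ones the gap equals (b + c)U for the [Sigma]-function and (b + c + d - 1)U for the [Mu]-function, which is exactly the term that the comparison of (A4) with (A5), and of (A8) with (A9), forces to zero.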

Appendix B. Explanation of the power of (5) and (6) to predict [Mu] and [Sigma]

S([Alpha]) as a perfect [Mu]-estimator for symmetrical distributions

Define a 'symmetrical inter-fractile sum' for a random variable T as

S([Alpha]) = [T.sub.[Alpha]] + [T.sub.1 - [Alpha]], where [T.sub.[Alpha]] is T's [Alpha] fractile. (B1)

For any symmetrical distribution (e.g., normal, uniform), it is obvious that

[Mu](T's mean) = S([Alpha])/2, or R([Alpha]) [equivalent to] S([Alpha])/[Mu] = 2 (B2)

for any [Alpha] value. Therefore, S([Alpha])/2 at any [Alpha] value is a perfect estimator of [Mu]; also, any linear combination of S([[Alpha].sub.i]) is a perfect estimator of [Mu]; i.e.,

[Mu] = [c.sub.1]S([[Alpha].sub.1]) + [c.sub.2]S([[Alpha].sub.2]) + ... + [c.sub.n]S([[Alpha].sub.n]), (B3)

where each [c.sub.i] can have any value provided that 2[[[Sigma].sub.n]([c.sub.i])] = 1.
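As a quick numerical illustration of (B2) and (B3), the sketch below uses a normal variable with an arbitrarily chosen mean of 10 and s.d. of 3; these values, the weights and the helper name S are assumptions made only for this example. It relies on Python's standard-library NormalDist for the fractiles.

```python
from statistics import NormalDist

def S(dist, alpha):
    """Symmetrical inter-fractile sum (B1): T_alpha + T_(1 - alpha)."""
    return dist.inv_cdf(alpha) + dist.inv_cdf(1.0 - alpha)

# Hypothetical symmetrical variable: normal with mean 10 and s.d. 3.
T = NormalDist(mu=10.0, sigma=3.0)

alphas = (0.01, 0.10, 0.25)

# (B2): S(alpha)/2 recovers the mean at every alpha.
halves = [S(T, a) / 2.0 for a in alphas]
print(halves)

# (B3): any weights c_i with 2 * sum(c_i) = 1 also recover the mean.
weights = (0.05, 0.15, 0.30)          # 2 * (0.05 + 0.15 + 0.30) = 1
mu_hat = sum(c * S(T, a) for c, a in zip(weights, alphas))
print(mu_hat)
```

Any other weight vector satisfying 2[[[Sigma].sub.n]([c.sub.i])] = 1 gives the same result for a symmetrical distribution, which is why the individual weights are not pinned down by (B3).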

We investigate below the behavior of R([Alpha]) = S([Alpha])/[Mu] for selected asymmetrical distributions; the purpose is to show that (B2) (and hence (B3)) remains approximately valid for some asymmetrical distributions (such as the beta), but not for some others (such as the lognormal).

Log-normal distribution's R([Alpha])

If x is log-normally distributed with scale parameter g and shape parameter h, then its mean, standard deviation (s.d.) and density function (df) are (Johnson and Kotz, 1970):

[[Mu].sub.x] = exp(g + [h.sup.2]/2), [[Sigma].sub.x] = [square root of exp(2g + [h.sup.2]) [multiplied by] (exp([h.sup.2]) - 1)], (B4)

f(x) = exp{-[[ln(x) - g].sup.2]/(2[h.sup.2])}/(xh[square root of 2[Pi]]). (B5)

Also, z = ln(x) is normally distributed with [[Mu].sub.z] = g and [[Sigma].sub.z] = h. Therefore it can be easily shown that x's [Alpha]-fractile can be computed as

[x.sub.[Alpha]] = exp[h [multiplied by] [[Phi].sup.-1] ([Alpha]) + g], (B6)

where [Phi]([center dot]) is the standard normal cumulative distribution function (cdf), and [[Phi].sup.-1](k) is the k-fractile of a standard normal variate. The fractiles of the corresponding standardized log-normal variate y with the same shape as x but with [[Mu].sub.y] = [[Sigma].sub.y] = 1 can then be obtained as

[y.sub.[Alpha]] = ([x.sub.[Alpha]] - [[Mu].sub.x])/[[Sigma].sub.x] + 1. (B7)

Assume now that a lognormal x has shape parameter h = 1.86 (the scale parameter's magnitude is irrelevant and can be arbitrarily set at g = 0); (B4), (B6) and (B7) give the following fractiles for y:

[y.sub.0.01] = 0.82025, [y.sub.0.1] = 0.82277, [y.sub.0.25] = 0.82650, [y.sub.0.75] = 0.93185, [y.sub.0.9] = 0.97269, [y.sub.0.99] = 3.23884. (B8)

Substituting these values into (B1) and (B2) gives

R(0.01) = 4.061, R(0.10) = 1.795, R(0.25) = 1.758. (B9)

In contrast to (B2), R([Alpha]) does not stay constant at 2.
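The computation behind (B8) and (B9) can be sketched in a few lines. This is a minimal illustration rather than the original program: it takes [[Sigma].sub.x] as the square root of the log-normal variance, uses the standard library's NormalDist for [[Phi].sup.-1], and the function names are hypothetical. Only the [Alpha] = 0.01 pair is evaluated here.

```python
from math import exp, sqrt
from statistics import NormalDist

def lognormal_moments(g, h):
    """Mean and s.d. of a log-normal with scale g and shape h, as in (B4)."""
    mu_x = exp(g + h * h / 2.0)
    sigma_x = mu_x * sqrt(exp(h * h) - 1.0)   # square root of the variance
    return mu_x, sigma_x

def standardized_fractile(alpha, g, h):
    """alpha-fractile of the standardized variate y (mean 1, s.d. 1)."""
    mu_x, sigma_x = lognormal_moments(g, h)
    x_alpha = exp(h * NormalDist().inv_cdf(alpha) + g)   # (B6)
    return (x_alpha - mu_x) / sigma_x + 1.0              # (B7)

def R(alpha, g=0.0, h=1.86):
    """R(alpha) = S(alpha)/mu_y; since mu_y = 1 this is just S(alpha)."""
    return (standardized_fractile(alpha, g, h)
            + standardized_fractile(1.0 - alpha, g, h))

y01 = standardized_fractile(0.01, 0.0, 1.86)
y99 = standardized_fractile(0.99, 0.0, 1.86)
print(round(y01, 5), round(y99, 5), round(R(0.01), 3))
```

Because y is standardized, changing g leaves R([Alpha]) unchanged, which is why the scale parameter can be set to 0 without loss of generality.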

To see how the R([Alpha]) values behave generally across different log-normal distributions, we randomly generated 2000 h values in the range 0.1 [less than] h [less than] 2. For each h value, the R([Alpha]) values were computed as in (B8) and (B9). The means and s.d.s of the 2000 R([Alpha]) values are given in Table B1 for various [Alpha] values. Table B1 shows that the mean R([Alpha]) values vary considerably with [Alpha], and the s.d. column shows that, even at a given [Alpha] level, different h values give very different R([Alpha]) values. However, the s.d. of R([Alpha]) is quite small at [Alpha] = 0.2, where the mean of R([Alpha]) is 1.800; therefore, S(0.2)/1.800 may be a crude [Mu]-estimator for log-normal distributions having 0.1 [less than] h [less than] 2.
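An experiment of this kind can be sketched as follows. The text does not state how the h values were drawn, so uniform sampling on (0.1, 2) is an assumption here, as are the seed and the function names; the output therefore depends on the random draws and should not be expected to match Table B1 digit for digit.

```python
import random
from math import exp, sqrt
from statistics import NormalDist, mean, stdev

def R_lognormal(alpha, h):
    """R(alpha) for a standardized log-normal with shape h (scale g = 0)."""
    z = NormalDist().inv_cdf(alpha)
    mu_x = exp(h * h / 2.0)
    sigma_x = mu_x * sqrt(exp(h * h) - 1.0)
    # x_alpha + x_(1 - alpha), using the symmetry of the normal fractiles
    s_x = exp(h * z) + exp(-h * z)
    # Standardize to mean 1 and s.d. 1, so that R = S(alpha)/1
    return (s_x - 2.0 * mu_x) / sigma_x + 2.0

def simulate(alphas, n=2000, seed=1):
    """Mean and s.d. of R(alpha) over n shape values drawn from (0.1, 2)."""
    rng = random.Random(seed)
    hs = [rng.uniform(0.1, 2.0) for _ in range(n)]   # assumed: uniform draws
    out = {}
    for a in alphas:
        rs = [R_lognormal(a, h) for h in hs]
        out[a] = (mean(rs), stdev(rs))
    return out

results = simulate((0.01, 0.02, 0.05, 0.10, 0.20))
for a, (m, s) in results.items():
    print(f"alpha={a:.2f}  mean={m:.3f}  s.d.={s:.3f}")
```

Under these assumptions the spread of R([Alpha]) should be far larger at [Alpha] = 0.01 than at [Alpha] = 0.20, which is the qualitative pattern the table is meant to convey.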

Beta distribution's R([Alpha])

Again, it can be easily shown that the scale parameters (U, V) are irrelevant, and we will consider a standardized beta-distributed x with (p, q) = (61.98, 20.62) as defined in (7) to (9), from which we have

R(0.01) = 1.976, R(0.10) = 1.997, R(0.25) = 2.003. (B10)

The R([Alpha]) values remain fairly close to 2. Similarly to the construction of Table B1, we randomly generated 2000 sets of (p, q) in the range of 1 [less than] p [less than] 20 and 1 [less than] q [less than] 20, and the means and s.d.s of the 2000 R([Alpha]) values are shown at the left of Table B2. Table B2's right-side is the counterpart of its left side for the parameter range 1 [less than] (p, q) [less than] 100.
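The R([Alpha]) values in (B10) can be approximated without special libraries by simulation. The sketch below is a Monte Carlo approximation, not an exact fractile computation: it assumes that (p, q) are the usual beta shape parameters with mean p/(p + q) (the definitions in (7) to (9) are not reproduced in this appendix), and the sample size, seed and function names are arbitrary choices for illustration.

```python
import random

def empirical_fractile(sorted_xs, alpha):
    """Crude empirical alpha-fractile: the order statistic at index alpha*n."""
    idx = min(len(sorted_xs) - 1, int(alpha * len(sorted_xs)))
    return sorted_xs[idx]

def beta_R(p, q, alphas, n=200_000, seed=1):
    """Approximate R(alpha) = S(alpha)/mu for a standardized beta(p, q)."""
    rng = random.Random(seed)
    xs = sorted(rng.betavariate(p, q) for _ in range(n))
    mu = p / (p + q)   # assumed parameterization: mean = p/(p + q)
    return {
        a: (empirical_fractile(xs, a) + empirical_fractile(xs, 1.0 - a)) / mu
        for a in alphas
    }

r = beta_R(61.98, 20.62, (0.01, 0.10, 0.25))
for a, value in r.items():
    print(f"R({a:.2f}) is approximately {value:.3f}")
```

With a sample of this size the sampling error in each R([Alpha]) is of the order of 0.001, so the estimates are only good to about two or three decimal places; an exact calculation would invert the beta cdf instead.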

Compared with Table B1, Table B2 shows that not only are R([Alpha])'s means close to 2, but their s.d.s remain quite small for all [Alpha]; i.e., the individual R([Alpha]) values remain close to 2 for the entire range of [Alpha] and (p, q) considered here. As [Alpha] increases, R([Alpha])'s s.d. decreases to a minimum at around [Alpha] = 0.2 and then increases. For our purpose, Table B2 shows that S(0.01), S(0.10) and S(0.25) can each individually be a very good [Mu]-estimator. Using the data set considered in Table B2, regressions of [[Mu].sub.x] against S(0.01), S(0.10) and S(0.25) individually give [R.sup.2] values of 0.9977, 0.9999 and 1.0000, respectively. Note that [R.sup.2] increases as [Alpha] increases, which corresponds to the fact that R([Alpha])'s s.d. decreases as [Alpha] increases in Table B2.

Table B1. Mean and s.d. of R([Alpha]) for the log-normal distribution

[Alpha]    Mean of R([Alpha])    S.d. of R([Alpha])

0.01       4.465                 0.784
0.02       3.564                 0.499
0.05       2.667                 0.245
0.10       2.167                 0.109
0.20       1.800                 0.070

[TABULAR DATA FOR TABLE B2 OMITTED]

The above results indicate that (B2) and (B3) are closely satisfied by beta distributions, and this explains the phenomena depicted in Table 1 and (10)-(13):

1. Because the [R.sup.2] achievable with any single S([Alpha]) is already very high, it is not surprising that a [Mu]-predicting equation with more than one S([Alpha]) will have [R.sup.2] = 1.00. One should be reminded that [R.sup.2] = 1.00 does not mean that the prediction is error-free; it simply means that the errors are small compared with the variation of the 2000 [[Mu].sub.x] values. The actual magnitudes of [Mu]-estimation errors are analyzed in Table 2.

2. Because [c.sub.i] can have any value in (B3) provided that 2[[[Sigma].sub.n]([c.sub.i])] = 1, we can see why the [c.sub.i] values can have a wide band of near-optimal values.

3. Because R([Alpha])'s s.d. decreases as [Alpha] changes from 0.01 to 0.10 to 0.25, the coefficients in (10), (11a) and (13a) increase from S(0.01) to S(0.25).

Contrasting the fractile approach with the 'Classical' equation (1a)

The preceding subsections show that it is very fortunate that the beta distribution was chosen for PERT, because not every distribution has such good relations between its mean and fractiles. Unfortunately, this advantage is completely wasted when one uses (1a) to estimate [[Mu].sub.e] with endpoints a and b, which are much poorer predictors of [Mu] than the fractiles. In fact, among standardized beta variates with a = 0 and b = 1, (a + b) is always 1 regardless of the mean; i.e., for the standardized variates, a and b contain practically no predictive information on [Mu].

We have considered only the relation between the fractiles and [Mu]. The relation between the fractiles and [Sigma] can be similarly analyzed, but is beyond the objectives of this paper.

Biographies

Amy Hing-Ling Lau is Kerr-McGee Professor of Accounting at Oklahoma State University. She graduated from St Nicholas High School in Singapore and received her M.P.A. and Ph.D. from Texas Christian University and Washington University, respectively. Among her publications are numerous articles in such journals as The Accounting Review, Contemporary Accounting Research, Decision Sciences, Journal of Accounting Research, and Management Science.

Hon-Shiang Lau is Regents Professor of Management and Carson Professor of Business Administration at Oklahoma State University. He graduated from Catholic High School in Singapore, and received his Ph.D. in business administration from the University of North Carolina at Chapel Hill. His research interests range from human resource accounting to production line design to commodity futures. The results reported in more than 100 refereed articles have appeared in, for example, The Accounting Review, Decision Sciences, EJOR, Journal of Business & Economic Statistics, and Management Science.

Yue Zhang is a Ph.D. candidate in management science and operations management at Oklahoma State University. He majored in mechanical engineering at East China Institute of Technology (Nanjing, China) and received his M.S. in industrial engineering and management from Zhejiang University (China).
