Results from the comparison of D, G, and A efficiencies and the scaled average prediction variance IV criterion are presented for the central composite, small composite, Notz, Hoke, Box-Draper, and computer-generated designs. These design optimality criteria are evaluated over the cuboidal design
KEY WORDS: Design optimality criteria; Response surface methodology.
Research projects in industry, manufacturing, and the engineering and physical sciences require experimentation to uncover relationships between design variables and the responses of interest. Problems in response surface methodology (RSM) involve a response of interest [eta], which is a function of a vector x = ([x.sub.1],...,[x.sub.k]) of k independent variables; that is, [eta] = v(x) and the form of v is unknown. It is assumed that v can be approximated by a low-order polynomial f(x) such as the second-order response surface model
$$y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \beta_{ij} x_i x_j + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \epsilon, \tag{1}$$
where the [beta]'s are parameter coefficients, y is the measured response, and [epsilon] is an error term that accounts for random error and the deviation between the true model and the polynomial model in (1). In this article, we assume that the experimenter believes that the model in (1) can adequately approximate the true model. For a thorough discussion of RSM, see Box and Draper (1987), Khuri and Cornell (1996), and Myers and Montgomery (1995).
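As an illustration of the model in (1), the following minimal sketch expands a design into the columns of the second-order model: intercept, linear, cross-product, and pure quadratic terms. The function names `second_order_terms` and `model_matrix` are hypothetical and chosen only for this example; the study itself used Matlab.

```python
import itertools
import numpy as np

def second_order_terms(x):
    """Expand one point x = (x_1, ..., x_k) into the terms of model (1):
    intercept, linear, cross-product, and pure quadratic, in that order."""
    x = np.asarray(x, dtype=float)
    k = x.size
    cross = [x[i] * x[j] for i, j in itertools.combinations(range(k), 2)]
    return np.concatenate(([1.0], x, cross, x ** 2))

def model_matrix(design):
    """Stack the expanded rows for every run of an N x k design."""
    return np.array([second_order_terms(row) for row in design])

# Example: the 2^3 factorial portion of a three-factor design.
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
X = model_matrix(factorial)
print(X.shape)  # (8, 10): p = 1 + 3 + 3 + 3 = 10 parameters when k = 3
```

For k design variables the full model has p = (k + 1)(k + 2)/2 columns, which is why the number of candidate reduced models grows so quickly with k.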
After considering practical constraints (e.g., time and money), design optimality criteria are often used to evaluate a proposed experimental design. Four commonly used design optimality criteria, often referred to as alphabetic optimality criteria, are the D, G, A, and IV criteria. Based on these optimality criteria, D, G, and A efficiencies and the scaled average prediction variance (IV) criterion can be calculated and can then be used to compare designs. This article is based on the following practical considerations:
1. Design selection via optimality criteria is highly dependent on the approximating response surface model (such as a second-order polynomial) that is proposed prior to data collection. Thus, different models lead to different design optimality criteria values.
2. After data are collected and the model is fitted, many parameters are often deemed insignificant. A reduced model retaining only the significant terms is then adopted for use.
Design optimality criteria based on the adopted reduced model are equally if not more important than the optimality criteria for the proposed model. Therefore, a design should be robust over classes of reduced models; that is, the design's optimality measures should remain favorable over a wide assortment of potential models. Most statistical commentary, however, focuses solely on comparisons associated with the initially proposed model.
Other authors have studied the design-selection problem when the proposed approximating model is an underparameterized approximation of the true response surface. The common case is the use of a low-order polynomial when a higher-order polynomial is a better approximating function (e.g., Box and Draper 1959, 1963; Karson, Manson, and Hader 1969). With regard to this design problem, many authors have studied the mean squared error MSE = V + B, where B is the systematic (squared) bias resulting from underfitting and V is the prediction variance. For a general discussion of designs that (a) minimize V while ignoring B, (b) minimize B while ignoring V, or (c) minimize MSE, see Box and Draper (1987), Myers and Montgomery (1995), and Khuri and Cornell (1996). In particular, the IMSE (integrated mean squared error) is used as a design comparison criterion.
Of related interest are supersaturated designs (Booth and Cox 1962; Lin 1993, 1995) in which the number of factors studied is at least N, the design size. A supersaturated design assumes a sparsity of effects; that is, the number of relevant factors is small relative to the large number of potentially important factors considered in the design. Although a higher-order model may be a better approximating model, a first-order model is assumed to be adequate for screening purposes, that is, for separating a small subset of factors that are actually important from the majority of factors of lesser importance.
This research, however, addresses a different problem. We provide an in-depth study of robustness against many classes of model misspecification for which the model in (1) is an overparameterized approximation of the true response surface. Based on D-, G-, A-, and IV-optimality criteria, the robustness properties of central composite, small composite, Notz, Hoke, and computer-generated algorithmic designs over a collection of reduced models are evaluated. Because the majority of response surface designs run in practice contain at most five design variables, and given the very large number of reduced models for six or more design variables, only three, four, and five design variables in hypercube design regions are considered in this research. We provide recommendations to assist the practitioner in choosing a design among competing designs despite the likelihood of an initial model misspecification.
1. DESIGN OPTIMALITY CRITERIA
The goals of the four optimality criteria are
D-criterion goal: minimize $|(X'X)^{-1}|$ or, equivalently, maximize $|X'X|$
A-criterion goal: minimize $\operatorname{trace}[(X'X)^{-1}]$
G-criterion goal: minimize $\max_{x \in \mathcal{X}} N f'(x)(X'X)^{-1} f(x)$
IV-criterion goal: minimize the average of $N f'(x)(X'X)^{-1} f(x)$ over $x \in \mathcal{X}$,
where X is the design matrix, x is any point in the design region $\mathcal{X}$, N is the design size, and $f(x) = [f_1(x), \ldots, f_p(x)]$ is a vector of p real-valued functions based on the p model terms. When considering a design for implementation, several of its properties can be determined by computing measures of design efficiency. To calculate design efficiencies, the optimal values must first be found. Ideally, a design's alphabetic criterion value is close to the optimal approximate design value. See Atkinson and Donev (1992) regarding approximate designs.
In this article, D, G, and A efficiencies are evaluated. For consistency with the literature involving the IV criterion, the scaled average prediction variance was used. Thus, larger is better for D, G, and A efficiencies, while smaller is better for the IV criterion.
The appeal of D optimality is based on the fact that [absolute value of X'X] is inversely proportional to the square of the volume of the confidence hyperellipsoid. Thus, a larger [absolute value of X'X] implies better estimation of the model parameters. Statistical packages implementing algorithms that generate designs with high D efficiencies are available for the practitioner. For a discussion of D optimality and design comparisons, see St. John and Draper (1975) and Lucas (1974, 1976).
G and IV optimality are based on the scaled prediction variance V(x) = Nf'(x)[(X'X).sup.-1]f(x). The G criterion is a minimax criterion; that is, the design should minimize the maximum prediction variance in the design region. The IV criterion is an averaging criterion; that is, the design should minimize the average prediction variance in the design region.
When considering G optimality, it is known that the G efficiency on the hypercube is p/G, where p is the number of model parameters and G is the maximum of V(x) over all points x in the design region (Kiefer and Wolfowitz 1960). Exact G efficiencies are known for central composite designs on the hypercube (Borkowski 1995). However, G efficiencies are typically approximated. For example, the SAS OPTEX procedure (SAS Institute 1995) and ECHIP (Wheeler 1993) minimize p/G over finite sets of points in the experimental design region.
The IV criterion, like G efficiency, is evaluated over a continuous design region and thus involves integration over the design space. Software packages approximate the IV criterion (or a scaled version of it) by averaging the scaled prediction variance over a subset of points in the design space. The approximation, however, can be very poor (Borkowski 2000). In this article, exact IV-criterion values are generated by evaluating the appropriate integrals.
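The difference between averaging over a finite point set and integrating over the region can be sketched as follows. The article does not spell out how its integrals were evaluated; this example assumes a hypercube region [-1, 1]^k and uses the identity that the average of f'(x)(X'X)^{-1}f(x) equals trace[(X'X)^{-1}M], where M is the matrix of region moments of the model terms. All function names are hypothetical, and the 3^2 factorial design is chosen only for illustration.

```python
import itertools
import numpy as np

def exponent_rows(k):
    """Exponent vectors for intercept, linear, cross-product, and quadratic terms."""
    eye = np.eye(k, dtype=int)
    rows = [np.zeros(k, dtype=int)]
    rows += [eye[i] for i in range(k)]
    rows += [eye[i] + eye[j] for i, j in itertools.combinations(range(k), 2)]
    rows += [2 * eye[i] for i in range(k)]
    return np.array(rows)

def expand(points, E):
    """Evaluate every model monomial at every point (one row per point)."""
    return np.prod(points[:, None, :] ** E[None, :, :], axis=2)

def cube_moments(E):
    """M[a, b] = E[f_a(x) f_b(x)] for x uniform on [-1, 1]^k.
    A one-dimensional moment E[x^s] is 1/(s + 1) for even s and 0 for odd s."""
    S = E[:, None, :] + E[None, :, :]
    return np.where(S % 2 == 0, 1.0 / (S + 1), 0.0).prod(axis=2)

def iv_exact(design, E):
    """Scaled average prediction variance, integrated exactly over the cube."""
    N = design.shape[0]
    X = expand(design, E)
    return N * np.trace(np.linalg.inv(X.T @ X) @ cube_moments(E))

def iv_grid(design, E, levels):
    """Same quantity, approximated by averaging over a factorial grid of points."""
    N, k = design.shape
    X = expand(design, E)
    inv = np.linalg.inv(X.T @ X)
    grid = np.array(list(itertools.product(levels, repeat=k)), dtype=float)
    F = expand(grid, E)
    return N * np.mean(np.einsum("ij,jk,ik->i", F, inv, F))

design = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=2)))  # 3^2 factorial
E = exponent_rows(2)
print("exact IV:", iv_exact(design, E))
print("5-level grid average:", iv_grid(design, E, [-1, -0.5, 0, 0.5, 1]))
```

The grid average weights boundary points more heavily than the uniform integral does, which is one reason a point-set approximation can differ noticeably from the exact value.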
2. COMPOSITE, COMPUTER-GENERATED, AND SMALL RESPONSE SURFACE DESIGNS
The central composite design (CCD), first introduced by Box and Wilson (1951), is one of the most popular response surface designs. The typical k-factor CCD consists of (a) f factorial points ([+ or -]1, [+ or -]1,..., [+ or -]1) from a full or fractional factorial design of at least resolution V, (b) [n.sub.0] center points (0, 0,...,0), and (c) 2k star points ([+ or -][alpha], 0,..., 0),...,(0,...,0, [+ or -][alpha]). Thus, N = f + 2k + [n.sub.0] for the CCD. See Lucas (1974, 1976), Giovannitti-Jensen and Myers (1989), and Myers, Vining, Giovannitti-Jensen and Myers (1992) for a discussion of the optimality properties of CCD's and a comparison of CCD's to other designs. See Borkowski and Valeroso (1996) for results on the evaluation of the D-, G-, A-, and IV-optimality criteria for CCD's.
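The three components of a CCD can be assembled as in the sketch below. It is a simplified illustration: it assumes a full 2^k factorial (no fractional factorial option) and defaults to alpha = 1 so that the star points lie on the faces of the hypercube. The function name `central_composite` and its arguments are hypothetical.

```python
import itertools
import numpy as np

def central_composite(k, alpha=1.0, n_center=0):
    """Return an N x k CCD with N = 2^k + 2k + n_center runs.
    Assumption: the factorial portion is the full 2^k design."""
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    star = np.vstack([alpha * np.eye(k), -alpha * np.eye(k)])  # 2k axial points
    center = np.zeros((n_center, k))
    return np.vstack([factorial, star, center])

design = central_composite(3, n_center=4)
print(design.shape)  # (18, 3): f = 8 factorial, 2k = 6 star, n0 = 4 center points
```

With k = 4 and four center points the same function returns 16 + 8 + 4 = 28 runs, matching the sizes of the three- and four-factor CCD's examined later in the article.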
Two other smaller composite designs are also considered: the small composite designs (SCD's) of Hartley (1959) for k = 3, 4 and the Plackett-Burman composite designs (PBCD's) of Draper (1985) for k = 4, 5. Both designs, like the CCD's, consist of factorial points, axial points, and center points. However, unlike the CCD, the SCD employs a fractional factorial design less than resolution V, while the factorial points of the PBCD are the points from a Plackett-Burman design. See Draper and Lin (1990) and Myers and Montgomery (1995) for a more detailed discussion of SCD's.
Other smaller designs are also studied. The Hoke (1974) and Notz (1982) designs consist of a subset of the [3.sup.k] factorial design. For each k, there are seven Hoke designs designated D1 to D7. The D1 to D3 designs have [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] points and the D4-D7 designs have [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] points.
For k = 3, 4, the Notz and Hoke D1 designs are identical, and for k = 5, a CCD with no center points and the Hoke D4 design are identical. For each k, Hoke designs from D1 to D3 and from D4 to D7 were selected based on performance across the four optimality criteria while also remaining distinct from the CCD and the Notz designs mentioned. The Box-Draper (1974) designs were also included.
Computer packages are available that can generate designs for prespecified design sizes. This availability and flexibility contributed significantly to the popularity of computer-generated designs, especially for use in industrial applications requiring designed experiments. A design optimality criterion is used in algorithmic design generation, with the D criterion being the most common criterion. Moreover, a user must specify the model and, typically, must also define a candidate set of permissible design points. In this article, designs were generated using ECHIP and PROC OPTEX of SAS.
The ECHIP and SAS OPTEX computer-generated designs are given in Table 1. The CCD's, SCD's, PBCD's, Notz, Hoke, and Box-Draper designs are readily available in the literature. All of these designs can also be found at the research Web site www.math.montana.edu/~umsfjbor.
3. OPTIMALITY CRITERIA FOR REDUCED MODELS
The robustness of a design against model misspecification is quantified by calculating the D, G, and A efficiencies and the IV criterion. These values are computed for the proposed model in (1) and for "reasonable" reduced models that are formed by removing terms subject to the following hierarchy rules:
1. If a model contains an [x.sup.2.sub.i] term, then it must contain the corresponding [x.sub.i] term.
2. If a model contains an [x.sub.i][x.sub.j] term, then it must contain the corresponding [x.sub.i] and/or [x.sub.j] term.
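The two hierarchy rules can be expressed as a simple check on a candidate reduced model. The sketch below is a literal reading of the rules as stated (a cross-product term needs at least one of its parent linear terms); the function name and the representation of a model as sets of variable indices are assumptions made for this example. It does not attempt to reproduce the model counts in Table 2, which may also depend on how models that are equivalent under relabeling of the variables are treated.

```python
def is_hierarchical(linear, quadratic, cross):
    """Check the two hierarchy rules for a reduced model.

    linear    -- set of variable indices i whose x_i term is in the model
    quadratic -- set of variable indices i whose x_i^2 term is in the model
    cross     -- set of (i, j) pairs whose x_i x_j term is in the model
    """
    linear = set(linear)
    # Rule 1: every x_i^2 term requires the corresponding x_i term.
    if not set(quadratic).issubset(linear):
        return False
    # Rule 2: every x_i x_j term requires x_i and/or x_j.
    return all(i in linear or j in linear for i, j in cross)

print(is_hierarchical({1, 2}, {1}, {(1, 3)}))  # True
print(is_hierarchical({1}, {2}, set()))        # False: x_2^2 without x_2
print(is_hierarchical({1}, set(), {(2, 3)}))   # False: x_2 x_3 without x_2 or x_3
```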
Table 2 contains the 43 models considered when k = 3. The 1s and 0s in the L, Q, and C columns indicate, respectively, the presence or absence of that term in the reduced model. Columns indicating the number of model parameters (p), the number of design variables present in the model (dv), and the number of linear (l), quadratic (q), and cross-product (c) terms are also included in the table. The importance of the dv, q, c, and l values will be seen in Section 5 when the optimality criteria are compared. Because 224 and 839 models were considered in the k = 4 and k = 5 factor cases, respectively, the corresponding tables are not included but can be found at the research Web site.
For each of the designs considered, robustness was quantified by calculating the following D, G, A, and IV design optimality measures over reduced models of the second-order model in (1):
$$\begin{aligned}
\text{D efficiency} &= 100\,\frac{|X'X|^{1/p}}{N}, \\
\text{G efficiency} &= 100\,\frac{p}{N\,\sigma^2_{\max}}, \\
\text{A efficiency} &= 100\,\frac{p}{\operatorname{trace}[N(X'X)^{-1}]}, \\
\text{IV criterion} &= N\,\sigma^2_{\text{ave}},
\end{aligned} \tag{2}$$
where N is the design size, p is the number of model parameters, $\sigma^2_{\text{ave}}$ is the average of $f'(x)(X'X)^{-1}f(x)$ over the design space, and $\sigma^2_{\max}$ is the maximum of $f'(x)(X'X)^{-1}f(x)$, approximated over the set of points from a $5^k$ factorial design (with factor levels 0, ±.5, ±1). These D- and A-efficiency measures represent the percent of the number of runs required by a hypothetical orthogonal design to achieve the same $|X'X|$ and $\operatorname{trace}[N(X'X)^{-1}]$. G efficiency and the IV criterion are based on the scaled prediction variance function.
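The four measures in (2) can be computed directly from a design and a list of model terms. The following is a minimal sketch under stated assumptions, not the Matlab code used in the study: the region is taken to be the hypercube [-1, 1]^k, the maximum variance is approximated on the 5^k grid described above, and the average variance is obtained from exact region moments. The names `exponent_rows`, `expand`, and `criteria` are hypothetical, and variables are indexed from 0.

```python
import itertools
import numpy as np

def exponent_rows(k, linear, quadratic, cross):
    """Exponent vectors of the terms in a (possibly reduced) model."""
    eye = np.eye(k, dtype=int)
    rows = [np.zeros(k, dtype=int)]                  # intercept
    rows += [eye[i] for i in linear]                 # x_i
    rows += [eye[i] + eye[j] for i, j in cross]      # x_i x_j
    rows += [2 * eye[i] for i in quadratic]          # x_i^2
    return np.array(rows)

def expand(points, E):
    """Evaluate every model monomial at every point (one row per point)."""
    return np.prod(points[:, None, :] ** E[None, :, :], axis=2)

def criteria(design, E):
    """Return (D, G, A, IV) as defined in (2)."""
    N, k = design.shape
    p = E.shape[0]
    X = expand(design, E)
    XtX = X.T @ X
    inv = np.linalg.inv(XtX)
    D = 100 * np.linalg.det(XtX) ** (1 / p) / N
    A = 100 * p / np.trace(N * inv)
    # Maximum of f'(x)(X'X)^{-1}f(x), approximated on the 5^k grid.
    grid = np.array(list(itertools.product([-1, -0.5, 0, 0.5, 1], repeat=k)), dtype=float)
    F = expand(grid, E)
    var_max = np.einsum("ij,jk,ik->i", F, inv, F).max()
    G = 100 * p / (N * var_max)
    # Average of f'(x)(X'X)^{-1}f(x) from exact moments of the uniform cube.
    S = E[:, None, :] + E[None, :, :]
    M = np.where(S % 2 == 0, 1.0 / (S + 1), 0.0).prod(axis=2)
    IV = N * np.trace(inv @ M)
    return D, G, A, IV

# Example: a 2^3 factorial with a first-order model (an orthogonal design).
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
first_order = exponent_rows(3, linear=[0, 1, 2], quadratic=[], cross=[])
print(criteria(factorial, first_order))
```

For this orthogonal example X'X = N I, so D, G, and A all equal 100 (up to rounding) and IV = 1 + k/3 = 2, which gives a quick sanity check before applying the function to second-order designs and reduced models.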
These design optimality measures are used to compare designs across the set of reduced models. For the designs considered in this article, values of the optimality criteria in (2) were generated using Matlab software (Math Works 1999). For economy, the D, A, and G efficiencies and the IV criterion in (2) are denoted simply as D, A, G, and IV. For the CCD's, exact evaluation of D, G, A, and IV is also possible by using Equations (3), (4), (5), and (6) given in Section 4.
4. D, G, A, AND IV FOR THE CCD
Let X be the expanded design matrix of a CCD associated with the response surface model given in (1) on k design variables [x.sub.1], [x.sub.2], ..., [x.sub.k]. For each reduced model, let l = number of linear terms, c = number of cross-product terms, and q = number of pure quadratic terms. After performing the tedious matrix algebra, we have the following results:
$$D = 100\,\frac{|X'X|^{1/p}}{N} = 100\,\frac{\left[\omega^{l} f^{c} \delta^{q-1} d\right]^{1/p}}{N} \tag{3}$$
and
$$A = \frac{100\,p}{\operatorname{trace}[N(X'X)^{-1}]} = \frac{100\,p}{N\left[\dfrac{\delta + kf}{d} + \dfrac{k}{\omega} + \dfrac{k(k-1)}{2f} + \dfrac{k}{\delta}\left(1 - \dfrac{Nf - \omega^2}{d}\right)\right]}, \tag{4}$$
where $\omega = f + 2\alpha^2$, $\delta = 2\alpha^4$, and $d = N\delta + kNf - k\omega^2$. G and IV depend on the scaled prediction variance $V(x) = N f'(x)(X'X)^{-1}f(x)$, where $f'(x) = (1, x_1, \ldots, x_k, x_1x_2, \ldots, x_{k-1}x_k, x_1^2, \ldots, x_k^2)$. It can be shown that the G efficiency of a CCD can be expressed as
$$G = \frac{100\,p}{\max_{x \in \mathcal{X}} V(x)}, \tag{5}$$
where
$$V(x) = N\left\{B_0 + 2B_1 \sum_{i=1}^{q} x_i^2 + \frac{1}{\omega} \sum_{i=1}^{l} x_i^2 + \frac{1}{f} \sum_{i \neq j}^{c} x_i^2 x_j^2 + \frac{1}{\delta}\left[\sum_{i=1}^{q} x_i^4 - B_2\left(\sum_{i=1}^{q} x_i^2\right)^2\right]\right\},$$
$B_0 = (\delta + fq)/d$, $B_1 = -\omega/d$, and $B_2 = (Nf - \omega^2)/d$. Similarly, we can show that for a CCD on the hypercube, IV is given by
$$IV = N\left\{B_0 + \frac{l}{3\omega} + \frac{c}{9f} + \frac{2qB_1}{3} + \frac{q}{45\delta}\left[9 - B_2(4 + 5q)\right]\right\}. \tag{6}$$
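The closed-form expressions for the CCD can be checked numerically against a direct matrix computation. The sketch below is such a check under explicit assumptions that are ours rather than the article's: the factorial portion is a full 2^k design; d is written with q in place of k (d = N delta + q N f - q omega^2), which coincides with the printed d for the full model where q = k; the trace in A is written with l, c, and q in place of k, k(k-1)/2, and k; and the last term of IV is taken as q/(45 delta)[9 - B_2(4 + 5q)], which is what integrating V(x) over the hypercube gives. All function names are hypothetical.

```python
import itertools
import numpy as np

def ccd(k, alpha=1.0, n_center=0):
    """Full 2^k factorial, 2k star points at distance alpha, n_center center runs."""
    fact = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    star = np.vstack([alpha * np.eye(k), -alpha * np.eye(k)])
    return np.vstack([fact, star, np.zeros((n_center, k))])

def exponent_rows(k, linear, cross, quadratic):
    """Exponent vectors of the model terms (variables indexed from 0)."""
    eye = np.eye(k, dtype=int)
    rows = [np.zeros(k, dtype=int)]
    rows += [eye[i] for i in linear]
    rows += [eye[i] + eye[j] for i, j in cross]
    rows += [2 * eye[i] for i in quadratic]
    return np.array(rows)

def direct(design, E):
    """D, A, and IV computed from X'X and exact moments of the cube [-1, 1]^k."""
    N, p = design.shape[0], E.shape[0]
    X = np.prod(design[:, None, :] ** E[None, :, :], axis=2)
    XtX = X.T @ X
    inv = np.linalg.inv(XtX)
    S = E[:, None, :] + E[None, :, :]
    M = np.where(S % 2 == 0, 1.0 / (S + 1), 0.0).prod(axis=2)
    D = 100 * np.linalg.det(XtX) ** (1 / p) / N
    A = 100 * p / (N * np.trace(inv))
    IV = N * np.trace(inv @ M)
    return D, A, IV

def closed_form(N, f, alpha, l, c, q):
    """D, A, and IV from the closed forms, with the assumptions stated above."""
    p = 1 + l + c + q
    omega = f + 2 * alpha ** 2
    delta = 2 * alpha ** 4
    d = N * delta + q * N * f - q * omega ** 2       # assumption: q replaces k
    B0, B1, B2 = (delta + f * q) / d, -omega / d, (N * f - omega ** 2) / d
    D = 100 * (omega ** l * f ** c * delta ** (q - 1) * d) ** (1 / p) / N
    A = 100 * p / (N * (B0 + l / omega + c / f + (q / delta) * (1 - B2)))
    IV = N * (B0 + l / (3 * omega) + c / (9 * f) + 2 * q * B1 / 3
              + q / (45 * delta) * (9 - B2 * (4 + 5 * q)))
    return D, A, IV

k, alpha, n0 = 3, 1.0, 4
design = ccd(k, alpha, n0)
N, f = design.shape[0], 2 ** k
full = exponent_rows(k, [0, 1, 2], [(0, 1), (0, 2), (1, 2)], [0, 1, 2])
reduced = exponent_rows(k, [0, 1, 2], [(0, 1)], [0, 1])
print(direct(design, full), closed_form(N, f, alpha, 3, 3, 3))
print(direct(design, reduced), closed_form(N, f, alpha, 3, 1, 2))
```

Agreement between the two columns of output for both the full and the reduced model is a useful guard against transcription errors in the formulas.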
5. DESIGN COMPARISONS OF D, A, G, AND IV
In this section, we compare D, G, A, and IV for sets of reduced models for various response surface designs on the hypercube. Recall that large D-, A-, and G-efficiency and small IV-criterion measures are desirable.
5.1 The Central Composite Designs
We first examine the three-, four-, and five-factor CCD's. Figure 1, (a), (b), and (c), shows plots of D, G, and IV for the three-factor 18-point (four-center-point) CCD against the number of model parameters. Figure 1, (d), (e), and (f), shows the analogous plots for the four-factor 28-point (four-center-point) CCD. The plotting symbol is q, the number of [x.sup.2.sub.i] terms in the reduced model. The A-efficiency plots are not included because they are very similar to the D-efficiency plots. The following patterns exist for the three-factor 18-point and the four-factor 28-point CCD's, as well as for three-, four-, or five-factor CCD's with varying numbers of center points. (Recall that l, c, and q represent the number of linear, cross-product, and quadratic terms, respectively.)
1. For D and A: Distinct groups are formed by the values of q, with D and A increasing dramatically as q decreases. Although D and A tend to decrease as l and c decrease, they are less variable for all models having the same q (the same number of [x.sup.2.sub.i] terms) when q > 0. That is, D and A are more robust to changes in l and c than to changes in q. In the unlikely case in which all [x.sup.2.sub.i] terms are removed (q = 0), D and A increase as l and c decrease. D and A for the full model are, in general, conservative (i.e., tend to be smaller) relative to the set of reduced models. Therefore, the full-model D and A efficiencies are deflated measures when compared to the potential reduced-model efficiencies for k = 3, 4, and 5.
2. For G: Removing an [x.sup.2.sub.i] term from a model has varying effects on G. The decrease in G when [x.sup.2.sub.i] is removed is greater if cross-product terms involving [x.sub.i] are present. If these cross-products are not present, then G changes little. Moreover, G for the full model is quite liberal (i.e., tends to be larger) relative to the set of reduced models. Therefore, the full-model G efficiency is an inflated efficiency measure when compared to the potential reduced-model efficiencies for k = 3, 4, and 5.
3. For IV: Distinct groups are formed by the values of q. Figure 1, (c) and (f) shows diagonal IV bands based on the values of q. There is a large decrease in IV when an [x.sup.2.sub.i] term is removed, while there is only a small decrease in IV when an [x.sub.i] or [x.sub.i][x.sub.j] term is removed. Thus, IV is more robust for all models having the same q than for models having the same l or c. The robustness, however, becomes poorer as q decreases.
One question of importance is "What are the effects on D, A, G, and IV for the CCD when center, star, and factorial points are replicated?" To address this question, seven CCD's were compared for k = 3, 4, 5. If [n.sub.0], [n.sub.1], and [n.sub.2], respectively, are the number of center-point, star-point, and factorial-point replicates, then the design size N = [n.sub.0] + 2[n.sub.1]k + [n.sub.2]f. The seven CCD's compared have ([n.sub.0], [n.sub.1], [n.sub.2]) = (0, 1, 1), (2, 1, 1), (4, 1, 1), (0, 2, 1), (2, 2, 1), (0, 1, 2), and (2, 1, 2).
Three comparisons are performed for each k: (1) across the full set of models, (2) across the set of models having at least one squared term (q > 0), and (3) across the set of models having at least two squared terms (q > 1). For each comparison, the percentage of models for which one design is superior to another is determined for each of the four optimality criteria. These percentages are then ranked. The ranks are summarized in Table 3. For k = 3 and 4, each row/column table entry contains three ranks ([r.sub.0], [r.sub.1], [r.sub.2]). Each rank ranges from 1 ("best") to 7 ("worst"). Rank [r.sub.i] (i = 0, 1, 2) represents that design's rank relative to the other six designs. For k = 5, each row/column table entry contains a fourth comparison: (4) across the set of models having at least three squared terms (q > 2). In several cases, there are tied ranks. The boldface ranks represent the best overall set of ranks. Table 3 indicates the following general results:
1. Replicating center points (increasing [n.sub.0]) tends to reduce D, A, and G whether or not star points are replicated ([n.sub.1] = 1, 2). The only exception is for k = 3 when increasing [n.sub.0] from 0 to 2 will improve A, but then it decreases when [n.sub.0] = 4. When factorial points are replicated ([n.sub.2] = 2), increasing [n.sub.0] will slightly improve A. The effect of increasing [n.sub.0] on IV is inconsistent and depends on k, [n.sub.1], and [n.sub.2].
2. Replicating star points (increasing [n.sub.1]) tends to reduce D and G, while it tends to improve A and IV. On the other hand, replicating factorial points (increasing [n.sub.2]) tends to worsen A, G, and IV, while it tends to improve D.
3. D-efficient designs have no center points but replicated factorial points; that is, ([n.sub.0], [n.sub.1], [n.sub.2]) = (0, 1, 2). A-efficient designs and designs with smaller average prediction variances (low IV) have no center points but replicated star points; that is, ([n.sub.0], [n.sub.1], [n.sub.2]) = (0, 2, 1). G-efficient designs have no center points and unreplicated factorial and star points, that is, ([n.sub.0], [n.sub.1], [n.sub.2]) = (0, 1, 1). Unfortunately, these G-efficient designs will not allow a check for model lack of fit.
This table suggests that replication affects the different criteria in very different ways. That is, what improves one criterion may be detrimental to a different criterion. Therefore, the decision rests with the experimenter to weight these criteria based on professional preference.
5.2 CCD's Versus Computer-Generated Designs
In this section we present two comparisons, between two 18-point and between two 28-point designs. CCD's are compared to the ECHIP-generated designs having four replicated points shown in Table 1.
Before we can compare these designs, first note that the ECHIP designs are asymmetric with respect to an efficiency criterion. That is, the value of the criterion is not necessarily unique over the set of permutations of the design variables for any particular reduced model. Equivalently, relabeling the design variables may yield multiple optimality criterion values for certain reduced models. For example, when k = 3, there are six permutations of [x.sub.1], [x.sub.2], and [x.sub.3] (that is, six ways to assign factors to the columns of the design matrix). Note, however, that if the [x.sub.2] and [x.sub.3] columns in the ECHIP design are switched, the design remains the same. This is not the case if the [x.sub.1] and [x.sub.2] (or [x.sub.1] and [x.sub.3]) columns are switched. Thus, the efficiencies across the 43 reduced models in Table 2 will not all be the same if the [x.sub.1] and [x.sub.2] (or [x.sub.1] and [x.sub.3]) columns of the ECHIP design are switched. Therefore, to make a fair comparison of the CCD to the ECHIP design, D, A, G, and IV will be calculated for all relevant permutations of the design variables. The k = 3 results are displayed in Figure 2, (a), (b), and (c). The k = 4 results are displayed in Figure 2, (d), (e), and (f). Each plot contains a reference line indicating where the values of D, A, G, or IV are equal for the two designs. The comparison of the designs shows the following:
1. For D and A: Groups cluster by q. D and A for the ECHIP design are larger than the corresponding D and A for the CCD for almost all reduced models. This is not surprising because ECHIP's design points were selected to maximize D. Differences in D and A (or deviations from the line) increase as q decreases.
2. For G: The CCD is superior to the ECHIP design in 85% and 61% of the models for k = 3 and 4, respectively. In those cases, when the ECHIP design is superior to the CCD, the majority of the reduced models contain no [x.sup.2.sub.i] terms (q = 0).
3. For IV: Distinct groups form by q, and the CCD is always superior to the ECHIP design whenever q > 0. The CCD is inferior only in the unlikely case that no squared terms are in the model (q = 0). As q decreases from k to 1, differences in IV decrease.
In summary, the ECHIP designs are clearly superior to the CCD's based on D and A. However, the CCD's are, in general, superior to the ECHIP designs based on G and IV except in the unlikely cases when no squared terms are in the reduced model. We observed similar results when the CCD with a different number of center points is compared to an ECHIP design with a comparable number of replications. It should also be noted that forcing replication at the center of the CCD is detrimental because it tends to lower D, A, and G and raise IV. These optimality criteria would improve if replication occurred at the factorial or axial points instead.
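The dependence of an asymmetric design's efficiency on the labeling of the design variables, noted at the start of this section, can be illustrated with a small sketch. The seven-point, two-factor design below is hypothetical (it is not one of the ECHIP designs in Table 1) and was chosen only because its two columns have different level frequencies; the function names are likewise assumptions.

```python
import itertools
import numpy as np

def expand(points, E):
    """Evaluate every model monomial at every point (one row per point)."""
    return np.prod(points[:, None, :] ** E[None, :, :], axis=2)

def d_efficiency(design, E):
    """D efficiency, 100 |X'X|^(1/p) / N, for the model with exponent rows E."""
    N, p = design.shape[0], E.shape[0]
    X = expand(design, E)
    return 100 * np.linalg.det(X.T @ X) ** (1 / p) / N

def d_over_relabelings(design, E):
    """D efficiency for every assignment of factors to the design columns."""
    k = design.shape[1]
    return {perm: d_efficiency(design[:, list(perm)], E)
            for perm in itertools.permutations(range(k))}

# Hypothetical asymmetric design: x1 at levels -1, 0, 1 crossed with x2 at -1, 1,
# plus one center run, so the two columns are not interchangeable.
asymmetric = np.array([[-1, -1], [-1, 1], [0, -1], [0, 1], [1, -1], [1, 1], [0, 0]], dtype=float)
symmetric = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=2)))  # 3^2 factorial

# Reduced model with terms 1, x1, x2, x1^2 (a squared term in the first variable only).
E = np.array([[0, 0], [1, 0], [0, 1], [2, 0]])

print(d_over_relabelings(asymmetric, E))  # two different values
print(d_over_relabelings(symmetric, E))   # the same value for both labelings
```

Because the reduced model treats the two variables differently, swapping the columns of the asymmetric design changes X'X and hence D, whereas the symmetric design gives one value regardless of labeling. This is why all relevant permutations must be evaluated before an asymmetric design is compared with a CCD.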
5.3 Other Three-Factor Designs
In this section we summarize the optimality criteria comparisons for 9 three-factor designs: the Box-Draper, 18-point CCD, Hoke D2, Hoke D6, Notz, SCD, and the three k = 3 computer-generated designs in Table 1. Because the computer-generated designs are asymmetric, results will depend on the assignment of experimental factors to design variable labels. D, A, G, and IV will be calculated for all relevant permutations of the design variables.
For each pairwise design comparison, we determined the percentage of all models in which the efficiency for Design 1 is superior to Design 2. Summary tables of D, A, G, and IV for all of the symmetric designs can be found at the research Web site. Tables 4, 5, and 6 will help put the comparisons into perspective. As in Table 3, each row/column entry for k = 3 in Tables 4, 5, and 6 contains three ranks ([r.sub.0], [r.sub.1], [r.sub.2]) but also contains an italicized rank [r.sub.f] below. Rank [r.sub.i] (i = 0, 1, 2) represents that design's rank relative to the other eight designs when at least i squared terms (q [greater than or equal to] i) are in the model. For i = 0, 1, 2, respectively, there are 43, 31, and 15 models. Rank [r.sub.f] is a design's rank when only the proposed second-order model in (1) is considered.
Based on the k = 3 results in Table 4, the CCD is the superior design for A, G, and IV, but fares poorly based on D. We personally favor G and IV over D and A as design evaluation criteria because the fitted response surface model will most often be used for prediction and optimization and G and IV are based on the scaled prediction variance. Thus, we recommend the CCD over any of the other eight designs. If there are not enough resources to run the 18-point CCD, the Hoke D6 and the Box-Draper designs are robust with respect to all four criteria and require five and eight fewer runs than the CCD, respectively. If desired, several replicated center points could be added to these two designs to allow for a lack-of-fit check.
Table 4 shows some variability in the k = 3 full-model-only ranks and the ranks for the different reduced-model subsets. For example, the Notz design has the worst rank (9) for A, G, and IV for the full model only but is ranked higher when the reduced models are considered. This should be of concern to experimenters who base design selection only on the second-order model in (1) and ignore the likelihood that the ultimate model will be a reduction of it.
5.4 Other Four-Factor Designs
In this section we summarize the reduced-model results for 9 four-factor designs: the Box-Draper, 28-point CCD, Hoke D2, Hoke D5, Notz, SCD, PBCD, and the two k = 4 computer-generated designs in Table 1. The setup for k = 4 results in Table 5 is analogous to the k = 3 setup, except that we consider ranks [r.sub.0], [r.sub.1], and [r.sub.2] for 224, 181, and 109 models.
Based on the k = 4 results in Table 5, the CCD is the superior design for G and IV, and it also fares very well based on D and A. Thus, we recommend the CCD over any of the other eight designs. If there are not enough resources to run the 28-point CCD, the PBCD and the Hoke D5 design are robust with respect to all four criteria and require seven and nine fewer runs than the CCD, respectively. Replicated center points could be added to these two designs to allow for a lack-of-fit check.
Table 5 shows little variability between the k = 4 full-model-only ranks and the ranks for the different reduced-model subsets. Only G for the Hoke D5 design has a full-model-only rank (7) that is very different from the ranks of G for the three sets of reduced models.
5.5 Five-Factor Designs
In this section we summarize the reduced-model results for 6 five-factor designs: the Box-Draper, 30-point CCD, Hoke D2, Hoke D7, PBCD, and Notz. The setup for k = 5 results in Table 6 is analogous to the k = 3 setup, except that we consider four ranks [r.sub.0], [r.sub.1], [r.sub.2], and [r.sub.3] for 839, 736, 525, and 289 models.
Based on the k = 5 results in Table 6, the Hoke D7 is marginally the superior design with the CCD being a very close second. The Hoke D7 design, however, requires four fewer points. Among the saturated designs, the Hoke D2 design is marginally better than the Notz design. For k = 5, the Box-Draper and the PBCD perform very poorly across all four criteria. The full-model ranks are very consistent with the reduced-model ranks.
6. SUMMARY
By considering a large number of potential reduced models and through numerous examples, we have shown that design optimality criteria can be sensitive to deviations from the full second-order response surface model. When a researcher must decide which response surface design is "best" based on one or more design optimality criteria, it is important that the optimality criteria be determined over a subset of possible reduced models.
Not surprisingly, the CCD is robust with respect to the set of reduced models as well as across the four optimality criteria. Less surprisingly, the larger Hoke designs possess similarly good properties as the CCD. The computer-generated designs do not perform very well across the four criteria and across the set of reduced models.
If an experiment is run and the results are expected to be analyzed using a reduced model determined from a reduction of the full second-order response surface model, then the practitioner should exercise caution when choosing a design. In general, D, A, G, and IV are not robust across reduced models. They vary primarily in the following ways:
1. with the removal of [x.sup.2.sub.i] terms and, to a lesser extent, the [x.sub.i][x.sub.j] terms
2. when models contain differing numbers of design variables (different dv values) (the differences in D, A, G, and IV occur because the new design space becomes a projection of the original k-dimensional hypercube design space into a lower-dimensional hypercube and, hence, has greater space filling properties in the lower-dimensional hypercube design space)
3. for asymmetric designs because of the dependence on the assignment of design factors to the variable labels
If the experimenter has limited resources (e.g., resources do not allow a CCD with replicated center points), then one might start with a small design that is robust to multiple optimality criteria and then augment it with additional points. For example, the Box-Draper design for k = 3 and the Hoke D2 design for k = 4, 5 perform well with respect to multiple criteria. Adding center-point replicates or replicates of existing design points would allow a check for lack of fit while exploiting the design's beneficial optimality properties despite the small sizes.
As should always be the case, the experimenter should, when possible, use any prior knowledge regarding what potential reduced models are more likely to occur.
ACKNOWLEDGMENTS
We thank the editor, associate editor, and two referees for their constructive insights and recommendations.
REFERENCES
Atkinson, A. C., and Donev, A. N. (1992), Optimum Experimental Designs, Oxford, U.K.: Clarendon.
Booth, K. H. V., and Cox, D. R. (1962), "Some Systematic Supersaturated Designs," Technometrics, 4, 489-495.
Borkowski, J. J. (1995), "Spherical Prediction Variance Properties of Central Composite and Box-Behnken Designs," Technometrics, 37, 399-410.
_____ (2000), "Estimation and Exact Evaluation of the Average Scaled Prediction Variance for Response Surface Designs," Technical Report 6-8-00, Montana State University, Dept. of Mathematical Sciences.
Borkowski, J. J., and Valeroso, E. S. (1996), "Exact Evaluation of the D, G, A, and IV-Optimality Criteria for the Central Composite and Box-Behnken Designs," Technical Report 9-1-96, Montana State University, Dept. of Mathematical Sciences.
Box, G. E. P., and Draper, N. R. (1959), "A Basis for the Selection of a Response Surface Design," Journal of the American Statistical Association, 54, 622-654.
_____ (1963), "A Choice of a Second Order Rotatable Design," Biometrika, 50, 335-352.
_____ (1987), Empirical Model Building and Response Surfaces, New York: Wiley.
Box, G. E. P., and Wilson, K. B. (1951), "On the Experimental Attainment of Optimum Conditions," Journal of the Royal Statistical Society, 13, 1-45.
Box, M. J. B., and Draper, N. R. (1974), "On Minimum-Point Second-Order Designs," Technometrics, 16, 613-616.
Draper, N. R. (1985), "Small Composite Designs," Technometrics, 27, 173-180.
Draper, N. R., and Lin, D. K. J. (1990), "Small Response-Surface Designs," Technometrics, 32, 187-194.
Giovannitti-Jensen, A., and Myers, R. H. (1989), "Graphical Assessment of the Prediction Capability of Response Surface Designs," Technometrics, 31, 159-171.
Hartley, H. O. (1959), "Smallest Composite Designs for Quadratic Response Surfaces," Biometrics, 15, 611-624.
Hoke, A. T. (1974), "Economical Second-Order Designs Based on Irregular Fractions of the [3.sup.n] Factorial," Technometrics, 16, 375-384.
Karson, M. J., Manson, A. R., and Hader, R. J. (1969), "Minimum Bias Estimation and Experimental Designs for Response Surfaces," Technometrics, 11, 461-475.
Khuri, A. I., and Cornell, J. A. (1996), Response Surfaces (2nd ed.), New York: Marcel Dekker.
Kiefer, J., and Wolfowitz, J. (1960), "The Equivalence of Two Extremum Problems," Canadian Journal of Mathematics, 12, 363-366.
Lin, D. K. J. (1993), "A New Class of Supersaturated Designs," Technometrics, 35, 28-31.
_____ (1995), "Generating Systematic Supersaturated Designs," Technometrics, 37, 213-225.
Lucas, J. M. (1974), "Optimum Composite Designs," Technometrics, 16, 561-567.
_____ (1976), "Which Response Surface Design Is Best," Technometrics, 18, 411-417.
MathWorks (1999), Matlab Student Version: Learning Matlab Version 5.3, Natick, MA: Author.
Myers, R. H., and Montgomery, D. C. (1995), Response Surface Methodology, New York: Wiley.
Myers, R. H., Vining, G. G., Giovannitti-Jensen, A., and Myers, S. L. (1992), "Variance Dispersion Properties of Second Order Response Surface Designs," Journal of Quality Technology, 24, 1-11.
Notz, W. (1982), "Minimal Second-Order Designs," Journal of Statistical Planning and Inference, 6, 47-58.
SAS Institute (1995), SAS/QC Software: Usage & Reference, Version 6 (Vol. 1), Cary, NC: Author.
St. John, R. C., and Draper, N. R. (1975), "D-optimality for Regression Designs: A Review," Technometrics, 17, 15-23.
Wheeler, B. (1993), ECHIP Reference Manual: Version 6 for Windows, Hockessin, DE: ECHIP, Inc.
Table 1.
Computer-Generated Designs
11-pt ECHIP                        11-pt OPTEX
[X.sub.1]  [X.sub.2]  [X.sub.3]    [X.sub.1]  [X.sub.2]  [X.sub.3]
-1         -1         -1           -1         -1         1
-1         -1         1            -1         1          -1
1          1          -1           -1         1          1
-1         1          1            1          -1         -1
1          -1         1            1          -1         1
1          1          -1           1          1          1
1          1          1            1          1          -1
0          -1         -1           -1         -.1        0
1          -1         0            -.4        -1         -1
1          0          -1           -.3        .3         1
0          0          0            .6         1          .2
18-pt ECHIP
[X.sub.1]  [X.sub.2]  [X.sub.3]
-1         -1         -1
-1         -1         1
-1         1          -1
-1         1          1
1          -1         -1
1          -1         1
1          1          -1
1          1          1
-1         0          0
0          1          0
0          0          1
1          -1         0
1          0          -1
0          -1         -1
15-pt ECHIP
[X.sub.1]  [X.sub.2]  [X.sub.3]  [X.sub.4]
-1         -1         -1         -1
-1         -1         -1         1
-1         1          1          -1
-1         1          1          1
1          -1         1          -1
1          -1         1          1
1          1          -1         -1
1          1          -1         1
1          -1         -1         0
1          1          1          0
-1         -1         1          0
-1         1          -1         0
0          -1         -1         0
-1         0          -1         0
-1         -1         0          0
28-pt ECHIP
[X.sub.1]  [X.sub.2]  [X.sub.3]  [X.sub.4]
1          1          -1         -1
1          -1         1          -1
-1         1          1          -1
-1         -1         -1         -1
1          -1         1          1
1          1          1          -1
-1         1          1          1
-1         -1         -1         1
1          1          1          1
1          1          -1         1
0          -1         1          1
0          1          -1         1
-1         0          1          1
1          0          -1         1
-1         1          0          1
1          -1         0          1
1          -1         -1         0
-1         1          -1         0
1          1          1          0
-1         -1         1          0
-1         -1         0          0
-1         0          -1         0
0          -1         -1         0
0          0          0          -1
NOTE: Italics indicate two replicates of that point.
Table 2.
Reduced Models (k = 3)
Model p dv L Q C (l, q, c)
1 10 3 (1,1,1) (1,1,1) (1,1,1) (3,3,3)
2 9 3 (1,1,1) (0,1,1) (1,1,1) (3,2,3)
3 9 3 (1,1,1) (1,1,1) (0,1,1) (3,3,2)
4 8 3 (1,1,1) (0,0,1) (1,1,1) (3,1,3)
5 8 3 (1,1,1) (0,1,1) (0,1,1) (3,2,2)
6 8 3 (1,1,1) (1,1,0) (0,1,1) (3,2,2)
7 8 3 (1,1,1) (1,1,1) (0,0,1) (3,3,1)
8 8 3 (0,1,1) (0,1,1) (1,1,1) (2,2,3)
9 7 3 (1,1,1) (0,0,0) (1,1,1) (3,0,3)
10 7 3 (1,1,1) (0,0,1) (0,1,1) (3,1,2)
11 7 3 (1,1,1) (0,1,0) (0,1,1) (3,1,2)
12 7 3 (1,1,1) (0,1,1) (0,0,1) (3,2,1)
13 7 3 (1,1,1) (1,1,0) (0,0,1) (3,2,1)
14 7 3 (1,1,1) (1,1,1) (0,0,0) (3,3,0)
15 7 3 (0,1,1) (0,0,1) (1,1,1) (2,1,3)
16 7 3 (0,1,1) (0,1,1) (0,1,1) (2,2,2)
17 6 3 (1,1,1) (0,0,0) (0,1,1) (3,0,2)
18 6 3 (1,1,1) (0,0,1) (0,0,1) (3,1,1)
19 6 3 (1,1,1) (1,0,0) (0,0,1) (3,1,1)
20 6 3 (1,1,1) (0,1,1) (0,0,0) (3,2,0)
21 6 3 (0,1,1) (0,0,0) (1,1,1) (2,0,3)
22 6 3 (0,1,1) (0,0,1) (0,1,1) (2,1,2)
23 6 3 (0,1,1) (0,1,0) (0,1,1) (2,1,2)
24 6 2 (0,1,1) (0,1,1) (0,0,1) (2,2,1)
25 6 3 (0,1,1) (0,1,1) (0,1,0) (2,2,1)
26 5 3 (1,1,1) (0,0,0) (0,0,1) (3,0,1)
27 5 3 (1,1,1) (0,0,1) (0,0,0) (3,1,0)
28 5 3 (0,1,1) (0,0,0) (0,1,1) (2,0,2)
29 5 2 (0,1,1) (0,0,1) (0,0,1) (2,1,1)
30 5 3 (0,1,1) (0,1,0) (0,1,0) (2,1,1)
31 5 3 (0,1,1) (0,0,1) (0,1,0) (2,1,1)
32 5 2 (0,1,1) (0,1,1) (0,0,0) (2,2,0)
33 5 3 (0,0,1) (0,0,1) (0,1,1) (1,1,2)
34 4 3 (1,1,1) (0,0,0) (0,0,0) (3,0,0)
35 4 2 (0,1,1) (0,0,0) (0,0,1) (2,0,1)
36 4 3 (0,1,1) (0,0,0) (0,1,0) (2,0,1)
37 4 2 (0,1,1) (0,0,1) (0,0,0) (2,1,0)
38 4 3 (0,0,1) (0,0,0) (0,1,1) (1,0,2)
39 4 2 (0,0,1) (0,0,1) (0,0,1) (1,1,1)
40 3 2 (0,1,1) (0,0,0) (0,0,0) (2,0,0)
41 3 2 (0,0,1) (0,0,0) (0,0,1) (1,0,1)
42 3 1 (0,0,1) (0,0,1) (0,0,0) (1,1,0)
43 2 1 (0,0,1) (0,0,0) (0,0,0) (1,0,0)
NOTE: p = number of parameters; dv = number of design variables appearing in the
reduced model. Model terms: L = ([x.sub.1], [x.sub.2], [x.sub.3]); Q =
([x.sup.2.sub.1], [x.sup.2.sub.2], [x.sup.2.sub.3]); C =
([x.sub.1][x.sub.2], [x.sub.1][x.sub.3], [x.sub.2][x.sub.3]); (l, q, c) =
numbers of terms retained from L, Q, and C.
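The p and dv columns of Table 2 follow directly from the indicator vectors. The following is a minimal sketch of that bookkeeping for k = 3, assuming that p counts the intercept (consistent with p = 10 for the full model) and that a design variable counts toward dv if it appears in any retained term. The function name is hypothetical.

```python
def model_summary(L, Q, C):
    """Return (p, dv) for the indicator tuples L, Q, C of a k = 3 reduced model."""
    pairs = [(0, 1), (0, 2), (1, 2)]   # C order: x1x2, x1x3, x2x3
    p = 1 + sum(L) + sum(Q) + sum(C)   # intercept plus retained terms
    used = set()
    for i in range(3):
        if L[i] or Q[i]:
            used.add(i)                # variable appears linearly or squared
    for flag, (i, j) in zip(C, pairs):
        if flag:
            used.update((i, j))        # both variables of a cross product
    return p, len(used)

model_24 = model_summary((0, 1, 1), (0, 1, 1), (0, 0, 1))
model_38 = model_summary((0, 0, 1), (0, 0, 0), (0, 1, 1))
```

Model 38 shows why dv can exceed the number of linear terms: only [x.sub.3] enters linearly, but the cross products bring in [x.sub.1] and [x.sub.2].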
Table 3.
CCD Comparison Ranking of D, A, G, and IV
Design criterion (k = 3)
[n.sub.0], [n.sub.1], [n.sub.2] N D A G IV
(0,1,1) 14 3,3,1 3,5,4 2,1,1 5,5,5
(2,1,1) 16 4,4,4 1,2,3 4,2,2 1,3,3
(4,1,1) 18 6,6,7 4,4,5 6,6,5 3,4,4
(0,2,1) 20 5,5,5 2,1,1 5,5,3 3,2,2
(2,2,1) 22 7,7,6 5,2,2 7,7,7 3,1,1
(0,1,2) 22 1,1,2 7,7,7 2,3,4 7,7,7
(2,1,2) 24 2,2,3 6,6,6 2,4,6 6,6,6
Design criterion (k = 4)
[n.sub.0], [n.sub.1], [n.sub.2] N D A G IV
(0,1,1) 24 3,3,1 3,4,3 1,1,2 5,5,5
(2,1,1) 26 4,4,4 4,3,4 2,2,4 4,4,3
(4,1,1) 28 5,5,6 5,5,5 3,4,5 3,3,4
(0,2,1) 32 6,6,5 1,1,1 4,3,1 1,1,1
(2,2,1) 34 7,7,7 2,2,2 5.5,5,3 2,2,2
(0,1,2) 40 1,1,2 7,7,7 6,6,6 7,7,7
(2,1,2) 42 2,2,3 6,6,6 6.5,7,7 6,6,6
Design criterion (k = 5)
[n.sub.0], [n.sub.1], [n.sub.2] N D A G IV
(0,1,1) 26 3,3,3,2 2,3,3,3 1,1,1,1 2,2,3,3
(2,1,1) 28 4,4,4,4 4,4,4,4 2,2,2,3 3,3,4,4
(4,1,1) 30 5,5,5,5 5,5,5,5 3,3,4,5 5,5,5,5
(0,2,1) 36 6,6,6,6 1,1,1,1 6,5,3,2 1,1,1,1
(2,2,1) 38 7,7,7,7 3,2,2,2 7,7,5,4 4,4,2,2
(0,1,2) 42 1,1,1,1 6,7,6,6 4,4,6,6 7,7,7,7
(2,1,2) 44 2,2,2,3 7,6,7,7 5,6,7,7 6,6,6,6
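The N column of Table 3 can be reproduced from the replication triple. The sketch below assumes, as inferred from the tabled N values rather than stated in this section, that [n.sub.0] is the number of center points, [n.sub.1] the number of replicates of the 2k axial points, and [n.sub.2] the number of replicates of the factorial portion, with a 16-run half fraction for k = 5. The function name is hypothetical.

```python
def ccd_runs(k, n0, n1, n2):
    """Total CCD run count N under the assumed meaning of (n0, n1, n2)."""
    # Factorial portion sizes; the k = 5 entry assumes a 2^(5-1) half fraction.
    factorial_runs = {3: 8, 4: 16, 5: 16}[k]
    axial_runs = 2 * k
    return n0 + n1 * axial_runs + n2 * factorial_runs

# Replication triples in the row order of Table 3
triples = [(0, 1, 1), (2, 1, 1), (4, 1, 1), (0, 2, 1), (2, 2, 1), (0, 1, 2), (2, 1, 2)]
runs_by_k = {k: [ccd_runs(k, *t) for t in triples] for k in (3, 4, 5)}
```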
Table 4.
Design Comparison Ranking of D, A, G, and IV: k = 3
Design      Box-            Hoke    Hoke                    ECHIP     ECHIP
criterion   Draper  CCD     D2      D6      Notz    SCD     (N = 11)  (N = 18)  OPTEX
D           7,6,3   8,8,8   6,7,7   2,1,1   5,4,5   9,9,9   1,2,4     3,3,2     4,5,6
            4       8       6.5     1       6.5     9       3         2         5
A           3,3,3   1,1,1   5,5,6   2,2,3   7,7,7   9,6,5   6,8,8     4,4,4     8,9,9
            3       1       5       2       9       8       6         4         7
G           4,4,4   2,2,2   8,8,8   1,1,1   7,7,7   9,9,9   5,5,5     3,3,3     6,6,6
            2       1       7       3       9       8       6         4         7
IV          2,2,2   1,1,1   8,8,8   3,3,3   5,6,6   9,5,4   4,7,7     6,4,5     7,9,9
            3       1       5       2       9       8       6         4         7
NOTE: For k = 3, ([r.sub.0], [r.sub.1], [r.sub.2]) = rank across
(43, 31, 15) models with at least (0, 1, 2) squared terms.
Italicized value = rank for full model only.
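The model counts (43, 31, 15) in this note can be checked against the q values in the (l, q, c) column of Table 2. A minimal sketch, with the q values transcribed from that table in model order:

```python
# Number of squared terms q for models 1-43, transcribed from Table 2
squared_terms = [
    3, 2, 3, 1, 2, 2, 3, 2, 0, 1,
    1, 2, 2, 3, 1, 2, 0, 1, 1, 2,
    0, 1, 1, 2, 2, 0, 1, 0, 1, 1,
    1, 2, 1, 0, 0, 0, 1, 0, 1, 0,
    0, 1, 0,
]

# Count the models with at least 0, 1, and 2 squared terms
counts = tuple(sum(q >= m for q in squared_terms) for m in (0, 1, 2))
```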
Table 5.
Design Comparison Ranking of D, A, G, and IV: k = 4
Design      Box-            Hoke    Hoke                            ECHIP     ECHIP
criterion   Draper  CCD     D2      D5      Notz    SCD     PBCD    (N = 15)  (N = 28)
D           8,8,7   5,5,5   3,3,3   2,2,2   4,4,4   9,9,9   7,6,6   6,7,8     1,1,1
            8       3       4.5     2       4.5     9       7       6         1
A           8,8,8   1,1,2   6,7,7   3,3,3   5,6,6   7,5,5   2,2,1   9,9,9     4,4,4
            7       2       6       3       5       8       1       9         4
G           4,4,4   1,1,1   5,5,6   3,3,3   6,6,7   9,8,8   7,7,5   8,9,9     2,2,2
            3       1       4.5     7       4.5     9       6       8         2
IV          8,8,8   1,1,1   4,4,4   3,3,3   5,5,5   7,6,6   2,2,2   9,9,9     6,7,7
            7       1       4       3       5       8       2       9         6
NOTE: For k = 4, ([r.sub.0], [r.sub.1], [r.sub.2]) = rank across (224,
181, 109) models with at least (0, 1, 2) squared terms. Italicized value =
rank for full model only.
Table 6.
Design Comparison Ranking of D, A, G, and IV: k = 5
Design Box- Hoke Hoke
criterion Draper CCD D2 D7 PBCD Notz
D 5,5,5,6 4,4,4,4 2,2,3,3 3,3,2,1 6,6,6,5 1,1,1,2
5 4 3 2 6 1
A 6,6,6,6 2,2,2,2 3,3,4,4 1,1,1,1 4,4,3,3 5,5,5,5
6 2 3 1 5 4
G 5,5,5,5 2,3,2,2 3,2,3,3 1,1,1,1 6,6,6,6 4,4,4,4
5 2 3 1 6 4
IV 6,6,6,6 1,1,1,1 3,3,4,4 2,2,2,2 4,4,3,3 5,5,5,5
6 1 3 2 5 4
NOTE: For k = 5, ([r.sub.0], [r.sub.1], [r.sub.2], [r.sub.3]) = rank
across (839, 736, 226, 96) models with at least (0, 1, 2, 3) squared terms.
Italicized value = rank for full model only.