Supersaturated designs and associated analysis methods have been proposed by several authors to identify active factors in situations in which only a very limited number of experimental runs is available. We use simulation to evaluate the abilities of the existing methods to achieve model identification-related
KEY WORDS: Optimal experimental design; Simulation optimization; Stepwise regression.
1. INTRODUCTION
In the context of screening experiments, one might have a number of potentially important or "active" factors larger than or equal to the available number of experimental runs. Also, it may be reasonable to assume that only a few of the factors under consideration have important effects, but that the identity of these factors is unknown before experimentation (Box and Meyer 1986). So-called supersaturated experimental designs and analysis methods have been proposed by several authors for these cases (e.g., Booth and Cox 1962; Lin 1993; Wu 1993), with the goal of identifying a subset that contains the important factors.
As noted by Tang and Wu (1997), most of the research on supersaturated designs has focused on design generation based on surrogate criteria that permit analytical tractability. In particular, much attention has been given to the so-called "average [s.sup.2] optimality" introduced by Booth and Cox (1962), which minimizes the average of all squared pairwise inner products of the design columns. Classes of computer-generated supersaturated designs derived from this criterion include designs of Booth and Cox (1962), Nguyen (1996), and Tang and Wu (1997). Designs from related criteria, such as the criteria proposed by Wu (1993), include ones provided by Li and Wu (1997). In addition, other supersaturated designs have been proposed that do not derive from optimality criteria, including randomly generated balanced designs of Satterthwaite (1959) and designs constructed from fractions of Hadamard matrices of Lin (1993) and Yamada and Lin (1997).
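To make the average [s.sup.2] criterion concrete, the following minimal Python sketch computes the mean squared inner product over all pairs of distinct columns of a two-level design coded as +1/-1. The function name `average_s2` and the 4-run example array are illustrative assumptions and are not taken from the cited papers.

```python
from itertools import combinations


def average_s2(design):
    """Mean squared inner product over all pairs of distinct design columns."""
    columns = list(zip(*design))  # transpose rows (runs) into columns (factors)
    squares = [sum(a * b for a, b in zip(u, v)) ** 2
               for u, v in combinations(columns, 2)]
    return sum(squares) / len(squares)


# Hypothetical 4-run, 3-factor orthogonal array: all pairwise inner products are 0.
orthogonal = [[+1, +1, +1],
              [+1, -1, -1],
              [-1, +1, -1],
              [-1, -1, +1]]
print(average_s2(orthogonal))
```

A supersaturated design cannot have all columns mutually orthogonal, so its average [s.sup.2] is strictly positive; smaller values indicate columns that are closer to orthogonal on average.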
Several analysis methods have been proposed for identifying the active factors once the data have been collected using a supersaturated design. Lin (1993), Wang (1995), and Westfall, Young, and Lin (1998) explored the application of stepwise regression. Westfall et al. (1998) and Abraham, Chipman, and Vijayan (1999) concluded that in general, the analysis of supersaturated designs is "very tricky" in the sense that the probability of correct model selection is in general unacceptably low. Yet, those authors stopped short of providing clear criteria as to the advisability of using supersaturated designs and stepwise regression in a specific context. Recently, several authors have begun to explore relatively sophisticated approaches for model identification, including methods based on stochastic search variable selection (SSVS) from George and McCulloch (1993) and intrinsic Bayes factor analysis from Berger and Pericchi (1996). Discussions and reviews of these approaches in the context of supersaturated designs have been given by Chipman (1996), Beattie, Fong, and Lin (2002), and Wu and Hamada (2000, pp. 371-374).
In this article we propose criteria for evaluating supersaturated designs that relate explicitly to model identification-related objectives. We also propose new supersaturated designs based on these criteria. We concentrate on designs that maximize the probability of identifying all of the important factors (coverage probability). For computational convenience, we assume that stepwise regression is used for the analysis. We propose generalizations of our methodology based on alternative analysis procedures for future research. To estimate the probability of correct selection of the important factors, we use simulation and assumptions adapted from the Bayesian analysis literature. We describe several examples of average [s.sup.2] optimal designs that apparently maximize the coverage probability. In addition, we show that some members of the class of average [s.sup.2] optimal designs perform poorly with respect to their predicted ability to achieve model identification-related goals.
The concept of experimental design criteria that combine the rational, interpretable objectives described by Beattie et al. (2002) and the increasingly standard assumptions from George and McCulloch (1993) and Wu and Hamada (2000, pp. 363-366) is apparently new. We use the proposed criteria to provide plots that support decision making related to selection of the design and analysis method most appropriate for the specific needs of the practitioner. We also provide unbalanced supersaturated designs, which have advantages in terms of performance and possibly affordability that we discuss.
In Section 2 we review the assumptions about the true models discussed by Beattie et al. (2002), George and McCulloch (1993), and Wu and Hamada (2000) and adapt them to the context of optimal design generation and method evaluation. We also define new assumptions that avoid certain logical contradictions that we discuss. We present the proposed design criteria in Section 3. In Section 4 we use the proposed criteria and associated plots to evaluate the performance of existing designs from the supersaturated design literature. Special attention is given to the design discussed commonly in the context of the Williams (1968) case study. In Section 5 we provide new supersaturated designs that derive directly from the proposed criteria and simulation optimization. In Section 6 we compare the proposed designs with alternatives using additional simulation studies. Finally, we conclude with a summary of our contributions and opportunities for future research in Section 7.
2. ASSUMPTIONS
In this section we adapt the notation and assumptions from George and McCulloch (1993) and Wu and Hamada (2000, pp. 363-366) to the supersaturated experimental design problem. The central assumption is that experimental data come from a model of the form
y = X[beta] + [epsilon], (1)
where X is the N × k model matrix; [beta] is the k vector, with k = 1 + p + p(p - 1)/2, containing the intercept, main effects, and two-factor interactions for p factors; and [epsilon] ~ MN(0, [[sigma].sup.2][I.sub.N×N]). A k vector [delta] of 0s and 1s is introduced to indicate the importance or "significance" of model terms (Wu and Hamada 2000, p. 363). The ith entry [[delta].sub.i] = 1 for i = 0,...,k - 1 indicates that [[beta].sub.i] is large and therefore important. The ith entry [[delta].sub.i] = 0 for i = 0,...,k - 1 indicates that [[beta].sub.i] is small and therefore not important.
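The structure of the model matrix in (1) can be sketched as follows. This is only an illustration under the assumption that factors are coded +1/-1; the helper name `model_matrix` and the two-run example are hypothetical.

```python
from itertools import combinations


def model_matrix(design):
    """Rows [1, x_1..x_p, x_i*x_j for i<j] for each run of a +/-1 design."""
    rows = []
    for run in design:
        interactions = [run[i] * run[j]
                        for i, j in combinations(range(len(run)), 2)]
        rows.append([1, *run, *interactions])
    return rows


design = [[+1, -1, +1],
          [-1, +1, +1]]
X = model_matrix(design)
p = len(design[0])
print(len(X[0]), 1 + p + p * (p - 1) // 2)  # number of columns and k agree
```

For the p = 23 factor case considered in Section 4, the same count gives k = 1 + 23 + 253 = 277 model terms.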
One set of assumptions that we investigate is that the distribution function, [pi]([[beta].sub.i]|[[delta].sub.i]), of the coefficient, [[beta].sub.i], conditioned on the vector entry, [[delta].sub.i], is normal with mean 0 and standard deviation that depends on whether a given factor or interaction is important for i = 0,...,k - 1 (George and McCulloch 1993),
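[pi]([[beta].sub.i]|[[delta].sub.i]) = N(0, [q.sub.i][([sigma][[tau].sub.i]).sup.2]) if [[delta].sub.i] = 0, and N(0, [([c.sub.i][sigma][[tau].sub.i]).sup.2]) if [[delta].sub.i] = 1, (2)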
where we have included the binary parameters, [q.sub.i], to facilitate exploration of the assumption that unimportant terms have zero coefficients.
Note that we assume in general that experimental data come from models in which interactions may be important. To limit and clarify the effects of interactions, we will set as default that [q.sub.i] = 1 for the main effects (i = 1,...,p) and [q.sub.i] = 0 for the interaction terms (i = p + 1,...,k - 1). Therefore, we concentrate on the assumption that unimportant interactions have zero coefficients and the magnitudes of the other interaction coefficients are adjusted by the parameters [[tau].sub.i] (i = p + 1,...,k - 1). We also focus on the properties of analysis procedures that fit only main effects. These procedures have received the most attention (see, e.g., Lin 1993; Wang 1995; Westfall et al. 1998).
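A minimal sketch of drawing coefficients under this normal-mixture assumption is given below. It relies on the reading that important terms have standard deviation c[sigma][tau] and unimportant terms have standard deviation [q.sub.i][sigma][tau] (the descriptions used in Table 1); the function name `sample_beta` and its argument layout are assumptions made for illustration.

```python
import random


def sample_beta(delta, q, c, tau, sigma, rng):
    """Draw coefficients: sd c*sigma*tau if important, q_i*sigma*tau otherwise."""
    beta = []
    for d_i, q_i in zip(delta, q):
        sd = c * sigma * tau if d_i == 1 else q_i * sigma * tau
        # q_i = 0 forces an unimportant coefficient to be exactly zero.
        beta.append(rng.gauss(0.0, sd) if sd > 0 else 0.0)
    return beta


rng = random.Random(1)
# Two main effects (q = 1) followed by their interaction (q = 0);
# c = 10 and tau = .3 reproduce the Table 1 defaults of 3.0 and .3 with sigma = 1.
print(sample_beta(delta=[1, 0, 0], q=[1, 1, 0], c=10.0, tau=0.3, sigma=1.0, rng=rng))
```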
Following Chipman (1996), we assume a prior distribution based on the principles of conditional independence and inheritance, as discussed by Wu and Hamada (2000, pp. 364-365). These principles give prior probabilities [p.sub.i], [p.sub.11], [p.sub.10], and [p.sub.00] as follows:
[p.sub.i] [equivalent to] Pr([[delta].sub.i] = 1),
[p.sub.11] [equivalent to] Pr([[delta].sub.l] = 1| both main effects are important, that is, [[delta].sub.i] = [[delta].sub.j] = 1),
[p.sub.10] [equivalent to] Pr([[delta].sub.l] = 1| either the ith or the jth main effect is important but not both),
[p.sub.00] [equivalent to] Pr([[delta].sub.l] = 1| neither main effect is important, that is, [[delta].sub.i] = [[delta].sub.j] = 0), (3)
where the index l refers to the interaction between factors i and j in the [delta] vector for i, j = 1,...,p. Note that this definition implies the assumption that conditional interaction probabilities are the same for all factors.
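The inheritance structure in (3) can be sketched as a two-stage draw: main-effect indicators first, then each interaction indicator with a probability determined by how many of its parent main effects are important. The function name `sample_delta` and the dictionary keyed by factor pairs are illustrative assumptions.

```python
import random
from itertools import combinations


def sample_delta(p, p0, p11, p10, p00, rng):
    """Sample main-effect indicators, then interaction indicators by inheritance."""
    main = [int(rng.random() < p0) for _ in range(p)]
    inter = {}
    for i, j in combinations(range(p), 2):
        parents = main[i] + main[j]          # 0, 1, or 2 important parents
        prob = (p00, p10, p11)[parents]
        inter[(i, j)] = int(rng.random() < prob)
    return main, inter


rng = random.Random(0)
# Table 1 defaults, with p00 fixed at .5 * p10 as in Section 4.
main, inter = sample_delta(p=5, p0=0.25, p11=0.50, p10=0.01, p00=0.005, rng=rng)
print(main, sum(inter.values()))
```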
Note also that for the prior in (2), the probability that at least one supposedly unimportant factor coefficient has a larger absolute value than at least one of the supposedly important factors can be appreciable for realistic values of [c.sub.i]. To avoid this contradiction, we propose the following different prior distribution, similar to the one of Lewis and Dean (2001) for the related context of group screening (i = 1,...,k):
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (4)
where the threshold constants, [T.sub.i], can be adjusted to make the probability of overlap negligible. We set [T.sub.i] = 0 (i = p + 1,...,k - 1) so that our assumption for important interactions is the same as that used by George and McCulloch (1993). Because, by default, the negligible interactions are exactly 0, there is no concern about logical inconsistency for interactions.
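The size of the overlap that motivates the thresholds can be gauged for a single pair of coefficients. Assuming zero-mean normal priors whose standard deviations differ by the factor c, the ratio of an unimportant to an important coefficient is Cauchy distributed with scale 1/c, so the probability that the unimportant one is larger in absolute value is (2/[pi]) arctan(1/c). The function name `pairwise_overlap` is a hypothetical label for this calculation.

```python
import math


def pairwise_overlap(c):
    """Pr(|unimportant coef| > |important coef|) for sd ratio c, normal priors."""
    return (2.0 / math.pi) * math.atan(1.0 / c)


for c in (2.0, 10.0, 20.0):
    print(c, round(pairwise_overlap(c), 3))
```

With c = 10 (the ratio of the Table 1 defaults 3.0 and .3) the pairwise probability is roughly .06, and with many unimportant factors the chance that at least one such reversal occurs is correspondingly larger.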
At this point, we depart further from the assumptions used for Bayesian model selection, in part because our purpose relates to experimental design generation before any data have been collected. Therefore, without loss of generality, we fix [sigma] = 1. Also, as is to a great extent standard in the literature, we focus on the case in which no information is available about the relative probabilities of the factors being important. For these situations, we assume that [[tau].sub.i] = [tau], [c.sub.i] = c, and [p.sub.i] = [p.sub.0] for all i = 1,...,p. A notable exception to this generalization is the work of Yamada and Lin (1997), which provides designs for the case in which the prior probabilities are unequal.
3. THE PROPOSED CRITERIA
In this section we propose experimental design criteria that generalize the rational, interpretable objectives of Beattie et al. (2002) and are based on the assumptions from Section 2. After the data have been collected and some type of model identification procedure (e.g., stepwise regression) has been performed, an important outcome of this process is the set of terms estimated to be important. We define [^.[delta]] as a k = 1 + p + p(p - 1)/2 vector, with elements equal to 1 if the analysis procedure identified that term as important and equal to 0 otherwise.
In this article we focus on the expected value of experimentation only as it relates to model selection. A general formulation of the expected value is
f([xi]) = E{V[[delta], [^.[delta]]([xi])]}, (5)
where V is the value function, [xi] is the experimental design, and the expectation is over the distributions of the true model parameters [delta] and [beta] and the random errors [epsilon].
We further focus on special cases of the value function in (5) that relate only to the identification of main effect terms. We define three random variables as functions of the elements of [delta] and [^.[delta]] that are used in our expressions for the criteria considered. Let A denote the actual number of important main effects in the true model associated with a given simulated experimentation and analysis, that is, A [equivalent to] [[summation].sub.i=1,...,p] [[delta].sub.i]. Let M denote the number of important main effects in the true model that were missed, that is, M [equivalent to] A - [[summation].sub.i=1,...,p] [[delta].sub.i][^.[delta].sub.i]. Finally, let O be the number of extra main effects identified in the simulation as important that actually were unimportant, that is, O [equivalent to] [[summation].sub.i=1,...,p] [^.[delta].sub.i] - [[summation].sub.i=1,...,p] [[delta].sub.i][^.[delta].sub.i].
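The three counts follow directly from the two indicator vectors restricted to the main effects. A short sketch, with `count_amo` as a hypothetical helper name:

```python
def count_amo(delta, delta_hat):
    """Return (A, M, O) for main-effect indicator lists of equal length."""
    hits = sum(d * h for d, h in zip(delta, delta_hat))
    A = sum(delta)              # truly important main effects
    M = A - hits                # important main effects that were missed
    O = sum(delta_hat) - hits   # unimportant main effects declared important
    return A, M, O


# Two truly important factors; the analysis finds one of them plus one extra.
print(count_amo([1, 1, 0, 0, 0], [1, 0, 1, 0, 0]))  # (2, 1, 1)
```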
We propose four criteria for analysis and generation of supersaturated designs, which are functions of [delta] and [^.[delta]]([xi]) through the random variables A, M, and O, that is, V[[delta], [^.[delta]]([xi])] = V[A([delta], [^.[delta]]), M([delta], [^.[delta]]), O([delta], [^.[delta]])]. First, we define the probability of correct selection, [p.sub.CS], to be the probability that the analysis correctly identifies which effects are important and which effects are unimportant. Formally, [p.sub.CS] is defined as
[p.sub.CS]([xi]) = E{[V.sub.1][A, M, O]} with [V.sub.1][A, M, O] = 1 if M = 0 and O = 0, and 0 otherwise. (6)
Second, the probability of covering the set of all important main effects by the set of identified main effects, [p.sub.COV], is
[p.sub.COV]([xi]) = E{[V.sub.2][A, M, O]} with [V.sub.2][A, M, O] = 1 if M = 0, and 0 otherwise. (7)
The third criterion that we propose is the average model size, d, formally defined as
d([xi]) = E{[V.sub.3][A, M, O]} with [V.sub.3][A, M, O] = 1 + A - M + O. (8)
The final criterion that we propose is the power or probability of identifying any given important factor, w. This can be written as
w([xi]) = E{[V.sub.4][A, M, O]} with [V.sub.4][A, M, O] = (A - M)/A. (9)
The first three criteria in (6), (7), and (8) are similar to the criteria used by Beattie et al. (2002) to investigate the performance of supersaturated designs in the context of known true models. The final criterion (suggested by a reviewer) is equivalent to the power in hypothesis testing. Oehlert and Whitcomb (2000) investigated the power criterion in the context of several other types of experimental designs.
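For a single simulated experiment, the four value functions can be evaluated from A, M, and O as sketched below. The indicator forms used for [V.sub.1] and [V.sub.2] follow the verbal definitions of correct selection (nothing missed, nothing extra) and coverage (nothing missed). How the power is treated when A = 0 is not stated in the text, so returning `None` in that case is an assumption of this sketch.

```python
def value_functions(A, M, O):
    """Return (V1, V2, V3, V4) for one simulated experiment."""
    v1 = 1 if (M == 0 and O == 0) else 0   # correct selection
    v2 = 1 if M == 0 else 0                # coverage of all important effects
    v3 = 1 + A - M + O                     # model size, counting the intercept
    v4 = (A - M) / A if A > 0 else None    # power; left undefined when A = 0
    return v1, v2, v3, v4


print(value_functions(2, 1, 1))
print(value_functions(3, 0, 0))
```

Because [V.sub.1] = 1 implies [V.sub.2] = 1, the estimated [p.sub.CS] can never exceed the estimated [p.sub.COV] for the same set of simulations.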
For the general case and barring additional analytical results, all of these criteria must be estimated using some form of Monte Carlo or quasi-Monte Carlo simulation (see, e.g., Atkinson and Fedorov 1975; Pukelsheim and Rosenberger 1993 for related discussions and approximate criteria that can be evaluated analytically). In principle, each of these criteria could be evaluated for any supersaturated design and analysis combination, for example, Lin (1993) designs followed by the SSVS analysis of George and McCulloch (1993). However, to achieve two-decimal-place accuracy for the probabilities [p.sub.CS] and [p.sub.COV], typically more than 10,000 Monte Carlo simulations of the entire analysis process are needed. At present, computational power permits investigation of results only from stepwise regression as discussed by Lin (1993), Wang (1995), and Westfall et al. (1998). We base our investigations on the "forward" stepwise procedure from Neter, Kutner, Nachtsheim, and Wasserman (1996).
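The simulation sizes involved can be related to accuracy through the binomial standard error of an estimated probability, sqrt(p(1 - p)/N), which is largest at p = .5. The sketch below evaluates this worst case for the simulation sizes mentioned in this article; the function name `binomial_se` is illustrative.

```python
import math


def binomial_se(p, n_sims):
    """Standard error of a Monte Carlo estimate of a probability p."""
    return math.sqrt(p * (1.0 - p) / n_sims)


for n_sims in (4_000, 10_000, 200_000):
    print(n_sims, round(binomial_se(0.5, n_sims), 4))
```

At N = 10,000 the worst-case standard error is .005, and at N = 200,000 it is about .001.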
For completeness, we briefly review the application of Monte Carlo simulation based on N random samples to simultaneously estimate all of the proposed criteria for a given experimental design. First, N vectors [delta] are sampled using the probabilities in (3). Then, N sets of vectors [beta] and [epsilon] are sampled from the distributions in (1) and (4). From these samples, N response vectors, y, are calculated. Stepwise regression is applied to each response vector, and the values of A, M, and O and then [V.sub.1], [V.sub.2], [V.sub.3], and [V.sub.4] are calculated based on (6)-(9). Averages of the resulting N observations of [V.sub.1], [V.sub.2], [V.sub.3], and [V.sub.4] provide estimates of the quantities [p.sub.CS], [p.sub.COV], d, and w.
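The Monte Carlo procedure just described can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: it assumes main effects only (no interactions), a zero intercept, normal coefficient priors without a threshold, and it averages the power only over simulations with A > 0. The analysis step is passed in as a function; `marginal_screen` is a deliberately crude stand-in for stepwise regression, and all names are hypothetical.

```python
import random


def simulate_criteria(design, select, n_sims, p0=0.25, sd_active=3.0,
                      sd_inactive=0.3, sigma=1.0, seed=0):
    """Estimate (p_CS, p_COV, d, w) for a +/-1 design by Monte Carlo.

    `select(design, y)` must return the indices of the factors it declares
    important. Simplifications: main effects only, no threshold prior.
    """
    rng = random.Random(seed)
    p = len(design[0])
    v1 = v2 = v3 = 0.0
    v4_sum, v4_n = 0.0, 0
    for _ in range(n_sims):
        delta = [int(rng.random() < p0) for _ in range(p)]
        beta = [rng.gauss(0.0, sd_active if flag else sd_inactive)
                for flag in delta]
        y = [sum(b * x for b, x in zip(beta, run)) + rng.gauss(0.0, sigma)
             for run in design]
        chosen = set(select(design, y))
        A = sum(delta)
        hits = sum(1 for j in chosen if delta[j])
        M, O = A - hits, len(chosen) - hits
        v1 += (M == 0 and O == 0)
        v2 += (M == 0)
        v3 += 1 + A - M + O
        if A > 0:                       # power is undefined when A = 0
            v4_sum += (A - M) / A
            v4_n += 1
    w = v4_sum / v4_n if v4_n else float("nan")
    return v1 / n_sims, v2 / n_sims, v3 / n_sims, w


def marginal_screen(design, y, cutoff=1.5):
    """Stand-in analysis: keep factors whose contrast |sum(x*y)|/n > cutoff."""
    n = len(design)
    return [j for j in range(len(design[0]))
            if abs(sum(run[j] * yi for run, yi in zip(design, y))) / n > cutoff]


demo = [[+1, +1, +1], [+1, -1, -1], [-1, +1, -1], [-1, -1, +1]]
print(simulate_criteria(demo, marginal_screen, n_sims=2000))
```

Passing the analysis procedure as an argument keeps the estimator independent of the model identification method, so the same loop could in principle be reused with stepwise regression or a Bayesian procedure.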
The application of this stepwise regression involves selecting the critical values of the F statistic for entering and removing variables, [[alpha].sub.forward] and [[alpha].sub.backward]. Results of Westfall et al. (1998) suggest that in the context of the supersaturated designs that they considered, the option of removing variables associated with [[alpha].sub.backward] played little, if any, role in the analysis. Therefore, to simplify the discussion, we restrict attention to the case [[alpha].sub.forward] = [[alpha].sub.backward] = [alpha].
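A forward selection step of this kind can be sketched with NumPy as follows. The sketch makes two simplifications that should be kept in mind: it uses a fixed F-to-enter cutoff (`f_enter`) instead of converting [alpha] to a critical value that changes with the residual degrees of freedom, and it omits the removal step. It is therefore an illustration of the idea and not the Neter et al. (1996) procedure itself; the function name and defaults are hypothetical.

```python
import numpy as np


def forward_select(X, y, f_enter=4.0):
    """Forward selection over the columns of X (intercept always included).

    Repeatedly adds the column with the largest partial F statistic and
    stops when that statistic falls below f_enter or when no residual
    degrees of freedom would remain.
    """
    n, p = X.shape
    selected = []

    def rss(cols):
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        return float(resid @ resid)

    current = rss(selected)
    while len(selected) < min(p, n - 2):
        df = n - len(selected) - 2      # residual df after adding one term
        best_j, best_f, best_rss = None, -np.inf, None
        for j in range(p):
            if j in selected:
                continue
            new = rss(selected + [j])
            f_stat = (current - new) / max(new / df, 1e-12)
            if f_stat > best_f:
                best_j, best_f, best_rss = j, f_stat, new
        if best_f < f_enter:
            break
        selected.append(best_j)
        current = best_rss
    return sorted(selected)


# Small check: 8-run full factorial where only the first factor matters.
from itertools import product
X_demo = np.array(list(product([-1.0, 1.0], repeat=3)))
y_demo = 5.0 * X_demo[:, 0] + 0.1 * X_demo[:, 0] * X_demo[:, 1] * X_demo[:, 2]
print(forward_select(X_demo, y_demo))
```

In a supersaturated design the loop bound matters: with more candidate columns than runs, selection must stop while at least one residual degree of freedom remains.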
4. EVALUATION OF EXISTING METHODS
In this section we propose plots based on the criteria in Section 3 and the assumptions in Section 2 for supporting decision making related to planning supersaturated experiments. As implied by Westfall et al. (1998), the risks associated with applying supersaturated designs may be unacceptable even if the supersaturated designs have desirable properties, such as average [s.sup.2] optimality. The plots proposed here are intended to calibrate practitioners' expectations and aid in decisions, for example, whether the number of runs is sufficient and which [alpha] to use in stepwise regression. We discuss the potential usefulness of these plots in relation to an example from the design literature.
The proposed plots for a given supersaturated design derive from a [3.sup.7-1] fractional factorial simulation experiment. The seven factors in the [3.sup.7-1] experiment are the parameters needed for precise specification of the assumptions in Section 2, which are varied over the levels given in Table 1. We chose these levels to cover the range of plausible scenarios. For each experimental run, the values of [p.sub.CS], [p.sub.COV], d, and w are estimated using N = 4,000 Monte Carlo simulations. Then the proposed plots are simply the standard main effects and interaction plots associated with this [3.sup.7-1] experiment. To limit the number of factors, we fix [p.sub.00] to equal .5[p.sub.10], where both parameters are defined by (3).
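The layout of such a [3.sup.7-1] simulation experiment can be sketched as below. The text does not state which one-third fraction was used, so the defining relation here (the seventh coded level equals the mod-3 sum of the first six) is an assumption chosen only because it yields a balanced 729-run fraction. The level values are those listed in Table 1; the dictionary keys and function names are hypothetical.

```python
from itertools import product

# Plot levels from Table 1, in the order the parameters are listed there.
LEVELS = {
    "c_sigma_tau": (1.0, 3.0, 5.0),
    "sigma_tau": (0.0, 0.25, 0.50),
    "T": (0.0, 1.0, 2.0),
    "alpha": (0.05, 0.15, 0.25),
    "p0": (0.10, 0.25, 0.40),
    "p11": (0.0, 0.25, 0.50),
    "p10": (0.0, 0.05, 0.10),
}


def three_level_fraction(n_factors=7):
    """One-third fraction: the last factor is the mod-3 sum of the others."""
    return [base + (sum(base) % 3,)
            for base in product(range(3), repeat=n_factors - 1)]


def settings(run):
    """Translate coded levels 0/1/2 into parameter values."""
    return {name: LEVELS[name][lv] for name, lv in zip(LEVELS, run)}


runs = three_level_fraction()
print(len(runs))
print(settings(runs[0]))
```

Each of the 729 runs defines one set of assumptions under which the four criteria are estimated; the main effects plot then averages the estimates over the 243 runs at each level of each parameter.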
Figure 1 shows the main effects plot derived from the n = 14 run, p = 23 factor design proposed by Lin (1993) and discussed in the context of the Williams (1968) data by Yamada and Lin (1997), Westfall et al. (1998), and others. The plot confirms the conclusion from Westfall et al. (1998) and Abraham et al. (1999) that correct identification of the four active effects was highly fortuitous using this design. Figure 1 shows that averages of the correct selection probability, [p.sub.CS], in our [3.sup.7-1] experiment for all levels of all factors are less than 3%. We conclude that users of a design with n = 14 runs and p = 23 factors should not expect to achieve correct selection in realistic scenarios.
Figure 1 also shows that most of the changes in the assumptions have only small effects (<10%) on all of the criteria. Exceptions include the relatively large main effects of the parameters [alpha] and [p.sub.0] associated with the criteria [p.sub.COV], d, and w.
The stepwise parameter [alpha] is unique in that the experimenter chooses it. Predictably, as [alpha] is increased, the coverage probability, [p.sub.COV], and the power, w, increase, but so does the model size d. We suggest that by quantifying these trade-offs, the plots may help a decision maker select the appropriate [alpha] for his or her situation. For example, the practitioner may require the power to identify important effects to be greater than 60%. Also, because of the costs associated with follow-up experiments, the practitioner may require an expected model size smaller than 12. With these constraints, the plot indicates that a value of [alpha] approximately equal to .15 is appropriate.
[FIGURE 1 OMITTED]
The parameter [p.sub.0] is the probability that any given factor is important in the true model. Not surprisingly, as [p.sub.0] increases, performance degrades and coverage and power both decrease. This confirms similar results of Westfall et al. (1998) and Abraham et al. (1999). In fact, with [p.sub.0] = .4, the performance is comparable to an arbitrary policy of randomly selecting eight factors and declaring them important (which requires no experimentation). For this random policy, the average model size, d, equals 9 (8+1 for the constant term), the power, w, equals .35 (8/23), and the coverage is approximately 0 (calculated using the hypergeometric distribution conditioned on the binomially distributed number of important factors). The values shown in Figure 1 for [p.sub.0] = .4 are d = 9.3, w = .47, and [p.sub.COV] = .01. This implies only a marginal benefit in expected performance from performing a 14-run experiment and applying stepwise regression.
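The figures quoted for the random selection policy can be checked directly. The sketch below computes the model size, the power, and the coverage obtained by mixing the hypergeometric coverage probability over a binomial number of important factors; the function name `random_policy` is illustrative.

```python
from math import comb


def random_policy(p=23, picks=8, p0=0.4):
    """Model size, power, and coverage when `picks` factors are chosen blindly."""
    d = picks + 1                       # selected factors plus the constant term
    w = picks / p                       # chance a given important factor is picked
    cov = sum(
        comb(p, a) * p0**a * (1 - p0)**(p - a)       # Pr(A = a), binomial
        * comb(p - a, picks - a) / comb(p, picks)    # Pr(all a are among the picks)
        for a in range(picks + 1)                    # coverage is impossible if a > picks
    )
    return d, w, cov


d, w, cov = random_policy()
print(d, round(w, 2), cov)
```

The computed coverage is well below .001, consistent with the statement that it is approximately 0.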
Figure 2 shows that the parameters in general have negligible interactions related to their effects on the coverage probabilities for the n = 14 and p = 23 Lin (1993) design. Plots for other criteria also indicate little information beyond that provided by the main effects plot in Figure 1.
[FIGURE 2 OMITTED]
The practitioner might be unsatisfied with the performance of the n = 14 run design described in the plots in Figures 1 and 2 and wish to consider designs with larger numbers of runs. Figure 3 shows the main effects plot for a [3.sup.7-1] experiment on an n = 22 and p = 23 Lin (1993) design. Note that the 22-run design dominates the model identification-related performance of the 14-run design; that is, the selection probabilities [p.sub.CS], [p.sub.COV], and w are all higher, and the average model size, d, is lower. The practitioner might also wish to consider a design that is not a half fraction of a Plackett-Burman design, such as the optimal designs proposed in the next section.
5. NEW SUPERSATURATED DESIGNS
Optimization of any of the criteria [p.sub.CS], [p.sub.COV], w, or d in (6)-(9), alone or in combination, could be used to generate experimental designs. We concentrate on maximizing the coverage probability, [p.sub.COV], defined in (7), for two reasons. First, coverage probability corresponds directly to the goal of not losing important factors, which may be considered primary. Second, we discovered empirically that the designs that maximize the coverage were generally competitive for other criteria, including average model size, which we demonstrate in Section 6.
[FIGURE 3 OMITTED]
Therefore, we focus on designs that derive from the formulation
maximize [p.sub.COV]([xi]) = [E.sub.[delta],[beta],[epsilon]]{[V.sub.2][[delta], [^.[delta]]([xi])]} over [xi] [member of] X, (10)
where X is the set of possible n-run, m-factor experimental designs. The random coefficients [[beta].sub.i], experimental random errors [[epsilon].sub.i], and indicator function V were defined in Sections 2 and 3. An added restriction of potential interest might be that X should include only balanced, two-level experimental designs.
Because V is not continuous, the set X is discrete, and [p.sub.COV] must be evaluated using some form of numerical integration, (10) defines a "general discrete stochastic optimization" (DSO) or "simulation optimization" problem (see, e.g., Andradottir 1998). Several authors have proposed methods to solve general DSO problems, including Bernshteyn (2001) and Gong, Ho, and Zhai (1999). We selected the method of Bernshteyn because it yielded more impressive computational results than the approach of Gong et al. (1999). The Bernshteyn algorithm was able to find each of the average [s.sup.2] optimal designs listed in Table 2 in less than 5 minutes using a Pentium III 833-MHz machine. The optimality was verified using the bounds and computational results of Nguyen (1996) and Tang and Wu (1997).
We use the default assumptions given in the right column of Table 1 to generate the designs in Figure 4. We believe that these assumptions and the coverage probability objective ([p.sub.COV]) might be reasonable for cases in which the experimenter desires to use a small amount of data to roughly screen a large number of factors, most of which probably have no effect over the levels studied. The choice of using a purely model identification-related objective implies an intent to ignore the collected data after the subset containing the important factors has been identified. It is expected that additional experiments would then be performed, varying only the reduced set of factors with the goal of system optimization.
Both of the designs shown in Figure 4 were generated using 10 hours of run time on a Pentium III 833-MHz machine. For generation, we concentrated on designs with few more factors than runs, because these designs offer a potentially acceptable probability of coverage (see Sec. 6) and sufficient run economy that they might be preferred over fractional factorial designs.
One potential advantage of direct maximization of any of our criteria is the ability to produce unbalanced designs such as those shown in Figure 4. The standard formulation of average [s.sup.2] optimality does not permit evaluation of unbalanced designs (Booth and Cox 1962). We imagine that these designs might be of interest to practitioners who find it expensive to achieve specific levels of certain factors and would like to minimize the experimental cost. We also generated balanced designs that maximized [p.sub.COV], and we include these in the comparisons in Section 6.
6. COMPARISONS
In this section we compare several of the proposed designs, including those shown in Figure 4, with alternatives from the literature. Table 2 lists the probability of correct selection [p.sub.CS], the probability of coverage [p.sub.COV], the average model size d, and the power w for designs produced using alternative criteria and for different numbers of runs and factors. Each data point is derived from 200,000 Monte Carlo simulations of the experimentation and analysis processes. The Lin (1993) designs were produced using half fractions of Plackett-Burman arrays. The optimization method of Bernshteyn (2001) was used to produce all other designs, including optimization of the Booth and Cox (1962) average [s.sup.2] criterion and the [D.sub.2] criterion proposed by Wu (1993).
The numbers shown in the table are putative; thus designs with higher objective values may well exist. Exceptions are the n = 6 run and p = 10 factor and n = 8 run and p = 14 factor average [s.sup.2] optimal designs. For these designs, we confirmed that our solutions were globally optimal for average [s.sup.2] using formulas of Nguyen (1996) and Wu (1993). All evaluations are based on the default assumptions in Table 1. We varied the stepwise parameter [alpha], because it pertains to a decision made by the experimenter after the runs are performed and thus is not an intrinsic property of the design.
Table 2 supports three findings. First, it is perhaps remarkable how similar the performances are for the designs produced in different ways. Coverage probabilities are within 5% of each other for most designs of the same number of runs and factors. Second, because the proposed designs were produced by maximizing [p.sub.COV], by definition they must have the highest coverage probabilities in Table 2. Nonetheless, the values of the other criteria for these designs were still comparable or superior to the alternatives listed. For example, the n = 10 run and p = 11 factor unbalanced [p.sub.COV] design achieves a 5% higher coverage probability than all alternatives with the stepwise [alpha] = .05 and an 8% higher coverage probability than the balanced average [s.sup.2] design with [alpha] = .25. Moreover, for [alpha] = .25, the average model size is among the lowest (7.5). This demonstrates that gains in coverage probability do not necessarily come at the expense of larger model sizes. This is important in part because it implies that even if the experimenter has multiple objectives, a multicriterion formulation may not be necessary; that is, at least some designs that maximize [p.sub.COV] have [p.sub.CS], d, and w values close in some sense to their respective single criterion optimal or "utopia" values.
Third, the average [s.sup.2] criterion should not be applied singly. All of the balanced designs generated to maximize [p.sub.COV] were apparently average [s.sup.2] optimal. Yet the second n = 8 run and p = 14 factor average [s.sup.2] optimal design in Table 2, produced by applying the Lin (1993) construction method to a 16-run Plackett-Burman array, performed substantially worse with respect to all selection probabilities.
7. CONCLUSIONS
In this article we have proposed experimental design criteria for generating and evaluating supersaturated designs. The main attraction of these criteria is their interpretability and direct correspondence with the stated goals of supersaturated designs, for example, a high probability of identifying all of the important factors ([p.sub.COV]). We showed how main effects plots of these criteria can clarify the expected performance for a range of assumptions about the system being studied. We also applied simulation optimization to generate new designs from one of the proposed criteria, which was the probability that the factors identified by stepwise regression include or "cover" all important factors ([p.sub.COV]). We noted that optimization based on any of the proposed criteria permits generation of unbalanced designs that might be preferable in cases where levels of certain factors were associated with high costs. Finally, we compared designs derived from maximizing the coverage probability with alternatives optimizing other criteria from the experimental design literature. Our comparison indicated that maximizing the coverage probability produces designs with desirable values for coverage probability and for other criteria as well.
Table 1. Parameters Needed for Evaluation of the Proposed Criteria, Levels Used in the Main Effect and Interaction Plots, and the Default Settings Used for Design Generation

                                                                                  Levels for plots
Parameter      Description                                                        1     2     3     Default
c[sigma][tau]  Standard deviation of important factors                            1.0   3.0   5.0   3.0
[sigma][tau]   Standard deviation of unimportant factors                          0     .25   .50   .3
T              Threshold value                                                    0     1.0   2.0   1.0
[alpha]        Type I errors for adding and removing in stepwise                  .05   .15   .25   .05
[p.sub.0]      Probability a factor is active                                     .10   .25   .40   .25
[p.sub.11]     Interaction probability assuming that both factors are important   0     .25   .50   .50
[p.sub.10]     Interaction probability assuming that one factor is important      0     .05   .10   .01
(a)
Run A B C D E F G
1 - + - + + + -
2 + - - - - + +
3 - - + + - + +
4 - + + - - + +
5 + + + - + + +
6 + - - + + - +
(b)
Run A B C D E F G H I J K
1 + + - - + - - - - - -
2 + + + - + - + + - - +
3 + + - - + + + + - - +
4 + - - - + - - + - - +
5 + + - + + - + + - - +
6 + + + - + - - - - - +
7 - + - - + - + + - - +
8 - + - - - - - + + - +
9 + + - - + - + - + - +
10 + + - - + - + + - + +
Figure 4. Unbalanced Supersaturated Designs Selected to Maximize
[p.sub.COV].
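The lack of balance in design (a) can be verified by counting the + signs in each column; a balanced six-run column would contain exactly three. The sketch below transcribes the rows of Figure 4(a) as strings (the variable names are illustrative).

```python
# Rows of the 6-run, 7-factor design in Figure 4(a), columns A through G.
rows = ["-+-+++-",
        "+----++",
        "--++-++",
        "-++--++",
        "+++-+++",
        "+--++-+"]

plus_counts = [sum(r[j] == "+" for r in rows) for j in range(len(rows[0]))]
unbalanced = ["ABCDEFG"[j] for j, c in enumerate(plus_counts) if 2 * c != len(rows)]
print(plus_counts, unbalanced)
```

Columns A through E each contain three + signs, whereas columns F and G contain five, so the design is unbalanced only in its last two factors.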
Table 2. Criteria Values for Alternative Designs With Standard Errors for Estimates of Probabilities Less Than .002 and for Average Model Size Less Than .005

                                                             [alpha] = .05                          [alpha] = .25
Runs  Factors  Criterion         Balance  Average [s.sup.2]  [p.sub.CS]  [p.sub.COV]  w      d      [p.sub.CS]  [p.sub.COV]  w      d
6     7        Avg. [s.sup.2]    Y        4.000              .322        .483         .694   1.91   .069        .656         .846   3.97
6     7        [D.sub.2]         Y        4.000              .324        .483         .694   1.91   .059        .674         .859   4.15
6     7        Lin               Y        4.000              .322        .484         .694   1.92   .069        .656         .846   3.97
6     7        [p.sub.COV]       Y        4.000              .324        .484         .694   1.91   .070        .659         .847   3.97
6     7        [p.sub.COV]       N                           .320        .490         .713   2.02   .072        .701         .873   4.17
6     10       Avg. [s.sup.2]    Y        4.000              .156        .270         .544   1.98   .009        .414         .732   4.74
6     10       [D.sub.2]         Y        4.000              .157        .270         .544   1.98   .009        .415         .733   4.74
6     10       Lin               Y        4.000              .157        .272         .545   1.98   .009        .416         .733   4.74
6     10       [p.sub.COV]       Y        4.000              .157        .272         .545   1.98   .009        .416         .733   4.74
6     10       [p.sub.COV]       N                           .152        .276         .570   2.21   .018        .408         .741   4.62
8     9        Avg. [s.sup.2]    Y        3.556              .254        .454         .677   2.33   .007        .721         .904   6.24
8     9        [D.sub.2]         Y        3.556              .253        .455         .679   2.34   .007        .715         .901   6.14
8     9        [p.sub.COV]       Y        3.556              .254        .458         .680   2.35   .007        .717         .901   6.14
8     9        [p.sub.COV]       N                           .256        .464         .725   2.58   .024        .749         .919   6.00
8     14       Avg. [s.sup.2]    Y        4.923              .069        .168         .467   2.50   0           .276         .693   6.94
8     14       Avg. [s.sup.2*]   Y        4.923              .043        .076         .384   2.05   .002        .126         .593   5.90
8     14       [D.sub.2]         Y        4.923              .070        .169         .467   2.50   0           .280         .695   6.94
8     14       [p.sub.COV]       Y        4.923              .070        .169         .467   2.50   0           .280         .695   6.94
8     14       [p.sub.COV]       N                           .070        .177         .511   2.76   .001        .300         .719   6.83
10    11       Avg. [s.sup.2]    Y        4.000              .190        .387         .646   2.63   .004        .643         .885   7.78
10    11       [D.sub.2]         Y        4.000              .191        .387         .645   2.61   .003        .660         .895   8.13
10    11       Lin               Y        6.327              .171        .374         .641   2.70   .015        .543         .820   6.31
10    11       [p.sub.COV]       Y        4.000              .191        .387         .645   2.61   .004        .643         .885   7.78
10    11       [p.sub.COV]       N                           .201        .436         .732   3.17   .013        .723         .917   7.50
10    16       Avg. [s.sup.2]    Y        5.867              .051        .149         .464   2.91   0           .266         .723   8.92
10    16       [D.sub.2]         Y        5.867              .050        .148         .464   2.91   0           .266         .722   8.89
10    16       Lin               Y        5.867              .050        .148         .463   2.91   0           .264         .720   8.85
10    16       [p.sub.COV]       Y        5.867              .051        .149         .464   2.91   0           .266         .723   8.92
10    16       [p.sub.COV]       N                           .056        .168         .534   3.32   .001        .316         .759   8.69
ACKNOWLEDGMENTS
We thank the anonymous reviewer and the associate editor for helpful suggestions. We also thank Bill Notz for his ideas and encouragement and Angela Dean for helpful discussions. Finally, we thank Liyang Yu for his many contributions.
[Received May 2001, Revised July 2002.]
REFERENCES
Abraham, B., Chipman, H., and Vijayan, K. (1999), "Some Risks in the Construction and Analysis of Supersaturated Designs," Technometrics, 41, 135-141.
Andradottir, S. (1998), "A Review of Simulation Optimization Techniques," in Proceedings of the 1998 Winter Simulation Conference, pp. 151-158.
Atkinson, A. C., and Fedorov, V. V. (1975), "Optimal Design: Experiments for Discriminating Between Several Models," Biometrika, 62, 289-303.
Beattie, S. D., Fong, D. K. H., and Lin, D. K. J. (2002), "A Two-Stage Bayesian Model Selection Strategy for Supersaturated Designs," Technometrics, 44, 55-63.
Berger, J. O., and Pericchi, L. R. (1996), "The Intrinsic Bayes Factor for Model Selection and Prediction," Journal of the American Statistical Association, 91, 102-122.
Bernshteyn, M. (2001), "Simulation Optimization Methods That Combine Multiple Comparisons and Genetic Algorithms With Applications in Design for Computer and Supersaturated Experiments," unpublished doctoral dissertation, Ohio State University.
Booth, K. H. V., and Cox, D. R. (1962), "Some Systematic Supersaturated Designs," Technometrics, 4, 489-495.
Box, G. E. P., and Meyer, R. D. (1986), "An Analysis for Unreplicated Fractional Factorials," Technometrics, 28, 11-18.
Chipman, H. (1996), "Bayesian Variable Selection With Related Predictors," The Canadian Journal of Statistics, 24, 17-36.
George, E. I., and McCulloch, R. E. (1993), "Variable Selection via Gibbs Sampling," Journal of the American Statistical Association, 88, 881-889.
Gong, W. B., Ho, Y. C., and Zhai, W. (1999), "Stochastic Comparison Algorithm for Discrete Optimization With Estimation," SIAM Journal on Optimization, 10, 384-404.
Lewis, S. M., and Dean, A. M. (to appear), "Detection of Interactions in Experiments With Large Numbers of Factors," Journal of the Royal Statistical Society, Ser. B, 63, 633-672.
Li, W. W., and Wu, C. F. J. (1997), "Columnwise-Pairwise Algorithms With Applications to the Construction of Supersaturated Designs," Technometrics, 39, 171-179.
Lin, D. K. J. (1993), "A New Class of Supersaturated Designs," Technometrics, 35, 28-31.
Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. (1996), Applied Linear Statistical Models, Chicago: Irwin.
Nguyen, N. K. (1996), "An Algorithmic Approach to Constructing Supersaturated Designs," Technometrics, 38, 69-73.
Oehlert, G., and Whitcomb, P. (2001), "Sizing Fixed Effects for Computing Power in Experimental Designs," Quality and Reliability Engineering International, 17, 291-306.
Pukelsheim, F., and Rosenberger, J. L. (1993), "Experimental Designs for Model Discrimination," Journal of the American Statistical Association, 88, 642-649.
Satterthwaite, F. (1959), "Random Balance Experimentation" (with discussion), Technometrics, 1, 111-137.
Tang, B., and Wu, C. F. J. (1997), "A Method for Constructing Supersaturated Designs and Its E[s.sup.2] Optimality," The Canadian Journal of Statistics, 25, 191-201.
Wang, P. E. C. (1995), Comments on "A New Class of Supersaturated Designs," by D. K. J. Lin, Technometrics, 37, 358-359.
Westfall, P. H., Young, S. S., and Lin, D. K. J. (1998), "Forward Selection Error Control in the Analysis of Supersaturated Designs," Statistica Sinica, 8, 101-117.
Williams, K. R. (1968), "Designed Experiments," Rubber Age, 100, 65-71.
Wu, C. F. J. (1993), "Construction of Supersaturated Designs Through Partially Aliased Interactions," Biometrika, 80, 661-669.
Wu, C. F. J., and Hamada, M. (2000), Experiments: Planning, Analysis, and Parameter Design Optimization, New York: Wiley.
Yamada, S., and Lin, D. K. J. (1997), "Supersaturated Design Including an Orthogonal Base," The Canadian Journal of Statistics, 25, 203-213.
Theodore T. ALLEN
Industrial, Welding & System Engineering
The Ohio State University
Columbus, OH 43210
(allen.515@osu.edu)
Mikhail BERNSHTEYN
Sagata Ltd.
Montreal, QC H3X 2B5, Canada
(mberns@sagata.com)