In a recent article published in this journal, Kanninen and Khawaja note that McFadden's pseudo [R.sup.2], a standard goodness-of-fit measure for binary discrete choice models, is undefined in the case of the double-bounded logit model used in contingent valuation (CV). As a remedy to this dilemma,
The double-bound approach to contingent valuation employs a sequence of two discrete choice questions in order to elicit an individual's willingness to pay (WTP), typically for a nonmarket good or service (see, e.g., Carson; Hanemann 1985; and Hanemann, Loomis, and Kanninen). The initial question asks the survey respondent if they are willing to pay [B.sub.I] for the good. If they say "yes," the follow-up question asks if they would be willing to pay an even higher bid level [B.sub.H], whereas a "no" response to the initial question is followed up by a lower bid offer [B.sub.L] (with [B.sub.L] [less than] [B.sub.I] [less than] [B.sub.H]). This sequence of questions narrows the range in which the respondent's true WTP lies, placing it into one of four intervals: (-[infinity], [B.sub.L]), [[B.sub.L], [B.sub.I]), [[B.sub.I], [B.sub.H]), or [[B.sub.H], + [infinity]). The analyst observes the discrete variable
(1) [Mathematical Expression Omitted].
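The coding of the discrete variable can be sketched in a few lines. The sketch assumes that D = 1, ..., 4 indexes the four intervals in ascending order, which matches the later description of D = 2 as a "no" followed by a "yes"; the function names are hypothetical and are not taken from the article.

```python
import math

def response_code(first_yes, second_yes):
    """Map the two yes/no answers to D in {1, 2, 3, 4} (assumed ascending coding)."""
    if first_yes:
        return 4 if second_yes else 3  # follow-up bid was the higher bid B_H
    return 2 if second_yes else 1      # follow-up bid was the lower bid B_L

def wtp_interval(d, b_low, b_init, b_high):
    """Return the (lower, upper) bounds on WTP implied by the code d."""
    bounds = [-math.inf, b_low, b_init, b_high, math.inf]
    return bounds[d - 1], bounds[d]
```

For example, a respondent who rejects the initial bid but accepts the lower follow-up bid is coded D = 2, with WTP bounded by the lower and initial bids.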
While there are several approaches to modeling the double-bounded survey responses identified in equation (1), Cameron's bid function approach clearly illustrates the issues in constructing McFadden's pseudo [R.sup.2].(1) Let
(2) WTP = f(Z, [Epsilon]; [Gamma])
denote the individual's WTP (or bid function) for the good in question, where Z is a vector of observable characteristics of the individual, [Epsilon] is a random variable capturing unobservable characteristics, and [Gamma] is a vector of unknown parameters in the bid function. Assuming that f([center dot]) is linear in Z and [Epsilon] and that [Epsilon] [similar to] N(0, [[Sigma].sup.2]), equation (2) becomes:(2)
(3) WTP = [Alpha] + [Lambda][prime]Z + [Epsilon]
where [Gamma] = ([Alpha], [Lambda][prime])[prime]. Using this specification, equation (1) can be rewritten as
(4) [Mathematical Expression Omitted]
and the choice probabilities become
(5) [Mathematical Expression Omitted]
where [Phi]([center dot]) denotes the standard normal cdf,
(6) [Mathematical Expression Omitted]
(7) [Mathematical Expression Omitted]
(8) [Mathematical Expression Omitted].
The log-likelihood function used for estimation becomes
(9) [Mathematical Expression Omitted]
where [I.sub.k] is an indicator function for the event k, and the subscript i is used to denote individual i.
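A minimal sketch of this log-likelihood follows. It is written directly in terms of [Alpha], [Lambda], and [Sigma] from equation (3), not in the reparameterized form of equations (5) through (8), which is not reproduced here. It also assumes a single scalar covariate Z and the ascending coding D = 1, ..., 4; the function names and data layout are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal cdf, computed with math.erf from the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def choice_probs(mean_wtp, sigma, b_low, b_init, b_high):
    """Pr(D = 1), ..., Pr(D = 4) when WTP ~ N(mean_wtp, sigma^2)."""
    f_low = norm_cdf((b_low - mean_wtp) / sigma)
    f_init = norm_cdf((b_init - mean_wtp) / sigma)
    f_high = norm_cdf((b_high - mean_wtp) / sigma)
    return [f_low, f_init - f_low, f_high - f_init, 1.0 - f_high]

def log_likelihood(alpha, lam, sigma, data):
    """Sum of log Pr(D_i = d_i) over tuples (z, b_low, b_init, b_high, d)."""
    total = 0.0
    for z, b_low, b_init, b_high, d in data:
        probs = choice_probs(alpha + lam * z, sigma, b_low, b_init, b_high)
        total += math.log(probs[d - 1])  # only the observed cell contributes
    return total
```

The indicator functions in the likelihood appear here as the lookup `probs[d - 1]`, which selects the probability of the interval each respondent actually chose.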
The standard construction of McFadden's pseudo [R.sup.2] for this model is given by
(10) [R.sup.2] = 1 - [L.sub.max]/[L.sub.0]
where [L.sub.max] corresponds to the unconstrained maximum log-likelihood value of L and [L.sub.0] corresponds to the maximum value of L when all the slope terms are constrained to zero (i.e., [Mathematical Expression Omitted] and [Mathematical Expression Omitted]) and only a constant term is retained. However, as Kanninen and Khawaja correctly note, [L.sub.0] is undefined for the double-bounded likelihood function in equation (9), since for the constrained model pr(D = 2) = pr(D = 3) = 0.
As a remedy to this problem, I propose the following variant on McFadden's pseudo [R.sup.2]:
(11) [Mathematical Expression Omitted]
where [Mathematical Expression Omitted] corresponds to the maximum value of L when all slope parameters, except the one on bid values, are constrained to zero (i.e., [Mathematical Expression Omitted]), thus retaining both the constant term and [Mathematical Expression Omitted]. In this case, the restricted likelihood function is well defined and [Mathematical Expression Omitted] can be estimated.
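As a rough illustration, the calculation might look as follows. The sketch assumes that the modified measure keeps the usual McFadden ratio form (one minus the unrestricted maximum divided by the benchmark maximum), with the benchmark taken from the model that retains the constant and the bid coefficient. The function name and the numerical values in the usage note are hypothetical.

```python
import math

def pseudo_r2(l_full, l_benchmark):
    """Return 1 - l_full / l_benchmark for two maximized log-likelihood values."""
    # A benchmark of minus infinity (or zero) leaves the ratio meaningless,
    # which is the situation the standard construction runs into.
    if not math.isfinite(l_benchmark) or l_benchmark >= 0.0:
        raise ValueError("benchmark log-likelihood must be finite and negative")
    return 1.0 - l_full / l_benchmark
```

With hypothetical values of -80 for the unrestricted model and -100 for the benchmark, `pseudo_r2(-80.0, -100.0)` gives approximately 0.2, while passing a benchmark of minus infinity raises an error instead of returning a number.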
The rationale for this adjustment to McFadden's pseudo [R.sup.2] can be seen by examining the underlying bid function in equation (3). The standard goodness-of-fit question, comparable to one found in a continuous linear regression context, would be to determine the impact of setting all of the slope parameters in the bid function to zero; i.e., testing [H.sub.0]: [Lambda] = 0 (see, e.g., Maddala, p. 39). However, testing [H.sub.0]: [Lambda] = 0 corresponds to testing [Mathematical Expression Omitted], not to testing that all of the slope terms in the likelihood function are zero; i.e., [Mathematical Expression Omitted] and [Mathematical Expression Omitted]. As equation (7) indicates, restricting [Mathematical Expression Omitted] is equivalent to setting [Sigma] equal to infinity.
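The effect of this restriction can be checked numerically. Under the linear normal specification, Pr(WTP < B) can be written as [Phi](a + bB) with b = 1/[Sigma]; the names a and b are hypothetical stand-ins for the likelihood parameters and are not the article's notation. The sketch below shows the two middle cells emptying as b shrinks to zero.

```python
import math

def norm_cdf(x):
    """Standard normal cdf, computed with math.erf from the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def choice_probs(a, b, b_low, b_init, b_high):
    """Pr(D = 1..4) when Pr(WTP < B) = Phi(a + b * B), with b = 1 / sigma."""
    f_low, f_init, f_high = (norm_cdf(a + b * bid)
                             for bid in (b_low, b_init, b_high))
    return [f_low, f_init - f_low, f_high - f_init, 1.0 - f_high]

# Shrinking b toward zero (sigma toward infinity) squeezes out cells 2 and 3.
for b in (0.1, 0.001, 0.0):
    print(b, choice_probs(-1.0, b, 5.0, 10.0, 20.0))
```

At b = 0 the three cdf values coincide, so Pr(D = 2) and Pr(D = 3) are exactly zero, and the log of either probability is undefined for any respondent observed in those cells.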
The need for a modification to the pseudo [R.sup.2] stems in large part from the unique nature of CV data. In the standard logit setting, analysts observe only a dummy variable, indicating whether or not the underlying latent variable driving the discrete choice is greater than or equal to zero. In this setting, the variance term ([[Sigma].sup.2]) cannot be separately identified. However, as Cameron has noted, the referendum format underlying the discrete choice CV questionnaires provides cardinal information on the individual's WTP and allows for the identification of [[Sigma].sup.2]. This is manifested in the above model by the appearance of [Mathematical Expression Omitted] in the likelihood function. The issue of constraining [Mathematical Expression Omitted] does not arise in the standard logit model (or in the formation of the corresponding pseudo [R.sup.2]) because the parameter does not exist in that model.
Additional insight into this issue emerges if one begins by specifying the consumer's indirect utility function rather than their bid function (e.g., Hanemann 1984). In particular, suppose that utility is linear in income with an error term that is normally distributed. Then the change in utility that is offered by a CV referendum question with a bid of [B.sub.k] (k = L, I, H) takes the form
(12) [Delta][U.sub.k] = [[Alpha].sub.u] - [[Rho].sub.u][B.sub.k] + [[Lambda].sub.u][prime]Z + [Epsilon]
where [[Rho].sub.u] can be interpreted as the marginal utility of income, and [Epsilon] [similar to] N(0, [[Sigma].sup.2]). The choice probabilities in equation (5) emerge if we set [Mathematical Expression Omitted], and [Mathematical Expression Omitted]. In this framework, constraining [Mathematical Expression Omitted] (but not [Mathematical Expression Omitted]) to zero, as one does in the standard construction of McFadden's pseudo [R.sup.2], corresponds to setting the marginal utility of income to zero (i.e., [[Rho].sub.u] = 0). This suggests where the problem lies in using the standard pseudo [R.sup.2]. If indeed the marginal utility of income is zero, then one should not observe a survey respondent answering "no" to an initial bid and "yes" to a follow-up bid (i.e., [D.sub.i] = 2), since money does not matter when [[Rho].sub.u] = 0. Similarly, one should not observe [D.sub.i] = 3 when [[Rho].sub.u] = 0. In fact, if no one chooses options 2 or 3 (i.e., [I.sub.[D.sub.i] = 2] = [I.sub.[D.sub.i] = 3] = 0 [for every] i), as the underlying utility function would suggest when the marginal utility of income is zero, then [L.sub.0] is defined and McFadden's pseudo [R.sup.2] can be constructed. However, once we have observed individuals with [D.sub.i] = 2 or [D.sub.i] = 3, the data are theoretically inconsistent with the model being used as the benchmark (i.e., [L.sub.0]) in the standard construction of McFadden's pseudo [R.sup.2].(3) The modified pseudo [R.sup.2] of equation (11) provides a benchmark (i.e., [Mathematical Expression Omitted]) that allows for observations with [D.sub.i] = 2 or [D.sub.i] = 3.(4)
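The same point can be illustrated from the utility side. The sketch assumes a scalar Z, a nonnegative [[Rho].sub.u], and a single draw of [Epsilon] shared by both questions, so that Pr(D = 2) is the probability of a "yes" at the lower bid minus the probability of a "yes" at the initial bid. The function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cdf, computed with math.erf from the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_yes(bid, alpha_u, rho_u, lam_u, z, sigma):
    """Pr(Delta U >= 0) at the given bid under equation (12), scalar z."""
    return norm_cdf((alpha_u - rho_u * bid + lam_u * z) / sigma)

def prob_no_then_yes(b_low, b_init, alpha_u, rho_u, lam_u, z, sigma):
    """Pr(D = 2): 'no' at b_init but 'yes' at the lower follow-up bid b_low."""
    return (prob_yes(b_low, alpha_u, rho_u, lam_u, z, sigma)
            - prob_yes(b_init, alpha_u, rho_u, lam_u, z, sigma))
```

When `rho_u` is zero the bid drops out of `prob_yes`, the two terms cancel, and the probability of a switched answer is exactly zero; any positive `rho_u` makes it strictly positive.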
Conclusions
This comment has provided a modified version of McFadden's pseudo [R.sup.2] for use in assessing goodness of fit in dichotomous choice contingent valuation. I argue that the benchmark against which more general models should be judged is one in which the bid function includes only an intercept. The corresponding benchmark log-likelihood ([Mathematical Expression Omitted]) maximizes
(13) [Mathematical Expression Omitted].
More general models are then judged using [Mathematical Expression Omitted] in equation (11), measuring the extent to which they improve on this minimum value of the log-likelihood function.
There are two advantages of the modified pseudo [R.sup.2] over its standard construction. First, [Mathematical Expression Omitted] parallels goodness-of-fit measures used in continuous linear regression models, measuring the extent to which more complex models go beyond providing simply a mean value for WTP. The standard pseudo [R.sup.2] employs a benchmark in which WTP is undefined. Second, the modification avoids the problem, identified in Kanninen and Khawaja, that the standard pseudo [R.sup.2] may be undefined for double- and multiple-bounded CV formats.
Finally, I note that the above arguments apply not only to the double-bounded logit model but to the single- and multiple-bounded logit models as well. The problem simply becomes transparent in the double-bounded setting because the standard pseudo [R.sup.2] becomes undefined. Even in the single-bounded setting, however, restricting [Mathematical Expression Omitted] is equivalent to setting [Sigma] equal to infinity, and it is recommended that the revised pseudo [R.sup.2] be used.
This is journal article J-17973 of the Iowa Agricultural and Home Economics Experiment Station, Project No. 3246. The author would like to thank Cathy Kling and three anonymous referees for their comments and suggestions regarding an earlier draft of this article.
1 Similar arguments regarding the pseudo [R.sup.2] can be made using models of WTP based on the consumer's indirect utility function (e.g., Hanemann 1984). See equation (12).
2 These assumptions are commonly employed in the literature and are used here to simplify the subsequent exposition. Parallel arguments apply when a logarithmic model of WTP is assumed and/or a logistic distribution is specified for the error term.
3 Indeed, as one reviewer noted, with the marginal utility of income set to zero, WTP becomes undefined. A similar problem will, of course, emerge in multiple-bounded questionnaire formats if an individual responds with anything except all no's or all yes's.
4 It should be noted that I am not arguing that analysts should ignore the size and sign of the marginal utility of income. The marginal utility of income is a key parameter of the model, and the precision with which it is estimated is critical to the formation of welfare estimates. Furthermore, one would generally not want to restrict the marginal utility of income to be strictly positive without any data. However, once an individual selects options 2 or 3, they have, within the standard Hanemann framework, revealed a nonzero marginal utility of income by changing their response in the second question as a result of a change in their income.
References
Cameron, T.A. "A New Paradigm for Valuing Non-Market Goods Using Referendum Data: Maximum Likelihood Estimation by Censored Logistic Regression." J. Environ. Econ. and Manage. 15(September 1988): 355-79.
Carson, R.T. "Three Essays on Contingent Valuation (Welfare Economics, Non-Market Goods, Water Quality)." PhD dissertation, Department of Agricultural and Resource Economics, University of California, Berkeley, 1985.
Hanemann, W.M. "Some Issues in Continuous- and Discrete-Response Contingent Valuation Studies." N.E. J. Agr. Econ. 14(April 1985): 5-13.
-----. "Welfare Evaluations in Contingent Valuation Experiments with Discrete Responses." Amer. J. Agr. Econ. 66(August 1984): 332-41.
Hanemann, W.M., J.B. Loomis, and B.J. Kanninen. "Statistical Efficiency of Double-Bounded Dichotomous Choice Contingent Valuation." Amer. J. Agr. Econ. 73(November 1991): 1255-63.
Kanninen, B.J., and M.S. Khawaja. "Measuring Goodness of Fit for the Double-Bounded Logit Model." Amer. J. Agr. Econ. 77(November 1995): 885-90.
McFadden, D. "The Measurement of Urban Travel Demand." J. Public Econ. 3(1974): 303-28.
Maddala, G.S. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press, 1983.
Joseph A. Herriges is associate professor of economics at Iowa State University.