In their meta-analysis investigating the relationship between extraversion and nonverbal behavior, La France, Heisel, and Beatty (2004) found a substantial negative correlation between effect size and sample size, which they explained using the cognitive load hypothesis. The cognitive load hypothesis
Keywords: Coding Scheme; Cognitive Load; Meta-analysis; Nonverbal Communication
In their meta-analysis investigating the relationship between self-reported extraversion and observer reports of nonverbal codes associated with extraversion (e.g., eye contact, proximity, gestures, smiling), La France, Heisel, and Beatty (2004) found a substantial negative correlation between effect size and sample size (r = -.40). They argued that this relationship may be an indicator of an underlying methodological artifact, which they termed cognitive load. Their cognitive load hypothesis posited that as coding schemes become increasingly complex, observer error increases as well. Increasing sample size, they argued, contributes to the complexity of coding communicative behavior, which leads to more error and attenuated effect sizes. They also provided evidence revealing negative correlations between sample size and effect size in other meta-analyses (average weighted r = -.35). (1)
Quantitative content analysis is an important method with increasing popularity among social scientists including communication scientists (Riffe, Lacy, & Fico, 1998). Perhaps not surprisingly, over time classification systems have become more complex. For example, earlier studies investigating the relationship between nonverbal communicative behavior and personality variables had observers code relatively few (i.e., five or fewer) nonverbal behaviors (Kendon & Cook, 1969; Mallory & Miller, 1958; Mobbs, 1968; Pedersen, 1973; Ramsay, 1966; Steer, 1974). More recent personality and nonverbal communication research, however, has increased the number of nonverbal cues observers are counting. For example, Lippa (1998) had observers rate over 30 nonverbal behaviors. Riggio and Friedman (1983) had raters code 29 nonverbal cues and in a later study had coders record incidences of 27 nonverbal behaviors (Riggio & Friedman, 1986). Berry and Hansen (2000) had coders rate 17 nonverbal behaviors. Although the decision to have observers code many rather than few behaviors may be consistent with hypothesis testing, it comes at a methodological--and theoretical--price. Indeed, Riffe et al. note that as coding schemes increase in their complexity coder errors are likely to increase (p. 107). The current study was designed to test this assertion.
Types of Content
Burgoon and Baesler (1991) define microscopic nonverbal behavior measurement as the observation of single concrete behaviors that are event-based or that occur during relatively short time intervals. Alternatively, macroscopic measurement typically involves a compilation of nonverbal behaviors that are more abstract and occur over an extended time period or series of events. They state that there are conceptual and methodological benefits and consequences to each type of measurement. Riffe et al. (1998) distinguish between manifest content and latent content in content analysis. Manifest content refers to easily recognized phenomena--phenomena that can be counted easily. For example, whether a speaker uses a vocal segregate (e.g., um) would be easily recognized by an observer. By contrast, latent content exists when "the meanings embedded in the content [must be] interpreted by some observer" (p. 107). Asking observers to determine the degree to which a speaker is extraverted, psychotic, or neurotic, for example, requires that coders recognize and interpret communicative behavior. The lens through which such interpretation occurs includes a variety of assumptions that are specific to the individual. Riffe et al. argue that analyzing latent content is more complex and often leads to more interrater disagreement.
Number of Nonverbal Cues Coded
In addition to types of content, coding scheme complexity can be varied in a number of ways. Increasing the length of the coding sessions during which raters must observe behavior and increasing the number of behaviors coded are two ways in which scheme complexity can be altered. This study employed both types of design characteristics to vary coding scheme complexity. In studies examining nonverbal communicative behavior, observers have coded a variety of nonverbal cues. Examples of nonverbal cues used in classification schemes include ocular behavior (e.g., establishing eye contact), facial behavior (e.g., eyebrow flashes), kinesics (e.g., gestures while speaking), vocalics (e.g., speech rate), proxemics (e.g., spatial closeness), and haptics (e.g., self touch). La France et al. (2004) demonstrated that effect sizes in studies assessing the relationship between extraversion and nonverbal behavior decreased dramatically as the number of nonverbal cues coded exceeded five. Less cognitive effort is necessary to observe relatively few nonverbal behaviors (e.g., vocal segregates and establishing eye contact) than is required to record instances of many nonverbal cues (e.g., vocal segregates, smiling, establishing eye contact, breaking eye contact, self touch, and using hands to gesture). Consequently, as Riffe et al. (1998) argue, increasing coding scheme complexity leads to greater discrepancies in raters' judgments. Specifically, hypothesis 1 predicts a main effect for the number of nonverbal cues coded on observer error such that coder errors will be larger for research participants recording instances of eight nonverbal cues than for observers counting two nonverbal cues.
Length of Coding Session
The number of nonverbal cues coded is one design characteristic that can be varied to change coding scheme complexity. Another methodological feature that impacts classification scheme complexity is the duration that coders are asked to observe behaviors. The length of the coding session during which observers scrutinize behaviors influences the complexity of content analytic classification schemes. More cognitive effort is needed to code behavior over extended periods of time. Campbell and Stanley (1966) note that observers may suffer from coder drift. As the duration of coding sessions increases, the probability of coder drift occurring increases as well. Thus, research participants counting nonverbal cues during longer sessions will generate more errors than observers coding behavior during shorter sessions. Accordingly, hypothesis 2 posits a main effect for length of coding session on observer error. Individuals will make more errors counting nonverbal behaviors for 10 minutes than observers scrutinizing behaviors for 2 minutes. Considering both main effects on error rates it is expected that observers coding eight behaviors during the 10-minute session will have the highest error rates, and coders observing two nonverbal cues during the 2-minute coding session will experience the lowest error rates.
Fatigue
Campbell and Stanley (1966) note that instrumentation is a classic threat to internal validity. They argued that as instruments (classification schemes in the case of content analysis) become complex, human observers will suffer from fatigue, and fatigue produces changes in observations. As a threat to internal validity, these changes occur because of the classification scheme, not because of the independent variable(s) being induced. Thus, hypothesis 3 posits a main effect for the number of nonverbal cues coded on fatigue. Coders who observe eight nonverbal cues will report greater feelings of fatigue than will research participants who count two nonverbal cues. Additionally, hypothesis 4 predicts a main effect for length of coding session on fatigue; observers who participate in the 2-minute coding session will feel less fatigue than individuals in the 10-minute coding session. Given these posited main effects, it is anticipated that participants observing eight nonverbal cues for 10 minutes will report the highest levels of fatigue, and observers recording two nonverbal cues for 2 minutes will report the lowest levels of fatigue.
The cognitive load hypothesis suggests that observer fatigue will lead to observer error (La France et al., 2004; Riffe et al., 1998). Therefore, a substantial and significant positive correlation between observer fatigue and observer error is hypothesized (hypothesis 5).
Method
Respondents
One hundred twelve undergraduate students from various communication courses at a large public Midwestern university participated in this study in exchange for extra course credit; this research was approved by the Institutional Review Board at the university. On average, participants were 22 years old (Mdn = 21, SD = 2.92), and few participants (15%) reported that they had previous experience with coding communicative behavior. The sample comprised both men (45%) and women (55%). Twenty-eight participants were assigned randomly to each of the four experimental conditions.
Design
The experiment was a 2 (number of nonverbal cues coded: two codes, eight codes) x 2 (length of coding session: 2 minutes, 10 minutes) independent groups design. Two videotaped student presentations were used for this study. The first videotape was utilized to train research participants, and it displayed the initial 2 minutes of a student giving a persuasive class presentation. The second videotape, which served as the stimulus for this experiment, showed another student giving a persuasive class presentation. Depending on the experimental condition, research participants were shown either a 2-minute or 10-minute segment of this presentation. Participants in the eight-code condition recorded instances of the following nonverbal codes: vocal segregates, laughs, mispronounced words, smiles, establishing eye contact, breaking eye contact, self touch, and using hand gestures. These nonverbal cues are representative of the nonverbal behavior included in the literature investigating personality variables (see La France et al., 2004). In the two-code condition, participants recorded the number of times they heard vocal segregates and saw the speaker establish eye contact. Vocal segregates and establishing eye contact were chosen randomly from the list of eight nonverbal cues.
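The balanced random assignment implied by this design (112 participants, four cells, 28 per cell) can be sketched as follows. This is only an illustration: the seed, the shuffling procedure, and the names `assign_conditions`, `CUE_LEVELS`, and `MINUTE_LEVELS` are assumptions, since the study reports only that assignment was random.

```python
import random
from collections import Counter
from itertools import product

# Factor levels taken from the design described above.
CUE_LEVELS = (2, 8)        # number of nonverbal cues coded
MINUTE_LEVELS = (2, 10)    # length of coding session in minutes


def assign_conditions(participant_ids, seed=0):
    """Randomly assign participants to the 2 x 2 cells in equal numbers.

    The seed and the shuffle-based procedure are illustrative assumptions.
    """
    cells = list(product(CUE_LEVELS, MINUTE_LEVELS))
    if len(participant_ids) % len(cells) != 0:
        raise ValueError("participants must divide evenly across cells")
    per_cell = len(participant_ids) // len(cells)
    # One slot per participant, with every cell repeated per_cell times.
    slots = [cell for cell in cells for _ in range(per_cell)]
    random.Random(seed).shuffle(slots)
    return dict(zip(participant_ids, slots))


# Hypothetical participant IDs 1..112.
assignment = assign_conditions(list(range(1, 113)))
cell_sizes = Counter(assignment.values())
```

Shuffling a fixed pool of slots, rather than drawing a cell independently for each participant, is what guarantees equal cell sizes.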
Materials and Procedures
A packet of materials was distributed to participants. This packet included an informed consent form--which included a detailed description of the experiment--coding sheets, questions about individuals' interaction proclivities, and items soliciting demographic information. Participants were instructed that they would view a videotape (i.e., stimulus videotape) of a student's persuasive class presentation during which they were to record every instance of a particular nonverbal cue using a simple hash mark. To practice, participants were told that they would initially view a short training videotape of another student's persuasive class presentation. Participants were given examples of all nonverbal codes prior to viewing either videotape, and all questions or clarifications were addressed.
Two coding sheets were distributed to research participants. One coding sheet was completed for the training presentation, and the second coding sheet was utilized for the stimulus presentation. During each presentation, individuals recorded the number of times that a nonverbal behavior occurred. The coding sheets included a separate row for each nonverbal cue. A separate column for participants to total their marks after viewing each presentation was provided on both coding sheets. The coding sheets provided ample space for individuals to record easily the counts of nonverbal communicative behaviors. After viewing both presentations, participants responded to the fatigue measure.
Fatigue
Four 5-point semantic differential items measured the degree to which participants felt fatigue during the experiment (exhausting/energizing, boring/stimulating, tedious/challenging, dreary/uplifting); a higher score indicated greater fatigue. Generally, participants felt minimally fatigued (M = 3.06, SD = .80). This measure was reliable (α = .77). This measure was also valid. Confirmatory factor analysis assesses the dimensionality of a measure or measures (Hunter & Gerbing, 1982). Expected inter-item correlations are calculated based on the product rule and are compared to obtained inter-item correlations. Substantial errors (i.e., large deviations between expected and observed correlations) indicate that the measurement model needs to be revised. These four items were consistent with a unidimensional model.
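For readers who want to reproduce a reliability estimate of this kind, the following is a minimal sketch of Cronbach's alpha using the standard formula, alpha = k / (k - 1) x (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the sample responses are hypothetical; the study's raw item data are not reported here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent).

    Items are assumed to be keyed already so that a higher score means
    greater fatigue.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))                    # one tuple per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses to four 5-point items from five respondents.
sample = [
    [3, 4, 3, 3],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 4, 4, 5],
]
alpha = cronbach_alpha(sample)
```

The sketch uses sample variances throughout; using population variances instead gives the same alpha because the ratio is unchanged.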
Error rate
Participants' observed scores were calculated by summing the reported occurrences of each nonverbal code. This procedure was performed for the training presentation and the stimulus presentation. Therefore, every participant had two observed scores for each nonverbal cue, one score for the training presentation and one score for the stimulus presentation. True scores were computed by recording the actual number of incidences of the specific nonverbal behavior that appeared on the training presentation and the stimulus presentation. Two true scores were calculated for every nonverbal cue, one true score for the training presentation and one true score for the stimulus presentation. To calculate the error rate, participants' observed scores were subtracted from the true score; the result was divided by the true total number of incidences of nonverbal behavior. Positive scores indicate that underestimation errors occurred; negative scores reveal that overestimation errors occurred. Unless otherwise noted, error rates refer to the stimulus presentation only.
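The error rate calculation described above can be sketched as a short function. The function name `error_rate`, the dictionary layout, and the counts in the example are hypothetical; passing a single cue yields a cue-specific error rate, and passing all coded cues yields the combined rate.

```python
def error_rate(true_counts, observed_counts):
    """Signed error rate: (true total - observed total) / true total.

    Positive values mean the coder underestimated the behaviors;
    negative values mean the coder overestimated them.
    """
    true_total = sum(true_counts.values())
    if true_total <= 0:
        raise ValueError("true total must be positive")
    # A cue the coder never marked counts as zero observed instances.
    observed_total = sum(observed_counts.get(cue, 0) for cue in true_counts)
    return (true_total - observed_total) / true_total


# Hypothetical counts for a coder in the two-code condition.
true_counts = {"vocal segregates": 20, "establishing eye contact": 30}
observed = {"vocal segregates": 18, "establishing eye contact": 12}

combined = error_rate(true_counts, observed)      # 20 of 50 missed
over = error_rate({"vocal segregates": 20}, {"vocal segregates": 22})
```

In this hypothetical case the combined rate is positive (underestimation), while the single-cue rate is negative because the coder marked more vocal segregates than actually occurred.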
Results
On average, participants failed to recognize over one third of the actual nonverbal behaviors performed (M = .39, SD = .20, Mdn = .41, Min. = -.21, Max. = .80). (2) Frequency analyses revealed that the error rate variable was negatively skewed (skew = -.42), indicating that most participants (97%) underestimated the incidences of nonverbal behaviors. As might be expected, there was a significant and substantial positive correlation between the error rate for the training presentation and the error rate for the stimulus presentation (r(112) = .62, p < .001). (3) The correlations between all constructs are presented in Table 1.
Error Rates: Testing Hypotheses 1 and 2
To test the first two hypotheses, a two-way ANCOVA was performed using the number of nonverbal cues coded and length of coding session as the independent variables and error rate of the stimulus presentation as the dependent variable. Given the substantial correlation between the error rate for the training presentation and the error rate for the stimulus presentation, error rate of the training presentation was entered as a covariate. Hypothesis 1 predicted a main effect for the number of nonverbal behaviors coded on observer error; it was expected that individuals coding eight nonverbal behaviors would have higher error rates than coders counting two nonverbal cues. Hypothesis 2 posited a main effect for the length of coding session on observer error. Observers in the 10-minute experimental condition were hypothesized to generate more errors than raters in the 2-minute condition. Accordingly, observers coding eight behaviors during the 10-minute session were expected to have the highest error rates, and coders observing two nonverbal cues during the 2-minute coding session were anticipated to have the lowest error rates. There was a main effect for the covariate (F(1,107) = 12.56, p < .001, η² = .06) such that individuals who had high error rates during the training presentation also had high error rates while viewing the stimulus presentation. Consistent with hypothesis 1, there was a main effect for the number of nonverbal cues coded (F(1,107) = 18.76, p < .001, η² = .09). This result indicates that error rates were significantly and substantially higher for participants who recorded eight behaviors (M = .52, SD = .13) compared to those participants who recorded two behaviors (M = .26, SD = .18). The main effect for length of coding session on observer error was not substantial (F(1,107) = .29, p > .05, η² = .001) nor was the number of codes x length of coding session interaction significant (F(1,107) = .34, p > .05, η² = .002). Thus, these data are consistent with hypothesis 1 but inconsistent with hypothesis 2. Coding many (i.e., eight) nonverbal cues led to significantly higher error rates than did coding fewer (i.e., two) nonverbal cues. The length of coding session did not have an impact on error rates.
To determine if perhaps the type of nonverbal cue coded explained the impact of the number of nonverbal cues coded on error rate, nonverbal cues that were consistent across experimental conditions were examined separately. A two-way ANCOVA using the number of nonverbal cues coded and length of coding session as the independent variables and the error rate for establishing eye contact as the dependent variable was conducted. Because the error rate for eye contact for the training presentation was substantially and positively correlated with the error rate for eye contact for the stimulus presentation (r(112) = .27, p < .05), the former variable was entered as a covariate. Results indicated that there was a main effect for the covariate (F(1,107) = 12.81, p < .01, η² = .08) demonstrating that participants' initial error rates for establishing eye contact predicted their error rates for establishing eye contact for the stimulus presentation. There was also a main effect for the number of nonverbal cues coded on observer error (F(1,107) = 46.53, p < .001, η² = .28), which revealed that error rates for eye contact were substantially higher for participants coding eight nonverbal cues (M = .58, SD = .19) than error rates for participants counting two nonverbal cues (M = .33, SD = .23). The main effect for length of coding session on observer errors was not significant (F(1,107) = .73, p > .05, η² = .004), and the number of nonverbal cues coded x length of coding session interaction was not significant (F(1,107) = 1.04, p > .05, η² = .006). Research participants had higher error rates for eye contact when coding eight cues compared to two cues. Length of coding session did not impact error rates. These results mirror the previous ANCOVA findings, which used the error rate for all nonverbal cues coded as the dependent variable.
A two-way ANCOVA was performed using the error rate for vocal segregates, the other nonverbal cue that was consistent across all experimental conditions, as the dependent variable and the number of nonverbal cues coded and length of coding session as the independent variables. Error rates for vocal segregates for the training presentation were correlated substantially with error rates for vocal segregates for the stimulus presentation (r(112) = .30, p < .01). Accordingly, error rate for vocal segregates for the training presentation was entered as a covariate. The main effect for the covariate was significant (F(1,107) = 4.12, p < .05, η² = .03) indicating that participants' error rates for vocal segregates for the training presentation impacted their error rates for vocal segregates for the stimulus presentation. The main effect for number of nonverbal cues coded was significant and substantial (F(1,107) = 11.01, p < .001, η² = .08) demonstrating that error rates were lower for participants coding two behaviors (M = .01, SD = .30) than error rates for observers coding eight behaviors (M = .26, SD = .30). The main effect for length of coding session was not significant (F(1,107) = .15, p > .05, η² = .001). The main effect for number of nonverbal cues observed, however, must be interpreted cautiously because of the significant number of nonverbal cues coded x length of coding session interaction (F(1,107) = 13.22, p < .001, η² = .09). This interaction indicates that participants who observed behavior for 2 minutes and counted two nonverbal cues had the lowest error rates (M = -.08, SD = .38), and participants who coded eight nonverbal cues for 2 minutes had the highest error rates (M = .36, SD = .34). Observers who counted behavior for 10 minutes had comparatively moderate error rates (two nonverbal cues coded, M = .10, SD = .15; eight nonverbal cues coded, M = .17, SD = .20). This interaction demonstrates that participants counting the incidences of eight nonverbal cues made more errors than those individuals counting the occurrence of two nonverbal cues, and this relationship was strongest for individuals who viewed the stimulus presentation for 2 minutes. Therefore, the results for vocal segregates partially mirror those ANCOVA results presented for the combined error rate.
Results regarding the length of coding session variable were not consistent with hypothesis 2, which posited a main effect for the length of coding session on observer errors. An examination of participants' scores from training presentation to stimulus presentation, however, revealed that overall error rates for the stimulus presentation (M = .39, SD = .20) were consistently higher than they were for the training presentation (M = .29, SD = .32, paired t(111) = -4.15, p < .001). Error rates for eye contact were higher for the stimulus presentation (M = .45, SD = .24) than error rates for eye contact for the training presentation (M = .15, SD = .36, paired t(111) = -8.55, p < .001). Furthermore, the mean error rate for vocal segregates was higher for the stimulus presentation (M = .14, SD = .32) than the mean error rate for vocal segregates for the training presentation (M = .06, SD = .44), although this difference was not significant (paired t(111) = -1.74, p = .085). Recall, however, that the correlation between the vocal segregate error rate for the training presentation and the vocal segregate error rate for the stimulus presentation was substantial and significant. Thus, on average, error rates increased 10% from the training presentation to the stimulus presentation.
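The paired comparisons reported above follow the usual paired-samples t statistic, t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom, where d is each participant's training error rate minus stimulus error rate. A minimal sketch follows; the function name `paired_t` and the error rates in the example are hypothetical, and the direction of subtraction is an assumption inferred from the negative t values reported.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired-samples t test on first - second.

    Raises ZeroDivisionError if every difference is identical, because
    the standard deviation of the differences is then zero.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical error rates for four coders (training, then stimulus).
training = [0.1, 0.2, 0.3, 0.4]
stimulus = [0.3, 0.3, 0.5, 0.5]
t_value, df = paired_t(training, stimulus)
```

A negative t here means error rates were higher for the stimulus presentation than for the training presentation, matching the direction of the reported results.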
Fatigue: Testing Hypotheses 3 and 4
Hypothesis 3 predicted a main effect for the number of nonverbal cues coded on fatigue; research participants who counted eight nonverbal cues were hypothesized to feel more fatigue than participants who observed two nonverbal cues. Hypothesis 4 posited a main effect for length of coding session on fatigue. Observers coding behaviors for 10 minutes were predicted to feel more fatigue than coders observing behavior for 2 minutes. Considering both main effects, it was expected that the greatest levels of fatigue would be reported from participants counting eight nonverbal cues for 10 minutes, and the lowest levels of fatigue would be observed in participants recording incidences of two nonverbal cues for 2 minutes. Accordingly, a two-way ANOVA was employed using number of nonverbal cues coded and length of coding session as independent factors and fatigue as the dependent variable. Contrary to the relationship predicted in hypothesis 3, there was not a substantial main effect for number of nonverbal behaviors coded (F(1,105) = .07, p > .05, η² = .001). Consistent with hypothesis 4, there was a main effect found for length of coding session (F(1,105) = 19.95, p < .001, η² = .16) that revealed participants who recorded incidences of nonverbal behaviors over 2 minutes reported feeling less fatigue (M = 2.75, SD = .57) than did respondents who noted frequencies of nonverbal behavior over 10 minutes (M = 3.38, SD = .88). The number of behaviors coded x length of coding session interaction was not significant (F(1,105) = 2.47, p > .05, η² = .02). These data were inconsistent with hypothesis 3 but were consistent with hypothesis 4. Longer coding sessions led to participants feeling greater levels of fatigue.
Error Rates and Fatigue: Hypothesis 5
There was no significant relationship between participants' fatigue levels and their error rates (r(109) = .02, p > .05). When controlling for the impact of the training presentation error rates, the correlation between fatigue and stimulus presentation error rates remained nonsignificant (pr(106) = .13, p = .10). This finding is inconsistent with hypothesis 5, which posited a significant and substantial positive correlation between fatigue and error rate.
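A first-order partial correlation of this kind can be obtained from the three zero-order correlations with the standard formula pr = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below is an illustration rather than the authors' procedure: the function name `partial_correlation` is hypothetical, and the fatigue-training correlation in the example is a placeholder because that value appears only in Table 1, which is not reproduced here.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialled out of both.

    x = fatigue, y = stimulus error rate, z = training error rate
    in the analysis described above.
    """
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denominator


# r_xy = .02 and r_yz = .62 are reported in the text; r_xz = -0.10 is a
# placeholder assumption, not a reported value.
pr = partial_correlation(r_xy=0.02, r_xz=-0.10, r_yz=0.62)
```

The example shows how a near-zero zero-order correlation can grow once a covariate is controlled, but the resulting number depends entirely on the placeholder value.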
Discussion
The impact of complex classification schemes on observer error has been questioned (Campbell & Stanley, 1966; Riffe et al., 1998). This experiment was conducted to test the cognitive load hypothesis presented by La France et al. (2004), which asserted that as observers suffer from increases in cognitive load associated with coding behavior, errors in coding would increase. Cognitive load increases as classification schemes become more complex (Riffe et al.). Coders scrutinizing more nonverbal cues (i.e., eight) had higher error rates than did observers counting two nonverbal behaviors. This finding is consistent with hypothesis 1. The length of coding session did not impact error rates, which contradicts the prediction made in hypothesis 2. The main effect for the number of nonverbal cues coded was found when examining the error rate for all nonverbal cues as well as for establishing eye contact and use of vocal segregates specifically. The main effect for number of nonverbal cues coded on the error rate for vocal segregates, however, was overshadowed by the interaction between the number of nonverbal cues coded and length of coding session. For vocal segregates, the error rate was highest for participants who recorded eight nonverbal cues over 2 minutes; the error rate was lowest for participants who coded two nonverbal cues for 2 minutes. This interaction was surprising. Increasing classification scheme complexity by increasing the number of nonverbal cues coded resulted in dramatic increases in error rates. Indeed, the design decision to increase the number of nonverbal cues coded created 26% more errors.
The results of testing hypothesis 2 were equivocal. Although errors did not vary predictably for observers coding nonverbal communicative behavior for 2 minutes or 10 minutes, there was a clear effect for time when considering error rates from the training presentation to the stimulus presentation. Examination of the pattern of error rates over time revealed consistently higher mean error rates for the stimulus presentation than for the training presentation. This effect demonstrated that over time observers made 10% more errors. This finding highlights a paradox. Although training coders is crucial for scholars employing observational methodology, the results from this experiment warn researchers that there are unintended consequences of coder training. Each training session in which coders must participate may contribute to higher error rates. The degree to which training and time interact to impact observer error is an important methodological question that future research should address.
The main effect for number of nonverbal behaviors counted on fatigue was neither significant nor substantial, which contradicts the relationship predicted in hypothesis 3. There was, however, a main effect for length of coding session indicating that individuals who viewed the 10-minute presentation had higher levels of fatigue than participants who scrutinized the 2-minute presentation. This result offers support for hypothesis 4. Making a coding scheme more complex by increasing the time observers spend scrutinizing behavior increases feelings of observer fatigue, but participants' levels of fatigue were not influenced by the number of nonverbal cues coded. Feelings of fatigue, then, are produced by increasing the overall time that observers are expected to code behavior. Consequently, researchers employing coders should reduce the length of time raters spend coding behaviors whenever possible.
Hypothesis 5 posited that feelings of fatigue would be positively and substantially related to observer error. Surprisingly, there was no significant relationship between these two variables. This result is intriguing, especially when considering that participants failed to recognize over one third of the presenter's nonverbal behaviors. One reason that fatigue was not related to observer error rates may be restriction in range. Respondents were not likely to self-report that they were energized, stimulated, challenged, or uplifted by performing this experimental task (Min. = 1.75). Scholars employing observers commonly hear comments about how tedious or boring coding communicative behavior is. Regardless of these reports, however, error rates are uncorrelated with feelings of fatigue. One implication of the approximately zero correlation between fatigue and error rate is stark: asking coders whether a particular observational task is tiring is not a useful proxy indicator of reliability or validity. Furthermore, regardless of observers' self-perceptions, error rates were high.
Burgoon and Baesler (1991) argue that there are benefits and consequences to microscopic and macroscopic measurement, but they found that both were significantly correlated for 15 of 20 nonverbal cues (with exceptions including vocal pitch variety, vocal tension, vocal warmth, self-adaptors, and object-adaptors--cues not included in the present investigation). Additionally, reliability estimates were higher for 12 of 20 nonverbal cues when microscopic coding was used compared to macroscopic measurement. Smiling and gestures, two of the nonverbal cues investigated in the present experiment, were more reliably measured at the microscopic measurement level. Burgoon and Baesler state that as measurement becomes less isomorphic with how nonverbal communicative phenomena are experienced, validity decreases. Their investigation, however, indicated acceptable validity and reliability estimates for a large number of nonverbal cues using microscopic measurement.
In conclusion, although complex coding schemes satisfy scholars' needs to be comprehensive, validity concerns mandate that researchers change the ways coding schemes are constructed. Increasing scheme complexity negatively impacts error rates. Much like increasing points on a self report scale has diminishing returns in reliability after 7 points (Nunnally, 1978, p. 595), so too does the impact of requiring observers to code many behaviors. This experiment has demonstrated support for the cognitive load hypothesis.
An earlier version of this paper was presented to the Interpersonal Communication Division of the National Communication Association, Chicago, 2004. We would like to thank W. Zakahi and the two anonymous reviewers for their help in shaping this manuscript, and thanks also to William Donner for help with data collection. Correspondence to: Betty H. La France, Department of Communication, Northern Illinois University, DeKalb, IL 60115, USA. E-mail: blafrance@niu.edu
References
Berry, D. S., & Hansen, J. S. (2000). Personality, nonverbal behavior, and interaction quality in female dyads. Personality and Social Psychology Bulletin, 26, 278-292.
Burgoon, J. K., & Baesler, E. J. (1991). Choosing between micro and macro nonverbal measurement: Application to selected vocalic and kinesic indices. Journal of Nonverbal Behavior, 15, 57-78.
Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Boston, MA: Houghton Mifflin.
Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629-634.
Hunter, J. E., & Gerbing, D. W. (1982). Unidimensional measurement, second-order factor analysis, and causal models. Research in Organizational Behavior, 4, 267-320.
Kendon, A., & Cook, M. (1969). The consistency of gaze patterns in social interaction. British Journal of Psychology, 60, 481-494.
La France, B. H., Heisel, A. D., & Beatty, M. J. (2004). Is there empirical evidence for a nonverbal profile of extraversion: A meta-analysis and critique of the literature. Communication Monographs, 71, 28-48.
Lippa, R. (1998). The nonverbal display and judgment of extraversion, femininity, and gender diagnosticity: A lens model analysis. Journal of Research in Personality, 32, 80-107.
Mallory, E. B., & Miller, V. R. (1958). A possible basis for the association of voice characteristics and personality traits. Speech Monographs, 25, 255-260.
Mobbs, N. A. (1968). Eye-contact in relation to social introversion/extraversion. British Journal of Social and Clinical Psychology, 7, 305-306.
Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Pedersen, D. M. (1973). Correlates of behavioral personal space. Psychological Reports, 32, 828-830.
Ramsay, R. W. (1966). Personality and speech. Journal of Personality and Social Psychology, 4, 116-118.
Riffe, D., Lacy, S., & Fico, F. G. (1998). Analyzing media messages: Using quantitative content analysis in research. Mahwah, NJ: Erlbaum.
Riggio, R. E., & Friedman, H. S. (1983). Individual differences and cues to deception. Journal of Personality and Social Psychology, 45, 899-915.
Riggio, R. E., & Friedman, H. S. (1986). Impression formation: The role of expressive behavior. Journal of Personality and Social Psychology, 50, 421-427.
Steer, A. B. (1974). Sex differences, extraversion, and neuroticism in relation to speech rate during the expression of emotion. Language and Speech, 17, 80-86.
Sterne, J. A., Egger, M., & Smith, G. D. (2001). Investigating and dealing with publication and other biases in meta-analysis. British Medical Journal, 323, 101-105.
Sutton, A. J., Abrams, K. R., & Jones, D. R. (2001). An illustrated guide to the methods of meta-analysis. Journal of Evaluation in Clinical Practice, 7, 135-148.
Notes
[1] When critiquing meta-analytic methodology, scholars have cited biases associated with meta-analysis. Generally, these biases have been noted under the publication bias label (Egger, Smith, Schneider, & Minder, 1997; Sterne, Egger, & Smith, 2001; Sutton, Abrams, & Jones, 2001). "Studies [that] show a significant effect of treatment are more likely to be published, be published in English, be cited by other authors, and produce multiple publications than other studies" (Sterne et al., p. 101). Moreover, studies that show significant results are expected to be published, and published more quickly, than studies that do not show significant findings (i.e., pipeline bias, see Sutton et al.). Egger et al. argue that the accuracy in estimating true effect size increases as sample size increases because smaller sample studies produce more heterogeneous effects than do large sample studies. Thus, when effect size and sample size are plotted using a simple scatterplot, the result, absent bias, is a symmetrical inverted funnel. The extent to which the scatterplot shows an asymmetrical relationship between sample size and effect size reveals the extent to which bias exists within the meta-analysis. For example, if publication bias exists and smaller sample studies are not published because they did not obtain large enough effects to be statistically significant, then the plot will be asymmetrical; the resultant correlation would be negative. Although publication bias has been offered as an explanation of the relationship found between sample size and effect size, alternative explanations of asymmetry have been asserted. Sterne et al. suggested that asymmetry may result from large effects in smaller sample studies where individualized treatments are given to specific (e.g., high risk) persons. The present investigation offers the cognitive load hypothesis as another explanation of the negative correlation between sample size and effect size.
[2] This high error rate may have been obtained because the nonverbal cues coded were conceptually dissimilar as they were examples of haptics, ocular behavior, vocalics, and facial expressions. Analysis using only the ocular data (establishing eye contact and breaking eye contact) from participants who counted eight nonverbal cues, however, revealed that overall error rates were higher when only conceptually similar nonverbal cues were considered (M = .52, SD = .13).
[3] There were no significant sex differences regarding the error rate for the training presentation (t(109) = .04, p > .05, r = -.004; women: M = .29, SD = .33; men: M = .28, SD = .32) nor for the stimulus presentation (t(109) = -.89, p > .05, r = .09; women: M = .37, SD = .19; men: M = .40, SD = .21).
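The publication bias account summarized in Note 1 can be illustrated with a small simulation. The sketch below is not part of the original analysis; the true effect (rho = .10), the range of sample sizes (20 to 400), the number of simulated studies, and the publication rule (only positive, statistically significant results are published) are all assumptions chosen for illustration, and the function names are hypothetical. Each study's observed correlation is drawn using the Fisher z approximation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def observed_r(n, rho, rng):
    """Draw one study's observed r via the Fisher z approximation:
    atanh(r) is roughly Normal(atanh(rho), 1 / sqrt(n - 3))."""
    z = rng.gauss(math.atanh(rho), 1.0 / math.sqrt(n - 3))
    return math.tanh(z)


def is_published(r, n):
    """Assumed publication rule: positive and significant at
    two-tailed p < .05 according to the Fisher z test."""
    return r > 0 and math.atanh(r) * math.sqrt(n - 3) > 1.96


def simulate_meta_analysis(rho=0.10, k=5000, seed=1):
    """Return (correlation of n with r across all simulated studies,
    correlation of n with r across 'published' studies only)."""
    rng = random.Random(seed)
    studies = []
    for _ in range(k):
        n = rng.randint(20, 400)  # assumed range of sample sizes
        studies.append((n, observed_r(n, rho, rng)))
    published = [(n, r) for n, r in studies if is_published(r, n)]
    r_all = pearson([n for n, _ in studies], [r for _, r in studies])
    r_pub = pearson([n for n, _ in published], [r for _, r in published])
    return r_all, r_pub


if __name__ == "__main__":
    r_all, r_pub = simulate_meta_analysis()
    print(f"all studies:       r(n, effect) = {r_all:.2f}")
    print(f"published studies: r(n, effect) = {r_pub:.2f}")
```

Under these assumptions, sample size and effect size are essentially uncorrelated across all simulated studies, whereas the published subset shows a clearly negative correlation, because small studies clear the significance threshold only when they happen to overestimate the effect. The simulation speaks only to the publication bias explanation; it does not model coder error and therefore cannot distinguish that explanation from the cognitive load hypothesis.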
Table 1 Correlations between Constructs

Variable           Fatigue    Error rate (TR)    Error rate (ST)    Error rate (VS)
Error rate (TR)     -.12
Error rate (ST)      .02       .62 **
Error rate (VS)     -.05       .42 **             .43 **
Error rate (EE)      .02       .47 **             .84 **             .21 *

TR = training presentation; ST = stimulus presentation; VS = vocal segregate; EE = establishing eye contact.
* p ≤ .05. ** p ≤ .01.