Elements of traditional police promotional examinations (e.g., written tests and assessment centers) do not take into account candidates' work histories, and performance appraisals on records are fraught with problems. This article describes a new method of having panelists systematically review
"The two most important days in a candidate's career are the day he/she takes the written test and the day he/she takes the assessment center. What about the other 363 days of the year of reliable and outstanding performance?"
--Chief of Police
Traditional promotional examinations in police organizations often consist of the following elements: a written knowledge test, an assessment center, credit for seniority, and a score based on recent performance appraisal ratings. Each of these elements may evaluate some job-related attributes, but they also have limitations. The purpose of this article is to highlight the strengths and limitations of these traditional elements of police promotional examinations and to describe an innovative method for gathering and evaluating additional job-related information from the personnel records of candidates.
Strengths and Limitations of Traditional Methods
What is wrong with the traditional methods of evaluation? Credit for seniority certainly recognizes the contribution of long-time service to an organization and deserves inclusion in the overall ranking, but it is limited in that there is often little variation in years of service among the finalists. At the lower end, candidates usually must have at least a few years of service; at the upper end, long-time officers have often been promoted or do not put themselves up for advancement. More importantly, length of service in a lower-level job is no guarantee the person has the skills needed for higher-level positions.
Performance on a written test of knowledge and in an assessment center also provides relevant, but limited, information. The written examination certainly provides the best way to evaluate whether a candidate knows the rules of the organization and the conceptual principles of policing. That knowledge is a basic, necessary starting point for good supervision, but it is not sufficient for success in police administration and leadership.
An assessment center certainly goes beyond the written test by tapping into the decision-making, administrative, and interpersonal skills that are necessary for effective management and leadership.[1] The long history of research on assessment centers clearly demonstrates their effectiveness in measuring skills related to effective management and in predicting success in a variety of organizational settings.[2] A potential limitation of the assessment center method is that it measures the ability to execute the skills ("can do" variables), but not necessarily the willingness to do so over the long haul in an organization ("will do" variables). In addition, assessment centers typically measure task-related aspects of performance, but not necessarily attributes related to the willingness to do extra "good citizenship" behaviors related to the organization's survival.[3]
Errors in Prediction
One result of these limitations may be two types of error in prediction: false positives and false negatives. False positives are those candidates who test well on the written examination and assessment center, but do not actually perform well on the target job if they are promoted. False negatives are those who do not test well and thus are not promoted, but would have performed well if they had been promoted. Both of these errors can be minimized by gathering behavioral information about actual past job performance that is relevant to performance on the higher-level job.
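The two error types can be made concrete with a small tally. The sketch below is illustrative only: the candidate records are hypothetical, and it assumes one could know how each candidate would have performed if promoted, which in practice is never observed for those passed over.

```python
from collections import Counter


def classify(tested_well: bool, would_perform_well: bool) -> str:
    """Label one prediction outcome for a promotional examination."""
    if tested_well and would_perform_well:
        return "true positive"
    if tested_well and not would_perform_well:
        return "false positive"   # promoted on test results, performs poorly
    if not tested_well and would_perform_well:
        return "false negative"   # passed over, would have performed well
    return "true negative"


# HYPOTHETICAL candidates: (tested well?, would perform well on target job?)
candidates = [
    (True, True),
    (True, False),
    (False, True),
    (False, False),
    (True, True),
]

tally = Counter(classify(tested, performs) for tested, performs in candidates)
print(tally["false positive"], tally["false negative"])
```

Any additional source of information, such as a records review, is useful to the extent that it moves candidates out of the two error cells.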
A logical suggestion, then, is to use existing information from appraisals of on-the-job performance. In fact, some jurisdictions include appraisal ratings from the past two to four years of performance. Unfortunately, existing performance appraisal information is often not very useful in this regard. Performance appraisals are used for various purposes, and the rating may not be an accurate reflection of actual work effectiveness.[4] Supervisors may be reluctant to write down honest, negative appraisals because of adverse consequences, such as pay implications or hostile reactions from the subordinate. Frequently the result is highly lenient ratings that do not differentiate among subordinates. Another problem with performance appraisals is that different supervisors may have different standards for their ratings. A harsh supervisor may grade down a subordinate who might have received a much higher score from a lenient supervisor. More importantly, even at their best, performance appraisals are evaluations of performance on the current job and are not necessarily relevant to performance on the next higher level of job.
Peer descriptions are an alternative source of evaluation of job performance which have long been proven reliable and valid[5] and have recently been incorporated in multi-source (sometimes called 360 degree) evaluation and feedback systems.[6] While input from peers may have a beneficial role in the development of managers in many organizational settings, peer evaluations are probably not advisable in the often highly competitive situation of making promotion decisions in police administration -- and were not acceptable in the organization discussed in this paper.
What is needed is some way of examining past job behavior that is relevant to performance on a future, higher-level assignment. An early study demonstrating the efficacy of evaluating personnel records was conducted by Hinrichs with a managerial sample at IBM.[7] Higher-level managers studied information in the personnel files of individuals who had also been assessed in a three-day assessment center. Scores on the records review and the assessment center were both found to be correlated with subsequent evaluations of job performance, and both tapped into unique information and contributed to the prediction of managerial performance. Similar findings were revealed in a study of candidates for promotion to management positions in fire and police departments.[8] A limitation of that study is the small sample from different organizations. The present report describes a method of evaluating the personnel work history which affords a way of fairly considering past job performance that is relevant to future performance in police management, along with data about the findings from a diverse sample.
A New Method for Evaluating Past Performance
The Personnel Records Review (PRR) process provides a systematic way of thoroughly and fairly reviewing each candidate's work history to assess many of the "will do" attributes that are so important to successful managerial performance. The PRR uses many of the measurement principles that have made the assessment center method so successful: evaluation of job-related dimensions, reliance on behavioral evidence, multiple raters consulting multiple sources, and a systematic process of integrating information which delays decision making, avoids rating errors, and balances multiple inputs.
Job Related Dimensions
The PRR begins with a careful job analysis of the key elements of success in the target job. Most police jurisdictions already have a sound job description of target jobs, so the dimensions to be evaluated are probably already known. Other jurisdictions may need to do a new job analysis. One jurisdiction decided to use the following dimensions for the sergeant's examination:
* Supervisory-related education and experience -- This includes any formal or informal education that was related to the job, including external and internal coursework, workshops, experience inside and outside the organization, e.g., participation in community leadership roles.
* Disciplined behavior -- any information about sustained discipline and recorded incidents displaying poor performance or violating regulations; in addition, evidence of demonstrations of good discipline is evaluated favorably.
* Commendatory behavior -- formal commendations within the agency and letters from citizen groups.
* Reliability -- Negative evidence might involve unauthorized absence, failure to meet court date, tardiness, failure to meet department standards; positive evidence includes completing tasks on time, volunteering for extra assignments.
Each of these dimensions was defined thoroughly, along with examples of behaviors illustrating good and poor performance. Candidates were informed that these dimensions would be evaluated, and panelists were trained to recognize relevant behaviors.
Multiple Sources of Data
Behavioral information relevant to these dimensions can be obtained from a variety of sources. In recent applications of the procedure in major cities, the following sources of information have been consulted to obtain information relevant to the target job:
* Personnel file with performance evaluations -- The personnel jacket is examined for behavior relevant to all of the dimensions
* Supervisory situation record (SSR) -- This is a daily log kept by the immediate supervisor in the candidate's work area; it might include notes on effective performance that does not rise to the level of a commendation, or ineffective performance that does not result in disciplinary action. These entries are like critical incidents, and this source may provide behavioral evidence relevant to all dimensions
* Commendation history -- These records are kept in the department and list the formal forms of recognition such as awards and citizen letters
* Professional history form -- Completed by the candidate, this form allows each person to present any evidence related to the dimensions and explain any entries in the disciplinary record or SSR; this "brag sheet" provides the candidate with the opportunity to make sure all behavior relevant to the assessed dimensions is presented to the panel.
Multiple Raters
A pool of six raters, two or three levels above the target position, is selected to be representative of the workforce -- including males and females, majority and minority individuals from different functional areas in the department. Each candidate is allowed to pick the four raters to serve on his or her panel. Internal raters are more appropriate than outside raters, who are often used in assessment centers, because inside raters are more familiar with the special situations faced by members of that department. Such raters can make more meaningful evaluations of special circumstances surrounding the candidates' past performances.
Rater Training
Training for the panelists is comparable to the training received by assessors observing assessment center behavior. Training includes definitions and discussion of the dimensions, review of the sources of data, discussion of rating errors and how to minimize them, ethical issues that can arise, the responsibility to be fair to all candidates, and review and application of the steps in the evaluation process. The training is provided by the consultant, the HR representatives from the hiring body, and the chief of police, who charges the panel with making fair evaluations. Training includes practice with real files (with names blocked out) of persons who are not actually candidates at that time.
Systematic Integration
A systematic process is carried out for studying, reporting, and integrating the behavioral information from the records. Each of the four panelists is given one or more of the source documents to study for a few minutes. Then, each panelist briefly summarizes the critical behavioral information in the source document he or she has reviewed. Next, the panel reports detailed information about the first dimension, e.g., education and related experience, as it appears in each of the source documents. After all information is presented on this dimension, the panelists independently write down an initial rating. If these ratings are more than one point apart (on a five-point scale), further discussion is conducted to identify the sources of disagreement. When the ratings are within a range of 1.0, they are recorded, and the panel then discusses further until consensus is reached on the final dimension rating. The above process is carried out on all dimensions. This consensus discussion process has been utilized successfully in numerous assessment center applications.[9]
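The check on initial ratings described above can be sketched in a few lines. This is a minimal illustration, not the panel's actual worksheet: the function names and the sample ratings are hypothetical, and the five-point scale is assumed to run from 1 to 5.

```python
def rating_spread(ratings):
    """Distance between the highest and lowest independent ratings."""
    return max(ratings) - min(ratings)


def needs_further_discussion(ratings, max_spread=1.0):
    """True when the initial ratings are more than max_spread points apart."""
    return rating_spread(ratings) > max_spread


# HYPOTHETICAL initial ratings from four panelists on one dimension,
# assuming the five-point scale runs from 1 to 5.
first_round = [3, 4, 5, 4]    # spread of 2: discuss sources of disagreement
second_round = [4, 4, 5, 4]   # spread of 1: record, then move to consensus

print(needs_further_discussion(first_round))
print(needs_further_discussion(second_round))
```

The check only decides when ratings may be recorded; the final dimension rating still comes from the panel's consensus discussion, not from a formula.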
The final step is the overall rating. Each panelist independently records an overall rating that represents his or her best judgment of the quality of performance demonstrated by the candidate. The final rating is not an average of the dimension ratings, because one very high dimension score may override other, lower scores, or one very low dimension score may override other, higher scores. For example, even though a candidate may have a very fine educational background and get a high score on job-related education and experience, he or she may deserve a low overall rating if the record shows a serious disciplinary transgression. The final rating of each panelist is the best possible judgmental integration of all relevant information, not a mechanical arithmetic average of scores. Final ratings are discussed by the panelists until they reach consensus.
The consultant or moderator of the panel follows a strict script for leading the panel through the above steps. Documentation of the process for each candidate is recorded by having the moderator complete a checklist to ensure administrative consistency.
Assessment Center Principles Applied
For anyone who has experienced an assessment center, it has probably become obvious that the Personnel Records Review process described here utilizes many of the principles and practices of the assessment center method. To clarify these similarities, the essential elements of the assessment center method provided in the "Ethical Considerations and Guidelines for Assessment Center Operations"[10] are displayed in Table 1, along with the comparable aspects of the PRR.
Statistical Analyses of Personnel Records Reviews
Research on the PRR in prior promotional examinations reveals that the process provides valid and fair information which contributes uniquely to the determination of qualifications among police candidates. For example, statistical analyses of the ratings for a recent examination for police sergeant are shown in Tables 2 to 7. Table 2 shows the means and standard deviations for the five elements of the examination process, including the evaluation of the personnel records for the 42 finalists. (A maximum of 10 points was allowed for this element, as determined by civil service commission rules. The maximum allowed for all elements is shown in the footnote to Table 2.) The results show that there were no statistically significant differences in the ratings of personnel records for men versus women, or for whites versus blacks or Hispanics. These findings suggest that the PRR is fair to these protected classes.
Differences among subgroups were noted in some comparisons on other elements: whites in this sample received higher assessment center ratings and overall ratings than blacks, and women in this sample had somewhat longer seniority than men. Even though the average ratings were somewhat different, analyses showed that the rates of actual promotions were fair to all groups. Based on the assumption that 25 promotions would be made during the life of the promotional list generated by this examination, analyses were conducted to determine the fairness of the evaluations (sometimes called adverse impact analyses) emanating from this portion of the promotional examination alone. Chi Square analyses revealed that there were no statistically significant differences in the promotion rates for the subgroups, if the PRR alone was used to make promotions. These findings suggest this process is fair to all subgroups.
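For readers unfamiliar with this kind of analysis, the following is a minimal sketch of a chi-square comparison of promotion rates for two subgroups. The article does not report the underlying promotion counts, so the counts below are hypothetical; only the subgroup sizes (36 men, 6 women) and the assumed 25 promotions come from the text. The function name is also an assumption made for illustration.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


CRITICAL_VALUE_DF1_05 = 3.841  # chi-square critical value, df = 1, alpha = .05

# HYPOTHETICAL counts (not from the article): [promoted, not promoted]
observed = [
    [21, 15],  # men (36 finalists)
    [4, 2],    # women (6 finalists)
]

stat = chi_square_2x2(observed)
significant = stat > CRITICAL_VALUE_DF1_05
print(round(stat, 3), significant)
```

With subgroups this small, some expected cell counts fall below five, so an exact test is often preferred in practice; the sketch is meant only to show the shape of the comparison.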
Table 3 presents the correlations among the elements of the examination process and the total scores. These findings show that the PRR is correlated to a moderate degree with the assessment center ratings (r = .40), but not with the written test or seniority. Thus, the records review is tapping into some qualities not measured by the other elements. More senior officers, compared to less senior officers, are not necessarily showing better records of on-the-job performance relevant to the qualities sought in higher-level officers.
Table 4 shows the results of multiple regression analyses of the total score in relation to the elements of the examination. They reveal that all elements make independent contributions to the overall ratings. The assessment center appears to be carrying the most weight, followed by seniority, and then the PRR. The written test, race/ethnicity, and gender did not make a statistically significant contribution to the prediction of the total score, after the assessment center, seniority, and the records review are taken into account. The failure of the written test to make any additional unique contribution to the prediction of total scores at this stage is probably due to the very narrow range of written test scores for the finalists who were screened into the later stages on the basis of the written test scores themselves.
Race and gender account for no additional variance in Total Score.
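The "Change in R2" column of Table 4 is the difference between successive cumulative R2 values as each element enters the regression. A minimal sketch of that arithmetic, using the values reported in Table 4 (the function name is an assumption made for illustration):

```python
# Cumulative R2 after each element enters the regression (values from Table 4).
cumulative_r2 = [
    ("Assessment Center", 0.89),
    ("Seniority", 0.94),
    ("Personnel Records Review", 0.97),
]


def r2_changes(steps):
    """Return (element, change in R2) for each step, rounded to two places."""
    changes = []
    previous = 0.0
    for element, r2 in steps:
        changes.append((element, round(r2 - previous, 2)))
        previous = r2
    return changes


for element, change in r2_changes(cumulative_r2):
    print(f"{element}: {change:.2f}")
```

Read this way, the PRR adds .03 to the variance in total score already accounted for by the assessment center and seniority.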
The next analyses examine the components within the Personnel Records Review. Table 5 shows the means and standard deviations for the four components of the PRR and the total score on the PRR for racial/ethnic and gender subgroups. No statistically significant differences in subgroup scores were noted. These findings show that men and women, and whites, blacks, and Hispanics in this sample, were evaluated on average the same on discipline, commendatory behavior, reliability, and job-related education and experience. The findings suggest the process is fair to these subgroups.
Table 6 shows the correlations among the components and the total score on the PRR. Ratings on most components correlated with each other, and all components correlated with the total score. Ratings of discipline and commendations are not correlated and, thus, seem to be tapping independent aspects of past performance, as do reliability and education/experience.
Table 7 shows the results of regressing the Total Score on the scores of the four components. Each of the components correlates independently with the total score, suggesting that each component is tapping a unique and important aspect of these candidates' backgrounds related to potential for success as a sergeant.
Potential Problems and Solutions
As with any assessment method, there are potential problems which must be anticipated and dealt with. For the Personnel Records Review, initial applications revealed some issues which were dealt with in subsequent applications. Recent promotional examinations using the procedure have been improved to further minimize the problems and enhance the procedure's reliability, validity, and fairness. Initially, there was some variability in the amount and style of material submitted by candidates in the Professional History Form. Some candidates went to unnecessary lengths to have the material prepared by professional printers, including colored paper and fancy binding. Panelists had no trouble ignoring "slick" presentations and focusing only on the behavioral content, but to ensure the perception of fairness by everyone, later applications placed clear limitations on the materials, e.g., no more than three pages per dimension, no colored paper, no fancy binders, etc.
Another problem was occasional inconsistency in the amount and quality of information in the various sources. For example, Supervisory Situation Records from some district offices and some managers were more complete than others. The internal assessors were usually aware of the differences in these daily records prepared in different locations and could adjust their evaluations so as not to penalize any one candidate.
When information was not complete, and a panelist had additional information to contribute, the natural question arose about whether to allow panelists to introduce information not in the written record. Because it was felt that relevant information should be used, a set of rules was agreed upon for allowing outside information: (a) The information had to be relevant to one of the dimensions; (b) it had to be verified or verifiable; (c) more than one panelist had to concur that the information was correct; and (d) the panel had to agree to allow the information to be used. In a few instances, phone calls were made to obtain independent verification of a piece of information not in the records.
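The four rules above amount to a conjunction: outside information is admitted only if every condition holds. A minimal sketch follows, with hypothetical field and function names chosen for illustration; "more than one panelist" is read as at least two.

```python
from dataclasses import dataclass


@dataclass
class OutsideInformation:
    """One piece of information offered by a panelist but not in the record."""
    relevant_to_dimension: bool   # rule (a)
    verifiable: bool              # rule (b): verified or verifiable
    concurring_panelists: int     # rule (c): panelists who concur it is correct
    panel_agrees: bool            # rule (d): panel agrees to allow its use


def is_admissible(info: OutsideInformation) -> bool:
    """Apply rules (a) through (d); all four must be satisfied."""
    return (
        info.relevant_to_dimension
        and info.verifiable
        and info.concurring_panelists > 1
        and info.panel_agrees
    )


# HYPOTHETICAL example: relevant and verifiable, but only one panelist concurs.
example = OutsideInformation(True, True, 1, True)
print(is_admissible(example))
```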
Another potential problem with any group discussion is that one person may try to dominate a portion of the meeting, and may even try to "champion" his or her favorite candidate. This potential problem was dealt with in several ways. The chief or his deputy attended the training session for the panelists and charged them with the responsibility of being fair to everyone. Panelists were held accountable for following the procedures consistently for all candidates. The consultant facilitating the discussions followed the strict procedures described earlier. A citizen observer was present for all discussions and was charged with challenging the panelists, if irrelevant information was being discussed, or if their conclusions did not follow logically from the behavioral information presented in the documents.
Conclusions
The Personnel Records Review provides a systematic way to include information from each candidate's work history into the promotional examination process. Job-relevant behavioral information not provided by a written test or an assessment center is brought to bear on the important decision of who should be promoted. Candidates, the police organization, and, most importantly, the public gain from more valid and fair promotional processes.
Notes
[1] Thornton, G.C. III (1991). Assessment Centers in Human Resource Management. Reading, MA: Addison-Wesley.
[2] Thornton, G.C. III & Byham, W.C. (1981). Assessment Centers and Managerial Performance. New York: Academic Press; Gaugler, B.B., Rosenthal, D.B., Thornton, G.C. III, & Bentson, C. (1987). "Meta-Analysis of Assessment Center Validity," Journal of Applied Psychology, 72, 493-511.
[3] Borman, W.C. & Motowidlo, S.J. (1993). "Expanding the Criterion Domain to Include Elements of Contextual Performance," In N. Schmitt & W.C. Borman (Eds.), Personnel Selection in Organizations (pp. 71-98). San Francisco, CA: Jossey-Bass.
[4] Murphy, K.R. & Cleveland, J.N. (1995). Understanding Performance Appraisal: Social, Organizational, and Goal-Based Perspectives. Thousand Oaks, CA: Sage.
[5] Kane, J.S. & Lawler, E.E. III. (1978). "Methods of Peer Assessment," Psychological Bulletin, 85, 555-596.
[6] Cheung, G.W. (1999). "Multifaceted Conceptions of Self-Other Rating Disagreement," Personnel Psychology, 52, 1-36.
[7] Hinrichs, J. (1976). "Comparison of 'Real Life' Assessments of Management Potential with Situational Exercises, Paper-and-Pencil Ability Tests, and Personality Inventories," Journal of Applied Psychology, 53, 425-432.
[8] Lowrey, P.E. (1994). "Selection Methods: Comparisons of Assessment Centers with Personnel Records Review Evaluation," Public Personnel Management, 23, 383-395.
[9] Thornton, ibid.
[10] Task Force on Assessment Center Guidelines. (1989). "Guidelines and Ethical Considerations for Assessment Center Operations," Public Personnel Management, 18, 457-470.
George C. Thornton III, Ph.D. Department of Psychology Colorado State University Fort Collins, CO 80523
George Thornton earned his Ph.D. in Industrial Psychology from Purdue University in 1966. He is currently a professor of Industrial/Organizational Psychology at Colorado State University. He has conducted research and published numerous articles on the assessment center process, including a meta-analysis of assessment center validities, the landmark 1981 book with William Byham, Assessment Centers and Managerial Performance, and a book published by Addison-Wesley, Assessment Centers in Human Resource Management. In addition to his continuing research and development of assessment centers, Dr. Thornton's current research interests and consulting practice center on employment discrimination law, including age discrimination in layoffs, and sexual harassment. Dr. Thornton is a Fellow of the Society of Industrial/Organizational Psychology, and a diplomate of the American Board of Professional Psychology.
David M. Morris, Ph.D., J.D. Morris & McDaniel, Inc. Box 11599 Washington, D.C.
David Morris, President of Morris & McDaniel, Inc., has his doctorate of philosophy in psychology, with an emphasis in industrial/organizational psychology and juris doctorate. Dr. Morris' dual career as an Industrial/Organizational Psychologist and attorney gives him a unique perception of Title VII and the development of personnel procedures. Dr. Morris has held academic positions and has taught courses in industrial relations and related areas of psychology. Dr. Morris is a member of many professional associations including the American Psychological Association, the International Personnel Management Association, and the American College of Forensic Psychology. Dr. Morris has successfully designed entry-level and promotional systems for numerous police and fire service jurisdictions, both large and small, including Denver, Boulder, and the states of Massachusetts, Maryland, and Georgia. Dr. Morris founded the firm of Morris & McDaniel, Inc. and has been with the firm for 24 years.
Table 1. Comparison of Personnel Records Review and Assessment Center Methods

                         Personnel Records Review      Assessment Center
Dimensions               Job-related attributes        Job-related skills, abilities,
                                                       aptitudes
Sources of information   Extant records                Situational exercises
Behavior observed        Past job performance          Behavior displayed in exercises
Raters                   Internal managers             External managers
Rater training           Rating errors and how to      Rating errors and how to
                         mitigate                      mitigate
Method of combining      Study reports, report         Observe exercises, report
information              behavior, rate dimensions,    behavior, rate dimensions,
                         achieve consensus, give       achieve consensus, give
                         overall rating, achieve       overall rating, achieve
                         consensus                     consensus
Table 2. Means and Standard Deviations for Elements of Police Sergeant Promotional Examination Process for Total Group, Gender and Racial/Ethnic Subgroups

Entries are means, with standard deviations in parentheses.

                            Group (sample size)
Exam Elements[a]            Total (42)       Men (36)         Women (6)
Personnel Records Review    8.2 (1.4)        8.2 (1.3)        7.8 (2.1)
Written Examination         36.3 (1.2)       36.1 (1.0)       37.1 (1.8)
Assessment Center           24.2 (5.9)       23.7 (5.6)       27.1 (7.1)
Seniority                   3.8 (1.6)        3.6 (1.6)        4.6[c] (0.8)
Total Score                 72.4 (7.1)       71.7 (6.5)       76.6 (9.7)

Exam Elements[a]            White (31)       Black (5)        Hispanic (6)
Personnel Records Review    8.3 (1.2)        7.9 (1.9)        7.7 (1.9)
Written Examination         36.4 (1.3)       35.7 (1.1)       36.1 (0.9)
Assessment Center           25.4[b] (5.7)    17.8 (4.1)       23.4 (4.9)
Seniority                   3.8 (1.6)        2.9 (1.6)        4.6 (0.8)
Total Score                 73.8[b] (6.7)    64.2 (5.4)       71.8 (6.6)

[a] Maximum points: Personnel Records Review: 10; Written Exam: 40; Assessment Center: 45; Seniority: 5.
[b] Whites scored higher than blacks, p < .05.
[c] Women scored higher than men.
Table 3. Correlations among Elements of Police Sergeant Promotional Examination Process

                                2          3          4          5
1. Written Test                 .14        .39[*]     .03        .52[*]
2. Personnel Records Review                .40[*]     -.10       .53[*]
3. Assessment Center                                  -.11       .95[*]
4. Seniority                                                     .12
5. Total Score

[*] p < .05.
Table 4. Regression of Total Score on Elements of Police Sergeant Promotional Process

                            R2         Change in R2
Assessment Center           .89        .89[*]
Seniority                   .94        .05[*]
Personnel Records Review    .97        .03[*]

[*] p < .05.
Table 5. Means and Standard Deviations of Components of Personnel Records Review for Total Group, Gender and Racial/Ethnic Subgroups

Entries are means, with standard deviations in parentheses.

                            Group (sample size)
                            Total (42)       Men (36)         Women (6)
Discipline                  4.4 (0.7)        4.3 (0.7)        4.5 (0.8)
Commendatory Behavior       3.8 (0.9)        3.8 (1.0)        4.1 (0.5)
Reliability                 4.5 (0.8)        4.6 (0.7)        4.3 (1.6)
Education and Experience    4.0 (0.6)        4.0 (0.6)        3.9 (0.9)
Total                       4.1 (0.7)        4.1 (0.7)        3.9 (1.0)

                            Group (sample size)
                            White (31)       Black (5)        Hispanic (6)
Discipline                  4.4 (0.7)        4.1 (1.0)        4.4 (0.8)
Commendatory Behavior       3.9 (0.8)        3.1 (1.6)        3.9 (0.7)
Reliability                 4.6 (0.7)        4.5 (1.0)        4.2 (1.6)
Education and Experience    4.0 (0.7)        3.9 (0.7)        4.0 (0.5)
Total                       4.1 (0.6)        3.9 (1.0)        3.9 (1.0)

NOTE: There are no statistically significant differences among gender and racial/ethnic groups.
Table 6. Correlations among Components of Personnel Records Review

                                2          3          4          5
1. Discipline                   .32[*]     .70[*]     .33[*]     .69[*]
2. Commendatory Behavior                   .05        .63[*]     .69[*]
3. Reliability                                        .14        .57[*]
4. Education                                                     .74[*]
5. Total

[*] p < .05.
Table 7. Regression of Total Score on Components of Personnel Records Review

                            R2         Change in R2
Education                   .55        .55[*]
Reliability                 .77        .23[*]
Commendatory Behavior       .86        .09[*]
Discipline                  .88        .02[*]

[*] p < .05.
Note: Race/ethnicity and gender account for no additional variance in Total Score.

In summary, these analyses show that the Personnel Records Review is fair to all protected subgroups and makes an independent contribution to the evaluation for sergeant in combination with the assessment center and seniority. All components of the PRR are fair to racial/ethnic and gender subgroups, and all make unique contributions to the determination of the candidates' final scores on the records review. The data from this sample suggest the review of personnel records is a valid and fair process for examining candidates' past behaviors as they relate to qualifications for promotion to police sergeant.