The recommendations at the end of this report were approved by the Faculty Senate on May 1, 2000.
A REPORT TO THE FACULTY SENATE
FROM THE TEACHING COUNCIL
ON STUDENT REVIEW OF TEACHING
Approved March 1, 2000
A. Background
The CTEP Task Force "Report on Student Evaluation Component" dated October 5, 1994 begins with these statements:
The desirability of a structured evaluation of teaching effectiveness has been well-established at The University of Tennessee, Knoxville. In October 1987, the ad hoc Senate-Provost Committee on the Evaluation of Instruction issued a report entitled "A Proposal to Evaluate Teaching Effectiveness" which called for a "rigorous and thorough" review of teaching performance of all faculty "regardless of their academic rank or tenure status" and which should include both (a) student opinion "collected through written surveys in courses taught by the instructor under review" as well as (b) a peer review component (p. 1).
The 1987 report also stated that:
The primary reasons for evaluating teaching are (1) to provide information for sound personnel decisions and (2) to assist faculty members in self-improvement (p. 6).
An instrument titled the Chancellor's Teaching Evaluation Program (CTEP) was developed to serve the first of these purposes. The survey included only a few general questions about overall teaching effectiveness and was employed from 1989 to 1995. The Task Force Report goes on to say that during that period of time, "the UTK Student Government Association ... sought (and obtained) access to the results of CTEP as a means of satisfying a third reason for student review of instruction--to help students choose courses effectively." This access was obtained under the "Tennessee Open Records Law."
This earlier CTEP Task Force's mandate was "to examine the CTEP program in its entirety in order to see if the various goals of evaluation of instruction could be reconciled." The Task Force recommended adoption of the University of Washington's Instructional Assessment System beginning in the Fall Semester 1995 as the student-survey component of the renamed Campus Teaching Evaluation Program, which was to include a formal peer review component as well. The recommendation included a provision that the effectiveness of the student-survey component be reviewed after three years. This three-year period was extended for an additional two years (through the 1999-2000 academic year) to allow an adequate period of time for the review to be conducted.
In June 1998, the University of Tennessee Board of Trustees approved guidelines for post-tenure review of faculty. The Teaching Council was charged by the Faculty Senate with making recommendations on how the review process should be implemented for teaching. The Teaching Council spent most of the 1998-99 academic year developing "Best Practices" Guidelines, which were included as an Appendix in the Post-Tenure Review Policy approved by the Faculty Senate in Spring 1999. The "Best Practices" Guidelines state that a "goal of this proposed process is to provide greater balance to the existing review process by gathering inputs from faculty members, their peers, and their students, rather than primarily student reviews" (p. 3). The Guidelines go on to say that the "proposed review process confirms the need for student review of teaching" (p. 5) and that "(q)uestionnaire(s) similar to the current CTEP ... should continue to be administered" (p. 5). However, the guidelines also suggest caution in the interpretation of results from small classes and encourage the use of open-ended questions as well. Finally, the Guidelines re-emphasize that "while student reviews occur each semester, they should not receive greater weighting than self or peer assessments during cumulative reviews" (p. 6).
B. The Current Student-Survey Component of the Campus Teaching Evaluation Program
The student-survey component currently employed at UTK was adopted with slight modifications from the University of Washington's Instructional Assessment System (IAS). The IAS consisted at that time (1994) of eight different 26-item questionnaires and a supplemental open-ended student comment sheet. Departments or individual instructors choose the questionnaire most appropriate for their courses and have the option to add up to 15 of their own questions. In addition to four general questions which are essentially identical across all questionnaires, students rate instructors on two other groups of questions designed to provide (1) diagnostic feedback to instructors and (2) information to other students. Students are also asked about their reasons for taking the course, their classification, and the grade they expect to receive in the course. The CTEP reports provide a distribution of student responses by rating category for each question, standard deviation, and mean rating. A comparison of the individual course mean ratings with the means of ratings for all other departmental, college and university courses using the same form is provided, with an indication of whether differences are statistically significant. A percentile rank among all others using the same form within a department and college is also calculated.
C. Outline of the Report
This report is devoted to describing the process and components of the Teaching Council's review of UTK's current system for student review of teaching, as well as outlining general conclusions and specific recommendations. The next section provides summaries of the findings from the following six components of the review.
(1) E-mail survey of peer institutions
(2) Mail survey of all UTK faculty and instructors
(3) Solicitation of input from UTK student leaders
(4) Review of research literature on validity and usefulness of student ratings of instruction
(5) Review of validity, reliability, and recent changes in the University of Washington's Instructional Assessment System
(6) Review of policies and procedures related to implementation of UTK student survey system
The final two sections include a summary of general conclusions and a list of specific recommendations.
II. REVIEW COMPONENTS
A. E-mail Survey of Peer Institutions
Information regarding student evaluation of teaching was obtained in January 1999 from six of UT's ten peer institutions. Practices vary widely across the six institutions. While some form of student evaluation of teaching is required at all six, the specific approach used is generally left up to individual colleges. At three of the six institutions, however, a standard set of core questions is included by most colleges. Three other Tennessee institutions were also contacted. All three require university-wide use of a standardized set of core questions.
B. Mail Survey of All UTK Faculty and Instructors
In April 1999, the Office of Institutional Research & Assessment (OIRA) conducted a survey of all UTK faculty and instructors, numbering 2,489 in total. The questionnaire was developed by a task force within the Teaching Council, with all members having the opportunity for input on its scope and design. A total of 662 questionnaires were returned (about 60% of which were from tenure-track faculty), for a response rate of 26.6%. The results were documented in an OIRA report dated July 1999. Availability of the OIRA report to anyone in the university community was publicized when a summary of the report results was provided at a Faculty Senate meeting in early October 1999 and posted on the Faculty Senate web site.
In short, while the survey results did not provide a ringing endorsement of the current system, neither did they constitute a serious indictment. Of all respondents, 78% agreed that "a structured teaching evaluation should be used for classes." While 44% agreed that "CTEP provides useful information for tenure-promotion considerations," 25% disagreed, 20% were neutral and 10% had no basis to judge. With regard to whether "CTEP numerical data assisted teaching faculty with self-improvement," 51% agreed, while 26% disagreed, 19% were neutral, and 3% had no basis to judge. The statement that "student comment sheets assist teaching faculty with self-improvement" was agreed to by 78%. The statement "CTEP currently works well for the particular courses you teach and evaluates teaching effectively" was agreed to by 41%, with 32% disagreeing, 23% neutral, and 5% no basis to judge. In separate questions, 32% agreed that "CTEP needs to be replaced by a simpler evaluation system," while 33% agreed that "CTEP needs to be replaced by evaluation forms developed by each unit." Mean ratings did not differ greatly among GTAs, tenure-track faculty, and non-tenure track faculty.
Responses to the open-ended questions on the survey do shed a good bit of light on why some faculty are critical of the current system. The first question, relating to why a particular form was used, revealed that 16% were not offered a choice or were unaware they had a choice, while another 34% indicated they used the form given to them by their department, unit, or college. These findings may explain some of the criticism regarding the usefulness of the results, especially for self-improvement.
The second question asked for specific comments or recommendations. Of the 548 responses reviewed, 262 were comments regarding the current system. Of these comments, 95 (or 36%) reflected the opinion that results show things like ease of the class, ease of the grading system, or popularity of the instructor, rather than how well students learn. Forty-five comments (17%) reflected the opinion that the current system does not control for student biases of various sorts. Thirty-eight responses (15%) were related to the fact that not all information collected about students was reported in Tennessee 101, the Student Government Association's publication of the results. Twenty-eight responses (11%) reflected the opinion that while the results are helpful to administrators and students, they are not helpful to instructors. Smaller numbers of multiple responses also reflected the opinions that there is not a good form for team-taught courses, results are published too late for useful feedback, and some questions are not clear, useful, or relevant.
The other 286 responses included specific recommendations. Sixty-seven responses (23%) recommended that use of the open-ended questions be encouraged. Fifty-six responses (20%) recommended some change in the methods of administering the program and calculating the results. Some of these recommendations related to concerns about the validity of comparative means across classes that differ in terms of size, reason for taking the course, etc. Fifty-three responses (19%) recommended that forms be class or departmental specific. Twenty-one responses (7%) recommended that peer reviews and administrative reviews should also be included in the evaluation system. The same number recommended inclusion of "student responsibility" questions related to attendance, class participation, hours studied, etc. Fourteen responses (5%) reflected the opinion that results should be kept confidential. Eighteen responses (6%) recommended eliminating all standardized teaching evaluation by students, while 36 responses (13%) recommended no change in the current system.
C. Solicitation of Input from UTK Student Leaders
Input from both undergraduate and graduate student representatives on the Teaching Council has been solicited throughout the review process. In addition, the Teaching Council's Committee on Student Evaluation of Teaching sponsored an open forum for undergraduate students on January 19, 2000 and for GTAs on February 23, 2000. Student leaders were also encouraged to solicit input from other students by whatever means they wished regarding student review of teaching at UTK.
Undergraduate student leaders indicated that in their opinion, most students strongly support the current system with respect to (1) the opportunity it affords to complete a questionnaire in each class and (2) the usefulness of results for a core of standard questions that is published in Tennessee 101. While a number of specific ideas or questions were raised at the open forum, there appeared to be general consensus on the following points:
- open-ended questions are important
- information from questions regarding "student responsibility" for the learning process would be useful to students
- students need at least 15 minutes to fill out the form(s)
- students would be more serious and results would be more valid if surveys were administered at the beginning of a class period rather than the end
- surveys should be administered as late as possible in the semester, no earlier than the last two to three weeks
- clarification is needed as to the appropriate procedure and channels for raising questions or reporting incidents of improper administration
Graduate student leaders noted that there are mixed feelings among GTAs as to the usefulness and fairness of the current CTEP approach and reporting of results. The following points were made at the GTA forum:
- comparison of means for a GTA with those of tenured professors using the same form is viewed as unfair by many GTAs
- perhaps courses taught by GTAs should not be included in Tennessee 101 (since many GTAs only teach for one or two years and these names are generally not in the course timetable)
- students need to be educated as to how "reason for taking the course" categories are defined
- there is "talk" among GTAs as to strategies for insuring a "good atmosphere" in which to administer the student survey
- GTAs would benefit from greater follow-up (e.g., meeting with supervisor to review results)
- GTAs should be encouraged to use their own brief surveys early in the semester to allow for midcourse corrections
D. Review of Research on the Validity and Usefulness of Student Ratings of Instruction
The Teaching Council reviewed a set of five articles on this subject that were published in the November 1997 issue of American Psychologist. These articles were written largely to summarize the findings from the research on this subject over the 1971-1995 period. One article did report on research conducted recently with data from the University of Washington's Instructional Assessment System.
In the overview article, Greenwald reports on a review of 172 studies, concluding that 77 found student ratings to be valid indicators of instructional quality, 69 were neutral, and 26 found some type of bias, i.e., "that student ratings of instruction are contaminated by one or more extraneous influences" (p. 1183). Greenwald goes on to state the following:
In summary of the relatively recent literature on student ratings, and as the following quotes indicate, prominent reviews published since about 1980 give a clear impression that major questions of the 1970s about ratings validity were effectively answered and largely put to rest by subsequent research.... These quotes not only acknowledge that grades and ratings are correlated but also express the judgment that this correlation can and should be interpreted without concluding that grades create a bothersome contamination of ratings. (p. 1184)
In the second article, Marsh and Roche conclude from their review of the literature that, "... under appropriate conditions, students' evaluations of teaching (SETs) are (a) multidimensional; (b) reliable and stable; (c) primarily a function of the instructor who teaches a course rather than the course that is taught; (d) relatively valid against a variety of indicators of effective teaching; (e) relatively unaffected by a variety of variables hypothesized as potential biases (e.g., grading leniency, class size, workload, prior subject interest); and (f) useful in improving teaching effectiveness when SETs are coupled with appropriate consultation." They do acknowledge, however, that there is fairly consistent evidence in the literature for correlation between student ratings and expected or actual grade, reason for taking a course, and class size.
In the third article, d'Apollonia and Abrami "... report the results of a meta-analysis of the multisection validity studies that indicate that student ratings are moderately valid; however, administrative, instructor, and course characteristics influence student ratings of instruction" (p. 1203). They conclude, however, that "grading practices are not a practical threat to the validity of student ratings" (p. 1205) and caution against attempting to statistically control student ratings for grading leniency, because higher grades may, in fact, reflect greater student learning. At the same time, they note that some writers in the literature have argued against using norms to rank instructors, emphasizing that "... such norms have negative effects on faculty members' morale. By definition, half the faculty would be below the norm, yet they could be excellent teachers" (p. 1204). In closing they "... recommend that comprehensive systems of faculty evaluation be developed, of which student ratings of instruction are only one, albeit important, component .... Within such a system student ratings should be used to make only crude judgments of instructional effectiveness (exceptional, adequate, and unacceptable)."
In the fourth article, Greenwald and Gillmore specifically address the relationship between student ratings and expected grade in a research study using data from the University of Washington. They conclude that there is a positive relationship between the two that is unrelated to student achievement, and as a result, recommend that a statistical correction or adjustment be made to actual ratings data. (As will be discussed in the next section, such an adjustment is now made on result reports at the University of Washington not only for "grading leniency," but also for class size, and reason for taking a course.) In their concluding section, however, they state very clearly their support in general for collecting and reporting student ratings:
The results reported in this article might be regarded as sufficient reason to abandon the entire enterprise of collecting and reporting student ratings. However, there are three good reasons to conclude just the opposite--that student ratings measures deserve increased attention .... First, in many cases, there is no readily available alternative method of evaluating instruction .... Second, although the influence of grading leniency means that student ratings have a deficiency in discriminant validity, the evidence for convergent validity of student ratings cannot and should not be dismissed.... Third, student ratings almost certainly contain useful information that is independent of their correlation with student achievement.... In summary, there is an instructional-quality baby (convergent validity) in with the bathwater (discriminant invalidity) of grades-ratings correlations and other possible contaminants. It seems much wiser to give that baby a bath and make it presentable than to throw the baby out with the bathwater. (p. 1215)
In the final article, McKeachie discusses the other four articles and focuses on his concern about the proper use of student ratings for personnel decisions. He draws the following conclusions:
(T)here is little disagreement about the usefulness of student ratings for improvement of teaching (at least when student ratings are used with consultation or when ratings are given on specific behavioral characteristics).... All of the authors (and I join them) agree that student ratings are the single most valid source of data on teaching effectiveness.... (P)ersonnel committees (should) sensibly use broad categories rather than attempting to interpret decimal-point differences.... (p. 1219)
Student ratings are valid, but all of the authors in this Current Issues section agree that they should be supplemented with other evidence.... I contend that the specific questions used, the use of global versus factor scores, the possible biasing variables, and so forth are relatively minor problems. The major validity problem is in the use of the ratings by personnel committees and administrators.... No matter how valid the evidence provided by students may be, it is almost certainly more valid than many personnel committees give it credit for being.... Although I believe that a statistical adjustment of ratings, such as Greenwald and Gillmore (1997) suggest, may result in lower, rather than higher, validity, it may increase the credibility of the ratings....
Almost as bad as dismissal of student ratings, however, is the opposite problem--attempting to compare teachers with one another by using numerical means or medians. Comparisons of ratings in different classes are dubious not only because of between-classes differences in the students but also because of differences in goals, teaching methods, content, and a myriad of other variables. (p. 1222)
E. Review of Validity, Reliability and Recent Changes in the University of Washington's Instructional Assessment System (IAS)
Information about the current IAS was collected from the University of Washington web site (www.washington.edu/oea/iasl) and extensive telephone conversations with Dr. Jerry Gillmore, Director of the Office of Educational Assessment. The IAS was implemented in 1974, undergoing some revision in 1995 and again in 1998. A section in the web page description of the IAS summarizes the extensive process used in developing the original rating forms to insure their content validity. The section also notes: "Nationally, a large number of empirical studies have established the validity of student ratings of instruction using forms similar to those used in the IAS." Studies in recent years have found inter-rater and inter-teacher reliability to be very high with the current IAS forms. Inter-rater reliability coefficients represent the level of agreement among students on the ratings of individual classes relative to mean differences across classes, and range between .84 and .90 for the four core questions found on every form. Inter-teacher reliability coefficients represent the level of consistency in the ratings of one teacher relative to another. For teachers who have been rated on five or more courses, the coefficient has been estimated to be .88.
Several significant changes were made in both the forms and the reporting of results in 1995. With respect to forms, seven additional standard questions (23-29) were added to all forms. Five questions asked students to compare on a scale of "much higher to much lower" this course with "other college courses you have taken" with respect to expected grade, intellectual challenge, student effort, effort required to succeed, and student involvement. The other two questions ask for information on how many hours per week the student spent on the course and how many of those hours were "valuable in advancing your education." A second change with respect to forms was to include three new form options. Form I is designed for correspondence/distance learning courses. Form J is designed for rotation or studio courses in the arts. Form X rephrases questions 5-15 so that students use a scale of "always to never" to describe the frequency of certain behaviors on the part of instructors (for example, "the instructor gave very clear explanations"). Form X also rephrases questions 16-22 so that students describe progress toward general learning outcomes on a scale of "great to none." Dr. Gillmore indicated on the telephone that Form X is being used in approximately 8% of all courses. (See Appendix for copies of the current IAS Forms A-J and X.)
The reporting of results was also modified in 1995 in some important ways. First, comparison of individual course ratings with departmental, college, and university norms was eliminated, as were percentile rankings. According to Dr. Gillmore, this was done because of concern about inappropriate interpretation of small differences in ratings (especially across courses with widely varying characteristics) as well as the interpretation from percentile rankings that half of all courses must be "below average." University norms by form, class size, course level, and disciplinary type are available for comparison on the IAS web site (www.washington.edu/oea/iasnorms). Another modification was a shift to reporting median rather than mean ratings. This was done to reduce the influence of skewed rating distributions (due to "outliers"). However, Dr. Gillmore indicated he would probably not recommend this change if he had it to do over again, because of the very high correlation between means and medians, and the difficulty of explaining the concept and calculation of a median for this type of data.
Two other modifications to the results reports were made in 1998. One is to report an adjusted median for the four core questions. This adjustment is based on the research by Greenwald and Gillmore discussed in the previous section, and is related to the three following course characteristics: expected grade relative to other courses taken, class size, and the proportion of students taking the class in their major/minor or as an elective. Actual median ratings are adjusted upward to the extent that expected grades are lower than the university average, class size is higher than the university average, and the proportion described above is lower than the university average. Actual medians are adjusted downward to the extent the opposite is true with respect to these three course characteristics. Further explanation and illustrations of this adjustment process are available on the IAS web site (www.washington.edu/oea/iasadjst).
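The direction of the adjustment described above can be illustrated with a short sketch. The linear form, the weights W_GRADE, W_SIZE, and W_SHARE, and the CourseTraits fields are all hypothetical; the actual IAS formula and coefficients are not given in this report, so the code shows only which way each course characteristic would move a median rating.

```python
from dataclasses import dataclass


@dataclass
class CourseTraits:
    """Hypothetical container for the three characteristics named in the text."""
    expected_grade: float           # expected grade relative to other courses taken
    class_size: float               # number of students enrolled
    major_or_elective_share: float  # proportion taking the class in major/minor or as elective


# Illustrative weights only; these are NOT the University of Washington's values.
W_GRADE = 0.30
W_SIZE = 0.002
W_SHARE = 0.50


def adjusted_median(median: float, course: CourseTraits, university: CourseTraits) -> float:
    """Shift a median rating up or down relative to university averages.

    Upward when expected grades are lower, class size is higher, or the
    major/elective share is lower than the university average; downward
    in the opposite cases.
    """
    adjusted = median
    adjusted -= W_GRADE * (course.expected_grade - university.expected_grade)
    adjusted += W_SIZE * (course.class_size - university.class_size)
    adjusted -= W_SHARE * (course.major_or_elective_share - university.major_or_elective_share)
    return adjusted
```

Under these assumptions, a large required course with low expected grades would receive an adjusted median above its actual median, while a course matching the university averages on all three characteristics would be left unchanged.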
A second modification in 1998 was to compute a relative rank for questions 5 through 22 on each form. Individual item ratings are standardized relative to university averages and then ranked, giving instructors an indication of their relative strengths and weaknesses. This ranking does not imply anything about the actual level of each item rating compared to the actual university average.
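A brief sketch may help show why the relative rank says nothing about absolute level. It assumes that "standardized" means a z-score (item rating minus university mean, divided by a university standard deviation); the report does not state the exact IAS method, and the item numbers and values below are invented for illustration.

```python
def relative_ranks(course, univ_mean, univ_sd):
    """Rank items by standardized rating; rank 1 is the instructor's relative strength.

    Assumes z-score standardization, which is an illustrative choice rather than
    the documented IAS procedure.
    """
    z = {item: (course[item] - univ_mean[item]) / univ_sd[item] for item in course}
    ordered = sorted(z, key=z.get, reverse=True)
    return {item: rank for rank, item in enumerate(ordered, start=1)}


# Hypothetical ratings for three items on one course, with invented university norms.
course = {5: 3.9, 6: 3.2, 7: 4.1}
univ_mean = {5: 4.0, 6: 3.8, 7: 4.4}
univ_sd = {5: 0.5, 6: 0.6, 7: 0.5}
ranks = relative_ranks(course, univ_mean, univ_sd)
```

In this invented example every item falls below its university average, yet item 5 still receives rank 1, and item 7 ranks second even though it has the highest raw rating. The ranking orders an instructor's items against each other only.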
Two other items of significance were discovered in our review. The first is that the University of Washington enhanced IAS capabilities for distance learning courses in 1999 by implementing an on-line version of IAS. The second is that 30 universities, representing various types of institutions from all around the country, are currently contracting with the University of Washington to provide both forms and data processing for their systems. An unknown number of other universities have adapted the IAS with minor modifications, as we at UTK have done.
F. Review of Policies and Procedures Related to Implementation of the UTK Student Survey System
In December 1997 the CTEP Coordinator submitted a review of the student-survey component to the Faculty Senate Executive Committee. The review was based on input from staff in the Office of Institutional Research & Assessment as well as issues and suggestions submitted from members of the campus community. Four specific recommendations were made by the CTEP Coordinator in the review report:
- Reduce the number of scan forms from eight to five by eliminating Forms D, F, and G.
- Establish a designated time period when the evaluations are to be administered, perhaps no earlier than the last week the class is taught.
- Change the report format so that a minimum of five means must exist before a percentile ranking is calculated.
- Establish a policy that addresses the evaluation of "distance education" courses.
The Teaching Council's Committee on Student Evaluation of Teaching has been meeting regularly since Fall 1999, considering these four recommendations and other issues and suggestions noted in the review report. The Committee has also considered further input from the CTEP Coordinator and campus community over this academic year regarding implementation policies and procedures. Three specific matters have been considered at some length. One is the need for an effective system for dissemination of information to insure that all teachers understand both the options they have in customizing student surveys and policies regarding implementation procedures. A second is the need for clarification of the appropriate channel(s) and procedures for dealing with questions or complaints about violations of policies regarding implementation procedures. A third is the pressure the CTEP Coordinator has faced from several instructors to produce a result report for courses in which fewer than five students completed surveys, which would violate the policy recommended by the earlier CTEP Task Force at the time the current system was adopted in 1995. After careful consideration, the Committee brought forward recommendations on a number of these matters to the full Teaching Council.
III. General Conclusions
Considering all of the input and evidence collected in this review process, the Teaching Council has come to the conclusion that there is no reason to change our fundamental approach to soliciting student opinions regarding instruction at UTK. The fundamental nature of the current system, with standard core questions, alternative forms for different types of classes, and the option of adding specific or open-ended questions, remains highly consistent with the two basic reasons for evaluation of teaching: 1) to provide information for sound personnel decisions and 2) to assist faculty members in self-improvement. The current system also generates information of value to students in choosing courses. Another advantage of a standardized, university-wide system is the cost efficiency gained in reproduction of materials and in the administrative tasks required with any system.
At the same time, the system as currently structured and implemented does have weaknesses and characteristics that can be fairly criticized. While student ratings can be a useful source of information to be considered in evaluation of teaching effectiveness, they must be interpreted in light of course characteristics and should be used in conjunction with self- and peer assessments to place teachers only in broad performance categories. Explicit comparison of mean ratings and percentile rankings across courses can foster inappropriate conclusions and have a demoralizing impact on many faculty. Collection of additional information regarding student input/responsibility, expectations, and experience relative to other courses taken is needed. Care also needs to be taken to insure that the system is implemented with consistency and integrity. This will require that instructors, administrators, and students be fully aware of their particular roles, options, and responsibilities in the system. The Teaching Council believes the modifications outlined in the specific recommendations below will enhance the validity and usefulness of information collected in student surveys, and should address to a great extent the issues and concerns that have been raised.
IV. Specific Recommendations
A. Name
(1) Change the name of our system from the Campus Teaching Evaluation Program to the Student Assessment of Instruction System.
B. Forms
(2) Replace the current "CTEP" forms with the most recent set of IAS (University of Washington) forms A-J and X adapted as appropriate for our campus. Options would remain for addition of specific questions by departments or instructors and use of the open-ended comment sheet.
C. Result Reports
(3) Develop an on-line database to allow comparison of individual course means with university-wide means for courses with similar basic characteristics, such as class size, course level, disciplinary type, and reason for taking course, and for courses with similar instructor classification, such as GTAs.
(4) Remove percentile rankings and comparisons of individual course means with overall department, college and university means from the result reports.
(For rationale, please see Section II.B., especially the final two paragraphs; Section II.D., especially the final paragraph; and Section II.E., especially the third paragraph.)
(5) Develop and distribute written material to administrators and personnel committees providing guidance and caution in interpretation of student ratings.
D. Implementation/Administration
(6) Develop an informational and educational effort to insure that instructors understand all survey options and policies/procedures for survey administration.
(7) Clarify appropriate channels and procedures for receiving and dealing with questions or complaints about improper administration of student surveys.
(8) Enhance the student survey system for distance learning courses by developing on-line capabilities.
(9) Extend the student survey system to cover courses taught during the mini-term and summer terms.