Current methods of assessing substance use: A review of strengths, problems, and developments
Journal of Drug Issues
Authors: Patrick B Johnson
Subject Terms: Drug abuse
Copyright Journal of Drug Issues Fall 2001
This article discusses various means of assessing and measuring substance use behaviors and describes the relative advantages and disadvantages of each of the measurement tools. Self-report instruments are the most convenient and widely used forms of substance use assessment. Self-report measures can be obtained through various modes of administration, including self-administration via paper-and-pencil questionnaires, computer-assisted self-interviews or interactive voice recording, and through personal (interviewer-administered) interviews. The advantages and disadvantages of each of these modes of administration are discussed. Alternative assessment techniques, such as biological measurements, are also frequently used to measure substance use or to validate self-report measures of substance use. This article reviews the various available methods for validating self-report measures, highlighting self-report and biological testing techniques currently in use. It concludes by suggesting future avenues of research for improving upon current substance use measurement techniques.
The standard approach for assessing adolescent substance use attitudes, beliefs, expectancies, and behaviors is via self-report questionnaires (SRQ). The primary national studies that collect information on substance use prevalence and trends, including the Monitoring the Future survey (MTF), the National Household Survey on Drug Abuse (NHSDA), the Youth Risk Behavior Survey (YRBS), and the National Longitudinal Survey of Youth (NLSY), all employ the SRQ format to collect their data.
In this article, we will discuss several of the advantages and disadvantages of using self-report instruments in substance abuse research and present possible alternative assessment techniques that do not rely on typical self-report measures. In addition, we will discuss the advantages and disadvantages of two different modes of administering self-report assessment tools, including self-administered "paper and pencil" measures and self-administered computer measures. Finally, we will provide a brief overview of alternative methods of validating self-report measures, highlighting the various biological-testing procedures currently in use. It is important to note that this article is not meant to provide either an exhaustive review or an in-depth critique of the full range of assessment and drug-testing technologies. Rather, it is the authors' intent to present a general overview of the most common current practices and highlight the need for further advancement in the area of substance use assessment and measurement.
SELF-REPORT MEASUREMENT: ADVANTAGES AND DISADVANTAGES
ADVANTAGES
The major advantages of self-report measures are that they are relatively easy to administer to large samples, they can be administered simultaneously in several different locations, the responses are easily quantifiable and thus analyzable, and they offer the researcher the ability to question respondents on many different areas of interest. Self-report instruments are also relatively inexpensive to produce and administer and they can be administered in several different ways, including in person or over the telephone by an interviewer, via mail, or via the Internet (Patrick et al., 1994). Furthermore, self-report measures allow the respondent to choose to skip items that he or she does not wish to answer, helping to ensure that the data collection is ethical with respect to protecting respondents' choice as to what information they wish to reveal to the researcher.
Despite these advantages, there are a number of disadvantages to self-report measures, particularly with regard to their validity and reliability.
Demand characteristics. By their nature, self-report measures allow respondents to strategically alter their true responses to suit their particular self-presentation motives. Under most circumstances, respondents wish to present themselves in a socially desirable way and, therefore, might alter their true responses to appear more "normal" or acceptable to the researcher (Victorin, Haag-Gronlund, & Skerfving, 1998). Alternatively, under certain circumstances, the respondent may be motivated to paint a very negative picture of himself or herself, perhaps for amusement purposes or perhaps with the goal of becoming classified as someone who deserves a certain desirable treatment or intervention. In general, respondents may alter or modify their true responses to an item because of "demand characteristics," which include any aspect of the research environment or the research instrument that communicates a "demand" for the respondent to behave in a particular way (Orne, 1962).
Social desirability. Respondents tend to respond to self-evaluative questions in a socially approved manner to appear more socially desirable. To help avoid or minimize the problem of respondents' modifications of their answers to fit their self-presentation concerns, many questionnaires include a measure of this tendency to be concerned with making a favorable impression, also known as social desirability, in their research instruments. The scale most commonly used for this purpose is the Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe, 1964). Scores from this measure can then be included in subsequent analyses to control for individual differences in motivation for optimal social presentation.
One difficulty with this approach, however, is that this type of scale is somewhat transparent so that respondents end up responding honestly to the social desirability scale while still underreporting on more sensitive behaviors, such as substance use. In addition, controlling for social desirability concerns might inadvertently also interfere with the collection of potentially useful differentiating information. That is, by controlling for individual motivations that might affect response styles, the instrument might hide or "wash out" important cultural or demographic differences in respondents' thoughts, attitudes, or beliefs about the topic under investigation. In such instances, alternatives to social desirability scales to increase confidence in self-reported substance use should be utilized.
Researcher bias. Participants in a study are not the only ones who may be responsible for possible inaccuracies in their responses to self-report measures. The people who administer the instruments can have a dramatic impact on how participants respond to the measures. Researchers who administer a questionnaire to a particular group of respondents might have certain expectations (sometimes based on social stereotypes) about that group and how it should respond to the measures (Hurtado, 1994; See & Ryan, 1998). An experimenter, researcher, or survey administrator's expectations of and behavior toward a study participant can have a significant influence on how that participant responds to items in a questionnaire (Rosenthal, 1966). This may occur via two mechanisms: (1) the researcher might unintentionally treat different groups of participants differently (e.g., might emphasize certain words when reading instructions or smile more at one group than at another) depending on the group's background and characteristics, or (2) if the survey administrator is recording the participants' responses, he or she might subtly interpret and then record their responses differently based on prior expectations. Researcher expectancies can be communicated to participants both through verbal and nonverbal communication, potentially influencing participants' responses to items in a study instrument (Duncan, Rosenberg, & Finkelstein, 1969; Jones & Cooper, 1971).
Even if the same researcher or survey administrator does not inadvertently transmit potentially biasing cues to participants, many times survey instruments are administered to different groups of respondents by several different people. This inevitably results in subtle (or not so subtle) variations in survey administration that can have substantial effects on how different participants respond to questionnaire items. For example, one study found that social distance between the interviewer and respondent, a variable measured by the number of shared social identities (shared demographic characteristics) that each respondent and interviewer had in common, was a significant predictor of respondents' reports of substance use behaviors (Johnson, Fendrich, Shaligram, Garcy, & Gillespie, 2000). Specifically, respondents in dyads with relatively low social distance (i.e., more shared demographic characteristics) were more likely to report drug use.
The wording of self-report items. Aside from the demand characteristics and potential researcher bias that can influence participants' responses to a survey, many survey instruments themselves are designed in such a way that they too can inadvertently influence participants' responses. More specifically, certain characteristics of the measurement tool itself can compromise the validity of self-report measures (Victorin et al., 1998). For example, the clarity and readability of the items, the precision of the measure's response categories (i.e., dichotomous vs. continuous, frequencies, range of responses), and the time parameters of the recall period for questions about past behaviors or feelings can all contribute to the validity of a self-report instrument.
Questions have been raised about the use of preestablished scales because (a) they present the participant with the very responses they are attempting to assess (Johnson, Gurin, & Rodriguez, 1996); (b) they cannot assess the motivational significance of specific responses (Leigh, 1989); and (c) the outcomes they present may be open to widely divergent interpretations based on respondents' gender, ethnic background (Johnson & Glassman, 1999), and prior experience.
Norbert Schwarz (1996, 1999), a leading researcher in the area of self-report measurement, has written extensively on the many ways that research settings and research instruments can influence the responses of study participants. To the extent that participants' responses to questionnaire items are influenced by question wording, format, or context, the quality of the data obtained from such questionnaires is compromised. Because public policy is often driven by responses to public opinion surveys and research questionnaires, understanding the effects of questionnaire design on these responses is critical.
How the question is asked and the nature of the response alternatives are two key elements of questionnaire design that, if poorly constructed, can compromise the validity of the instrument. For example, open- and closed-question formats can result in systematic differences in responses. With an open-question format, respondents may not know what sort of answers the researcher is interested in and may provide responses that are too detailed or too broad to suit the researcher's needs. Alternatively, with a closed-question format, respondents are presented with response options that they may not have themselves freely thought of, allowing the questionnaire to drive the responses.
Another general problem with closed-format questions, particularly those that provide the respondent with response frequencies for behavioral questions, is that the frequencies provided can influence how the respondent answers. In one study illustrating this problem, respondents were asked to report how frequently they had an irritating experience (Schwarz, Strack, Muller, & Chassein, 1998). Some respondents were given a low frequency scale (e.g., ranging from "less than once a year" to "once a month") and some were given a high frequency scale (e.g., "less than once a week" to "daily") on which to respond. Subjects who responded to the low frequency scale assumed the question was referring to major annoyances and those asked to respond to the high frequency scale assumed the question referred to minor annoyances. Therefore, respondents might be reporting on very different types of irritating experiences, depending on the response frequency provided, making their responses impossible to compare to one another.
Similarly, for closed-format questions that ask respondents to report their behaviors or attitudes, the values assigned to the different levels of the rating scale can convey information to the respondent regarding what is deemed a normal or average response. That is, the midpoint of the scale would appear to represent the average response and the extremes of the scale would appear to represent the extremes of the attitude or behavior frequency. Respondents tend to use these rating scales as frames of reference for estimating their own behavioral frequency responses (Schwarz, 1999). A possible result of this phenomenon is that participants in a study might interpret the question differently depending on the response alternatives provided, making it difficult to compare the responses from different studies that use different rating scales.
SELF-REPORT INSTRUMENTS IN SUBSTANCE USE RESEARCH
Other than the general concerns with self-report measures outlined above, numerous additional questions have been raised about the reliability and validity of self-report measures of substance use in particular (Bailey, Flewelling, & Rachal, 1992; Baker & Brandon, 1990; Harrison, Haaga, & Richards, 1993; Stone et al., 1999; Turner, Lessler, & Gfroerer, 1992).
ISSUES OF CONCERN IN SUBSTANCE USE MEASUREMENT
Sensitivity and specificity. Two intertwined concerns that emerge with measures of substance use behaviors, including SRQ and alternative assessment tools, are the sensitivity and specificity of the assessment method. Sensitivity refers to the assessment method's ability to identify those individuals who have actually used a particular substance during the identified time frame (i.e., to avoid false negatives). Specificity refers to the assessment method's ability to not identify as users those who have actually not used a substance during the same time frame (i.e., to avoid false positives). The strength and accuracy of a particular assessment method is inextricably linked to that method's level of sensitivity and specificity.
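The two definitions above can be made concrete with a short calculation. The following is a minimal sketch, assuming that each participant's self-report is compared against a reference measure (for example, a biological test) that is treated as the true indicator of use. The function name and the sample data are hypothetical and are not drawn from any study cited in this article.

```python
def sensitivity_specificity(self_report, reference):
    """Return (sensitivity, specificity) of self-reports against a reference.

    Both arguments are equal-length sequences of booleans, where True means
    "use reported" (self_report) or "use detected" (reference).
    A value of None is returned when a rate is undefined (empty denominator).
    """
    if len(self_report) != len(reference):
        raise ValueError("sequences must have the same length")

    tp = fn = tn = fp = 0
    for reported, actual in zip(self_report, reference):
        if actual and reported:
            tp += 1  # true positive: user who reports use
        elif actual and not reported:
            fn += 1  # false negative: user who denies use
        elif not actual and not reported:
            tn += 1  # true negative: non-user who reports no use
        else:
            fp += 1  # false positive: non-user who reports use

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical data for ten participants (illustrative only).
reference = [True, True, True, True, False, False, False, False, False, False]
self_report = [True, True, False, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(self_report, reference)
```

In this invented sample, two of the four users identified by the reference measure report use, and five of the six non-users report no use. Underreporting of the kind discussed in this article lowers sensitivity, whereas overreporting lowers specificity.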
The wording of substance use self-report items. There is some debate as to the preferred sequencing of self-report measures of substance use. Some researchers believe that the order of the items should progress from the more socially accepted behaviors to the more disapproved of behaviors in order to develop the participants' trust. Others believe that the potentially threatening questions about participants' substance-use behaviors should be embedded within other less threatening questions in the instrument to limit the extent to which they are seen as important or the focus of the study, thereby minimizing socially-desirable response patterns (Meyers et al., 1999).
Using different assessment tools to measure the same substance use variables. Another potential problem with substance use-related self-report measures is that it is often difficult to compare responses on one such instrument with responses on another. That is, there are many different types of substance use measures that are administered to people in different settings or at different points in time. For example, adolescent substance use is assessed among adolescents inhabiting diverse social service systems and institutions, including education, mental health, juvenile justice, and child welfare (Meyers et al., 1999). Comparisons in rates or prevalence of substance use are difficult to make between adolescents in these different institutional systems. Similarly, comparisons within the same adolescent at different points in time are difficult to make as well if the instrumentation used in these various settings or times is not uniform. Thus, there is a great need for assessment tools that are not oriented toward a particular system or setting, but rather that can serve as measurement instruments across time and place.
Underestimates of sensitive behaviors. In comparison to the accuracy of self-report measures of most human behaviors, the accuracy of self-report responses of "sensitive" behaviors is more uncertain. According to Tourangeau and Smith (1996, p. 276), "A question is sensitive if it raises concerns about disapproval or other consequences (such as legal sanctions) for reporting truthfully or if the question itself is seen as an invasion of privacy." In response to the AIDS epidemic and its related risk behaviors, such as drug use and specific sexual practices, national and local surveys of human behavior have increasingly asked sensitive questions of their respondents. Unfortunately, merely asking these questions cannot ensure the accuracy of the estimates they generate and many have argued that the estimates grossly underestimate various sensitive behaviors, including adolescent substance use (Tourangeau & Smith, 1999; Turner et al., 1998).
A main reason for underreporting and the associated underestimates of sensitive behaviors is participants' failure to respond either to the survey in general or to specific items within the survey. By either refusing to participate in the survey or declining to answer specific items, those who are the most likely to have engaged in sensitive behaviors might be the least likely to respond to surveys or their sensitive questions.
That drug users are concerned about interviewers' reactions to their substance use behaviors is, of course, only an assumption. Research by Willis (1997), however, did support this position. When specifically queried regarding their feelings about interviewer reactions to the reporting of drug use, subjects at a drug treatment facility reportedly preferred not to talk openly about drug use.
This assumption is questioned in research reported by Wish, Hoffman, and Nemes (1997) suggesting that underreporting might be less of a problem for heavy than for light substance users. More specifically, these researchers reported, "individuals with the greatest drug abuse problems may be most likely to admit their problem in a research or clinical interview" (p. 221).
In any event, it would seem that more attention should be given to determining which types of individuals are more or less likely to underreport their substance use behaviors. This is essential to do in a field such as substance abuse where systematic distortion of the primary dependent variable could create a wholly inaccurate picture of the underlying reality of the behavior being investigated.
It is also important to consider the possibility that although researchers generally assume that the systematic distortion of self-reported substance use will be characterized by systematic underreporting, for selected groups including adolescent males and both male and female college students, the distortion could also be characterized by systematic overreporting. Interestingly, Johnston and O'Malley (1997) reported that when adolescents revise their estimates of previous drug use, their subsequent estimates are frequently less intense. They then suggest that, "the 'revised' may well be the more accurate ones, and the answers given at earlier ages... may be inflated" (p. 78).
Thus, the underestimation of sensitive behavior may be due both to errors in reporting and to conscious attempts to provide inaccurate information, particularly for embarrassing or socially undesirable behaviors (Bradburn, 1983).
MODES OF ADMINISTRATION OF SELF-REPORT MEASURES OF SUBSTANCE USE
How a self-report instrument is administered can have important implications for the reliability of participants' responses. One study compared self-report responses of substance use that were obtained from mail questionnaires to those obtained from personal interviews (Bongers & Van Oers, 1996). The researchers predicted that self-administered questionnaires, which are more anonymous by nature, would produce higher quality findings, particularly for sensitive topics like those related to substance use. In fact, the researchers found no significant differences between the two modes of data collection in self-reported substance use or substance use-related problems. Another study compared adolescents' reported substance use on paper-and-pencil questionnaires to their responses three months later using a touch-tone telephone format (Boekeloo, Schamus, Simmens, & Cheng, 1998). The findings indicated that test-retest reliability across these two modes of measurement was fairly poor for low frequency behaviors, such as injection drug use, and fairly good for more common substance use behaviors. Interestingly, the reliability or consistency of responses across the measurement modes was better for females and older adolescents than for males and younger adolescents.
Another difference in the mode of administration that might affect the honesty and accuracy of participants' responses, particularly to sensitive questions, is whether participants completing the instrument are told that their data is anonymous versus confidential. That is, privacy is heightened when surveys are conducted anonymously because participants do not provide their names or any other identifying information. When participants' responses are confidential rather than anonymous, their names are provided; however, the data are not disclosed to anyone but the researchers directly involved in the study. Researchers have recently set out to determine whether anonymous versus confidential survey procedures produce significant differences in responses, particularly to sensitive questions, such as those about drug use (O'Malley, Johnston, Bachman, & Schulenberg, 2000). The study found that the findings on the MTF survey did not significantly differ between respondents who completed the survey under conditions of anonymity and those who completed it under conditions of confidentiality. This could be an encouraging finding if one assumes that "anonymous" responses are accurate. This conclusion, however, should be drawn with caution. It would require that respondents believe that what they are told is an "anonymous" questionnaire is indeed anonymous.
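One way to see why agreement across two administrations can look poor for low frequency behaviors is to compute a chance-corrected agreement statistic. The following is a minimal sketch using Cohen's kappa for yes/no items. Kappa is chosen here only as a common illustration; it is an assumption, not necessarily the statistic used in the studies cited above, and the data are invented.

```python
def cohens_kappa(first, second):
    """Cohen's kappa for two equal-length sequences of 0/1 (no/yes) answers.

    Returns None when kappa is undefined (no data, or chance agreement of 1).
    """
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    n = len(first)
    if n == 0:
        return None

    # Proportion of respondents giving the same answer both times.
    observed = sum(a == b for a, b in zip(first, second)) / n

    # Agreement expected by chance from the two "yes" rates.
    p_first = sum(first) / n
    p_second = sum(second) / n
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)

    if expected == 1:
        return None
    return (observed - expected) / (1 - expected)


# Hypothetical common behavior: 9 of 10 respondents answer consistently.
common_t1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
common_t2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
kappa_common = cohens_kappa(common_t1, common_t2)

# Hypothetical rare behavior: 8 of 10 answer consistently, but the only
# two "yes" answers come from different respondents at the two times.
rare_t1 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
rare_t2 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
kappa_rare = cohens_kappa(rare_t1, rare_t2)
```

In the invented rare-behavior example, raw agreement is still 80%, yet kappa falls below zero because nearly all of that agreement is expected by chance when almost everyone answers "no."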
Computer-assisted self-interview. In recent years, with the increasing availability of desktop and laptop computers, researchers in the health field have begun to employ the computer-assisted self-interview, or CASI, to administer their questionnaires or surveys. A recent article by Webb and associates (Webb, Zimet, Fortenberry, & Blythe, 1999) suggested that the CASI was superior to the SRQ format for the following reasons: (1) since CASI responses are entered automatically, data entry errors are less likely; (2) because only one item is presented by the computer at a time to the participant, there is less likelihood of confusion or distraction; (3) because the computer can be programmed for conditional branching or skip patterns, the participant is not required to make these response adjustments; and (4) since there is no written record, there is greater protection of confidentiality and greater conservation of resources.
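The third point, conditional branching, can be illustrated with a short program. The following is a minimal sketch of a skip pattern, assuming a hypothetical three-item questionnaire; the item names, wording, and the `administer` function are illustrative and do not correspond to any instrument cited in this article.

```python
# Hypothetical three-item questionnaire. Each item names the item that
# follows it, given the answer; None ends the interview.
QUESTIONS = {
    "ever_drank": {
        "text": "Have you ever had a drink of alcohol?",
        # Skip the follow-up item when the respondent answers "no".
        "next": lambda answer: "days_drank_30d" if answer == "yes" else "ever_smoked",
    },
    "days_drank_30d": {
        "text": "On how many of the past 30 days did you drink alcohol?",
        "next": lambda answer: "ever_smoked",
    },
    "ever_smoked": {
        "text": "Have you ever smoked a cigarette?",
        "next": lambda answer: None,
    },
}


def administer(get_answer, first_item="ever_drank"):
    """Present one item at a time and return a list of (item, answer) pairs.

    get_answer is a callable taking (item_name, question_text) and returning
    the respondent's answer; in a real CASI it would read from the keyboard.
    """
    record = []
    item = first_item
    while item is not None:
        answer = get_answer(item, QUESTIONS[item]["text"])
        record.append((item, answer))
        item = QUESTIONS[item]["next"](answer)
    return record


# Simulated respondent who has never had a drink: the follow-up is skipped.
scripted_no = {"ever_drank": "no", "ever_smoked": "yes"}
record_no = administer(lambda item, text: scripted_no[item])

# Simulated respondent who has had a drink: the follow-up is asked.
scripted_yes = {"ever_drank": "yes", "days_drank_30d": "3", "ever_smoked": "no"}
record_yes = administer(lambda item, text: scripted_yes[item])
```

Because the program, rather than the respondent, decides which item comes next, the respondent never sees the instruction "if no, skip to question..." that a paper questionnaire would require.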
In a study comparing paper and pencil survey instruments with CASI, Hallfors and colleagues (2000) did not find that the use of CASI resulted in increased reported rates of student substance use compared to those obtained from paper and pencil questionnaires. However, the use of CASI was found to be superior in terms of improving the speed of data processing and decreasing the incidence of missing data.
While research indicates that even participants with little or no computer background rate CASI as easy or very easy to use (Bock, Niaura, Fontex, & Bock, 1999), some studies have found differential responses to health- or substance use-related questions with CASI compared to SRQ. For example, in the study by Webb and associates (1999), female respondents reported greater alcohol and marijuana use with CASI than with SRQ, while the opposite reporting pattern was found for male respondents. Regardless of the source of such divergent findings, it is essential that they be replicated to determine their generalizability as well as their implications for estimating adolescent substance use and abuse and, ultimately, for determining the source(s) of these behaviors.
Audio-CASI. One potentially valuable variation on the CASI approach is the use of an audio-CASI (ACASI) format in which the respondent is presented with the questions via an earphone attached to the computer. Responses are made on numbered keys and keys not involved in the response format are deactivated so that inadvertent responses are not entered as actual answers. With ACASI, questions can also be presented in written form on the screen to provide the respondents with the option of utilizing whichever format they prefer. An important advantage of ACASI is that it requires minimal literacy skills, thus enabling it to be used with individuals with little or no formal education. At the same time, since the format is standardized, each respondent is presented with the same question, in the same order, and in the exact same manner as every other respondent. In addition, because the questionnaires can be developed in multiple linguistic systems, respondents from different cultural and linguistic backgrounds can be presented with the same questions in a relatively confidential environment. Or, as Turner and colleagues point out, "every respondent (in a given language) hears the same question asked in exactly the same way..." (1998, p. 867).
In a report examining the development of ACASI procedures for the 1999 NHSDA, researchers found that ACASI helped enhance the privacy of the interview, such that interviewers reported being aware of respondents' answers significantly less often and respondents believed that the interviewer saw fewer of their responses with the ACASI than with the traditional SRQ (Lessler, Caspar, Penne, & Barker, 2000). Overall, this report also found that higher estimates of substance use prevalence resulted from the use of ACASI than from paper and pencil SRQs. This difference was probably due to the enhanced privacy associated with the ACASI, derived from participants' lesser need for interviewer assistance in completing it than in completing paper and pencil questionnaires. They also found that these privacy benefits were experienced by a larger portion of the sample, including younger respondents and those with poor reading skills, who found the ACASI easier to follow.
In their comparison of the ACASI approach to paper and pencil SRQ, Turner and his associates (1998) found that while the mode of presentation did not appear to impact reports of opposite-sex sexual activity, it did dramatically impact reports of same-sex activity. ACASI was associated with significantly higher self-reported rates of same-sex activity than SRQ. Importantly, the increased rates were more concordant with male adult retrospective reports of same-sex activity during adolescence, thus validating the ACASI findings to some extent. Their results also revealed fewer non-responses with ACASI and less frequent selection of the "don't know" choice than with the SRQ mode of presentation. In summarizing their findings, the researchers conclude, "this technology is reducing the underreporting bias known to affect such measurements. In addition, the technology appears to have a more pronounced effect on the reporting of behaviors that are particularly sensitive, stigmatized, or subject to serious legal sanctions, compared with less sensitive areas of conduct" (Turner et al., 1998, p. 871).
Earlier work by Tourangeau and Smith (1996) also found the ACASI mode of presentation to be advantageous in various ways. After highlighting the advantages of computer-assisted interviewing, including the reduction in inadvertent question skips, elimination of outside range responses, enhanced timeliness, and reduced expenditures and administrative errors, the authors indicate that ACASI "increases respondents' willingness to make potentially embarrassing admissions in surveys" (Tourangeau & Smith, 1996, p. 299). In their research, they found that ACASI increased reporting of sensitive behaviors over CASI and computer-assisted personal interviews (CAPI). In addition, ACASI appeared to sharply reduce the gender discrepancy in respondents' reports of number of sexual partners relative to the reported numbers found in previous studies that used personal interviews. Speculation as to the large divergence in the number of male and female self-reported sexual partners has centered on the likelihood that males overestimate the number of partners in personal interviews while females underestimate the number. Use of ACASI appeared to reduce the number of self-reported female partners and increase the number of self-reported male partners in comparison with CAPI.
Finally, ACASI, like CASI, can reduce inconsistencies in reported behaviors (Lessler et al., 2000). That is, the computer can be programmed to alert respondents to inconsistencies in their answers (i.e., when a respondent's answers to two different items cannot both be true). The respondent is then routed to a resolution screen where he or she is asked to verify the response.
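The kind of consistency check described above can be sketched in a few lines. The following is a minimal illustration, assuming three hypothetical yes/no items about use within nested time frames; the item names and the rule list are assumptions made for this example and are not taken from the NHSDA or any other instrument.

```python
# Each rule pairs a narrower time frame with a broader one: a "yes" to the
# narrower item cannot be true together with a "no" to the broader item.
# Item names are hypothetical.
CONSISTENCY_RULES = [
    ("past_30_day_use", "past_year_use"),
    ("past_year_use", "lifetime_use"),
]


def find_inconsistencies(answers):
    """Return the (narrow, broad) item pairs whose answers cannot both be true.

    answers maps item names to True ("yes") or False ("no"). Items that were
    skipped or left unanswered are simply absent and are not flagged.
    """
    problems = []
    for narrow, broad in CONSISTENCY_RULES:
        if answers.get(narrow) is True and answers.get(broad) is False:
            problems.append((narrow, broad))
    return problems


# A respondent who reports use in the past year but no lifetime use.
flagged = find_inconsistencies(
    {"lifetime_use": False, "past_year_use": True, "past_30_day_use": True}
)

# A fully consistent respondent produces no flags.
clean = find_inconsistencies(
    {"lifetime_use": True, "past_year_use": True, "past_30_day_use": False}
)
```

In an actual interview program, each flagged pair would route the respondent to a resolution screen; here the function only reports which pairs conflict.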
Interactive voice recording. Interactive Voice Recording (IVR) is a telephone-based data collection methodology in which respondents call a number and are then questioned by a recorded message about various behaviors. Responses are made on the numeric keys of the telephone keypad. Studies by Perrine and associates (Perrine, Mundt, Searles, & Lester, 1995; Searles, Perrine, Mundt, & Helzer, 1995) have demonstrated adequate reliability and validity of self-reported alcohol use with IVR.
A recent study employed IVR to assess binge eating and alcohol consumption behaviors among females the summer before they entered college (Bardone, Krahn, Goodman, & Searles, 2000). In this study, individual IVR data was compared to data provided using a timeline follow-back (TLFB) interview, in which subjects' substance use history is determined by providing them with particular notable life events (e.g., 16th birthday, first semester of college) to assist in their recall of their substance use behaviors at those times. While alcohol use behaviors were comparable using these two data collection procedures, TLFB produced significantly less reporting of binge eating behavior than IVR. One possible explanation for the discrepancy might be the relatively greater social acceptability for adolescent alcohol use compared to binge eating. That is, because of its greater response anonymity, IVR produces more reporting of behaviors to which some degree of social stigma is attached, such as binge eating.
METHODS TO VALIDATE SRQ ASSESSMENTS OF SUBSTANCE USE
Because of the stigma associated with "sensitive" behaviors, researchers have frequently employed various techniques either to help validate the responses to self-reported substance use measures or in place of SRQ. We now turn to a brief overview of some of these techniques.
THE BOGUS PIPELINE
The bogus pipeline procedure (Jones & Sigall, 1971) was developed to reduce participants' response distortions by convincing them that their responses are subject to chemical or electronic validation. The bogus pipeline, an ostensible "pipeline" to people's true thoughts, uses fake electronic machinery and a set of electromyographic (EMG) electrodes that are attached to respondents' arms. Respondents are told that the equipment can record minute muscular contractions that provide a precise assessment of their true attitudes, beliefs, and feelings. In fact, the equipment does not record anything and it essentially serves as a fake "truth detector." The goal of this procedure is to reduce respondents' tendencies to distort their answers from fear of being proven a liar. Because of the extreme deception inherent in this procedure, the bogus pipeline, as originally conceived, is rarely used in current research. Furthermore, several studies have found that the use of this method did little to improve the veracity of self-reports beyond the effect of assuring participants' anonymity (Akers, Massey, Clarke, & Lauer, 1983; Hill, Dill, & Davenport, 1988; Murray & Perry, 1987). However, this method does relate to the use of other forms of biological assessments (to be described below) that serve to validate self-reports and thereby motivate respondents to provide responses that are more in line with their actual thoughts and behaviors.
BIOLOGICAL ASSESSMENT TECHNIQUES
Biological measures of substance use can be used either as an alternative or as an adjunct to self-report measures of use. However, according to Magura, Laudet, and Goldberger (1999), before the 1990's it was relatively rare to see biological testing, despite the availability of sensitive immunoassay methods for urinalysis. In fact, before 1985, only 11 published studies were found by these authors validating self-reports with drug testing. In these studies, roughly 49% of participants with positive drug tests did not self-report use.
With few exceptions (e.g., Elman et al., 2000), an examination of studies over the next 10 years confirmed the widespread discordance between self-report identified and biologically identified substance users (e.g., DeJong & Wish, 2000). Because high-risk youth are likely to drop out of school or to be found in the juvenile justice system, it is likely that school-based estimates of adolescent substance use underestimate use (Dembo et al., 1999). Indeed, studies that include youth entering the juvenile justice system and those that use biological assessments show substance use rates that are consistently higher than those found in national population surveys of youth, such as MTF (Mieczkowski, Newel, & Wraight, 1998; National Institute of Justice, 1997). Magura and colleagues (1999) conclude that the use of biological testing of youth entering the juvenile justice system indicates that self-reported drug use behavior does not accurately reflect recent drug taking. Not surprisingly, their research indicated that self-report measures under-represented cocaine but not marijuana use.
As noted earlier, a number of researchers have suggested that self-report assessments are particularly vulnerable to distortion with less socially acceptable behaviors. In this regard, self-report measures may be less likely to reveal cocaine use than either alcohol or marijuana use because of the greater acceptance of using the latter substances among adolescents (Measham, Parker, & Aldridge, 1998). This echoes the earlier finding regarding adolescent females' greater willingness to talk about their substance use behaviors than their eating disorder behaviors (Bardone et al., 2000).
In addition to providing an inaccurate estimate of adolescent substance use, the underreporting of drug use also can affect how study data is interpreted. For example, Kim and colleagues (2000) found that urinalysis results revealed significantly higher rates of marijuana use in boys than in girls. However, gender differences in self-reported measures of marijuana use prevalence were consistently smaller than those suggested by the urine test results. This difference in findings based on the assessment tool used appears to have been due to girls' greater willingness to self-report marijuana use. Thus, in the self-report data, boys were underreporting past marijuana use, making it appear as though their prevalence rates did not differ much from those of girls. Similarly, another study found that relative to females, males were more likely to systematically underreport drinking behavior that occurred prior to an alcohol-related injury requiring hospitalization (although they were generally more accurate than females) (Sommers et al., 2000).
Yet another example of how the underreporting of drug use can affect how study data is interpreted can be found in the area of drug use and criminal behavior. Previous studies have found a positive relationship between levels of drug use and levels of criminal activity (Rosenfeld & Decker, 1999). However, Magura and colleagues (1999) found that those subjects who failed to report detected drug use also were less likely to report criminal behavior in the prior month. This suggests that the observed association between drug use and criminal activity in the sample may be spurious. Indeed, there was no significant relationship between urinalysis results and criminal activity. As discussed previously, the implications of such systematic distortion for the field of substance abuse should not be minimized. In a related vein, Wislar and Fendrich (2000) compared self-reports of marijuana and cocaine use with the number of self-reported recent sexual partners in a sample of juvenile detainees. Urinalysis was used to validate self-reports of drug use. The findings indicated that, for cocaine use, males who overreported drug use (relative to the urinalysis results) also reported more sexual partners than either underreporters or accurate reporters. Wislar and Fendrich (2000, p. 86) explain this finding by suggesting that "respondents who are willing to overreport one stigmatized behavior (cocaine use) may be more willing to report other stigmatized or risky behaviors (multiple sex partners)." This finding suggests that an apparent positive relationship between drug use and number of sexual partners might actually be due, at least in part, to the tendency of certain individuals to overreport both behaviors.
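The way correlated underreporting can manufacture an association between drug use and criminal activity can be illustrated with a toy calculation. The Python sketch below uses an entirely hypothetical eight-person sample (not data from any cited study) in which actual drug use and actual criminal activity are unrelated, and in which half of the respondents in every cell deny both behaviors. The names `Respondent` and `crime_rate_gap` are illustrative assumptions, and the "urinalysis" flag is simply assumed to match actual use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Respondent:
    used_drug: bool        # hypothetical: what a perfectly accurate urine test would show
    committed_crime: bool  # hypothetical: actual behavior in the prior month
    discloses: bool        # hypothetical: False means the person denies BOTH behaviors

    @property
    def reported_drug(self) -> bool:
        return self.used_drug and self.discloses

    @property
    def reported_crime(self) -> bool:
        return self.committed_crime and self.discloses


def crime_rate_gap(sample, drug_flag):
    """Self-reported crime rate among drug-positive minus drug-negative respondents."""
    positive = [r.reported_crime for r in sample if drug_flag(r)]
    negative = [r.reported_crime for r in sample if not drug_flag(r)]
    return sum(positive) / len(positive) - sum(negative) / len(negative)


# One respondent for every combination of use, crime, and disclosure,
# so actual drug use and actual crime are unrelated by construction.
sample = [
    Respondent(drug, crime, discloses)
    for drug in (True, False)
    for crime in (True, False)
    for discloses in (True, False)
]

# Drug use classified by self-report: an association appears.
self_report_gap = crime_rate_gap(sample, lambda r: r.reported_drug)

# Drug use classified by the (assumed accurate) urine test: no association.
urinalysis_gap = crime_rate_gap(sample, lambda r: r.used_drug)

print(f"gap using self-reported drug use: {self_report_gap:.3f}")
print(f"gap using urinalysis results:     {urinalysis_gap:.3f}")
```

In this constructed sample the gap in reported crime is about one third when drug use is taken from self-report and zero when it is taken from the test result, which mirrors the pattern described above. Real samples would of course involve partial and uneven disclosure rather than the all-or-nothing denial assumed here.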
Despite the reduction in underreporting often associated with biological measures, there are still significant problems associated with biological testing for substance use. A significant dilemma in this regard is the ethical concern associated with the invasion of individual privacy. Additional concerns associated with biological tests and their interpretation include the following: (1) most only detect recent use; (2) most can only detect the occurrence of use but not the severity or chronicity of use; (3) false negative findings can result from the deliberate tampering with samples unless strict collection procedures are implemented; and (4) false positive findings can be obtained unless tests are confirmed by expensive analytic methods (e.g., gas chromatography-mass spectrometry). We now provide a brief overview of the various biological assessment methods.
Urinalysis. Let us elaborate briefly on the problems associated with the use of urine drug screens. According to Jaffe (1998), a negative screen need not indicate an absence of drug use for a number of reasons. First, most drugs can only be identified in the urine within 24 to 36 hours after use, with the exception of marijuana for which traces can be identified 2 to 3 weeks after use. Second, negative results can also occur following use in conjunction with excessive fluid intake. "The short window of detection by urinalysis, particularly for opiates and cocaine, is a problem for drug-abuse research. The researcher's objective usually is to identify both daily/ near daily users and intermittent (e.g., weekend or binge) users. Urinalysis will detect the former but often not the latter" (Jaffe, 1998, p. 68).
Hair analysis. The advantages of hair analysis include a wider window of time for detecting previous substance use from one to three months, relative ease of collection without embarrassment, and little or no opportunity for tampering. Unfortunately, hair analysis might fail to detect drug use during the week prior to the test.
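The detection windows described for urinalysis and hair analysis can be sketched as a simple lookup. The Python example below is a rough simplification that takes the upper bounds mentioned in the text (about 36 hours for most drugs in urine, about three weeks for marijuana, and roughly one week to three months for hair) as fixed cutoffs. The constant and function names are hypothetical, and real detectability also depends on dose, assay cutoff, and the individual.

```python
# Assumed cutoffs, taken from the approximate figures quoted in the text.
URINE_WINDOW_DAYS = {"marijuana": 21.0}  # traces reported up to 2 to 3 weeks
DEFAULT_URINE_WINDOW_DAYS = 1.5          # 36 hours for most other drugs
HAIR_MIN_DAYS = 7.0                      # hair may miss use in the prior week
HAIR_MAX_DAYS = 90.0                     # upper end of the 1 to 3 month window


def urine_may_detect(substance: str, days_since_use: float) -> bool:
    """True if use falls inside the assumed urinalysis window."""
    window = URINE_WINDOW_DAYS.get(substance, DEFAULT_URINE_WINDOW_DAYS)
    return 0 <= days_since_use <= window


def hair_may_detect(days_since_use: float) -> bool:
    """True if use falls inside the assumed hair-analysis window."""
    return HAIR_MIN_DAYS <= days_since_use <= HAIR_MAX_DAYS


# A weekend cocaine user tested at several delays after last use.
for days in (1, 4, 10):
    print(days, urine_may_detect("cocaine", days), hair_may_detect(days))
```

Under these assumptions, a test four days after weekend cocaine use falls outside both windows, which illustrates why the two methods are complements rather than substitutes.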
Although it is relatively easy to take a hair sample, the ease of measurement ensures neither willingness nor sample availability. Some males possess little or no hair to sample and some, especially African American males, have closely cropped hair, making it impossible to obtain a large enough sample for analyses. In addition, comparisons of hair and urinalysis results reveal that urinalysis is a more sensitive measure of substance use than hair analysis for individuals with short or shaved hair (Dembo et al., 1999).
Dembo and colleagues (1999) provide another example of how biological results can produce substantially different conclusions compared to the results obtained from self-reported measures of substance use. In their study, self-reported cocaine use was not associated with any of the following: pre-incarceration arrest history, rearrest since release, education since release, and employment since release. In contrast, the hair analyses findings showed what would be expected. That is, hair analyses revealed that those participants who were positive for cocaine use had more arrests before incarceration, more re-arrests since release, and less education and employment.
Saliva analysis. At the present time, saliva analysis is more frequently used to validate self-reported tobacco use than the use of any other substance. Not surprisingly, many study participants, especially females, are unable or unwilling to provide sufficient amounts of saliva for measurement. This is often the case even after attempts are made to stimulate the salivary glands. One reason for this may be that women tend to be particularly averse to expectoration. Like urinalysis, saliva tests have a relatively restricted window of detection; they can be useful when such a window is acceptable but it is not possible to obtain a urine sample.
With respect to biological tests for cotinine levels to determine nicotine use, Patrick and his associates (1994) have pointed out a number of difficulties. For example, cotinine can be elevated in users of snuff and chewing tobacco who do not also smoke cigarettes. In addition, when biochemical tests are repeated, the results may be different even when smoking status has not changed. Therefore, these measures can only be used to validate recent self-reported smoking.
SELF-REPORTS COMPARED TO BIOLOGICAL MEASURES OF SUBSTANCE USE
In their meta-analyses, Patrick and colleagues (1994) compared the validity of self-reported tobacco use to biochemical measures of tobacco use in 30 smoking studies and found that self-reported measures of use possessed generally high levels of sensitivity and specificity. They also found that student self-reports had lower sensitivity than the self-reports of the general population (students may have been more likely to deny smoking - probably because it's illegal for minors to smoke). Furthermore, the findings indicated that biochemical validation might be more important in intervention studies, in studies with student populations, and in studies using self-administered rather than interviewer-administered questionnaires.
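For readers less familiar with the terms, sensitivity and specificity of a self-report can be computed from paired self-report and biochemical results. The Python sketch below treats the biochemical test as the criterion, which is itself an assumption given the limitations of biological measures discussed above. The function name and the ten-person sample are hypothetical and are not drawn from the studies reviewed.

```python
import math


def sensitivity_specificity(pairs):
    """Return (sensitivity, specificity) of self-report against a biochemical criterion.

    pairs: iterable of (self_reported_use, biochemical_positive) booleans.
    """
    tp = fn = tn = fp = 0
    for reported, positive in pairs:
        if positive and reported:
            tp += 1      # user who admits use
        elif positive:
            fn += 1      # user who denies use (underreporting)
        elif reported:
            fp += 1      # test-negative respondent who reports use
        else:
            tn += 1      # test-negative respondent who reports no use
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Hypothetical sample: 4 test-positive respondents (3 admit use, 1 denies)
# and 6 test-negative respondents (none report use).
pairs = [(True, True)] * 3 + [(False, True)] + [(False, False)] * 6
sens, spec = sensitivity_specificity(pairs)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Lower sensitivity, as reported for student samples, corresponds to a larger share of test-positive respondents who deny use.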
Harrison and Hughes (1993, p. 3) point out that "validation studies conducted before the mid-1980's involving known samples of drug users or urinalysis techniques suggested that drug use was fairly accurately reported in self-report surveys. However, more recent validation studies conducted with criminal justice and former treatment clients using improved urinalysis techniques and hair analysis demonstrate that self-report methods miss a lot of recent drug use."
These conclusions are echoed by authors in a NIDA monograph that addressed the validity of self-reported measures of substance use: "The findings reported here contribute to those of other studies that have questioned the validity of self-reports of drug use..." (Wish, Hoffman, & Nemes, 1997, p. 222); and "The findings cast doubt on the validity of self-reports as means of estimating drug use prevalence and suggest the need for multiple assessment methods" (Cook, Bernstein, & Andrews, 1997, p. 247).
While many may assume that, because of its apparent divergence from self-reported substance use, biological assessment is more valid and that researchers should place increasing emphasis on it, a number of problems have been identified with the various biological approaches to drug testing. "Biological testing is far from perfect. More needs to be learned about both hair analysis and saliva analysis including such issues as appropriate cutoff concentrations, conditions influencing sensitivity and specificity, dose/assay relationships and the interpretation of quantitative results, and (for hair analysis) the effects of external contamination, possible deposition by sweat, cosmetic treatments, and hair pigmentation and structure on analytical results" (Magura, Laudet, & Goldberger, 1999, p. 231).
A PROPOSED ALTERNATIVE TO SELF-REPORTS FOR ASSESSING SUBSTANCE USE
One avenue of research that is currently underway by the authors is aimed at developing an assessment tool that would serve as an alternative or companion to self-report measures of substance use and substance use-related attitudes. This technique draws on a methodology developed by researchers in the field of social psychology for examining people's "sensitive" attitudes, for example those related to prejudicial or biased thoughts (e.g., Bargh & Chartrand, 1999). The premise underlying this type of procedure is that self-report measures tend to elicit conscious and often strategically biased thoughts that intervene between response activation (i.e., when the subject's initial response comes to mind) and response presentation (i.e., what the subject chooses to give as a response).
These techniques measure the automaticity of attitude activation, or the extent to which attitudes regarding an attitude object are automatically activated upon the presentation of cues related to the attitude object. In experimental settings, attitudes typically are automatically activated via priming techniques, whereby experimental manipulations passively (without a willful act on the part of the participant) elicit accessible cognitions. The activation of an attitude is assumed automatic to the extent that it is relatively effortless and not mediated by active attention or conscious thought (Eagly & Chaiken, 1993).
Thus, in the classic experiments using this technique (Fazio, Sanbonmatsu, Powell, & Kardes, 1986), participants are briefly presented with primes of attitude objects on a computer display, followed after a short interval by an adjective, which participants are asked to judge as either "good" or "bad" in meaning. The outcome measure in these studies is the response latency in judging the evaluative valence of the adjectives. The logic of this procedure is that the extent to which the attitude object automatically activates the presented adjective would be reflected in the speed with which participants judge the adjective to be "good" or "bad." If the adjective is perceived to be of the same valence as the attitude object prime, evaluative judgments of the adjective should be faster. Conversely, if the adjective is perceived to be of the opposite valence as the attitude object prime, evaluative judgments should be slower.
In relation to substance use attitude assessment, for example, participants can be primed with a visual representation of an alcohol-related attitude object (e.g., a bottle of beer) and then presented with a positive or negative adjective (e.g., fun, sad), which they must judge to be good or bad. For participants who have a positive attitude toward alcohol, evaluative judgment response latencies should be faster if the presented adjective is "fun" (i.e., consistent with the attitude toward the attitude object) and responses should be slower if the presented adjective is "sad" (i.e., inconsistent with the attitude toward the attitude object). This assessment of attitudes would occur outside of the participant's conscious awareness.
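The latency comparison described above can be sketched in a few lines of Python. The example assumes a hypothetical set of trials that all follow an alcohol-related prime, and it summarizes them as the mean latency for negative adjectives minus the mean latency for positive adjectives. The `Trial` structure, the `priming_index` function, and the latency values are illustrative assumptions, not the authors' instrument or data.

```python
from statistics import mean
from typing import NamedTuple


class Trial(NamedTuple):
    adjective: str
    adjective_valence: str  # "positive" or "negative"
    latency_ms: float       # time taken to judge the adjective good or bad


def priming_index(trials):
    """Mean latency for negative adjectives minus mean latency for positive ones.

    A value above zero means positive adjectives were judged faster after the
    prime, which is the pattern expected for a positive attitude toward it.
    """
    positive = [t.latency_ms for t in trials if t.adjective_valence == "positive"]
    negative = [t.latency_ms for t in trials if t.adjective_valence == "negative"]
    return mean(negative) - mean(positive)


# Hypothetical latencies for one participant after a beer-bottle prime.
trials = [
    Trial("fun", "positive", 520),
    Trial("happy", "positive", 540),
    Trial("sad", "negative", 610),
    Trial("awful", "negative", 630),
]

index = priming_index(trials)
print(f"priming index: {index} ms")
```

This sketch omits features a real procedure would likely need, such as a neutral-prime baseline and the handling of incorrect or extremely slow responses.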
The need to avoid respondents' conscious manipulations of their automatic responses is particularly acute in the area of substance use for which there is societal disapproval and thus a greater incentive for respondents to present themselves in a positive light. Thus, the use of alternative methods to self-reports, such as the one currently being developed, can bypass or reduce the likelihood that subjects will consciously manipulate their responses to measures of substance use. Such methods can thereby allow for a measurement that is more closely aligned with subjects' true thoughts and attitudes than that which researchers can receive from more traditional self-report measures. Methods such as these are creative, non-invasive, and provide easily quantifiable data for analyses. They also have the potential to be particularly useful for substance abuse researchers who focus on children. Children are often less able and less willing than adults to accurately express and be forthcoming about their true thoughts or behaviors with regard to "sensitive" topics such as drug use.
In the field of substance abuse research, this type of methodology can be applied both to the assessment of individuals' substance-related attitudes, beliefs and expectancies, as well as to the assessment of substance use behaviors. By bypassing subjects' deliberate responses that, using traditional self-report assessment techniques, often produce less than accurate measures of subjects' true responses, measuring subjects' automatic (or non-manipulated) responses to questions of substance use could go a long way toward increasing the reliability of substance use-related measures.
SUMMARY AND CONCLUSIONS
This review of the literature examining the various methods of assessing substance use behaviors and their associated advantages and disadvantages has highlighted several important conclusions.
First, self-report instruments, although widely used and convenient, have several potential disadvantages that must be taken into account when designing an instrument and when analyzing the results of a study. Self-report measures are subject to various forms of bias that can dramatically influence the validity and reliability of the study findings. Computer-assisted self-interviews (CASI), particularly audio CASI, can help reduce some of these biases; however, the validity of data obtained through these means is still somewhat vulnerable to participants' self-presentation concerns. Biological techniques for assessing substance use are able to bypass some of the concerns regarding participants' conscious distortions of their responses, yet these techniques too have some disadvantages. Biological assessment techniques are often inferior to self-report assessments with regard to their sensitivity and specificity of measurement. Furthermore, many biological assessment techniques are invasive and aversive to study participants.
This review has presented the various advantages and disadvantages of different techniques for assessing substance use in research. It also provides some general guidelines for the circumstances under which one technique might be preferable to another. Regardless of the technique used, it is imperative for researchers to be aware of the potential biases and threats to validity inherent in each method of assessing substance use.
Despite their important role in informing policy decisions and intervention strategies, the currently used assessment tools often provide imperfect or inaccurate measures of substance-related attitudes and behaviors. Therefore, the development and application of new methods for assessing substance use attitudes and behaviors should be a very high priority for substance abuse researchers.
The development of alternative measurements that combine the advantages of self-report and biological assessment techniques, while avoiding the disadvantages of these techniques, would help further the field of substance use research. Accurate, valid, and reliable measures of substance use are absolutely necessary for identifying those individuals who would benefit most from prevention, intervention, and treatment efforts. Such efforts cannot be effectively implemented if the scope and severity of substance use are not accurately understood. Future research on the development of alternative assessment techniques and on the improvement of current techniques is, therefore, sorely needed in order for the field of substance use research to progress and continue to make significant theoretical and practical contributions to public policy and public health endeavors.
LINDA RICHTER, PATRICK B. JOHNSON
Linda Richter, Ph.D., is a senior research associate in the Policy Research and Analysis division of The National Center on Addiction and Substance Abuse at Columbia University (CASA). Patrick B. Johnson, Ph.D., is the deputy director of the Medical Research and Practice Policy division of CASA. Address correspondence to: Patrick B. Johnson, The National Center on Addiction and Substance Abuse at Columbia University (CASA), 633 Third Avenue, 19th floor, New York, N.Y. 10017-6706. Email: firstname.lastname@example.org