U.S. Department of Justice
Office of Justice Programs
National Institute of Justice


Evaluating Drug Control and
System Improvement Projects

Guidelines for Projects Supported by

the Bureau of Justice Assistance


August 1989


October 1992

Developed by the National Institute of Justice, in cooperation with the Bureau of Justice Assistance, U.S. Department of Justice, by Abt Associates Inc., under contract #OJP-86-C-002.


In the course of developing these guidelines, the authors and the National Institute of Justice consulted with officials of the Bureau of Justice Assistance as well as with members of an advisory committee appointed for the project.

These advisers included:

Alfred Blumstein, Ph.D., Dean and J. Erik Jonsson Professor of Urban Systems and Operations Research, School of Urban and Public Affairs, Carnegie Mellon University.

Douglass Lipton, Ph.D., Director of Research, Narcotic and Drug Research, Inc.

Jay Malcan, Evaluation Specialist, Virginia Department of Criminal Justice Services.

Joseph Marshall, Programs Coordinator, State of Virginia Department of Criminal Justice Services.

Lloyd E. Ohlin, Ph.D., Roscoe Pound Professor of Criminology, Harvard Law School, retired.

Thomas Quinn, Executive Director, State of Delaware Criminal Justice Council.

The authors, and the National Institute of Justice, are grateful to the advisers and Bureau of Justice Assistance officials for their assistance.

Douglas C. McDonald

Christine Smith

Abt Associates Inc.

Cambridge, Mass.

Table of Contents


   The Uses of Evaluations
1 Developing a Strategy for Evaluation
    A Continuum of Evaluative Activities
    Determining Whether to Evaluate at All
    Focusing the Evaluation: Which Questions to Ask?
        Projects as Input-Output Processes
        Questions About Implementation, Results, and Outcomes
    Matching Evaluations to Types of Projects
2 Obtaining Information for Evaluations
    Use Existing Data or Collect New Information?
    Collecting New Data
        Direct Observation
        Collecting Data from Administrative Records
        Adapting Administrative Record-Keeping Procedures for Evaluations
    Confidentiality of Data
    Protection of Human Subjects
3 Methods of Analyzing Data
    Case Studies
    Quantitative Descriptions
    Before/After Studies
    Time-Series Analyses
    Experimental Designs
    Quasi-Experimental Studies
4 Conducting Evaluations In-house or Under Contract
5 Using Evaluation Findings for Decision Making
    Increasing Relevance for Decision Makers
    Explicating the Advantages and Disadvantages of Different Options
    Communicating the Findings in Understandable Terms
    Writing for the Intended Audience
    Meshing the Evaluation's Schedule With Time Frames for Decision Making
    Preparing Clients for Challenging Findings


A broad array of drug control projects are conducted throughout the country with funds provided under the Anti-Drug Abuse Act of 1988. Through these efforts the Federal government joins with State and local officials to fight drugs and crime in our society.

As the research and development agency of the Department of Justice, the National Institute of Justice carries out a comprehensive evaluation program to find out which innovations will enable police, prosecutors, judges, corrections officials, and State and local officials to stem the tide of drug trafficking, drug abuse, and violent crime. NIJ has evaluated programs such as police crackdowns, community policing, new court management practices, intermediate punishments aimed at both persistent and casual drug users, and other programs to reduce the impact of drugs and drug-related crime. The results of these studies are reported in NIJ Evaluation Bulletins and in the Institute's Annual Evaluation Report on Drugs and Crime.

Under the Anti-Drug Abuse Act, NIJ developed guidelines in cooperation with the Bureau of Justice Assistance "to assist State and local units of government to conduct program evaluations as required," as mandated in the Act. The National Institute is pleased to reprint those guidelines in this document.

The Guidelines that were developed are flexible instructions rather than rigid rules. They encourage agencies to formulate strategies to focus evaluation resources and consider a range of evaluation activities. Most important, they encourage agencies to develop information that will help guide their investment of drug control resources in the future.

Charles B. DeWitt
National Institute of Justice


The Anti-Drug Abuse Act of 1988 (Pub.L. 100-690), signed into law on November 18, 1988, established grant programs for the purpose of funding drug control and justice system improvement projects at the state and local levels. Title VI, Subtitle C of the Act establishes the Drug Control and System Improvement Grant Program, which provides "formula" grants to state governments. The purpose of the formula grant program is "to assist States and units of local government in carrying out specific programs which offer a high probability of improving the functions of the criminal justice system, with special emphasis on a nationwide and multilevel drug control strategy by developing programs and projects to assist multi-jurisdictional and multi-State organizations in the drug-control problem and to support national drug control priorities." Title VI also establishes authority for discretionary grants to public and private agencies or organizations, issued directly by the Director of the Bureau of Justice Assistance, to further any of four purposes enumerated in the Act.

The Act also established requirements that the activities funded by these grants be evaluated. With respect to formula grants, it mandates that "Each program funded ... shall contain an evaluation component, developed pursuant to guidelines established by the National Institute of Justice, in consultation with the Bureau of Justice Assistance." In the Program Guidance announcing the formula grant program, the Bureau adds that the "purpose of evaluating each program is to assess how well it has been implemented and to assess the extent to which the activities funded have achieved the program's goals. Such assessments should be designed to provide administrators and policy makers with an improved understanding of whether specific activities accomplish their desired results of furthering the state strategy." The Bureau's Program Guidance also announces that states may choose from among many different assessment methods in designing their evaluation efforts, and it encourages states to undertake "intensive evaluations" where appropriate. (Bureau of Justice Assistance, 1988; emphasis added.)

Not all projects or programs must be evaluated. The Act permits the Director of the Bureau of Justice Assistance to waive this requirement "when in the opinion of the Director - (1) the program is not of sufficient size to justify a full evaluation report; or (2) the program is designed primarily to provide material resources and supplies, such as laboratory equipment, that would not justify a full evaluation report."

With respect to projects funded under the discretionary grant provisions of the Act, the Act establishes that applicants must "describe the method to be used to evaluate the program or project in order to determine its impact and effectiveness in achieving its stated goals; and ...[must] conduct such evaluations according to the procedures and terms established by the Bureau."

The Uses of Evaluations

Although the Act establishes the necessity of evaluating federally-funded programs in order to ensure their accountability, evaluation activities provide other important benefits as well. State administrators and planners need feedback to determine how effectively their strategies achieve the established drug control and system improvement goals. Information about how their plans were implemented and the results of these activities can be used by planners to revise strategies to strengthen their chances of success. This information is also useful for determining whether to expand the project, to undertake similar projects in other jurisdictions, or to spend scarce resources for other purposes altogether. Similarly, project managers can use the results of evaluations to strengthen their ability to achieve their project's goals.

The federal government is interested not only in assisting law enforcement but in supporting the states to act as laboratories for innovations. Because many of the practices developed at the state and local level are new and relatively untested, evaluations are especially suited to distilling the lessons of these experiences for other jurisdictions.

Chapter 1: Developing a Strategy for Evaluation

Evaluation involves the systematic assessment of whether and to what extent projects or programs are implemented as intended and whether they achieve their intended objectives. It entails asking questions about projects or programs (or even of a larger constellation of programs that comprise a state's strategy), acquiring information, and analyzing that information. Evaluations vary, therefore, according to the types of questions posed, the methods used for acquiring information, and the types of analyses conducted.

This document provides a summary description of various approaches to evaluation, their essential attributes, and their relative strengths and weaknesses. In addition, the choices posed by the various demands of designing and managing an evaluation are identified, with some discussion of several issues relevant to administrative decisions about when and what to evaluate, and by which means.

The discussion here is focused on the evaluation of specific projects, and not higher-level entities (programs or strategies that include two or more projects). Evaluating such entities - especially their collective impacts - poses special analytic problems not addressed here.

A Continuum of Evaluative Activities

There is no single method of evaluation that is best suited to all purposes and all projects. Instead, the most appropriate method for answering questions depends upon many factors, including:

1) the type of question posed;

2) the nature of the project and its possible effects, and the constraints these features impose on the ability to answer the questions asked;

3) the availability of data needed for an evaluation;

4) how certain the decision maker (or the "client" for the evaluation) needs to be about the data and the conclusions that are produced by the study;

5) what level of resources the decision maker/client is willing to devote to getting the answer; and

6) the time that is available for the study.

Typically, choosing an evaluation strategy requires accepting trade-offs between time, cost, and confidence that one can place in the study's findings. The optimal balance is one in which the evaluation provides the most valuable analysis of project implementation or the most plausible estimates of the project's effects, is most likely to be conducted successfully, and provides the most useful results for administrative, planning, and policy purposes.

Determining Whether to Evaluate at All

Before trying to determine which kind of evaluation approach best suits both the needs of important stakeholders and the nature of the project, a threshold decision first has to be made: whether to evaluate the project at all. States that fund a number of different projects in several program areas will find it difficult to evaluate every project. Rather than attempting to do so, administrators should focus their evaluation resources so that they provide the most useful information possible.

In deciding which projects to evaluate, the following questions should be considered:

How central is the project to the state's strategy?

How costly is it, relative to others?

Are the project's objectives such that progress towards meeting them is difficult to estimate accurately with existing monitoring procedures?

How much knowledge exists about the effectiveness of the type of project being supported? Other things being equal, where more uncertainty exists about a project's effects, the need for evaluation is greater.

Are evaluations underway elsewhere that are assessing similarly designed projects? If so, the administrator may choose to wait until the results of those evaluations are in, and to devote evaluation resources instead to projects about which less is known.

Focusing the Evaluation: Which Questions to Ask?

Because evaluations are undertaken to answer questions, framing the question - or questions - clearly is the most critical element in designing and planning evaluations. Evaluations that are not focused precisely are likely to be wasteful of scarce resources.

Projects as Input-Output Processes

To identify the types of questions that can be asked of a project, and thereby to distinguish among different types of evaluations, it is helpful to conceive of projects as input-output processes.

Inputs refer to all the resources that are devoted to the project, including (among others): money, staff and volunteers, knowledge, equipment and space, as well as such things as the networks of possible supporters and allies that staff bring to the project, and the good will of others that can be called upon. In addition, among the key inputs are the plans and assumptions upon which the project is built.







Activities are the things people do that, taken together, constitute the project. Examples include patrolling the borders to intercept drug smugglers with experimental technology, investigating crimes and suspects, providing close surveillance of persons sentenced to intensive probation supervision, collecting and testing urine samples from persons under supervision, or responding to calls alleging spouse abuse in certain prescribed and innovative ways.

Results are the goods or services that are produced by these activities. Examples include arrests of drug dealers, convictions, investigations forwarded, seizures of illegal drugs, or provision of intensive probation supervision. As conceived here, results are the more immediate outcomes of project activities, directly flowing from those activities, as distinguished from longer-term or less immediate results that we define as "outcomes."

Outcomes refer to the ultimate end products of a project's activities. These differ from results in that they are the more long-term consequences of activities, or changes that result from those activities. Some of these outcomes may be intended and foreseen, and their occurrence can be taken to indicate that the program's goals were achieved. (Goals are distinct from outcomes, because they define the desired states, conditions, or events against which actual outcomes are assessed.) Other outcomes may be unexpected, unwanted, and unintended. Examples of intended end products of a concentrated drug intervention effort would include not only the seizure of illegal drug shipments (the results of the activities) but also the diminution of the supply of drugs on the street, reduced drug use, and lower incidence of street crimes by addicts. Unintended and undesirable outcomes might include an increase in the incidence of street crimes by drug abusers if a restriction in supply produced a rise in the price of the drug without a parallel drop in demand. In these circumstances, abusers might be compelled to steal more to purchase the same amount of drugs.

Questions About Implementation, Results, and Outcomes

Conceiving of projects as input-output processes permits one to catalogue possible types of evaluative questions. One type of question addresses aspects of a project's implementation, such as features or problems associated with the way project activities are organized or carried out. These include: How well or effectively do the project's administrators and staff combine the various resources (or inputs) they are given? Why are admissions into a drug treatment program so much lower than anticipated? Were assumptions about the target population erroneous? Were recruitment or assignment procedures ineffectively designed, or staff poorly managed? These are just a few examples of a number of problems in program administration or implementation that evaluative questions can address.

The data the evaluator examines to explain these problems include information about program activities and the way they are organized, in addition to the nature of the inputs (resources, planning assumptions, etc.) and their suitability to the problem the program is designed to address.

Evaluative questions might also focus on the pattern of observed project results. For example, why is the dropout rate from a drug treatment program so high? What accounts for the apparent variation among police precincts in arrest rates in an experimental patrol project? Why are the conviction rates of persons arrested following investigations by multi-jurisdictional task forces so variable (or how do they compare with conviction rates of persons investigated by non-task force officials)? To what extent was the shortened time to criminal case disposition a result of a delay-reduction program? Did the experimental alternative sentencing program actually result in offenders being sentenced to it in lieu of imprisonment? To explain these results, or to account for variations that are observed in them, the evaluator would examine elements that were expected to produce results - both the project's inputs and activities. In addition, the evaluator may want to explore whether other forces or events operating outside the project contribute to the observed results.

Finally, evaluative questions may also focus on whether and to what extent the project has achieved its ultimate goals. Project goals can include such things as reducing the incidence of crime, decreasing the prevalence of drug use and abuse, reducing the supply of illegal drugs, or rehabilitating prisoners. Outcome or impact evaluations are designed to assess whether and to what extent a project has accomplished these goals. This requires analyzing information about those elements of the project that were expected to produce these outcomes, including the inputs and the activities of the project, as well as the more immediate results of these activities. Such evaluations are among the most demanding, because these end products, or outcomes, often pose the most difficult measurement problems. Moreover, the evaluator is faced with the difficult methodological problem of attributing the outcomes to the project itself and ruling out (or at least accounting for) the effects of other forces operating independently of the project.
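The input-output view of a project, and the three families of evaluative questions that follow from it, can be written down as a simple record. The sketch below is only illustrative: the class name, field names, and the grouping method are assumptions introduced here, and the entries are drawn from the examples given in the text above.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectModel:
    """A project viewed as an input-output process (hypothetical structure)."""
    name: str
    inputs: list[str] = field(default_factory=list)
    activities: list[str] = field(default_factory=list)
    results: list[str] = field(default_factory=list)              # immediate products
    intended_outcomes: list[str] = field(default_factory=list)    # longer-term, desired
    unintended_outcomes: list[str] = field(default_factory=list)  # longer-term, unwanted

    def evaluation_focus(self) -> dict[str, list[str]]:
        """Group the project's elements by the type of question asked about them."""
        return {
            "implementation": self.inputs + self.activities,
            "results": self.results,
            "outcomes": self.intended_outcomes + self.unintended_outcomes,
        }


# Entries paraphrase the drug intervention example discussed above.
interdiction = ProjectModel(
    name="Concentrated drug intervention effort",
    inputs=["money", "staff", "equipment", "plans and assumptions"],
    activities=["investigating crimes and suspects"],
    results=["seizures of illegal drugs", "arrests of drug dealers"],
    intended_outcomes=["diminished street supply", "reduced drug use"],
    unintended_outcomes=["more street crime if prices rise and demand does not fall"],
)
focus = interdiction.evaluation_focus()
```

Listing a project's elements in this way, before any data are collected, makes it easier to see which type of question a proposed evaluation is actually addressing.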

Matching Evaluations to Types of Projects

Not all types of evaluations are suited to all projects. The appropriateness of an evaluation approach depends in part upon the project's stage of development and "maturity." Moreover, it depends upon whether the requirements of certain types of evaluations can be met.

In the early stages of project development, the most pressing problems confronting managers generally involve project implementation rather than assessments of outcomes. Indeed, a prerequisite for achieving results, and longer-term or less immediate outcomes, is a project that operates with at least some modicum of stability and effectiveness. Translating a project plan into operational reality requires solving a number of problems that emerge in the early months of a project's existence. Projects often face yet another set of organizational problems after the initial development activity has passed, when they confront a transition to becoming "institutionalized," transformed from a distinct and perhaps innovative "project" into a more routine agency operation. State administrators and program managers may decide in these instances that the most productive use of an evaluator is to assist them in devising effective remedies to specific difficulties in project or program implementation.

Once projects have developed some stability in their activities, evaluating outcomes or results may be feasible. Whether to focus on the less immediate outcomes or the more direct results of a project should be determined by the relative importance of each to the broader state strategy. For certain types of projects, the immediate results have the most significance. For example, a project to develop "model drug control legislation" could reasonably be assessed by examining whether such bills were indeed drafted and passed, ignoring whether the new law actually led to reduced drug trafficking or abuse. Other examples include spending funds to enhance capabilities of forensic laboratories, develop automated fingerprinting identification systems, provide assistance to victims or witnesses, establish financial information sharing systems for investigating money laundering, or provide training to staff in financial investigation techniques. Evaluation questions that are likely to be posed of these programs include how many such services, or how much equipment, were provided.

In other types of programs, administrators and managers may decide also that it is sufficient for their purposes to assess whether and to what extent programs achieve their immediate results, without demonstrating further that these results produce longer-range or broader crime control benefits or improvements in justice system operations. For example, a program designed to enhance the investigation of drug trafficking cases so that they are more likely to lead to convictions could reasonably be assessed according to whether conviction rates actually increased. This result is assumed to be of value in and of itself, without demonstrating that improved conviction rates of drug traffickers have a positive effect on the supply or use of drugs. Consequently, the evaluation would focus on documenting and explaining these results, without attempting to measure more downstream effects.

If evaluating the results or outcomes of a project has high priority, certain conditions must exist to support conclusions. First, as mentioned above, the operation of the program must have achieved some measure of stability and successful implementation. Some evaluations have documented that projects have not produced the effects desired, without first establishing that the project in fact operated as planned. In such a circumstance, it is difficult to determine whether a successfully implemented project failed to achieve its objectives or whether the project was so inadequately implemented that any real test of its underlying concept was infeasible. Or, if program activities change rapidly - as they often do in newly established projects - the project's effects may vary widely, making it difficult to attribute particular results to particular features of the project.

Second, outcomes or results must be susceptible to measurement. For example, evaluating whether an enforcement program succeeds at deterring would-be drug traffickers from entering the illegal market poses difficult problems of measurement.

Third, the critical features of the program that are thought to produce the desired outcomes or results must be able to be distinguished from other possible causes. Evaluating whether or not a drug treatment program for probationers reduces drug abuse and criminal recidivism is difficult if all program participants are also placed under intensive probation supervision and are subjected to random urine testing.

Faced with less-than-optimal conditions, evaluators may develop substitute measures or rely on complicated statistical methodologies to attempt to distinguish among various simultaneous causes. This may be accomplished after the fact - which is to say, after a program has been in operation for some time - but at some cost in the ability to develop supportable conclusions. To increase the odds of the project's being able to support an evaluation of its outcomes, evaluators involved in the program planning stage will be able to propose modifications in organizational operation that will better prepare the project for subsequent evaluation.

Chapter 2: Obtaining Information for Evaluations

Questions not only frame the focus of the evaluation but also direct the evaluator to the kind of information that must be collected. They do not necessarily determine precisely the kind of information required, nor the method by which it is to be obtained, however. Administrators and evaluators face choices that involve questions of suitability, expense, timing, and how much confidence the administrator needs to have in the data and the conclusions drawn from them. Generally, these choices require making trade-offs between time, money, and confidence. The precise nature of these trade-offs in each case depends upon the features of the program being evaluated as well as the questions that are asked about it. In considering how to strike the balance between resources, feasibility, and confidence, administrators should ask evaluators to specify the trade-offs that particular evaluation plans entail.

Use Existing Data or Collect New Information?

Evaluations may be able to employ information that others have collected, usually for other purposes, rather than going to the expense and energy of collecting new information. In determining whether to use existing information rather than collect new data, administrators should ask evaluators to consider, at least, the following matters:

How available are the existing data, and can they be obtained for research?

How valid are they? Do they actually measure what they are intended to measure? This is often more complicated than meets the eye. For example, studies of sentencing practices have been done by collecting and then analyzing data on all persons sitting in a jail or prison system on a single day. Such a data collection strategy biases the sentenced population in favor of those with longer terms, because these persons fill a disproportionately larger number of beds. More valid data would be obtained by analyzing those entering the jail or prison over a specified period of time. (Both of these data collection methods fail to encompass all sentenced persons, obviously, because they ignore those not given incarceration sentences.)

The validity of data can be assessed also by determining whether they refer to a subset of the population or practices of interest, or whether data reflect the entire universe of that which is being examined. If they are a subset, are there any reasons to suspect that they might be unrepresentative of the larger universe? Even if the data are not drawn from subsets or samples, how complete are the data? If information is missing, is there any reason to suspect that information is excluded systematically as the result of some practice? For example, studies that rely on drawing random samples of case records in the files of prosecutors may unintentionally exclude the more difficult or serious ones if these files are kept at the assigned prosecutors' desks. Or is there reason to expect that those charged with collecting or recording the data have an interest in under- or over-reporting them?

How reliable are the available data? (Reliability refers to the dependability of the measure used to obtain the data. Data are said to be reliable if their measurement is consistent, so that upon repeated measurement they remain the same regardless of transient personal, environmental, or instrumental factors.) Were the data collected in interviews that were sometimes carried out on site, while others were not? If so, would this plausibly have a biasing effect on some responses? Were those charged with recording the information likely to be reliable, and motivated or supervised adequately so that their performance was consistent? Was the instrument used to collect the information problematic, such as a poorly-worded questionnaire that may have yielded ambiguous (and unreliable) responses? If the data are collected by means of self-reporting, how dependable are they likely to be, given the means by which they were collected?
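The single-day census problem raised in the validity question above can be made concrete with a small calculation. The sketch below is illustrative only: the admission counts and sentence lengths are invented figures, and the function names are assumptions introduced here. It compares the share of long sentences among persons admitted with the share among persons found in custody on one census date.

```python
from collections import Counter

# Hypothetical admissions each month: (sentence length in months, number admitted).
ADMISSIONS_PER_MONTH = [(1, 90), (12, 10)]
MONTHS_SIMULATED = 36


def admission_cohort_shares():
    """Share of each sentence length among persons entering custody."""
    total = sum(n for _, n in ADMISSIONS_PER_MONTH)
    return {length: n / total for length, n in ADMISSIONS_PER_MONTH}


def single_day_shares(census_month):
    """Share of each sentence length among persons held at the census date."""
    present = Counter()
    for admit_month in range(census_month + 1):
        for length, n in ADMISSIONS_PER_MONTH:
            # A person is still held if the sentence has not yet run out.
            if admit_month + length > census_month:
                present[length] += n
    total = sum(present.values())
    return {length: count / total for length, count in present.items()}


flow = admission_cohort_shares()
stock = single_day_shares(MONTHS_SIMULATED - 1)
print(f"12-month sentences among admissions: {flow[12]:.0%}")
print(f"12-month sentences in one-day census: {stock[12]:.0%}")
```

Under these invented figures, one admission in ten carries a twelve-month sentence, yet such persons make up more than half of those counted on any single day, which is the bias the text describes.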

The principal advantage of using existing data is that they are often inexpensive or even provided free to the evaluator, and they may be abundant, representing an expenditure of hundreds of thousands of dollars. The disadvantage, however, is that there may not be a perfect fit between what the evaluator is trying to measure and the purposes for which the data were collected. Moreover, it is often difficult to assess how reliable the data are, and how reliably they were entered, without conducting random checks of the information against the records used by the persons who originally recorded the information.
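The random checks mentioned above can be organized as a small audit: draw a random sample of records, compare selected fields in the existing data file against the original source records, and report the proportion that agree. The sketch below shows one possible way to do this. The record identifiers, field names, and values are hypothetical, and the acceptable level of agreement is left to the evaluator.

```python
import random

FIELDS = ("charge", "sentence_months")  # hypothetical fields to verify


def audit_sample(record_ids, sample_size, seed=0):
    """Draw a reproducible random sample of record identifiers to check."""
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(record_ids), sample_size))


def agreement_rate(database, source_records, audited_ids, fields):
    """Proportion of audited field values that match the source records."""
    checked = matches = 0
    for rid in audited_ids:
        for name in fields:
            checked += 1
            if database[rid].get(name) == source_records[rid].get(name):
                matches += 1
    return matches / checked if checked else float("nan")


# Invented example: the existing data file and the original paper records.
db = {
    101: {"charge": "possession", "sentence_months": 6},
    102: {"charge": "sale", "sentence_months": 24},
    103: {"charge": "sale", "sentence_months": 12},
}
src = {
    101: {"charge": "possession", "sentence_months": 6},
    102: {"charge": "sale", "sentence_months": 42},  # digits transposed at entry
    103: {"charge": "sale", "sentence_months": 12},
}
audited = audit_sample(db.keys(), 2)
rate = agreement_rate(db, src, audited, FIELDS)
```

A low agreement rate on a modest sample is a signal to examine how the data were entered before relying on them.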

Collecting New Data

Where the needed data do not exist, the evaluator will have no choice but to obtain those data directly. Although it may be more costly than using already-existing data, collecting new data has distinct advantages. Greater control can be achieved over the measures that are used, as well as over the procedures and staff employed to collect the data. The reliability and validity of these data may thereby be increased.

To choose the most appropriate data collection strategy, the evaluator should assess the relative advantages and disadvantages of alternative approaches and clarify the nature of the trade-offs that are faced in the particular evaluation. The administrator's input into the decision should include some guidance regarding the level of resources that may be devoted to collecting data, the constraints on time available to collect them (for example, must administrative or policy decisions be made on a schedule that is too tight for an extended data collection effort?), and the amount of confidence that the administrator needs to have in the results of the evaluation for decision purposes. How much of a difference will it make for the administrator to have information that is more rather than less reliable and valid? Because the answer to this question bears heavily on the degree of precision demanded of the data - and the requisite costs of getting them - it is helpful for the administrator to set at least the upper and lower boundaries for the evaluator.

The means of collecting new data include direct observation, interviews (either in person or by telephone), surveys, drawing information from administrative records, or piggy-backing data collection instruments onto existing recording procedures.

Direct Observation

Obtaining data by on-site observation has the advantage of providing an opportunity to learn in detail how the project works, the context within which it exists, and what its various consequences are. On-site observation is especially useful for exploratory evaluations, where the primary purpose is describing the program, identifying its significant features, exploring the possible consequences of its activities, and developing hypotheses about how and why it works. It is an especially useful strategy for identifying "side effects," or consequences that are not intended.

A major disadvantage of direct observation is that data collection may lack focus, unless discipline is exercised. The reliability of data collected by observation may also be questionable, depending upon the competence and perceptiveness of the observer, and his or her relationship to those being evaluated. Generally, this data collection strategy is better suited to exploratory and descriptive evaluations, or evaluations of project implementation, than to tests of hypotheses about a project and its consequences.

If precision of measurement and high reliability are desired, evaluators may design forms for recording their observations. Such forms may specify uniform procedures and categories for such things as counting events, the time that it takes for them to happen, the numbers of persons processed, the numbers and types of persons who interact during the course of a specific transaction, or other categories.
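A recording form of the kind described above can be expressed as a fixed record layout, so that every observer codes events into the same categories. The sketch below is only one possible arrangement. The event categories, field names, and summary function are hypothetical and would be replaced by whatever the evaluation design calls for.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical, fixed list of event categories printed on the form.
EVENT_TYPES = ("intake", "urine_test", "supervision_contact")


@dataclass(frozen=True)
class Observation:
    """One line of the observer's recording form."""
    event_type: str
    minutes_elapsed: float
    persons_present: int

    def __post_init__(self):
        # Reject entries that fall outside the form's uniform categories.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type!r}")
        if self.minutes_elapsed < 0 or self.persons_present < 0:
            raise ValueError("durations and counts must be non-negative")


def summarize(observations):
    """Count events and average their duration, by event type."""
    counts = Counter(o.event_type for o in observations)
    minutes = Counter()
    for o in observations:
        minutes[o.event_type] += o.minutes_elapsed
    return {
        e: {"count": counts[e], "mean_minutes": minutes[e] / counts[e]}
        for e in counts
    }


observations = [
    Observation("intake", 12.0, 2),
    Observation("intake", 18.0, 3),
    Observation("urine_test", 5.0, 2),
]
summary = summarize(observations)
```

Refusing entries outside the predefined categories is a deliberate choice here: it trades some flexibility for consistency among observers.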


Interviews

Interviews are often coupled with direct observation during site visits, but may be conducted separately, either in person or by telephone. For exploratory purposes, such interviews may be relatively unstructured and open-ended. Where the evaluation calls for more focused data collection, a questionnaire can be developed. This increases the reliability of the data because respondents are answering uniformly-phrased questions. The virtues of structured interviews over other types of data collection methods include more control over obtaining the data (unlike mail surveys, it is more difficult for the person interviewed not to respond at all) and an ability to clarify answers by probing matters that emerge in the course of the interview. The chief disadvantages are cost and time, which can be substantial if the program is a large one. In many circumstances, interviewing may not be feasible at all.

Collecting Data from Administrative Records

Programs operating within the criminal justice system often keep a variety of different records for administrative purposes. These include case files on the defendants/offenders, management information systems organized to coordinate processing of cases or individuals, institutional performance records of persons detained or sentenced or put under some other form of surveillance, records of prior arrests or convictions, as well as financial accounting information about agencies. Evaluators will almost always draw at least some of the required data from these records. The advantages of such information are obvious: it is often plentiful, the cost of recording it has been borne by others, and some of it may even be in computer-readable format. Whether or not the evaluation can rely entirely on it depends upon whether the data describe exactly what the evaluator is examining, how reliable the data are, and whether the recording of them has been biased. Many of the same issues arise for the use of administrative records as for using other existing data, described above.

Adapting Administrative Record-Keeping Procedures for Evaluations

One of the most effective and least costly methods of collecting data is to change the project's existing procedures for recording information so that the needed information is captured. For example, existing intake forms might be augmented to include additional questions asked of all project participants; or records of services rendered to those participants might be supplemented with requests for additional information needed by the evaluator. The advantages of this method are substantial: the evaluator may define precisely what it is that is to be collected, rather than having to rely on already-existing indicators that may not provide direct measures. The evaluator will also not be required to hire and supervise a staff of data collectors. The costs of data collection will be carried by the project, however, and recording these additional data may be time consuming. Because data collected in this way often permit powerful analyses, evaluators and administrators are encouraged to plan ahead for evaluations, to design data collection instruments that can be used by project staff for a period of time before the analytic work begins, and to monitor how staff comply with the procedures for recording the data. Because project staff may resist doing the additional work, agreements have to be negotiated among staff, project management, and those involved in the evaluation.


Surveys

When administrative records do not contain the data required for the evaluation, the data may be collected by pencil-and-paper surveys. Like questionnaires administered during interviews, surveys permit control over the phrasing of questions. In circumstances where anonymity of respondents matters, such surveys may reduce the likelihood of biased reporting and thereby raise the validity of the information. Mail surveys are also relatively easy to administer; responses can be coded quickly and readily put into a format for analysis (including computer-readable formats), often at a low cost. They may not be suited to data collection on small projects, however, where the key actors can be surveyed in person by interview. Moreover, a principal liability of using surveys distributed by mail or other means is that respondents often have few incentives to complete and return them. Return rates are often very low.

Confidentiality of Data

Evaluators may have to collect information about people that is sensitive in nature. For example, the evaluation may seek to learn about past or current criminal activity. Measures of this can be obtained by asking persons to reveal this information about themselves, on the condition that this information shall not be divulged to others or described in ways that reveal the identity of the respondent. If such confidentiality is to be assured, evaluators, administrators, and relevant parties should establish a formal agreement to guarantee that information obtained for evaluative purposes will not be revealed to project administrators, staff, or to other agencies.

Protection of Human Subjects

In addition, research projects that involve putting subjects at risk, or that offer a particular benefit to some (e.g., a treatment) while denying it to others, must conform to guidelines established by the U.S. Department of Health and Human Services, as well as any other guidelines required by other relevant agencies. (For example, drug treatment programs in hospital settings will have to conform to human protection guidelines established by the hospitals.) Questions regarding this matter should be addressed to the Bureau of Justice Assistance.

Chapter 3: Methods of Analyzing Data

The repertoire of analytic techniques that have been used in evaluation studies is large. It includes, among others, case studies, simple descriptive statistics, before/after comparisons, cohort studies, time-trend comparisons or other longitudinal methods (such as "panel analysis"), cross-sectional comparisons, cost-benefit analysis, controlled experiments and quasi-experimental statistical analyses, and methods drawn from the fields of operations research and systems analysis. What follows is a very brief (and incomplete) catalogue of some of these analytic approaches. Cursory descriptions of their key attributes, relative advantages and disadvantages are provided, along with some suggestions about what the evaluator and administrator might consider when choosing among them.

In general, these methods vary (1) in the emphasis they place on descriptive as opposed to explanatory analysis, (2) the extent to which they support strong conclusions about the project's operations and/or impacts, and (3) their reliance on qualitative compared to quantitative data.

Case Studies

A case study is an inquiry that investigates a contemporary phenomenon within its real-life context, when the boundaries between phenomenon and context are not clearly evident, and in which multiple sources of evidence are used (Yin, 1981). Case studies rely heavily on description, but when undertaken for evaluative purposes, they may involve interpretation and analysis. Case studies typically make extensive use of qualitative data drawn from interviews and observation, although they may include quantitative data as well. They have traditionally been used in the sciences as an exploratory research strategy, undertaken for the purposes of learning how a particular phenomenon (such as a project or program) operates, as well as for developing hypotheses about it. Because the evaluator becomes steeped in richly detailed information in the course of undertaking a case study, a comprehensive understanding of a project and its complexities can be developed. Another advantage of case studies is that they can be done quite quickly. Information can be collected on site through observation, interviews, examination of administrative records, or any other sources of information. A report may be written to organize information around key questions that are framed by the evaluator and the "client," either the state administrator or the program manager, or both.

The principal shortcoming of case studies as an evaluation method is that they demand competence and experience on the part of the evaluator. The analyst must evaluate information and those who provide it, distinguish significant features from insignificant ones, analyze data (impressionistic as well as more structured) quickly to discern patterns, devise ingenious ways of testing hypotheses against data, and find new data for these tests. The quality and usefulness of a case study therefore rely very heavily upon the good judgment and experience of the evaluator. The best case studies are sometimes done by those who are capable of working with more complex research designs.

Another problematic feature of the case study method is its limited ability to yield strong conclusions about whether, and to what extent, a project produces the effects that planners and managers intend for it to have. In some instances, success or failure is readily apparent. Case study methods may be sufficient to develop explanations of either these successes or failures. In instances where success or failure is more difficult to discern, however, and where the possible explanations are many, more controlled measurement and statistical analysis may be required to estimate the nature and size of a relationship between various observed effects and their causes. Some have argued, however, that the case study method can be formalized so that it can be an effective tool for explanatory analysis. Evaluators choosing to undertake case studies are advised to consider various strategies for maximizing the reliability and validity of data collected and the procedures they use to interpret them (Yin, 1984, 1988).

Quantitative Descriptions

Evaluations may require documenting or measuring, in quantitative terms, the activities and the results of a project. While this can be done in a more quantitative case study, the analyst may choose to use relatively little qualitative data and focus instead on building a description from counts of activities, clients, staff hours, or other measures of what goes on in a project.

If a more complex description is required, evaluators may choose to develop models that describe the key components of the project, how they are related to one another, and measures that characterize these relations in quantitative terms. These may include models such as those used in operations research (Tien, 1983). Their advantage is that they permit very precise specifications of how various elements of a project affect the operations of other elements in the project. If designed well, with accurate data, these models can be used to simulate alternative ways of organizing the project and to estimate the impact that these alternatives may have.

Before/After Studies

When the evaluator is asked to assess the impact of a project, or to assess its effectiveness in accomplishing its goals, methods for testing cause-and-effect hypotheses are called for. One such method compares the target population or conditions before and after the project begins its operations. This "bargain basement" approach seeks to establish that participation in, or implementation of, the project is at least associated with the desired change. This design, termed a before/after comparison, requires obtaining data about the conditions that prevailed before the project intervention was initiated. (The analyst may find it possible to rely on data that another agency collected before the project's intervention occurred.) If the desired changes are shown to occur after the intervention, support is given to the assertion that the project caused the change to happen.

Confidence in the findings of such before/after comparisons depends, however, upon whether factors other than the project's interventions changed as well. Was the observed change really due to another force that operated independently of the project? Many of the conditions targeted by criminal justice projects are influenced by demographic, social, legal, and economic forces that operate independently of a project's intervention. Any increase or decrease in the observed outcomes may be affected by these outside factors and may therefore be unrelated to the project. To rule out these other possible explanations, the analyst must devise strategies for testing them. One such method is to collect data on these other possible causes and to impose statistical controls to isolate their effects from the project's operations. (See "quasi-experimental techniques," below.)

Yet another drawback of before/after studies is that estimates of the project's effect might be obscured by taking too few measurements. If the phenomena being measured are subject to random fluctuations, comparing only two snapshots may not provide true pictures of the conditions before, during and after program intervention.
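The snapshot problem can be illustrated with a small sketch. The monthly counts below are invented for the example and describe no actual project; the function name is likewise an assumption. The sketch compares a two-snapshot estimate of change with one based on several measurements on each side of the intervention.

```python
from statistics import mean

# Made-up monthly counts of some targeted condition (for example, arrests).
before = [52, 47, 55, 49, 50, 47]  # six months before the project began
after = [50, 45, 53, 47, 48, 45]   # six months after the project began


def change(before_values, after_values):
    """Mean 'after' level minus mean 'before' level."""
    return mean(after_values) - mean(before_values)


# Two snapshots: the last month before versus the first month after.
snapshot_change = change(before[-1:], after[:1])

# Several measurements on each side of the intervention.
averaged_change = change(before, after)

print(snapshot_change)  # 3: the two snapshots suggest an increase
print(averaged_change)  # -2: the six-month averages suggest a decrease
```

With these invented figures the two approaches point in opposite directions, which is the risk described above. Neither figure, by itself, shows that the project caused the change.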

Time-Series Analyses

One method of compensating for fluctuating rates in the targeted conditions is to take several measures before and after the implementation of the project (or, if subjects are being studied, before and after they are exposed to the project). This research design, known as "time series analysis," first observes trends in the conditions existing before the project's intervention and then analyzes the trend data statistically. This trend can then be extrapolated into the future, to the point after which the project was implemented. By comparing what was projected to occur as a result of the pre-existing trends with what actually occurred, the analyst obtains some indication of what the project's impact may be. If outcome measures are fairly stable, minor random fluctuations or outside influences such as shifting demographics in the target populations may be accounted for in the trend projections. Time-series designs do not rule out all other possible explanations of the observed changes, however.
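A minimal sketch of the projection step follows, assuming that the pre-existing trend is adequately described by a straight line fitted by ordinary least squares. That is a simplification: a formal time-series analysis would use more elaborate models. The figures and function names are invented for illustration.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def projected_vs_actual(pre, post):
    """Extrapolate the pre-project trend and compare it with what was observed."""
    slope, intercept = fit_linear_trend(pre)
    start = len(pre)
    projected = [intercept + slope * (start + i) for i in range(len(post))]
    gaps = [actual - proj for actual, proj in zip(post, projected)]
    return projected, gaps


# Made-up counts: six periods before the project, three periods after.
pre = [100, 102, 104, 106, 108, 110]
post = [108, 109, 110]
projected, gaps = projected_vs_actual(pre, post)
# projected is 112, 114, 116; the gaps are -4, -5, -6
```

In this invented case the observed counts fall below the projected trend in every post-project period. As the text notes, such a gap indicates a possible project impact but does not rule out other explanations.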

Experimental Designs

The investigative technique that provides the analyst maximum control, so that the relationship between a particular element of a project and the desired outcome can be isolated from other causal forces and measured accurately, is the laboratory experiment. If all factors are held constant (or "controlled"), and an effect is observed after one factor changes, one is in the strongest position to say that the manipulated factor caused the observed effect. Branches of science that have been able to impose laboratory conditions upon the matters they investigate have been able to develop powerful explanations of complex phenomena.

Outside the laboratory, one can approximate laboratory conditions in field experiments (Campbell and Stanley, 1963). If one were in the laboratory studying the effects of a treatment regime, one might be able to control not only environment but also individual differences among subjects. (Laboratory mice, for example, are bred from a common genetic stock expressly for the purpose of experimentation.) In the field, however, such control over subjects and their environments cannot be gained. The strategy for approximating this control is to assign at random equally eligible subjects (cases/arrestees/addicts, or whatever) to two groups. The subjects in one group (the "experimental group") are then exposed to or given the "treatment" -- be it a drug treatment project, enhanced prosecution, or whatever else the project is designed to do -- while the other group (the "control group") is not. Random assignment provides optimal assurance that any differences in the outcomes observed in the two groups can be attributed to the experimental treatment, and not to pre-existing differences or to chance.
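The assignment step itself is simple and can be sketched as follows. The subject identifiers, the seed value, and the function name are hypothetical. The equal split is also an assumption, since actual experiments may use other allocation ratios or assign subjects as they arrive.

```python
import random


def assign_at_random(subject_ids, seed=None):
    """Randomly split eligible subjects into experimental and control groups."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Twenty hypothetical, equally eligible cases.
eligible = [f"case-{i:03d}" for i in range(1, 21)]
experimental, control = assign_at_random(eligible, seed=1989)
```

Recording the seed, or the resulting assignment list, lets others verify later that group membership was determined by chance rather than by staff discretion.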

While it may seem difficult to undertake experiments in criminal justice settings, several studies with experimental designs have been carried out with much success, yielding powerful findings (Lempert and Visher, 1987). Unfortunately, such studies are complicated, often vulnerable to a number of threats that may spoil the ability to draw strong conclusions, and generally very costly and time-consuming.

Quasi-Experimental Studies

Where random assignment of participants to treatment or control groups is not feasible for practical, ethical, or legal reasons, the evaluator may choose quasi-experimental evaluation designs to approximate the advantages of random assignment. One such design is to identify a comparison group that is similar to the treatment group in those characteristics thought to be capable of influencing the outcome under examination (Campbell and Stanley, 1963). The strength of this design rests on the extent to which all the influential characteristics are accounted for in selecting the comparison group. The analyst can then account statistically for differences between groups that might influence the observed outcomes. The only requirements are that no differentiating characteristic belongs uniquely to one group, and that such competing factors be measured in both groups.

Because the use of a non-random comparison group does not eliminate all alternative explanations for the relationship between treatment and outcome, this type of design requires much more complicated analysis and yields less certain results than true experiments. Nonetheless, quasi-experimental designs can produce findings that are much stronger than other types of evaluation methods that impose fewer controls (e.g., case studies, before/after comparisons, descriptive models).
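One simple way of accounting statistically for a measured difference between groups is to compare outcomes within strata of that characteristic. The sketch below is only an illustration of the idea: the groups, the single characteristic (prior record), and the rearrest figures are all invented, and a real analysis would usually control for several characteristics at once, often with regression methods.

```python
from collections import defaultdict


def stratified_difference(records):
    """Average within-stratum difference in outcome rates (treatment minus
    comparison), weighted by the number of treatment subjects per stratum.

    records: iterable of (group, stratum, outcome), where group is
    "treatment" or "comparison" and outcome is 0 or 1.
    """
    cells = defaultdict(lambda: {"treatment": [], "comparison": []})
    for group, stratum, outcome in records:
        cells[stratum][group].append(outcome)
    weighted_sum, total_weight = 0.0, 0
    for stratum, groups in cells.items():
        t, c = groups["treatment"], groups["comparison"]
        if not t or not c:
            # The characteristic must be represented in both groups.
            raise ValueError(f"stratum {stratum!r} is missing from one group")
        weighted_sum += len(t) * (sum(t) / len(t) - sum(c) / len(c))
        total_weight += len(t)
    return weighted_sum / total_weight


def make(group, stratum, n, n_events):
    """Build n made-up records, n_events of which have outcome 1 (rearrested)."""
    return [(group, stratum, 1)] * n_events + [(group, stratum, 0)] * (n - n_events)


# Invented data: the treatment group has fewer subjects with prior records.
records = (
    make("treatment", "no prior", 8, 2) + make("treatment", "prior", 2, 1)
    + make("comparison", "no prior", 4, 1) + make("comparison", "prior", 6, 3)
)


def rate(group):
    outcomes = [o for g, _, o in records if g == group]
    return sum(outcomes) / len(outcomes)


raw_difference = rate("treatment") - rate("comparison")  # about -0.10
adjusted_difference = stratified_difference(records)     # 0.0
```

In these invented figures the treatment group appears to do better overall, but the advantage disappears once subjects with and without prior records are compared separately. The apparent effect reflected the composition of the groups, not the treatment.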

Some projects are not well suited to either experimental or quasi-experimental evaluation designs because their operations fluctuate too much due to their newness. Both evaluation designs require that treatment be constant and uniform throughout the time that the data are collected. If programs have not reached a state of relative stability in operations, the expense and time required of an experiment are likely to be wasted. In such instances, a focus on program implementation is likely to be more fruitful.

Chapter 4: Conducting Evaluations In-house or Under Contract

Developing an evaluation capability in the state office may or may not be productive, depending in part on the level of resources that are devoted to evaluations and the kind of expertise that is required. If a central evaluation or research capability already exists in state government, adding staff to evaluate the projects supported by the Drug Control and System Improvement Grant Program may be accomplished relatively easily. If no such central evaluation capability currently exists, creating one has numerous fiscal implications that the state administrator will recognize. (Will, for example, the amount of work to be demanded of evaluations be sufficient to build an entirely new staff with supporting equipment?)

A decision to hire evaluators or to contract for their services should be governed by a desire to maximize several values: (1) the technical skills of the evaluators; (2) the evaluator's familiarity with the details of the criminal justice system, including sensitivity to the political/bureaucratic tensions that prevail; (3) the disinterestedness of the evaluator; and (4) the utility of the evaluation for the decision makers.

Whether the state can attract staff with the experience and training needed for evaluations, especially evaluations of a project's impact, depends in large part on local market conditions. In many parts of the country, it may be difficult to hire persons with sufficient evaluation experience or training. In these instances, contracting with professionals at a university or research organization may be desirable.

In many instances, hiring evaluators into the state office may be the most efficient way to develop expertise that is tuned to the idiosyncrasies of the local justice system agencies. In-house evaluators are also more likely, because of proximity and on-going working relationships, to develop close communication with the consumers of evaluation information, thereby producing evaluations that are suited well to the decision maker's needs. These close ties may, however, undercut the evaluator's ability to maintain a disinterested stance vis-a-vis the projects being assessed. How well this tension is balanced depends in large part on the state administrator's willingness to receive objective reports about projects' performance, including reports that find projects failing to accomplish their mission.

Chapter 5: Using Evaluation Findings for Decision Making

Consider two truisms: "Evaluation activities should meet the information needs of decision makers who fund them," and "The purpose of evaluations is to provide feedback to decision makers about program operations and their effectiveness so that their decisions can be as fully informed as possible." Most experienced administrators and evaluators, however, know that this often does not happen. Evaluations may be undertaken because they are required, and the reports are subsequently shelved with little comment or effect. This may occur for several reasons, including:

failing to address directly the policy makers' or program administrators' principal questions;

not communicating the results and nature of the study in a way that can be readily understood by a client;

framing the presentation of the study and its findings without a clear understanding of who the primary and secondary audiences are;

not meshing the conclusion of the study with the schedule upon which policy or programmatic decisions are made;

developing findings that are perceived as challenging by the stakeholder and are therefore resisted.

Several strategies can be employed to overcome these obstacles.

Increasing Relevance for Decision Makers

Evaluations may not fit well with decision makers' interests, and will therefore be seen as irrelevant, for at least two reasons. First, the questions that the evaluator chooses to focus the study upon may not correspond closely enough with the decision makers' principal concerns. Second, the evaluation's findings may fail to suggest clear and explicit recommendations for action.

The first problem can be overcome by communication throughout the study between the decision maker/client and the evaluator. In the design phase of the study, the concerns expressed by decision makers can be translated with the evaluator's assistance into questions that are capable of being addressed by the evaluation. The decision maker may also become more aware of the implications of various choices of study design for the conclusions that may ultimately be drawn from the evaluation. Many aspects of the evaluation design reflect choices and trade-offs where there is no single correct answer. Through discussions with the evaluator, decision makers will understand better these choices and be able to assess the advantages and disadvantages of various options, as well as the consequences of choices for the study's results.

The frequent absence of clear recommendations for action in the evaluation springs from deeper tensions (Moore, 1983). Scientific studies -- as well as evaluations that use scientific methods -- aim to develop accurate descriptions of the observable world and powerful explanations of why phenomena occur. Translating these descriptions and explanations into prescriptions for action is not a scientific activity. Nor do most descriptions and explanations directly imply specific policy decisions. At best, they can estimate with varying degrees of certainty the likely consequences of alternative courses of action. Weighing the desirability of such options, however, involves making value judgments and calculating the optimal trade-offs between costs and benefits, advantages and disadvantages. Most evaluations, even extremely rigorous and thorough ones, do not test the validity of all elements considered in these optimizing calculations. Nor are values derived from evidence or inferences about evidence. Values come from outside the realm of science.

Whether the evaluation is to include recommendations for action (beyond undertaking further study) should be discussed among evaluators and decision makers. Decision makers may feel that they are better able to make the translation from findings to action themselves (Lipton, 1984, 155). If the evaluator is to stop at presenting findings, without making recommendations, the findings should be presented in ways that facilitate the decision makers' developing conclusions on their own. If, on the other hand, evaluators present recommendations for action, they should make explicit the values that they bring to their work and upon which they base their recommendations. If the evaluator's values are separated clearly from the descriptions, explanations, and other findings of the evaluation, the decision maker will be able to see how the recommendations were developed. Decision makers will also be able to bring different values or concerns to the findings and develop their own recommendations, which may differ from those of the evaluator. Maintaining a distinct boundary between the more strictly structured enterprise of evaluating criminal justice projects and the task of making recommendations about them will protect the integrity of both tasks, thereby maximizing the chances that decision makers will be able to use evaluation findings for different types of political and administrative decisions.

Explicating the Advantages and Disadvantages of Different Options

The practical concerns of policy makers involve choosing between alternatives. Learning that a project did or did not produce the desired effect is helpful, but clarifying both the alternatives and their relative costs and benefits, or advantages and disadvantages, is generally more useful.

Communicating the Findings in Understandable Terms

Policy makers are generally most interested in study results and recommendations. Discussions of study design, limitations of data, and methodological problems encountered are of less importance. To the extent that these matters affect the ability to draw conclusions from the data, they should not be ignored, but the evaluator should avoid long or technical discussions in the main text of the report. The readers should be alerted that the findings are qualified, but referred to an appendix for a fuller and, perhaps, more technical discussion of them (Majchrzak, 1984, 93; Lipton & Appel, 1984, 158-159). The tendency to write for other researchers alone should be discouraged.

Writing for the Intended Audience

Evaluations are often written for many different audiences, which may weaken the relevance of the study's findings for the primary audience. State administrators often have different interests in a project than other high-level policy makers, or project managers. For example, an evaluation that says that a project has not had its intended effects will be of little use to project managers, but of more interest to policy makers. Because it is difficult to be all things to all people in an evaluation, the evaluator should establish who is the primary audience and write to that audience's interests. This will direct the type of questions addressed, the way findings are discussed, as well as the depth with which they are examined (Weiss, 1972, 119).

Meshing the Evaluation's Schedule With Time Frames for Decision Making

The demands for policy or programmatic decisions often impose nearly impossible schedules upon evaluations, especially upon the more demanding evaluations that aim to estimate a project's effects. The frequent result is that the evaluation's findings come too late, after the point where the effective decision was made. Several strategies exist for ameliorating this mismatch of schedules. One is for the evaluator to report frequently to the principal client throughout the course of the evaluation. Unfortunately, what the decision maker often wants most are the final results of the evaluation. This poses a dilemma. The evaluator may resist the pressure to present tentative findings, but the opportunity to have input into the decision process may pass irretrievably. The alternative is to present the tentative findings, which may be little more than crude or rough estimates, knowing that these will be refined and perhaps even revised afterwards (Lipton, 1984, 158).

Preparing Clients For Challenging Findings

To avoid the appearance of a "surprise attack" by the evaluator on key stakeholders, it may be useful to involve these stakeholders in the study at various stages. Evaluators might even conduct mock analyses with them before the data are collected to prepare them for possible findings. At this early stage, standards of project performance can be established in coordination with project staff so that they know how their work is going to be measured. Also, surprises can be avoided by reporting findings on an interim basis. Quarterly reports or ongoing discussions of preliminary results may be used to keep key administrators apprised of the status of the evaluation. In discussing findings, alternative interpretations can be explored, which may contribute to a more balanced final report. Similarly, early trade-offs that are made regarding design, measurement and sampling should be discussed when findings are reported so that the audience can make appropriate inferences from the results or, at least, can evaluate the inferences that the evaluator makes.

The outcome of such a strategy, however, can be to shift the project's operations during the course of the evaluation. Such midstream changes may affect the measurement of impact and the ability to identify cause and effect relationships. This is less a problem with evaluations of project implementation, but if the evaluation design requires constancy in program operations (as many impact evaluations do), the analysis may become confounded. Evaluators who work closely with program administrators during the course of the evaluation should anticipate analytic problems and devise a method for dealing with them. Positive changes which improve the program are a successful outcome of the evaluation process even if they complicate the analysis.


Conclusion

Evaluation can be a valuable resource for informing project administrators' and policy makers' decisions. Half of the task is to design the evaluation and carry it out well. This is largely the evaluator's responsibility. Of equal importance, however, is the linkage of evaluation activities to the ongoing process of project administration and development, and of policy making. Excellent evaluations that are not used by decision makers represent wasted resources and lost opportunities. Grant administrators at the state and federal levels should recognize that the utilization of evaluation findings by practitioners doesn't just happen. Someone must make it happen.

Appendix: Sources of Additional Information and Assistance on Evaluation Methods

Stuart Adams, Evaluative Research in Corrections: A Practical Guide (Washington, D.C.: U.S. Department of Justice, March 1975).

Peter M. Bentler, Dan J. Lettieri, and Gregory A. Austin (eds.), Data Analysis Strategies and Designs for Substance Abuse Research (Washington, D.C.: Government Printing Office, 1976).

Donald T. Campbell and Julian C. Stanley, Experimental and Quasi-Experimental Designs For Research (Boston, MA: Houghton Mifflin Company, 1963).

Thomas D. Cook and Donald T. Campbell, Quasi-Experimentation (Boston, MA: Houghton Mifflin Company, 1979).

William Davidson, et al., Evaluation Strategies in Criminal Justice (New York, NY: Pergamon Press, 1981).

Judith Fiedler, Field Research: A Manual for Logistics and Management Of Scientific Studies in Natural Settings (San Francisco, CA: Jossey-Bass, Inc., 1978).

Daniel Glaser, Evaluation Research and Decision Guidance: For Correctional Addiction-Treatment, Mental Health and Other People-Changing Agencies (New Brunswick, NJ: Transaction, 1988).

Harry Hatry, et al., Program Analysis for State and Local Governments (Washington, D.C.: The Urban Institute, 1976).

Richard O. Lempert and Christy A. Visher (eds.), Randomized Field Experiments in Criminal Justice Agencies: Workshop Proceedings (Washington, D.C.: National Research Council, 1987).

Douglas S. Lipton and Phillip Appel, "The State Perspective," in Frank M. Tims and Jacqueline P. Ludford (eds.), Research Analysis and Utilization System, NIDA Research Monograph Series, No. 51 (Rockville, MD: National Institute on Drug Abuse, 1984).

Ann Majchrzak, Methods for Policy Research (Newbury Park, CA: Sage Publications, Inc., 1984).

Michael D. Maltz, Evaluation of Crime Control Programs (Washington, D.C.: U.S. Department of Justice, April 1972).

Mark H. Moore, "Social Science and Policy Analysis," in Daniel Callahan and Bruce Jennings (eds.), Ethics, The Social Sciences, and Policy Analysis (Plenum Publishing Corporation, 1983).

Leonard Oberlander (ed.), Quantitative Tools for Criminal Justice Planning (Washington, D.C.: U.S. Department of Justice, 1975).

Michael Quinn Patton, Qualitative Evaluation Methods (Beverly Hills, CA: Sage Publications, 1980).

______________, Performance Measurement and the Criminal Justice System (Washington, D.C.: National Institute of Law Enforcement and Criminal Justice, October 1976).

______________, Utilization-Focused Evaluation, second edition (Beverly Hills, CA: Sage Publications, 1985).

Peter H. Rossi and Howard E. Freeman, Evaluation: A Systematic Approach (Beverly Hills, CA: Sage Publications, 1985).

Lawrence G. Siegel and Martin J. Molof, Ph.D., A Handbook for Planning and Performing Criminal Justice Evaluation (McLean, VA: The MITRE Corporation, 1979).

Edward A. Suchman, Evaluative Research (New York, NY: Russell Sage Foundation, 1967).

James M. Tien, On Developing Evaluation Designs: A Summary Report (Washington, D.C.: National Institute of Justice, 1983).

Donald R. Weidman, et al., Intensive Evaluation for Criminal Justice Planning Agencies (Washington, D.C.: U.S. Department of Justice, July 1975).

Carol Weiss, Evaluation Research: Methods in Assessing Program Effectiveness (Englewood Cliffs, NJ: Prentice Hall, 1972).

Robert K. Yin, "The Case Study as a Serious Research Strategy," Knowledge: Creation, Diffusion, Utilization 3 (September 1981).

______________, Case Study Research (Beverly Hills, CA: Sage Publications, 1984).

______________, Designing and Doing Case Studies (Beverly Hills, CA: Sage Publications, 1988).