evaluation research feb. 2, 2004 up504 prof. campbell piet mondrian counterfactual control...

Evaluation Research

Feb. 2, 2004

UP504

Prof. Campbell

Piet Mondrian

counterfactual

Control group

Experimental group

“but for …”

Random assignment

rival explanation

Program theory

Unintended consequences (side-effects)

Placebo effect

Regression

Housing Project   Vacancy Rate
1                 33%
2                 33%
3                 18%
4                 23%
5                 28%
6                 19%
7                 32%
8                 20%
9                 15%
10                32%
11                19%
12                19%
13                25%
14                29%
15                17%
16                25%
17                5%
18                28%
19                19%
20                26%
21                15%
22                8%
23                11%
24                30%
25                14%
26                27%
27                21%
28                13%
29                21%
30                31%

Passersby walk past the Bromley-Heath Housing Complex in the Jamaica Plain district of Boston, Monday, Nov. 2, 1998. What began as a 1960s experiment in self-government ended this past weekend when management of Bromley-Heath Housing Complex, Boston's only tenant-run public housing, was seized by the city after a number of drug-related arrests. (AP Photo/Steven Senne)

Example: A city has 30 low-income housing projects.

A large number of vacant units in these projects creates a wide variety of problems (reduced revenues, vandalism, lower morale of existing tenants, etc.)

There is a wide range of vacancy rates, from less than 10 percent to over 30 percent.

The city officials believe that drug trafficking in the housing projects is discouraging people from either moving into or staying in the projects.

http://accuweather.ap.org/cgi-bin/aplaunch.pl

To prove the key role of drug dealing in shaping housing project vacancy rates, the city releases data showing that projects with anti-drug programs (run by the police department) have a lower vacancy rate:

Vacancy rate (projects with anti-drug programs): 19 percent

Vacancy rate (projects without anti-drug programs): 25 percent

Housing Project   Vacancy Rate   Police Anti-Drug Program?
1                 33%            0
2                 33%            0
3                 18%            1
4                 23%            0
5                 28%            0
6                 19%            1
7                 32%            1
8                 20%            1
9                 15%            0
10                32%            1
11                19%            1
12                19%            0
13                25%            0
14                29%            0
15                17%            1
16                25%            0
17                5%             1
18                28%            0
19                19%            1
20                26%            0
21                15%            1
22                8%             0
23                11%            1
24                30%            1
25                14%            1
26                27%            0
27                21%            0
28                13%            1
29                21%            1
30                31%            0

t-Test: Two-Sample Assuming Equal Variances

                               vacancy rate          vacancy rate
                               (NO drug program)     (drug program)
Mean                           0.248                 0.189
Variance                       0.005                 0.006
Observations                   15.000                15.000
Pooled Variance                0.005
Hypothesized Mean Difference   0.000
df                             28.000
t Stat                         2.215
P(T<=t) one-tail               0.018
t Critical one-tail            1.701
P(T<=t) two-tail               0.035
t Critical two-tail            2.048

And just to be sure, they ran a difference of means test to demonstrate that the results were statistically significant at the 0.05 level.
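The difference of means test can be sketched in a few lines of Python. This is a minimal illustration, assuming the vacancy rates and program indicators are copied correctly from the slide's table; the function name `pooled_t` is ours, not part of the original analysis. Because the table reports rounded percentages, the statistic comes out close to, but not exactly equal to, the spreadsheet's t Stat.

```python
from math import sqrt
from statistics import mean, variance

# Vacancy rates (percent) for projects 1-30, as listed in the slide's table.
vacancy = [33, 33, 18, 23, 28, 19, 32, 20, 15, 32, 19, 19, 25, 29, 17,
           25, 5, 28, 19, 26, 15, 8, 11, 30, 14, 27, 21, 13, 21, 31]
# Police anti-drug program indicator (1 = yes, 0 = no), same project order.
program = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1,
           0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0]


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, na + nb - 2


# Split the projects into the two groups and convert percent to proportions.
no_prog = [v / 100 for v, p in zip(vacancy, program) if p == 0]
with_prog = [v / 100 for v, p in zip(vacancy, program) if p == 1]

t_stat, df = pooled_t(no_prog, with_prog)
print(f"mean (no program) = {mean(no_prog):.3f}")
print(f"mean (program)    = {mean(with_prog):.3f}")
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The t statistic is then compared with the two-tail critical value of 2.048 for 28 degrees of freedom reported in the output above.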

To further demonstrate the significant role that this police anti-drug program plays, the city also collects data on

housing expenses

family structure

(since these two variables also affect vacancy rates).

Housing   Vacancy   Percent of Gross Hhd      percent 2-parent   Police Anti-Drug
Project   Rate      Income Spent on rent      families           Program?
1         33%       35%                       12%                0
2         33%       52%                       14%                0
3         18%       36%                       15%                1
4         23%       51%                       41%                0
5         28%       40%                       18%                0
6         19%       50%                       21%                1
7         32%       37%                       22%                1
8         20%       19%                       24%                1
9         15%       38%                       25%                0
10        32%       21%                       26%                1
11        19%       20%                       27%                1
12        19%       25%                       27%                0
13        25%       38%                       33%                0
14        29%       35%                       29%                0
15        17%       44%                       46%                1
16        25%       47%                       46%                0
17        5%        19%                       57%                1
18        28%       50%                       46%                0
19        19%       49%                       30%                1
20        26%       24%                       30%                0
21        15%       39%                       33%                1
22        8%        50%                       41%                0
23        11%       41%                       36%                1
24        30%       22%                       39%                1
25        14%       38%                       39%                1
26        27%       37%                       40%                0
27        21%       19%                       37%                0
28        13%       19%                       44%                1
29        21%       28%                       22%                1
30        31%       20%                       30%                0

An unidentified Long Beach Police officer questions a suspect during a drug raid on an apartment complex as a child watches, Tuesday July 15, 1997 in Long Beach, Calif. An 80-member police task force arrested nearly a dozen people in a four-block area targeted for rock cocaine sales and drug safe houses. The operation culminated six weeks of surveillance and was called "Operation Clean Streets." (AP Photo/Michael Caulfield)

http://accuweather.ap.org/cgi-bin/aplaunch.pl

The city then releases the results of its own in-house multiple regression analysis, controlling for these two variables.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.8067
R Square            0.6508
Adjusted R Square   0.6105
Standard Error      0.0448
Observations        30

ANOVA
             df   SS       MS       F         Significance F
Regression   3    0.0974   0.0325   16.1513   0.0000
Residual     26   0.0523   0.0020
Total        29   0.1496

                                            Coefficients   Standard Error   t Stat    P-value
Intercept                                   0.4925         0.0443           11.1171   0.0000
Percent of Gross Hhd Income Spent on rent   -0.3746        0.0862           -4.3448   0.0002
percent 2-parent families                   -0.3860        0.0776           -4.9768   0.0000
Police Anti-Drug Program?                   -0.0593        0.0166           -3.5711   0.0014

Even controlling for the other two variables, the police anti-drug program is statistically significant, and seems to reduce the vacancy rate by 6 percentage points. The city then asks for more money for the program.

Housing   Vacancy   Percent of Gross Hhd      percent 2-parent   Police Anti-Drug   Active Tenants Group?
Project   Rate      Income Spent on rent      families           Program?           (1 = yes; 0 = no)
1         33%       35%                       12%                0                  0
2         33%       31%                       14%                0                  0
3         18%       25%                       15%                1                  1
4         23%       22%                       41%                0                  0
5         28%       34%                       18%                0                  0
6         19%       51%                       21%                1                  1
7         32%       42%                       22%                1                  0
8         20%       29%                       24%                1                  1
9         15%       33%                       25%                0                  1
10        32%       37%                       26%                1                  0
11        19%       46%                       27%                1                  1
12        19%       45%                       27%                0                  1
13        25%       31%                       33%                0                  0
14        29%       25%                       29%                0                  0
15        17%       18%                       46%                1                  1
16        25%       39%                       46%                0                  0
17        5%        23%                       57%                1                  1
18        28%       21%                       46%                0                  0
19        19%       29%                       30%                1                  1
20        26%       49%                       30%                0                  0
21        15%       27%                       33%                1                  1
22        8%        47%                       41%                0                  1
23        11%       21%                       36%                1                  1
24        30%       46%                       39%                1                  0
25        14%       51%                       39%                1                  1
26        27%       27%                       40%                0                  0
27        21%       40%                       37%                0                  1
28        13%       24%                       44%                1                  1
29        21%       18%                       22%                1                  1
30        31%       23%                       30%                0                  0

http://www.nhi.org/online/issues/95/phorg.html

NOT SO FAST, cries a tenants organization, which has been skeptical of the police anti-drug program’s effectiveness, and instead argues that the strong presence of organized tenants groups makes the difference in keeping vacancy rates lower.

Controlling also for this new variable, the police anti-drug program is no longer statistically significant, and instead the presence of the active tenants group makes the dramatic difference (and look at that great R square!). However, we are not quite done…


                                            Coefficients   Standard Error   t Stat    P-value
Intercept                                   0.500          0.008            60.294    0.000
Percent of Gross Hhd Income Spent on rent   -0.399         0.016            -24.610   0.000
percent 2-parent families                   -0.288         0.015            -19.422   0.000
Police Anti-Drug Program?                   -0.004         0.004            -1.238    0.227
Active Tenants Group? (1 = yes; 0 = no)     -0.102         0.004            -28.827   0.000

Since the police variable now has a statistically insignificant t-score, we remove it from the model. (We also remove the income variable, since it also becomes insignificant after we remove the police variable.) We are left with two independent variables: percent of 2-parent families and active tenants group.
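One rough way to see why the police variable fades once the tenants group enters the model is to compare vacancy rates with and without the police program separately for projects that do and do not have an active tenants group. The sketch below is a stratified comparison of means, not the multiple regression reported in the slides, and the helper name `police_gap` is ours. The data are copied from the slide's table.

```python
from statistics import mean

# Projects 1-30, in order: vacancy rate (percent), police anti-drug program
# (1 = yes), active tenants group (1 = yes), as listed in the slide's table.
vacancy = [33, 33, 18, 23, 28, 19, 32, 20, 15, 32, 19, 19, 25, 29, 17,
           25, 5, 28, 19, 26, 15, 8, 11, 30, 14, 27, 21, 13, 21, 31]
police = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1,
          0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0]
tenants = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1,
           0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0]


def police_gap(pairs):
    """Mean vacancy with the police program minus mean vacancy without it.

    `pairs` is a list of (vacancy, police) tuples; the result is in
    percentage points, so a negative value means lower vacancy with police.
    """
    with_police = [v for v, p in pairs if p == 1]
    without_police = [v for v, p in pairs if p == 0]
    return mean(with_police) - mean(without_police)


rows = list(zip(vacancy, police, tenants))

# Raw comparison, ignoring the tenants group entirely.
raw_gap = police_gap([(v, p) for v, p, _ in rows])

# The same comparison within each tenants-group stratum.
gap_by_tenants = {
    t: police_gap([(v, p) for v, p, g in rows if g == t]) for t in (0, 1)
}

print(f"raw police gap: {raw_gap:+.1f} points")
for t, gap in gap_by_tenants.items():
    print(f"tenants group = {t}: police gap {gap:+.1f} points")
```

In the raw comparison the police projects look better by several points, but within each stratum the apparent advantage disappears, which is consistent with the insignificant police coefficient above.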

SUMMARY OUTPUT

                                          Coefficients   Standard Error   t Stat    P-value   BETA
Intercept                                 0.36582        0.017            20.908    0.000
percent 2-parent families                 -0.2565        0.051            -5.017    0.000     -0.362
Active Tenants Group? (1 = yes; 0 = no)   -0.1246        0.011            -11.347   0.000     -0.821

Write out the equation:

Predicted vacancy rate = 0.366 - 0.256 [percent 2-parent families] - 0.125 [active tenants group]

Example (50% 2-parent families and an active tenants group):

Predicted vacancy rate = 0.366 - 0.256 [.50] - 0.125 [1] = .113 or 11.3 percent
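The fitted equation can be written as a small function for trying other values. This is only a sketch using the rounded coefficients from the equation above; the function and argument names are ours.

```python
def predicted_vacancy_rate(share_two_parent, active_tenants_group):
    """Predicted vacancy rate from the two-variable model.

    share_two_parent: proportion of 2-parent families (0.50 means 50%).
    active_tenants_group: 1 if the project has an active tenants group, else 0.
    """
    return 0.366 - 0.256 * share_two_parent - 0.125 * active_tenants_group


# The worked example: 50% 2-parent families and an active tenants group.
example = predicted_vacancy_rate(0.50, 1)
print(f"{example:.3f}")

# Same project without a tenants group, for comparison.
no_group = predicted_vacancy_rate(0.50, 0)
print(f"{no_group:.3f}")
```

The difference between the two predictions is the tenants-group coefficient, 0.125, or 12.5 percentage points.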

Correlation matrix

Columns: (1) Vacancy Rate; (2) Percent of Gross Hhd Income Spent on rent; (3) percent 2-parent families; (4) Active Tenants Group? (1 = yes; 0 = no); (5) Police Anti-Drug Program?

                                            (1)     (2)     (3)     (4)     (5)
Vacancy Rate                                1.00
Percent of Gross Hhd Income Spent on rent   0.10    1.00
percent 2-parent families                   -0.44   0.20    1.00
Active Tenants Group? (1 = yes; 0 = no)     -0.86   0.12    0.10    1.00
Police Anti-Drug Program?                   -0.39   -0.08   0.03    0.53    1.00

Looking at a correlation matrix can help one understand the interrelationships between variables.
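A few of these correlations can be recomputed directly from the project data. The sketch below uses a hand-written Pearson correlation (the helper `pearson` is ours) on the values copied from the slide's table. Since the table shows rounded percentages, the results should land near the matrix entries rather than match them exactly.

```python
from math import sqrt

# Projects 1-30, in order, as listed in the slide's table.
vacancy = [33, 33, 18, 23, 28, 19, 32, 20, 15, 32, 19, 19, 25, 29, 17,
           25, 5, 28, 19, 26, 15, 8, 11, 30, 14, 27, 21, 13, 21, 31]
police = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1,
          0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0]
tenants = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1,
           0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


r_police_tenants = pearson(police, tenants)
r_vacancy_tenants = pearson(vacancy, tenants)
r_vacancy_police = pearson(vacancy, police)

print(f"police vs. tenants group:  {r_police_tenants:+.2f}")
print(f"vacancy vs. tenants group: {r_vacancy_tenants:+.2f}")
print(f"vacancy vs. police:        {r_vacancy_police:+.2f}")
```

The moderate positive correlation between the police program and the tenants group is the key entry: projects with the police program tend also to have active tenants groups, so a model that omits the tenants variable credits the police program with part of its effect.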

Moral of the story?

•Multiple regression can be a powerful tool in evaluation research

•One should be careful of generalizing from under-specified models (with omitted variables). This is especially true when the R-square is dramatically less than 1.00.

•Evaluation research is the effort to isolate the specific (or unique) impact of a specific policy/program/influence on a dependent variable (an outcome).

•A great challenge is to estimate the “counter-factual”, in this case, what would the vacancy rate have been without the police anti-drug program. (Here we used regression to make a prediction -- one can never know for sure.)

•Why might the police program have initially seemed to have a strong positive effect? Perhaps because of self-selection: the police -- either intentionally or by accident -- may have targeted their program on housing projects that already had the characteristics of projects with lower vacancy rates.

•How do you get around this self-selection bias? Use random assignment (a sometimes difficult but powerful approach in evaluation research).
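Random assignment itself is mechanically simple. The sketch below shuffles the 30 projects and splits them into an experimental group (which would receive the police program) and a control group. The function name `randomly_assign` and the seed value are ours, chosen only so the example is reproducible.

```python
import random


def randomly_assign(units, seed=None):
    """Shuffle the units and split them into two equal-sized groups.

    Returns (experimental, control). Because chance alone decides membership,
    neither the police nor the projects can self-select into the program.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


projects = range(1, 31)  # the city's 30 housing projects
experimental, control = randomly_assign(projects, seed=504)

print("experimental:", experimental)
print("control:     ", control)
```

With enough units, the two groups should be similar on average in rent burden, family structure, tenant organization, and anything else that affects vacancy, including variables nobody thought to measure.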

Evaluation Research

How do we determine whether the program did what it was supposed to do?

Or more broadly: what were the overall impacts (intended and unintended, positive and negative) of the program?

Some terms:

•Control group vs. experimental group
•Counterfactual (“what would have happened if…”)
•“but for…” vs. deadweight spending
•Placebo effect
•Rival explanation
•Program theory
•Random assignment
•Unintended consequences (side-effects, e.g., displacement)
•Program opportunity costs
•Net benefits vs. total benefits

Therefore: regression analysis can be an effective tool in evaluation research

Why do evaluation?

To answer the basic question:

Did the program (plan, intervention, treatment) do what it was supposed to do?

Several basic questions of evaluation research

What is the program theory? (the causal logic implicit in the program, such as an enterprise zone will lower the cost of doing business in an area and thus attract a greater number of businesses -- or -- higher gas taxes will encourage greater transit ridership.)

•You do not have to believe in the theory.
•A hypothesis: program x leads to outcome y, etc.
•Often set up to be rejected (like a null hypothesis).
•If the theory connects concepts x and y, then one needs measures of both. (Example: if you hypothesize that greater entrepreneurship leads to greater innovation and thus more businesses with growth potential, you will need measures of all three concepts.)

Measurement -- continued

Why are urbanization and community development (as opposed to, say, economic growth or drug effectiveness) hard to measure?

The three criteria for easy measurement of concepts are violated:
1. Directly measured? Not always
2. Simple? No, rather complex
3. Neutral? No, development can be rather controversial (normative)

In addition:
4. Time frame is long-term
5. Cause and effect are hard to determine (a function of complexity as well)

And then you have these “intangibles”: community, business climate, culture of poverty, urban milieu, etc.

§How do you determine the counterfactual -- what would have happened without the intervention (program, treatment, etc.)?

•We can never know this for sure, especially in the social sciences where all events are to some degree unique to time and space, but we can estimate it. This is the crux of evaluation research.

•We don't need to know the whole future, but just the specific variable in question. (though, and here is the rub: we need to know the other variables so that we can predict the one in question).

•The counterfactual is a huge question in planning: what is the alternative to planning intervention? Is it the market? Is it predictable? Is it chaos? Predictability is key for evaluation.

•If the counterfactual is the same outcome, then the policy had no effect -- and thus any spending was “deadweight spending” (it would have happened anyway).

§How do we determine the counterfactual?

Is it an improvement over time, or instead simply the prevention of further deterioration/decline?

An example from economic development ----

§Is the counterfactual job loss or simply no job creation (or both)?

Job creation (before vs. after): counterfactual is no change

Job retention (before vs. after): counterfactual is job loss

Note: job retention can be as effective as job creation (remember: a “job saved is a job earned,” to paraphrase Ben Franklin). Retention can in fact at times be even more important, because it often provides employment for an existing resident at risk, rather than for a potential new resident. However, capitalist economies -- and political careers -- are built on growth, so “mere” retention is often neglected.

Also: be cautious of exaggerated claims for successful job retention (e.g., this tax break prevented an entire firm and its 8,000 employees from leaving).

Note: this is a reminder that the counterfactual is NOT necessarily the status quo (i.e., continuation of the same pattern).
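The distinction between job creation and job retention comes down to which baseline the outcome is compared against. The sketch below uses entirely hypothetical job counts (none of these numbers come from the slides) to show how a naive before-and-after comparison and a comparison against the counterfactual can disagree.

```python
def before_after_change(before, observed_after):
    """Naive estimate: treats the 'before' level as the counterfactual."""
    return observed_after - before


def program_impact(observed_after, counterfactual_after):
    """Impact relative to what would have happened without the program."""
    return observed_after - counterfactual_after


# Hypothetical job counts, for illustration only.
# Job creation: without the program, employment would have stayed at 1,000.
creation = {"before": 1000, "after": 1200, "counterfactual": 1000}
# Job retention: without the program, employment would have fallen to 800.
retention = {"before": 1000, "after": 1000, "counterfactual": 800}

for name, case in (("creation", creation), ("retention", retention)):
    naive = before_after_change(case["before"], case["after"])
    impact = program_impact(case["after"], case["counterfactual"])
    print(f"{name}: before/after change = {naive}, impact vs. counterfactual = {impact}")
```

In the retention case the before-and-after change is zero even though the program's impact equals that of the creation case, which is why the choice of counterfactual matters more than the observed trend.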

§How do you determine the counterfactual? Several approaches:

•Regression analysis
•Matched pair
•Before and after. However, this can be unreliable.
•Random assignment: control and experimental groups (role of placebos -- to separate psychological and physical effects)
•Quasi-experiments (using the logic of experimental design, even though there is no control; use a pre-existing division of subjects into several groups. Example later in the presentation: the Gautreaux case in Chicago)

Before and after

Matched pairs: identical in all but one characteristic (variable)

Random assignment: control vs. experimental

The Public Housing Vacancy Rate Example

How do you find the counterfactual (i.e., what would have happened without the police anti-drug program)?

•We have already seen how one can synthetically estimate the counterfactual by using regression analysis.
•Here are three other approaches.

Before and after: introduction of police anti-drug program

Matched pairs: identical in all but one characteristic (variable), the presence of the police anti-drug program

Random assignment: experimental group (police anti-drug program) vs. control group (no program)

•How do you test the rival explanations of the outcome? Is the model fully specified? That is, could something else actually explain y? (EX: increased job creation in an enterprise zone could in fact be due to another program or due to overall national economic growth.)

§How do you measure the level of impact? (e.g., effectiveness versus adequacy. relative to the current state or to the counterfactual? Etc.)

§ What are the unintended consequences of the program (both positive and negative)?

(Think of drug side-effects, or of class size reduction in California, whose unintended consequences were no substitutes remaining and lowered teacher quality.) For economic development, this may be, for example, displacement or deterrence (e.g., the program inhibits the development of other economic activity). EXAMPLE: Aberdeen, Scotland: the North Sea oil boom deterred other local development.

Diagram: variables x1, x2, and x3 each point to variable y, labeled with beta weights (path coefficients) of +.42, +.27, and -.21.

Focus:

•To find all the significant explanatory variables

•To maximize the amount of variation in y explained (R²)

Evaluation research models may seem quite similar to regression models -- and often evaluation researchers in fact use regression models.

The distinction is often in the way evaluation researchers use models to isolate the cause -> effect relationship of a specific intervention / treatment (e.g., an education program) on a specific goal (e.g., student test scores).

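Diagram: variable x points to variable y. Three further elements surround this link: another cause of y (not in the model), another cause of y (and more effective than x), and a side effect of x. The four relationships are numbered 1 to 4.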

Focus:

•To isolate the specific impact of an intervention (x) on a desired outcome (y)

•To discover rival explanations and unintended consequences

Diagram: casinos -> increased jobs in community

Question:

Did the opening of casinos in Detroit lead to more jobs?

Or: was there some other cause of new jobs, such as growth in another sector (e.g., autos) or an overall increase in the national economy?

And: is there a better way (e.g., more cost-effective) to achieve the same goal, such as improving local education, job training, or tax breaks?

Any side-effects, good or bad (e.g., increased crime)?

Other Issues: OPPORTUNITY COSTS

•Program opportunity costs: what benefits have been foregone due to other programs not funded? (e.g., funding casinos may lead to job creation, but it means that other programs may lack funding)

•Labor opportunity costs: people would have found jobs elsewhere (this can be seen as another form of displacement). So: net benefit = marginal increase in wages.

•Land opportunity costs: what were the benefits of other possible uses of the land (especially important if a large site in a city with limited land resources).

Other Issues:

•Formative (during the process) vs. summative (after the process) evaluation

•Do we measure changes in employment, hours worked or wages?

TOTAL WAGE BILL = employment * hours worked * wages

So an increase in any of the three increases the flow of income into the community.
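The wage bill identity is easy to check numerically. The figures below are hypothetical (100 workers, 2,000 hours per year, $15.00 per hour) and chosen only so that a 10 percent increase in each factor can be compared. The function name is ours.

```python
def total_wage_bill(employment, hours_worked, hourly_wage):
    """Total wage bill = employment * hours worked * wages."""
    return employment * hours_worked * hourly_wage


# Hypothetical baseline, for illustration only.
base = total_wage_bill(100, 2000, 15.0)

# Raise each factor by 10 percent in turn, holding the other two fixed.
more_jobs = total_wage_bill(110, 2000, 15.0)
more_hours = total_wage_bill(100, 2200, 15.0)
higher_wages = total_wage_bill(100, 2000, 16.5)

print(f"baseline:     {base:,.0f}")
print(f"more jobs:    {more_jobs:,.0f}")
print(f"more hours:   {more_hours:,.0f}")
print(f"higher wages: {higher_wages:,.0f}")
```

Because the three factors multiply, an evaluation that counts only jobs can miss income gains that arrive through longer hours or higher wages.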

FINALLY: what geographic scale is used for measurement?

If you just care about local benefits, then creating new jobs, or encouraging firms to relocate from a neighboring community, would have similar impacts.

However, if your scale is larger (e.g., the region, state, nation or world), then you would need to discount such relocations within the same geographic area (since it is “robbing Peter to pay Paul”).

KEY CONCEPT: zero-sum game

Cabrini-Green public housing project, foreground and mid-photo high-rises, is seen against the Chicago Skyline in May 1996. Cabrini-Green, for decades one of the nation's least desirable addresses, has become the target of a massive redevelopment project, becoming one of the city's most coveted pieces of real estate. (AP Photo/Beth A. Keiser)

http://accuweather.ap.org/cgi-bin/aplaunch.pl

Case Study: the Gautreaux Program and the Chicago Housing Authority

A young boy crosses a snow-covered field at 48th and State Streets on Chicago's South Side Saturday, Jan. 17, 1998. The field is the site where civil rights leader Dr. Martin Luther King, Jr. spoke on July 24, 1965, to housing project residents. The Robert Taylor Homes was a model of modern public housing in 1965, but today it is considered one of the nation's worst examples of public housing. Gangs, drugs, violence and poorly maintained buildings are the reality to thousands who live in the Taylor homes. On Monday, Jan. 19, the nation will celebrate Martin Luther King day. (AP Photo/Top photo, File, Bottom photo, Beth A. Keiser)

http://accuweather.ap.org/cgi-bin/aplaunch.pl

Source: “BPI. In the Public Interest… Celebrating the First 30 Years.” Business and Professional People for the Public Interest • 17 East Monroe Street, Suite 212, Chicago, Illinois 60603 • 312.641.5570 • Fax 312.641.5454 • www.bpichicago.org • [email protected]/pdfs/thirty_anniversary.pdf

EXAMPLE: James Rosenbaum, “Changing the Geography of Opportunity by Expanding Residential Choice: Lessons from the Gautreaux Program.” Housing Policy Debate 6 (1): 231-68.

A legal decision against the Chicago Housing Authority (1976) for racial discrimination led to a housing voucher program (Section 8): low-income blacks were randomly assigned to either urban neighborhoods or white, middle class areas.

Appeal of studying the program:

•Quasi-experimental situation
•Random assignment (a rare and great opportunity)
•Overall strategy: compare the outcomes of two groups: the experimental group and the control group.

The appeal of Gautreaux for social scientists:

“Because of its design, the Gautreaux program presents an unusual opportunity to test the effect of helping low-income people move to better labor markets, better schools, and better neighborhoods.” [233]

BY CONTRAST - in most situations, one faces the confounding influence of self-selection:…

“…it is hard to tell whether the suburbs increased black employment or whether the blacks who happen to live in suburbs are different, perhaps moving to the suburbs after getting a job…” [233]

Diagram: suburban, middle class location; more ambitious and successful people; better jobs and educational outcomes.

Black arrows: geography shapes opportunity

Red arrows: more successful people move to better neighborhoods

Diagram: residential relocation; social isolation; educational and employment outcomes; inner-city school improvement and job training; self-selection of program participants.

Criteria for reviewing this article:

•program theory
•defining and measuring the counterfactual
•unit(s) of analysis
•types and sources of evidence
•number and selection of cases
•the use of a large statistical analysis vs. in-depth case studies
•accounting/controlling for rival explanations of the impact
•measurement issues
•short-term vs. long-term impacts
•unintended consequences (side-effects)
•generalizing the results to other cases

Key elements / concepts:

Program theory: geography of opportunity. Location shapes opportunity, with white middle class locations leading to greater opportunity.

Counterfactual: those who were not sent to the suburbs, but rather placed within the city (the control group). This was NOT the choice of the individual (i.e., not self-selected). (Random assignment: great but unusual.)

Participants: over 5,000 families since 1976; over half to the suburbs. The selection process is not very selective [234], leading to a quasi-experimental design.

What are the advantages of random assignment? [233] One can distinguish between the self-selection effect and the impact of the program itself.

The results in table form [236]: suburban movers were more likely to get jobs (73.6%) than those who stayed in the city (64.6%). [237]

(you can do a difference of means test here -- using a t test)
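For two employment rates, one common form of the difference of means test is a two-proportion z test, sketched below. The percentages are the ones quoted above, but the slide does not give the group sizes, so the sample sizes used here (200 per group) are purely hypothetical placeholders. The function name is ours.

```python
from math import erfc, sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z test; returns (z, two-tailed p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Two-tailed p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Employment rates from the slide; group sizes are HYPOTHETICAL.
suburb_rate, city_rate = 0.736, 0.646
n_suburb, n_city = 200, 200

z, p_value = two_proportion_z(suburb_rate, n_suburb, city_rate, n_city)
print(f"z = {z:.2f}, two-tailed p = {p_value:.3f}")
```

Whether a 9-point gap is statistically significant depends heavily on the group sizes, so the actual sample sizes reported in the article should replace the placeholders before drawing any conclusion.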

And suburban movers did better in school [239 - ]. However, higher standards in the suburbs are not only an incentive, but also a barrier. [242]

Complex issues require complex solutions? It is not just where you live, but also access to transportation, wealth, networks, etc. (three key factors: “personal safety, role models, and access to jobs.” [266]). Housing in the right neighborhoods may be a start, but there are other factors as well.

Finally: links between evaluation research and community organizing / communicative action:

One sees a shift from evaluation as simply an “objective, neutral analysis of a program” to evaluation that is politically and organizationally effective:

•Communicates well

•Is understandable

•Shows both shortcomings and possibilities of success

•Is discussed, incorporated into the political debate

•Shapes the language and structure of the political discussion

•And thus allows for conflict resolution over policies
