
Supreme Court of California
350 McAllister Street
San Francisco, CA 94102-4797

Tani G. Cantil-Sakauye
Chief Justice of California
(415) 865-7060

February 28, 2017

James Fox, President, Board of Trustees
Elizabeth Parker, Executive Director
State Bar of California
180 Howard Street
San Francisco, CA 94105

Re: California Bar Exam

Dear Mr. Fox and Ms. Parker,

The Supreme Court of California received the attached February 1, 2017, letter from the Deans of 20 ABA-accredited law schools, in which the Deans request that the court order the State Bar of California to lower the "cut score" of 144 that the State Bar applies to the Multistate Bar Exam (MBE) portion of the California bar exam. In support of their request, the Deans observe that California's cut score of 144 is the second highest in the nation. They note that California bar takers, on average, score higher on the MBE portion of the exam than the national average, yet fare significantly worse at bar admission, and they contend this is so because California uses an atypically high cut score.

Leaving aside the question of what has caused this situation, the Deans raise a significant concern, particularly given the high cost of attending law school and the reality that non-admission to the bar could mean the loss of employment opportunities while student loan debt continues to compound. It appears prudent to consider and address whether 144 is an appropriate score for evaluating the minimum competence necessary for entering attorneys to practice law in California.

Of course, there may be reasons to question how much the cut score is contributing to the pass rate. For one, the cut score has remained consistent for three decades as overall bar pass rates have fluctuated. It is unclear, therefore, whether the July 2016 pass rate, a 30-year low, constitutes evidence that the cut score needs to be lowered.


Yet given the significant impact of the pass rate on law school graduates, the issue calls for a thorough and expedited study. The court is informed that the State Bar has begun investigating the potential causes of the declining California bar pass rates and is reviewing the bar exam and its grading system. The court agrees such an investigation is critically important, and directs the State Bar to ensure the investigation includes: (1) identification and exploration of all issues affecting California bar pass rates; (2) a meaningful analysis of the current pass rate and information sufficient to determine whether protection of potential clients and the public is served by maintaining the current cut score; and (3) participation of experts and stakeholders in the process, including psychometricians, law student representatives, and law school faculty or deans.

The court directs that, once the investigation and all studies are concluded, the State Bar make a report to the court. The report must include a detailed summary of the investigation and findings, as well as recommendations for changes, if any, to the bar exam and/or its grading, and a timeline for implementation. The State Bar's report and recommendations should be submitted to the court as soon as practicable, and in no event later than December 1, 2017. The State Bar is further directed to submit bi-monthly letter reports to the court regarding the progress of its investigation, beginning March 1.

Sincerely,

Tani G. Cantil-Sakauye

Attach.

cc (sent via email):

Erwin Chemerinsky, University of California, Irvine School of Law
Judith F. Daar, Whittier Law School
Allen Easley, Western State College of Law
David L. Faigman, University of California, Hastings College of Law
Stephen C. Ferruolo, University of San Diego School of Law
Thomas F. Guernsey, Thomas Jefferson School of Law
Andrew T. Guzman, University of Southern California Gould School of Law
Gilbert A. Holmes, University of La Verne College of Law
Lisa A. Kloppenberg, Santa Clara University School of Law
M. Elizabeth Magill, Stanford Law School
Jennifer L. Mnookin, UCLA School of Law
Francis J. Mootz, III, University of the Pacific, McGeorge School of Law
Melissa Murray, University of California, Berkeley School of Law


cc (cont'd):

Matthew J. Parlow, Dale E. Fowler School of Law at Chapman University
Susan Westerberg Prager, Southwestern Law School
Niels B. Schaumann, California Western School of Law
Deanell Reece Tacha, Pepperdine University School of Law
John Trasviña, University of San Francisco School of Law
Rachel Van Cleave, Golden Gate University School of Law
Michael E. Waterstone, Loyola Law School


Conducting a Standard Setting Study for the California Bar Exam
Final Report
July 28, 2017

Submitted By: Chad W. Buckendahl, Ph.D.
Office: 702-586-7386
[email protected]
11035 Lavender Hill Drive, Suite 160-433, Las Vegas, NV 89135 | www.acsventures.com


Contents

Executive Summary
Introduction and Overview
  Assessment Design
  Study Purpose and Validity Framework
Procedures
  Panelists and Observers
  Method
  Workshop Activities
    Orientation
    Training/Practice with the Method
    Operational Standard Setting Judgments
Analysis and Results
  Panelists' Recommendations
  Process Evaluation Results
Evaluating the Cut Score Recommendations
  Procedural
  Internal
  External
Determining a Final Passing Score
References
Appendix A – Panelist Information
Appendix B – Standard Setting Materials
Appendix C – Standard Setting Data
Appendix D – Evaluation Comments


Executive Summary

The California State Bar conducted a standard setting workshop1 May 15-17, 2017 to evaluate the passing score for the California Bar Exam. The results from this workshop serve as an important source of evidence for informing the final policy decision on what, if any, changes to make in the current required passing score. The workshop involved gathering judgments from panelists through the application of a standardized process for recommending passing scores and then calculating a recommendation for a passing score.

The standard setting workshop applied a modification of the Analytic Judgment Method (AJM; Plake & Hambleton, 2001). This method entails asking panelists to classify illustrative responses into defined categories (e.g., not competent, competent, highly competent). The selection of the AJM for the California Bar Examination reflected consideration of the characteristics of the exam as well as requirements of the standard setting method itself. The AJM was designed for examinations that use constructed response questions (i.e. narrative written answers) that are designed to measure multiple traits. The responses produced by applicants on the essay questions and performance task are examples of constructed response questions for which the AJM is applicable.2

The methodology involved identifying exemplars of applicant performance that span the observed score scale for the examination. The exemplar performances were good representations of the respective score point such that the underlying score was not in question. The rating task for the panelists was to first broadly classify each exemplar into two or more categories (e.g., not competent, competent, highly competent). Once this broad classification was completed, panelists then refined those judgments by identifying the papers close to the target threshold (i.e., minimally competent). This meant that the panelists identified the best of the not competent exemplars and the worst of the competent exemplars that they had initially classified. The process was repeated for each essay question and performance task with the results summed across questions to form an individual panelist’s recommendation.

To calculate the recommended cut score for a given question for a panelist, the underlying scores for the exemplars identified by a respective panelist were averaged (i.e., mean, median) across the group. These calculations were summed across the questions with each essay question being equally weighted and the performance task counting for twice as much as an individual essay question to model the operational scoring that will occur beginning with the July 2017 administration.

Following these judgments, we calculated the recommended score and associated passing rate when considering the written part of the examination. However, we needed to know what score on the total exam corresponded to this same pass rate. To answer this question, another step was needed to transform these judgments to the score scale on the full-length examination. After creating distributions of individual recommendations for the written part of the examination, we applied an equipercentile linking approach to estimate the score on the full-length examination that yielded the same percent passing as was determined on just the written component that panelists evaluated. Equipercentile linking involves finding the equivalent percentile rank within one distribution of scores and transforming it to another score distribution so that the same impact is retained from one examination to another, or in this instance, from the part of the examination on which panelists made judgments to the full examination.

1 Standard setting is the phase of examination development and validation that involves the systematic application of policy to the scores and decisions on an examination. Conducting these studies to establish passing scores is expected by the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014).

2 Alternative methods that rely on panelists' judgments of candidate work include Paper Selection and Body of Work (see Hambleton & Pitoniak, 2006, for additional details on these and a discussion of the categories of standard setting methods).

The standard setting meeting results and evaluation feedback generally supported the validity of the panel’s recommended passing score for use with the California Bar Examination. Results from the study were analyzed to create a range of recommended passing scores. However, additional policy factors may be considered when establishing the passing score. One of these factors may include the recommended passing score and impact relative to the historical passing score and impact. The panel’s median recommended passing score of 1439 converged with the program’s existing passing score while the mean recommended passing score of 1451 was higher.

Additional factors that could be considered in determining the appropriate cut score for California might include the passing rates from other states that have similarly large numbers of bar applicants sitting for the examination. However, the interpretation and comparability of these results are complicated by the different eligibility policies among these jurisdictions and California's more inclusive policies as to who may sit for the exam,3 along with the downward trend in bar examination performance across the country, particularly over the last few years. In some instances, the gap in bar passage between California's applicants and those in other states has closed, and in others, the gap observed in 2007 has remained essentially constant as the trend declined on a similar slope.

An additional factor warrants consideration as part of the policy deliberation. Specifically, the consideration of policy tolerance for different types of classification errors is relevant. Because we know that there is measurement error with any test score, when applying a passing score to make an important decision about an individual, it is important to consider the risk of each type of error. A Type I error represents an individual who passes an examination, but whose true abilities are below the cut score. These types of classification errors are considered false positives. Conversely, a Type II error represents an individual who does not pass an examination, but whose true abilities are above the passing score. These types of classification errors are known as false negatives. Both types of errors are theoretical in nature because we cannot know which test takers in the distribution around the passing score may be false positives or false negatives.

A policy body can articulate its rationale for supporting adoption of the group's recommendation or adjusting the recommendation in such a way that minimizes one type of misclassification. The policy rationale for licensure examination programs is based primarily on deliberation of the risk of each type of error. For example, many licensure and certification examinations in healthcare fields have a greater policy tolerance for Type II errors than Type I errors, with the rationale that the public is at greater risk for adverse consequences from an unqualified candidate who passes (i.e., Type I error) than a qualified one who fails (i.e., Type II error).

3 California has a uniquely inclusive policy as to who may be eligible to take the Bar Exam. Not only those who have graduated from schools nationally accredited by the American Bar Association, but applicants from California accredited and unaccredited law schools are also allowed to take the exam, as well as those who have 'read law.' This sets California apart from virtually all other jurisdictions.

In applying the rationale, if the policy decision is that there is a greater tolerance for Type I errors, then the decision would be to accept the recommendation of the panel (i.e., 144) or adopt a value that is one to two standard errors below the recommendation (i.e., 139 to 141). Conversely, if the policy decision is that there is a greater tolerance for Type II errors, then the decision would be to accept the recommendation of the panel (i.e., 144) or adopt a value that is one to two standard errors above the recommendation (i.e., 148 to 150). Because standard setting is an integration of policy and psychometrics, the final determination will be policy driven, but supported by the data collected in this workshop and this study more broadly.
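As a purely illustrative sketch of this trade-off (not part of the report's analysis: it assumes a normal measurement-error model, and the SEM value and "true" scores below are hypothetical), moving the cut score up or down shifts risk between the two error types:

```python
from statistics import NormalDist

# Illustrative only: under an assumed normal measurement-error model, the chance of
# misclassifying an applicant depends on how far that applicant's true ability sits
# from the cut score, in units of the standard error of measurement (SEM).
SEM = 14.0  # hypothetical SEM on the combined score scale

def p_false_positive(true_score, cut):
    """P(observed >= cut) for an applicant whose true ability is below the cut (Type I)."""
    return 1 - NormalDist(mu=true_score, sigma=SEM).cdf(cut)

def p_false_negative(true_score, cut):
    """P(observed < cut) for an applicant whose true ability is at/above the cut (Type II)."""
    return NormalDist(mu=true_score, sigma=SEM).cdf(cut)

for cut in (1390, 1440, 1500):  # roughly the -2 SE, recommended, and +2 SE values discussed above
    fp = p_false_positive(true_score=1430, cut=cut)  # applicant just below the recommended cut
    fn = p_false_negative(true_score=1450, cut=cut)  # applicant just above the recommended cut
    print(cut, round(fp, 3), round(fn, 3))  # raising the cut lowers Type I risk and raises Type II risk
```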


Introduction and Overview

The purpose of licensure examinations like the California Bar Exam4 is to distinguish competent candidates from those who could do harm to the public. This examination purpose is distinguished from other types of exams in that licensure exams are not designed to evaluate training programs, evaluate mastery of content, predict success in professional practice, or ensure employability. Although other stakeholders may attempt to use scores from the examination for one or more of these purposes, it is important to clearly state what inferences the test scores are designed to support or not. Therefore, the standard setting process was designed to focus expert judgments on the level of performance that aligns with minimal competence.

Assessment Design

The California Bar Exam is built on multiple components intended to measure the breadth and depth of content needed by entry level attorneys who are minimally competent. These components are the Multistate Bar Exam (MBE), five essay questions, and a performance task5. Beginning with the July 2017 examination, the combined score for the examination weights the MBE at 50% and the constructed response components at 50%, with the performance task being weighted twice as much as an essay question.6 A decision about passing or failing is based on the compensatory performance of applicants on the examination and not any single component. This means that an applicant's total score on the examination is evaluated relative to the passing score to determine pass/fail status. The applicant does not need to separately "pass" the MBE and the constructed response questions.
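A minimal sketch of this compensatory scoring rule follows (the function names are illustrative, and the conversion of each component to a common 2000-point combined scale is assumed here rather than taken from the report):

```python
# Minimal sketch of the compensatory scoring rule described above. Assumptions not
# stated in the report: essay/performance-task inputs are raw grader scores, and both
# components have already been converted to a common 2000-point scale before the
# 50/50 weighting is applied.

def written_raw_total(essay_scores, performance_task_score):
    """Five essays (0-100 each) plus the performance task counted twice (theoretical range 0-700)."""
    assert len(essay_scores) == 5
    return sum(essay_scores) + 2 * performance_task_score

def combined_score(written_scaled, mbe_scaled):
    """Compensatory total: 50% written, 50% MBE, each assumed to be on the same 2000-point scale."""
    return 0.5 * written_scaled + 0.5 * mbe_scaled

def passes(combined, cut=1440):
    """Pass/fail is decided on the combined score only, not on each component separately."""
    return combined >= cut

# Hypothetical applicant: weaker written performance offset by a stronger MBE performance.
print(passes(combined_score(written_scaled=1400, mbe_scaled=1490)))  # True: 1445 >= 1440
```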

Study Purpose and Validity Framework

The purpose of this study was to recommend a passing score that distinguished the performance characteristics of someone who was minimally competent from someone who was not competent. To establish a recommended passing score, Dr. Chad Buckendahl of ACS Ventures, LLC (ACS) facilitated a standard setting meeting for The State Bar of California on May 15-17, 2017 in San Francisco, CA. The purpose of the meeting was to enlist subject matter experts (SMEs) to serve as panelists and recommend cut scores that designate the targeted level of minimally competent performance.

4 Note that the California Department of Consumer Affairs is responsible for managing the licensure process for many professions and consults with many others. As such, a representative from the Department was asked to serve as an external reviewer for this study.

5 The performance task is designed to measure skills associated with the entry level practice of law (e.g., legal analysis, reasoning, written communication) separate from the domain specific application of these skills to specific subject areas as are measured in the essay questions.

6 Prior to the July 2017 exam, the MBE accounted for 35% of the exam, with the constructed response components weighted 65% of the total. Previously, constructed responses consisted of six essay questions and two performance task questions. While the papers used in the workshop were originally administered according to the old format, in anticipation of the new cut score potentially being applied to exams from July 2017 based on the new format, five essay questions and one performance test question were used in the workshop to conform with the new exam structure.


To evaluate the cut score recommendations that were generated from this study, Kane’s (2001) framework for evaluating standard setting activities was used. Within this framework, Kane suggests three sources of evidence should be considered in the validation process: procedural, internal, and external. When evaluating procedural evidence, practitioners generally look to panelist selection and qualification, the choice of methodology, the application of the methodology, and the panelists’ perspectives about the implementation of the methodology as some of the primary sources. The internal evidence for standard setting is often evaluated by examining the consistency of panelists’ ratings and the convergence of the recommendations. Sources of external evidence of validity for similar studies include impact data to inform the reasonableness of the recommended cut scores.

This report describes the sources of validity evidence that were collected and reports the study's passing score recommendations. The California Bar is receiving this recommended passing score, within ranges of standard error, to contribute to discussions about developing a policy recommendation that will then be provided to the California Supreme Court for final decision-making. These results would serve as a starting point for a final passing score to be established for use with the California Bar Exam.

Procedures

The standard setting study used a modified version of the Analytic Judgment Method (AJM; Plake & Hambleton, 2001). The AJM approach is characterized as a test-based method (Hambleton & Pitoniak, 2006) that focuses on the relationship between item difficulty and examinee performance on the test. It is appropriate for tests that use constructed response items like the essay questions and performance task that are part of the written part of the California Bar Exam (see Buckendahl & Davis-Becker, 2012). The primary modification for the study was to reduce the number of applicants' performances that panelists reviewed from 50 to 30, given the score scale for each essay question and the performance task.

Panelists and Observers

A total of 20 panelists participated in the workshop7. The panelists were licensed attorneys with an average of 14 years of experience in the field. Panelists were recruited to represent a range of stakeholder groups. These groups were defined as Recently Licensed Professionals (panelists with less than five years of experience), Experienced Professionals (panelists with ten or more years of experience), and Faculty/Educator (panelists who are employed at a college or university). Note that some panelists were associated with multiple roles. Some of the experienced attorneys also served as adjunct faculty members at law schools. In listing their employment type in the table below, we have documented the primary role indicated by panelists. A summary of the panelists' qualifications is shown in Table 1.

In addition to the panelists, there were also observers who attended the in-person standard setting workshop. These included an external evaluator with expertise in standard setting, a representative from the California Department of Consumer Affairs, representatives from California law schools, a representative from the Committee of Bar Examiners, and staff from the California Bar Examination. Observers were instructed during the orientation of the meeting that they were not to intervene or discuss the standard setting activities with the panelists. All panelists and observers signed confidentiality and nondisclosure agreements that permitted them to discuss the standard setting activities and processes outside the workshop, but that they would not be able to discuss the specific definition of the minimally competent candidate or any of the preliminary results that they may have heard or observed during the study. External evaluators and observers were included in the process to promote the transparency of the standard setting and to critically evaluate the fidelity of the process by which a passing score would be recommended.

7 Nominations to participate on the standard setting panel were submitted to the Supreme Court, which selected participants to represent diverse backgrounds with respect to experience, practice areas, size of firms, geographic location, gender, and race/ethnicity.

Table 1. Summary of panelist demographic characteristics.

Race/Ethnicity      Freq.   Percent
Asian                 3      15.0
Asian/White           1       5.0
Black                 4      20.0
Hispanic              2      10.0
White                10      50.0
Total                20     100.0

Gender              Freq.   Percent
Female                9      45.0
Male                 11      55.0
Total                20     100.0

Years of Practice   Freq.   Percent
5 Years or Less      10      50.0
>=10                 10      50.0
Total                20     100.0

Nominating Entity          Freq.   Percent
ABA Law Schools              3      15.0
Assembly Judiciary Comm.     1       5.0
Board of Trustees            2      10.0
BOT - CBE*                   1       5.0
BOT - COAF*                  8      40.0
BOT - CYLA*                  2      10.0
CALS Law Schools             1       5.0
Governor                     1       5.0
Senior Grader                1       5.0
Total                       20     100.0

* Committee of Bar Examiners; Council on Access and Fairness; California Young Lawyers Association.

Primary Employment Type    Freq.   Percent
Academic                     2      10.0
Court                        1       5.0
District Attorney            1       5.0
Large Firm                   4      20.0
Non Profit                   3      15.0
Other Govt.                  3      15.0
Public Defender              1       5.0
Small Firm                   3      15.0
Solo Practice                2      10.0
Total                       20     100.0

Practice Areas                   Freq.    %
Business                          12     17%
Personal Injury                    6      9%
Appellate                          5      7%
Criminal                           5      7%
Labor Relations                    4      6%
Juvenile Delinquency               3      4%
Probate                            3      4%
Real Estate                        3      4%
Antitrust                          2      3%
Disability Rights                  2      3%
Employment                         2      3%
Environmental Law                  2      3%
Family                             2      3%
Insurance Coverage                 2      3%
Intellectual Property              2      3%
Administrative Law                 1      1%
Civil Rights                       1      1%
Contract Indemnity Litigation      1      1%
Education                          1      1%
Elder Abuse                        1      1%
General Commercial Litigation      1      1%
Government Transparency            1      1%
Immigration                        1      1%
Legal Malpractice                  1      1%
Mass Tort                          1      1%
Nonprofit Law                      1      1%
Policy Advocacy                    1      1%
Product Liability                  1      1%
Public Interest                    1      1%
Total                             69    100%

Method

Numerous standard setting methods are used to recommend passing scores on credentialing8 exams (Hambleton & Pitoniak, 2006). The selection of the Analytical Judgment Method (AJM; Plake & Hambleton, 2001) for the California Bar Exam reflected consideration of the characteristics of the exam as well as requirements of the standard setting method itself. The AJM was designed for examinations that use constructed response questions that are designed to measure multiple traits. The responses produced by the applicants on the essay questions and performance task of the California Bar Exam are examples of constructed response questions where the AJM is applicable.

The methodology first involves identifying exemplars of applicant performance that span the observed score scale for the examination. The exemplar performances should be good representations of the respective score point such that the underlying score should not be in question. Plake and Hambleton (2001) suggested using 50 exemplars to ensure that there was sufficient representation of the score scale. Once these exemplars have been identified, they should be randomly ordered and coded to de-identify the score for the standard setting panelists. The goal is to have the panelists focus on the interpretation of the performance level descriptor of minimum competency and not the score of the paper.

8 Credentialing is an inclusive term that is used to refer to licensure, certification, registration, and certificate programs.

The rating task for the panelists is to then broadly classify each exemplar into two or more categories (e.g., not competent, competent, highly competent). Once this broad classification is completed, panelists are asked to then refine those judgments by identifying the papers close to one or more thresholds. For example, if the target threshold is minimum competency, then panelists would identify the best of the not competent exemplars and the worst of the competent exemplars. To calculate the recommended cut score for a given question, the underlying scores for these exemplars are averaged (i.e., mean, median) to determine a value for this question. The process is then repeated for each essay question and performance task with the results summed across questions to form an individual panelist’s recommendation.

In the operationalization of this method for this study, two modifications of the methodology were used. First, rather than having 50 exemplars for each question, panelists evaluated 30 exemplars for each question. This modification was applied primarily due to the width of the effective scale: although the theoretical score scale for each essay question spans from 0-100, the effective score scale only ranges from approximately 45-90 and is limited to increments of 5 points. This reduces the number of potential scale score points and thereby reduces the number of exemplars necessary for each score point to illustrate the range. The second modification of the process involved sharing with the panelists a generic scoring guide/rubric as opposed to specific ones for each question. This was done to avoid potentially biasing the panelists in their judgments and to focus on the common structure of how the constructed response questions were scored.

In the rating task, panelists were asked to review examples of performance and categorize each example as characteristic of not competent, competent, or highly competent performance. Even though the only target threshold level was minimally competent, the use of highly competent as a loosely defined category was meant to filter out exemplars that would not be considered in the refined judgments. Following the broad classification, these initial classifications were then refined to identify the papers that best represented the transition point from not competent to competent (i.e., minimally competent). Once these papers were identified by the panelists (i.e., the two best not competent exemplars and the two worst competent exemplars), the scores that these exemplars received during the original grading process were used to calculate the average values of the panelists' recommendations for each question, which were then summed across questions.
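As a minimal sketch of this per-panelist calculation (hypothetical borderline scores and illustrative function names; for simplicity all five essays are shown, although in the study each panelist rated only three and missing questions were imputed, as described in the Analysis and Results section):

```python
from statistics import mean, median

# Hypothetical sketch of one panelist's recommendation: for each question, average the
# scores of the four borderline exemplars that the panelist selected (two best "not
# competent", two worst "competent"), then sum across questions with the performance
# task counted twice as much as an essay.

def question_cut(borderline_scores, center=mean):
    return center(borderline_scores)

def panelist_recommendation(essay_borderlines, pt_borderlines, center=mean):
    essays = sum(question_cut(scores, center) for scores in essay_borderlines)
    return essays + 2 * question_cut(pt_borderlines, center)

# Hypothetical borderline exemplar scores (5-point increments, as on the effective scale).
essays = [[60, 60, 65, 65], [55, 60, 60, 65], [60, 65, 65, 70], [55, 55, 60, 60], [60, 60, 65, 70]]
performance_task = [60, 60, 65, 65]
print(panelist_recommendation(essays, performance_task))          # written-scale value, e.g. 433.75
print(panelist_recommendation(essays, performance_task, median))  # median-based variant
```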

Workshop Activities

The California Bar Exam standard setting meeting was conducted May 15-17, 2017 in San Francisco, CA. Prior to the meeting, participants were informed that they would be engaging in tasks that would result in a recommendation for a passing score for the examination. The standard setting procedures consisted of orientation and training, operational standard setting activities for each essay/performance task, and successive evaluations to gather panelists' opinions of the process. Chad Buckendahl, Ph.D., served as the facilitator for the meeting. Workshop orientation materials are provided in Appendix B.


Orientation

The meeting commenced on May 15th with Dr. Buckendahl providing a general orientation for all panelists that included the goals of the meeting, an overview of the Analytical Judgment Method and its application, and specific instructions for panel activities. Additionally, the opening orientation described how cut scores would ultimately be determined through recommendations to the California State Bar. In addition, a generic scoring guide/rubric was shared with the panelists to provide a framework for how essay questions and the performance task would be scored. The different areas of the scoring criteria were a) Issue spotting, b) Identifying elements of applicable law, c) Analysis and application of law to fact pattern, d) Formulating conclusions based on analysis, and e) Justification for conclusions. Each essay question and performance task had a unique scoring guide/rubric for the respective question, but followed this generic structure.

Part of the orientation was a discussion around the expectations for someone who is a minimally competent lawyer and therefore should be capable of passing the exam. The process for defining minimum competency is policy driven and started with a draft definition produced by the California Bar. Feedback was solicited from law school deans, the Supreme Court of California, and the workshop facilitator for substance and style.

Based on the input from multiple stakeholder groups and relying on best practice as suggested by Egan et al. (2012), the California Bar provided the following description of the minimally competent candidate (MCC).

A minimally competent applicant will be able to demonstrate the following at a level that shows meaningful knowledge, skill and legal reasoning ability, but will likely provide incomplete responses that contain some errors of both fact and judgment:

(1) Rudimentary knowledge of a range of legal rules and principles in a number of fields in which many practitioners come into contact. May need assistance to identify all elements or dimensions of these rules.

(2) Ability to distinguish relevant from irrelevant information when assessing a particular situation in light of a given legal rule, and identify what additional information would be helpful in making the assessment.

(3) Ability to explain the application of a legal rule or rules to a particular set of facts. An applicant may be minimally competent even if s/he may over or under-explain these applications, or miss some dimensions of the relationship between fact and law.

(4) Formulate and communicate basic legal conclusions and recommendations in light of the law and available facts.

Additionally, the facilitator guided the panel through a process where panelists further discussed the MCC by answering the following questions:

What knowledge, skills, and abilities are representative of the work of the MCC?

What knowledge, skills, and abilities would be easier for the MCC?

What knowledge, skills, and abilities would be more difficult for the MCC?


The results of this discussion and the illustrative characteristics of MCC performance for each of the subject areas that were included in this study are included as an embedded document in Appendix C.

Training/Practice with the Method

Panelists also engaged in specific training regarding the AJM. This involved a discussion about the initial task of broadly classifying exemplars into one of three categories – not competent, competent, or highly competent – and using the performance level descriptor (PLD) of the MCC to guide those judgments. In addition, prior to the operational ratings, panelists were given an opportunity to practice with the methodology. The practice activity replicated the operational judgments with two exceptions: a) panelists were only given 10 exemplars distributed across the score scale to review and b) panelists only identified one exemplar that represented the best not competent and the worst competent. Panelists then discussed their selections and the reasoning for why their judgments reflected the upper and lower bound of the expected performance of the MCC.

Operational Standard Setting Judgments

After completing the training activities, panelists began their ratings by independently classifying the 30 exemplars that were selected for the first question. The 30 exemplars for each question were selected to approximate a uniform distribution (i.e., about the same number of exemplars across the range of observed scores). Figure 1 below shows the distribution of scores for the written section of the examination along with the distribution of exemplars that were selected for this study.

Figure 1. Distribution of observed scores and selected exemplars for the written section of the California Bar Examination from July 2016.

Written Exam Score Distributions - Actual and Sample Selected for Workshop

Score    Actual Freq.   Actual %   Selected Freq.   Selected %
40             29          0.1            0             0.0
45            436          0.8           19            10.0
50          6,669         12.6           25            13.2
55         14,354         27.1           25            13.2
60         14,678         27.7           26            13.7
65          9,383         17.7           26            13.7
70          4,365          8.2           25            13.2
75          2,206          4.2           25            13.2
80            689          1.3           16             8.4
85            178          0.3            3             1.6
90             33          0.1            0             0.0
95              3          0.0            0             0.0
Total      53,023        100.0          190           100.0

[The accompanying bar chart plots these two distributions as % of total by score point (series: Actual, Selected).]
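The report does not show how the exemplars were drawn for each question; the following is a minimal, hypothetical sketch of sampling answers so that each observed score point is roughly equally represented:

```python
import random
from collections import defaultdict

def select_exemplars(papers, per_score_point=3, seed=0):
    """papers: list of (paper_id, score). Return up to `per_score_point` papers per score point."""
    rng = random.Random(seed)
    by_score = defaultdict(list)
    for paper_id, score in papers:
        by_score[score].append(paper_id)
    selected = []
    for score in sorted(by_score):
        pool = by_score[score]
        chosen = rng.sample(pool, min(per_score_point, len(pool)))
        selected.extend((pid, score) for pid in chosen)
    return selected

# Hypothetical pool of graded answers for one essay question (scores in 5-point steps, 45-90).
pool_scores = random.Random(1).choices(range(45, 95, 5), k=500)
pool = list(enumerate(pool_scores))
print(len(select_exemplars(pool)))  # up to 3 exemplars at each of 10 score points, i.e. about 30
```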


For the study, these exemplars were then randomly ordered and identified only with a code that corresponded to, but did not reveal, the score the exemplar received during the grading process in 2016. Panelists were not told the scores on the exemplars to maintain their focus on the content rather than an intuitive perception of a given score. After panelists made their initial, broad classification, they identified the two best not competent exemplars and the two worst competent exemplars from their initial classifications. The selection of these specific exemplars was used to estimate the types of performance that would be demonstrated by an MCC. Panelists used a predeveloped rating form to indicate the codes on the exemplars that aligned with these instructions.

To convert the panelists' ratings into numerical values and then calculate the recommendations, the first step was to use a lookup table to determine the underlying score associated with a given exemplar code. This was done for each question and each panelist. The conversion of the exemplar codes into the scores that each exemplar received permitted the summation of the values and the calculation of averages (i.e., mean, median) across panelists.

After panelists completed their ratings on the first question, the facilitator led a discussion of the rationale for why they selected the exemplars that they did. This discussion occurred as a full group and was intended to reinforce the methodology and the need to use the definition of minimum competency to inform the judgments about exemplar classification. Following this discussion, the judgment process was replicated for each of the subsequent essay questions and the performance task, with the exception that a group discussion did not occur after each question. For logistical purposes, the remaining four essay questions were evaluated by half the group as a split panel. Following their ratings on the essay questions, the full panel then replicated the judgment process for the performance task. After completing key phases in the process (e.g., orientation/training, operational rating), panelists completed a written evaluation form of the process.

Analysis and Results

Following the design of the process, each panelist reviewed three essay questions (one as a full group and then two as part of their subgroup) and the performance task. For each, panelists were asked to select four borderline papers that represented the best not competent responses (two) and the worst competent responses (two). After the study, the scores for each of the selected borderline papers were identified and used to determine the level of performance expected for candidates at this level.

To calculate the recommended passing score on the examination from the panelists’ judgments, the individual recommendations for each panelist were summed across the questions with each essay question being equally weighted and the performance task counting for twice as much as an individual essay question to model the operational scoring that will occur beginning with the July 2017 administration. Because some essay questions were evaluated by half the group per the design, mean and median replacement were used to estimate the individual recommendations. Mean and median replacement are missing data techniques that are used to approximate the missing values when panelists do not make direct judgments.

The strategy first calculates the mean or median for the available data and then replaces the missing values with the calculated values. This approach retained the recommended values across questions for the panelists while permitting calculations of the standard error of the mean and standard error of the median. The standard error is an estimate of the variability of the panelists' recommendations adjusted for the sample size of the group. These values provide additional information for interpreting the results of the panelists' recommendations.
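As a concrete sketch of the imputation and standard-error calculations described above (hypothetical per-question values and a deliberately simplified layout, not the study's data):

```python
import math
from statistics import mean, median, stdev

# Hypothetical sketch: rows are panelists, columns are per-question cut values (the last
# column is the performance task already counted twice). None marks questions a panelist
# did not rate; those cells are filled with the group mean (or median) for that question.

def impute(column, center=mean):
    observed = [v for v in column if v is not None]
    fill = center(observed)
    return [fill if v is None else v for v in column]

def standard_errors(recommendations):
    se_mean = stdev(recommendations) / math.sqrt(len(recommendations))
    se_median = se_mean * math.sqrt(math.pi / 2)  # approximation used in the report
    return se_mean, se_median

ratings = [
    [62.5, None, 60.0, 65.0, None, 125.0],
    [60.0, 57.5, None, None, 62.5, 120.0],
    [65.0, 60.0, 62.5, 60.0, None, 130.0],
]
columns = [impute(list(col)) for col in zip(*ratings)]
recommendations = [sum(panelist) for panelist in zip(*columns)]
print(mean(recommendations), median(recommendations), standard_errors(recommendations))
```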

Following these judgments, we calculated the recommended score and associated passing rate when considering the written part of the examination. However, we needed to know what score on the total exam corresponded to this same pass rate. To answer this question, another step was needed to transform these judgments to the score scale on the full-length examination. After creating distributions of individual recommendations for the written part of the examination, to estimate the score for the full-length examination we applied an equipercentile linking approach to find the score that yielded the same percent passing as was determined on just the written component of the examination that panelists evaluated.

This methodology is characterized as equipercentile because the goal is to find the equivalent percentile rank within one distribution of scores and transform it to another score distribution, retaining the same impact from one examination to another or, in this instance, from the part of the examination on which panelists made judgments to the full examination. This linking was applied using the weighting under which 50% of the total score is contributed by each component (written and MBE).

There are two important assumptions when applying equipercentile linking. First, we assume that the same or a randomly equivalent group of candidates are used to create the two score distributions. Second, we assume that the examinations are sufficiently correlated to support the interpretation. In this application, the same candidate scores were used from the written part to the full-length examination. In addition, the correlation between the written scores and the total score (of which the written scores are a part) was 0.97 suggesting a strong relationship between the distributions to support applying an equipercentile linking approach.
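A minimal sketch of this equipercentile step follows, using small hypothetical paired score arrays rather than the actual July 2016 distributions the study used:

```python
# Sketch of the equipercentile step described above: find the proportion of applicants
# who would pass at the recommended written-score cut, then pick the combined-score
# value at which the same proportion passes. The score arrays below are hypothetical.

def pass_rate(scores, cut):
    """Proportion of applicants at or above the cut."""
    return sum(s >= cut for s in scores) / len(scores)

def equipercentile_cut(target_rate, other_scores):
    """Smallest score on the other scale whose pass rate does not exceed the target."""
    for candidate in sorted(set(other_scores)):
        if pass_rate(other_scores, candidate) <= target_rate:
            return candidate
    return max(other_scores)

# Hypothetical paired distributions (same applicants on both scales).
written_scores = [380, 395, 410, 420, 425, 430, 445, 460, 475, 500]
combined_scores = [1290, 1340, 1395, 1425, 1440, 1455, 1500, 1545, 1590, 1665]

rate = pass_rate(written_scores, cut=425)             # 0.6 in this toy example
print(rate, equipercentile_cut(rate, combined_scores))  # same pass rate maps to 1440 here
```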

The summary results are presented in Table 2, which includes the panel's recommended mean and median with the associated standard errors, the corresponding combined scores and impact (i.e., pass rates), and the values one and two standard errors of the mean or median above and below the recommendation. Individual ratings for each essay question, the performance task, and the summary calculations are included in Appendix C and have been de-identified to preserve the anonymity of individual panelists.

Table 2. Summary results with range of recommendations on written and combined score scales with impact (i.e., pass rate).

                              Written Score -   Combined Score -     Written Score -   Combined Score -
                              Mean              Mean (pass rate)     Median            Median (pass rate)
-2 SE(Mean/Median)            419               1414 (53%)           414               1388 (60%)
-1 SE(Mean/Median)            424               1436 (47%)           419               1414 (53%)
Recommended score (SE)        428 (4.47)        1451 (43%)           425 (5.60)        1439 (45%)
+1 SE(Mean/Median)            432               1480 (36%)           431               1477 (37%)
+2 SE(Mean/Median)            437               1504 (31%)           436               1504 (31%)


Panelists' Recommendations

Interpreting the results of the panelists' recommendations involves a combination of sources of evidence and related factors. The results shown in this section represent one of those sources, specifically, the ratings provided by subject matter experts on exemplars of performance from the California Bar Examination. Additional discussion of empirical and related policy considerations is provided in the Evaluating the Cut Score Recommendations section below.

The goal in analyzing the results of the panelists' judgments was to best represent the recommendation from the group. There are different ways this could have been done, each involving a measure of central tendency (e.g., mean, median). The mean is the arithmetic average that most people are familiar with; however, it may not be the best representation of the group's recommendation when the distribution is skewed. For smaller samples or when extreme scores are observed in a distribution, the mean may be higher or lower than the group would have otherwise intended. In these instances, the median is calculated at the point where half the recommendations are above the value and half are below the value, balancing the effects of an extreme or outlier recommendation. When the mean and median do not converge, it is generally recommended that the median be used as the better representation of the central tendency of the observed score distribution. This approach is analogous to the data that are often shared with respect to housing prices in cities, where a median is used to offset the effects of outliers on the upper and lower ends of the distribution.

Although the values calculated for the panelists were close, the mean and median recommendations did not converge. Therefore, the median likely serves as a better indicator of central tendency of the recommendation of the panelists. The median recommended cut score for the written portion of the exam based on all panelists’ judgments was 423.75 and was rounded to the nearest observable score of 425 on a theoretical scale that ranges from 0 to 700 (i.e., 100 points for each essay question, 200 points for the performance task). To then determine how this recommendation would be interpreted with respect to a pass/fail decision, we evaluated the impact on a cumulative percent distribution using only the written component performance by applicants who took the July 2016 California Bar Examination.

To evaluate the impact of this recommendation, we found the location in the cumulative percent distribution of the written scores that corresponded with this value (i.e., 425). This value resulted in an overall impact of 46% pass and 54% fail based on the applicants who took the July 2016 California Bar Examination. To determine the score on the full examination that corresponded to this impact, we then used an equipercentile linking approach to find the value on the combined score scale that corresponded to the same impact (i.e., 46% pass and 54% fail); the corresponding value in the distribution yielded a score of 1439. The same process was followed in evaluating the mean score that was calculated for the group.

When collecting data from a sample, it is important to acknowledge that the results are an estimate. For example, when public opinion polls are conducted to gather perceptions about a given topic (e.g., upcoming elections, customer satisfaction), the results are reported in conjunction with methodology, sample size, and margin of error to illustrate that there is a level of uncertainty in the estimate. In selecting a representative sample of panelists for this study, we similarly collected data that resulted in a distribution of judgments from which we could calculate an estimate of the recommendation of the group.


Because the mean and median were calculated from a distribution of scores, it is also appropriate to estimate the variability in those recommendations to produce a range within which policymakers may consider the panel's recommendation. This range was calculated using the standard error of the mean and median. The standard error is an estimate of the standard deviation (i.e., variability) of the sampling distribution. To calculate the standard error of the median (SEmedian), the standard error of the mean is first calculated and then multiplied by the square root of pi divided by two (i.e., sqrt(pi/2), approximately 1.25), which produces a slightly wider range than the standard error of the mean. Though technical in nature, the standard error of the median can also be interpreted conceptually as the margin of error in the judgments provided by the panel.

Given a median recommendation of 425 on the written section with a SEmedian of 5.60, the range of recommended passing scores on the written score scale would be 414 to 436 which translates to a range of 1388 to 1504 on the combined score scale. This range would correspond to the interpretative scale of 139 to 150. If the mean recommendation range was used, it would correspond to a 1414 to 1504 which on the interpretive scale would be 141 to 150.
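The arithmetic quoted above can be checked directly; the sketch below simply reproduces the reported values from Table 2:

```python
import math

# Worked check of the figures above, using the values reported in Table 2.
se_mean = 4.47
se_median = se_mean * math.sqrt(math.pi / 2)
print(round(se_median, 2))  # 5.6, matching the reported standard error of the median

median_written = 425
low, high = median_written - 2 * se_median, median_written + 2 * se_median
print(round(low), round(high))  # 414 436, the written-scale range quoted above
```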


Process Evaluation Results

Panelists completed a series of evaluations during the study that included both multiple-choice questions and open-ended prompts. The responses to the questions are included in Table 3 and the comments provided are included in Appendix D. With the exception of Question 2, which was rated on a 3-point scale (1 = not enough, 2 = about right, 3 = too much), ratings closer to 4.0 can be interpreted as more positive perceptions (e.g., success of training, confidence in ratings, appropriate time), whereas values closer to 1.0 suggest more negative perceptions with respect to these questions.

Table 3. Written Process Evaluation Summary Results

                                                     Median   1 (Lower)   2    3   4 (Higher)
1. Success of Training
   Orientation to the workshop                         4         0        0    9      11
   Overview of the exam                                3         0        0   12       8
   Discussion of the PLD                               4         0        1    5      14
   Training on the methodology                         3.5       0        2    8      10
2. Time allocation to Training                         2         4       16    0      N/A
3. Confidence moving from Practice to Operational      3         1        1   15       3
4. Time allocated to Practice                          3         1        6   10       3
6. Confidence in Day 1 recommendations                 3         1        2   11       6
7. Time allocated to Day 1 recommendations             2         5        6    9       0
9. Confidence in Day 2 recommendations                 3         0        1   11       6
10. Time allocated to Day 2 recommendations            3         1        3    8       6
12. Confidence in Day 3 recommendations                4         0        0    5      15
13. Time allocated to Day 3 recommendations            3         2        1    8       9
14. Overall success of the workshop                    3         0        1   12       7
15. Overall organization of the workshop               4         0        0    7      13

Collectively, the results of the panelists' evaluation suggested generally positive perceptions of the activities for the workshop, their ratings, and the outcomes. The ratings regarding the time allocation were generally lower, which can be attributed to the intensity of the task and the amount of work. Future studies may benefit from an additional day or two to permit a more reasonable workload for the panelists.


Evaluating the Cut Score Recommendations

To evaluate the passing score recommendations that were generated from this study, we applied Kane's (1994; 2001) framework for validating standard setting activities. Within this framework, Kane suggested three sources of evidence that should be considered in the validation process: procedural, internal, and external. Threats to validity that were observed in these areas should inform policymakers' judgments regarding the usefulness of the panelists' recommendations and the validity of the interpretation. Evidence within each of these areas that was observed in this study is discussed here.

Procedural

When evaluating procedural evidence, practitioners generally look to panelist selection and qualifications, the choice of methodology, the application of the methodology, and the panelists' perspectives about the implementation of the methodology as some of the primary sources. For this study, the panel that was recruited and selected by the Supreme Court represented a wide range of stakeholders: newer and more experienced attorneys and representatives from legal education who collectively included diverse professional experiences and backgrounds. The choice of methodology was appropriate given the constructed response aspects of the essay questions and performance task. Panelists' perspectives on the process were collected and the evaluation responses were very positive.

Internal

The internal evidence for standard setting is often evaluated by examining the consistency of panelists' ratings and the convergence of the recommendations. The standard error of the median on which the recommendation was based (5.60) was reasonable given the theoretical range of the scale (0-700) for the written component of the examination. This means that most panelists' individual recommendations were within about six raw score points of the median recommended value. Even considering the effective range of the scale (approximately 280-630), the deviation of scores across panelists did not vary widely. Similar variation was also observed for the mean recommendation. These observations suggest that panelists were generally in agreement regarding the expectations of which applicant responses were characteristic of the Minimally Competent Candidate.

External

Although external evidence is difficult to collect, some sources were available for this study that will be useful for policy makers in their consideration of the recommendations of the group. The use of impact data from applicants in California from the July 2016 examination can be used as one source of evidence to inform the reasonableness of the recommended passing score. In addition, the application of the recommendation to scores from other exams (e.g., February 2016, February 2017, July 2017) would also be useful to evaluate the potential range of impact. This would be particularly valuable given the different ability distributions of applicants who take the examination in February versus July. In addition, consideration of first-time test takers versus repeat test takers is another potential factor because applicants who are repeating the exam do not represent the full range of abilities.

A limitation of the study was the inability to include items from the MBE as part of the judgmental process. Although this would have been a desired part of the standard setting design, the MBE was not made available to California for inclusion in the study. Using half of the examination for the study, we can still make a reasonable approximation of a recommendation for the full examination (see, for example, Buckendahl, Ferdous, & Gerrow, 2010). The correlation between the written and MBE scores is approximately 0.72, suggesting a moderate to strong relationship, with some unique variance contributed by each component of the examination.
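As an illustration of how that relationship can be summarized, the sketch below computes the Pearson correlation and shared variance for paired component scores. The data are simulated so that the correlation lands near the reported 0.72; the score scales, sample size, and generating model are placeholders, not the examination’s actual data.

```python
import random
import statistics

# Simulate paired written and MBE scores driven by a common ability factor so
# that the correlation is approximately 0.72 by construction (0.85^2 / 1.0 ~ 0.72).
random.seed(2)
written, mbe = [], []
for _ in range(2000):
    ability = random.gauss(0, 1)
    written.append(480 + 60 * (0.85 * ability + 0.53 * random.gauss(0, 1)))
    mbe.append(1440 + 150 * (0.85 * ability + 0.53 * random.gauss(0, 1)))

# Pearson correlation (statistics.correlation requires Python 3.10+).
r = statistics.correlation(written, mbe)
# r^2 is the shared variance; the remainder is unique to each component.
print(f"r = {r:.2f}, shared variance r^2 = {r * r:.2f}")
```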

In addition, passing scores on bar examinations from other states can also be used to inform the final policy. However, data from other states should be used with caution for several reasons. First, it is unclear whether other states have conducted formal standard setting studies, so evaluating comparability based solely on the passing standard may not reflect California’s definition of minimum competency. Second, California has different eligibility criteria than other states, which affects the ability distribution of the population of applicants; specifically, California has a more inclusive eligibility policy than most jurisdictions with respect to legal education requirements. Third, each jurisdiction may define minimum competency differently as applied to its examination. These factors can contribute to different policy decisions.

To illustrate how California’s overall passing rates compare with those of other large-population jurisdictions, Table 4 is shown here for comparison. The overall test taker passing rates from 2007 to 2016 illustrate both the current rate and the trend in performance over time.

Table 4. Overall passing rates in selected states and nationally from 2007-2016.9

Jurisdiction        2007   2008   2009   2010   2011   2012   2013   2014   2015   2016
California           49%    54%    49%    49%    51%    51%    51%    47%    44%    40%
Florida              66%    71%    68%    69%    72%    71%    70%    65%    59%    54%
Illinois             82%    85%    84%    84%    83%    81%    82%    79%    74%    69%
New York             64%    69%    65%    65%    64%    61%    64%    60%    56%    57%
Texas                76%    78%    78%    76%    80%    75%    80%    70%    65%    66%
National Average     67%    71%    68%    68%    69%    67%    68%    64%    59%    58%

Note that across jurisdictions and for the nation, there has been a consistent downward trend in overall passing rates beginning in 2014. Similar trends were observed for first-time test takers.6 With jurisdictions’ passing scores held constant through policy and statistical equating, changes in the ability of the candidate population (reflecting law school admissions and matriculation, as well as any influences on curriculum and instruction) have likely contributed to this observed pattern. These data reinforce the caution against simply relying on the passing scores currently used in other jurisdictions.

9 Data for Table 4 were obtained from the NCBE 2016 Statistics document (pp. 17-20) and represent the combined pass rate for a given year across the February and July administrations. The report can be accessed at: http://www.ncbex.org/pdfviewer/?file=%2Fdmsdocument%2F205.
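Using only the figures in Table 4, the short sketch below computes the year-by-year gap between California and the national average and the decline since 2013. The numbers are transcribed from the table; the computation is illustrative and not part of the report’s analysis, though it bears on the later observation that the gap has remained roughly constant while rates declined on a similar slope.

```python
# Pass rates transcribed from Table 4 (2007-2016).
years = list(range(2007, 2017))
california = [49, 54, 49, 49, 51, 51, 51, 47, 44, 40]
national   = [67, 71, 68, 68, 69, 67, 68, 64, 59, 58]

for year, ca, nat in zip(years, california, national):
    print(f"{year}: CA {ca}%  National {nat}%  gap {nat - ca} points")

# Cumulative decline from 2013 (the last year before the downturn) to 2016.
print("CA decline 2013-2016:      ", california[years.index(2013)] - california[-1], "points")
print("National decline 2013-2016:", national[years.index(2013)] - national[-1], "points")
```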


Determining a Final Passing Score

The standard setting meeting results and evaluation feedback generally support the validity of the panel’s recommended passing score for use with the California Bar Examination. Results from the study were analyzed to create a range of recommended passing scores. However, additional policy factors may be considered when establishing the passing score. One such factor is the recommended passing score and its impact relative to the historical passing score and its impact. The panel’s median recommended passing score of 1439 (effectively 144 on the interpretive scale) converged with the program’s existing passing score, with the mean recommended passing score being slightly higher.

Factors that could be considered include the passing rates from other states that have similarly large numbers of bar applicants sitting for the examination. However, the interpretation and comparability of these results are limited by the different eligibility policies among these jurisdictions, California’s more inclusive policies, and the downward trend in bar examination performance across the country, particularly over the last few years. In some instances, the gap between California’s applicants and those in other states has closed; in others, the gap observed in 2007 has remained essentially constant as performance declined on a similar slope.

An additional factor that warrants consideration as part of the policy deliberation is the tolerance for different types of classification errors. Because there is measurement error in any test score, when a passing score is applied to make an important decision about an individual, it is important to consider the risk of each type of error. A Type I error represents an individual who passes the examination but whose true abilities are below the cut score; these classification errors are considered false positives. Conversely, a Type II error represents an individual who does not pass the examination but whose true abilities are above the passing score; these classification errors are known as false negatives. Both types of errors are theoretical in nature because we cannot know which test takers in the distribution around the passing score may be false positives or false negatives.
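To make the two error types concrete, here is a minimal simulation sketch that treats each observed score as a true score plus normally distributed measurement error and tallies false positives and false negatives at a given cut. The ability distribution and the standard error of measurement are illustrative assumptions, not values from this study.

```python
import random

random.seed(3)
CUT = 1440          # passing score on the total scale
SEM = 30            # assumed standard error of measurement (illustrative)

false_pos = false_neg = n = 0
for _ in range(100_000):
    true_score = random.gauss(1430, 120)          # assumed ability distribution
    observed = true_score + random.gauss(0, SEM)  # observed = true score + error
    if observed >= CUT and true_score < CUT:
        false_pos += 1   # Type I error: passes, but true ability is below the cut
    elif observed < CUT and true_score >= CUT:
        false_neg += 1   # Type II error: fails, but true ability is above the cut
    n += 1

print(f"False positive (Type I) rate:  {false_pos / n:.1%}")
print(f"False negative (Type II) rate: {false_neg / n:.1%}")
```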

A policy body can articulate its rationale for supporting adoption of the group’s recommendation or for adjusting the recommendation in a way that minimizes one type of misclassification. The policy rationale for licensure examination programs is based primarily on deliberation of the risk of each type of error. For example, many licensure and certification examinations in healthcare fields have a greater policy tolerance for Type II errors than for Type I errors, with the rationale that the public is at greater risk of adverse consequences from an unqualified candidate who passes (a Type I error) than from a qualified one who fails (a Type II error).

In applying this rationale, if the policy decision is that there is greater tolerance for Type I errors, the choice would be to accept the panel’s recommendation (i.e., 144) or to adopt a value one to two standard errors below it (i.e., 139 to 141). Conversely, if the policy decision is that there is greater tolerance for Type II errors, the choice would be to accept the recommendation (i.e., 144) or to adopt a value one to two standard errors above it (i.e., 148 to 150). Because standard setting is an integration of policy and psychometrics, the final determination will be policy driven, but it is supported by the data collected within this workshop and by the study more broadly.
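The adjustment arithmetic can be expressed as a small helper, sketched below. The standard error passed in is a placeholder; the report’s specific ranges (139 to 141 and 148 to 150) reflect the study’s own standard error rather than this illustrative value.

```python
def policy_adjusted_cuts(recommendation: float, se: float) -> dict:
    """Candidate cut scores one and two standard errors on either side of the
    panel's recommendation, reflecting greater tolerance for one error type."""
    return {
        "tolerate Type I (lower cut)":  [recommendation - 2 * se, recommendation - se],
        "adopt recommendation":         [recommendation],
        "tolerate Type II (raise cut)": [recommendation + se, recommendation + 2 * se],
    }

# Illustrative call: 144 on the interpretive scale with a placeholder SE of 2.5.
for label, values in policy_adjusted_cuts(144, 2.5).items():
    print(label, [round(v, 1) for v in values])
```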


References

Buckendahl, C. W., Ferdous, A., & Gerrow, J. (2010). Recommending cut scores with a subset of items: An empirical illustration. Practical Assessment, Research & Evaluation, 15(6). Available online: http://pareonline.net/getvn.asp?v=15&n=6.

Buckendahl, C. W., & Davis-Becker, S. (2012). Setting passing standards for credentialing programs. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 485-502). New York, NY: Routledge.

Egan, K. L., Schneider, M. C., & Ferrara, S. (2012). Performance level descriptors: History, practice, and a proposed framework. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 79-106). New York, NY: Routledge.

Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 433-470). Westport, CT: American Council on Education and Praeger.

Kane, M. (1994). Validating the performance standards associated with passing scores. Review of Educational Research, 64(3), 425-461.

Kane, M. T. (2001). So much remains the same: Conception and status of validation in setting standards. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 53-88). Mahwah, NJ: Erlbaum.

Plake, B. S., & Hambleton, R. K. (2001). The analytic judgment method for setting standards on complex, large-scale assessments. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 283-312). Mahwah, NJ: Erlbaum.


Appendix A – Panelist Information

Standard setting panelists.xlsx


Appendix B – Standard Setting Materials

The nomination form for panelists and documentation used in the standard setting are included below:

• State Bar Standard Setting Study Nomin
• Agenda
• Training
• Evaluations


Appendix C – Standard Setting Data

• PLD Discussion for Minimally Competen
• California Bar Standard Setting Da


Appendix D – Evaluation Comments

Each panelist completed an evaluation of the standard setting process that included several open-ended response questions. The responses provided to each are included below.

Day 1 – Training

• Lots of reading
• More time could easily be spent on the practice rating, but I doubt that it would make a difference in the outcome.
• Dr. Buckendahl trained us very effectively. He is engaging, clear, and attentive. I have confidence in him and the process. Good work!
• Perhaps it was the result of the lively discussions we were having, but a little more time for practice would have been ideal as I felt I was a bit rushed.
• More background information before initiating the process would be helpful
• Perhaps additional time spent as a group discussing not the themes/genres of knowledge for each subject, but on what it means to read an essay and decide whether a discussion of the theme is sufficient to communicate minimal competency.
• Not convinced this methodology is valid. Many of us clearly do not know some applicable law and these conclusions may therefore determine that incompetent answers amounting to malpractice are nevertheless passing/competent.
• Great and important discussion about minimal competencies on each exam answer discussed.
• It would have been helpful at the top to have a broader discussion about why the study is being done, what the Bar is hoping to learn, and how the individuals (participants) were selected.
• Would be helpful if watchers could be talking outside [the] room instead of in during review of essays.
• [Related to confidence rating] - only because some of my ratings were different from the majority. Otherwise, very confident.
• [Related to time rating] - Had to rush in order to have time for lunch.
• I think a broader discussion at the outset before the practice/identification of key issues would have been helpful. We all seemed to struggle with our own lack of knowledge and addressing that more up front may have helped us move along more efficiently.

Day 1 – Standard Setting

• I would have liked to know ahead of time that I would be "grading" 40 essays when I came in.
• I did not finish and felt rushed. More time for first question.
• Snacks for end of day grading would help :) I feel like I'm in a groove now and understand the concept of what I'm doing, but 30 tests to read is a lot at the end of a long day. Grateful we can finish in the a.m.!
• More time please
• I'm still not completely certain that I understand how we are qualified to do this without answers. It seems like this could have the overall effect of making it easier to pass?


• Although a lot of folks complained that we didn't "know enough" of subject matter, after reading 30 tests, yes we are - it became easier to spot the competent from the not competent. Perhaps this could be talked about at the outset to avoid this needless discussion altogether.

• I am concerned that an unprepared attorney, without the benefit of experience, studying, or a rubric, is not a good indicator of a minimally competent attorney. We all have an ethical duty to become competent. New lawyers/3 Ls do that by preparing for the exam. A more seasoned lawyer does that by refreshing recall of old material or by resort[ing] to practice guides. Having neither the benefit of studying nor outside sources, at least some of us may be grading with lack of minimum adequate knowledge. By studying for the exam, test-takers are becoming competent and gaining that minimal competency. Practicing professionals who become specialized may lose/atrophy that competence in certain field, which needs to be refreshed by CLG and other sources. So these scores may be of limited utility.

• It's too much. Too many questions to review.
• No changes
• Got 24/30 done [on the first day]

Day 2 – Standard Setting

• It was very difficult to read 60 essays in one day
• The discussion about where certain papers fall on the spectrum is helpful to let us know we are on the right track.
• We need breaks to stretch our bodies and we need to go outside, so our brains can get fresh air.
• It might be helpful to have some kind of "correct" sample answer to avoid having to go back and re-score or re-read for lack of knowing "the correct answer."
• I do NOT like being tricked into grading/reading 130 frigging essays! We should have been told that this is what the project was.
• Snacks were a great addition to the day.
• Thanks for the afternoon snacks!
• We did not follow the agenda which indicated we should build an "outline" for the "question." Instead, on Day 1, we outlined subject areas. There will not be consistency among the group. This was clear this AM when there was no agreement regarding Question 1. Each of the 30 essays was marked as the best no-pass or worst pass by at least one person. We should have outlined as a group.

• After initial "calibration" session on Day 1; and with more time, I feel confident about my ability to apply the PLDs to these essays.

• No changes

Day 3 – Standard Setting and Overall Evaluation

• This no doubt took a lot of work, so thank you to all staff and State Bar folks!
• The early activities and group discussion were helpful in allowing me to orient and direct what I ought to be doing for my recommendations. Perhaps a few more panelists to ease the burden would be helpful for the future!

• No changes


• I really found the time available to review the subject-matter answers to be very challenging. Trying to discriminate among those last four papers and a few on either side of them was difficult. An idea: have readers make their 3 initial stacks and identify not more than x (10?) papers that fall closer to the borderline. Do that for all answers. Then have readers spend last session choosing the "two and two" all at once.

• I'm not entirely sure I understand how what feels like an arbitrary process by 20 graders/panelists results in a less arbitrary cut score. Perhaps some additional information or process would be helpful.

• Although providing a scoring rubric would make categorization more consistent, it would do so in view of the thoughts of the author and not of the 20 panelists. Having no rubric was tough, but appropriate.

• Breaks between assignments
• Work with Dr. Buckendahl again. He was very careful, clear, and engaging. Well done!
• The performance test, unlike subject matter knowledge tests (essays) is much more amenable to this sort of standard setting. While, as with essays, we did not outline/rubric/calibrate, that is less necessary because of closed universe and the skills being tested.

• Overall, I think this process made sense. I was troubled that at least one of the panelists had clear familiarity with the existing exam and process and a clear knowledge of "right" answers as currently graded. I'm not sure everyone had a clear understanding of "minimally competent attorney" so we may have had different standards in mind.

• I'd like to be included in next steps or discussions. Other than just more grading/reading essays.
• I had a hard time with the time limit to review each answer. I am not clear if I was being too thorough, or I missed the lesson on how to move through answers at a quicker pace.


Appendix C

Appendix D

Appendix E

Appendix F

Due to the volume of public comments received, these have been posted online in three separate files:

Public Comments Received via E-Mail: http://apps.calbar.ca.gov/cbe/docs/agendaItem/Public/agendaitem1000002007.pdf

Public Comments Received via Online Comment Box: http://apps.calbar.ca.gov/cbe/docs/agendaItem/Public/agendaitem1000002008.pdf

Public Comments Received from Other Sources: http://apps.calbar.ca.gov/cbe/docs/agendaItem/Public/agendaitem1000002009.pdf


Appendix G

Simulated Cut Scores for the July 2008 GBX and July 2016 GBX

                          Simulated Cut Scores, July 2008 GBX     Simulated Cut Scores, July 2016 GBX
                          1330    1350    1390    1414    1440    1330    1350    1390    1414    1440
Total      # Passing     7,242   6,920   6,017   5,642   5,329   5,451   5,053   4,010   3,598   3,332
           % Passing     84.1%   80.4%   69.9%   65.5%   61.9%   70.9%   65.7%   52.1%   46.8%   43.3%
           % Increase*   35.9%   29.9%   12.9%    5.9%     --    63.6%   51.7%   20.3%    8.0%     --

Gender
Male       # Passing     3,795   3,617   3,121   2,911   2,756   2,679   2,484   1,970   1,760   1,635
           % Passing     83.8%   79.9%   68.9%   64.3%   60.9%   72.2%   66.9%   53.1%   47.4%   44.0%
           % Increase*   37.7%   31.2%   13.2%    5.6%     --    63.9%   51.9%   20.5%    7.6%     --
Female     # Passing     3,441   3,297   2,890   2,726   2,568   2,722   2,525   2,005   1,805   1,665
           % Passing     84.5%   80.9%   71.0%   66.9%   63.0%   69.5%   64.5%   51.2%   46.1%   42.5%
           % Increase*   34.0%   28.4%   12.5%    6.2%     --    63.5%   51.7%   20.4%    8.4%     --

Race/Ethnicity
Asian      # Passing     1,520   1,435   1,205   1,113   1,046   1,161   1,066     835     735     676
           % Passing     81.8%   77.2%   64.8%   59.9%   56.3%   64.0%   58.8%   46.1%   40.5%   37.3%
           % Increase*   45.3%   37.2%   15.2%    6.4%     --    71.7%   57.7%   23.5%    8.7%     --
Black      # Passing       314     287     215     181     164     252     222     146     117     104
           % Passing     66.1%   60.4%   45.3%   38.1%   34.5%   49.8%   43.9%   28.9%   23.1%   20.6%
           % Increase*   91.5%   75.0%   31.1%   10.4%     --   142.3%  113.5%   40.4%   12.5%     --
Hispanic   # Passing       621     591     471     432     397     734     663     478     419     379
           % Passing     76.5%   72.8%   58.0%   53.2%   48.9%   65.7%   59.3%   42.8%   37.5%   33.9%
           % Increase*   56.4%   48.9%   18.6%    8.8%     --    93.7%   74.9%   26.1%   10.6%     --
White      # Passing     4,368   4,200   3,765   3,570   3,392   3,063   2,874   2,369   2,165   2,019
           % Passing     87.6%   84.3%   75.5%   71.6%   68.0%   77.7%   72.9%   60.1%   54.9%   51.2%
           % Increase*   28.8%   23.8%   11.0%    5.2%     --    51.7%   42.3%   17.3%    7.2%     --
Other      # Passing        98      91      71      67      60     100      93      66      56      52
           % Passing     79.0%   73.4%   57.3%   54.0%   48.4%   67.6%   62.8%   44.6%   37.8%   35.1%
           % Increase*   63.3%   51.7%   18.3%   11.7%     --    92.3%   78.8%   26.9%    7.7%     --


Simulated Cut Scores for the July 2008 GBX and July 2016 GBX (continued)

                              Simulated Cut Scores, July 2008 GBX     Simulated Cut Scores, July 2016 GBX
                              1330    1350    1390    1414    1440    1330    1350    1390    1414    1440
First Time or Repeat
First Time     # Passing     5,657   5,521   5,078   4,870   4,682   4,089   3,881   3,317   3,066   2,896
               % Passing     90.6%   88.4%   81.4%   78.0%   75.0%   79.5%   75.4%   64.5%   59.6%   56.3%
               % Increase*   20.8%   17.9%    8.5%    4.0%     --    41.2%   34.0%   14.5%    5.9%     --
Repeat         # Passing     1,585   1,399     939     772     647   1,362   1,172     693     532     436
               % Passing     67.0%   59.1%   39.7%   32.6%   27.3%   53.5%   46.0%   27.2%   20.9%   17.1%
               % Increase*  145.0%  116.2%   45.1%   19.3%     --   212.4%  168.8%   58.9%   22.0%     --

School Type
ABA            # Passing     4,240   4,119   3,767   3,571   3,415   3,397   3,196   2,629   2,387   2,231
               % Passing     92.6%   90.0%   82.3%   78.0%   74.6%   82.5%   77.6%   63.8%   57.9%   54.2%
               % Increase*   24.2%   20.6%   10.3%    4.6%     --    52.3%   43.3%   17.8%    7.0%     --
CA Accredited  # Passing       458     408     265     225     196     356     294     169     131     100
               % Passing     61.5%   54.8%   35.6%   30.2%   26.3%   46.2%   38.1%   21.9%   17.0%   13.0%
               % Increase*  133.7%  108.2%   35.2%   14.8%     --   256.0%  194.0%   69.0%   31.0%     --
Registered     # Passing       194     165     107      88      76     111      95      44      38      35
               % Passing     60.8%   51.7%   33.5%   27.6%   23.8%   41.0%   35.1%   16.2%   14.0%   12.9%
               % Increase*  155.3%  117.1%   40.8%   15.8%     --   217.1%  171.4%   25.7%    8.6%     --
Out of State   # Passing     1,629   1,556   1,369   1,307   1,242   1,033     975     801     730     685
               % Passing     87.1%   83.2%   73.2%   69.9%   66.4%   72.9%   68.8%   56.5%   51.5%   48.3%
               % Increase*   31.2%   25.3%   10.2%    5.2%     --    50.8%   42.3%   16.9%    6.6%     --

* Percent increase in the number of applicants who would have passed under each simulated cut score, relative to the number of passing applicants under the current cut score of 1440.
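The percent-increase figures marked with an asterisk can be reproduced directly from the "# Passing" rows. The short sketch below does so for the July 2016 totals shown above; it uses only values transcribed from the table.

```python
# Number passing under each simulated cut score for the July 2016 GBX (Total row).
cuts = [1330, 1350, 1390, 1414, 1440]
passing_2016_total = [5451, 5053, 4010, 3598, 3332]

baseline = passing_2016_total[cuts.index(1440)]   # current cut score of 1440
for cut, n in zip(cuts, passing_2016_total):
    if cut == 1440:
        continue
    increase = (n - baseline) / baseline
    print(f"cut {cut}: {n:,} passing, {increase:.1%} more than under 1440")
```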
