data-assistive course articulation using machine translation · the 1960 california “master...

Computational Approaches toHuman Learning (CAHL) research lab

Data-Assistive Course Articulation

Using Machine Translation

Zachary A. PardosAssistant Professor

Graduate School of EducationSchool of Information

UC Berkeley

zachpardos.com

zpardos

Context for this research

• 2015: $600,000 in NSF grants awarded• 2016: First prototype of AskOski course recommender released• 2017: Small scale user study of satisfaction and desired functionality• 2018: Articulation research begins• 2019: User study on serendipity of suggestions• 2019: User study on relevance of search “inferred keywords”• 2019: Explore, Requirements, Search features live (askoski.berkeley.edu)

The 1960 California “Master Plan” for Higher Ed

Among high-school graduates’ GPA/Test scores:● The top one-eighth (12.5%) would be guaranteed a place at one of the nine campus of the

University of California (e.g., UC Berkeley, UCLA)

● The top one-third (33.3%) would be able to enter one of the twenty three campuses of the

California State University system (e.g., SFSU, SJSU).

● All graduating, or otherwise “capable,” students could enter one of the 115 community colleges

Graduates of the 2-year community colleges would then be guaranteed transfer to either

the Cal State or UC systems in order to complete a 4-year bachelor's degree.

“California’s Upward Mobility Machine”- New York Times (2015)

3

2-year programContinuing 4-year program

Alignment of programs is difficult

Introduction

20 million students enter higher education in the United States each year, 45% of these students begin at 2-year public

institutions

(Hossler et al., 2012)

81.4% of surveyed (N=19,000) beginning 2-year students expressed an intent to transfer to a 4-year program

(Radford et al., 2010)

Six years after starting at the 2-year, only 13% of students had obtained a 4-year degree (N=852,439)

(Shapiro et al., 2017)

US Higher Ed Background

Introduction

20 million students enter higher education in the United States each year, 45% of these students begin at 2-year public

institutions

(Hossler et al., 2012)

81.4% of surveyed (N=19,000) beginning 2-year students expressed an intent to transfer to a 4-year program

(Radford et al., 2010)

Six years after starting at the 2-year, only 13% of students had obtained a 4-year degree (N=852,439)

(Shapiro et al., 2017)

US Higher Ed Background

Insufficient/Imprecise articulations between institutions are partly to blame

100s of millions of dollars are being spent to help students navigate existing transfer paths

Course-to-Course Articulation

?

Taking courses at Institution A that satisfy degree-credit at Institution B is often required to qualify for transfer to Institution B

Courses at Institution A

Sophomore courseat Institution B

Which course (if any) at Institution A is academically equivalent to a course at Institution B?

(California Articulation Policies and Procedures Handbook)

Current Practice

• Based on demand• Favors regional schools• Favors public schools

• 94% loss of credit transferring to a private (GAO, 2017)

• 42% is lost when transferring to a public

• Slow process (~1yr)• Open to instructor bias

Introduction

The California post-secondary system alone has:• 115 2-year California Community Colleges (CC)• 23 California State Universities (CSU)• 9 University of California campuses (UC)

The number of articulations to consider between 1 CC and 1 UC:

63M pairs (1,000*7,000)

- Without department-to-department mapping between institutions

- With department-to-department mapping between institutions

35,000 (50*20*35)

The Challenge of Articulation

A methodological approach is needed that can scale to this magnitude

36M between all UC and CC

We Propose• A data-assistive machine learning approach, using both course catalog

descriptions and historic enrollment patterns to triage the articulation problem by automatically surfacing candidates for articulation.

Potential Impact• More (valid) articulations would mean greater credit mobility, giving

students more ways to qualify for transfer (fewer wasted credits)• Fewer wasted credits translates to decreased burden on financial aid• Approach could be generalized to other credit acknowledgement

scenarios (e.g., K-12 to college, military to college.).

Datasets

4- year University of California

campus (UC1)

(Fall 2008 - Fall 2016)

2-year California Community

College (CC1)

(Fall 2013 – Fall 2018)

Enrollment records 4.8M 298,174

Number of students 164,196 58,716

Number of courses 7,487 1,000

Number of departments

179 53

Course descriptions 325 words per description 489 descriptions w/less than 10 words

27 words per description

62 descriptions w/less than 10 words

Existing articulation pairs

65 articulation pairs between CC1 and UC1(184 UC1 required degree courses have no CC1 course articulated)

Data

Title: Data Structures (CS 61B)

Catalog description: Fundamental dynamic data structures,

including linear lists, queues, trees, and other linked

structures; arrays strings, and hash tables. Storage

management. Elementary principles of software

engineering. Abstract data types. Algorithms for sorting and

searching. Introduction to the Java programming language.

Is this Berkeley course equivalent to any course at Laney College?

Title: Data Structures and Algorithms (CIS 27)

Catalog description: Use of abstract forms of

data in programming: Concepts, and

implementation and applicability of different

forms of data to various programming problems

Match?

Task

Methodology• Validate models of cross-institution course

similarity using existing 65 articulated pairs• Generate candidate articulations for a

subset of the unarticulated 184 UCB courses

Models of Course Similarity

Content-based (catalog description):

Bag-of-words: Using a vocabulary frequency vector to represent a course description

Word CS61A MATH54 EDUC161

fundamentals 1 0 0

design 0 0 1

construction 0 1 1

sculpture 0 0 0

algorithm 1 0 0

regression 0 1 0

...




TF-IDF: Using term frequency – inverse document frequency to represent course description

Word CS61A MATH54 EDUC161

fundamentals 0.12 1.435 0

design 0 0.898 2.314

construction 0 1.123 0

sculpture 0 0 0

algorithm 2.97 0 0.67

regression 0 0 1.12

...




TF-IDF: Using term frequency – inverse document frequency to represent a course description

DescVec: Averaging Google’s word2vec representation of all words in a course description

Embedding CS61A MATH54 EDUC161

Dim1 0.142 5.438 0.002

Dim2 -4.12 2.908 -0.005

Dim3 4.99 -1.108 3.020

Dim4 0.019 -5.109 1.021

Dim5 -1.194 2.232 -1.220

… … … …

Dim300 -2.229 1.009 4.210

What is Word2vec?

A way of making meaning from a large corpus of text

Mikolov, Chen, Corrado, & Dean (2013)

(From Language)

“All happy families are alike; each unhappy family is unhappy in its own way.”

families

All happy are alike

KING[vec] – MAN[vec] + WOMAN[vec] ≈ QUEEN[vec]

Mikolov, Yih, & Zweig (2013)

( )

Distributed representation of “royalty” (emergent semantics)

Skip-gram model

Mikolov et al. (2013)

What is Word2vec?

This training process, using SGD, is run on a corpus of 1b wordsto learn vector representations of each word in the vocabulary

Training word2vec

alike

0.505 0.9412 0.1154 0.8524

All happy are

families Word2vec/skip-gramMikolov, T., & Dean, J. (2013)

“All happy families are alike; each unhappy family is unhappy in its own way.”

context

input

0 … 0 0 1 … 0

(e.g., Google News archive)

What is the probability of the word “happy” being in context with the word “families?”

Input layer (one-hot of word – 1x5)

Input weight coefficient matrix Wih (5x2)

Representation layer h (1x2)

Output weight coefficient matrix Who (5x2)

Output layer (1x5)

(for every document d in corpus D)

(assumes vocabulary size = 5)

Representation layer encodes properties of words

19

Mikolov, Chen, Corrado, & Dean (2013)

Leveraging enrollment data

Course2vec: Using course vectors learned from enrollment histories

“MATH1B SPA12 STAT200B CS61B CUE100A CS188 CS C267 CS268 ENN1B”

Instead of analyzing the content of a course, analyze the context in which it has been enrolled in

Methodology

● Skip-gram (word2vec) algorithm NLP example

MAN

WOMAN

UNCLE

AUNT

QUEEN

(Mikolov, Yih, & Zweig, 2013)

RQ: Can this method embed

courses into a “concept space”?

KING

This training process, using SGD, is run on a corpus of 130k student enrollment sequences to learn vector representations of each course offered between 2008 and 2017

Training course2vec

CUE100A

0.505 0.9412 0.1154 0.8524

CS61A MATH1B STAT200B

SPA12Pardos, Fan, & Jiang (2019)

“CS61A MATH1B SPA12 STAT200B CUE100A CS188 CS C267 CS268 ENN1B”

(From enrollments)context

input

0 … 0 0 1 … 0


Course enrollment / collaborative-based:

Course2vec: Using course vectors learned from enrollment historiesEmbedding CS61A MATH54 EDUC161

Dim1 1.152 2.438 -1.002

Dim2 2.16 -5.908 0.205

Dim3 -2.09 -1.708 -3.420

Dim4 1.059 2.109 2.521

Dim5 1.594 2.562 5.620

… … … …

Dim229 (UC1)Dim20 (CC1)

2.229 -4.41 4.210

“MATH1B SPA12 STAT200B CS61B CUE100A CS188 CS C267 CS268 ENN1B”


Course enrollment / collaborative-based:

Course2vec: Using continuous course vectors learned from enrollment historiesCourse2vec+DescVec: Combine content (word2vec) with context (course2vec)representation

Machine Translation from UC1 to CC1Linear translation between institutions (similar to language translation)

t-SNE visualization of UC1 and CC1 Course2vec+DescVec

Evaluation

• Attempt to recover the 65 official articulation pairs

• For each method, suggest N candidate courses from CC1 to articulate to each UC1

• Report the percentage of suggestions containing the true course(Recall @ N)

• Use leave-one-out cross validation to train the translation

See paper for details on different loss functions, distance metrics, and hyper parameter tuning

Results

Model Inspection

PCA visualization of UC1 and CC1 Course2vec+DescVec, Zoomed in on Comp Sci

department courses

UC1 and CC1 exhibit similar topology within the Comp Sci department, suggesting a linear translation can be found between the two embeddings

Java courseJava course

C++ courseC++ course

Data structures

Data structures

Producing Articulation Candidates Report

Which of the 185 unarticulated UC1 courses have the most promising CC1 candidates?

See paper for details on heuristics used to narrow down the 185

Discussion

• Econometric approach• Policy approach

Challenges• Course2Vec not available without enrollment data (hard to get)• Translation method requires existing articulations• Accuracy not high enough to use this method without human validation• Efficacy of articulations (wrt outcomes) needs validation

Conclusions• Enrollment data provides complementary information about courses• Data-assistive methods can narrow down articulation candidates,

making greater articulation tractable for institutions

Future directions• Explore translation methods that do not require existing pairs• Continue pilot with 19 institution City University of New York system

(thanks to grant from Ithika S+R & Heckscher Foundation for Children)

Zachary A. PardosUniversity of California at BerkeleyThank You!

Questions?

Acknowledgement: The authors would like to thank Tammeil Gilkerson (Laney College) and

Walter Wong (UC Berkeley) for contributing anonymized enrollment data to this effort.

Data-Assistive Course Articulation Using Machine

Translation

zpardos

Computational Approaches toHuman Learning (CAHL) research lab

[email protected] | zachpardos.com

Pardos, Z.A., Chau, H., Zhao, H. (2019) Data-Assistive Course-to-Course Articulation Using Machine

Translation. In J. C. Mitchell & K. Porayska-Pomsta (Eds.) Proceedings of the 6th ACM Conference on

Learning @ Scale (L@S). Chicago, IL. ACM. Pages 1-10. *Best paper award* [paper]

https://twitter.com/zpardos

mailto:[email protected]

zachpardos.com

data-assistive course articulation using machine translation · the 1960 california “master...

Documents