
A Data-Driven Reflection on 36 Years of Security and Privacy Research

Aniqua Baset, Tamara Denning
School of Computing, University of Utah

What we did

• 3062 publications, 1980-2015
• Topic modeling
• Trends in authorship and contents
• Online visualizations + data

So why even do this kind of study?

Introspection is important
• Past evolution and future direction
• Comprehensive overview for new/external audience
• Many communities have introspective studies
  - Computational linguistics, human-computer interaction, ubiquitous computing, games, …

Past security & privacy introspection
• Panel talks, keynotes, and invited papers with valuable expert insights

… But we lack structured, data-driven efforts

How should we do data-driven introspection?

We need: publications from different venues, cohesive categorization
We want: data-driven approach from publications
Problem: not all S&P venues have keywords, e.g., USENIX Security
Our approach: topic modeling on full contents of the publications

Overview of topic modeling

Latent Dirichlet Allocation (LDA)
Documents (= bag-of-words) → Topic modeling → Topics (topic = group of words) + topic distribution for each document

Challenges
• Measuring quality of a topic model is hard
• High-scoring topic ≠ high-quality for people
• Pre-processing texts is crucial
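As a small illustration of the "documents = bag-of-words" input that LDA works on, the sketch below turns one document into word counts. The tokenizing regular expression and the sample sentence are assumptions made for this example, not the tokenizer or data used in the study.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return word counts for a document; word order is discarded."""
    # Assumed tokenizer: lowercase runs of letters, digits, and hyphens.
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    return Counter(tokens)


doc = "The attacker replays the session key; the session is then hijacked."
bow = bag_of_words(doc)
print(bow["the"], bow["session"])  # 3 2
```

A topic model sees only these counts, which is one reason the pre-processing steps matter so much: any conversion error or unmerged phrase changes the counts directly.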

Our methodology

Data collection: 3062 publications, 1980-2015

1993-1994, 1996-2015    1066
CCS: ACM Computer and Communications Security

1997-2015    932
NDSS: Network and Distributed System Security Symposium

1980-2015    456
S&P: IEEE Symposium on Security & Privacy

1993, 1995-1996, 1998-2015    608
USENIX: USENIX Security Symposium

Collected: full content + title, authors and their affiliations + session name
Sources: from publishers, from website

Pre-processing input for topic modeling

1) PDF/PS/HTML → text: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi/fl/ffl ligatures, homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]
5) Stopword list
   • Most common English words [a, has, the]
   • Common across our corpus [words with low Inverse Document Frequency (IDF)]
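Steps 2, 4, and 5 can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' pipeline: the phrase table, the tokenizer, and the IDF threshold are hypothetical. Unicode NFKC normalization expands compatibility ligatures such as U+FB01, but it does not resolve homoglyphs in general, and lemmatization (step 3) is left out because it needs a lexical resource.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical phrase table; the study's own acronym list is not reproduced here.
PHRASES = {
    "man in the middle": "man-in-the-middle",
    "mitm": "man-in-the-middle",
}


def fix_ligatures(text):
    """Expand compatibility ligatures (e.g. U+FB01 -> 'fi') via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


def preserve_phrases(text):
    """Rewrite known phrases and acronyms as one hyphenated token."""
    text = text.lower()
    for variant, canonical in PHRASES.items():
        text = re.sub(r"\b" + re.escape(variant) + r"\b", canonical, text)
    return text


def tokenize(text):
    """Assumed tokenizer: lowercase words, keeping internal hyphens."""
    return re.findall(r"[a-z][a-z-]*", preserve_phrases(fix_ligatures(text)))


def low_idf_words(docs_tokens, threshold):
    """Words whose IDF = log(N / document frequency) falls below the threshold."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each word once per document
    return {w for w, df in doc_freq.items() if math.log(n_docs / df) < threshold}


docs = [
    "A MITM attack on the con\ufb01g \ufb01le",
    "the man in the middle attack",
    "the tra\ufb03c analysis",
]
tokens = [tokenize(d) for d in docs]
print(tokens[0])
print(low_idf_words(tokens, threshold=0.1))  # corpus-wide stopword candidates
```

Words that appear in nearly every document have IDF close to zero, so a low threshold picks out corpus-specific stopwords that a generic English stopword list would miss.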

Generating and selecting a topic model

Different topic models, by varying
• # of topics (60 to 120)
• hyperparameters

Model ranking with average PMI
Pointwise Mutual Information (PMI) = coherence score of a topic based on topic-words
PMI ↑ ⇒ Topic quality ↑

5 high-scoring models
• No perfect model (not uncommon)
• Human intervention (accepted)

Highest-scoring model → post-processing to refine
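A minimal sketch of PMI-based ranking follows. It assumes probabilities are estimated from document-level co-occurrence in a reference collection and uses a small smoothing constant `eps`; the slides do not state which reference corpus or smoothing the authors used, and the toy documents are invented for illustration.

```python
import math
from itertools import combinations


def topic_pmi(top_words, docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words, from document co-occurrence."""
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def p(*words):
        # Fraction of documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = p(w1, w2)
        scores.append(math.log((joint + eps) / (p(w1) * p(w2) + eps)))
    return sum(scores) / len(scores)


def model_score(topics, docs):
    """Average PMI over all topics of one model, used to rank candidate models."""
    return sum(topic_pmi(t, docs) for t in topics) / len(topics)


docs = [
    ["encryption", "ciphertext", "decrypt"],
    ["encryption", "ciphertext"],
    ["spam", "email"],
    ["spam", "email", "account"],
]
coherent_model = [["encryption", "ciphertext"], ["spam", "email"]]
mixed_model = [["encryption", "spam"], ["ciphertext", "email"]]
print(model_score(coherent_model, docs) > model_score(mixed_model, docs))  # True
```

Words that tend to appear in the same documents score high, which is why a topic mixing unrelated vocabularies ranks low. A high score still does not guarantee a topic that people find meaningful.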

Refining selected topic model

Mixed topics
E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]

1. Graph with
   • vertices = publications
   • edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide

Example split: garbled circuit / integrated circuit
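The edge weights in step 1 can be sketched as follows. Kullback-Leibler divergence is asymmetric, so this sketch sums both directions to get one weight per undirected edge; that symmetrization, the smoothing constant, and the three toy publications are assumptions for illustration. The modularity-based community detection of step 2 is not shown.

```python
import math
from itertools import combinations


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), smoothed to avoid division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Assumed symmetrization: KL(P || Q) + KL(Q || P)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical per-publication topic distributions (each sums to 1).
publications = {
    "garbled-circuit paper A": [0.7, 0.2, 0.1],
    "garbled-circuit paper B": [0.6, 0.3, 0.1],
    "integrated-circuit paper": [0.1, 0.2, 0.7],
}

edge_weights = {
    (a, b): symmetric_kl(publications[a], publications[b])
    for a, b in combinations(publications, 2)
}
for pair, weight in edge_weights.items():
    print(pair, round(weight, 3))
```

Publications with similar topic distributions get small divergences, so the two garbled-circuit papers end up much closer to each other than to the integrated-circuit paper.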

Topic labeling

• Top words
• Top publications
  - keywords or CCS index (if available)
  - session name (if available)

Let's take a look at our final model

Total of 95 topics

(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers

Grouped into 20 categories

CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS

Example: CRYPTO category

Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness

Example: CRIME & FRAUD category

Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country

Let's look at some trends

How have the venues changed over time?
Entropy ↑ ≈ Diversity ↑
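The entropy measure can be illustrated with Shannon entropy over a venue's topic weights for one year. The slides do not say exactly which distribution the entropy was computed over, so the per-venue topic weights below are an assumption made for this sketch.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (in bits) of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical topic weights for one venue in two different years.
focused_year = [9, 1, 0, 0]   # most papers fall into one topic
diverse_year = [1, 1, 1, 1]   # papers spread evenly over four topics

print(shannon_entropy(focused_year) < shannon_entropy(diverse_year))  # True
print(shannon_entropy(diverse_year))  # 2.0
```

Entropy is largest when the weights are spread evenly and drops to zero when everything falls into a single topic, which is why a rising value reads as rising diversity.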

How has the category distribution changed over time?

[Figure: category distribution over time. Category labels: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]

How consistent are authors and topics year-to-year?

Similarity between topic sets of 88-89 and 90-91 = Jaccard Index(88-89, 90-91)
Jaccard Index ↑ ≈ Overlap ↑
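The Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to two topic sets; the sets themselves are made up for illustration and are not taken from the study's data.

```python
def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty sets are identical
    return len(a & b) / len(a | b)


# Hypothetical topic sets for two consecutive two-year windows.
topics_88_89 = {"Access control", "Covert channel", "Formal methods", "Security policies"}
topics_90_91 = {"Access control", "Formal methods", "Security policies",
                "Databases", "Viruses and worms"}

print(jaccard_index(topics_88_89, topics_90_91))  # 0.5
```

The same function works for author sets, so one helper covers both the author and the topic consistency questions.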

How has industry and government participation changed over time?

[Figure: participation over time by affiliation type. Legend: Academics; Academics + Industry; Industry; Government; Others; Academics + Government]

Do non-academic collaborators have the same interests?

[Figure: top 15 topics in recent years (2011-2015), shown for Government, Industry, and Overall. Topic labels: Mobile app; Veri comp & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anon; Crypto & num theory; Information flow; Formal meth & ver; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]

Tool and data availability

Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …
Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …

Our site: secprivmeta.net

secprivmeta.net demo

Aniqua Baset, aniqua@cs.utah.edu
Dr. Tamara Denning, tdenning@cs.utah.edu

References/Backup

References: introspective studies in other fields

• Publications
  - Towards a computational history of the ACL: 1980-2008
  - Studying the history of ideas using topic models
  - Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
  - CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
  - Games research today: analyzing the academic landscape 2000-2014
• Online tool/dataset
  - ACL Anthology Network (All About NLP): text and meta-data created using papers from the ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing

Category trends using number of papers

Page 2: Aniqua Baset Tamara Denning School of Computing ... · Data collection: 3062 publications,1980-2015 16 1993-1994 1996-2015 1066 CCS ACM Computer and Communications Security 1997-2015

What we did

2

3062 publications1980-2015

Topic modeling Trends in authorship and contents

What we did

3

3062 publications1980-2015

Topic modeling Trends in authorship and contents

Online visualizations + Data

4

So why even do this kind of study129300

5

Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies

- Computational linguistics Human computer interaction Ubiquitous computing Games hellip

6

Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies

- Computational linguistics Human computer interaction Ubiquitous computing Games hellip

Past security amp privacy introspection bull Panel talks keynote invited papers with valuable

expert insights

7

Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies

- Computational linguistics Human computer interaction Ubiquitous computing Games hellip

hellip But we lack structured data-driven efforts

Past security amp privacy introspection bull Panel talks keynote invited papers with valuable

expert insights

How should we do data-driven introspection129300

8

9

Publications fromdifferent venues

Cohesive categorization

We need

10

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

11

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

Not all SampP venues have keywords eg USENIX security

Problem

12

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

Not all SampP venues have keywords eg USENIX security

Problem

Topic modeling on full contents of the publications

Our approach

Overview of topic modeling

13

Latent Dirichlet Allocation (LDA)

Topic ModelingDocuments

TopicsTopic = group of words

Topic distribution for each document

= bag-of-words

Overview of topic modeling

14

Latent Dirichlet Allocation (LDA)

Topic ModelingDocuments

TopicsTopic = group of words

Topic distribution for each document

= bag-of-words

Challenges bull Measuring quality of a topic model is hard bull High-scoring topic ne High-quality for people bull Pre-processing texts is crucial

Our methodology

15

Data collection 3062 publications1980-2015

16

1993-19941996-2015 1066

CCSACM Computer and Communications Security

1997-2015 932

NDSSNetwork and Distributed System Security Symposium

1980-2015 456

SampPIEEE Symposium on Security amp Privacy

19931995-19961998-2015

608

USENIXUSENIX Security Symposium

Full content+ Title Authors andtheir affiliations

Session name+ +

From publishersFrom publishers

From websiteFrom website

Pre-processing input for topic modeling

17

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Pre-processing input for topic modeling

18

1) TEXTPDFPS

HTMLMain body stripping off meta-data

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

19

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

20

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]

4)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

21

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]

4)

Stopword list bull Most common English words [a has the] bull Common across our corpus

[words with low Inverse Document Frequency (IDF)]

5)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Generating and selecting a topic model

22

By varying bull of topics (60 to 120) bull hyperparameters

Different topic models

Generating and selecting a topic model

23

Pointwise Mutual Information (PMI)PMI = coherence score of a topic based on topic-words PMIuarr Topic qualityuarr

Different topic models

Model ranking with average PMI

Generating and selecting a topic model

24

Different topic models

Model ranking with average PMI

5 high-scoring models

No perfect model (not uncommon)Human intervention (accepted)

Highest scoring model Post-processing to refine

Refining selected topic model

25

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

Refining selected topic model

26

1 Graph with bull vertices = publications bull edge weights = divergences between topic

distributions (using Kullback-Leibler divergence) 2 Find sub-community using graph modularity 3 Is a sub-community is a valid topic Yes rarr divide

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

= garbled circuit = integrated circuit

Topic labeling

27

bull Top words bull Top publications

- keywords or CCS index (if available) - session name (if available)

Letrsquos take a look at our final model

28

Total of 95 topics

29

(De)obfuscation and decompilation Domain Name System (DNS) Location privacytracking Real-world sensing(User) interfaces E-commerce Malicious hardware RoutingAccess control Electronic voting Malware SSLTLSAutomated analysis protocols and files Embedded and hardware security Memory disclosure attacks and

defenses Secure (multiparty) computation

Anonymity Encodingdecoding Memory errors exploits and defenses Security labeling

Binary code analysis Encryption Machine learning Security policies

BitcoinCrypto-currency File and file system security String matching and regular expressions Side-channel attack

Bots and Botnet Fingerprints and fingerprinting System calls Social network and (de)anonymization

Browser security Formal methods User study Software and trustCAPTCHA Formal methods and verification Mobile app Spam scam and fraudCards and tokens Formal specification and verification Mobile devices Static and dynamic analysisCensorship Game and game theory Mobile network Storage security

Client-server accountability Genomics Network attacks defenses and detections TCPIP

Cloud Group communication Network authentication TorCompartmentalization Hardwares RFIDs and ICs Network design Trust managementControl flow Hardwares low level Network perimeter controls Trusted computingsystem

Covert channel Hardwares physical properties Network traffic analysis attacks and defences Online crime

Crypto and number theory Information flow Online advertising Verifiable computation and zero knowledge proofs

Cryptographic protocols Institutional security Online services Virtual machines and virtualization

DOM and documents Intrusionanomaly detection Passwords Viruses and worms propagation and scanning

Dark web Java security Peer-to-peer communications Vulnerabilities exploits disclosure and patches

Data privacy JavaScript security Program exploitations attacks and defenses Web application vulnerabilities

Databases Kernels Public-key cryptography Wireless signalDigital signature Key distributionmanagement Random numbers

Grouped into 20 categories

CRYPTO TRUST FORMALISM SYSTEM

HARDWARE NETWORKS WEB AUTH

COMPUTATION DATA MALWARE PROGRAMS

INFORMATION LEAKAGE

INTERNET MOBILE CRIME amp FRAUD

ANONYMITY amp CENSORSHIP

VIRTUAL METHOD MISCELLANEOUS

Example CRYPTO category

30

Topic label Top 5 words from LDA

Cryptographic protocols protocol session party session-key secret

Encryption encryption ciphertext encrypted decryption decrypt

Network authentication authentication authenticate kerberos secret service

Crypto and number theory mod bit prime rsa random

Digital signature signature sign public signer verification

Public-key cryptography certificate CA trust revocation sign

Key distributionmanagement round broadcast secret threshold secret-sharing

Group communication group member multicast join communication

Random numbers entropy output random pool randomness

Example CRYPTO category

31

Topic label Top 5 words from LDA

Cryptographic protocols protocol session party session-key secret

Encryption encryption ciphertext encrypted decryption decrypt

Network authentication authentication authenticate kerberos secret service

Crypto and number theory mod bit prime rsa random

Digital signature signature sign public signer verification

Public-key cryptography certificate CA trust revocation sign

Key distributionmanagement round broadcast secret threshold secret-sharing

Group communication group member multicast join communication

Random numbers entropy output random pool randomness

Example CRYPTO category

32

Topic label Top 5 words from LDA

Cryptographic protocols protocol session party session-key secret

Encryption encryption ciphertext encrypted decryption decrypt

Network authentication authentication authenticate kerberos secret service

Crypto and number theory mod bit prime rsa random

Digital signature signature sign public signer verification

Public-key cryptography certificate CA trust revocation sign

Key distributionmanagement round broadcast secret threshold secret-sharing

Group communication group member multicast join communication

Random numbers entropy output random pool randomness

Example CRIME amp FRAUD category

33

Topic label Top 5 words from LDADark web site URL search web website

Spam scam and fraud spam email account mail post

Online advertising ad ads publisher click target

Online crime account market service customer country

Example CRIME amp FRAUD category

34

Topic label Top 5 words from LDADark web site URL search web website

Spam scam and fraud spam email account mail post

Online advertising ad ads publisher click target

Online crime account market service customer country

Letrsquos look at some trends

35

36

How have the venues changed over time

37

How have the venues changed over time

Entropyuarr asymp Diversityuarr

How has the category distribution changed over time

38

CRYPTO PROGRAMS

FORMALISMMISC

METHOD

MALWARESYSTEM

NETWORK

WEBMOBILE

HARDWARE

INTERNET

ANON amp CENSOR

COMPUTATION

VIRTUAL

TRUST

DATAAUTH

INFO LEAKAGE CRIME amp FRAUD

PROGRAMSCRYPTO

39

How consistent are authors and topics year-to-year

40

Jaccard Index(88-89 90-91)

Similarity between topic sets of 88-89 and 90-91= Jaccard Indexuarr asymp Overlapuarr

How consistent are authors and topics year-to-year

How has industry and government participation changed over time

41

Academics

Academics + Industry

Industry

GovernmentOthers

Academics + Government

42

Do non-academic collaborators have same interests

Mobile appVeri comp amp ZKPMachine learning

MalwareData privacy

Social networks amp (De)annon

Crypto amp num theoryInformation flow

Formal meth amp verVMs amp virtualization

Tor

Online advertisingDOM amp documentsBots amp botnetSpam scam amp fraudClient-severaccountability

BrowserSecure compDark webJavaScript

SSLTLSProgram exploit

Side-channelHW low level

Static amp dynamic analysisBinary analysis

GovernmentIndustry

Overall

Top 15 topics in recent years (2011-2015)

Tool and data availability

43

Interactive visualizationsTopics | Topic-timelines | Topic-words | Publications | Authors | hellip

Available dataMeta-data with categorized affiliations | Acronym list |Stop-word list | Original topic model | hellip

44

Our site secprivmetanet

secprivmetanet demo

45

Aniqua Baset aniquacsutaheduDr Tamara Denning tdenningcsutahedu

46

ReferencesBackup

47

Publications

- Towards a computational history of the ACL 1980-2008

- Studying the history of ideas using topic models

- Identifying crisis of Ubicomp mapping 15 years of the fieldrsquos development and paradigm change

- CHI 1994-2013 mapping two decades of intellectual progress through co-word analysis

- Games research today Analyzing the academic landscape 2000-2014

bull Online tooldataset

- ACL Anthology Network (All About NLP) bull Text meta-data created using papers from ACL Anthology which hosts 51975

papers on the study of computational linguistics and natural language processing

References introspective studies in other fields

48

Category trends using number of papers

49

Page 3: Aniqua Baset Tamara Denning School of Computing ... · Data collection: 3062 publications,1980-2015 16 1993-1994 1996-2015 1066 CCS ACM Computer and Communications Security 1997-2015

What we did

3

3062 publications1980-2015

Topic modeling Trends in authorship and contents

Online visualizations + Data

4

So why even do this kind of study129300

5

Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies

- Computational linguistics Human computer interaction Ubiquitous computing Games hellip

6

Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies

- Computational linguistics Human computer interaction Ubiquitous computing Games hellip

Past security amp privacy introspection bull Panel talks keynote invited papers with valuable

expert insights

7

Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies

- Computational linguistics Human computer interaction Ubiquitous computing Games hellip

hellip But we lack structured data-driven efforts

Past security amp privacy introspection bull Panel talks keynote invited papers with valuable

expert insights

How should we do data-driven introspection129300

8

9

Publications fromdifferent venues

Cohesive categorization

We need

10

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

11

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

Not all SampP venues have keywords eg USENIX security

Problem

12

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

Not all SampP venues have keywords eg USENIX security

Problem

Topic modeling on full contents of the publications

Our approach

Overview of topic modeling

13

Latent Dirichlet Allocation (LDA)

Topic ModelingDocuments

TopicsTopic = group of words

Topic distribution for each document

= bag-of-words

Overview of topic modeling

14

Latent Dirichlet Allocation (LDA)

Topic ModelingDocuments

TopicsTopic = group of words

Topic distribution for each document

= bag-of-words

Challenges bull Measuring quality of a topic model is hard bull High-scoring topic ne High-quality for people bull Pre-processing texts is crucial

Our methodology

15

Data collection 3062 publications1980-2015

16

1993-19941996-2015 1066

CCSACM Computer and Communications Security

1997-2015 932

NDSSNetwork and Distributed System Security Symposium

1980-2015 456

SampPIEEE Symposium on Security amp Privacy

19931995-19961998-2015

608

USENIXUSENIX Security Symposium

Full content+ Title Authors andtheir affiliations

Session name+ +

From publishersFrom publishers

From websiteFrom website

Pre-processing input for topic modeling

17

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Pre-processing input for topic modeling

18

1) TEXTPDFPS

HTMLMain body stripping off meta-data

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

19

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

20

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]

4)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

21

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]

4)

Stopword list bull Most common English words [a has the] bull Common across our corpus

[words with low Inverse Document Frequency (IDF)]

5)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Generating and selecting a topic model

22

By varying bull of topics (60 to 120) bull hyperparameters

Different topic models

Generating and selecting a topic model

23

Pointwise Mutual Information (PMI)PMI = coherence score of a topic based on topic-words PMIuarr Topic qualityuarr

Different topic models

Model ranking with average PMI

Generating and selecting a topic model

24

Different topic models

Model ranking with average PMI

5 high-scoring models

No perfect model (not uncommon)Human intervention (accepted)

Highest scoring model Post-processing to refine

Refining selected topic model

25

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

Refining selected topic model

26

1 Graph with bull vertices = publications bull edge weights = divergences between topic

distributions (using Kullback-Leibler divergence) 2 Find sub-community using graph modularity 3 Is a sub-community is a valid topic Yes rarr divide

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

= garbled circuit = integrated circuit

Topic labeling

27

bull Top words bull Top publications

- keywords or CCS index (if available) - session name (if available)

Let's take a look at our final model

Total of 95 topics

(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam, and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares: RFIDs and ICs | Network design | Trust management
Control flow | Hardwares: low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares: physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers

Grouped into 20 categories

CRYPTO | TRUST | FORMALISM | SYSTEM | HARDWARE | NETWORKS | WEB | AUTH | COMPUTATION | DATA | MALWARE | PROGRAMS | INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD | ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS

Example: CRYPTO category

Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness

Example: CRIME & FRAUD category

Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam, and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country

Let's look at some trends

How have the venues changed over time?

Entropy ↑ ≈ Diversity ↑
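To make the entropy-as-diversity reading concrete, here is a short sketch. The slides do not state which distribution the entropy is taken over, so computing it over the category labels of one venue's papers in one year is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution over labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A year in which every paper falls into one category scores 0 bits, while papers spread evenly over four categories score 2 bits, so a rising curve indicates a venue covering a broader mix of categories.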

How has the category distribution changed over time?

Figure: category distribution over time. Series: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD. Highlighted: PROGRAMS, CRYPTO.

How consistent are authors and topics year-to-year?

Jaccard Index(88-89, 90-91) = similarity between topic sets of 88-89 and 90-91

Jaccard Index ↑ ≈ Overlap ↑
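The Jaccard index between consecutive two-year windows can be sketched as below. The input layout (a mapping from year to a set of topics or authors), the function names, and the treatment of two empty sets as identical are assumptions made for illustration. The windowing also assumes the years present are contiguous.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| (two empty sets count as identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consecutive_window_jaccard(items_by_year, width=2):
    """Jaccard index between each pair of adjacent, non-overlapping windows."""
    years = sorted(items_by_year)
    windows = [years[i:i + width] for i in range(0, len(years), width)]
    sets = [set().union(*(items_by_year[y] for y in w)) for w in windows]
    return [jaccard(s, t) for s, t in zip(sets, sets[1:])]
```

With `width=2`, the first value compares the first two years against the next two, matching the 88-89 versus 90-91 example on the slide.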

How has industry and government participation changed over time?

Figure legend: Academics, Academics + Industry, Industry, Government, Others, Academics + Government

Do non-academic collaborators have the same interests?

Top 15 topics in recent years (2011-2015): Government, Industry, Overall

Figure labels: Mobile app; Veri comp & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anon; Crypto & num theory; Information flow; Formal meth & ver; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis

Tool and data availability

Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …

Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …

Our site: secprivmeta.net

secprivmeta.net demo

Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu

References/Backup

References: introspective studies in other fields

• Publications
  - Towards a computational history of the ACL: 1980-2008
  - Studying the history of ideas using topic models
  - Identifying crisis of Ubicomp: mapping 15 years of the field's development and paradigm change
  - CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
  - Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
  - ACL Anthology Network (All About NLP)
    • Text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing

Category trends using number of papers




Page 5: Aniqua Baset Tamara Denning School of Computing ... · Data collection: 3062 publications,1980-2015 16 1993-1994 1996-2015 1066 CCS ACM Computer and Communications Security 1997-2015

5

Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies

- Computational linguistics Human computer interaction Ubiquitous computing Games hellip

6

Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies

- Computational linguistics Human computer interaction Ubiquitous computing Games hellip

Past security amp privacy introspection bull Panel talks keynote invited papers with valuable

expert insights

7

Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies

- Computational linguistics Human computer interaction Ubiquitous computing Games hellip

hellip But we lack structured data-driven efforts

Past security amp privacy introspection bull Panel talks keynote invited papers with valuable

expert insights

How should we do data-driven introspection129300

8

9

Publications fromdifferent venues

Cohesive categorization

We need

10

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

11

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

Not all SampP venues have keywords eg USENIX security

Problem

12

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

Not all SampP venues have keywords eg USENIX security

Problem

Topic modeling on full contents of the publications

Our approach

Overview of topic modeling

13

Latent Dirichlet Allocation (LDA)

Topic ModelingDocuments

TopicsTopic = group of words

Topic distribution for each document

= bag-of-words

Overview of topic modeling

14

Latent Dirichlet Allocation (LDA)

Topic ModelingDocuments

TopicsTopic = group of words

Topic distribution for each document

= bag-of-words

Challenges bull Measuring quality of a topic model is hard bull High-scoring topic ne High-quality for people bull Pre-processing texts is crucial

Our methodology

15

Data collection 3062 publications1980-2015

16

1993-19941996-2015 1066

CCSACM Computer and Communications Security

1997-2015 932

NDSSNetwork and Distributed System Security Symposium

1980-2015 456

SampPIEEE Symposium on Security amp Privacy

19931995-19961998-2015

608

USENIXUSENIX Security Symposium

Full content+ Title Authors andtheir affiliations

Session name+ +

From publishersFrom publishers

From websiteFrom website

Pre-processing input for topic modeling

17

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Pre-processing input for topic modeling

18

1) TEXTPDFPS

HTMLMain body stripping off meta-data

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

19

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

20

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]

4)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

21

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]

4)

Stopword list bull Most common English words [a has the] bull Common across our corpus

[words with low Inverse Document Frequency (IDF)]

5)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Generating and selecting a topic model

22

By varying bull of topics (60 to 120) bull hyperparameters

Different topic models

Generating and selecting a topic model

23

Pointwise Mutual Information (PMI)PMI = coherence score of a topic based on topic-words PMIuarr Topic qualityuarr

Different topic models

Model ranking with average PMI

Generating and selecting a topic model

24

Different topic models

Model ranking with average PMI

5 high-scoring models

No perfect model (not uncommon)Human intervention (accepted)

Highest scoring model Post-processing to refine

Refining selected topic model

25

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

Refining selected topic model

26

1 Graph with bull vertices = publications bull edge weights = divergences between topic

distributions (using Kullback-Leibler divergence) 2 Find sub-community using graph modularity 3 Is a sub-community is a valid topic Yes rarr divide

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

= garbled circuit = integrated circuit

Topic labeling

27

bull Top words bull Top publications

- keywords or CCS index (if available) - session name (if available)

Letrsquos take a look at our final model

28

Total of 95 topics

29

(De)obfuscation and decompilation Domain Name System (DNS) Location privacytracking Real-world sensing(User) interfaces E-commerce Malicious hardware RoutingAccess control Electronic voting Malware SSLTLSAutomated analysis protocols and files Embedded and hardware security Memory disclosure attacks and

defenses Secure (multiparty) computation

Anonymity Encodingdecoding Memory errors exploits and defenses Security labeling

Binary code analysis Encryption Machine learning Security policies

BitcoinCrypto-currency File and file system security String matching and regular expressions Side-channel attack

Bots and Botnet Fingerprints and fingerprinting System calls Social network and (de)anonymization

Browser security Formal methods User study Software and trustCAPTCHA Formal methods and verification Mobile app Spam scam and fraudCards and tokens Formal specification and verification Mobile devices Static and dynamic analysisCensorship Game and game theory Mobile network Storage security

Client-server accountability Genomics Network attacks defenses and detections TCPIP

Cloud Group communication Network authentication TorCompartmentalization Hardwares RFIDs and ICs Network design Trust managementControl flow Hardwares low level Network perimeter controls Trusted computingsystem

Covert channel Hardwares physical properties Network traffic analysis attacks and defences Online crime

Crypto and number theory Information flow Online advertising Verifiable computation and zero knowledge proofs

Cryptographic protocols Institutional security Online services Virtual machines and virtualization

DOM and documents Intrusionanomaly detection Passwords Viruses and worms propagation and scanning

Dark web Java security Peer-to-peer communications Vulnerabilities exploits disclosure and patches

Data privacy JavaScript security Program exploitations attacks and defenses Web application vulnerabilities

Databases Kernels Public-key cryptography Wireless signalDigital signature Key distributionmanagement Random numbers

Grouped into 20 categories

CRYPTO TRUST FORMALISM SYSTEM

HARDWARE NETWORKS WEB AUTH

COMPUTATION DATA MALWARE PROGRAMS

INFORMATION LEAKAGE

INTERNET MOBILE CRIME amp FRAUD

ANONYMITY amp CENSORSHIP

VIRTUAL METHOD MISCELLANEOUS

Example CRYPTO category

30

Topic label Top 5 words from LDA

Cryptographic protocols protocol session party session-key secret

Encryption encryption ciphertext encrypted decryption decrypt

Network authentication authentication authenticate kerberos secret service

Crypto and number theory mod bit prime rsa random

Digital signature signature sign public signer verification

Public-key cryptography certificate CA trust revocation sign

Key distributionmanagement round broadcast secret threshold secret-sharing

Group communication group member multicast join communication

Random numbers entropy output random pool randomness

Example CRYPTO category

31

Topic label Top 5 words from LDA

Cryptographic protocols protocol session party session-key secret

Encryption encryption ciphertext encrypted decryption decrypt

Network authentication authentication authenticate kerberos secret service

Crypto and number theory mod bit prime rsa random

Digital signature signature sign public signer verification

Public-key cryptography certificate CA trust revocation sign

Key distributionmanagement round broadcast secret threshold secret-sharing

Group communication group member multicast join communication

Random numbers entropy output random pool randomness

Example CRYPTO category

32

Topic label Top 5 words from LDA

Cryptographic protocols protocol session party session-key secret

Encryption encryption ciphertext encrypted decryption decrypt

Network authentication authentication authenticate kerberos secret service

Crypto and number theory mod bit prime rsa random

Digital signature signature sign public signer verification

Public-key cryptography certificate CA trust revocation sign

Key distributionmanagement round broadcast secret threshold secret-sharing

Group communication group member multicast join communication

Random numbers entropy output random pool randomness

Example CRIME amp FRAUD category

33

Topic label Top 5 words from LDADark web site URL search web website

Spam scam and fraud spam email account mail post

Online advertising ad ads publisher click target

Online crime account market service customer country

Example CRIME amp FRAUD category

34

Topic label Top 5 words from LDADark web site URL search web website

Spam scam and fraud spam email account mail post

Online advertising ad ads publisher click target

Online crime account market service customer country

Letrsquos look at some trends

35

36

How have the venues changed over time

37

How have the venues changed over time

Entropyuarr asymp Diversityuarr

How has the category distribution changed over time

38

CRYPTO PROGRAMS

FORMALISMMISC

METHOD

MALWARESYSTEM

NETWORK

WEBMOBILE

HARDWARE

INTERNET

ANON amp CENSOR

COMPUTATION

VIRTUAL

TRUST

DATAAUTH

7

Introspection is important
• Past evolution and future direction
• Comprehensive overview for new/external audience
• Many communities have introspective studies
  - Computational linguistics, Human-computer interaction, Ubiquitous computing, Games, …

Past security & privacy introspection
• Panel talks, keynotes, and invited papers with valuable expert insights

… But we lack structured, data-driven efforts

How should we do data-driven introspection?

8

12

We need
• Publications from different venues
• Cohesive categorization

We want
• Data-driven approach from publications

Problem
• Not all S&P venues have keywords, e.g., USENIX Security

Our approach
• Topic modeling on full contents of the publications

Overview of topic modeling

14

Latent Dirichlet Allocation (LDA)
• Input: documents (document = bag-of-words)
• Output: topics (topic = group of words)
• Output: topic distribution for each document

Challenges
• Measuring the quality of a topic model is hard
• High-scoring topic ≠ high-quality topic for people
• Pre-processing texts is crucial
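To make "document = bag-of-words" concrete, here is a minimal Python sketch. The function name, the tokenizing regular expression, and the two example sentences are illustrative assumptions, not part of the authors' pipeline. It shows only that word order is discarded before topic modeling.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Map a document to word counts; word order is discarded."""
    tokens = re.findall(r"[a-z][a-z-]*", text.lower())
    return Counter(tokens)

# Two made-up documents with the same words in a different order.
doc_a = "the attacker replays the session key"
doc_b = "session key replays: the attacker, the"

same_bag = bag_of_words(doc_a) == bag_of_words(doc_b)
```

Both documents produce the same bag, so a bag-of-words topic model such as LDA cannot tell them apart.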

Our methodology

15

Data collection: 3062 publications, 1980-2015

16

• CCS (ACM Computer and Communications Security): 1993-1994, 1996-2015; 1066 publications
• NDSS (Network and Distributed System Security Symposium): 1997-2015; 932 publications
• S&P (IEEE Symposium on Security & Privacy): 1980-2015; 456 publications
• USENIX (USENIX Security Symposium): 1993, 1995-1996, 1998-2015; 608 publications

Collected for each publication (from publishers or from the venue website)
• Full content
• Title, authors, and their affiliations
• Session name

Pre-processing input for topic modeling

21

1) PDF/PS/HTML → TEXT: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi, fl, ffl ligatures; homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle; MITM → man-in-the-middle]
5) Stopword list
   • Most common English words [a, has, the]
   • Words common across our corpus [words with low Inverse Document Frequency (IDF)]
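Steps 2 to 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code. The phrase table, the toy lemma table (standing in for a real lemmatizer), the stopword set, and the IDF threshold are all hypothetical. Homoglyph repair is left out because it needs a corpus-specific mapping.

```python
import math
import re
import unicodedata
from collections import Counter
from typing import List, Set

# Hypothetical phrase table; the authors' actual acronym list is not reproduced here.
PHRASES = {"man in the middle": "man-in-the-middle", "mitm": "man-in-the-middle"}
# Toy lemma table standing in for a real lemmatizer.
LEMMAS = {"attacks": "attack", "attacked": "attack", "attacking": "attack"}
# Tiny sample of common English stopwords.
ENGLISH_STOPWORDS = {"a", "has", "the"}

def normalize(text: str) -> List[str]:
    # NFKC folds compatibility characters such as the fi/fl/ffl ligatures into plain letters.
    text = unicodedata.normalize("NFKC", text).lower()
    # Join technical phrases and expand acronyms so they survive as single tokens.
    for phrase, joined in PHRASES.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", joined, text)
    tokens = re.findall(r"[a-z][a-z-]*", text)
    # Drop common English words, then map inflected forms to their lemma.
    return [LEMMAS.get(t, t) for t in tokens if t not in ENGLISH_STOPWORDS]

def low_idf_words(docs: List[List[str]], threshold: float) -> Set[str]:
    """Words whose IDF = log(N / document frequency) falls below the threshold."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {w for w, count in df.items() if math.log(n / count) < threshold}

tokens = normalize("The MITM attacked a \ufb01rewall")
corpus = [["attack", "protocol"], ["attack", "malware"], ["attack", "browser"]]
corpus_stopwords = low_idf_words(corpus, threshold=0.5)
```

In this toy corpus "attack" appears in every document, so its IDF is zero and it lands in the corpus-specific stopword set. The rarer words are kept.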

Generating and selecting a topic model

24

Different topic models, generated by varying
• # of topics (60 to 120)
• hyperparameters

Model ranking with average PMI
• Pointwise Mutual Information (PMI) = coherence score of a topic based on its topic-words
• PMI ↑, topic quality ↑

5 high-scoring models
• No perfect model (not uncommon)
• Human intervention (accepted)

Highest-scoring model → post-processing to refine
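The slides do not say which PMI variant or reference corpus was used. As a hedged illustration, the Python sketch below scores a topic by the average pairwise PMI of its top words, PMI(a, b) = log(p(a, b) / (p(a) p(b))). Probabilities are estimated from document-level co-occurrence. The function names, the smoothing constant `eps`, and the toy documents are assumptions.

```python
import math
from itertools import combinations
from typing import List, Set

def topic_pmi(top_words: List[str], docs: List[Set[str]], eps: float = 1e-12) -> float:
    """Average pairwise PMI of a topic's top words, using document-level co-occurrence."""
    n = len(docs)

    def p(*words: str) -> float:
        # Fraction of documents that contain all the given words.
        return sum(all(w in d for w in words) for d in docs) / n

    scores = []
    for a, b in combinations(top_words, 2):
        pa, pb = p(a), p(b)
        if pa == 0 or pb == 0:
            continue  # a word absent from the reference documents carries no evidence
        scores.append(math.log((p(a, b) + eps) / (pa * pb)))
    return sum(scores) / len(scores) if scores else 0.0

def model_score(topics: List[List[str]], docs: List[Set[str]]) -> float:
    """Rank a model by the mean PMI of its topics."""
    return sum(topic_pmi(t, docs) for t in topics) / len(topics)

docs = [{"encryption", "ciphertext"}, {"encryption", "ciphertext"},
        {"spam", "email"}, {"spam", "email"}]
coherent_model = [["encryption", "ciphertext"], ["spam", "email"]]
mixed_model = [["encryption", "spam"], ["ciphertext", "email"]]
```

Here `model_score(coherent_model, docs)` is higher than `model_score(mixed_model, docs)`, because the words in each coherent topic actually co-occur. As the slide notes, a high score still does not guarantee a topic that people find meaningful.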

Refining selected topic model

26

Mixed topics
E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]

1. Build a graph with
   • vertices = publications
   • edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide

Resulting sub-communities: garbled circuit, integrated circuit
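The edge weights in step 1 can be illustrated with a short Python sketch of Kullback-Leibler divergence between two publications' topic distributions. KL divergence is asymmetric, and the slides do not say how it was turned into an undirected edge weight. Summing both directions is an assumption made here. The three distributions are made-up numbers, and the modularity step is not shown.

```python
import math
from typing import Sequence

def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i); eps guards against q_i = 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed symmetrization: sum of the two directed divergences."""
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical topic distributions over three topics for three publications.
paper_a = [0.8, 0.1, 0.1]
paper_b = [0.7, 0.2, 0.1]
paper_c = [0.1, 0.1, 0.8]

close_pair = symmetric_kl(paper_a, paper_b)
far_pair = symmetric_kl(paper_a, paper_c)
```

Publications with similar topic mixtures (`paper_a`, `paper_b`) get a small divergence. Dissimilar ones (`paper_a`, `paper_c`) get a large one, which is what lets community detection separate the two senses of "circuit".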

Topic labeling

27

• Top words
• Top publications
  - keywords or CCS index (if available)
  - session name (if available)

Let's take a look at our final model

28

Total of 95 topics

29

(De)obfuscation and decompilation; Domain Name System (DNS); Location privacy/tracking; Real-world sensing
(User) interfaces; E-commerce; Malicious hardware; Routing
Access control; Electronic voting; Malware; SSL/TLS
Automated analysis protocols and files; Embedded and hardware security; Memory disclosure attacks and defenses; Secure (multiparty) computation
Anonymity; Encoding/decoding; Memory errors exploits and defenses; Security labeling
Binary code analysis; Encryption; Machine learning; Security policies
Bitcoin/Crypto-currency; File and file system security; String matching and regular expressions; Side-channel attack
Bots and Botnet; Fingerprints and fingerprinting; System calls; Social network and (de)anonymization
Browser security; Formal methods; User study; Software and trust
CAPTCHA; Formal methods and verification; Mobile app; Spam scam and fraud
Cards and tokens; Formal specification and verification; Mobile devices; Static and dynamic analysis
Censorship; Game and game theory; Mobile network; Storage security
Client-server accountability; Genomics; Network attacks defenses and detections; TCP/IP
Cloud; Group communication; Network authentication; Tor
Compartmentalization; Hardwares RFIDs and ICs; Network design; Trust management
Control flow; Hardwares low level; Network perimeter controls; Trusted computing/system
Covert channel; Hardwares physical properties; Network traffic analysis attacks and defences; Online crime
Crypto and number theory; Information flow; Online advertising; Verifiable computation and zero knowledge proofs
Cryptographic protocols; Institutional security; Online services; Virtual machines and virtualization
DOM and documents; Intrusion/anomaly detection; Passwords; Viruses and worms propagation and scanning
Dark web; Java security; Peer-to-peer communications; Vulnerabilities exploits disclosure and patches
Data privacy; JavaScript security; Program exploitations attacks and defenses; Web application vulnerabilities
Databases; Kernels; Public-key cryptography; Wireless signal
Digital signature; Key distribution/management; Random numbers

Grouped into 20 categories

CRYPTO; TRUST; FORMALISM; SYSTEM; HARDWARE; NETWORKS; WEB; AUTH; COMPUTATION; DATA; MALWARE; PROGRAMS; INFORMATION LEAKAGE; INTERNET; MOBILE; CRIME & FRAUD; ANONYMITY & CENSORSHIP; VIRTUAL; METHOD; MISCELLANEOUS

Example: CRYPTO category

32

Topic label: top 5 words from LDA
• Cryptographic protocols: protocol, session, party, session-key, secret
• Encryption: encryption, ciphertext, encrypted, decryption, decrypt
• Network authentication: authentication, authenticate, kerberos, secret, service
• Crypto and number theory: mod, bit, prime, rsa, random
• Digital signature: signature, sign, public, signer, verification
• Public-key cryptography: certificate, CA, trust, revocation, sign
• Key distribution/management: round, broadcast, secret, threshold, secret-sharing
• Group communication: group, member, multicast, join, communication
• Random numbers: entropy, output, random, pool, randomness

Example: CRIME & FRAUD category

34

Topic label: top 5 words from LDA
• Dark web: site, URL, search, web, website
• Spam scam and fraud: spam, email, account, mail, post
• Online advertising: ad, ads, publisher, click, target
• Online crime: account, market, service, customer, country

Let's look at some trends

35

How have the venues changed over time?

37

Entropy ↑ ≈ Diversity ↑
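As a reading aid for "Entropy ↑ ≈ Diversity ↑", the sketch below computes Shannon entropy, H = -Σ p_i log2 p_i, over the category labels of a set of papers. The slides do not state exactly what the entropy was computed over (topics or categories, per venue or per year), so the input here is a hypothetical list of category labels for one venue-year.

```python
import math
from collections import Counter
from typing import Iterable

def shannon_entropy(labels: Iterable[str]) -> float:
    """Entropy in bits of the empirical distribution of the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical venue-years: one dominated by a single category, one spread evenly.
narrow_year = ["CRYPTO", "CRYPTO", "CRYPTO", "CRYPTO"]
diverse_year = ["CRYPTO", "WEB", "MOBILE", "MALWARE"]

narrow_entropy = shannon_entropy(narrow_year)
diverse_entropy = shannon_entropy(diverse_year)
```

A venue-year with all papers in one category has zero entropy. One spread evenly over four categories reaches log2(4) = 2 bits.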

How has the category distribution changed over time?

38

[Figure: category distribution over time. Categories shown: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]

39

How consistent are authors and topics year-to-year?

40

Jaccard Index (88-89, 90-91) = similarity between the topic sets of 88-89 and 90-91
Jaccard Index ↑ ≈ Overlap ↑
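The Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal Python sketch follows. The topic sets for the two periods are made-up examples drawn from the topic list, not the authors' data, and returning 1.0 for two empty sets is a convention chosen here.

```python
from typing import Set

def jaccard_index(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical topic sets for two consecutive two-year periods.
topics_88_89 = {"Encryption", "Access control", "Covert channel"}
topics_90_91 = {"Encryption", "Access control", "Viruses and worms propagation and scanning"}

overlap = jaccard_index(topics_88_89, topics_90_91)
```

Two of the four distinct topics appear in both periods, so `overlap` is 0.5. The same function applies unchanged to sets of author names.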

How has industry and government participation changed over time?

41

[Figure: participation over time by author affiliation. Groups shown: Academics, Academics + Industry, Academics + Government, Industry, Government, Others]

42

Do non-academic collaborators have the same interests?

[Figure: top 15 topics in recent years (2011-2015), in three panels: Government, Industry, Overall. Topic labels appearing in the panels: Mobile app; Veri. comp. & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anon.; Crypto & num. theory; Information flow; Formal meth. & ver.; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp.; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]

Tool and data availability

43

Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …

Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …

44

Our site: secprivmeta.net

secprivmeta.net demo

45

Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu

46

References/Backup

47

References: introspective studies in other fields

• Publications
  - Towards a computational history of the ACL: 1980-2008
  - Studying the history of ideas using topic models
  - Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
  - CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
  - Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
  - ACL Anthology Network (All About NLP)
    • Text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing

48

Category trends using number of papers

49



How has industry and government participation changed over time

41

Academics

Academics + Industry

Industry

GovernmentOthers

Academics + Government

42

Do non-academic collaborators have same interests

Mobile appVeri comp amp ZKPMachine learning

MalwareData privacy

Social networks amp (De)annon

Crypto amp num theoryInformation flow

Formal meth amp verVMs amp virtualization

Tor

Online advertisingDOM amp documentsBots amp botnetSpam scam amp fraudClient-severaccountability

BrowserSecure compDark webJavaScript

SSLTLSProgram exploit

Side-channelHW low level

Static amp dynamic analysisBinary analysis

GovernmentIndustry

Overall

Top 15 topics in recent years (2011-2015)

Tool and data availability

43

Interactive visualizationsTopics | Topic-timelines | Topic-words | Publications | Authors | hellip

Available dataMeta-data with categorized affiliations | Acronym list |Stop-word list | Original topic model | hellip

44

Our site secprivmetanet

secprivmetanet demo

45

Aniqua Baset aniquacsutaheduDr Tamara Denning tdenningcsutahedu

46

ReferencesBackup

47

Publications

- Towards a computational history of the ACL 1980-2008

- Studying the history of ideas using topic models

- Identifying crisis of Ubicomp mapping 15 years of the fieldrsquos development and paradigm change

- CHI 1994-2013 mapping two decades of intellectual progress through co-word analysis

- Games research today Analyzing the academic landscape 2000-2014

bull Online tooldataset

- ACL Anthology Network (All About NLP) bull Text meta-data created using papers from ACL Anthology which hosts 51975

papers on the study of computational linguistics and natural language processing

References introspective studies in other fields

48

Category trends using number of papers

49



We need
• Publications from different venues
• Cohesive categorization

We want
• Data-driven approach from publications

Problem
• Not all S&P venues have keywords, e.g., USENIX Security

Our approach
• Topic modeling on full contents of the publications

Overview of topic modeling


Latent Dirichlet Allocation (LDA)

• Input: documents (document = bag-of-words)
• Output: topics (topic = group of words)
• Output: topic distribution for each document

Challenges
• Measuring quality of a topic model is hard
• High-scoring topic ≠ high-quality topic for people
• Pre-processing texts is crucial
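To make the bag-of-words input concrete, here is a minimal sketch of turning a document into word counts. It is an illustration only, not the authors' code; the function name and the sample sentences are hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    """Return word counts for one document, ignoring word order."""
    return Counter(text.lower().split())

# Hypothetical sample documents, not taken from the corpus.
docs = [
    "the attacker replays the session key",
    "the session key protects the session",
]
bags = [bag_of_words(d) for d in docs]
```

A topic model such as LDA only sees these counts, which is why the pre-processing of the text matters so much.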

Our methodology

Data collection: 3062 publications, 1980-2015

Venues and years covered
• CCS (ACM Computer and Communications Security): 1993-1994, 1996-2015
• NDSS (Network and Distributed System Security Symposium): 1997-2015
• S&P (IEEE Symposium on Security & Privacy): 1980-2015
• USENIX (USENIX Security Symposium): 1993, 1995-1996, 1998-2015

Per-venue publication counts on the slide: 1066, 932, 456, 608 (total 3062)

Collected for each publication: full content + title, authors and their affiliations + session name

Sources: from publishers, from website

Pre-processing input for topic modeling


1) PDF/PS/HTML → TEXT: main body, stripping off meta-data

2) Fixing conversion errors [e.g., fi/fl/ffl ligatures, homoglyphs]

3) Lemmatization [e.g., attacks/attacked/attacking → attack]

4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]

5) Stopword list
• Most common English words [a, has, the]
• Common across our corpus [words with low Inverse Document Frequency (IDF)]
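The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the phrase table, the function names, and the IDF threshold are hypothetical, Unicode NFKC normalization stands in for the ligature fix, and lemmatization is left out because it would need an external lemmatizer.

```python
import math
import re
import unicodedata

# Hypothetical phrase table; the authors' actual list is not shown on the slides.
PHRASES = {
    "man in the middle": "man-in-the-middle",
    "mitm": "man-in-the-middle",
}

def clean(text):
    """Normalize ligatures and merge technical phrases into single tokens."""
    # NFKC expands compatibility characters such as the "fi" ligature (U+FB01).
    text = unicodedata.normalize("NFKC", text).lower()
    for phrase, token in PHRASES.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", token, text)
    # Keep alphabetic tokens, allowing internal hyphens.
    return re.findall(r"[a-z][a-z\-]*", text)

def low_idf_words(token_lists, threshold):
    """Return words whose inverse document frequency is below the threshold."""
    n_docs = len(token_lists)
    doc_freq = {}
    for tokens in token_lists:
        for word in set(tokens):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {w for w, df in doc_freq.items() if math.log(n_docs / df) < threshold}
```

Words returned by `low_idf_words` appear in most documents of the corpus, so they carry little topical signal and are candidates for the corpus-specific stopword list.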

Generating and selecting a topic model

1) Generate different topic models by varying
• # of topics (60 to 120)
• hyperparameters

2) Rank the models with average PMI
• Pointwise Mutual Information (PMI) = coherence score of a topic based on topic-words
• PMI↑, topic quality↑

3) 5 high-scoring models
• No perfect model (not uncommon)
• Human intervention (accepted)

4) Highest scoring model: post-processing to refine
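A minimal sketch of PMI-based model ranking, under assumptions the slides do not confirm: word probabilities are estimated from document-level co-occurrence in a reference set of tokenized documents, a small constant avoids taking the logarithm of zero, and each topic has at least two top words. The function names are hypothetical.

```python
import math
from itertools import combinations

def topic_pmi(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words over tokenized documents."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = prob(w1, w2)
        scores.append(math.log((joint + eps) / (prob(w1) * prob(w2) + eps)))
    return sum(scores) / len(scores)

def rank_models(models, documents):
    """Sort models (each a list of topics' top-word lists) by average topic PMI."""
    def avg_pmi(model):
        return sum(topic_pmi(topic, documents) for topic in model) / len(model)
    return sorted(models, key=avg_pmi, reverse=True)
```

A high average PMI only says that a model's top words tend to co-occur, which is why the highest-ranked models were still inspected by people.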

Refining selected topic model


Mixed topics

E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]

1. Graph with
• vertices = publications
• edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide

[Figure: publication graph; legend: garbled circuit, integrated circuit]
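The edge-weight step can be sketched as below. This is an illustration under assumptions: Kullback-Leibler divergence is asymmetric, so the sketch averages both directions to get one weight per pair, and a small constant guards against zero probabilities; the slides do not say how either was handled. The function names are hypothetical, and the modularity-based community detection is omitted because it would need a graph library.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def edge_weights(topic_dists):
    """Weight every pair of publications by a symmetrized KL divergence."""
    weights = {}
    ids = sorted(topic_dists)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            p, q = topic_dists[a], topic_dists[b]
            # Averaging both directions is an assumption made for this sketch.
            weights[(a, b)] = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return weights
```

Publications with similar topic distributions get a small divergence, so a community-detection step would need to treat small weights as strong ties.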

Topic labeling

• Top words
• Top publications
  - keywords or CCS index (if available)
  - session name (if available)

Let's take a look at our final model

Total of 95 topics

(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers

Grouped into 20 categories

CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS

Example: CRYPTO category

Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness


Example: CRIME & FRAUD category

Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country


Let's look at some trends

How have the venues changed over time?

Entropy↑ ≈ Diversity↑
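A minimal sketch of entropy as a diversity measure, assuming it means Shannon entropy over the categories (or topics) assigned to a venue's papers in a given year. The slides do not state the exact distribution or the logarithm base, and the function name is hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A year in which every paper falls into one category has entropy 0, while a year spread evenly over four categories has entropy 2 bits, so a rising value indicates a more diverse venue.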

How has the category distribution changed over time?

[Figure: category distribution over time. Series labels: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD. Callouts: PROGRAMS, CRYPTO]

How consistent are authors and topics year-to-year?

Similarity between topic sets of 88-89 and 90-91 = Jaccard Index(88-89, 90-91)

Jaccard Index↑ ≈ Overlap↑
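The Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows; the function name is hypothetical, and returning 1.0 for two empty sets is a convention chosen here, not something the slides specify.

```python
def jaccard_index(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention for two empty sets (an assumption)
    return len(a & b) / len(a | b)
```

Applied to the topic sets (or author sets) of two consecutive two-year windows, a value near 1 means the windows share almost everything and a value near 0 means they share almost nothing.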

How has industry and government participation changed over time?

[Figure: author affiliations over time. Legend: Academics; Academics + Industry; Industry; Government/Others; Academics + Government]

Do non-academic collaborators have the same interests?

Top 15 topics in recent years (2011-2015): Overall, Industry, Government

[Figure: topic labels shown: Mobile app; Veri comp & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anon; Crypto & num theory; Information flow; Formal meth & ver; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]

Tool and data availability

Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …

Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …

Our site: secprivmeta.net

secprivmeta.net demo

Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu

References/Backup

References: introspective studies in other fields

• Publications
  - Towards a computational history of the ACL: 1980-2008
  - Studying the history of ideas using topic models
  - Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
  - CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
  - Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
  - ACL Anthology Network (All About NLP)
    • Text, meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing

Category trends using number of papers


Page 11: Aniqua Baset Tamara Denning School of Computing ... · Data collection: 3062 publications,1980-2015 16 1993-1994 1996-2015 1066 CCS ACM Computer and Communications Security 1997-2015

11

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

Not all SampP venues have keywords eg USENIX security

Problem

12

Publications fromdifferent venues

Cohesive categorization

We need

Data-driven approach from publicationsWe want

Not all SampP venues have keywords eg USENIX security

Problem

Topic modeling on full contents of the publications

Our approach

Overview of topic modeling

13

Latent Dirichlet Allocation (LDA)

Topic ModelingDocuments

TopicsTopic = group of words

Topic distribution for each document

= bag-of-words

Overview of topic modeling

14

Latent Dirichlet Allocation (LDA)

Topic ModelingDocuments

TopicsTopic = group of words

Topic distribution for each document

= bag-of-words

Challenges bull Measuring quality of a topic model is hard bull High-scoring topic ne High-quality for people bull Pre-processing texts is crucial

Our methodology

15

Data collection 3062 publications1980-2015

16

1993-19941996-2015 1066

CCSACM Computer and Communications Security

1997-2015 932

NDSSNetwork and Distributed System Security Symposium

1980-2015 456

SampPIEEE Symposium on Security amp Privacy

19931995-19961998-2015

608

USENIXUSENIX Security Symposium

Full content+ Title Authors andtheir affiliations

Session name+ +

From publishersFrom publishers

From websiteFrom website

Pre-processing input for topic modeling

17

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Pre-processing input for topic modeling

18

1) TEXTPDFPS

HTMLMain body stripping off meta-data

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

19

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

20

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]

4)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

21

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]

4)

Stopword list bull Most common English words [a has the] bull Common across our corpus

[words with low Inverse Document Frequency (IDF)]

5)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Generating and selecting a topic model

22

By varying bull of topics (60 to 120) bull hyperparameters

Different topic models

Generating and selecting a topic model

23

Pointwise Mutual Information (PMI)PMI = coherence score of a topic based on topic-words PMIuarr Topic qualityuarr

Different topic models

Model ranking with average PMI

Generating and selecting a topic model

24

Different topic models

Model ranking with average PMI

5 high-scoring models

No perfect model (not uncommon)Human intervention (accepted)

Highest scoring model Post-processing to refine

Refining selected topic model

25

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

Refining selected topic model

26

1 Graph with bull vertices = publications bull edge weights = divergences between topic

distributions (using Kullback-Leibler divergence) 2 Find sub-community using graph modularity 3 Is a sub-community is a valid topic Yes rarr divide

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

= garbled circuit = integrated circuit

Topic labeling

27

bull Top words bull Top publications

- keywords or CCS index (if available) - session name (if available)

Letrsquos take a look at our final model

28

Total of 95 topics

(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam, and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers |

Grouped into 20 categories

CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS

Example: CRYPTO category

Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness

Example: CRIME & FRAUD category

Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam, and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country

Let's look at some trends

How have the venues changed over time?

Entropy↑ ≈ Diversity↑
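The "Entropy↑ ≈ Diversity↑" reading can be illustrated with Shannon entropy over the labels of a venue's papers in one year. This is a sketch under an assumption: the slides do not state which distribution the entropy is computed over, so the per-paper category labels below are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical years: one dominated by a single category, one spread evenly over five.
focused_year = ["CRYPTO"] * 9 + ["NETWORKS"]
diverse_year = ["CRYPTO", "NETWORKS", "WEB", "MOBILE", "MALWARE"] * 2

print(shannon_entropy(focused_year), shannon_entropy(diverse_year))
```

An even spread over k categories gives the maximum value log2(k), so a rising curve suggests that a venue's papers are spreading over more categories, or more evenly over the same ones.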

How has the category distribution changed over time?

[Figure: category distribution over time. Category labels: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD. Called out: PROGRAMS, CRYPTO.]

How consistent are authors and topics year-to-year?

Similarity between topic sets of 88-89 and 90-91 = Jaccard Index(88-89, 90-91)

Jaccard Index↑ ≈ Overlap↑
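As a small worked example of the Jaccard index between two consecutive two-year windows: the index is the size of the intersection divided by the size of the union. The topic sets below are invented for illustration and are not taken from the dataset.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here (assumption): two empty sets count as identical
    return len(a & b) / len(a | b)

# Hypothetical topic sets for two windows.
topics_88_89 = {"Access control", "Covert channel", "Security policies", "Formal methods"}
topics_90_91 = {"Access control", "Covert channel", "Cryptographic protocols"}

similarity = jaccard_index(topics_88_89, topics_90_91)
print(similarity)
```

Here two topics are shared out of five distinct topics overall. The same function works for author sets, which is how author consistency could be measured in the same way.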

How has industry and government participation changed over time?

[Figure: authorship over time by affiliation type. Legend: Academics | Academics + Industry | Industry | Government | Others | Academics + Government]

Do non-academic collaborators have the same interests?

[Figure: Top 15 topics in recent years (2011-2015), shown for Government, Industry, and Overall. Topic labels in the figure: Mobile app; Veri. comp. & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anon.; Crypto & num. theory; Information flow; Formal meth. & ver.; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp.; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]

Tool and data availability

Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …

Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …

Our site: secprivmeta.net

secprivmeta.net demo

Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu

References/Backup

References: introspective studies in other fields

• Publications
- Towards a computational history of the ACL: 1980-2008
- Studying the history of ideas using topic models
- Identifying crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
- CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
- Games research today: Analyzing the academic landscape 2000-2014

• Online tool/dataset
- ACL Anthology Network (All About NLP)
  • Text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing

Category trends using number of papers


We need: cohesive categorization of publications from different venues

We want: data-driven approach from publications

Problem: not all S&P venues have keywords, e.g., USENIX Security

Our approach: topic modeling on full contents of the publications

Overview of topic modeling

Latent Dirichlet Allocation (LDA)

Documents (= bag-of-words) → Topic modeling → Topics (topic = group of words) + topic distribution for each document

Challenges
• Measuring quality of a topic model is hard
• High-scoring topic ≠ high-quality for people
• Pre-processing texts is crucial
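To make the inputs and outputs of LDA concrete, here is a compact collapsed Gibbs sampler written from scratch. It is a teaching sketch, not the tooling used for the study (the slides do not name an LDA implementation). The hyperparameters `alpha` and `beta`, the iteration count, and the four toy documents are all assumptions.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (bags of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                               # tokens assigned to topic k overall
    assignments = []
    for d, doc in enumerate(docs):                               # random initial topic per token
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wid = assignments[d][i], word_id[w]
                doc_topic[d][k] -= 1                             # remove the token's current assignment
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k                            # resample and restore the counts
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    theta = [                                                    # topic distribution for each document
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    top_words = [                                                # each topic as a group of words
        [vocab[i] for i in sorted(range(vocab_size), key=lambda i: -topic_word[k][i])[:3]]
        for k in range(num_topics)
    ]
    return theta, top_words

docs = [
    "encryption ciphertext decrypt encryption key".split(),
    "ciphertext key encryption decrypt".split(),
    "spam email account spam mail".split(),
    "email spam mail account".split(),
]
theta, top_words = lda_gibbs(docs, num_topics=2)
print(top_words)
print(theta)
```

The two return values correspond to the two outputs on the slide: `top_words` is each topic as a group of words, and `theta` is the topic distribution for each document. Results depend on the random seed, which is one reason measuring model quality is hard.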

Our methodology

Data collection: 3062 publications, 1980-2015

1993-1994, 1996-2015 | 1066
CCS: ACM Computer and Communications Security

1997-2015 | 932
NDSS: Network and Distributed System Security Symposium

1980-2015 | 456
S&P: IEEE Symposium on Security & Privacy

1993, 1995-1996, 1998-2015 | 608
USENIX: USENIX Security Symposium

Collected for each publication: full content + title, authors and their affiliations + session name

Sources: from publishers | from website

Pre-processing input for topic modeling

1) PDF/PS/HTML → TEXT: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi/fl/ffl ligatures, homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]
5) Stopword list
• Most common English words [a, has, the]
• Common across our corpus [words with low Inverse Document Frequency (IDF)]
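Steps 2 to 5 above can be sketched on a toy corpus as follows. Everything here is illustrative: the lookup tables `PHRASES`, `LEMMAS`, and `ENGLISH_STOPWORDS` are tiny hypothetical stand-ins for a real phrase list, lemmatizer, and stopword list, the IDF threshold is arbitrary, and homoglyph repair is not attempted. Ligatures are handled with Unicode NFKC normalization, which is one possible fix and not necessarily the one the authors used.

```python
import math
import re
import unicodedata

# Hypothetical stand-ins for real resources.
PHRASES = {"man in the middle": "man-in-the-middle", "mitm": "man-in-the-middle"}
LEMMAS = {"attacks": "attack", "attacked": "attack", "attacking": "attack"}
ENGLISH_STOPWORDS = {"a", "has", "the"}

def tokenize(text):
    """Fix ligatures, join technical phrases, split into tokens, and lemmatize."""
    text = unicodedata.normalize("NFKC", text).lower()   # e.g. the single glyph U+FB01 becomes "fi"
    for phrase, joined in PHRASES.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", joined, text)
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text)  # keeps hyphenated phrases whole
    return [LEMMAS.get(t, t) for t in tokens]

def low_idf_words(token_docs, threshold):
    """Words whose IDF = log(N / document frequency) falls below the threshold."""
    n = len(token_docs)
    doc_freq = {}
    for doc in token_docs:
        for w in set(doc):
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return {w for w, df in doc_freq.items() if math.log(n / df) < threshold}

def preprocess(raw_docs, idf_threshold=0.1):
    """Tokenize every document, then drop English and corpus-specific stopwords."""
    token_docs = [tokenize(t) for t in raw_docs]
    stop = ENGLISH_STOPWORDS | low_idf_words(token_docs, idf_threshold)
    return [[t for t in doc if t not in stop] for doc in token_docs]

raw_docs = [
    "The MITM attacks modi\ufb01ed the session.",   # contains the "fi" ligature glyph
    "A man in the middle attacked the protocol.",
    "The paper has a proof of the protocol.",
]
clean_docs = preprocess(raw_docs)
print(clean_docs)
```

In this toy corpus "the" appears in every document, so its IDF is zero and it would be dropped even without the English stopword list. On a real corpus the same mechanism removes words that are common across the field's papers but are not general English stopwords.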

Generating and selecting a topic model

Different topic models, generated by varying
• number of topics (60 to 120)
• hyperparameters

Model ranking with average PMI
• Pointwise Mutual Information (PMI) = coherence score of a topic based on topic-words
• PMI↑, topic quality↑

5 high-scoring models
• No perfect model (not uncommon)
• Human intervention (accepted)

Highest scoring model → post-processing to refine
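One common way to turn PMI into a topic coherence score is to average the PMI of every pair of a topic's top words, and then average over topics to rank a model. The sketch below does that with probabilities estimated from document-level co-occurrence. The slides do not say which reference corpus or estimator was used, so that choice, the smoothing constant `eps`, and the toy documents are assumptions.

```python
import math
from itertools import combinations

def topic_pmi(top_words, token_docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words (needs at least two words)."""
    doc_sets = [set(d) for d in token_docs]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in s for w in words) for s in doc_sets) / n

    scores = [
        math.log((prob(a, b) + eps) / (prob(a) * prob(b) + eps))
        for a, b in combinations(top_words, 2)
    ]
    return sum(scores) / len(scores)

def model_score(topics, token_docs):
    """Average PMI over all topics of a model, usable for ranking candidate models."""
    return sum(topic_pmi(t, token_docs) for t in topics) / len(topics)

token_docs = [
    ["encryption", "ciphertext", "decrypt"],
    ["encryption", "ciphertext"],
    ["spam", "email"],
    ["spam", "email", "account"],
]
coherent_topic = ["encryption", "ciphertext"]   # words that co-occur in documents
mixed_topic = ["encryption", "spam"]            # words that never co-occur

print(topic_pmi(coherent_topic, token_docs), topic_pmi(mixed_topic, token_docs))
```

A mixed topic scores low because its words rarely share a document. The score is still only a proxy, which is consistent with the slide's point that no model was perfect and human intervention was needed.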



Page 14: Aniqua Baset Tamara Denning School of Computing ... · Data collection: 3062 publications,1980-2015 16 1993-1994 1996-2015 1066 CCS ACM Computer and Communications Security 1997-2015

Overview of topic modeling

14

Latent Dirichlet Allocation (LDA)

Topic ModelingDocuments

TopicsTopic = group of words

Topic distribution for each document

= bag-of-words

Challenges bull Measuring quality of a topic model is hard bull High-scoring topic ne High-quality for people bull Pre-processing texts is crucial

Our methodology

15

Data collection 3062 publications1980-2015

16

1993-19941996-2015 1066

CCSACM Computer and Communications Security

1997-2015 932

NDSSNetwork and Distributed System Security Symposium

1980-2015 456

SampPIEEE Symposium on Security amp Privacy

19931995-19961998-2015

608

USENIXUSENIX Security Symposium

Full content+ Title Authors andtheir affiliations

Session name+ +

From publishersFrom publishers

From websiteFrom website

Pre-processing input for topic modeling

17

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Pre-processing input for topic modeling

18

1) TEXTPDFPS

HTMLMain body stripping off meta-data

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

19

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

20

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]

4)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Pre-processing input for topic modeling

21

1) TEXTPDFPS

HTMLMain body stripping off meta-data

Lemmatization [eg attacksattackedattacking rarr attack]3)

Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]

4)

Stopword list bull Most common English words [a has the] bull Common across our corpus

[words with low Inverse Document Frequency (IDF)]

5)

2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]

Generating and selecting a topic model

22

By varying bull of topics (60 to 120) bull hyperparameters

Different topic models

Generating and selecting a topic model

23

Pointwise Mutual Information (PMI)PMI = coherence score of a topic based on topic-words PMIuarr Topic qualityuarr

Different topic models

Model ranking with average PMI

Generating and selecting a topic model

24

Different topic models

Model ranking with average PMI

5 high-scoring models

No perfect model (not uncommon)Human intervention (accepted)

Highest scoring model Post-processing to refine

Refining selected topic model

25

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

Refining selected topic model

26

1 Graph with bull vertices = publications bull edge weights = divergences between topic

distributions (using Kullback-Leibler divergence) 2 Find sub-community using graph modularity 3 Is a sub-community is a valid topic Yes rarr divide

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

= garbled circuit = integrated circuit

Topic labeling

27

bull Top words bull Top publications

- keywords or CCS index (if available) - session name (if available)

Let's take a look at our final model

Total of 95 topics

(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors, exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks, defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities, exploits, disclosure and patches
Data privacy | JavaScript security | Program exploitations, attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers

Grouped into 20 categories

CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS

Example: CRYPTO category

Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness

Example: CRIME & FRAUD category

Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country

Let's look at some trends

How have the venues changed over time?

Entropy↑ ≈ Diversity↑
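A small sketch of the entropy measure, assuming it is the Shannon entropy of how a venue's papers in one year are spread over categories. The slides do not state the exact distribution or log base used, so both are assumptions here.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical category labels for one venue in two different years.
narrow_year = ["CRYPTO", "CRYPTO", "CRYPTO", "CRYPTO"]
diverse_year = ["CRYPTO", "WEB", "MOBILE", "TRUST"]

narrow_entropy = shannon_entropy(narrow_year)    # all papers in one category
diverse_entropy = shannon_entropy(diverse_year)  # papers spread evenly over four
```

A year whose papers all fall in one category has zero entropy, while an even spread over four categories gives two bits, which is the sense in which higher entropy tracks higher diversity.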

How has the category distribution changed over time?

[Figure: category distribution over time. Categories shown: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD. Highlighted: PROGRAMS, CRYPTO]

How consistent are authors and topics year-to-year?

Jaccard Index(88-89, 90-91) = similarity between topic sets of 88-89 and 90-91
Jaccard Index↑ ≈ Overlap↑
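For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to two hypothetical topic sets for consecutive two-year windows; the topic names are taken from the topic list, but their assignment to these windows is made up for illustration.

```python
def jaccard_index(first, second):
    """|A ∩ B| / |A ∪ B|; defined here as 1.0 when both sets are empty."""
    a, b = set(first), set(second)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical topic sets for the 88-89 and 90-91 windows.
topics_88_89 = {"Encryption", "Access control", "Formal methods"}
topics_90_91 = {"Encryption", "Formal methods", "Covert channel"}

overlap = jaccard_index(topics_88_89, topics_90_91)
```

The same function works for author sets, which gives the author-consistency curve when applied to the authors publishing in each window.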

How has industry and government participation changed over time?

[Figure: authorship over time by affiliation type. Legend: Academics; Academics + Industry; Industry; Government; Others; Academics + Government]

Do non-academic collaborators have the same interests?

Top 15 topics in recent years (2011-2015)

[Figure: top topics for Government, Industry, and Overall. Topics shown: Mobile app; Veri comp & ZKP; Machine learning; Malware; Data privacy; Social networks & (De)anon; Crypto & num theory; Information flow; Formal meth & ver; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]

Tool and data availability

Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …

Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …

Our site: secprivmeta.net

secprivmeta.net demo

Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu

References / Backup

References: introspective studies in other fields

• Publications
  - Towards a computational history of the ACL: 1980-2008
  - Studying the history of ideas using topic models
  - Identifying crisis of Ubicomp mapping 15 years of the field's development and paradigm change
  - CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
  - Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
  - ACL Anthology Network (All About NLP)
    • Text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing

Category trends using number of papers

Our methodology

Data collection: 3062 publications, 1980-2015

Venue | Years | Publications
CCS (ACM Computer and Communications Security) | 1993-1994, 1996-2015 | 1066
NDSS (Network and Distributed System Security Symposium) | 1997-2015 | 932
S&P (IEEE Symposium on Security & Privacy) | 1980-2015 | 456
USENIX (USENIX Security Symposium) | 1993, 1995-1996, 1998-2015 | 608

Collected for each publication: full content + title, authors and their affiliations + session name
Sources: from publishers, from website

Pre-processing input for topic modeling

1) PDF / PS / HTML → TEXT: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi/fl/ffl ligatures, homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]
5) Stopword list
   • Most common English words [a, has, the]
   • Common across our corpus [words with low Inverse Document Frequency (IDF)]
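Parts of this pipeline can be sketched with the standard library alone. The sketch below is illustrative, not the authors' code: Unicode NFKC normalization stands in for the ligature fix (it does not handle homoglyphs in general), the phrase table holds only the slide's MITM example, lemmatization is left out because it needs an external lemmatizer, and the IDF threshold is an arbitrary assumption.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical phrase table; the slides give only the man-in-the-middle example.
PHRASES = {
    r"\bman in the middle\b": "man-in-the-middle",
    r"\bmitm\b": "man-in-the-middle",
}
# The slide's three examples; a real English stopword list is much longer.
ENGLISH_STOPWORDS = {"a", "has", "the"}


def clean(text):
    """Fold ligatures, lowercase, merge technical phrases, and tokenize."""
    # NFKC maps compatibility characters such as U+FB01 (fi ligature) to "fi".
    text = unicodedata.normalize("NFKC", text).lower()
    for pattern, replacement in PHRASES.items():
        text = re.sub(pattern, replacement, text)
    return re.findall(r"[a-z][a-z\-]*", text)


def corpus_stopwords(token_docs, idf_threshold):
    """Words whose IDF = log(N / document frequency) is below the threshold."""
    n_docs = len(token_docs)
    doc_freq = Counter(word for doc in token_docs for word in set(doc))
    return {w for w, df in doc_freq.items() if math.log(n_docs / df) < idf_threshold}


def preprocess(raw_docs, idf_threshold=0.1):
    """Tokenize every document and drop English and corpus-wide stopwords."""
    token_docs = [clean(doc) for doc in raw_docs]
    stop = ENGLISH_STOPWORDS | corpus_stopwords(token_docs, idf_threshold)
    return [[w for w in doc if w not in stop] for doc in token_docs]


raw_docs = [
    "The attacker has a key",
    "The defender has a password",
    "The MITM proxy has a log",
]
processed = preprocess(raw_docs)
```

Words that appear in every document get an IDF of zero and are dropped even if they are not on the English list, which is the idea behind the corpus-specific part of the stopword list.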


Example CRIME amp FRAUD category

33

Topic label Top 5 words from LDADark web site URL search web website

Spam scam and fraud spam email account mail post

Online advertising ad ads publisher click target

Online crime account market service customer country

Example CRIME amp FRAUD category

34

Topic label Top 5 words from LDADark web site URL search web website

Spam scam and fraud spam email account mail post

Online advertising ad ads publisher click target

Online crime account market service customer country

Letrsquos look at some trends

35

36

How have the venues changed over time

37

How have the venues changed over time

Entropyuarr asymp Diversityuarr

How has the category distribution changed over time

38

CRYPTO PROGRAMS

FORMALISMMISC

METHOD

MALWARESYSTEM

NETWORK

WEBMOBILE

HARDWARE

INTERNET

ANON amp CENSOR

COMPUTATION

VIRTUAL

TRUST

DATAAUTH

INFO LEAKAGE CRIME amp FRAUD

PROGRAMSCRYPTO

39

How consistent are authors and topics year-to-year

40

Jaccard Index(88-89 90-91)

Similarity between topic sets of 88-89 and 90-91= Jaccard Indexuarr asymp Overlapuarr

How consistent are authors and topics year-to-year

How has industry and government participation changed over time

41

Academics

Academics + Industry

Industry

GovernmentOthers

Academics + Government

42

Do non-academic collaborators have same interests

Mobile appVeri comp amp ZKPMachine learning

MalwareData privacy

Social networks amp (De)annon

Crypto amp num theoryInformation flow

Formal meth amp verVMs amp virtualization

Tor

Online advertisingDOM amp documentsBots amp botnetSpam scam amp fraudClient-severaccountability

BrowserSecure compDark webJavaScript

SSLTLSProgram exploit

Side-channelHW low level

Static amp dynamic analysisBinary analysis

GovernmentIndustry

Overall

Top 15 topics in recent years (2011-2015)

Tool and data availability

43

Interactive visualizationsTopics | Topic-timelines | Topic-words | Publications | Authors | hellip

Available dataMeta-data with categorized affiliations | Acronym list |Stop-word list | Original topic model | hellip

44

Our site secprivmetanet

secprivmetanet demo

45

Aniqua Baset aniquacsutaheduDr Tamara Denning tdenningcsutahedu

46

ReferencesBackup

47

Publications

- Towards a computational history of the ACL 1980-2008

- Studying the history of ideas using topic models

- Identifying crisis of Ubicomp mapping 15 years of the fieldrsquos development and paradigm change

- CHI 1994-2013 mapping two decades of intellectual progress through co-word analysis

- Games research today Analyzing the academic landscape 2000-2014

bull Online tooldataset

- ACL Anthology Network (All About NLP) bull Text meta-data created using papers from ACL Anthology which hosts 51975

papers on the study of computational linguistics and natural language processing

References introspective studies in other fields

48

Category trends using number of papers

49

Page 18: Aniqua Baset Tamara Denning School of Computing ... · Data collection: 3062 publications,1980-2015 16 1993-1994 1996-2015 1066 CCS ACM Computer and Communications Security 1997-2015

Pre-processing input for topic modeling

1) PDF/PS/HTML → TEXT: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi/fl/ffl ligatures, homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]
5) Stopword list
   • Most common English words [a, has, the]
   • Common across our corpus [words with low Inverse Document Frequency (IDF)]
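
Steps 2 to 5 can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' pipeline: the lookup tables, the function names, and the IDF threshold are all hypothetical, and a real lemmatizer would replace the tiny dictionary used here.

```python
import math
import re
from collections import Counter

# Hypothetical lookup tables; the slides do not list the actual entries.
LIGATURES = {"\ufb01": "fi", "\ufb02": "fl", "\ufb00": "ff", "\ufb03": "ffi", "\ufb04": "ffl"}
PHRASES = {"man in the middle": "man-in-the-middle", "mitm": "man-in-the-middle"}
LEMMAS = {"attacks": "attack", "attacked": "attack", "attacking": "attack"}
ENGLISH_STOPWORDS = {"a", "has", "the"}


def normalize(text):
    """Apply steps 2-4 to one document and return its tokens."""
    # Step 2: replace ligature code points left over from PDF conversion.
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    text = text.lower()
    # Step 4: join technical phrases so they survive tokenization as one word.
    for phrase, joined in PHRASES.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", joined, text)
    tokens = re.findall(r"[a-z][a-z\-]*", text)
    # Step 3: toy lemmatization through a lookup table.
    return [LEMMAS.get(tok, tok) for tok in tokens]


def low_idf_words(docs, threshold):
    """Return words whose IDF = log(N / document frequency) is below threshold."""
    doc_freq = Counter(word for doc in docs for word in set(doc))
    n_docs = len(docs)
    return {w for w, df in doc_freq.items() if math.log(n_docs / df) < threshold}


def preprocess(raw_docs, idf_threshold=0.2):
    """Step 5: drop common English words and corpus-wide low-IDF words."""
    docs = [normalize(text) for text in raw_docs]
    stopwords = ENGLISH_STOPWORDS | low_idf_words(docs, idf_threshold)
    return [[tok for tok in doc if tok not in stopwords] for doc in docs]
```

The phrase replacement runs before tokenization so that "man-in-the-middle" reaches the topic model as a single token.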

Generating and selecting a topic model

Different topic models, by varying
• # of topics (60 to 120)
• hyperparameters

Model ranking with average PMI
• Pointwise Mutual Information (PMI) = coherence score of a topic based on topic-words
• PMI↑ ≈ Topic quality↑

5 high-scoring models
• No perfect model (not uncommon)
• Human intervention (accepted)

Highest scoring model → Post-processing to refine
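
The ranking step can be illustrated with a small sketch. The slides do not give the exact PMI formulation, so this assumes one common variant: the average pairwise PMI of a topic's top words, estimated from document-level co-occurrence, with a small smoothing constant. The names `topic_pmi` and `rank_models` are hypothetical.

```python
import math
from itertools import combinations


def topic_pmi(top_words, docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words over document co-occurrence."""
    doc_sets = [set(doc) for doc in docs]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = prob(w1, w2)
        # eps is an assumed smoothing term that avoids log(0) and division by zero.
        scores.append(math.log((joint + eps) / (prob(w1) * prob(w2) + eps)))
    return sum(scores) / len(scores)


def rank_models(models, docs):
    """Sort candidate models (name -> list of topics) by average topic PMI, best first."""
    averages = {
        name: sum(topic_pmi(topic, docs) for topic in topics) / len(topics)
        for name, topics in models.items()
    }
    return sorted(averages, key=averages.get, reverse=True)
```

A model whose topics group words that really co-occur ranks above one that mixes unrelated words, which is the sense in which a higher PMI suggests higher topic quality.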

Refining selected topic model

Mixed topics
E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]

1. Graph with
   • vertices = publications
   • edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide

[Figure: publication graph with nodes marked as garbled circuit or integrated circuit]
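
Step 1 of the refinement can be sketched as below. Kullback-Leibler divergence is asymmetric, and the slides do not say how it was turned into an undirected edge weight, so the symmetrised average used here is an assumption, as are the function names and the smoothing constant. The modularity-based community detection of step 2 is not shown; a graph library would normally handle it.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions of equal length."""
    # eps is an assumed smoothing term so that zero entries in q do not blow up.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def edge_weight(p, q):
    """Symmetrised KL divergence (an assumption; plain KL is asymmetric)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def divergence_graph(topic_dists):
    """Map each publication pair (a, b), a < b, to its divergence-based edge weight."""
    ids = sorted(topic_dists)
    return {
        (a, b): edge_weight(topic_dists[a], topic_dists[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    }
```

Publications with near-identical topic distributions get a weight close to zero, while publications that only share surface vocabulary end up far apart.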

Topic labeling

• Top words
• Top publications
  - keywords or CCS index (if available)
  - session name (if available)

Let's take a look at our final model

Total of 95 topics

(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis: protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors: exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam, and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks, defenses, and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares: RFIDs and ICs | Network design | Trust management
Control flow | Hardwares: low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares: physical properties | Network traffic analysis: attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms: propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities, exploits, disclosure, and patches
Data privacy | JavaScript security | Program exploitations: attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers

Grouped into 20 categories

CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS

Example: CRYPTO category

Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness

Example: CRIME & FRAUD category

Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam, and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country

Let's look at some trends

How have the venues changed over time?

Entropy↑ ≈ Diversity↑
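
The slides do not say exactly how entropy was computed. A minimal sketch, assuming Shannon entropy over the category labels of a venue's papers in a given year (the function name and the labeling scheme are hypothetical), shows why higher entropy reads as higher diversity:

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A year in which every paper falls in one category has zero entropy;
# a year spread evenly over four categories has two bits.
narrow_year = ["CRYPTO", "CRYPTO", "CRYPTO", "CRYPTO"]
broad_year = ["CRYPTO", "WEB", "MOBILE", "DATA"]
```

Comparing `shannon_entropy(narrow_year)` with `shannon_entropy(broad_year)` year by year would give a curve that rises as a venue's papers spread across more categories.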

How has the category distribution changed over time?

[Figure: category distribution over time. Labeled categories: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]

How consistent are authors and topics year-to-year?

Jaccard Index(88-89, 90-91) = similarity between the topic sets of 88-89 and 90-91

Jaccard Index↑ ≈ Overlap↑
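
The Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows; the topic sets are made-up examples rather than data from the study, and treating two empty sets as identical is a convention chosen here.

```python
def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| for two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical topic sets for two consecutive two-year windows.
topics_88_89 = {"Encryption", "Access control", "Formal methods"}
topics_90_91 = {"Encryption", "Formal methods", "Covert channel"}
overlap = jaccard_index(topics_88_89, topics_90_91)  # 2 shared / 4 total = 0.5
```

The same function applies to author sets, so one helper covers both the author and the topic consistency curves.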

How has industry and government participation changed over time?

[Figure legend: Academics | Academics + Industry | Industry | Government | Others | Academics + Government]

Do non-academic collaborators have the same interests?

[Figure: top 15 topics in recent years (2011-2015) for Government, Industry, and Overall. Topics shown: Mobile app; Verifiable computation & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anonymization; Crypto & number theory; Information flow; Formal methods & verification; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure computation; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]

Tool and data availability

Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …

Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …

Our site: secprivmeta.net

secprivmeta.net demo

Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu

References/Backup

References: introspective studies in other fields

• Publications
  - Towards a computational history of the ACL: 1980-2008
  - Studying the history of ideas using topic models
  - Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
  - CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
  - Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
  - ACL Anthology Network (All About NLP): text meta-data created using papers from the ACL Anthology, which hosts 51,975 papers on the study of computational linguistics and natural language processing

Category trends using number of papers

Pre-processing input for topic modeling

1) PDF/PS/HTML → text: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi/fl/ffl ligatures, homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]
5) Stopword list
• Most common English words [a, has, the]
• Common across our corpus [words with low Inverse Document Frequency (IDF)]
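
The pre-processing steps on this slide (conversion fixes, lemmatization, phrase preservation, stopword removal) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `PHRASES`, `ENGLISH_STOPWORDS`, `toy_lemma` and the IDF threshold are hypothetical stand-ins, and a real pipeline would use a proper lemmatizer and the full acronym and stop-word lists.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical phrase map and stopword set, for illustration only.
PHRASES = {"man in the middle": "man-in-the-middle", "mitm": "man-in-the-middle"}
ENGLISH_STOPWORDS = {"a", "has", "the"}


def fix_ligatures(text):
    # NFKC folds compatibility characters such as the "fi" ligature (U+FB01)
    # back into plain letters. Homoglyphs would need a separate mapping.
    return unicodedata.normalize("NFKC", text)


def toy_lemma(token):
    # Crude suffix stripping, standing in for a real lemmatizer (assumption).
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    text = fix_ligatures(text).lower()
    # Join technical phrases so they survive tokenization as one word.
    for phrase, joined in PHRASES.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", joined, text)
    tokens = re.findall(r"[a-z][a-z-]*", text)
    return [toy_lemma(t) for t in tokens if t not in ENGLISH_STOPWORDS]


def low_idf_words(docs, threshold):
    # docs: list of token lists. IDF(w) = log(N / number of docs containing w).
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {w for w, c in df.items() if math.log(n / c) < threshold}
```

Words returned by `low_idf_words` would be added to the stopword list as "common across our corpus"; the threshold is a tuning choice that the slides do not specify.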

Generating and selecting a topic model

Different topic models, generated by varying
• # of topics (60 to 120)
• hyperparameters

Pointwise Mutual Information (PMI)
PMI = coherence score of a topic based on topic-words
PMI↑, Topic quality↑

Model ranking with average PMI: 5 high-scoring models

No perfect model (not uncommon)
Human intervention (accepted)

Highest scoring model → Post-processing to refine
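
Ranking candidate models by average PMI could look roughly like the sketch below. The slides do not say which reference corpus, how many top words, or what smoothing was used, so the document-level co-occurrence counts, the `eps` smoothing term, and the names `topic_pmi` and `rank_models` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def topic_pmi(top_words, docs, eps=1e-12):
    # docs: list of sets of words. Probabilities are document frequencies.
    # Expects at least two top words per topic.
    n = len(docs)

    def p(*words):
        return sum(1 for d in docs if all(w in d for w in words)) / n

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = p(w1, w2)
        # PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) ), smoothed by eps.
        scores.append(math.log((joint + eps) / (p(w1) * p(w2) + eps)))
    return sum(scores) / len(scores)


def rank_models(models, docs):
    # models: {model name: list of topics, each a list of top words}.
    # Returns model names sorted by average topic PMI, best first.
    avg = {
        name: sum(topic_pmi(t, docs) for t in topics) / len(topics)
        for name, topics in models.items()
    }
    return sorted(avg, key=avg.get, reverse=True)
```

A topic whose top words tend to appear in the same documents scores high; a topic that mixes unrelated words scores low, which is why the top-ranked models still need a human check.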

Refining selected topic model

Mixed topics
E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]

1. Graph with
• vertices = publications
• edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide

[Figure legend: garbled circuit, integrated circuit]
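
The graph-based refinement can be illustrated with a small sketch. The slides do not state how the divergences are turned into weights for modularity, so this sketch assumes a symmetrised Kullback-Leibler divergence mapped to a similarity via exp(-d); it also only scores a given split with Newman's modularity rather than searching for the best one. All function names are hypothetical.

```python
import math
from itertools import combinations


def kl(p, q, eps=1e-12):
    # Kullback-Leibler divergence D(p || q) for two discrete distributions.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def similarity_graph(topic_dists):
    # Vertices are publications (indices into topic_dists).
    # Assumption: symmetrised KL divergence d becomes edge weight exp(-d).
    edges = {}
    for i, j in combinations(range(len(topic_dists)), 2):
        d = 0.5 * (kl(topic_dists[i], topic_dists[j]) + kl(topic_dists[j], topic_dists[i]))
        edges[(i, j)] = math.exp(-d)
    return edges


def modularity(edges, communities):
    # Weighted modularity Q of a partition (list of sets of vertices):
    # Q = sum over communities of (inside weight / m) - (total strength / 2m)^2.
    total = sum(edges.values())
    strength = {}
    for (i, j), w in edges.items():
        strength[i] = strength.get(i, 0.0) + w
        strength[j] = strength.get(j, 0.0) + w
    q = 0.0
    for community in communities:
        inside = sum(w for (i, j), w in edges.items() if i in community and j in community)
        degree = sum(strength.get(v, 0.0) for v in community)
        q += inside / total - (degree / (2 * total)) ** 2
    return q
```

A split of a mixed topic's publications that yields clearly positive modularity is a candidate for division; whether the sub-community is a valid topic remains a human judgment, as the slide notes.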

Topic labeling

• Top words
• Top publications
  - keywords or CCS index (if available)
  - session name (if available)

Let's take a look at our final model

Total of 95 topics

(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers |

Grouped into 20 categories

CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS

Example: CRYPTO category

Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness

Example: CRIME & FRAUD category

Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country

Let's look at some trends

How have the venues changed over time?

Entropy↑ ≈ Diversity↑
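
As a small illustration of why higher entropy reads as higher diversity, here is a sketch of Shannon entropy over the labels of a venue's papers in one year. The transcript does not say exactly which distribution the entropy was computed over, so treating each paper as carrying a single category label is an assumption, and `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    # Entropy in bits of the empirical distribution of the labels.
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A year in which every paper falls into one category gives entropy 0; a year spread evenly over four categories gives 2 bits.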

How has the category distribution changed over time?

[Figure: category distribution over time. Categories shown: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]

How consistent are authors and topics year-to-year?

Jaccard Index(88-89, 90-91) = similarity between topic sets of 88-89 and 90-91

Jaccard Index↑ ≈ Overlap↑
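
The Jaccard index between two consecutive time windows is the size of the intersection of their topic (or author) sets divided by the size of their union. A minimal sketch follows; the function name and the convention for two empty sets are assumptions.

```python
def jaccard(a, b):
    # |A ∩ B| / |A ∪ B| for two collections treated as sets.
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)
```

For example, if the 88-89 window covers {Access control, Encryption, Covert channel} and the 90-91 window covers {Encryption, Covert channel, Viruses}, two of the four distinct topics are shared, so the index is 0.5.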

How has industry and government participation changed over time?

[Figure legend: Academics; Academics + Industry; Industry; Government; Others; Academics + Government]

Do non-academic collaborators have the same interests?

Top 15 topics in recent years (2011-2015): Government, Industry, Overall

[Figure: topic labels shown: Mobile app; Veri. comp. & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anon.; Crypto & num. theory; Information flow; Formal meth. & ver.; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp.; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]

Tool and data availability

Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …

Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …

Our site: secprivmeta.net

secprivmeta.net demo

Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu

References / Backup

References: introspective studies in other fields

• Publications
  - Towards a computational history of the ACL: 1980-2008
  - Studying the history of ideas using topic models
  - Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
  - CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
  - Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
  - ACL Anthology Network (All About NLP)
    • Text, meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing

Category trends using number of papers

Page 24: Aniqua Baset Tamara Denning School of Computing ... · Data collection: 3062 publications,1980-2015 16 1993-1994 1996-2015 1066 CCS ACM Computer and Communications Security 1997-2015

Generating and selecting a topic model

24

Different topic models

Model ranking with average PMI

5 high-scoring models

No perfect model (not uncommon)Human intervention (accepted)

Highest scoring model Post-processing to refine

Refining selected topic model

25

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

Refining selected topic model

26

1 Graph with bull vertices = publications bull edge weights = divergences between topic

distributions (using Kullback-Leibler divergence) 2 Find sub-community using graph modularity 3 Is a sub-community is a valid topic Yes rarr divide

Mixed topics

Eg Garbled circuit and integrated circuit [share words circuit gate bit]

= garbled circuit = integrated circuit

Topic labeling

27

bull Top words bull Top publications

- keywords or CCS index (if available) - session name (if available)

Letrsquos take a look at our final model

28

Total of 95 topics

29

(De)obfuscation and decompilation Domain Name System (DNS) Location privacytracking Real-world sensing(User) interfaces E-commerce Malicious hardware RoutingAccess control Electronic voting Malware SSLTLSAutomated analysis protocols and files Embedded and hardware security Memory disclosure attacks and

defenses Secure (multiparty) computation

Anonymity Encodingdecoding Memory errors exploits and defenses Security labeling

Binary code analysis Encryption Machine learning Security policies

BitcoinCrypto-currency File and file system security String matching and regular expressions Side-channel attack

Bots and Botnet Fingerprints and fingerprinting System calls Social network and (de)anonymization

Browser security Formal methods User study Software and trustCAPTCHA Formal methods and verification Mobile app Spam scam and fraudCards and tokens Formal specification and verification Mobile devices Static and dynamic analysisCensorship Game and game theory Mobile network Storage security

Client-server accountability Genomics Network attacks defenses and detections TCPIP

Cloud Group communication Network authentication TorCompartmentalization Hardwares RFIDs and ICs Network design Trust managementControl flow Hardwares low level Network perimeter controls Trusted computingsystem

Covert channel Hardwares physical properties Network traffic analysis attacks and defences Online crime

Crypto and number theory Information flow Online advertising Verifiable computation and zero knowledge proofs

Cryptographic protocols Institutional security Online services Virtual machines and virtualization

DOM and documents Intrusionanomaly detection Passwords Viruses and worms propagation and scanning

Dark web Java security Peer-to-peer communications Vulnerabilities exploits disclosure and patches

Data privacy JavaScript security Program exploitations attacks and defenses Web application vulnerabilities

Databases Kernels Public-key cryptography Wireless signalDigital signature Key distributionmanagement Random numbers

Grouped into 20 categories

CRYPTO TRUST FORMALISM SYSTEM

HARDWARE NETWORKS WEB AUTH

COMPUTATION DATA MALWARE PROGRAMS

INFORMATION LEAKAGE

INTERNET MOBILE CRIME amp FRAUD

ANONYMITY amp CENSORSHIP

VIRTUAL METHOD MISCELLANEOUS

Example: CRYPTO category

Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness
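
A "top 5 words" column like the one above can be read off a fitted topic model by sorting each topic's word weights. The sketch below is illustrative only: the function name `top_words` and the weights are invented for the example and are not taken from the authors' model.

```python
def top_words(word_weights, k=5):
    """Return the k highest-weighted words of one topic.

    word_weights: dict mapping word -> weight (hypothetical values here).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(word_weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Invented weights for an "Encryption"-like topic, for illustration only.
encryption_topic = {
    "encryption": 0.09,
    "ciphertext": 0.07,
    "encrypted": 0.05,
    "decryption": 0.04,
    "decrypt": 0.03,
    "scheme": 0.01,
}

if __name__ == "__main__":
    print(top_words(encryption_topic))
```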

Example: CRIME & FRAUD category

Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam scam and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country

Let's look at some trends

How have the venues changed over time?

Entropy ↑ ≈ Diversity ↑
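
The slide does not say which entropy is used. A minimal sketch, assuming Shannon entropy (in bits) over the category labels of one venue's papers in a given period; the function name and the sample labels are hypothetical:

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (bits) of the empirical distribution of labels.

    Higher values mean papers are spread more evenly over more categories.
    An empty input yields 0.0 by convention (an assumption of this sketch).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical category labels for two periods of one venue.
early_years = ["CRYPTO", "CRYPTO", "CRYPTO", "FORMALISM"]
later_years = ["CRYPTO", "WEB", "MOBILE", "MALWARE"]

if __name__ == "__main__":
    print(shannon_entropy(early_years), shannon_entropy(later_years))
```

A period dominated by one category scores near 0, while an even spread over n categories scores log2(n).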

How has the category distribution changed over time?

[Figure: category distribution over time. Labeled categories: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]

How consistent are authors and topics year-to-year?

Similarity between topic sets of 88-89 and 90-91 = Jaccard Index(88-89, 90-91)

Jaccard Index ↑ ≈ Overlap ↑
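
The Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch; the topic sets below are hypothetical, and returning 1.0 for two empty sets is a convention chosen for this example:

```python
def jaccard_index(first, second):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(first), set(second)
    if not a and not b:
        return 1.0  # convention for this sketch: two empty sets are identical
    return len(a & b) / len(a | b)


# Hypothetical topic sets for two consecutive two-year windows.
topics_88_89 = {"Encryption", "Access control", "Covert channel"}
topics_90_91 = {"Encryption", "Covert channel", "Security policies"}

if __name__ == "__main__":
    print(jaccard_index(topics_88_89, topics_90_91))
```

The same function applies to author sets: pass the sets of author names for the two windows instead of topic labels.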

How has industry and government participation changed over time?

[Figure: participation over time by affiliation type. Legend: Academics | Academics + Industry | Academics + Government | Industry | Government | Others]

Do non-academic collaborators have the same interests?

[Figure: top 15 topics in recent years (2011-2015), shown for Overall, Industry, and Government. Topic labels: Mobile app, Veri comp & ZKP, Machine learning, Malware, Data privacy, Social networks & (De)anon, Crypto & num theory, Information flow, Formal meth & ver, VMs & virtualization, Tor, Online advertising, DOM & documents, Bots & botnet, Spam scam & fraud, Client-server accountability, Browser, Secure comp, Dark web, JavaScript, SSL/TLS, Program exploit, Side-channel, HW low level, Static & dynamic analysis, Binary analysis]

Tool and data availability

Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …

Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …

Our site: secprivmeta.net

secprivmeta.net demo

Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu

References / Backup

References: introspective studies in other fields

Publications
- Towards a computational history of the ACL: 1980-2008
- Studying the history of ideas using topic models
- Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
- CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
- Games research today: Analyzing the academic landscape 2000-2014

Online tool/dataset
- ACL Anthology Network (All About NLP): text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing

Category trends using number of papers

Refining selected topic model

Mixed topics

E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]

1. Graph with
   • vertices = publications
   • edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide

[Figure legend: garbled circuit | integrated circuit]
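
The refinement steps can be sketched roughly as follows. This is not the authors' implementation: it assumes a symmetrised Kullback-Leibler divergence, converts each divergence d to a similarity exp(-d) so that closer publications get heavier edges, and only scores a candidate split with Newman modularity rather than searching for one. All names and the sample distributions are hypothetical.

```python
import math
from itertools import combinations


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) of two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Average of both directions, so an undirected edge gets one weight (assumption)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def similarity_graph(topic_dists):
    """Weighted adjacency matrix with weights exp(-divergence) (assumption)."""
    n = len(topic_dists)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w = math.exp(-symmetric_kl(topic_dists[i], topic_dists[j]))
        weights[i][j] = weights[j][i] = w
    return weights


def modularity(weights, communities):
    """Newman modularity Q of a partition of a weighted undirected graph."""
    degree = [sum(row) for row in weights]
    two_m = sum(degree)
    label = {v: c for c, members in enumerate(communities) for v in members}
    q = 0.0
    for i in range(len(weights)):
        for j in range(len(weights)):
            if label[i] == label[j]:
                q += weights[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Hypothetical topic distributions of four publications assigned to one mixed topic.
publications = [
    [0.8, 0.1, 0.1],  # garbled-circuit-like
    [0.7, 0.2, 0.1],  # garbled-circuit-like
    [0.1, 0.1, 0.8],  # integrated-circuit-like
    [0.1, 0.2, 0.7],  # integrated-circuit-like
]
graph = similarity_graph(publications)
split_q = modularity(graph, [[0, 1], [2, 3]])
mixed_q = modularity(graph, [[0, 2], [1, 3]])

if __name__ == "__main__":
    print(split_q, mixed_q)
```

A split whose modularity is clearly above zero (the score of leaving everything in one community) is a candidate for manual review as a separate topic, which corresponds to step 3.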

Topic labeling

• Top words
• Top publications
  - keywords or CCS index (if available)
  - session name (if available)

Let's take a look at our final model

Total of 95 topics

(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam scam and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers

Page 28: Aniqua Baset Tamara Denning School of Computing ... · Data collection: 3062 publications,1980-2015 16 1993-1994 1996-2015 1066 CCS ACM Computer and Communications Security 1997-2015

Letrsquos take a look at our final model

28

Total of 95 topics

29

(De)obfuscation and decompilation Domain Name System (DNS) Location privacytracking Real-world sensing(User) interfaces E-commerce Malicious hardware RoutingAccess control Electronic voting Malware SSLTLSAutomated analysis protocols and files Embedded and hardware security Memory disclosure attacks and

defenses Secure (multiparty) computation

Anonymity Encodingdecoding Memory errors exploits and defenses Security labeling

Binary code analysis Encryption Machine learning Security policies

BitcoinCrypto-currency File and file system security String matching and regular expressions Side-channel attack

Bots and Botnet Fingerprints and fingerprinting System calls Social network and (de)anonymization

Browser security Formal methods User study Software and trustCAPTCHA Formal methods and verification Mobile app Spam scam and fraudCards and tokens Formal specification and verification Mobile devices Static and dynamic analysisCensorship Game and game theory Mobile network Storage security

Client-server accountability Genomics Network attacks defenses and detections TCPIP

Cloud Group communication Network authentication TorCompartmentalization Hardwares RFIDs and ICs Network design Trust managementControl flow Hardwares low level Network perimeter controls Trusted computingsystem

Covert channel Hardwares physical properties Network traffic analysis attacks and defences Online crime

Crypto and number theory Information flow Online advertising Verifiable computation and zero knowledge proofs

Cryptographic protocols Institutional security Online services Virtual machines and virtualization

DOM and documents Intrusionanomaly detection Passwords Viruses and worms propagation and scanning

Dark web Java security Peer-to-peer communications Vulnerabilities exploits disclosure and patches

Data privacy JavaScript security Program exploitations attacks and defenses Web application vulnerabilities

Databases Kernels Public-key cryptography Wireless signalDigital signature Key distributionmanagement Random numbers

Grouped into 20 categories

CRYPTO TRUST FORMALISM SYSTEM

HARDWARE NETWORKS WEB AUTH

COMPUTATION DATA MALWARE PROGRAMS

INFORMATION LEAKAGE

INTERNET MOBILE CRIME amp FRAUD

ANONYMITY amp CENSORSHIP

VIRTUAL METHOD MISCELLANEOUS

Example CRYPTO category

30

Topic label Top 5 words from LDA

Cryptographic protocols protocol session party session-key secret

Encryption encryption ciphertext encrypted decryption decrypt

Network authentication authentication authenticate kerberos secret service

Crypto and number theory mod bit prime rsa random

Digital signature signature sign public signer verification

Public-key cryptography certificate CA trust revocation sign

Key distributionmanagement round broadcast secret threshold secret-sharing

Group communication group member multicast join communication

Random numbers entropy output random pool randomness

Example: CRIME & FRAUD category

Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country

Let's look at some trends

How have the venues changed over time?

Entropy ↑ ≈ Diversity ↑
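The slide only states that higher entropy indicates more diversity. As a rough sketch, the Shannon entropy of a label distribution can be computed as below. It is an assumption here that entropy is taken over the category labels of a venue's papers in a given year; the slides do not spell out the exact distribution, and the sample labels are made up.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(labels: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no papers, no diversity to measure
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical per-paper category labels for one venue in two different years.
narrow_year = ["CRYPTO", "CRYPTO", "CRYPTO", "FORMALISM"]
broad_year = ["CRYPTO", "WEB", "MOBILE", "MALWARE"]

print(shannon_entropy(narrow_year))
print(shannon_entropy(broad_year))
```

A year whose papers spread evenly over four categories scores higher than a year dominated by one category, which is the sense in which entropy tracks diversity.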

How has the category distribution changed over time?

[Figure: category distribution over time. Labels: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]

How consistent are authors and topics year-to-year?

Jaccard Index(88-89, 90-91) = similarity between the topic sets of 88-89 and 90-91

Jaccard Index ↑ ≈ Overlap ↑
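For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows; the two topic sets are hypothetical examples rather than the study's data, and treating two empty sets as fully similar is a convention chosen here.

```python
from typing import Set


def jaccard_index(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined here as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical topic sets for two consecutive two-year windows.
topics_88_89 = {"Access control", "Covert channel", "Security policies", "Formal methods"}
topics_90_91 = {"Access control", "Security policies", "Formal methods", "Viruses and worms"}

# 3 shared topics out of 5 distinct topics overall.
print(jaccard_index(topics_88_89, topics_90_91))
```

The same function applies unchanged to author sets, so author and topic consistency can be compared on one scale from 0 (no overlap) to 1 (identical sets).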

How has industry and government participation changed over time?

[Figure: participation over time by affiliation group. Legend: Academics, Academics + Industry, Industry, Government, Others, Academics + Government]

Do non-academic collaborators have the same interests?

Top 15 topics in recent years (2011-2015)

[Figure: top topics compared across Government, Industry, and Overall. Topic labels: Mobile app, Veri comp & ZKP, Machine learning, Malware, Data privacy, Social networks & (De)anon, Crypto & num theory, Information flow, Formal meth & ver, VMs & virtualization, Tor, Online advertising, DOM & documents, Bots & botnet, Spam scam & fraud, Client-server accountability, Browser, Secure comp, Dark web, JavaScript, SSL/TLS, Program exploit, Side-channel, HW low level, Static & dynamic analysis, Binary analysis]

Tool and data availability

Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …

Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …

Our site: secprivmeta.net

secprivmeta.net demo

Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu

References/Backup

References: introspective studies in other fields

• Publications

- Towards a computational history of the ACL: 1980-2008

- Studying the history of ideas using topic models

- Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change

- CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis

- Games research today: Analyzing the academic landscape 2000-2014

• Online tool/dataset

- ACL Anthology Network (All About NLP): text meta-data created using papers from the ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing

Category trends using number of papers
