TRANSCRIPT
Aniqua Baset, Tamara Denning
School of Computing, University of Utah
A Data-Driven Reflection on 36 Years of Security and Privacy Research
What we did
• 3062 publications, 1980-2015
• Topic modeling
• Trends in authorship and contents
• Online visualizations + data
So why even do this kind of study?
Introspection is important
• Past evolution and future direction
• Comprehensive overview for new/external audience
• Many communities have introspective studies
  - Computational linguistics, human-computer interaction, ubiquitous computing, games, …
Past security & privacy introspection
• Panel talks, keynotes, and invited papers with valuable expert insights
… But we lack structured, data-driven efforts
How should we do data-driven introspection?
We need
• Publications from different venues
• Cohesive categorization
We want
• Data-driven approach from publications
Problem
• Not all S&P venues have keywords, e.g., USENIX Security
Our approach
• Topic modeling on full contents of the publications
Overview of topic modeling
Latent Dirichlet Allocation (LDA)
• Input: documents (document = bag-of-words)
• Output: topics (topic = group of words) and a topic distribution for each document
Challenges
• Measuring quality of a topic model is hard
• High-scoring topic ≠ high-quality for people
• Pre-processing texts is crucial
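To make the LDA inputs and outputs concrete, here is a toy collapsed Gibbs sampler in plain Python. It is a minimal sketch for illustration only: the slides do not say which LDA implementation was used, and the function name `lda_gibbs` and the default values for `alpha`, `beta`, and `iters` are assumptions.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs is a list of token lists. Word order never matters here, which is
    the bag-of-words assumption. Returns (theta, topic_word): theta[d] is
    the topic distribution of document d, topic_word[k] counts the words
    assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word
```

Calling `topic_word[k].most_common(5)` gives the kind of "top 5 words" list used later for topic labeling. For a corpus of thousands of full papers an optimized library would be the practical choice.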
Our methodology

Data collection: 3062 publications, 1980-2015
1993-1994, 1996-2015    1066
CCS: ACM Computer and Communications Security
1997-2015    932
NDSS: Network and Distributed System Security Symposium
1980-2015    456
S&P: IEEE Symposium on Security & Privacy
1993, 1995-1996, 1998-2015    608
USENIX: USENIX Security Symposium
Collected per publication: full content + title, authors and their affiliations + session name
Sources: from publishers, from website
Pre-processing input for topic modeling
1) PDF/PS/HTML → TEXT: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi/fl/ffl ligatures, homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]
5) Stopword list
   • Most common English words [a, has, the]
   • Common across our corpus [words with low Inverse Document Frequency (IDF)]
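Steps 2, 4, and 5 above can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' pipeline: the phrase table, the three-word English stopword set, the `max_idf` threshold, and all function names are hypothetical. NFKC normalization stands in for ligature repair only; homoglyph repair and lemmatization (step 3) are left out because they need extra resources.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical phrase table: variants mapped to one hyphenated token.
PHRASES = {
    "man in the middle": "man-in-the-middle",
    "mitm": "man-in-the-middle",
}
# Tiny stand-in for a full list of common English words.
ENGLISH_STOPWORDS = {"a", "has", "the"}

def normalize(text):
    """Expand ligatures via NFKC, lowercase, and merge technical phrases."""
    text = unicodedata.normalize("NFKC", text).lower()
    for phrase, token in PHRASES.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", token, text)
    return text

def tokenize(text):
    """Split normalized text into lowercase word tokens, keeping hyphens."""
    return re.findall(r"[a-z][a-z\-]*", normalize(text))

def low_idf_words(token_docs, max_idf):
    """Words whose IDF = log(N / document frequency) is at most max_idf."""
    n = len(token_docs)
    doc_freq = Counter(w for doc in token_docs for w in set(doc))
    return {w for w, c in doc_freq.items() if math.log(n / c) <= max_idf}

def preprocess(raw_docs, max_idf=0.1):
    """Tokenize documents and drop English and corpus-specific stopwords."""
    token_docs = [tokenize(text) for text in raw_docs]
    stop = ENGLISH_STOPWORDS | low_idf_words(token_docs, max_idf)
    return [[w for w in doc if w not in stop] for doc in token_docs]
```

A word that appears in every document has IDF 0, so it is dropped for any non-negative threshold. Raising `max_idf` removes progressively less common words.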
Generating and selecting a topic model
Different topic models, by varying
• number of topics (60 to 120)
• hyperparameters
Model ranking with average PMI
• Pointwise Mutual Information (PMI) = coherence score of a topic based on topic-words
• PMI ↑, topic quality ↑
5 high-scoring models
• No perfect model (not uncommon)
• Human intervention (accepted)
Highest-scoring model → post-processing to refine
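One common way to turn PMI into a topic coherence score is to average the PMI over all pairs of a topic's top words, using document-level co-occurrence counts. The sketch below does that and ranks candidate models by their mean score. The slides do not state the reference corpus or the exact formula, so the co-occurrence definition, the `eps` smoothing, and the function names are assumptions.

```python
import math
from itertools import combinations

def topic_pmi(top_words, token_docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words over a reference corpus."""
    doc_sets = [set(doc) for doc in token_docs]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets) / n

    scores = [
        math.log((prob(a, b) + eps) / (prob(a) * prob(b) + eps))
        for a, b in combinations(top_words, 2)
    ]
    return sum(scores) / len(scores) if scores else 0.0

def average_pmi(model_topics, token_docs):
    """Mean topic PMI over all topics of one model."""
    return sum(topic_pmi(t, token_docs) for t in model_topics) / len(model_topics)

def rank_models(models, token_docs):
    """Model names sorted from highest to lowest average PMI.

    models maps a model name to a list of topics, each a list of top words.
    """
    return sorted(models, key=lambda m: average_pmi(models[m], token_docs), reverse=True)
```

Word pairs that never co-occur get a large negative score because of `eps`, which pushes topics that mix unrelated vocabulary toward the bottom of the ranking.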
Refining selected topic model
Mixed topics
• E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]
1. Graph with
   • vertices = publications
   • edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide
[Figure: publication graph of the mixed topic; one sub-community = garbled circuit, the other = integrated circuit]
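Step 1 can be sketched as below: compute a divergence between the topic distributions of every pair of publications in the mixed topic and store it as an edge weight. KL divergence is asymmetric, so this sketch averages the two directions to get a weight for an undirected graph; that symmetrization, the `eps` smoothing, and the function names are assumptions rather than details taken from the slides.

```python
import math
from itertools import combinations

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two distributions."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def symmetric_kl(p, q):
    """Average of D(p || q) and D(q || p), so the weight has no direction."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def divergence_graph(topic_dists):
    """Edge weights between all pairs of publications.

    topic_dists maps a publication id to its topic distribution.
    Returns {(id_a, id_b): divergence} with id_a < id_b.
    """
    return {
        (a, b): symmetric_kl(topic_dists[a], topic_dists[b])
        for a, b in combinations(sorted(topic_dists), 2)
    }
```

The resulting edge list can be handed to a graph library for modularity-based community detection (step 2). How the divergences are converted into weights suitable for that step is not stated in the slides.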
Topic labeling
• Top words
• Top publications
  - keywords or CCS index (if available)
  - session name (if available)
Let's take a look at our final model
Total of 95 topics
(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers |

Grouped into 20 categories
CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS
Example: CRYPTO category
Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness
Example: CRIME & FRAUD category
Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country
Let's look at some trends
How have the venues changed over time?
Entropy ↑ ≈ Diversity ↑
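The entropy used as a diversity proxy can be illustrated with Shannon entropy over a venue's topic weights for one year. This is a minimal sketch: which distribution the entropy was computed over, and the logarithm base, are assumptions here, and `shannon_entropy` is a hypothetical helper name.

```python
import math

def shannon_entropy(weights):
    """Shannon entropy (in bits) of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A year in which papers spread evenly over four topics scores 2 bits, while a year dominated by a single topic scores 0, which is the sense in which higher entropy suggests a more diverse venue.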
How has the category distribution changed over time?
[Figure: category distribution over time. Categories shown: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]
How consistent are authors and topics year-to-year?
Jaccard Index(88-89, 90-91) = similarity between topic sets of 88-89 and 90-91
Jaccard Index ↑ ≈ Overlap ↑
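The Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to consecutive periods, as in the 88-89 versus 90-91 example. The function names, the period keys, and the choice to return 1.0 for two empty sets are assumptions made for illustration.

```python
def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical here (an arbitrary convention).
    return len(a & b) / len(union) if union else 1.0

def consecutive_overlap(sets_by_period):
    """Jaccard index between each period and the next one.

    sets_by_period maps a sortable period key, e.g. (1988, 1989),
    to the set of topics (or authors) seen in that period.
    """
    periods = sorted(sets_by_period)
    return {
        (p, q): jaccard_index(sets_by_period[p], sets_by_period[q])
        for p, q in zip(periods, periods[1:])
    }
```

The same helper works for author sets and topic sets, since it only needs hashable items.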
How has industry and government participation changed over time?
[Figure: authorship by affiliation over time. Legend: Academics; Academics + Industry; Industry; Government; Others; Academics + Government]
Do non-academic collaborators have the same interests?
Top 15 topics in recent years (2011-2015)
[Figure: top topics for Overall, Industry, and Government. Topics shown: Mobile app; Veri. comp. & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anon.; Crypto & num. theory; Information flow; Formal meth. & ver.; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp.; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]
Tool and data availability
Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …
Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …
Our site: secprivmeta.net
secprivmeta.net demo
Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu
References/Backup

References: introspective studies in other fields
• Publications
  - Towards a computational history of the ACL: 1980-2008
  - Studying the history of ideas using topic models
  - Identifying crisis of Ubicomp mapping 15 years of the field's development and paradigm change
  - CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
  - Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
  - ACL Anthology Network (All About NLP)
    • Text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing

Category trends using number of papers
What we did
3
3062 publications1980-2015
Topic modeling Trends in authorship and contents
Online visualizations + Data
4
So why even do this kind of study129300
5
Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies
- Computational linguistics Human computer interaction Ubiquitous computing Games hellip
6
Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies
- Computational linguistics Human computer interaction Ubiquitous computing Games hellip
Past security amp privacy introspection bull Panel talks keynote invited papers with valuable
expert insights
7
Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies
- Computational linguistics Human computer interaction Ubiquitous computing Games hellip
hellip But we lack structured data-driven efforts
Past security amp privacy introspection bull Panel talks keynote invited papers with valuable
expert insights
How should we do data-driven introspection129300
8
9
Publications fromdifferent venues
Cohesive categorization
We need
10
Publications fromdifferent venues
Cohesive categorization
We need
Data-driven approach from publicationsWe want
11
Publications fromdifferent venues
Cohesive categorization
We need
Data-driven approach from publicationsWe want
Not all SampP venues have keywords eg USENIX security
Problem
12
Publications fromdifferent venues
Cohesive categorization
We need
Data-driven approach from publicationsWe want
Not all SampP venues have keywords eg USENIX security
Problem
Topic modeling on full contents of the publications
Our approach
Overview of topic modeling
13
Latent Dirichlet Allocation (LDA)
Topic ModelingDocuments
TopicsTopic = group of words
Topic distribution for each document
= bag-of-words
Overview of topic modeling
14
Latent Dirichlet Allocation (LDA)
Topic ModelingDocuments
TopicsTopic = group of words
Topic distribution for each document
= bag-of-words
Challenges bull Measuring quality of a topic model is hard bull High-scoring topic ne High-quality for people bull Pre-processing texts is crucial
Our methodology
15
Data collection 3062 publications1980-2015
16
1993-19941996-2015 1066
CCSACM Computer and Communications Security
1997-2015 932
NDSSNetwork and Distributed System Security Symposium
1980-2015 456
SampPIEEE Symposium on Security amp Privacy
19931995-19961998-2015
608
USENIXUSENIX Security Symposium
Full content+ Title Authors andtheir affiliations
Session name+ +
From publishersFrom publishers
From websiteFrom website
Pre-processing input for topic modeling
17
1) TEXTPDFPS
HTMLMain body stripping off meta-data
Pre-processing input for topic modeling
18
1) TEXTPDFPS
HTMLMain body stripping off meta-data
2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]
Pre-processing input for topic modeling
19
1) TEXTPDFPS
HTMLMain body stripping off meta-data
Lemmatization [eg attacksattackedattacking rarr attack]3)
2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]
Pre-processing input for topic modeling
20
1) TEXTPDFPS
HTMLMain body stripping off meta-data
Lemmatization [eg attacksattackedattacking rarr attack]3)
Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]
4)
2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]
Pre-processing input for topic modeling
21
1) TEXTPDFPS
HTMLMain body stripping off meta-data
Lemmatization [eg attacksattackedattacking rarr attack]3)
Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]
4)
Stopword list bull Most common English words [a has the] bull Common across our corpus
[words with low Inverse Document Frequency (IDF)]
5)
2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]
Generating and selecting a topic model
22
By varying bull of topics (60 to 120) bull hyperparameters
Different topic models
Generating and selecting a topic model
23
Pointwise Mutual Information (PMI)PMI = coherence score of a topic based on topic-words PMIuarr Topic qualityuarr
Different topic models
Model ranking with average PMI
Generating and selecting a topic model
24
Different topic models
Model ranking with average PMI
5 high-scoring models
No perfect model (not uncommon)Human intervention (accepted)
Highest scoring model Post-processing to refine
Refining selected topic model
25
Mixed topics
Eg Garbled circuit and integrated circuit [share words circuit gate bit]
Refining selected topic model
26
1 Graph with bull vertices = publications bull edge weights = divergences between topic
distributions (using Kullback-Leibler divergence) 2 Find sub-community using graph modularity 3 Is a sub-community is a valid topic Yes rarr divide
Mixed topics
Eg Garbled circuit and integrated circuit [share words circuit gate bit]
= garbled circuit = integrated circuit
Topic labeling
27
bull Top words bull Top publications
- keywords or CCS index (if available) - session name (if available)
Letrsquos take a look at our final model
28
Total of 95 topics
29
(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam, and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers
Grouped into 20 categories
CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS
Example: CRYPTO category
30
Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness
Example: CRIME & FRAUD category
33
Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam, and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country
Let's look at some trends
35
36
How have the venues changed over time?
Entropy↑ ≈ Diversity↑
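To make the "higher entropy, more diversity" reading concrete, here is a minimal sketch of Shannon entropy over a venue's per-topic paper counts. The slides do not say exactly which distribution the entropy was computed on, so the input (paper counts per topic for one venue and year) and the function name are assumptions, and the counts are invented.

```python
import math
from typing import Sequence


def shannon_entropy(counts: Sequence[float]) -> float:
    """Shannon entropy (in bits) of the distribution obtained by normalizing `counts`."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    probs = [c / total for c in counts if c > 0]  # zero-count topics contribute nothing
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical venue-year profiles: papers per topic (made-up numbers).
focused_year = [10, 0, 0, 0]   # every paper falls in a single topic
diverse_year = [5, 5, 5, 5]    # papers spread evenly over four topics

focused_entropy = shannon_entropy(focused_year)
diverse_entropy = shannon_entropy(diverse_year)
```

A venue whose papers concentrate on one topic scores the minimum, while an even spread over n topics scores the maximum of log2(n) bits.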
How has the category distribution changed over time?
38
[Figure: category distribution over time. Categories shown: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]
39
How consistent are authors and topics year-to-year?
40
Jaccard Index(88-89, 90-91) = similarity between the topic sets of 88-89 and 90-91
Jaccard Index↑ ≈ Overlap↑
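The Jaccard index between two sets is the size of their intersection divided by the size of their union. The sketch below applies it to two topic sets. The topic sets are invented for illustration and are not the study's data, and returning 1.0 for two empty sets is a convention chosen here.

```python
from typing import Set


def jaccard_index(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a ∩ b| / |a ∪ b|, ranging from 0 (disjoint) to 1 (identical)."""
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(union)


# Hypothetical topic sets for two consecutive two-year windows (made-up).
topics_88_89 = {"Access control", "Security labeling", "Formal methods"}
topics_90_91 = {"Access control", "Formal methods", "Covert channel", "Cryptographic protocols"}

similarity = jaccard_index(topics_88_89, topics_90_91)  # 2 shared topics out of 5 distinct
```

The same function works for author sets, which is how it could be used to compare author continuity between the same two windows.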
How has industry and government participation changed over time?
41
[Figure legend: Academics | Academics + Industry | Industry | Government | Others | Academics + Government]
42
Do non-academic collaborators have the same interests?
[Figure: top 15 topics in recent years (2011-2015), shown for Government, Industry, and Overall. Topics appearing in the figure: Mobile app, Veri comp & ZKP, Machine learning, Malware, Data privacy, Social networks & (de)anon, Crypto & num theory, Information flow, Formal meth & ver, VMs & virtualization, Tor, Online advertising, DOM & documents, Bots & botnet, Spam, scam & fraud, Client-server accountability, Browser, Secure comp, Dark web, JavaScript, SSL/TLS, Program exploit, Side-channel, HW low level, Static & dynamic analysis, Binary analysis]
Tool and data availability
43
Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …
Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …
44
Our site: secprivmeta.net
secprivmeta.net demo
45
Aniqua Baset aniqua@cs.utah.edu
Dr. Tamara Denning tdenning@cs.utah.edu
46
References/Backup
47
References: introspective studies in other fields
• Publications
  - Towards a computational history of the ACL: 1980-2008
  - Studying the history of ideas using topic models
  - Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
  - CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
  - Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
  - ACL Anthology Network (All About NLP)
    • Text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing
48
Category trends using number of papers
49
6
Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies
- Computational linguistics Human computer interaction Ubiquitous computing Games hellip
Past security amp privacy introspection bull Panel talks keynote invited papers with valuable
expert insights
7
Introspection is importantbull Past evolution and future direction bull Comprehensive overview for newexternal audience bull Many communities have introspective studies
- Computational linguistics Human computer interaction Ubiquitous computing Games hellip
hellip But we lack structured data-driven efforts
Past security amp privacy introspection bull Panel talks keynote invited papers with valuable
expert insights
How should we do data-driven introspection129300
8
9
Publications fromdifferent venues
Cohesive categorization
We need
10
Publications fromdifferent venues
Cohesive categorization
We need
Data-driven approach from publicationsWe want
11
Publications fromdifferent venues
Cohesive categorization
We need
Data-driven approach from publicationsWe want
Not all SampP venues have keywords eg USENIX security
Problem
12
Publications fromdifferent venues
Cohesive categorization
We need
Data-driven approach from publicationsWe want
Not all SampP venues have keywords eg USENIX security
Problem
Topic modeling on full contents of the publications
Our approach
Overview of topic modeling
13
Latent Dirichlet Allocation (LDA)
Topic ModelingDocuments
TopicsTopic = group of words
Topic distribution for each document
= bag-of-words
Overview of topic modeling
14
Latent Dirichlet Allocation (LDA)
Topic ModelingDocuments
TopicsTopic = group of words
Topic distribution for each document
= bag-of-words
Challenges bull Measuring quality of a topic model is hard bull High-scoring topic ne High-quality for people bull Pre-processing texts is crucial
Our methodology
15
Data collection 3062 publications1980-2015
16
1993-19941996-2015 1066
CCSACM Computer and Communications Security
1997-2015 932
NDSSNetwork and Distributed System Security Symposium
1980-2015 456
SampPIEEE Symposium on Security amp Privacy
19931995-19961998-2015
608
USENIXUSENIX Security Symposium
Full content+ Title Authors andtheir affiliations
Session name+ +
From publishersFrom publishers
From websiteFrom website
Pre-processing input for topic modeling
1) PDF/PS/HTML → TEXT: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi, fl, ffl ligatures, homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]
5) Stopword list
• Most common English words [a, has, the]
• Words common across our corpus [words with low Inverse Document Frequency (IDF)]
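The pre-processing steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the phrase map, the IDF threshold, and all function names are hypothetical, homoglyph repair is not handled, and lemmatization is omitted because it would need an NLP library.

```python
import math
import re
import unicodedata

# Hypothetical phrase map; the study's real acronym list is not reproduced here.
PHRASES = {"man in the middle": "man-in-the-middle", "mitm": "man-in-the-middle"}
COMMON_ENGLISH = {"a", "has", "the"}


def normalize(text):
    """Expand ligatures (via NFKC), lowercase, and join technical phrases."""
    text = unicodedata.normalize("NFKC", text).lower()
    for phrase, joined in PHRASES.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", joined, text)
    return text


def tokenize(text):
    """Split normalized text into lowercase words, keeping hyphenated terms."""
    return re.findall(r"[a-z][a-z-]*", normalize(text))


def low_idf_words(docs, threshold):
    """Words whose IDF = log(N / document frequency) is below the threshold."""
    n = len(docs)
    df = {}
    for doc in docs:
        for word in set(tokenize(doc)):
            df[word] = df.get(word, 0) + 1
    return {w for w, c in df.items() if math.log(n / c) < threshold}


def preprocess(docs, idf_threshold=0.1):
    """Tokenize every document and drop both kinds of stopwords."""
    stop = COMMON_ENGLISH | low_idf_words(docs, idf_threshold)
    return [[w for w in tokenize(d) if w not in stop] for d in docs]
```

With a threshold of 0.1, only words that occur in nearly every document are treated as corpus-specific stopwords; a real corpus would need the threshold tuned by inspection.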
Generating and selecting a topic model
Different topic models, generated by varying
• # of topics (60 to 120)
• hyperparameters
Model ranking with average PMI
Pointwise Mutual Information (PMI) = coherence score of a topic based on topic-words; PMI↑, topic quality↑
5 high-scoring models
• No perfect model (not uncommon)
• Human intervention (accepted)
Highest-scoring model → post-processing to refine
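To make the ranking step concrete, here is a small sketch of a PMI-based coherence score and a model ranking by average PMI. It assumes probabilities are estimated from document-level co-occurrence and uses a small smoothing constant `eps`; the slides do not specify either detail, and the names `topic_pmi` and `rank_models` are hypothetical.

```python
import math
from itertools import combinations


def topic_pmi(top_words, docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words (needs at least two words)."""
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def p(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = [
        math.log((p(a, b) + eps) / (p(a) * p(b) + eps))
        for a, b in combinations(top_words, 2)
    ]
    return sum(scores) / len(scores)


def rank_models(models, docs):
    """Sort model names by the average PMI of their topics, best first."""
    def average_pmi(topics):
        return sum(topic_pmi(t, docs) for t in topics) / len(topics)

    return sorted(models, key=lambda name: average_pmi(models[name]), reverse=True)
```

Here `models` maps a model name to its list of topics, each topic being a list of top words. A high average score only shortlists candidates; as the slide notes, human judgment still decides which model is usable.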
Refining selected topic model
Mixed topics
E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]
1. Graph with
• vertices = publications
• edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide
[Figure legend: garbled circuit, integrated circuit]
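The edge weights in step 1 can be illustrated with a short sketch. Kullback-Leibler divergence is asymmetric, and the slides do not say how it was turned into a single edge weight, so summing both directions is an assumption here, as are the smoothing constant and the function names. The modularity-based community detection of step 2 is not shown.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def edge_weights(topic_dists):
    """Symmetrized KL divergence for every pair of publications."""
    ids = sorted(topic_dists)
    weights = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            p, q = topic_dists[a], topic_dists[b]
            weights[(a, b)] = kl_divergence(p, q) + kl_divergence(q, p)
    return weights
```

Publications with similar topic distributions get a small divergence, so two groups of papers inside one mixed topic should show small weights within each group and large weights across groups.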
Topic labeling
• Top words
• Top publications
  - keywords or CCS index (if available)
  - session name (if available)
Let's take a look at our final model
Total of 95 topics
(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam, and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities, exploits, disclosure, and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers
Grouped into 20 categories
CRYPTO, TRUST, FORMALISM, SYSTEM, HARDWARE, NETWORKS, WEB, AUTH, COMPUTATION, DATA, MALWARE, PROGRAMS, INFORMATION LEAKAGE, INTERNET, MOBILE, CRIME & FRAUD, ANONYMITY & CENSORSHIP, VIRTUAL, METHOD, MISCELLANEOUS
Example: CRYPTO category
Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness
Example: CRIME & FRAUD category
Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam, and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country
Let's look at some trends
How have the venues changed over time?
Entropy↑ ≈ Diversity↑
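As a rough illustration of why higher entropy reads as higher diversity, the sketch below computes Shannon entropy over a list of category labels. Which distribution the talk actually measured (for example, topics per venue per year) is not stated on the slide, so the input format and the name `shannon_entropy` are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A venue-year where every paper falls in one category has zero entropy;
# spreading papers evenly over more categories raises it.
focused = shannon_entropy(["CRYPTO"] * 4)
diverse = shannon_entropy(["CRYPTO", "WEB", "MOBILE", "DATA"])
print(focused < diverse)
```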
How has the category distribution changed over time?
[Figure: category distribution over time. Series labels: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]
How consistent are authors and topics year-to-year?
Jaccard Index(88-89, 90-91) = similarity between topic sets of 88-89 and 90-91
Jaccard Index↑ ≈ Overlap↑
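The Jaccard index used for this comparison is the size of the intersection divided by the size of the union of two sets. A minimal sketch follows; the function name and the convention of returning 0.0 for two empty sets are assumptions, and the example topic sets are made up for illustration.

```python
def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| for two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical topic sets for two consecutive two-year windows.
topics_88_89 = {"Encryption", "Access control", "Covert channel"}
topics_90_91 = {"Encryption", "Access control", "Viruses"}
print(jaccard_index(topics_88_89, topics_90_91))
```

The same function applies to author sets, which is how one measure can cover both questions on the slide.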
How has industry and government participation changed over time?
[Figure legend: Academics; Academics + Industry; Industry; Government; Others; Academics + Government]
Do non-academic collaborators have the same interests?
Top 15 topics in recent years (2011-2015)
[Figure panels: Overall, Industry, Government. Topic labels: Mobile app; Veri. comp. & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anon.; Crypto & num. theory; Information flow; Formal meth. & ver.; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp.; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]
Tool and data availability
Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | ...
Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | ...
Our site: secprivmeta.net
secprivmeta.net demo
Aniqua Baset, aniqua@cs.utah.edu
Dr. Tamara Denning, tdenning@cs.utah.edu
References/Backup
References: introspective studies in other fields
• Publications
- Towards a computational history of the ACL: 1980-2008
- Studying the history of ideas using topic models
- Identifying crisis of Ubicomp: mapping 15 years of the field's development and paradigm change
- CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
- Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
- ACL Anthology Network (All About NLP)
  • Text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing
Category trends using number of papers
12
We need: publications from different venues, cohesive categorization
We want: data-driven approach from publications
Problem: not all S&P venues have keywords, e.g., USENIX Security
Our approach: topic modeling on full contents of the publications
Overview of topic modeling
14
Latent Dirichlet Allocation (LDA)
Topic modeling: Documents (= bag-of-words) → Topics (topic = group of words) + Topic distribution for each document
Challenges
• Measuring quality of a topic model is hard
• High-scoring topic ≠ high-quality for people
• Pre-processing texts is crucial
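To make the LDA overview concrete, here is a minimal sketch of a collapsed Gibbs sampler in plain Python. It is illustrative only and is not the tool or configuration used in the study. The function name `lda_gibbs`, the toy documents, and the values of `alpha`, `beta`, and `iters` are all assumptions made for this example.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment, then resample it.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Topic distribution for each document (each row sums to 1).
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Topic = group of words: the most frequent words assigned to it.
    top_words = [[w for w, c in tw.most_common(3) if c > 0] for tw in topic_word]
    return theta, top_words

# Hypothetical bag-of-words documents.
docs = [
    ["encryption", "ciphertext", "decrypt", "encryption"],
    ["spam", "email", "account", "spam"],
    ["ciphertext", "encryption", "decrypt"],
    ["email", "spam", "account"],
]
theta, top_words = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
print(top_words)
```

Each row of `theta` is the topic distribution for one document, and `top_words` gives the group of words that characterizes each topic.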
Our methodology
15
Data collection: 3062 publications, 1980-2015
16
Venue | Years covered | Publications
CCS (ACM Computer and Communications Security) | 1993-1994, 1996-2015 | 1066
NDSS (Network and Distributed System Security Symposium) | 1997-2015 | 932
S&P (IEEE Symposium on Security & Privacy) | 1980-2015 | 456
USENIX (USENIX Security Symposium) | 1993, 1995-1996, 1998-2015 | 608
Collected: full content + title, authors and their affiliations + session name
Obtained from publishers or from website
Pre-processing input for topic modeling
21
1) Text from PDF/PS/HTML: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi/fl/ffl ligatures, homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]
5) Stopword list
• Most common English words [a, has, the]
• Words common across our corpus [words with low Inverse Document Frequency (IDF)]
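Steps 2 to 5 above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' pipeline. The lookup tables `LEMMAS`, `PHRASES`, and `ENGLISH_STOPWORDS` are tiny hypothetical stand-ins for a real lemmatizer, phrase list, and stopword list, and the IDF threshold is arbitrary. Unicode NFKC normalization handles ligatures but not general homoglyphs.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical lookup tables, built from the examples on the slide.
LEMMAS = {"attacks": "attack", "attacked": "attack", "attacking": "attack"}
PHRASES = {"man in the middle": "man-in-the-middle", "mitm": "man-in-the-middle"}
ENGLISH_STOPWORDS = {"a", "has", "the"}

def preprocess(text):
    """Normalize ligatures, protect phrases, lemmatize, and drop stopwords."""
    # NFKC turns compatibility characters such as the 'fi' ligature into 'fi'.
    text = unicodedata.normalize("NFKC", text).lower()
    for phrase, token in PHRASES.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", token, text)
    # Keep hyphenated technical phrases as single tokens.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text)
    tokens = [LEMMAS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in ENGLISH_STOPWORDS]

def low_idf_words(docs, threshold):
    """Words whose IDF, log(N / document frequency), falls below the threshold."""
    n = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    return {w for w, c in doc_freq.items() if math.log(n / c) < threshold}

tokens = preprocess("The MITM attacked a \ufb01rewall")
corpus = [["attack", "protocol"], ["attack", "malware"], ["attack", "browser"]]
corpus_stopwords = low_idf_words(corpus, threshold=0.1)
print(tokens)
print(corpus_stopwords)
```

In this toy corpus "attack" appears in every document, so its IDF is zero and it is flagged as a corpus-wide stopword.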
Generating and selecting a topic model
22
Different topic models, generated by varying
• number of topics (60 to 120)
• hyperparameters
23
Model ranking with average PMI
• Pointwise Mutual Information (PMI) = coherence score of a topic based on topic-words
• PMI ↑, topic quality ↑
24
5 high-scoring models
• No perfect model (not uncommon)
• Human intervention (accepted)
Highest scoring model → post-processing to refine
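The ranking step can be sketched as follows. The slides do not say which reference corpus or smoothing the PMI score used, so this example assumes document-level co-occurrence within the same corpus and a small `eps` to avoid taking the log of zero. The names `topic_pmi` and `rank_models` and the toy data are hypothetical.

```python
import math
from itertools import combinations

def topic_pmi(top_words, docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words, from document co-occurrence."""
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing all the given words.
        return sum(all(w in s for w in words) for s in doc_sets) / n

    scores = []
    for a, b in combinations(top_words, 2):
        pa, pb = prob(a), prob(b)
        if pa == 0 or pb == 0:
            continue  # word never occurs, so the pair carries no information
        scores.append(math.log((prob(a, b) + eps) / (pa * pb)))
    return sum(scores) / len(scores) if scores else 0.0

def rank_models(models, docs):
    """Rank models (name -> list of topics) by average topic PMI, best first."""
    avg = {
        name: sum(topic_pmi(t, docs) for t in topics) / len(topics)
        for name, topics in models.items()
    }
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)

docs = [
    ["encryption", "ciphertext"],
    ["encryption", "ciphertext"],
    ["spam", "email"],
    ["spam", "email"],
]
models = {
    "coherent": [["encryption", "ciphertext"], ["spam", "email"]],
    "mixed": [["encryption", "spam"], ["ciphertext", "email"]],
}
ranking = rank_models(models, docs)
print(ranking)
```

As the slides point out, a high score does not guarantee topics that people find meaningful, so a ranking like this only narrows the candidates for human review.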
Refining selected topic model
25
Mixed topics
E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]
26
1. Graph with
• vertices = publications
• edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide
[Figure: publication graph with two sub-communities, garbled circuit and integrated circuit]
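Step 1 of the refinement can be sketched as below. Kullback-Leibler divergence is asymmetric and the slides do not say how it was turned into an undirected edge weight, so the symmetrized sum used here is an assumption. The `eps` smoothing and the publication ids are also hypothetical. Community detection by modularity (step 2) is not shown.

```python
import math
from itertools import combinations

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetrized divergence, an assumed choice for undirected edge weights."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def divergence_graph(topic_dists):
    """Edge weights between every pair of publications in a mixed topic."""
    return {
        (a, b): symmetric_kl(topic_dists[a], topic_dists[b])
        for a, b in combinations(sorted(topic_dists), 2)
    }

# Hypothetical topic distributions for three publications in one mixed topic.
topic_dists = {
    "garbled-1": [0.90, 0.10],
    "garbled-2": [0.85, 0.15],
    "integrated-1": [0.10, 0.90],
}
edges = divergence_graph(topic_dists)
for pair, weight in edges.items():
    print(pair, round(weight, 3))
```

Publications about the same underlying subject end up with a small divergence, which is what lets a community-detection step separate the two meanings of "circuit".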
Topic labeling
27
• Top words
• Top publications
- keywords or CCS index (if available)
- session name (if available)
Let's take a look at our final model
28
Total of 95 topics
29
(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam scam and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers
Grouped into 20 categories
CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS
Example: CRYPTO category
30
Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness
Example: CRIME & FRAUD category
33
Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam scam and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country
Let's look at some trends
35
36
How have the venues changed over time?
37
Entropy ↑ ≈ Diversity ↑
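The entropy-as-diversity idea can be illustrated with Shannon entropy over the category labels of a venue's papers in a given year. The slides do not specify the exact quantity the entropy was computed over, so treating each paper as carrying one category label is an assumption, as are the function name and the sample labels.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical category labels for one venue in two different years.
narrow_year = ["CRYPTO", "CRYPTO", "CRYPTO", "CRYPTO"]
diverse_year = ["CRYPTO", "WEB", "MOBILE", "MALWARE"]
print(shannon_entropy(narrow_year), shannon_entropy(diverse_year))
```

A year dominated by one category scores near zero, while a year spread evenly over many categories scores higher, which matches the reading "entropy up, diversity up".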
How has the category distribution changed over time?
38
[Figure: category distribution over time. Categories shown: CRYPTO | PROGRAMS | FORMALISM | MISC | METHOD | MALWARE | SYSTEM | NETWORK | WEB | MOBILE | HARDWARE | INTERNET | ANON & CENSOR | COMPUTATION | VIRTUAL | TRUST | DATA | AUTH | INFO LEAKAGE | CRIME & FRAUD]
39
How consistent are authors and topics year-to-year?
40
Jaccard Index(88-89, 90-91) = similarity between topic sets of 88-89 and 90-91
Jaccard Index ↑ ≈ Overlap ↑
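The Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows. The topic sets are made up for illustration and are not results from the study.

```python
def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical topic sets for two consecutive two-year windows.
topics_88_89 = {"Access control", "Encryption", "Security policies"}
topics_90_91 = {"Encryption", "Security policies", "Covert channel"}
similarity = jaccard_index(topics_88_89, topics_90_91)
print(similarity)
```

The same function applies to author sets, so one helper covers both the author and the topic consistency questions.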
How has industry and government participation changed over time?
41
[Figure legend: Academics | Academics + Industry | Industry | Government | Others | Academics + Government]
42
Do non-academic collaborators have the same interests?
[Figure: Top 15 topics in recent years (2011-2015), shown for Government, Industry, and Overall. Topic labels: Mobile app | Veri comp & ZKP | Machine learning | Malware | Data privacy | Social networks & (de)anon | Crypto & num theory | Information flow | Formal meth & ver | VMs & virtualization | Tor | Online advertising | DOM & documents | Bots & botnet | Spam scam & fraud | Client-server accountability | Browser | Secure comp | Dark web | JavaScript | SSL/TLS | Program exploit | Side-channel | HW low level | Static & dynamic analysis | Binary analysis]
Tool and data availability
43
Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …
Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …
44
Our site: secprivmeta.net
secprivmeta.net demo
45
Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu
46
References/Backup
47
Publications
- Towards a computational history of the ACL: 1980-2008
- Studying the history of ideas using topic models
- Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
- CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
- Games research today: Analyzing the academic landscape 2000-2014
Online tool/dataset
- ACL Anthology Network (All About NLP): text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing
References: introspective studies in other fields
48
Category trends using number of papers
49
12
We need: publications from different venues; cohesive categorization
We want: data-driven approach from publications
Problem: not all S&P venues have keywords, e.g., USENIX Security
Our approach: topic modeling on full contents of the publications
Overview of topic modeling
14
Latent Dirichlet Allocation (LDA)
Documents (document = bag-of-words) → Topic modeling → Topics (topic = group of words) + topic distribution for each document
Challenges
• Measuring quality of a topic model is hard
• High-scoring topic ≠ high-quality for people
• Pre-processing texts is crucial
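To make the inputs and outputs of LDA concrete, here is a toy collapsed Gibbs sampler. It is only an illustration of the technique: the talk does not say which LDA implementation or inference method was used, and the function name, hyperparameter defaults, and example documents are assumptions of this sketch.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs are lists of word tokens."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topics = list(range(num_topics))
    # Random initial topic assignment for every token.
    assign = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]   # tokens of doc d in topic k
    topic_word = [dict() for _ in topics]          # count of word w in topic k
    topic_total = [0] * num_topics                 # total tokens in topic k
    for d, doc in enumerate(docs):
        for w, k in zip(doc, assign[d]):
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
    # Topic distribution for each document (each row sums to 1).
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    # Topic = group of words: the highest-count words per topic.
    top_words = [
        sorted(tw, key=lambda w: -tw[w])[:5] for tw in topic_word
    ]
    return theta, top_words


docs = [["key", "cipher", "key"], ["malware", "botnet"],
        ["cipher", "key"], ["botnet", "malware", "malware"]]
theta, top_words = lda_gibbs(docs, num_topics=2)
```

Each document enters as a bag of words (word order is ignored), and the sampler returns the two outputs named on the slide: a word group per topic and a topic distribution per document.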
Our methodology
15
Data collection: 3062 publications, 1980-2015
16
1993-1994, 1996-2015    1066
CCS: ACM Computer and Communications Security
1997-2015    932
NDSS: Network and Distributed System Security Symposium
1980-2015    456
S&P: IEEE Symposium on Security & Privacy
1993, 1995-1996, 1998-2015    608
USENIX: USENIX Security Symposium
Collected per publication: full content + title, authors, and their affiliations + session name
Sources: publishers and venue websites
Pre-processing input for topic modeling
21
1) PDF/PS/HTML → TEXT: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi/fl/ffl ligatures, homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]
5) Stopword list
• Most common English words [a, has, the]
• Common across our corpus [words with low Inverse Document Frequency (IDF)]
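Steps 2, 4, and 5 can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the authors' pipeline: Unicode NFKC normalization stands in for the ligature fix (it does not handle homoglyphs in general), the phrase table holds only the example from the slide, the IDF threshold is arbitrary, and lemmatization (step 3) is left out because it needs an external NLP library.

```python
import math
import re
import unicodedata

# Hypothetical phrase table; the talk names only the man-in-the-middle example.
PHRASES = {
    r"\bman in the middle\b": "man-in-the-middle",
    r"\bMITM\b": "man-in-the-middle",
}


def fix_conversion_errors(text):
    # NFKC folds compatibility characters, e.g. the ligature U+FB01 into "fi".
    return unicodedata.normalize("NFKC", text)


def preserve_phrases(text):
    # Rewrite multi-word terms and acronyms into one hyphenated token.
    for pattern, replacement in PHRASES.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


def tokenize(text):
    # Lowercase words; hyphens are kept so preserved phrases stay whole.
    return re.findall(r"[a-z][a-z\-]*", text.lower())


def low_idf_words(token_lists, max_idf):
    """Words whose IDF = log(N / document frequency) is at most max_idf."""
    n_docs = len(token_lists)
    doc_freq = {}
    for tokens in token_lists:
        for w in set(tokens):
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return {w for w, df in doc_freq.items() if math.log(n_docs / df) <= max_idf}


corpus = [["attack", "system"], ["system", "key"], ["system", "attack"]]
corpus_stopwords = low_idf_words(corpus, max_idf=0.1)  # {"system"}
```

A word that appears in every document has an IDF of zero, so it carries no information for separating topics; that is why low-IDF words join the ordinary English stopwords.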
Generating and selecting a topic model
22
Different topic models, generated by varying
• # of topics (60 to 120)
• hyperparameters
Generating and selecting a topic model
23
Model ranking with average PMI
Pointwise Mutual Information (PMI) = coherence score of a topic based on topic-words
PMI ↑, topic quality ↑
Generating and selecting a topic model
24
5 high-scoring models
• No perfect model (not uncommon)
• Human intervention (accepted)
Highest-scoring model → post-processing to refine
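The ranking by average PMI can be sketched as follows. The slides do not give the exact PMI variant, so this assumes document-level co-occurrence probabilities, a small `eps` to avoid log(0), and a plain mean over word pairs and topics; the model names and word lists are invented for the example.

```python
import math
from itertools import combinations


def pmi(w1, w2, doc_sets, eps=1e-12):
    """PMI of two words from document co-occurrence; eps avoids log(0)."""
    n = len(doc_sets)
    p1 = sum(w1 in d for d in doc_sets) / n
    p2 = sum(w2 in d for d in doc_sets) / n
    p12 = sum(w1 in d and w2 in d for d in doc_sets) / n
    return math.log((p12 + eps) / (p1 * p2 + eps))


def topic_coherence(top_words, doc_sets):
    """Mean PMI over all pairs of a topic's top words (needs >= 2 words)."""
    pairs = list(combinations(top_words, 2))
    return sum(pmi(a, b, doc_sets) for a, b in pairs) / len(pairs)


def rank_models(models, doc_sets):
    """models: name -> list of topics (lists of top words). Best model first."""
    scores = {
        name: sum(topic_coherence(t, doc_sets) for t in topics) / len(topics)
        for name, topics in models.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


doc_sets = [{"key", "cipher"}, {"key", "cipher"},
            {"spam", "email"}, {"spam", "email"}]
models = {
    "coherent": [["key", "cipher"], ["spam", "email"]],
    "mixed": [["key", "spam"], ["cipher", "email"]],
}
ranking = rank_models(models, doc_sets)  # "coherent" ranks first
```

A topic whose top words tend to appear in the same documents scores high. As the talk notes, a high score does not guarantee a topic that people find meaningful, so the ranking narrows the candidates rather than picking a final model.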
Refining selected topic model
26
Mixed topics
E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]
1. Graph with
• vertices = publications
• edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide
[Figure: publication graph split into two sub-communities, one = garbled circuit, the other = integrated circuit]
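Step 1 of the refinement can be sketched as below. Kullback-Leibler divergence is asymmetric, and the slides do not say how it was turned into an undirected edge weight, so summing both directions is an assumption of this sketch, as are the function names and the example distributions. The modularity-based community detection of step 2 is left out because it would need a graph library.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    # Terms with p_i = 0 contribute nothing; q_i is floored to avoid log(x/0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    # Assumption: symmetrize by summing both directions.
    return kl_divergence(p, q) + kl_divergence(q, p)


def divergence_graph(topic_dists):
    """topic_dists: publication id -> topic distribution. Returns edge weights."""
    ids = sorted(topic_dists)
    return {
        (a, b): symmetric_kl(topic_dists[a], topic_dists[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    }


# Hypothetical publications from one mixed "circuit" topic.
graph = divergence_graph({
    "paper_a": [0.9, 0.1],
    "paper_b": [0.9, 0.1],
    "paper_c": [0.1, 0.9],
})
```

Publications with near-identical topic distributions get a divergence near zero, while publications that only share surface words, as in the circuit example, end up far apart, which is what lets a community-detection step separate them.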
Topic labeling
27
• Top words
• Top publications
- keywords or CCS index (if available)
- session name (if available)
Let's take a look at our final model
28
Total of 95 topics
29
(De)obfuscation and decompilation; (User) interfaces; Access control; Automated analysis protocols and files; Anonymity; Binary code analysis; Bitcoin/Crypto-currency; Bots and Botnet; Browser security; CAPTCHA; Cards and tokens; Censorship; Client-server accountability; Cloud; Compartmentalization; Control flow; Covert channel; Crypto and number theory; Cryptographic protocols; DOM and documents; Dark web; Data privacy; Databases; Digital signature;
Domain Name System (DNS); E-commerce; Electronic voting; Embedded and hardware security; Encoding/decoding; Encryption; File and file system security; Fingerprints and fingerprinting; Formal methods; Formal methods and verification; Formal specification and verification; Game and game theory; Genomics; Group communication; Hardwares RFIDs and ICs; Hardwares low level; Hardwares physical properties; Information flow; Institutional security; Intrusion/anomaly detection; Java security; JavaScript security; Kernels; Key distribution/management;
Location privacy/tracking; Malicious hardware; Malware; Memory disclosure attacks and defenses; Memory errors exploits and defenses; Machine learning; String matching and regular expressions; System calls; User study; Mobile app; Mobile devices; Mobile network; Network attacks defenses and detections; Network authentication; Network design; Network perimeter controls; Network traffic analysis attacks and defences; Online advertising; Online services; Passwords; Peer-to-peer communications; Program exploitations attacks and defenses; Public-key cryptography; Random numbers;
Real-world sensing; Routing; SSL/TLS; Secure (multiparty) computation; Security labeling; Security policies; Side-channel attack; Social network and (de)anonymization; Software and trust; Spam, scam, and fraud; Static and dynamic analysis; Storage security; TCP/IP; Tor; Trust management; Trusted computing/system; Online crime; Verifiable computation and zero knowledge proofs; Virtual machines and virtualization; Viruses and worms propagation and scanning; Vulnerabilities exploits disclosure and patches; Web application vulnerabilities; Wireless signal
Grouped into 20 categories
CRYPTO; TRUST; FORMALISM; SYSTEM; HARDWARE; NETWORKS; WEB; AUTH; COMPUTATION; DATA; MALWARE; PROGRAMS; INFORMATION LEAKAGE; INTERNET; MOBILE; CRIME & FRAUD; ANONYMITY & CENSORSHIP; VIRTUAL; METHOD; MISCELLANEOUS
Example CRYPTO category
30
Topic label: top 5 words from LDA
Cryptographic protocols: protocol, session, party, session-key, secret
Encryption: encryption, ciphertext, encrypted, decryption, decrypt
Network authentication: authentication, authenticate, kerberos, secret, service
Crypto and number theory: mod, bit, prime, rsa, random
Digital signature: signature, sign, public, signer, verification
Public-key cryptography: certificate, CA, trust, revocation, sign
Key distribution/management: round, broadcast, secret, threshold, secret-sharing
Group communication: group, member, multicast, join, communication
Random numbers: entropy, output, random, pool, randomness
Example CRIME & FRAUD category
33
Topic label: top 5 words from LDA
Dark web: site, URL, search, web, website
Spam, scam, and fraud: spam, email, account, mail, post
Online advertising: ad, ads, publisher, click, target
Online crime: account, market, service, customer, country
Our methodology
15
Data collection: 3062 publications, 1980-2015
Venue | Years covered | Publications
CCS (ACM Computer and Communications Security) | 1993-1994, 1996-2015 | 1066
NDSS (Network and Distributed System Security Symposium) | 1997-2015 | 932
S&P (IEEE Symposium on Security & Privacy) | 1980-2015 | 456
USENIX (USENIX Security Symposium) | 1993, 1995-1996, 1998-2015 | 608
Collected for each publication: full content + title, authors, and their affiliations + session name
Sources: from publishers, from website
Pre-processing input for topic modeling
1) PDF/PS/HTML to TEXT: main body, stripping off meta-data
2) Fixing conversion errors [e.g., fi/fl/ffl ligatures, homoglyphs]
3) Lemmatization [e.g., attacks/attacked/attacking → attack]
4) Preserving technical phrases [e.g., man in the middle → man-in-the-middle, MITM → man-in-the-middle]
5) Stopword list
• Most common English words [a, has, the]
• Common across our corpus [words with low Inverse Document Frequency (IDF)]
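A minimal sketch of steps 2, 4, and 5 above, using only the Python standard library. The phrase table, the function names, and the IDF threshold are hypothetical choices made for illustration; the talk does not say which tools or cut-offs were actually used. Lemmatization (step 3) is left out because it normally relies on an external NLP library.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical phrase table; the talk only names the man-in-the-middle example.
PHRASES = {
    "man in the middle": "man-in-the-middle",
    "mitm": "man-in-the-middle",
}

def normalize(text):
    """Fold ligature code points into plain letters, lowercase, and join phrases."""
    # NFKC maps compatibility characters such as U+FB01 (the "fi" ligature) to "fi".
    text = unicodedata.normalize("NFKC", text).lower()
    for phrase, token in PHRASES.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", token, text)
    return text

def tokenize(text):
    """Split normalized text into lowercase tokens, keeping hyphenated phrases whole."""
    return re.findall(r"[a-z][a-z\-]*", normalize(text))

def low_idf_words(docs, threshold):
    """Return words whose IDF = log(N / document frequency) is below the threshold."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(tokenize(doc)))
    return {word for word, count in df.items() if math.log(n / count) < threshold}
```

Words returned by `low_idf_words` would be added to the stopword list next to the common English words; the right threshold depends on the corpus.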
Generating and selecting a topic model
Different topic models, generated by varying
• # of topics (60 to 120)
• hyperparameters
Model ranking with average PMI
• Pointwise Mutual Information (PMI) = coherence score of a topic based on topic-words
• PMI↑ ≈ Topic quality↑
5 high-scoring models
• No perfect model (not uncommon)
• Human intervention (accepted)
Highest scoring model → Post-processing to refine
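The ranking step can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: probabilities are estimated from document-level co-occurrence, a small `eps` avoids taking the log of zero, and the function names are hypothetical.

```python
import math
from itertools import combinations

def topic_pmi(top_words, docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words, estimated from document co-occurrence."""
    doc_sets = [set(doc) for doc in docs]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = [
        math.log((prob(a, b) + eps) / (prob(a) * prob(b) + eps))
        for a, b in combinations(top_words, 2)
    ]
    return sum(scores) / len(scores)

def rank_models(models, docs):
    """Sort models (each a list of topics, each topic a list of top words) by average PMI."""
    def average_pmi(model):
        return sum(topic_pmi(topic, docs) for topic in model) / len(model)
    return sorted(models, key=average_pmi, reverse=True)
```

A topic whose top words tend to appear in the same documents scores higher than one mixing unrelated words, which is why the average score can be used to shortlist candidate models before manual inspection.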
Refining selected topic model
Mixed topics
E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]
1. Graph with
• vertices = publications
• edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide
[Figure legend: garbled circuit, integrated circuit]
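Step 1 of the refinement can be sketched as below. Kullback-Leibler divergence is asymmetric, so this sketch sums both directions to get one weight per undirected edge; that symmetrization, the smoothing constant `eps`, and the function names are assumptions, since the talk does not give these details. Community detection by modularity (step 2) would then run on the resulting graph with a graph library.

```python
import math
from itertools import combinations

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def divergence_edges(topic_dists):
    """Map each publication pair (i, j) to the symmetrized divergence KL(p||q) + KL(q||p)."""
    return {
        (i, j): kl(topic_dists[i], topic_dists[j]) + kl(topic_dists[j], topic_dists[i])
        for i, j in combinations(range(len(topic_dists)), 2)
    }
```

Publications with similar topic distributions get a small divergence, so two groups inside a mixed topic (for example garbled-circuit papers and integrated-circuit papers) show up as pairs with low within-group and high between-group weights.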
Topic labeling
• Top words
• Top publications
- keywords or CCS index (if available)
- session name (if available)
Let's take a look at our final model
Total of 95 topics
(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam scam and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers
Grouped into 20 categories
CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS
Example: CRYPTO category
Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness
Example: CRIME & FRAUD category
Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam scam and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country
Let's look at some trends
How have the venues changed over time?
Entropy↑ ≈ Diversity↑
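The entropy-as-diversity reading can be illustrated with Shannon entropy over per-topic paper counts for one venue and year. The choice of base-2 logarithm and the input format (a list of counts) are assumptions for this sketch; the talk does not state how the distribution was built.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a distribution given as non-negative counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A venue whose papers all fall into one topic has entropy 0, while papers spread evenly over four topics give 2 bits, so a rising curve indicates a more diverse program.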
How has the category distribution changed over time?
[Figure: category distribution over time. Labeled categories: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]
How consistent are authors and topics year-to-year?
Jaccard Index(88-89, 90-91) = similarity between topic sets of 88-89 and 90-91
Jaccard Index↑ ≈ Overlap↑
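For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below computes it for consecutive two-year windows; the helper names and the convention of returning 0.0 for two empty sets are assumptions made here.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def consecutive_similarity(windows):
    """Jaccard index between each pair of consecutive windows, in order."""
    return [jaccard(x, y) for x, y in zip(windows, windows[1:])]
```

The same function applies to author sets as well as topic sets: each window holds the authors (or topics) seen in that two-year span.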
How has industry and government participation changed over time?
[Figure legend: Academics; Academics + Industry; Industry; Government; Others; Academics + Government]
Do non-academic collaborators have the same interests?
[Figure: Top 15 topics in recent years (2011-2015), shown for Overall, Industry, and Government. Topic labels: Mobile app; Veri. comp. & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anon.; Crypto & num. theory; Information flow; Formal meth. & ver.; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp.; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]
Tool and data availability
Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …
Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …
Our site: secprivmeta.net
secprivmeta.net demo
Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu
References/Backup
References: introspective studies in other fields
• Publications
- Towards a computational history of the ACL: 1980-2008
- Studying the history of ideas using topic models
- Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
- CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
- Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
- ACL Anthology Network (All About NLP): text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing
Category trends using number of papers
Pre-processing input for topic modeling
18
1) TEXTPDFPS
HTMLMain body stripping off meta-data
2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]
Pre-processing input for topic modeling
19
1) TEXTPDFPS
HTMLMain body stripping off meta-data
Lemmatization [eg attacksattackedattacking rarr attack]3)
2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]
Pre-processing input for topic modeling
20
1) TEXTPDFPS
HTMLMain body stripping off meta-data
Lemmatization [eg attacksattackedattacking rarr attack]3)
Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]
4)
2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]
Pre-processing input for topic modeling
21
1) TEXTPDFPS
HTMLMain body stripping off meta-data
Lemmatization [eg attacksattackedattacking rarr attack]3)
Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]
4)
Stopword list bull Most common English words [a has the] bull Common across our corpus
[words with low Inverse Document Frequency (IDF)]
5)
2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]
Generating and selecting a topic model
22
By varying bull of topics (60 to 120) bull hyperparameters
Different topic models
Generating and selecting a topic model
23
Pointwise Mutual Information (PMI)PMI = coherence score of a topic based on topic-words PMIuarr Topic qualityuarr
Different topic models
Model ranking with average PMI
Generating and selecting a topic model
24
Different topic models
Model ranking with average PMI
5 high-scoring models
No perfect model (not uncommon)Human intervention (accepted)
Highest scoring model Post-processing to refine
Refining selected topic model
25
Mixed topics
Eg Garbled circuit and integrated circuit [share words circuit gate bit]
Refining selected topic model
26
1 Graph with bull vertices = publications bull edge weights = divergences between topic
distributions (using Kullback-Leibler divergence) 2 Find sub-community using graph modularity 3 Is a sub-community is a valid topic Yes rarr divide
Mixed topics
Eg Garbled circuit and integrated circuit [share words circuit gate bit]
= garbled circuit = integrated circuit
Topic labeling
27
bull Top words bull Top publications
- keywords or CCS index (if available) - session name (if available)
Letrsquos take a look at our final model
28
Total of 95 topics
(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam, and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities, exploits, disclosure, and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers
Grouped into 20 categories
CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS
Example: CRYPTO category
Topic label | Top 5 words from LDA
Cryptographic protocols | protocol, session, party, session-key, secret
Encryption | encryption, ciphertext, encrypted, decryption, decrypt
Network authentication | authentication, authenticate, kerberos, secret, service
Crypto and number theory | mod, bit, prime, rsa, random
Digital signature | signature, sign, public, signer, verification
Public-key cryptography | certificate, CA, trust, revocation, sign
Key distribution/management | round, broadcast, secret, threshold, secret-sharing
Group communication | group, member, multicast, join, communication
Random numbers | entropy, output, random, pool, randomness
Example: CRIME & FRAUD category
Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam, and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country
Let's look at some trends
How have the venues changed over time?
Entropy ↑ ≈ Diversity ↑
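To make the entropy-as-diversity reading concrete, here is a small sketch. It assumes Shannon entropy (base 2) computed over the category labels of one venue's papers in one year; the slide does not spell out which distribution the entropy is taken over, and the label lists are made up.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical venue-years: one focused on a single category, one spread out.
narrow_year = ["CRYPTO"] * 8
diverse_year = ["CRYPTO", "WEB", "MOBILE", "MALWARE"] * 2

narrow_entropy = shannon_entropy(narrow_year)
diverse_entropy = shannon_entropy(diverse_year)
print(narrow_entropy, diverse_entropy)
```

A venue whose papers all fall in one category has zero entropy, while papers spread evenly over k categories give log2(k) bits, so rising entropy indicates a broader mix.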
How has the category distribution changed over time?
[Figure: category distribution over time. Categories shown: CRYPTO, PROGRAMS, FORMALISM, MISC, METHOD, MALWARE, SYSTEM, NETWORK, WEB, MOBILE, HARDWARE, INTERNET, ANON & CENSOR, COMPUTATION, VIRTUAL, TRUST, DATA, AUTH, INFO LEAKAGE, CRIME & FRAUD]
How consistent are authors and topics year-to-year?
Jaccard Index(88-89, 90-91) = similarity between topic sets of 88-89 and 90-91
Jaccard Index ↑ ≈ Overlap ↑
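The Jaccard index is the size of the intersection divided by the size of the union of two sets. A minimal sketch, with made-up topic sets for the two periods (the same function applies to author sets):

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)

# Hypothetical topic sets for two consecutive two-year windows.
topics_88_89 = {"Access control", "Encryption", "Security policies", "Covert channel"}
topics_90_91 = {"Access control", "Encryption", "Viruses and worms", "Formal methods"}

similarity = jaccard(topics_88_89, topics_90_91)
print(round(similarity, 3))
```

Here two of the six distinct topics are shared, so the index is 2/6; a value of 1 would mean the two windows cover exactly the same topics.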
How has industry and government participation changed over time?
[Figure: authorship by affiliation over time. Series: Academics, Academics + Industry, Industry, Government, Others, Academics + Government]
Do non-academic collaborators have the same interests?
Top 15 topics in recent years (2011-2015)
[Figure: top topics for Government, Industry, and Overall. Topics shown: Mobile app, Veri comp & ZKP, Machine learning, Malware, Data privacy, Social networks & (De)anon, Crypto & num theory, Information flow, Formal meth & ver, VMs & virtualization, Tor, Online advertising, DOM & documents, Bots & botnet, Spam, scam & fraud, Client-server accountability, Browser, Secure comp, Dark web, JavaScript, SSL/TLS, Program exploit, Side-channel, HW low level, Static & dynamic analysis, Binary analysis]
Tool and data availability
Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …
Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …
Our site: secprivmeta.net
secprivmeta.net demo
Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu
References/Backup
References: introspective studies in other fields
• Publications
  - Towards a computational history of the ACL: 1980-2008
  - Studying the history of ideas using topic models
  - Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
  - CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
  - Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
  - ACL Anthology Network (All About NLP)
    • Text meta-data created using papers from ACL Anthology, which hosts 51,975 papers on the study of computational linguistics and natural language processing
Category trends using number of papers
Pre-processing input for topic modeling
21
1) TEXTPDFPS
HTMLMain body stripping off meta-data
Lemmatization [eg attacksattackedattacking rarr attack]3)
Preserving technical phrases [eg man in the middlerarrman-in-the-middle MITMrarrman-in-the-middle]
4)
Stopword list bull Most common English words [a has the] bull Common across our corpus
[words with low Inverse Document Frequency (IDF)]
5)
2) Fixing conversion errors [eg fiflffl ligatures homoglyphs]
Generating and selecting a topic model
22
By varying bull of topics (60 to 120) bull hyperparameters
Different topic models
Generating and selecting a topic model
23
Pointwise Mutual Information (PMI)PMI = coherence score of a topic based on topic-words PMIuarr Topic qualityuarr
Different topic models
Model ranking with average PMI
Generating and selecting a topic model
24
Different topic models
Model ranking with average PMI
5 high-scoring models
No perfect model (not uncommon)Human intervention (accepted)
Highest scoring model Post-processing to refine
Refining selected topic model
25
Mixed topics
Eg Garbled circuit and integrated circuit [share words circuit gate bit]
Refining selected topic model
26
1 Graph with bull vertices = publications bull edge weights = divergences between topic
distributions (using Kullback-Leibler divergence) 2 Find sub-community using graph modularity 3 Is a sub-community is a valid topic Yes rarr divide
Mixed topics
Eg Garbled circuit and integrated circuit [share words circuit gate bit]
= garbled circuit = integrated circuit
Topic labeling
27
bull Top words bull Top publications
- keywords or CCS index (if available) - session name (if available)
Letrsquos take a look at our final model
28
Total of 95 topics
29
(De)obfuscation and decompilation Domain Name System (DNS) Location privacytracking Real-world sensing(User) interfaces E-commerce Malicious hardware RoutingAccess control Electronic voting Malware SSLTLSAutomated analysis protocols and files Embedded and hardware security Memory disclosure attacks and
defenses Secure (multiparty) computation
Anonymity Encodingdecoding Memory errors exploits and defenses Security labeling
Binary code analysis Encryption Machine learning Security policies
BitcoinCrypto-currency File and file system security String matching and regular expressions Side-channel attack
Bots and Botnet Fingerprints and fingerprinting System calls Social network and (de)anonymization
Browser security Formal methods User study Software and trustCAPTCHA Formal methods and verification Mobile app Spam scam and fraudCards and tokens Formal specification and verification Mobile devices Static and dynamic analysisCensorship Game and game theory Mobile network Storage security
Client-server accountability Genomics Network attacks defenses and detections TCPIP
Cloud Group communication Network authentication TorCompartmentalization Hardwares RFIDs and ICs Network design Trust managementControl flow Hardwares low level Network perimeter controls Trusted computingsystem
Covert channel Hardwares physical properties Network traffic analysis attacks and defences Online crime
Crypto and number theory Information flow Online advertising Verifiable computation and zero knowledge proofs
Cryptographic protocols Institutional security Online services Virtual machines and virtualization
DOM and documents Intrusionanomaly detection Passwords Viruses and worms propagation and scanning
Dark web Java security Peer-to-peer communications Vulnerabilities exploits disclosure and patches
Data privacy JavaScript security Program exploitations attacks and defenses Web application vulnerabilities
Databases Kernels Public-key cryptography Wireless signalDigital signature Key distributionmanagement Random numbers
Grouped into 20 categories
CRYPTO TRUST FORMALISM SYSTEM
HARDWARE NETWORKS WEB AUTH
COMPUTATION DATA MALWARE PROGRAMS
INFORMATION LEAKAGE
INTERNET MOBILE CRIME amp FRAUD
ANONYMITY amp CENSORSHIP
VIRTUAL METHOD MISCELLANEOUS
Example CRYPTO category
30
Topic label Top 5 words from LDA
Cryptographic protocols protocol session party session-key secret
Encryption encryption ciphertext encrypted decryption decrypt
Network authentication authentication authenticate kerberos secret service
Crypto and number theory mod bit prime rsa random
Digital signature signature sign public signer verification
Public-key cryptography certificate CA trust revocation sign
Key distributionmanagement round broadcast secret threshold secret-sharing
Group communication group member multicast join communication
Random numbers entropy output random pool randomness
Example CRYPTO category
31
Topic label Top 5 words from LDA
Cryptographic protocols protocol session party session-key secret
Encryption encryption ciphertext encrypted decryption decrypt
Network authentication authentication authenticate kerberos secret service
Crypto and number theory mod bit prime rsa random
Digital signature signature sign public signer verification
Public-key cryptography certificate CA trust revocation sign
Key distributionmanagement round broadcast secret threshold secret-sharing
Group communication group member multicast join communication
Random numbers entropy output random pool randomness
Example CRYPTO category
32
Topic label Top 5 words from LDA
Cryptographic protocols protocol session party session-key secret
Encryption encryption ciphertext encrypted decryption decrypt
Network authentication authentication authenticate kerberos secret service
Crypto and number theory mod bit prime rsa random
Digital signature signature sign public signer verification
Public-key cryptography certificate CA trust revocation sign
Key distributionmanagement round broadcast secret threshold secret-sharing
Group communication group member multicast join communication
Random numbers entropy output random pool randomness
Example CRIME amp FRAUD category
33
Topic label Top 5 words from LDADark web site URL search web website
Spam scam and fraud spam email account mail post
Online advertising ad ads publisher click target
Online crime account market service customer country
Example CRIME amp FRAUD category
34
Topic label Top 5 words from LDADark web site URL search web website
Spam scam and fraud spam email account mail post
Online advertising ad ads publisher click target
Online crime account market service customer country
Letrsquos look at some trends
35
36
How have the venues changed over time
37
How have the venues changed over time
Entropyuarr asymp Diversityuarr
How has the category distribution changed over time
38
CRYPTO PROGRAMS
FORMALISMMISC
METHOD
MALWARESYSTEM
NETWORK
WEBMOBILE
HARDWARE
INTERNET
ANON amp CENSOR
COMPUTATION
VIRTUAL
TRUST
DATAAUTH
INFO LEAKAGE CRIME amp FRAUD
PROGRAMSCRYPTO
39
How consistent are authors and topics year-to-year
40
Jaccard Index(88-89 90-91)
Similarity between topic sets of 88-89 and 90-91= Jaccard Indexuarr asymp Overlapuarr
How consistent are authors and topics year-to-year
How has industry and government participation changed over time
41
Academics
Academics + Industry
Industry
GovernmentOthers
Academics + Government
42
Do non-academic collaborators have the same interests?
Top 15 topics in recent years (2011-2015)
[Figure: top 15 topics for Government, Industry, and Overall. Topics shown: Mobile app; Veri. comp. & ZKP; Machine learning; Malware; Data privacy; Social networks & (de)anon.; Crypto & num. theory; Information flow; Formal meth. & ver.; VMs & virtualization; Tor; Online advertising; DOM & documents; Bots & botnet; Spam, scam & fraud; Client-server accountability; Browser; Secure comp.; Dark web; JavaScript; SSL/TLS; Program exploit; Side-channel; HW low level; Static & dynamic analysis; Binary analysis]
Tool and data availability
43
Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …
Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …
44
Our site: secprivmeta.net
secprivmeta.net demo
45
Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu
46
References / Backup
47
References: introspective studies in other fields
48
• Publications
- Towards a computational history of the ACL: 1980-2008
- Studying the history of ideas using topic models
- Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
- CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
- Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
- ACL Anthology Network (All About NLP)
  • Text meta-data created using papers from ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing
Category trends using number of papers
49
Generating and selecting a topic model
22
By varying:
• # of topics (60 to 120)
• hyperparameters
Different topic models
Generating and selecting a topic model
23
Pointwise Mutual Information (PMI)
PMI = coherence score of a topic based on topic-words
PMI ↑ ≈ Topic quality ↑
Different topic models
Model ranking with average PMI
Generating and selecting a topic model
24
Different topic models
Model ranking with average PMI
5 high-scoring models
No perfect model (not uncommon)
Human intervention (accepted)
Highest scoring model
Post-processing to refine
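A rough sketch of ranking models by average PMI. The slides do not say which reference corpus, word window, or smoothing was used, so document-level co-occurrence, the `eps` smoothing, and the names `average_pmi` and `rank_models` are all assumptions made for illustration.

```python
import math
from itertools import combinations

def average_pmi(top_words, documents, eps=1e-12):
    """Mean pairwise PMI of a topic's top words, from document co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every one of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        # eps avoids log(0) for pairs that never co-occur (assumed smoothing).
        scores.append(math.log((p12 + eps) / (p1 * p2 + eps)))
    return sum(scores) / len(scores) if scores else 0.0

def rank_models(models, documents):
    """Order model names by the mean PMI of their topics, best first."""
    scored = {
        name: sum(average_pmi(t, documents) for t in topics) / len(topics)
        for name, topics in models.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

# Toy corpus and two hypothetical models, each a list of topics (top words).
documents = [
    ["encryption", "ciphertext", "decrypt"],
    ["encryption", "ciphertext"],
    ["spam", "email", "account"],
    ["spam", "email"],
]
models = {
    "model_a": [["encryption", "ciphertext"], ["spam", "email"]],
    "model_b": [["encryption", "spam"], ["ciphertext", "email"]],
}
print(rank_models(models, documents))
```

Topics whose top words tend to appear in the same documents score higher, so a model with cleanly separated topics outranks one that mixes unrelated words.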
Refining selected topic model
25
Mixed topics
E.g., garbled circuit and integrated circuit [share the words circuit, gate, bit]
Refining selected topic model
26
Mixed topics
E.g., garbled circuit and integrated circuit [share the words circuit, gate, bit]
1. Graph with
   • vertices = publications
   • edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide
[Figure legend: garbled circuit, integrated circuit]
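The refinement steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure used in the study: the KL divergence is symmetrised, it is turned into a similarity weight with exp(-d) so that modularity rewards close publications, and candidate splits are compared by their modularity score rather than found by a full community-detection algorithm. The topic distributions are invented.

```python
import math
from itertools import combinations

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def similarity_graph(distributions):
    """Edge weights between publications: exp(-symmetrised KL) (assumed mapping)."""
    weights = {}
    for i, j in combinations(range(len(distributions)), 2):
        d = (kl_divergence(distributions[i], distributions[j])
             + kl_divergence(distributions[j], distributions[i]))
        weights[(i, j)] = math.exp(-d)
    return weights

def modularity(weights, communities, n):
    """Newman modularity Q of a partition of an undirected weighted graph."""
    label = {v: c for c, members in enumerate(communities) for v in members}
    strength = [0.0] * n
    inside = [0.0] * len(communities)
    for (i, j), w in weights.items():
        strength[i] += w
        strength[j] += w
        if label[i] == label[j]:
            inside[label[i]] += w
    m = sum(weights.values())
    total = [0.0] * len(communities)
    for v in range(n):
        total[label[v]] += strength[v]
    return sum(inside[c] / m - (total[c] / (2 * m)) ** 2 for c in range(len(communities)))

# Hypothetical topic distributions of four publications inside one mixed topic.
pubs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
graph = similarity_graph(pubs)

good_split = [{0, 1}, {2, 3}]  # groups publications with similar distributions
bad_split = [{0, 2}, {1, 3}]   # mixes dissimilar publications
print(modularity(graph, good_split, len(pubs)))
print(modularity(graph, bad_split, len(pubs)))
```

A split with clearly positive modularity suggests two sub-communities worth inspecting; whether each one is a valid topic remains a human judgment, as in step 3.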
Topic labeling
27
• Top words
• Top publications
  - keywords or CCS index (if available)
  - session name (if available)
Let's take a look at our final model
28
Total of 95 topics
29
(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam scam and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers
Grouped into 20 categories
CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE | INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP | VIRTUAL | METHOD | MISCELLANEOUS
Example: CRYPTO category
30
Topic label: Top 5 words from LDA
Cryptographic protocols: protocol, session, party, session-key, secret
Encryption: encryption, ciphertext, encrypted, decryption, decrypt
Network authentication: authentication, authenticate, kerberos, secret, service
Crypto and number theory: mod, bit, prime, rsa, random
Digital signature: signature, sign, public, signer, verification
Public-key cryptography: certificate, CA, trust, revocation, sign
Key distribution/management: round, broadcast, secret, threshold, secret-sharing
Group communication: group, member, multicast, join, communication
Random numbers: entropy, output, random, pool, randomness
Refining selected topic model
25
Mixed topics
Eg Garbled circuit and integrated circuit [share words circuit gate bit]
Refining selected topic model
26
1 Graph with bull vertices = publications bull edge weights = divergences between topic
distributions (using Kullback-Leibler divergence) 2 Find sub-community using graph modularity 3 Is a sub-community is a valid topic Yes rarr divide
Mixed topics
Eg Garbled circuit and integrated circuit [share words circuit gate bit]
= garbled circuit = integrated circuit
Topic labeling
27
bull Top words bull Top publications
- keywords or CCS index (if available) - session name (if available)
Letrsquos take a look at our final model
28
Total of 95 topics
29
(De)obfuscation and decompilation Domain Name System (DNS) Location privacytracking Real-world sensing(User) interfaces E-commerce Malicious hardware RoutingAccess control Electronic voting Malware SSLTLSAutomated analysis protocols and files Embedded and hardware security Memory disclosure attacks and
defenses Secure (multiparty) computation
Anonymity Encodingdecoding Memory errors exploits and defenses Security labeling
Binary code analysis Encryption Machine learning Security policies
BitcoinCrypto-currency File and file system security String matching and regular expressions Side-channel attack
Bots and Botnet Fingerprints and fingerprinting System calls Social network and (de)anonymization
Browser security Formal methods User study Software and trustCAPTCHA Formal methods and verification Mobile app Spam scam and fraudCards and tokens Formal specification and verification Mobile devices Static and dynamic analysisCensorship Game and game theory Mobile network Storage security
Client-server accountability Genomics Network attacks defenses and detections TCPIP
Cloud Group communication Network authentication TorCompartmentalization Hardwares RFIDs and ICs Network design Trust managementControl flow Hardwares low level Network perimeter controls Trusted computingsystem
Covert channel Hardwares physical properties Network traffic analysis attacks and defences Online crime
Crypto and number theory Information flow Online advertising Verifiable computation and zero knowledge proofs
Cryptographic protocols Institutional security Online services Virtual machines and virtualization
DOM and documents Intrusionanomaly detection Passwords Viruses and worms propagation and scanning
Dark web Java security Peer-to-peer communications Vulnerabilities exploits disclosure and patches
Data privacy JavaScript security Program exploitations attacks and defenses Web application vulnerabilities
Databases Kernels Public-key cryptography Wireless signalDigital signature Key distributionmanagement Random numbers
Grouped into 20 categories
CRYPTO TRUST FORMALISM SYSTEM
HARDWARE NETWORKS WEB AUTH
COMPUTATION DATA MALWARE PROGRAMS
INFORMATION LEAKAGE
INTERNET MOBILE CRIME amp FRAUD
ANONYMITY amp CENSORSHIP
VIRTUAL METHOD MISCELLANEOUS
Example CRYPTO category
30
Topic label Top 5 words from LDA
Cryptographic protocols protocol session party session-key secret
Encryption encryption ciphertext encrypted decryption decrypt
Network authentication authentication authenticate kerberos secret service
Crypto and number theory mod bit prime rsa random
Digital signature signature sign public signer verification
Public-key cryptography certificate CA trust revocation sign
Key distributionmanagement round broadcast secret threshold secret-sharing
Group communication group member multicast join communication
Random numbers entropy output random pool randomness
Example CRYPTO category
31
Topic label Top 5 words from LDA
Cryptographic protocols protocol session party session-key secret
Encryption encryption ciphertext encrypted decryption decrypt
Network authentication authentication authenticate kerberos secret service
Crypto and number theory mod bit prime rsa random
Digital signature signature sign public signer verification
Public-key cryptography certificate CA trust revocation sign
Key distributionmanagement round broadcast secret threshold secret-sharing
Group communication group member multicast join communication
Random numbers entropy output random pool randomness
Example CRYPTO category
32
Topic label Top 5 words from LDA
Cryptographic protocols protocol session party session-key secret
Encryption encryption ciphertext encrypted decryption decrypt
Network authentication authentication authenticate kerberos secret service
Crypto and number theory mod bit prime rsa random
Digital signature signature sign public signer verification
Public-key cryptography certificate CA trust revocation sign
Key distributionmanagement round broadcast secret threshold secret-sharing
Group communication group member multicast join communication
Random numbers entropy output random pool randomness
Example CRIME amp FRAUD category
33
Topic label Top 5 words from LDADark web site URL search web website
Spam scam and fraud spam email account mail post
Online advertising ad ads publisher click target
Online crime account market service customer country
Example CRIME amp FRAUD category
34
Topic label Top 5 words from LDADark web site URL search web website
Spam scam and fraud spam email account mail post
Online advertising ad ads publisher click target
Online crime account market service customer country
Letrsquos look at some trends
35
36
How have the venues changed over time
37
How have the venues changed over time
Entropyuarr asymp Diversityuarr
How has the category distribution changed over time
38
CRYPTO PROGRAMS
FORMALISMMISC
METHOD
MALWARESYSTEM
NETWORK
WEBMOBILE
HARDWARE
INTERNET
ANON amp CENSOR
COMPUTATION
VIRTUAL
TRUST
DATAAUTH
INFO LEAKAGE CRIME amp FRAUD
PROGRAMSCRYPTO
39
How consistent are authors and topics year-to-year
40
Jaccard Index(88-89 90-91)
Similarity between topic sets of 88-89 and 90-91= Jaccard Indexuarr asymp Overlapuarr
How consistent are authors and topics year-to-year
How has industry and government participation changed over time
41
Academics
Academics + Industry
Industry
GovernmentOthers
Academics + Government
42
Do non-academic collaborators have the same interests?
Top 15 topics in recent years (2011-2015)
[Figure: top topics compared across Government, Industry, and Overall. Topics shown: Mobile app, Veri. comp. & ZKP, Machine learning, Malware, Data privacy, Social networks & (de)anonymization, Crypto & num. theory, Information flow, Formal meth. & ver., VMs & virtualization, Tor, Online advertising, DOM & documents, Bots & botnet, Spam, scam & fraud, Client-server accountability, Browser, Secure comp., Dark web, JavaScript, SSL/TLS, Program exploit, Side-channel, HW low level, Static & dynamic analysis, Binary analysis]
Tool and data availability
Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …
Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …
Our site: secprivmeta.net
secprivmeta.net demo
Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu
References / Backup
References: introspective studies in other fields
• Publications
- Towards a computational history of the ACL: 1980-2008
- Studying the history of ideas using topic models
- Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
- CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
- Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
- ACL Anthology Network (All About NLP): text meta-data created using papers from the ACL Anthology, which hosts 51975 papers on the study of computational linguistics and natural language processing
Category trends using number of papers
Refining selected topic model
1. Build a graph with
   • vertices = publications
   • edge weights = divergences between topic distributions (using Kullback-Leibler divergence)
2. Find sub-communities using graph modularity
3. Is a sub-community a valid topic? Yes → divide
Mixed topics
E.g., garbled circuit and integrated circuit [share words: circuit, gate, bit]
[Figure legend: garbled circuit, integrated circuit]
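The graph construction in step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the slides do not say whether the Kullback-Leibler divergence was symmetrized or smoothed, so the symmetrized form and the small `eps` term are choices made for this example, and the publication ids and topic distributions are hypothetical. The modularity-based sub-community search in step 2 would normally be delegated to a graph library and is not shown.

```python
import math
from itertools import combinations


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    eps is an assumed smoothing term that avoids division by zero.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p); symmetrizing is an assumption."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def divergence_graph(topic_dists):
    """Map each unordered pair of publication ids to its divergence weight."""
    return {
        (a, b): symmetric_kl(topic_dists[a], topic_dists[b])
        for a, b in combinations(sorted(topic_dists), 2)
    }


# Hypothetical per-publication topic distributions (illustrative values only).
topic_dists = {
    "garbled_1": [0.7, 0.2, 0.1],
    "garbled_2": [0.6, 0.3, 0.1],
    "ic_1": [0.1, 0.2, 0.7],
}
edges = divergence_graph(topic_dists)
```

In this toy input the two "garbled" publications end up with a much smaller divergence between them than either has to the "ic" publication, which is the kind of structure a community search could then separate.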
Topic labeling
• Top words
• Top publications
  - keywords or CCS index (if available)
  - session name (if available)
Let's take a look at our final model
Total of 95 topics
(De)obfuscation and decompilation | Domain Name System (DNS) | Location privacy/tracking | Real-world sensing
(User) interfaces | E-commerce | Malicious hardware | Routing
Access control | Electronic voting | Malware | SSL/TLS
Automated analysis protocols and files | Embedded and hardware security | Memory disclosure attacks and defenses | Secure (multiparty) computation
Anonymity | Encoding/decoding | Memory errors exploits and defenses | Security labeling
Binary code analysis | Encryption | Machine learning | Security policies
Bitcoin/Crypto-currency | File and file system security | String matching and regular expressions | Side-channel attack
Bots and Botnet | Fingerprints and fingerprinting | System calls | Social network and (de)anonymization
Browser security | Formal methods | User study | Software and trust
CAPTCHA | Formal methods and verification | Mobile app | Spam, scam, and fraud
Cards and tokens | Formal specification and verification | Mobile devices | Static and dynamic analysis
Censorship | Game and game theory | Mobile network | Storage security
Client-server accountability | Genomics | Network attacks defenses and detections | TCP/IP
Cloud | Group communication | Network authentication | Tor
Compartmentalization | Hardwares RFIDs and ICs | Network design | Trust management
Control flow | Hardwares low level | Network perimeter controls | Trusted computing/system
Covert channel | Hardwares physical properties | Network traffic analysis attacks and defences | Online crime
Crypto and number theory | Information flow | Online advertising | Verifiable computation and zero knowledge proofs
Cryptographic protocols | Institutional security | Online services | Virtual machines and virtualization
DOM and documents | Intrusion/anomaly detection | Passwords | Viruses and worms propagation and scanning
Dark web | Java security | Peer-to-peer communications | Vulnerabilities exploits disclosure and patches
Data privacy | JavaScript security | Program exploitations attacks and defenses | Web application vulnerabilities
Databases | Kernels | Public-key cryptography | Wireless signal
Digital signature | Key distribution/management | Random numbers
Grouped into 20 categories
CRYPTO | TRUST | FORMALISM | SYSTEM
HARDWARE | NETWORKS | WEB | AUTH
COMPUTATION | DATA | MALWARE | PROGRAMS
INFORMATION LEAKAGE
INTERNET | MOBILE | CRIME & FRAUD
ANONYMITY & CENSORSHIP
VIRTUAL | METHOD | MISCELLANEOUS
Example CRYPTO category
Topic label: Top 5 words from LDA
Cryptographic protocols: protocol, session, party, session-key, secret
Encryption: encryption, ciphertext, encrypted, decryption, decrypt
Network authentication: authentication, authenticate, kerberos, secret, service
Crypto and number theory: mod, bit, prime, rsa, random
Digital signature: signature, sign, public, signer, verification
Public-key cryptography: certificate, CA, trust, revocation, sign
Key distribution/management: round, broadcast, secret, threshold, secret-sharing
Group communication: group, member, multicast, join, communication
Random numbers: entropy, output, random, pool, randomness
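As a small illustration of what "top 5 words" means for a topic, the sketch below ranks the words of one topic by weight and keeps the first five. The function name `top_words` and the weights are hypothetical; a real LDA model would supply the topic-word weights, and the words "key" and "scheme" are added here only to show the cut-off.

```python
def top_words(word_weights, k=5):
    """Return the k highest-weighted words of a topic, breaking ties alphabetically."""
    ranked = sorted(word_weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical topic-word weights for the "Encryption" topic
# (illustrative values, not taken from the study's model).
encryption_topic = {
    "encryption": 0.09,
    "ciphertext": 0.06,
    "encrypted": 0.05,
    "decryption": 0.04,
    "decrypt": 0.03,
    "key": 0.02,
    "scheme": 0.01,
}

encryption_top5 = top_words(encryption_topic)
```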
Example CRYPTO category
30
Topic label Top 5 words from LDA
Cryptographic protocols protocol session party session-key secret
Encryption encryption ciphertext encrypted decryption decrypt
Network authentication authentication authenticate kerberos secret service
Crypto and number theory mod bit prime rsa random
Digital signature signature sign public signer verification
Public-key cryptography certificate CA trust revocation sign
Key distributionmanagement round broadcast secret threshold secret-sharing
Group communication group member multicast join communication
Random numbers entropy output random pool randomness
Example CRYPTO category
31
Topic label Top 5 words from LDA
Cryptographic protocols protocol session party session-key secret
Encryption encryption ciphertext encrypted decryption decrypt
Network authentication authentication authenticate kerberos secret service
Crypto and number theory mod bit prime rsa random
Digital signature signature sign public signer verification
Public-key cryptography certificate CA trust revocation sign
Key distributionmanagement round broadcast secret threshold secret-sharing
Group communication group member multicast join communication
Random numbers entropy output random pool randomness
Example CRYPTO category
32
Topic label Top 5 words from LDA
Cryptographic protocols protocol session party session-key secret
Encryption encryption ciphertext encrypted decryption decrypt
Network authentication authentication authenticate kerberos secret service
Crypto and number theory mod bit prime rsa random
Digital signature signature sign public signer verification
Public-key cryptography certificate CA trust revocation sign
Key distributionmanagement round broadcast secret threshold secret-sharing
Group communication group member multicast join communication
Random numbers entropy output random pool randomness
Example CRIME amp FRAUD category
33
Topic label | Top 5 words from LDA
Dark web | site, URL, search, web, website
Spam, scam, and fraud | spam, email, account, mail, post
Online advertising | ad, ads, publisher, click, target
Online crime | account, market, service, customer, country
Let's look at some trends
35
36
How have the venues changed over time?
37
Entropy ↑ ≈ Diversity ↑
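To make the "higher entropy, higher diversity" reading concrete, here is a minimal sketch that assumes the measure is Shannon entropy over the category labels of a venue's papers in a given year. The slide does not spell out the exact formula, so that choice is an assumption, and the label lists are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no papers: treat as zero diversity
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical category labels for one venue's papers in two years.
early_year = ["CRYPTO", "CRYPTO", "CRYPTO", "FORMALISM"]
later_year = ["CRYPTO", "WEB", "MALWARE", "MOBILE"]

print(shannon_entropy(early_year))  # concentrated in one category: lower
print(shannon_entropy(later_year))  # spread evenly over four: higher
```

A year in which every paper falls in a single category scores 0, and a year spread evenly over n categories scores log2(n), which is why a rising curve can be read as a venue covering more categories more evenly.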
How has the category distribution changed over time?
38
[Figure: category distribution over time. Categories: CRYPTO | PROGRAMS | FORMALISM | MISC | METHOD | MALWARE | SYSTEM | NETWORK | WEB | MOBILE | HARDWARE | INTERNET | ANON & CENSOR | COMPUTATION | VIRTUAL | TRUST | DATA | AUTH | INFO LEAKAGE | CRIME & FRAUD]
39
How consistent are authors and topics year-to-year?
40
Jaccard Index(88-89, 90-91) = similarity between the topic sets of 88-89 and 90-91
Jaccard Index ↑ ≈ Overlap ↑
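For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to two hypothetical topic sets for consecutive two-year windows; the set contents are illustrative and not the talk's data, and the handling of two empty sets is a convention chosen for this example.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


# Hypothetical topic sets for two consecutive windows.
topics_88_89 = {"Encryption", "Network authentication", "Cryptographic protocols"}
topics_90_91 = {"Encryption", "Network authentication", "Digital signature",
                "Key distribution/management"}

# 2 shared topics out of 5 distinct topics overall.
print(jaccard_index(topics_88_89, topics_90_91))
```

The same function works for author sets: a value near 1 means the two windows share almost all of their authors or topics, and a value near 0 means they share almost none.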
How has industry and government participation changed over time?
41
[Figure: authorship by affiliation type over time. Legend: Academics | Academics + Industry | Industry | Government | Others | Academics + Government]
42
Do non-academic collaborators have the same interests?
[Figure: top 15 topics in recent years (2011-2015), shown for Government | Industry | Overall. Topic labels: Mobile app | Veri comp & ZKP | Machine learning | Malware | Data privacy | Social networks & (De)anon | Crypto & num theory | Information flow | Formal meth & ver | VMs & virtualization | Tor | Online advertising | DOM & documents | Bots & botnet | Spam, scam & fraud | Client-server accountability | Browser | Secure comp | Dark web | JavaScript | SSL/TLS | Program exploit | Side-channel | HW low level | Static & dynamic analysis | Binary analysis]
Tool and data availability
43
Interactive visualizations: Topics | Topic-timelines | Topic-words | Publications | Authors | …
Available data: Meta-data with categorized affiliations | Acronym list | Stop-word list | Original topic model | …
44
Our site: secprivmeta.net
secprivmeta.net demo
45
Aniqua Baset: aniqua@cs.utah.edu
Dr. Tamara Denning: tdenning@cs.utah.edu
46
References/Backup
47
• Publications
- Towards a computational history of the ACL: 1980-2008
- Studying the history of ideas using topic models
- Identity crisis of Ubicomp? Mapping 15 years of the field's development and paradigm change
- CHI 1994-2013: mapping two decades of intellectual progress through co-word analysis
- Games research today: Analyzing the academic landscape 2000-2014
• Online tool/dataset
- ACL Anthology Network (All About NLP)
  • Text meta-data created using papers from ACL Anthology, which hosts 51,975 papers on the study of computational linguistics and natural language processing
References: introspective studies in other fields
48
Category trends using number of papers
49