email transaction network analysis

ELECTRONIC MAIL TRANSACTION NETWORK ANALYSIS Keith P. Jolley

Uploaded by keithpjolley on 24-Jan-2015


Category: Education



DESCRIPTION

Thesis defense presentation, SDSU Computational Sciences, 2013. Uses several network analysis tools, including community detection, PageRank, and eigenvector centrality among others, to compute key metrics of graphs built from keyword searches. These custom graphs and their metrics are then presented in interactive graphics and tables.

TRANSCRIPT

1. ELECTRONIC MAIL TRANSACTION NETWORK ANALYSIS
Keith P. Jolley

2. INTRODUCTION
Deliverable of this project:
Provide a means to identify whom to turn to for more information on a topic, or on several topics
Provide better insight into project organization
Use network analysis tools
Augment, not replace, existing enterprise search tools
Social network analysis: community detection algorithm, PageRank

3. WHAT WAS DONE
Used several datasets as input:
Archived public Qualcomm mail-lists: 136 mailboxes, 10.0 GB; 20393 vertices, 940403 edges
Enron email: 158 mailboxes, 1.3 GB; 90026 vertices, 3715056 edges
Test datasets: Karate Club (34 vertices, 78 edges); Les Misérables (77 vertices, 820 edges)
Created an interactive web client:
User search-term input
Interactive graphics and tables of metrics
Built with Perl CGI, R, C++, JavaScript, and D3 (Data-Driven Documents)

4. MAILBOX INPUT
Enron dataset: user based
Wide range of topics per inbox
Every email in a mailbox is from:, to:, or cc: the same person
Examples: Jeff Skilling, Kenneth Lay
Qualcomm dataset: topic based
Emails share, for the most part, a common theme
Emails in each mailbox come from multiple senders
All emails include the mail-list as a recipient
Emails may include other recipients, including other mail-lists
Examples: Photography, Android, Hiking

5. User Interface - URL
[Screenshots: Enron, Test, Qualcomm]

6. User Interface - mail-list
[Screenshots: Enron, Test, Qualcomm]

7. User Interface - search
[Screenshot: Enron]
Search-term parsing:
User input: San Diego Power
Regex: (san|diego|power)
All Fields: searches all dataset columns
Topic: searches only the subject column
People: searches only the to: and from: columns
Maillist: matches only on the mail-list column*
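The search-term parsing on slide 7 can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's Perl code. The column names (`source`, `target`, `mailbox`, `subject`) and the mode keys are hypothetical. The Maillist mode is assumed to search the Mailbox column.

```python
import re

# Hypothetical column names for one parsed row (cf. the Source/Target/Mailbox/
# Subject table on slide 9); the real tool's column names may differ.
SEARCH_COLUMNS = {
    "all": ("source", "target", "mailbox", "subject"),  # All Fields
    "topic": ("subject",),                              # Topic
    "people": ("source", "target"),                     # People: from:/to:
    "maillist": ("mailbox",),                           # Maillist (assumed column)
}


def build_regex(user_input: str) -> str:
    """Turn 'San Diego Power' into the alternation '(san|diego|power)'."""
    terms = [re.escape(word.lower()) for word in user_input.split()]
    if not terms:
        raise ValueError("empty search")  # '()' would match every row
    return "(" + "|".join(terms) + ")"


def matching_rows(rows, user_input, mode="all"):
    """Keep rows in which any searched column matches any search term."""
    pattern = re.compile(build_regex(user_input), re.IGNORECASE)
    columns = SEARCH_COLUMNS[mode]
    return [row for row in rows if any(pattern.search(row[c]) for c in columns)]
```

Escaping each word keeps characters such as `+` or `.` in a search term from being read as regex operators.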
8. PREPROCESSING: Transform Email
Message-ID:
Date: Wed, 7 Mar 2001 01:56:00 -0800 (PST)
From: [email protected]
To: [email protected]
Subject: Re: Feb 12 delivery of "Methane Arctic"
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Bcc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
X-From: "Joseph McKechnie"
X-To: [email protected]
X-cc: "Christopher Skinner" , [email protected], Paul.Y'[email protected], [email protected], "Rudy Adamiak" , "Jane Michalek"
X-bcc:
X-Folder: Paul_Ybarbo_Nov2001Notes FoldersCabot
X-Origin: YBARBO-P
X-FileName: pybarbo.nsf
...

9. PREPROCESSING: Transform Email Into net.lists
Source      Target             Mailbox   Subject
jmckechnie  dan.masters        ybarbo-p  re: feb 12 delivery of "methane arctic"
jmckechnie  [email protected]  ybarbo-p  re: feb 12 delivery of "methane arctic"
jmckechnie  jaime.sanabria     ybarbo-p  re: feb 12 delivery of "methane arctic"
jmckechnie  barbo              ybarbo-p  re: feb 12 delivery of "methane arctic"
jmckechnie  todd.peterson      ybarbo-p  re: feb 12 delivery of "methane arctic"
jmckechnie  [email protected]  ybarbo-p  re: feb 12 delivery of "methane arctic"
jmckechnie  [email protected]  ybarbo-p  re: feb 12 delivery of "methane arctic"

10-13. PREPROCESSING: Lots of email
517424 Enron emails
3715056 edges
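The transformation on slides 8-9 turns one message into one row per recipient. A rough Python sketch using the standard `email` package is below. It makes three assumptions:

* The mailbox label is taken from `X-Origin:`.
* Subjects are lowercased, as in the table.
* Full lowercased addresses are kept. The table appears to shorten some addresses to user names, and that step is not reproduced here.

`email_to_rows` is a hypothetical name.

```python
from email import message_from_string
from email.utils import getaddresses


def email_to_rows(raw_message: str):
    """Return one (source, target, mailbox, subject) row per distinct recipient."""
    msg = message_from_string(raw_message)
    source = getaddresses([msg.get("From", "")])[0][1].lower()
    # To:, Cc: and Bcc: all count as recipients. In the Enron example the Bcc:
    # list repeats the Cc: list, so duplicates are dropped (first order kept).
    recipient_headers = (
        msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    )
    targets = []
    for _name, address in getaddresses(recipient_headers):
        address = address.lower()
        if address and address not in targets:
            targets.append(address)
    # Assumption: the mailbox label comes from X-Origin: (e.g. YBARBO-P).
    mailbox = msg.get("X-Origin", "").lower()
    subject = msg.get("Subject", "").lower()
    return [(source, target, mailbox, subject) for target in targets]
```

Self-addressed messages still produce a row here. They become self-loops, which the graph-simplification step removes.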
14. SOFTWARE DESIGN & OPERATION: Custom Directed Graph Creation
Step 1 - parse dataset:
Create edge list
Create adjacency matrix

15. SOFTWARE DESIGN & OPERATION: Custom Directed Graph Creation
Step 2 - simplify:
Remove self-loops
Remove vertices without edges

16. PREPROCESSING: Edge weight
Two methods were compared: bytes sent and message count.
Bytes sent is the total number of characters in all matching messages, excluding any emails included (quoted) within each message.
Message count is the total number of matching messages.
Analysis and intuition suggest that either method gives similar results; message count was chosen because it is simpler and faster.
Enron dataset: 19k vertices, 3.5M edges

17. COMMUNITY DETECTION
Communities are clusters of vertices that have more connections to each other than to vertices outside the cluster.
Any non-trivial social network will have communities.
The metric associated with communities is modularity, which ranges from -1 to 1 and is defined as:
$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$$

18. COMMUNITY DETECTION
$A_{ij}$ is the edge weight between vertices $i$ and $j$
$k_i = \sum_j A_{ij}$ is the sum of the weights of the edges attached to vertex $i$
$m = \frac{1}{2}\sum_{i,j} A_{ij}$ is the sum of all the edge weights in the graph (for compatibility with earlier definitions of modularity)
$\delta(c_i, c_j)$ is 1 if vertices $i$ and $j$ are in the same community, and 0 otherwise
A completely random network has modularity ~0

19. COMMUNITY DETECTION
Modularity-based community detection algorithms seek to maximize Q.
Exact solutions are computationally expensive, particularly for large networks.
Various such algorithms exist; this project used the Louvain method.
Authors: Vincent Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre

20. COMMUNITY DETECTION: Zachary Karate Club, Step 1
Starting graph
[Figure: Karate Club network, vertices 1-34]
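Slides 14-16 describe graph creation: build an edge list and adjacency matrix, remove self-loops and vertices without edges, and weight edges by message count. A sketch follows. The function name and tuple layout are hypothetical.

This version drops self-loops before building the matrix, which gives the same result as simplifying afterwards. A dense matrix is only practical for small graphs. At Enron scale the `edges` dictionary itself is a more realistic sparse representation.

```python
from collections import Counter


def build_graph(rows):
    """Build a weighted directed graph from (source, target, mailbox, subject) rows.

    Returns (vertices, edges, adjacency): edges maps (source, target) to a
    message count; adjacency is a dense matrix indexed in `vertices` order.
    """
    # Step 1: edge list weighted by message count (one row per message and recipient).
    counts = Counter((source, target) for source, target, *_ in rows)
    # Step 2: simplify. Drop self-loops...
    edges = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
    # ...and keep only vertices that still touch at least one edge.
    vertices = sorted({v for pair in edges for v in pair})
    index = {v: i for i, v in enumerate(vertices)}
    adjacency = [[0] * len(vertices) for _ in vertices]
    for (source, target), n in edges.items():
        adjacency[index[source]][index[target]] = n
    return vertices, edges, adjacency
```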
21. COMMUNITY DETECTION: Zachary Karate Club, Step 2
Assign each vertex to its own community.
Modularity: -0.04980276
[Figure: Karate Club network, vertices 1-34]

22. COMMUNITY DETECTION: Zachary Karate Club, Step 2
Pick a vertex and tentatively place it in the community of each of its neighbors, measuring the change in modularity each time.
Place the vertex in the community with the greatest positive change.
Modularity: -0.04980276
[Figure: Karate Club network, vertices 1-34]

23. COMMUNITY DETECTION: Zachary Karate Club, Steps 3 through N - swap communities
Continue picking vertices at random and moving them until a complete cycle through all vertices yields less than a minimum increase in modularity.
Modularity: -0.04980276
[Figure: Karate Club network, vertices 1-34]

24-25. COMMUNITY DETECTION: Zachary Karate Club, Steps 4 through N
Continue picking and moving vertices as in Step 3.
Modularity: 0.2483563
[Figure: Karate Club network, vertices 1-34]

26. COMMUNITY DETECTION: Zachary Karate Club, Steps 5 through N
Combine all vertices and edges in each new community into super-communities, then repeat the previous steps.
Modularity: 0.2483563
[Figure: super-communities 1-7]

27. COMMUNITY DETECTION
Collapse all vertices and edges in each new community into super-communities, then repeat the previous steps.
[Figure: super-communities 1-7]

28. COMMUNITY DETECTION
Repeat the previous steps until no further increase in modularity can be gained, or an upper limit on iterations is reached.
[Figure: communities 1-4]
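Below is a compact Python sketch of the Louvain steps on slides 21-28. It recomputes Q from the modularity formula on every tentative move instead of using the Louvain paper's incremental gain formula, so it is only suitable for toy graphs such as the Karate Club.

It assumes an undirected, symmetric weighted graph stored as a dict of dicts. The slides do not say how edge direction was handled, or which Louvain implementation the R code used. All function names are hypothetical.

```python
import random


def modularity(adj, community):
    """Q = 1/(2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j).

    adj: symmetric dict of dicts {i: {j: weight}}; community: vertex -> label.
    """
    k = {i: sum(nbrs.values()) for i, nbrs in adj.items()}
    two_m = sum(k.values())
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                q += adj[i].get(j, 0) - k[i] * k[j] / two_m
    return q / two_m


def local_moving(adj, rng, min_gain=1e-12):
    """Phase 1: move single vertices to the neighboring community with the best gain."""
    community = {v: v for v in adj}  # every vertex starts in its own community
    improved = True
    while improved:  # stop after a full pass with no improving move
        improved = False
        order = list(adj)
        rng.shuffle(order)  # visit vertices in random order
        for v in order:
            original = community[v]
            best, best_q = original, modularity(adj, community)
            neighbor_communities = dict.fromkeys(community[u] for u in adj[v] if u != v)
            for candidate in neighbor_communities:
                community[v] = candidate
                q = modularity(adj, community)
                if q > best_q + min_gain:  # keep only strictly positive changes
                    best, best_q = candidate, q
            community[v] = best
            improved = improved or best != original
    return community


def aggregate(adj, community):
    """Phase 2: collapse each community into one super-vertex (internal weight -> self-loop)."""
    new_adj = {}
    for i, nbrs in adj.items():
        row = new_adj.setdefault(community[i], {})
        for j, w in nbrs.items():
            row[community[j]] = row.get(community[j], 0) + w
    return new_adj


def louvain(adj, seed=0, max_levels=10):
    """Return {original vertex: community label}."""
    rng = random.Random(seed)
    membership = {v: v for v in adj}
    for _ in range(max_levels):  # upper limit on iterations
        community = local_moving(adj, rng)
        if len(set(community.values())) == len(adj):
            break  # nothing merged: no further modularity gain
        membership = {v: community[c] for v, c in membership.items()}
        adj = aggregate(adj, community)
    return membership
```

On two triangles joined by a single edge, `louvain` returns the two triangles as communities, with Q = 5/14.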
29. COMMUNITY DETECTION: Zachary Karate Club, Final
Modularity: 0.3532216
[Figure: final communities, vertices 1-34]

30. COMMUNITY DETECTION

31. COMMUNITY DETECTION

32. METRICS
Eigenvector centrality: favors nodes that are highly connected to other highly connected nodes
PageRank: favors nodes that receive edges from other highly connected nodes; strongly biased towards mail-lists

33. METRICS
Eigenvector centrality

34. METRICS: PageRank
[Chart: PageRank and eigenvector values for vertices A-E; x-axis: vertex, y-axis: value (0.0-0.3)]

35. METRICS
Degree: the number of other vertices connected to a vertex
Strength (in, out, total): the sum of the weights of a vertex's inbound, outbound, or all edges
Betweenness centrality: for every pair of other vertices, the fraction of shortest paths between them that pass through the vertex, summed over all pairs

36. METRICS
Closeness centrality: the inverse of the sum of the shortest-path distances from a vertex to all other vertices

37. SOFTWARE DESIGN
Testing was done by running known datasets and comparing the results with published values.
The first step of the CGI is to run PageRank and keep only the top 750 nodes:
Most searches likely only want the top few ranked vertices
Keeps processing on the local machine manageable
Prevents hairballs (unreadably dense graph drawings)

38. SOFTWARE DESIGN
Perl CGI running under either Apache or Python's CGIHTTPServer
R does all the heavy lifting for the analysis
The Force-Directed Graph from D3, a JavaScript library, provides the interactive graphics
DataTables creates interactive HTML tables for sorting and filtering
Vertex size is the average of the PageRank and eigenvector centrality values
Color is assigned by community

39. SOFTWARE USAGE

40. SOFTWARE USAGE
http://localhost/~kjolley/cgi-bin/tw-qcom.pl

41. SOFTWARE USAGE
http://localhost/~kjolley/cgi-bin/tw-qcom.pl
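Slide 37's first CGI step is to run PageRank and keep the top 750 vertices. It can be sketched as a weighted PageRank computed by power iteration. The tool itself computed PageRank in R.

Two details here are common choices assumed for the sketch, not taken from the slides: the damping factor of 0.85, and spreading the rank of dangling vertices evenly over all vertices. Edge weights are assumed to be message counts, as on slide 16. The function names are hypothetical.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration.

    edges maps (source, target) -> weight, e.g. a message count. Vertices with
    no outgoing edges ("dangling") spread their rank evenly over all vertices.
    """
    vertices = sorted({v for pair in edges for v in pair})
    n = len(vertices)
    out_weight = dict.fromkeys(vertices, 0.0)
    for (source, _target), weight in edges.items():
        out_weight[source] += weight
    rank = dict.fromkeys(vertices, 1.0 / n)
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in vertices if out_weight[v] == 0)
        new_rank = dict.fromkeys(vertices, (1 - damping) / n + damping * dangling / n)
        for (source, target), weight in edges.items():
            # each vertex passes rank along its out-edges in proportion to weight
            new_rank[target] += damping * rank[source] * weight / out_weight[source]
        converged = sum(abs(new_rank[v] - rank[v]) for v in vertices) < tol
        rank = new_rank
        if converged:
            break
    return rank


def top_k(scores, k=750):
    """Return the k highest-scoring vertices, best first (slide 37 keeps 750)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The ranks sum to 1 after every iteration. Filtering with `top_k` before layout is what keeps the D3 drawing from becoming a hairball.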
42. SOFTWARE USAGE
Results from a chip design project:
Dark blue: configuration management
Light blue: hardware design
Dark green: senior leadership
Light green: test and design for test
Salmon: not identified
One of the senior leads recognized this as a good visualization of the team's organization and said it would be of value to Qualcomm.

43. FUTURE STEPS
More data sources: ClearCase, email, communities, Perforce, HR databases
Better search
Deeper search
Make Gephi more scriptable
Commercial products