Faculty: Dr. Chengcui Zhang
Students: Wei-Bang Chen, Song Gao, Richa Tiwari
Past projects
• Image Spam Clustering Project
– Cluster image spam through common visual features present in image attachments
– Reveal common origins of image spam
examples
These two spam images exemplify illustrations with similar color composition but different layouts.
This example demonstrates illustrations in spam with similar layouts but different color composition.
• Ongoing projects:
– Phishing website clustering by text and visual similarity
Nat West Helpful BonkingAccessibility I HelpGot a question? We can help…
Nat West Helpful Bonking Help 24x7can’t I log in?Accessibility I Help…
RBSThQ Roy& Bank cq3codandMake it happen…
Text Recognized by OCR
A Sample Cluster for PayPal
4 Clusters Related to PayPal
• Cluster ID: 15 (76 Images)
• Cluster ID: 28 (20 Images)
• Cluster ID: 49 (13 Images)
• Cluster ID: 57 (22 Images)
Dataset Statistics
• 8 Days (7-10, 17-19 & 22 Feb., 2011)
• Total number of phishing website screen-shot images: 1461
• Total number of produced clusters (cutoff similarity value = 60%): 156 + 1 (ungrouped)
[Figure: histogram of cluster sizes. X-axis: Cluster Size (Number of Images); Y-axis: Count (Number of Clusters)]
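A minimal sketch of how a similarity cutoff such as the 60% above could turn OCR text into clusters plus an "ungrouped" bucket. The slides do not say which similarity measure or linkage rule was used. Here `difflib.SequenceMatcher` and single-link grouping are stand-in assumptions, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two OCR strings (assumed stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_by_cutoff(texts, cutoff=0.6):
    """Single-link grouping: any pair with similarity >= cutoff shares a cluster.

    Returns (clusters, ungrouped), both holding indices into `texts`.
    """
    parent = list(range(len(texts)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if text_similarity(texts[i], texts[j]) >= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)

    clusters = [g for g in groups.values() if len(g) > 1]
    ungrouped = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, ungrouped
```

The pairwise loop is quadratic in the number of screenshots, which is workable for a set of about 1,500 images but would need indexing or blocking for much larger collections.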
• Observations: high cluster purity
• Hard to measure completeness
• Next step:
– Incorporate visual features such as visual layout
– Brand
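Cluster purity can be computed once each screenshot has a hand-assigned brand label. The sketch below uses the standard definition: the fraction of items that belong to the majority label of their cluster. The labels and cluster IDs in the usage note are hypothetical, not taken from the dataset.

```python
from collections import Counter


def cluster_purity(assignments, labels):
    """Purity = (sum over clusters of the majority-label count) / (number of items).

    assignments[i] is the cluster ID of item i; labels[i] is its true brand.
    """
    by_cluster = {}
    for cluster_id, label in zip(assignments, labels):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(labels)
```

For example, `cluster_purity([15, 15, 15, 28, 28], ["PayPal", "PayPal", "RBS", "NatWest", "NatWest"])` gives 0.8. Purity does not penalize splitting one brand across several clusters, which is one reason completeness has to be assessed separately.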
• Ongoing projects:
– Uncovering auction fraud from eBay transaction graph - Initial study
• Data set: eBay transaction feedback
– A total of 220,000 users were crawled.
• Idea of belief propagation:
– Fraudsters create two types of identities: fraud and accomplice. Fraud identities are the ones eventually used to carry out the actual fraud, and accomplice identities are the ones used to help build the reputation of the fraud identities. This pattern forms a near-bipartite core in the transaction graph.
• Algorithm:
– Each vertex in the transaction graph is labeled as one of {fraud, accomplice, honest} based on its pattern of interaction with other vertices.
– Belief propagation (BP) is used to optimize the labeling across the entire graph by maximizing the joint probability of all the vertices.
– Honest user model: Barabási-Albert model
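A minimal sketch of loopy belief propagation over the three labels, under stated assumptions. The edge compatibility values in `PSI` are illustrative guesses that encode the near-bipartite pattern (fraud links mostly to accomplice); they are not the values used in the study. The sketch runs the sum-product variant, which yields per-vertex marginal beliefs. The slides do not say which BP variant was used.

```python
STATES = ("fraud", "accomplice", "honest")

# Assumed symmetric edge compatibility PSI[s][t]: how plausible it is for a
# vertex in state s to transact with a vertex in state t. Illustrative only.
PSI = [
    [0.05, 0.90, 0.05],  # fraud mostly trades with accomplices
    [0.90, 0.05, 0.50],  # accomplices trade with fraud and honest users
    [0.05, 0.50, 0.50],  # honest users trade with accomplices and honest users
]


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def belief_propagation(edges, priors=None, iterations=20):
    """Return {vertex: [P(fraud), P(accomplice), P(honest)]}.

    edges: iterable of undirected (u, v) pairs.
    priors: optional {vertex: [p_f, p_a, p_h]}; others default to (1/3, 1/3, 1/3).
    """
    priors = priors or {}
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    uniform = [1 / 3] * 3
    phi = {n: priors.get(n, uniform) for n in neighbors}
    # msg[(i, j)][t]: what vertex i tells neighbor j about j being in state t.
    msg = {(i, j): uniform[:] for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        new_msg = {}
        for (i, j) in msg:
            out = []
            for t in range(3):
                total = 0.0
                for s in range(3):
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= msg[(k, i)][s]
                    total += phi[i][s] * PSI[s][t] * incoming
                out.append(total)
            new_msg[(i, j)] = normalize(out)
        msg = new_msg

    beliefs = {}
    for n in neighbors:
        b = list(phi[n])
        for k in neighbors[n]:
            b = [b[s] * msg[(k, n)][s] for s in range(3)]
        beliefs[n] = normalize(b)
    return beliefs
```

With a single edge between a vertex whose prior leans toward fraud and an unlabeled vertex, the unlabeled vertex's strongest belief becomes accomplice. On graphs with cycles the result is approximate and convergence is not guaranteed.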
• Evaluation results on the sparse eBay transaction dataset
– 20% accomplice
– 50% fraud???
• What can be improved:
– Network too sparse (average degree is ~5, ideally >= 10)
– Initial probabilities (1/3, 1/3, 1/3) may not make sense.
– BP seems not to scale well with large graphs.
• Projects under plan:
– Modeling online user navigation patterns and detecting anomalies using click stream data
• Idea #1: Each user session is represented by an n-dimensional feature vector, where n is the number of Web pages in the session.
– The value of each feature is a weight, indicating the degree of interest of the user in the particular Web page.
– Based on these vectors, clusters of similar sessions are produced and characterized by the Web pages with the highest associated weights.
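Idea #1 can be sketched as follows, with several assumptions: the weight is the share of session time spent on a page, all vectors are indexed over one shared page list so that sessions are comparable, and a small k-means with a fixed initialization stands in for whatever clustering method is eventually chosen. All function names are hypothetical.

```python
def session_vector(session, pages):
    """session: list of (page, seconds). Weight = share of session time (assumed).

    Every page in the session must appear in the shared `pages` list.
    """
    total = sum(seconds for _, seconds in session) or 1.0
    weights = dict.fromkeys(pages, 0.0)
    for page, seconds in session:
        weights[page] += seconds / total
    return [weights[p] for p in pages]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors seed the centroids (deterministic sketch)."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iterations):
        for i, v in enumerate(vectors):
            assign[i] = min(range(k), key=lambda c: squared_distance(v, centroids[c]))
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def top_pages(centroid, pages, n=2):
    """Characterize a cluster by its n highest-weighted pages."""
    ranked = sorted(zip(centroid, pages), reverse=True)
    return [page for _, page in ranked[:n]]
```

A session that sits far from every centroid would be one candidate signal for the anomaly detection mentioned above, though the slides do not commit to a specific scoring rule.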
• Idea #2: Markov Model
– Pages (or page categories) as states
   • Or page+parameters as nodes
– Transition probabilities between nodes
• Idea #3: Graph partitioning
– Pages as nodes
– Edges as connectivity/weight between a pair of pages
   • Co-occurrence, time difference, etc.
– Graph partitioning to find groups of strongly correlated pages
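A minimal sketch of Idea #2, assuming a first-order Markov chain with pages as states. Transition probabilities are estimated from counts over observed sessions, and a session is scored by its average log-likelihood per transition. The `floor` probability for unseen transitions and the function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_transitions(sessions):
    """Estimate P(next page | current page) from page sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: c / total for nxt, c in followers.items()}
    return probs


def avg_log_likelihood(session, probs, floor=1e-6):
    """Average log-probability per transition; lower means more unusual.

    Transitions never seen in training get the assumed `floor` probability.
    """
    steps = list(zip(session, session[1:]))
    if not steps:
        return 0.0
    total = sum(math.log(probs.get(a, {}).get(b, floor)) for a, b in steps)
    return total / len(steps)
```

Sessions whose score falls well below that of typical sessions can be flagged for review. Choosing the threshold would need labeled or at least inspected data.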
• Projects under plan:
– Novel biometrics
• Palm print photo
• Touch panel: hand drawing