feasibility of using machine learning to access control_revds

Feasibility of using Machine Learning to Access Control in Squid Proxy Server. Kanchana Ihalagedara, Rajitha Kithuldeniya, Supun Weerasekara. Escape 2015. Supervised by Mr. Sampath Deegalla.


Page 1: Feasibility of Using Machine Learning to Access Control_revDS

05/03/2023 1

Feasibility of using Machine Learning to Access Control in Squid Proxy Server

Kanchana Ihalagedara, Rajitha Kithuldeniya, Supun Weerasekara

Escape 2015

Supervised by Mr. Sampath Deegalla

Page 2: Feasibility of Using Machine Learning to Access Control_revDS


Internet in Educational Institutes

Mainly for educational purposes.

What happens if users' priority is not the intended purpose?

Network congestion

Wastage of resources

Individual user performance is negatively affected


Page 3: Feasibility of Using Machine Learning to Access Control_revDS


Blocking Web Sites in Proxy Server

Squid ACLs - Text file of blacklists

SquidGuard - External databases

DansGuardian - Content filter
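
To make the text-file blacklist idea concrete, here is a minimal sketch of how a domain list could be loaded and checked. It is not Squid's own matching code: the names `load_blacklist` and `is_blocked` and the sample entries are hypothetical, and as a simplification every entry is assumed to match its subdomains too.

```python
from urllib.parse import urlsplit


def load_blacklist(text):
    """Parse a blacklist: one domain per line, '#' starts a comment."""
    domains = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            domains.add(entry.lstrip("."))
    return domains


def is_blocked(url, blacklist):
    """Return True if the URL's host, or any parent domain, is listed."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # For "a.b.example" this checks "a.b.example", "b.example", "example".
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


BLACKLIST_TEXT = """
# hypothetical entries, for illustration only
.games.example
video.example
"""

blacklist = load_blacklist(BLACKLIST_TEXT)
```

A list like this only blocks what someone has already entered, which is the limitation the following slides address.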


Page 4: Feasibility of Using Machine Learning to Access Control_revDS


World Wide Web is Growing

Manually blacklisting web sites is impossible

Related products are not updated with the growing web


672,985,183 - 2013

968,882,453 - 2014

Increase: 295,897,270

From www.internetlivestats.com

Page 5: Feasibility of Using Machine Learning to Access Control_revDS


Dynamic automated method

Automated web classification is required

Machine Learning is used in automated web classification

Page 6: Feasibility of Using Machine Learning to Access Control_revDS


Overview of Our Solution

Copy client request -> Check URL -> Get web content -> Classify web content -> Update the blacklist
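
The flow on this slide can be sketched as a single function. This is an illustration under stated assumptions, not the authors' implementation: `handle_request`, `fake_fetch`, `fake_classify` and the sample pages are hypothetical stand-ins, so the sketch runs without a network or a trained model.

```python
from urllib.parse import urlsplit


def handle_request(url, blacklist, fetch, classify):
    """Process a copy of one client request; return the decision taken."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return "ignored"
    if host in blacklist:            # Check URL
        return "already blocked"
    content = fetch(url)             # Get web content
    label = classify(content)        # Classify web content
    if label == "non-educational":
        blacklist.add(host)          # Update the blacklist
        return "added to blacklist"
    return "allowed"


# Hypothetical stand-ins for the real fetcher and the trained classifier.
PAGES = {
    "http://games.example/": "play free online games and win prizes",
    "http://course.example/": "lecture notes and assignments for the course",
}


def fake_fetch(url):
    return PAGES.get(url, "")


def fake_classify(text):
    return "non-educational" if "games" in text else "educational"


blacklist = set()
first = handle_request("http://games.example/", blacklist, fake_fetch, fake_classify)
second = handle_request("http://games.example/", blacklist, fake_fetch, fake_classify)
third = handle_request("http://course.example/", blacklist, fake_fetch, fake_classify)
```

Because the function works on a copy of the request, the first visit to an unknown site is not delayed by classification; only later requests hit the updated blacklist.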

Page 7: Feasibility of Using Machine Learning to Access Control_revDS


Machine Learning in Web Classification

Several studies on web classification can be found

Frequently used algorithms: Naïve Bayes, support vector machine, nearest neighbor

Classification requires a data set

Set of URLs labeled as educational or non-educational
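
As an illustration of one of the algorithms named on this slide, the sketch below implements multinomial Naïve Bayes with add-one smoothing over a bag of words. It is a from-scratch toy, assuming page text is already available as plain strings; the class name `NaiveBayes` and the four training texts are hypothetical and are not the data set used in the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:   # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled texts, for illustration only.
TRAIN = [
    ("lecture notes on algorithms and data structures", "educational"),
    ("university course syllabus and exam timetable", "educational"),
    ("play online games and watch funny videos", "non-educational"),
    ("celebrity gossip and movie trailers", "non-educational"),
]
model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
```

Calling `model.predict(page_text)` returns whichever of the two labels has the higher log-probability for the words on the page.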


Page 8: Feasibility of Using Machine Learning to Access Control_revDS


Data Collection & Preprocessing

Preprocess Squid server log

Preprocess DMOZ data set

Create labeled URLs

Get web content

Create training data set
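
The first steps above might look roughly like the sketch below. It assumes Squid's native access.log layout, where the request method and URL are the sixth and seventh whitespace-separated fields. The slides do not say how DMOZ categories were mapped to labels, so `DIRECTORY_LABELS` is a hypothetical mapping, and the log lines are invented for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def hosts_from_access_log(lines):
    """Count requested hosts in Squid native-format access.log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue                      # skip malformed lines
        method, url = fields[5], fields[6]
        if method == "CONNECT":           # HTTPS tunnels are logged as host:port
            host = url.rsplit(":", 1)[0]
        else:
            host = urlsplit(url).hostname or ""
        if host:
            counts[host.lower()] += 1
    return counts


def label_hosts(hosts, directory_labels):
    """Keep only hosts that have a label in the directory-derived mapping."""
    return {h: directory_labels[h] for h in hosts if h in directory_labels}


# Hypothetical log lines and labels, for illustration only.
SAMPLE_LOG = [
    "1430000000.123 250 10.0.0.5 TCP_MISS/200 5120 GET http://course.example/notes.html - DIRECT/192.0.2.10 text/html",
    "1430000001.456 90 10.0.0.6 TCP_MISS/200 300 CONNECT games.example:443 - DIRECT/192.0.2.20 -",
    "1430000002.789 120 10.0.0.5 TCP_HIT/200 2048 GET http://course.example/lab1.html - NONE/- text/html",
]
DIRECTORY_LABELS = {"course.example": "educational", "games.example": "non-educational"}

host_counts = hosts_from_access_log(SAMPLE_LOG)
labeled = label_hosts(host_counts, DIRECTORY_LABELS)
```

The remaining steps, fetching the content of each labeled URL and writing the training set, would build on `labeled`.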


Page 9: Feasibility of Using Machine Learning to Access Control_revDS


Model Creation & Testing

Four models were created with WEKA (small data set)

Data set of 200 records

10-fold cross-validation for testing

Algorithm                  Accuracy (%)
PRISM                      74.5
C4.5 (J48 in WEKA)         83.0
Naïve Bayes                95.0
Support Vector Machines    95.5
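
For readers unfamiliar with 10-fold cross-validation, the sketch below shows the procedure in plain Python. WEKA performs this internally; the functions `k_fold_indices` and `cross_validate`, the majority-class baseline, and the 120/80 label split are all hypothetical and chosen only so the example runs on 200 records.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validate(records, labels, train_fn, k=10, seed=0):
    """Return mean accuracy over k folds; train_fn(X, y) returns a predictor."""
    folds = k_fold_indices(len(records), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(records)) if i not in held]
        predict = train_fn([records[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(records[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def train_majority(X, y):
    """Baseline stand-in: always predict the most common training label."""
    majority = max(set(y), key=y.count)
    return lambda record: majority


# 200 hypothetical records with an invented 120/80 label split.
records = list(range(200))
labels = ["educational"] * 120 + ["non-educational"] * 80
folds = k_fold_indices(len(records), 10)
accuracy = cross_validate(records, labels, train_majority)
```

Each record is held out exactly once, so every one of the 200 records contributes to the reported accuracy while never being used to train the model that predicts it.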


Page 10: Feasibility of Using Machine Learning to Access Control_revDS


Model Creation & Testing

Three models were created using Python (larger data set)

Data set of 4000 records

Separate data set of 1000 records for testing

Algorithm                  Accuracy (%)
Multinomial Naïve Bayes    92.9
SVC                        77.5
Linear SVC                 98.9


Page 11: Feasibility of Using Machine Learning to Access Control_revDS


Feature Selection in Linear SVC

[Figure: Linear SVC accuracy plotted against the number of selected features. x-axis: No. of features, from 10 to 55686. y-axis: Accuracy / %, from 84 to 100.]
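
The slide does not say how features were scored, so the sketch below assumes one common choice: ranking terms by the chi-square statistic of term presence against class and keeping the top k. The functions `chi_square_scores` and `select_top_k` and the four sample texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def chi_square_scores(texts, labels, positive):
    """Score each term by the 2x2 chi-square statistic (presence vs. class)."""
    n = len(texts)
    n_pos = sum(1 for y in labels if y == positive)
    in_pos, in_all = Counter(), Counter()
    for text, y in zip(texts, labels):
        terms = tokenize(text)
        in_all.update(terms)
        if y == positive:
            in_pos.update(terms)
    scores = {}
    for term, df in in_all.items():
        a = in_pos[term]            # positive docs containing the term
        b = df - a                  # negative docs containing the term
        c = n_pos - a               # positive docs without the term
        d = (n - n_pos) - b         # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical labeled texts, for illustration only.
TEXTS = [
    "lecture notes and exam papers",
    "lecture slides and course notes",
    "online games and videos",
    "free games and music",
]
LABELS = ["educational", "educational", "non-educational", "non-educational"]

scores = chi_square_scores(TEXTS, LABELS, positive="educational")
top_terms = select_top_k(scores, 3)
```

Words that appear equally in both classes, such as "and" here, score zero and are dropped first, which is why a classifier can keep most of its accuracy with far fewer features than the full vocabulary.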


Page 12: Feasibility of Using Machine Learning to Access Control_revDS


Principal Component Analysis


Page 13: Feasibility of Using Machine Learning to Access Control_revDS


Future Work

Consider more content (metadata)

Other languages (Sinhala)

Image processing can be added


Page 14: Feasibility of Using Machine Learning to Access Control_revDS


Thank You!
