Evaluating Role Mining Algorithms
Presentation of the paper: Ian Molloy, Ninghui Li, Tiancheng Li, Ziqing Mao, Qihua Wang, and Jorge Lobo. 2009. Evaluating role mining algorithms. In Proceedings of the 14th ACM Symposium on Access Control Models and Technologies (SACMAT '09). ACM, New York, NY, USA, 95-104. DOI: 10.1145/1542207.1542224, http://doi.acm.org/10.1145/1542207.1542224
Evaluating Role Mining Algorithms
SACMAT’09, June 3-5, 2009, Stresa, Italy.
Ian Molloy, Ninghui Li, Tiancheng Li, Ziqing Mao, Qihua Wang
@ CERIAS Research Center, Department of Computer Science, Purdue University
Jorge Lobo @ IBM T.J. Watson Research Center
Presentation by Onur Yılmaz
Outline
Introduction
Overview
Role Mining Algorithms
Evaluation Results
Analysis
Conclusion
Future Work
Introduction
Aim of the study
Comprehensive study to compare role mining algorithms
What is presented?
Two new methods for generating datasets
Analysis of nine role mining algorithms
Introduction
Role Mining
Using data mining techniques to discover roles from existing system configuration data
Overview
Three key points:
Output of a role mining algorithm
Criteria to compare outputs of algorithms
Input datasets
Overview: Output of Role Mining Algorithms
Existing algorithms, grouped by their output:
Class 1: Outputting prioritized roles
Class 2: Outputting RBAC states
Overview: Output of Role Mining Algorithms
Class 1: Outputting prioritized roles
A prioritized list of candidate roles, each of which is a set of permissions
Examples: CompleteMiner and FastMiner
Candidate role generation: generate a set of candidate roles from the user-permission assignment data
Candidate role prioritization
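Overview: Output of Role Mining Algorithms
Class 2: Outputting RBAC states
Input: access control configuration ρ = <U, P, UP> (users, permissions, user-permission relation)
Output: RBAC state γ = <R, UA, PA, RH, DUPA> (roles, user-role assignment, role-permission assignment, role hierarchy, direct user-permission assignment)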
Overview: Output of Role Mining Algorithms
Class 2: Outputting RBAC states
Find an RBAC state that minimizes some cost measure
e.g., the number of roles, the number of user assignments, etc.
Overview: Output of Role Mining Algorithms
Class 2: Outputting RBAC states
Weighted Structural Complexity (WSC)
Sums up the number of relationships in an RBAC state, with
possibly different weights for each relationship.
Overview: Output of Role Mining Algorithms
Class 2: Outputting RBAC states
Weighted Structural Complexity (WSC)
Given a weight vector $W = \langle w_r, w_u, w_p, w_h, w_d \rangle$:
$\mathrm{wsc}(\gamma, W) = w_r \cdot |R| + w_u \cdot |UA| + w_p \cdot |PA| + w_h \cdot |\mathrm{transitive\_reduce}(RH)| + w_d \cdot |DUPA|$
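A minimal Python sketch of the WSC formula, assuming an RBAC state is stored as plain sets of pairs. The class `RBACState` and the helpers `transitive_reduce` and `wsc` are hypothetical names, and the role hierarchy is assumed to be acyclic, so that its transitive reduction is unique (an edge is dropped exactly when another path already implies it).

```python
from dataclasses import dataclass, field


@dataclass
class RBACState:
    """Hypothetical container for gamma = <R, UA, PA, RH, DUPA>."""
    roles: set                                # R
    ua: set                                   # UA: (user, role) pairs
    pa: set                                   # PA: (role, permission) pairs
    rh: set = field(default_factory=set)      # RH: (senior, junior) role pairs
    dupa: set = field(default_factory=set)    # DUPA: direct (user, permission) pairs


def transitive_reduce(rh):
    """Drop every hierarchy edge implied by a longer path (RH assumed acyclic)."""
    children = {}
    for senior, junior in rh:
        children.setdefault(senior, set()).add(junior)

    def reachable_without(src, skipped):
        # Depth-first search from src that ignores the single edge `skipped`.
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            for nxt in children.get(node, ()):
                if (node, nxt) != skipped and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {(a, b) for (a, b) in rh if b not in reachable_without(a, (a, b))}


def wsc(state, w):
    """Weighted structural complexity; w = (w_r, w_u, w_p, w_h, w_d)."""
    w_r, w_u, w_p, w_h, w_d = w
    return (w_r * len(state.roles) + w_u * len(state.ua) + w_p * len(state.pa)
            + w_h * len(transitive_reduce(state.rh)) + w_d * len(state.dupa))
```

For instance, W = <1, 0, 0, 0, 0> counts only the roles, while other vectors shift the cost onto assignments or hierarchy edges.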
Overview: Output of Role Mining Algorithms
Class 2: Outputting RBAC states
Weighted Structural Complexity (WSC)
Different weight vectors encode different mining objectives and
minimization goals
HierarchicalMiner takes both a configuration ρ and a weight vector and
aims at outputting an RBAC state with low WSC.
Graph optimization minimizes the number of edges
Overview: Output of Role Mining Algorithms
Class 1 vs Class 2 Algorithms
RBAC states are easy to compare
A list of candidate roles can be more useful in practice
An administrator examines the role mining results and
determines whether to adopt some part of them.
What matters in practice is whether role mining algorithms can suggest the
best candidate roles.
Overview: Metrics for Comparing Algorithms
Two metrics:
Complexity of the RBAC state
Quality of roles
Overview: Metrics for Comparing Algorithms
Complexity of the RBAC state
Using WSC, measure how well each algorithm performs
under a variety of mining objectives
Overview: Metrics for Comparing Algorithms
Quality of Roles
For each weight vector W, evaluate the complexity of the optimal
RBAC state using only the top k roles.
Among the top k roles, how quickly do the mined roles cover the UP
relation?
Among the top k roles, how well do they «resemble» the original
roles?
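The coverage question above can be made concrete with a small Python sketch. It assumes UP is a dict from each user to a set of permissions and that a role may only be given to a user who already holds all of its permissions (so no extra permissions are granted); the function name `coverage_curve` is hypothetical, not the paper's code.

```python
def coverage_curve(up, ranked_roles):
    """Fraction of the UP relation covered after each of the top k roles.

    up: dict mapping each user to a set of permissions (assumed non-empty overall).
    ranked_roles: candidate roles (sets of permissions), highest priority first.
    """
    total = sum(len(perms) for perms in up.values())
    covered = set()
    curve = []
    for role in ranked_roles:
        for user, perms in up.items():
            # Assumption: a role is assigned only to users holding all of its
            # permissions, so it never grants a permission outside UP.
            if role <= perms:
                covered.update((user, p) for p in role)
        curve.append(len(covered) / total)
    return curve
```

A curve that approaches 1.0 after fewer roles indicates a list whose top roles cover the UP relation more quickly.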
Overview: Input Data Type
Access Control Configuration
ρ = <User, Permission, UserPermissionRelation >
Overview: Input Data Type
Datasets from the literature
Overview: Input Data Type
Generated Datasets
Random Data Generator
Tree-Based Data Generator
ERBAC Data Generator
Overview: Input Data Type
Random Data Generator
Permissions are randomly assigned to roles and roles to users; the resulting user-permission assignment forms the dataset
Parameters: number of users, number of roles, number of permissions,
maximum number of roles per user, maximum number of permissions per role
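A possible Python sketch of the random generator, assuming the number of permissions per role and roles per user are drawn uniformly between 1 and the given maximum. The function name `random_dataset` and that sampling choice are assumptions, not the paper's generator.

```python
import random


def random_dataset(n_users, n_roles, n_perms, max_roles_per_user,
                   max_perms_per_role, seed=0):
    """Return (role_perms, up) under the stated assumptions.

    Requires max_perms_per_role <= n_perms and max_roles_per_user <= n_roles.
    """
    rng = random.Random(seed)
    # Each role gets between 1 and max_perms_per_role distinct permissions.
    role_perms = [set(rng.sample(range(n_perms), rng.randint(1, max_perms_per_role)))
                  for _ in range(n_roles)]
    up = {}
    for user in range(n_users):
        # Each user gets between 1 and max_roles_per_user distinct roles ...
        roles = rng.sample(range(n_roles), rng.randint(1, max_roles_per_user))
        # ... and holds the union of those roles' permissions.
        up[user] = set().union(*(role_perms[r] for r in roles))
    return role_perms, up
```

Keeping `role_perms` alongside `up` makes it possible to check later how closely mined roles resemble the roles that generated the data.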
Overview: Input Data Type
Tree-Based Data Generator
Parameters: number of users, number of permissions, height of the tree,
upper bound and lower bound on the number of children per node
Randomly generate a tree
Assign permissions to nodes in the tree
Assign users to leaf nodes
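The three steps above could look roughly like this in Python. Two details are assumptions not stated on the slide: each permission is attached to one uniformly chosen node, and a user placed at a leaf receives the permissions of every node on the path to the root. The name `tree_dataset` is hypothetical.

```python
import random


def tree_dataset(n_users, n_perms, height, min_children, max_children, seed=0):
    """Return (parent, node_perms, up); assumes min_children >= 1."""
    rng = random.Random(seed)
    parent = {0: None}                    # node 0 is the root
    frontier, next_id = [0], 1
    for _ in range(height):               # grow the tree one level at a time
        next_frontier = []
        for node in frontier:
            for _ in range(rng.randint(min_children, max_children)):
                parent[next_id] = node
                next_frontier.append(next_id)
                next_id += 1
        frontier = next_frontier
    leaves = frontier                     # every leaf sits at depth `height`

    # Assumption: each permission is attached to one uniformly chosen node.
    nodes = list(parent)
    node_perms = {node: set() for node in nodes}
    for p in range(n_perms):
        node_perms[rng.choice(nodes)].add(p)

    up = {}
    for user in range(n_users):
        node = rng.choice(leaves)         # assign the user to a random leaf
        perms = set()
        while node is not None:           # assumption: inherit permissions up to the root
            perms |= node_perms[node]
            node = parent[node]
        up[user] = perms
    return parent, node_perms, up
```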
Overview: Input Data Type
ERBAC Data Generator
Parameters: number of users, number of business roles, number of functional roles, number of permissions,
maximum number of business roles, maximum number of functional roles, maximum number of permissions
Structure: permissions are assigned to functional roles, functional roles to business roles, and business roles to users
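A Python sketch of the three-layer ERBAC structure, under the assumption that the three maxima apply per user (business roles), per business role (functional roles), and per functional role (permissions), with counts drawn uniformly from 1 to the maximum. The name `erbac_dataset` and these readings of the parameters are assumptions.

```python
import random


def erbac_dataset(n_users, n_business, n_functional, n_perms,
                  max_business, max_functional, max_perms, seed=0):
    """Return (fr_perms, br_frs, up) for a three-layer ERBAC-style model."""
    rng = random.Random(seed)

    def pick(n_items, max_k):
        # Between 1 and max_k distinct items out of range(n_items).
        return set(rng.sample(range(n_items), rng.randint(1, max_k)))

    # Functional role -> permissions, and business role -> functional roles.
    fr_perms = [pick(n_perms, max_perms) for _ in range(n_functional)]
    br_frs = [pick(n_functional, max_functional) for _ in range(n_business)]
    up = {}
    for user in range(n_users):
        business_roles = pick(n_business, max_business)   # user -> business roles
        # A user holds every permission reachable through business and functional roles.
        up[user] = {p for b in business_roles for f in br_frs[b] for p in fr_perms[f]}
    return fr_perms, br_frs, up
```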
Role Mining Algorithms
Class 1: CompleteMiner (CM), FastMiner (FM), DynamicMiner (DM), PairCount (PC)
Class 2: ORCA, Graph Optimization (GO), HP Role Minimization (HPr), HP Edge Minimization (HPe), HierarchicalMiner (HM)
Role Mining Algorithms: CompleteMiner (CM)
Initial set of roles, taken from the user permission sets
Candidate roles: all possible intersections of the initial roles (exponential time)
Prioritization of roles: based on the number of exact matches
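The CM candidate-generation step can be sketched in Python as closing the distinct user permission sets under intersection; the prioritization step is left out. The function name is hypothetical. Repeatedly intersecting newly found roles with all known roles yields every intersection of any subset of initial roles, which is where the exponential worst case comes from.

```python
def complete_miner_candidates(up):
    """Candidate roles: the distinct user permission sets, closed under intersection."""
    initial = {frozenset(perms) for perms in up.values() if perms}
    candidates = set(initial)
    frontier = set(initial)
    while frontier:
        # Intersect each newly found role with every known role; keep new non-empty results.
        new = {a & b for a in frontier for b in candidates} - candidates
        new.discard(frozenset())
        candidates |= new
        frontier = new
    return candidates
```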
Role Mining Algorithms: FastMiner (FM)
Initial set of roles, taken from the user permission sets
Candidate roles: only the intersections between pairs of initial roles
Prioritization of roles
Running time: O(n²m), where n is the number of users and m the number of permissions
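For contrast with CM, a Python sketch of FM-style generation that intersects only pairs of initial roles. Keeping the initial roles themselves in the candidate set is an assumption, and the function name is hypothetical. On three users {a, b, x}, {a, c, x}, {b, c, x}, this misses the role {x} that only a three-way intersection produces.

```python
from itertools import combinations


def fast_miner_candidates(up):
    """Candidate roles: initial roles plus intersections of pairs of initial roles."""
    initial = {frozenset(perms) for perms in up.values() if perms}
    candidates = set(initial)             # assumption: initial roles stay candidates
    for a, b in combinations(initial, 2):
        shared = a & b
        if shared:
            candidates.add(shared)
    return candidates
```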
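Role Mining Algorithms: DynamicMiner (DM)
CM and FM use static prioritization: priorities do not take into account candidate roles that have already been chosen
Initial set of roles, taken from the user permission sets
Candidate roles: all possible intersections of the initial roles
Prioritization of roles: the candidate with the highest priority is chosen first, and priorities take already-chosen roles into account
Running time: O(n · |C| · min{n, m}), where C is the set of candidate roles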
Role Mining Algorithms: PairCount (PC)
Newly proposed method
CM: prioritization based on the number of exact matches
In reality, users are often assigned multiple roles, so few users' permission sets match a single role exactly
PairCount: the number of pairs of users who share exactly this role and no other permissions
$PC(P) = |\{ (u_i, u_j) \mid u_i \neq u_j \wedge P(u_i) \cap P(u_j) = P \}|$
Running time: O(n²m)
Role Mining Algorithms: PairCount (PC)
Running time: O(n²m)
Initial set of roles, taken from the user permission sets
Candidate roles: all possible intersections of the initial roles
Prioritization of roles: based on PairCounts
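A Python sketch of PairCount prioritization, assuming UP is a dict from users to permission sets; `pair_counts` and `rank_by_pair_count` are hypothetical names. It counts each unordered pair once; the ordered-pair form of PC(P) gives exactly twice these counts, so the ranking is the same.

```python
from collections import Counter
from itertools import combinations


def pair_counts(up):
    """For each permission set P, count unordered user pairs with P(ui) & P(uj) == P."""
    counts = Counter()
    for ui, uj in combinations(up, 2):
        shared = frozenset(up[ui]) & frozenset(up[uj])
        if shared:
            counts[shared] += 1
    return counts


def rank_by_pair_count(candidates, up):
    """Order candidate roles by PairCount, highest first (ties keep input order)."""
    counts = pair_counts(up)
    return sorted(candidates, key=lambda role: counts[frozenset(role)], reverse=True)
```

The candidates would normally come from the all-intersections step; a candidate that is never exactly the shared part of two users gets a count of zero and sinks to the end.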
Role Mining Algorithms: ORCA
Hierarchical clustering on permissions
Running time: O(m²n)
Start with a set of clusters of permissions
Find the pair of clusters for which the number of users having the permissions of both clusters is the largest, and merge it
Continue until only one cluster remains or no user has permissions in two clusters
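A simplified Python sketch of this clustering loop, assuming each permission starts in its own cluster and a merged pair is replaced by its union. Tie-breaking by sorted permission names is an arbitrary choice. The name `orca_merges` is hypothetical, and this naive version rescans all pairs on every step, so it does not attain the O(m²n) bound above.

```python
from itertools import combinations


def orca_merges(up):
    """Simplified ORCA-style clustering; returns (merged cluster, user support) per step."""
    user_sets = [frozenset(perms) for perms in up.values()]
    clusters = {frozenset([p]) for perms in user_sets for p in perms}
    merges = []
    while len(clusters) > 1:
        best, best_support = None, 0
        # Sorting makes tie-breaking deterministic (assumes comparable permission names).
        for a, b in combinations(sorted(clusters, key=sorted), 2):
            support = sum(1 for perms in user_sets if (a | b) <= perms)
            if support > best_support:
                best, best_support = (a, b), support
        if best is None:                  # no user holds the permissions of two clusters
            break
        a, b = best
        clusters -= {a, b}
        clusters.add(a | b)
        merges.append((a | b, best_support))
    return merges
```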
Role Mining Algorithms: HP Role Minimization (HPr)
Aims to find a minimal set of roles that covers the user-permission assignment relation
Running time: O(nm)
P(u): the permissions of user u; U(u): all users that have all the permissions of u
Select a user u and form the pair <U(u), P(u)>; this pair forms a «role»
All user-permission assignments between U(u) and P(u) are removed
The next user selected is the one with the fewest uncovered permissions
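The greedy loop above can be sketched in Python as follows. It assumes the first user is also chosen by the fewest-uncovered-permissions rule and that ties are broken by user name; `hp_role_minimization` is a hypothetical name, and no attempt is made to match the stated O(nm) bound.

```python
def hp_role_minimization(up):
    """Greedy HPr-style sketch; returns roles as (U(u), P(u)) pairs."""
    perms_of = {u: frozenset(ps) for u, ps in up.items()}
    uncovered = {(u, p) for u, ps in perms_of.items() for p in ps}
    roles = []
    while uncovered:
        # Count the uncovered permissions of every user who still has some.
        remaining = {}
        for u, _ in uncovered:
            remaining[u] = remaining.get(u, 0) + 1
        # Pick the user with the fewest uncovered permissions (ties broken by name).
        u = min(remaining, key=lambda v: (remaining[v], str(v)))
        p_u = perms_of[u]                                               # P(u)
        u_u = frozenset(v for v, ps in perms_of.items() if p_u <= ps)  # U(u)
        roles.append((u_u, p_u))
        # Every assignment between U(u) and P(u) is now covered by this role.
        uncovered -= {(v, p) for v in u_u for p in p_u}
    return roles
```

Since u always belongs to U(u), each iteration covers at least one remaining assignment of u, so the loop terminates.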
Role Mining Algorithms: HP Edge Minimization (HPe)
Finds an RBAC state with a minimal number of edges, called edge concentration
Similar to the Graph Optimization algorithm, except that it does not create a
role hierarchy
Running time: O(k²m), where k is the number of iterations
Start from the HPr result and greedily improve the objective function until it converges
If two roles overlap in their permission or user sets, restructure them
Role Mining Algorithms: HierarchicalMiner (HM)
Concept: <P, U> such that
U contains all the users that have all permissions in P, and
P contains all the permissions that are shared by all users in U
Similar to Graph Optimization, but uses a concept lattice
Start from the reduced family of concepts
Remove a role if doing so improves the RBAC state, and continue heuristically
Removing a role redistributes its users down the hierarchy and its permissions up the hierarchy
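The concept definition above can be checked directly in Python. This sketch only enumerates the concepts with a non-empty user set, relying on the fact that each such P is an intersection of user permission sets. It does not implement HM's lattice reduction or role removal, and the helper names are hypothetical.

```python
def users_with(perms, up):
    """U(P): every user holding all permissions in P."""
    return frozenset(u for u, ps in up.items() if perms <= ps)


def shared_perms(users, up):
    """P(U): permissions held by every user in a non-empty user set U."""
    return frozenset.intersection(*(frozenset(up[u]) for u in users))


def concepts(up):
    """All concepts <P, U> with a non-empty user set U."""
    intents = {frozenset(ps) for ps in up.values()}
    frontier = set(intents)
    while frontier:
        # Close the user permission sets under intersection.
        new = {a & b for a in frontier for b in intents} - intents
        intents |= new
        frontier = new
    return {(p, users_with(p, up)) for p in intents}
```

Each returned pair satisfies both conditions: its user set is exactly U(P), and the permissions shared by those users are exactly P.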
Evaluation Results
For each dataset, the algorithms are ranked from 1 to N
according to their ability to optimize the evaluation criteria
Two metrics mentioned before:
Comparing Complexity of the RBAC States
Comparing Prioritized Role Quality
Evaluation Results: Comparing Complexity of the RBAC States
Role Minimization
Evaluation Results: Comparing Complexity of the RBAC States
Edge Concentration
HM has an advantage in this test because its roles are designed to form a role hierarchy
Evaluation Results: Comparing Complexity of the RBAC States
Allowing Noise in Direct Assignments
The dataset contains errors that should not be covered by roles.
Evaluation Results: Comparing Complexity of the RBAC States
Discovering Original Roles
Similarity of the mined roles to the original roles
The metric used is the average maximal Jaccard similarity
HM: the top 40+ roles are more or less the originally generated ones
PC: performed the worst, generating roles farthest from the original data
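A short Python sketch of an average maximal Jaccard score. The direction is an assumption: each original role is matched to its most similar mined role, and those best scores are averaged. Both lists are assumed non-empty, and the function names are hypothetical.

```python
def jaccard(a, b):
    """|a & b| / |a | b| for two permission sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def avg_max_jaccard(reference, mined):
    """For each reference role, take its best Jaccard match among the mined roles, then average."""
    return sum(max(jaccard(r, m) for m in mined) for r in reference) / len(reference)
```

A score of 1.0 means every original role reappears exactly among the mined roles. Lower scores mean the mined roles drift further from the generated ones.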
Evaluation Results: Comparing Prioritized Role Quality
Quality of WSC using only the top k roles
Evaluation Results: Comparing Prioritized Role Quality
Quality of Coverage
How quickly does each algorithm's list of top-ranked roles cover the UP relation?
Analysis
Algorithms that minimize the number of roles often generate RBAC states
with a larger number of edges, resulting in increased complexity.
GO generates large role hierarchies when the number of users is greater
than the number of permissions.
DM overfits some of the roles to cover users and does not consider
the entire resulting RBAC state.
HM is computationally and memory intensive.
Conclusion
Aim of the study
Comprehensive study to compare role mining algorithms
What is presented?
Two new methods for generating datasets
Analysis of nine role mining algorithms
Future Work
Handling data with attribute information
In addition to the user-permission data, attribute
information may also be available.
Handling noisy data
In some scenarios, the input user-permission data
may contain noise.