Symantec Data Insight Programmer's Reference Guide
4.0
June 2013
Symantec Proprietary and Confidential
Symantec Data Insight Programmer's Reference Guide
The software described in this book is furnished under a license agreement and may be used only in accordance with the terms of the agreement.
4.0
Documentation version: 4.0.0
Legal Notice
Copyright © 2013 Symantec Corporation. All rights reserved.
Symantec, the Symantec Logo, and the Checkmark Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.
This Symantec product may contain third party software for which Symantec is required to provide attribution to the third party (“Third Party Programs”). Some of the Third Party Programs are available under open source or free software licenses. The License Agreement accompanying the Software does not alter any rights or obligations you may have under those open source or free software licenses. Please see the Third Party Legal Notice Appendix to this Documentation or TPIP ReadMe File accompanying this Symantec product for more information on the Third Party Programs.
The product described in this document is distributed under licenses restricting its use, copying, distribution, and decompilation/reverse engineering. No part of this document may be reproduced in any form by any means without prior written authorization of Symantec Corporation and its licensors, if any.
THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED IN THIS DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.
The Licensed Software and Documentation are deemed to be commercial computer software as defined in FAR 12.212 and subject to restricted rights as defined in FAR Section 52.227-19 "Commercial Computer Software - Restricted Rights" and DFARS 227.7202, "Rights in Commercial Computer Software or Commercial Computer Software Documentation", as applicable, and any successor regulations. Any use, modification, reproduction release, performance, display or disclosure of the Licensed Software and Documentation by the U.S. Government shall be solely in accordance with the terms of this Agreement.
Symantec Corporation
350 Ellis Street
Mountain View, CA 94043
http://www.symantec.com
Technical Support
Symantec Technical Support maintains support centers globally. Technical Support’s primary role is to respond to specific queries about product features and functionality. The Technical Support group also creates content for our online Knowledge Base. The Technical Support group works collaboratively with the other functional areas within Symantec to answer your questions in a timely fashion. For example, the Technical Support group works with Product Engineering and Symantec Security Response to provide alerting services and virus definition updates.
Symantec’s support offerings include the following:
■ A range of support options that give you the flexibility to select the right amount of service for any size organization
■ Telephone and/or Web-based support that provides rapid response and up-to-the-minute information
■ Upgrade assurance that delivers software upgrades
■ Global support purchased on a regional business hours or 24 hours a day, 7 days a week basis
■ Premium service offerings that include Account Management Services
For information about Symantec’s support offerings, you can visit our website at the following URL:
www.symantec.com/business/support/
All support services will be delivered in accordance with your support agreement and the then-current enterprise technical support policy.
Contacting Technical Support
Customers with a current support agreement may access Technical Support information at the following URL:
www.symantec.com/business/support/
Before contacting Technical Support, make sure you have satisfied the system requirements that are listed in your product documentation. Also, you should be at the computer on which the problem occurred, in case it is necessary to replicate the problem.
When you contact Technical Support, please have the following information available:
■ Product release level
■ Hardware information
■ Available memory, disk space, and NIC information
■ Operating system
■ Version and patch level
■ Network topology
■ Router, gateway, and IP address information
■ Problem description:
■ Error messages and log files
■ Troubleshooting that was performed before contacting Symantec
■ Recent software configuration changes and network changes
Licensing and registration
If your Symantec product requires registration or a license key, access our technical support Web page at the following URL:
www.symantec.com/business/support/
Customer service
Customer service information is available at the following URL:
www.symantec.com/business/support/
Customer Service is available to assist with non-technical questions, such as the following types of issues:
■ Questions regarding product licensing or serialization
■ Product registration updates, such as address or name changes
■ General product information (features, language availability, local dealers)
■ Latest information about product updates and upgrades
■ Information about upgrade assurance and support contracts
■ Information about the Symantec Buying Programs
■ Advice about Symantec's technical support options
■ Nontechnical presales questions
■ Issues that are related to CD-ROMs, DVDs, or manuals
Support agreement resources
If you want to contact Symantec regarding an existing support agreement, please contact the support agreement administration team for your region as follows:
[email protected] and Japan
[email protected], Middle-East, and Africa
[email protected] America and Latin America
Contents
Technical Support
Chapter 1 About this guide
    How this guide is organized
Chapter 2 Data Insight Query Language (DQL)
    About Data Insight Query Language (DQL)
    DQL Objects/Tables
    About DQL Columns
        device Columns
        msu Columns
        user Columns
        groups Columns
        path Columns
        dfspath Columns
        owner Columns
        activity Columns
        permission Columns
        custodian Columns
    DQL Query Syntax
        FROM clause
        GET clause
        FORMAT clause
        IF clause
        USING clause
        HAVING clause
        GROUPBY clause
        SORTBY clause
        LIMIT clause
    DQL functions
    Example DQL queries
Chapter 3 Web API Specification
    Web API specification for generic Collector service
Chapter 4 Creating custom scripts for remediation actions
    About custom scripts
Chapter 5 Data Inventory Report schema
    Data Inventory report schema
        file_inventory table
        lob table
        user_lob table
        user_totals table
        user_interval_totals table
        lob_totals table
        lob_interval_totals table
        intervals table
        msu_info table
        dashboard_info table
        Report configuration parameters
Chapter 1 About this guide
This chapter includes the following topics:
■ How this guide is organized
How this guide is organized
This document contains a general description of the content and usage of the Data Insight Software Developer’s Kit (SDK). Each chapter introduces and discusses a Data Insight feature, its possible uses, and a description of how to use the application programming interface for custom operations. The SDK contains specific programming examples using these interfaces.
This guide provides an overview of the following Data Insight features that are accessible with the SDK:
■ Data Insight Query Language (DQL) - Use DQL to create queries for the purpose of creating customized reports. See “About Data Insight Query Language (DQL)”.
■ The generic device web API - Use the API to extend platform support for the storage devices that Data Insight monitors. See “Web API specification for generic Collector service”. For information about configuring a generic device in Data Insight and the credentials required to monitor the device, see the Symantec Data Insight Administrator's Guide.
■ Custom scripts - Create scripts to define specific actions to handle remediation. See “About custom scripts”. To configure Data Insight to invoke these scripts to complete the custom actions, see the Symantec Data Insight Administrator's Guide.
■ Schema of the Data Inventory Report.
Chapter 2 Data Insight Query Language (DQL)
This chapter includes the following topics:
■ About Data Insight Query Language (DQL)
■ DQL Objects/Tables
■ About DQL Columns
■ DQL Query Syntax
■ DQL functions
■ Example DQL queries
About Data Insight Query Language (DQL)
Data Insight Query Language (DQL) is a structured language for retrieving the information that is stored in the Data Insight indices. Indices are the proprietary internal data stores that Data Insight uses for storing information. DQL does not provide the full functional capability of SQL, but it is expressive enough to let users easily extract, group, sort, and aggregate data.
DQL is a query-only language. You cannot use DQL to modify the Data Insight indices. DQL queries are also protected by role-based access control, which means that you can only see the information that you have access to.
DQL Objects/Tables
With DQL, you can run a query on objects and retrieve other objects as results. If you are familiar with the SQL language, an object in DQL is similar to a table in SQL. The attributes of an object are similar to the columns of the table. The output of a DQL query is a relational database table with attribute values as column values.
The complete list of DQL tables, with a brief description of each, is as follows:

Table         Description
device        Describes the details of the storage devices or content repository servers that Data Insight monitors. For example, a NetApp or EMC filer, a Windows File Server, or a SharePoint web application.
msu           Describes the details of the Data Insight storage units. An msu is a unit of storage space, which can be a file share (in case of CIFS or NFS) or a site-collection (in case of SharePoint).
path          Describes the details of the file or directory paths to individual msus.
dfspath       Describes the details of the DFS file or directory paths to individual msus.
owner         Describes the details of the computed owners of file or directory paths.
user          Describes the details of the users that are listed in directory services such as Active Directory, LDAP, or NIS+ directory server.
groups        Describes the details of the groups that are listed in directory services such as Active Directory, LDAP, or NIS+.
activity      Describes the details of the activity events on specific paths that are made by specific users at specific times. For example, an activity object can describe the following: file \\netapp1\mydocs\MarketResearch.doc was read by user John Smith at 1334123700 (Wed, 11th April 2012 05:55:00 GMT).
permission    Describes the details of the NTFS or UNIX permissions that are set on directory or file paths.
custodian     Describes the details of the custodians that are assigned to devices, msus, directories, or files.
In the list above, the owner object differs from the rest of the objects: it is a computed object. Owner objects are not first-class objects that are stored in the Data Insight indices. They are computed at run time, depending on the method that is used to calculate file ownership.
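The activity example in the table list gives its time as an integer, 1334123700, and the column tables later describe such timestamps as seconds elapsed since midnight UTC, January 1st, 1970. The following Python sketch shows one way to turn such a value into a readable UTC time after it has been exported from a DQL result. The function name epoch_to_utc is an illustrative assumption and is not part of the SDK.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> datetime:
    """Convert seconds since midnight UTC, January 1st, 1970 to an aware datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The value used in the activity example of the table list.
stamp = epoch_to_utc(1334123700)
print(stamp.strftime("%a, %d %B %Y %H:%M:%S GMT"))  # Wed, 11 April 2012 05:55:00 GMT
```

Keeping the conversion pinned to UTC avoids results that shift with the time zone of the machine that post-processes the report.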
About DQL Columns
Unlike a SQL table, whose columns can only contain a single value, a DQL table can have columns with multiple values. For example, the group Domain Users has multiple values for its column memberusers. A pair of square brackets around the column name is used to indicate that the column is multi-valued.
With DQL, you can have a table with columns that refer to other tables. For example, the table groups has a column memberusers which refers to rows from the user table. When you retrieve such reference columns, you need to specify which columns you want to retrieve from the referred table. For example, you cannot retrieve memberusers from groups without specifying which columns of the user table you are interested in. So, you can select memberusers.name or memberusers.sid, but not just memberusers.
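The rule for reference columns can be illustrated with a small Python check. This is only a sketch of the rule as stated above, not how Data Insight validates queries. The two dictionaries are a hand-written, partial copy of the groups and user column tables in this chapter, and the function name is_retrievable is hypothetical.

```python
# Partial, hand-written description of two DQL tables (an assumption for
# illustration; see the column tables in this chapter for the full lists).
REFERENCE_COLUMNS = {
    "groups": {"memberusers": "user", "membergroups": "groups", "memberof": "groups"},
}
PLAIN_COLUMNS = {
    "groups": {"sid", "name", "domain", "isdisabled", "isdeleted"},
    "user": {"sid", "name", "login", "domain"},
}


def is_retrievable(table: str, expression: str) -> bool:
    """Return True if the expression names a plain column, or a reference
    column followed by a plain column of the referred table."""
    head, _, rest = expression.partition(".")
    referred = REFERENCE_COLUMNS.get(table, {}).get(head)
    if referred is not None:
        # A reference column on its own is not enough: a column of the
        # referred table must follow the period. Only one level is checked.
        return rest in PLAIN_COLUMNS.get(referred, set())
    return rest == "" and head in PLAIN_COLUMNS.get(table, set())


print(is_retrievable("groups", "memberusers.name"))  # True
print(is_retrievable("groups", "memberusers"))       # False
```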
device Columns

Column                   Type                  Description
id                       Integer               Unique identifier for this device.
name                     String                Name of this device.
type                     String                Type of device (NetApp, Celerra, WinNAS, SharePoint).
collector                String                Name of Collector node.
indexer                  String                Name of Indexer node.
[custodians]             [Custodian Object]    List of custodians for this device.
capacity                 Integer               Storage capacity of this device.
used_space               Integer               The total amount of space that all files and folders on this device consume.
share_count              Integer               Number of shares on this device.
open_share_count         Integer               Number of shares on this device that are marked as open.
open_share_data_size     Integer               Total size of all open shares on this device.
open_share_file_count    Integer               Total file count of all open shares on this device.
file_count               Integer               Total file count of all shares on this device.
sensitive_file_count     Integer               Number of sensitive files across all shares on this device.
folder_count             Integer               Total folder count of all shares on this device.
activity_count           Integer               Total activity count on this device (the activity count is calculated for the last six months).
msu Columns

Column                   Type                  Description
id                       Integer               Unique identifier for this msu.
name                     String                Name of this msu.
type                     String                Type of msu (CIFS, NFSv3, SharePoint).
device                   Device Object         Device that this msu belongs to.
indexer                  String                Name of Indexer node.
indexdir                 String                Path to index directory.
[custodians]             [Custodian Object]    List of custodians for this msu.
[permissions]            [Permission Object]   List of permissions for this msu (share-level permissions).
isopen                   Integer               1 if msu is open, otherwise 0.
activity_count           Integer               Total activity count in the last six months.
active_user_count        Integer               Number of users who were active in the last six months.
last_activity_time       Integer               Time of last recorded activity on this msu.
size                     Integer               Total size of this msu.
active_data_size         Integer               Total size of all active files on this msu.
file_count               Integer               Number of files on this msu.
sensitive_file_count     Integer               Number of sensitive files on this msu.
folder_count             Integer               Number of folders on this msu.
most_active_user         User Object           User who is most active on this msu.
user Columns

Column                   Type                  Description
sid                      String                Unique identifier of the user.
name                     String                Full name of the user (e.g., John Smith).
login                    String                Login of the user.
domain                   String                Domain that the user belongs to.
firstname                String                First name of the user.
lastname                 String                Last name of the user.
isdisabled               Integer               1 if the user is disabled, 0 otherwise.
isdeleted                Integer               1 if the user is deleted from AD/LDAP, 0 otherwise.
buname                   String                Name of the business unit that this user belongs to.
buowner                  String                Owner of the business unit that this user belongs to.
[memberof]               [Group Object]        Groups of which this user is a member.
<custom-attr>            [String]              Custom attribute of the user. Replace <custom-attr> with the name of the custom attribute, for example, department. If the name contains special characters like -, *, %, ^, /, etc., enclose the name in quotes. For example, "E-mail".
groups Columns

Column                   Type                  Description
sid                      String                Unique identifier of this group.
name                     String                Name of this group.
domain                   String                Domain of this group.
isdisabled               Integer               1 if the group is disabled, 0 otherwise.
isdeleted                Integer               1 if the group is deleted, 0 otherwise.
[memberof]               [Group Object]        Groups of which this group is a member.
[memberusers]            [User Object]         Users who are members of this group.
[membergroups]           [Group Object]        Groups that are members of this group.
<custom-attr>            [String]              Custom attribute of the group. Replace <custom-attr> with the name of the custom attribute, for example, location. If the name contains special characters like -, *, %, ^, /, etc., enclose the name in quotes. For example, "E-mail".
path Columns

Column                   Type                  Description
name                     String                Name of path relative to the msu.
absname                  String                Absolute name of the path containing device and share names, for example, \\filer1\share100\a\b.
id                       Integer               Unique identifier for this path within the msu.
parent                   Path Object           Parent path of this path.
type                     String                DIR for directory, FILE for file.
device                   Device Object         The device to which this path belongs.
msu                      msu Object            The msu to which this path belongs.
size                     Integer               Size of path in bytes. For directories it is the size of all files under the entire subtree.
last_accessed            Integer               Timestamp of when this path was last accessed. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
last_modified            Integer               Timestamp of when this path was last modified.
created_on               Integer               Timestamp of when this path was created.
last_accessor            User Object           User who last accessed this path.
last_modifier            User Object           User who last modified this path.
creator                  User Object           User who created this path.
creator_group            Group Object          Group creator of this path.
owner                    Owner Object          Computed owner of this path.
isdeleted                Integer               1 if the path is deleted, 0 otherwise.
depth                    Integer               Depth of the path from the root of the share. For example, ‘/’ has a depth of 0, ‘/a’ has a depth of 1, ‘/a/b’ has a depth of 2.
activity_count           Integer               Total activity count on this path.
isopen                   Integer               1 if the path is open, 0 otherwise.
[open_reasons]           [String]              Reasons why the path is considered open.
[permissions]            [Permission Object]   List of permissions on this path.
[custodians]             [Custodian Object]    List of custodians for this path.
issensitive              Integer               1 if the path is sensitive, 0 otherwise.
[filegroups]             [String]              List of filegroups for this path.
extension                String                File extension for this path. For example, PST, DOC, etc.
[dfsnames]               [String]              List of DFS names for this path.
[permitted_users]        [User Object]         List of users who have permissions to access this path.
permitted_users_count    Integer               Number of users who have permissions to access this path.
[active_users]           [User Object]         List of users who are active on this path.
active_users_count       Integer               Number of users who are active on this path.
[inactive_users]         [User Object]         List of users who are inactive on this path.
inactive_users_count     Integer               Number of users who are inactive on this path.
[dlp_policies]           [String]              List of DLP policies violated by this path.
iscontrol_point          Integer               1 if the path is a control point, 0 otherwise.
[control_point_reasons]  [String]              Reasons why the path is considered a control point.
filesystem_owner         User Object           Owner specified by the NTFS file system.
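The depth column of the path table counts path components below the root of the share. The Python sketch below reproduces the three examples given for that column. It is an illustration of the definition only. The function name path_depth is hypothetical, and treating backslashes like forward slashes is an assumption made for convenience.

```python
def path_depth(relative_path: str) -> int:
    """Count the components of a path that is relative to the share root."""
    # Assumption: accept either separator, then ignore empty components
    # produced by leading, trailing, or doubled separators.
    parts = relative_path.replace("\\", "/").split("/")
    return sum(1 for part in parts if part)


for example in ("/", "/a", "/a/b"):
    print(example, path_depth(example))
```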
dfspath Columns

Column                   Type                  Description
name                     String                Name of DFS path relative to the msu.
absname                  String                Absolute name of the DFS path containing device and share names, for example, \\dfsfiler1\dfsshare100\a\b.
id                       Integer               Unique identifier for this path within the msu.
parent                   DFS Path Object       Parent DFS path of this DFS path.
physicalname             String                Absolute name of the physical path that this DFS path maps to.
type                     String                DIR for directory, FILE for file.
device                   Device Object         The device to which this DFS path belongs.
msu                      msu Object            The msu to which this DFS path belongs.
size                     Integer               Size of path in bytes. For directories it is the size of all files under the entire subtree.
last_accessed            Integer               Timestamp of when this path was last accessed. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
last_modified            Integer               Timestamp of when this path was last modified.
created_on               Integer               Timestamp of when this path was created.
last_accessor            User Object           User who last accessed this path.
last_modifier            User Object           User who last modified this path.
creator                  User Object           User who created this path.
creator_group            Group Object          Group creator of this path.
owner                    Owner Object          Computed owner of this path.
isdeleted                Integer               1 if the path is deleted, 0 otherwise.
depth                    Integer               Depth of the path from the root of the share. For example, ‘/’ has a depth of 0, ‘/a’ has a depth of 1, ‘/a/b’ has a depth of 2.
activity_count           Integer               Total activity count on this path.
isopen                   Integer               1 if the path is open, 0 otherwise.
[open_reasons]           [String]              Reasons why the path is considered open.
[permissions]            [Permission Object]   List of permissions on this path.
[custodians]             [Custodian Object]    List of custodians for this path.
issensitive              Integer               1 if the path is sensitive, 0 otherwise.
[filegroups]             [String]              List of filegroups for this path.
extension                String                File extension for this path. For example, PST, DOC, etc.
[permitted_users]        [User Object]         List of users who have permissions to access this path.
permitted_users_count    Integer               Number of users who have permissions to access this path.
[active_users]           [User Object]         List of users who are active on this path.
active_users_count       Integer               Number of users who are active on this path.
[inactive_users]         [User Object]         List of users who are inactive on this path.
inactive_users_count     Integer               Number of users who are inactive on this path.
[dlp_policies]           [String]              List of DLP policies violated by this path.
iscontrol_point          Integer               1 if the path is a control point, 0 otherwise.
[control_point_reasons]  [String]              Reasons why the path is considered a control point.
filesystem_owner         User Object           Owner specified by the NTFS file system.
owner Columns

Column                   Type                  Description
path                     Path Object           Path for which the owner is computed.
dfspath                  DFS Path Object       DFS path for which the owner is computed.
user                     User Object           The computed owner of the path.
read_count               Integer               Number of read accesses made by this user.
write_count              Integer               Number of write accesses made by this user.
method                   String                The method that was used to compute this owner. Possible values are creator, read_count, write_count, rw_count, last_accessor, last_modifier, and parent_owner.
activity Columns

Column                   Type                  Description
timestamp                Integer               Timestamp of the activity. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
timerange                Integer               Number of seconds since timestamp that this activity event might have happened.
user                     User Object           User who initiated this activity event.
path                     Path Object           Path on which this activity event occurred.
dfspath                  DFS Path Object       DFS path on which this activity event occurred.
opcode                   Integer               Integer representing the activity event.
operation                String                String notation of the activity event (e.g., read, write, create, delete, mkdir, rmdir, etc.).
count                    Integer               Number of times this operation was performed in the timerange.
ipaddr                   String                IP address from where the operation was performed.
rename_target            Path Object           For rename or move operations, the target path to which this path was renamed.
dfs_rename_target        DFS Path Object       For rename or move operations, the target DFS path to which this DFS path was renamed.
permission Columns

Column                   Type                  Description
object_type              String                Type of object on which this permission is set (msu, DIR).
path                     Path Object           Path on which the permission is set.
dfspath                  DFS Path Object       DFS path on which the permission is set.
msu                      msu Object            The msu on which the permission is set.
trustee_type             String                Type of trustee (user, group).
user_trustee             User Object           Trustee of this permission.
group_trustee            Group Object          Trustee of this permission.
permission_mask          Integer               Permission bitmask.
readable_permission      String                List of readable permissions: read, write, full control, etc.
type                     String                Type of permission (GRANT, DENY).
isinherited              Integer               1 if the permission is inherited from parent.
inheriting_type          String                Type of object from which this permission is inherited (msu, DIR).
inheriting_path          Path Object           Path from which this permission is inherited.
inheriting_dfspath       DFS Path Object       DFS path from which this permission is inherited.
inheriting_msu           msu Object            msu from which this permission is inherited.
appliesto                String                Inheritance settings for this permission (e.g., this folder, all subfolders, only immediate files).
custodian Columns

Column                   Type                  Description
path                     Path Object           Path on which the custodian is assigned.
dfspath                  DFS Path Object       DFS path on which the custodian is assigned.
msu                      msu Object            msu on which the custodian is assigned.
device                   Device Object         Device on which the custodian is assigned.
dfslink                  String                DFS link on which the custodian is assigned.
user                     User Object           The assigned custodian of the path.
isinherited              Integer               1 if the custodian is inherited from a parent (device, msu, dir, dfslink).
inheriting_type          String                Type of object from which the custodian is inherited (device, msu, dir, dfslink).
inheriting_path          Path Object           Path from which the custodian is inherited.
inheriting_dfspath       DFS Path Object       DFS path from which the custodian is inherited.
inheriting_msu           msu Object            msu from which the custodian is inherited.
inheriting_device        Device Object         Device from which the custodian is inherited.
inheriting_dfslink       String                DFS link from which the custodian is inherited.
DQL Query Syntax
The DQL query syntax and top-level grammatical constructs are as shown:
FROM <table>
GET <column expression> [AS alias], <column expression> [AS alias], ...
[IF <condition>]
[USING <definition>]
[FORMAT <column> AS (CSV|TABLE <tablename>) [<count>]]
[GROUPBY <column expression>, <column expression>, ...]
[HAVING <aggregate-condition>]
[SORTBY <column expression> [ASC|DESC]]
[LIMIT [<offset>,]<count>];
FROM clause
The FROM clause specifies the table from which DQL retrieves the data. DQL does not support joins as in SQL. You can only specify one table in the FROM clause.
GET clause
The GET clause specifies the columns (or expressions on columns) that you want to retrieve from the table that you specify in the FROM clause.
DQL tables can have columns that refer to other tables. For example, the table groups has a column memberusers which refers to rows from table user. When you retrieve such reference columns, you need to specify what columns you want to retrieve from the referred table. For example, you cannot retrieve memberusers from groups without specifying which columns of the user table you are interested in. So, you can select memberusers.name or memberusers.sid but not just memberusers.
The column names in the output table are decided by the expressions used in the GET clause. While displaying the output, DQL may optionally replace the period (.) with the underscore (_). For example, for GET path.name, the output column name in the SQLite database becomes path_name.
FORMAT clause
Data Insight tables can contain multi-valued columns. For example, path contains a multi-valued column permissions. When you specify the columns in the GET clause, you also need to specify the manner in which you want their values to appear in the output database table. Use the FORMAT clause to control the format of the output in case of multi-valued columns. You can use two formatting options as shown below:
FORMAT <column> AS CSV
The above syntax displays the output values for a multi-valued column as a comma-separated list in a single column.
FORMAT <column> AS TABLE <tablename>
The above syntax displays the output values for a multi-valued column in a separate table. Each row of this table contains a reference to its corresponding row in the parent table.
The default value for the FORMAT clause is TABLE. If you do not provide a FORMAT clause in your query, DQL displays the contents of the multi-valued columns in separate tables, and the name of the multi-valued column is used as the default name of the table. For example, if you want to retrieve path permissions and you do not specify the FORMAT clause, DQL displays the permissions of a path in a separate table called permissions.
Consider this example:
FROM groups
GET name, memberusers.sid, memberusers.name
FORMAT memberusers AS CSV
Since memberusers is a multi-valued column, the FORMAT clause on memberusers needs to be specified.
The above query creates an output table groups containing four columns: groups_rowid, name, memberusers_sid, and memberusers_name. The column groups_rowid is a default column present in all DQL output tables, containing an identification number for each row. The columns memberusers_sid and memberusers_name contain comma-separated lists of member user sids and names.
Example output table is as shown below:
groups
groups_rowid   name           memberusers_sid      memberusers_name
1              Domain Users   S-1,S-2,S-10,S-11    John,Jim,Paul,Steve
2              HR_Global      S-10,S-12            Paul,Jane
3              HR_US          S-10                 Paul
Suppose that you change the query to:
FROM groups
GET name, memberusers.sid, memberusers.name
FORMAT memberusers AS TABLE memberusers
In this case, the output database contains two tables: groups and memberusers. The groups table has two columns: groups_rowid and name. The memberusers table has three columns: groups_rowid, memberusers_sid, and memberusers_name. The groups_rowid column in the memberusers table is a reference to the groups_rowid column from the groups table.
Example output tables are as shown below:
groups
groups_rowid   name
1              Domain Users
2              HR_Global
3              HR_US
memberusers
groups_rowid   memberusers_sid   memberusers_name
1              S-1               John
1              S-2               Jim
1              S-10              Paul
1              S-11              Steve
2              S-10              Paul
2              S-12              Jane
3              S-10              Paul
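Because the output is a SQLite database, the two-table form can be recombined with an ordinary join on groups_rowid. The following is a minimal sketch in Python, assuming the output tables have exactly the columns shown above; it rebuilds the example tables in memory rather than opening a real DQL output file, and the helper name members_of is hypothetical.

```python
import sqlite3

# Rebuild the example output tables in an in-memory SQLite database.
# A real DQL output file would be opened with sqlite3.connect(<path>) instead.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE "groups" (groups_rowid INTEGER, name TEXT);
    CREATE TABLE memberusers (groups_rowid INTEGER,
                              memberusers_sid TEXT,
                              memberusers_name TEXT);
''')
conn.executemany('INSERT INTO "groups" VALUES (?, ?)',
                 [(1, "Domain Users"), (2, "HR_Global"), (3, "HR_US")])
conn.executemany("INSERT INTO memberusers VALUES (?, ?, ?)", [
    (1, "S-1", "John"), (1, "S-2", "Jim"), (1, "S-10", "Paul"),
    (1, "S-11", "Steve"), (2, "S-10", "Paul"), (2, "S-12", "Jane"),
    (3, "S-10", "Paul"),
])


def members_of(group_name):
    """Return (sid, name) pairs for one group by joining on groups_rowid."""
    rows = conn.execute(
        '''SELECT m.memberusers_sid, m.memberusers_name
           FROM "groups" AS g
           JOIN memberusers AS m ON m.groups_rowid = g.groups_rowid
           WHERE g.name = ?
           ORDER BY m.rowid''',
        (group_name,),
    )
    return rows.fetchall()
```

Calling members_of("HR_Global") on this sample data returns the two member rows for that group, which is the per-group detail that the CSV form packs into a single cell.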
By default, DQL lists all memberusers of a group. Optionally, you can limit the number of memberusers listed using the FORMAT clause. This is as shown in the following query:
FROM groups
GET name, memberusers.sid, memberusers.name
FORMAT memberusers AS CSV 4
This limits the output table to a maximum of four member user values for each group. These four values are the first four members of the list.
Nested multi-valued columns
There may be situations where you need to specify nested multi-valued columns. For example, the path table has a multi-valued column active_users, which is a reference to the user table. The table user, in turn, has a multi-valued column memberof which indicates the groups that a user belongs to. If you want to get all active users for a path and the groups that each active user belongs to, write your query as shown below.
FROM path
GET name, active_users.name, active_users.memberof.name
FORMAT active_users AS CSV AND
active_users.memberof AS CSV;
In this query’s output table, the third column active_users_memberof_name lists all the groups of all the path’s active users. For example, suppose that path /foo has active users Joe and Jane. Suppose that Joe belongs to groups HR and ALL-Employees, while Jane belongs to groups Finance and ALL-Employees. The output column for this query will then be HR, ALL-Employees, Finance, ALL-Employees.
Notice that you have a flat list of all group names in this column. You have lost information about what groups each of the active users belongs to. You only know that there is one active user who belongs to HR, two who belong to ALL-Employees and one who belongs to Finance.
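The flattening described above can be illustrated outside DQL. The following Python sketch uses the Joe and Jane memberships from the example; the function name flatten_groups and the dictionary layout are assumptions made for illustration and are not part of DQL.

```python
from collections import Counter

# Group memberships from the /foo example in the text.
memberof = {
    "Joe": ["HR", "ALL-Employees"],
    "Jane": ["Finance", "ALL-Employees"],
}
active_users = ["Joe", "Jane"]


def flatten_groups(users, membership):
    """Concatenate every user's groups into one flat list, in user order."""
    return [group for user in users for group in membership[user]]


flat = flatten_groups(active_users, memberof)

# Only per-group totals survive; which user owns which entry is lost.
counts = Counter(flat)
```

The flat list keeps duplicates, so counts can still tell you how many active users belong to each group, but not which ones.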
IF clause
The IF clause is an optional clause that you can use to specify a set of conditions on the rows that you want to retrieve. It is similar to the WHERE clause of SQL. DQL retrieves only those rows whose columns satisfy the condition(s) provided under the IF clause.
Operators
DQL supports the following binary operators that you can use to specify a condition:
■ Comparison operators: >, <, >=, <=, =, ==, !=, <>
■ Logical operators: AND, &&, OR, ||
■ Arithmetic operators: +, -, *, /, %
■ List containment operators: IN, NOT IN
Constants
DQL’s IF clause supports specification of constants in operations. Constants can be either numeric or string. Some examples of supported column-related operations are as shown below:
■ IF size/1024 > 10
■ IF size = 10
■ IF name IN ("John", "Joe")
Note that string comparisons are case insensitive by default. To specify case sensitive or case insensitive comparisons, you can use the CASE SENSITIVE and CASE INSENSITIVE keywords.
■ IF name IN ("John", "Joe") CASE SENSITIVE
■ IF name = "John" CASE INSENSITIVE
Conditions on multi-valued columns
You can use EACH or ANY prefixes to specify the conditions on multi-valued columns. EACH specifies that each value of the multi-valued column should satisfy the condition while ANY specifies that any value of the multi-valued column should satisfy the condition.
Suppose that you want to retrieve only those paths on which the user John is active. You can write a query as shown below.
FROM path
GET name, active_users.name
IF ANY active_users.name = "John"
FORMAT active_users AS CSV;
Suppose that you want to retrieve paths on which either John or Joe is active. You can write a query (query a) as shown below.
FROM path
GET name, active_users.name
IF ANY active_users.name IN ("John","Joe")
FORMAT active_users AS CSV;
The above query retrieves the paths on which either John is one of the active users and/or Joe is one of the active users.
Suppose that you want to retrieve the paths that only have John and Joe as active users. You can write a query (query b) as shown below.
FROM path
GET name, active_users.name
IF EACH active_users.name IN ("John","Joe")
FORMAT active_users AS CSV;
The above query retrieves paths where the only active users are John and/or Joe.
Note that in query (a), you get the paths on which John or Joe is one of the active users whereas in query (b), you get the paths on which John and/or Joe are the only active users.
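The difference between query (a) and query (b) corresponds to "at least one value matches" versus "every value matches". The following Python sketch mirrors that distinction on hypothetical sample data; the path names and helper functions are assumptions, and the sketch compares names exactly rather than applying DQL's default case-insensitive comparison. How DQL treats a path with no active users under EACH is not stated here, so the sample data avoids empty lists.

```python
# Hypothetical sample data: path name -> names of its active users.
paths = {
    "/a": ["John", "Jane"],
    "/b": ["John", "Joe"],
    "/c": ["Joe"],
    "/d": ["Jane"],
}


def select_any(rows, allowed):
    """Like IF ANY col IN (...): keep rows where at least one value matches."""
    return [name for name, users in rows.items()
            if any(user in allowed for user in users)]


def select_each(rows, allowed):
    """Like IF EACH col IN (...): keep rows where every value matches."""
    return [name for name, users in rows.items()
            if all(user in allowed for user in users)]


query_a = select_any(paths, {"John", "Joe"})
query_b = select_each(paths, {"John", "Joe"})
```

Here query_a keeps /a even though Jane is also active on it, while query_b drops /a because Jane is outside the allowed set.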
Conditions on nested multi-valued columns
Since nested multi-valued columns evaluate to a flat list, you can specify conditions on them using the ANY and EACH constructs as above.
For example, suppose that you want to retrieve those paths containing at least one active user belonging to group HR. You can write a query as shown below.
FROM path
GET name, active_users.memberof.name
IF ANY active_users.memberof.name = "HR"
FORMAT active_users.memberof AS CSV;
Suppose that you want to retrieve those paths containing active users who belong only to groups HR and/or FINANCE. You can write a query as shown below.
FROM path
GET name, active_users.memberof.name
IF EACH active_users.memberof.name IN ("HR", "FINANCE")
FORMAT active_users.memberof AS CSV;
Note that DQL by default uses the ANY construct if you do not specify an ANY/EACH construct.
USING clause
Values of certain columns like owner are computed at run-time based on some criteria. For example, to compute an owner of a file, you need to specify what methods (like read_count, rw_count, parent_owner etc.) you want to use to determine the owner. When you determine active users of a path, you need to specify the time range you want to consider for the activity.
You can use the USING clause to specify such functions that can be applied to obtain a column value.
The details of the DQL USING functions are as shown below.
Calculating the owner
calc_owner(start_time TEXT, end_time TEXT, date_format TEXT,
ordered_list_of_owner_methods TEXT)
Example usage in query:
FROM path
GET name, owner.user.name, owner.method, owner.read_count
USING owner AS calc_owner("2012-01-01", "2012-06-01", "YYYY-MM-DD",
"rw_count, read_count, last_accessor");
If you don’t specify a USING function for owner, DQL uses a default time range of the last 6 months and uses a data owner ordering of rw_count, write_count, read_count, last_modifier, last_accessor, creator, parent_owner.
Calculating the active_users
calc_active_users(start_time TEXT, end_time TEXT, date_format TEXT)
Example usage in query:
FROM path
GET name, active_users.name
USING active_users AS
calc_active_users("2012-01-01", "2012-06-01", "YYYY-MM-DD")
FORMAT active_users AS CSV;
If you don’t specify a USING function for active_users, DQL uses a default time range of the last 6 months.
Calculating the active_users_count
get_active_users_count(start_time TEXT, end_time TEXT, date_format TEXT)
Example usage in query:
FROM path
GET name, active_users_count
USING active_users_count AS
get_active_users_count("2012-01-01", "2012-06-01", "YYYY-MM-DD");
If you don’t specify a USING function for active_users_count, DQL uses a default time range of the last 6 months.
Calculating the inactive_users
calc_inactive_users(start_time TEXT, end_time TEXT, date_format TEXT)
Example usage in query:
FROM path
GET name, inactive_users.name
USING inactive_users AS
calc_inactive_users("2012-01-01", "2012-06-01", "YYYY-MM-DD")
FORMAT inactive_users AS CSV;
If you don’t specify a USING function for inactive_users, DQL uses a default time range of the last 6 months for calculating inactivity.
Calculating the inactive_users_count
get_inactive_users_count(start_time TEXT, end_time TEXT, date_format TEXT)
Example usage in query:
FROM path
GET name, inactive_users_count
USING inactive_users_count AS
get_inactive_users_count("2012-01-01", "2012-06-01", "YYYY-MM-DD");
If you don’t specify a USING function for inactive_users_count, DQL uses a default time range of the last 6 months for calculating inactivity.
Calculating the activity_count
get_activity_count(start_time TEXT, end_time TEXT, date_format TEXT)
Example usage in query:
FROM path
GET name, activity_count
USING activity_count AS
get_activity_count("2012-01-01 10:00", "2012-01-01 15:00",
"YYYY-MM-DD HH:mm");
If you don’t specify a USING function for activity_count, DQL uses a default time range of the last 6 months for calculating activity.
HAVING clause
The HAVING clause is similar to the SQL HAVING clause and allows specification of conditions on aggregate functions. The syntax of conditions that can be specified in the HAVING clause is the same as that of the DQL IF clause.
Suppose that you want to retrieve the sum of the sizes of all shares for each filer. You can write a query for this as shown below:
FROM msu
GET filer.name, sum(size)
GROUPBY filer.name;
Now suppose that you want to select only those filers whose sum of share sizes is greater than 1 GB (1,073,741,824 bytes). Then you need to modify the previous query as:
FROM msu
GET filer.name, sum(size)
GROUPBY filer.name
HAVING sum(size) > 1073741824;
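Since the text compares HAVING and GROUPBY with their SQL counterparts, the same filter can be tried in plain SQL. The following Python sketch runs the SQL analogue against a hypothetical msu table in an in-memory SQLite database; the table layout, the column name filer_name, and the sample sizes are assumptions for illustration and do not describe how Data Insight stores shares.

```python
import sqlite3

ONE_GB = 1073741824  # threshold from the text, in bytes

# Hypothetical share sizes: filer1 has two 600 MB shares, filer2 one 100 MB share.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msu (filer_name TEXT, name TEXT, size INTEGER)")
conn.executemany("INSERT INTO msu VALUES (?, ?, ?)", [
    ("filer1", "share1", 629145600),
    ("filer1", "share2", 629145600),
    ("filer2", "share3", 104857600),
])


def filers_over(threshold):
    """SQL analogue of GROUPBY filer.name HAVING sum(size) > threshold."""
    rows = conn.execute(
        """SELECT filer_name, SUM(size) AS total
           FROM msu
           GROUP BY filer_name
           HAVING SUM(size) > ?
           ORDER BY filer_name""",
        (threshold,),
    )
    return rows.fetchall()
```

With this data, only filer1 passes the 1 GB threshold because the condition is applied to each group's total, not to individual shares.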
GROUPBY clause
The GROUPBY clause is similar to the SQL GROUP BY clause. It enables you to aggregate the output rows into groups. Suppose that you want to retrieve the sum of the sizes of all shares for each filer. You can write a query for this as shown below.
FROM msu
GET filer.name, sum(size)
GROUPBY filer.name;
DQL supports the following aggregation functions:
■ sum
■ count
■ max
■ min
SORTBY clause
The SORTBY clause is similar to the SQL ORDER BY clause. It enables you to sort the rows of the output table based on their column values.
FROM msu
GET name, size
SORTBY size DESC;
If no sort order is specified, DQL defaults to ASC.
LIMIT clause
The LIMIT clause is similar to the SQL LIMIT clause and is used to limit the number of output rows.
LIMIT count [This will retrieve the first "count" rows]
LIMIT offset, count [This will retrieve "count" rows starting from "offset"]
offset values start from 1.
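The 1-based offset is easy to get wrong when post-processing results in a language with 0-based indexing. The following Python sketch shows the arithmetic, assuming that offset 1 refers to the first output row; the function name apply_limit is hypothetical.

```python
def apply_limit(rows, count, offset=1):
    """Return `count` rows starting at the 1-based position `offset`."""
    if offset < 1:
        raise ValueError("offset values start from 1")
    if count < 0:
        raise ValueError("count must not be negative")
    start = offset - 1  # convert the 1-based offset to a 0-based index
    return rows[start:start + count]


rows = ["r1", "r2", "r3", "r4", "r5", "r6"]
first_two = apply_limit(rows, 2)           # like LIMIT 2
from_third = apply_limit(rows, 2, offset=3)  # like LIMIT 3, 2
```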
DQL functions
DQL supports the following built-in functions:
upper(X)
    Converts string X to uppercase.
lower(X)
    Converts string X to lowercase.
strlen(X)
    Returns length of string X.
length(X)
    Returns number of items in list X.
substr(X, Y)
    Returns true if Y is a substring of X. The comparison is case-sensitive.
substri(X, Y)
    Returns true if Y is a substring of X. The comparison is case insensitive.
match(X, P)
    Returns true if X matches the regular expression pattern P. Regular expression matching is case-sensitive. Pattern P can be specified as patterns matching a single character or patterns matching multiple characters.
    You can refer to the following URLs for information on pattern matching:
    ■ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01
    ■ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_02
matchi(X, P)
    Returns true if X matches the regular expression pattern P. Regular expression matching is case insensitive. Pattern P can be specified as patterns matching a single character or patterns matching multiple characters.
    You can refer to the following URLs for information on pattern matching:
    ■ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01
    ■ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_02
datetime(D, F)
    Returns time in epoch for the string date D. The format in which date D is specified is indicated by the format string F. The options for F are:
    YYYY - 4 digit year
    MM - month of year (01 - 12)
    DD - date of month
    HH - hour (00 - 24)
    mm - minutes (00 - 59)
    ss - seconds (00 - 59)
    Z - timezone
    Example: datetime("2012-01-10 -0800", "YYYY-MM-DDZ")
formatdate(T, F)
    Converts time T in epoch to a string whose format is specified with string F. The options for F are the same as those used by datetime(D, F).
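When you post-process epoch values outside DQL, it helps to apply the same format tokens. The following Python sketch maps the F tokens listed above onto strftime directives and converts in both directions. It is an approximation under stated assumptions: the function names are hypothetical, input without a timezone is treated as UTC, and hour values are limited to what Python's %H accepts (00 to 23).

```python
import re
from datetime import datetime, timezone

# DQL format tokens (from the list above) mapped to strftime directives.
_TOKENS = {"YYYY": "%Y", "MM": "%m", "DD": "%d",
           "HH": "%H", "mm": "%M", "ss": "%S", "Z": "%z"}
_TOKEN_RE = re.compile("YYYY|MM|DD|HH|mm|ss|Z")


def to_strftime(dql_format):
    """Translate a DQL format string into a strftime/strptime pattern."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], dql_format)


def to_epoch(date_string, dql_format):
    """Rough analogue of datetime(D, F): parse a date and return epoch seconds."""
    parsed = datetime.strptime(date_string, to_strftime(dql_format))
    if parsed.tzinfo is None:
        # Assumption: input without a timezone is interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def from_epoch(epoch_seconds, dql_format):
    """Rough analogue of formatdate(T, F): render epoch seconds in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(to_strftime(dql_format))
```

A single regular-expression pass is used for the token translation so that an already substituted directive such as %M is never rewritten by a later token.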
Example DQL queries
■ Get the name, size, active data size, percentage of data size that is active, openness, and number of active users for each share
FROM msu
GET name, size, active_data_size,
(active_data_size*100/size) AS active_data_percent,
isopen, active_user_count;
■ Get the activity for all paths of share, share1, on March 4, 2012 between 9:00 A.M. and 5:00 P.M.
FROM activity
GET path.name, user.name, operation,
formatdate(timestamp, "YYYY/MM/DD HH:mm")
IF path.msu.name = "share1" AND
timestamp >= datetime("2012/03/04 09:00", "YYYY/MM/DD HH:mm")
AND timestamp <= datetime("2012/03/04 17:00",
"YYYY/MM/DD HH:mm");
Since the timestamp column of activity is epoch, convert it to a readable format using formatdate().
■ Get a list of all sensitive files from all shares of filer, filer1, sorted by size.
FROM path
GET name, issensitive, size
IF issensitive = 1 AND type = "FILE" AND device.name = "filer1"
SORTBY size DESC;
■ Get a list of all open paths and the reason why they are marked as open.
FROM path
GET name, msu.name, isopen, open_reasons
IF isopen = 1
FORMAT open_reasons AS CSV;
■ Get a list of all open paths and the reason why they are marked as open. Also, list the permissions on each open path.
FROM path
GET name, msu.name, isopen, open_reasons,
permissions.user_trustee.name, permissions.group_trustee.name,
permissions.readable_permission, permissions.isinherited,
permissions.inheriting_path.name
IF isopen = 1
FORMAT permissions AS TABLE permissions
AND open_reasons AS CSV;
■ Get a list of all users, their e-mail and department (custom attributes) and the groups that they belong to.
FROM user
GET name, sid, login, domain, "E-mail", department,
memberof.sid, memberof.name
FORMAT memberof AS TABLE memberof_groups;
■ Get a list of all directories and their owners.
FROM path
GET name, msu.name, owner.user.name, owner.method,
owner.read_count, owner.write_count
IF type = "DIR"
USING owner AS calc_owner("2012-01-01", "2012-06-01",
"YYYY-MM-DD","rw_count, last_modifier");
■ Get a list of all open paths and their inactive users.
FROM path
GET name, msu.name, isopen, inactive_users.name
IF isopen = 1
USING inactive_users AS calc_inactive_users("2012-01-01",
"2012-06-01","YYYY-MM-DD");
■ For each share, get the count of paths that have permissions set on Everyone
FROM permissions
GET msu.name, count(path.id) AS risk_path_count
IF object_type = "DIR" AND group_trustee.name = "Everyone"
AND isinherited = 0
GROUPBY msu.name
SORTBY risk_path_count DESC;
The condition isinherited = 0 ensures that we only get the paths that have permissions explicitly defined on Everyone and do not include all paths that simply inherit those permissions.
Web API Specification
This chapter includes the following topics:
■ Web API specification for generic Collector service
Web API specification for generic Collector service
The web API for the Data Insight generic collector allows web clients to push events for the generic device filers configured in the Data Insight deployment. It also provides a method to add shares for the configured filers.
The web client communicates with the Data Insight Collector node using HTTPS requests. The HTTPS communication is based on one-way SSL authentication. The HTTP server runs with its unique self-signed SSL certificate. The SSL certificate is created on the server when the DataInsightGenericCollector service is configured on it. The authentication is complete when the Data Insight Collector node verifies the identity of the web client.
Data Insight Collector node uses the following mechanism to communicate withthe web client:
1. The Data Insight server identifies the client using a login API request.
2. On successful log in, the Data Insight server returns an authentication token as the response. The same token is inserted into an HTTP cookie called MATRIX_AUTH which is valid for 30 minutes. If the log in attempt is unsuccessful, an HTTP response code 401 is returned.
3. You must include the authentication token in each subsequent request to the Data Insight server either in an HTTP request header called MATRIX_AUTH, or in a cookie with the same name, or as an HTTP request input parameter with the same name.
4. Each token has an inactivity timeout interval of 30 minutes. The token expires if the client does not send a request for 30 minutes. In case the Data Insight server restarts, the client must obtain the authentication token by using the login API. Data Insight uses the standard HTTP status code 401 to convey that login is required. Data Insight returns the HTTP status code 401 (Unauthorized), if the client does not have the correct privileges.
5. The user principal against which log in is performed can be any valid Data Insight user with the Server Administrator role.
All URLs referenced in the documentation have the following base:
https://<hostname>:<port>/api
where <port> is the port number for the DataInsightGenericCollector service. The default value for port is 8585, and the port number is configurable through the Data Insight Management Console.
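The token mechanism and base URL can be sketched as a few client-side helpers. The following Python sketch only builds the login URL, extracts the token from a JSON login response, and prepares the header for later requests; it does not open a network connection. It assumes that the LOGIN parameters (username, domain, password, format) are sent as URL query parameters, since the login call takes no request body; the helper names and the host collector.example.com used in the usage note are hypothetical.

```python
import json
from urllib.parse import urlencode

DEFAULT_PORT = 8585          # default port stated in the text
TOKEN_NAME = "MATRIX_AUTH"   # header, cookie, or parameter name for the token


def login_url(hostname, username, domain, password, port=DEFAULT_PORT):
    """Build the LOGIN URL. Assumption: parameters travel in the query string."""
    query = urlencode({
        "function": "LOGIN",
        "username": username,
        "domain": domain,
        "password": password,
        "format": "json",
    })
    return f"https://{hostname}:{port}/api?{query}"


def parse_login_response(body):
    """Extract the token from a JSON login response body."""
    return json.loads(body)["auth_token"]


def auth_headers(token):
    """Header to attach to every request that follows a successful login."""
    return {TOKEN_NAME: token}
```

For example, login_url("collector.example.com", "admin", "CORP", "secret") yields a URL on port 8585, and the token parsed from the response is then passed to auth_headers for each later call. Because the server uses a self-signed certificate, a real client also has to decide how to trust that certificate, and must be ready to log in again whenever it receives status code 401.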
Use the following request calls to push events to the Data Insight Collector node and to add the shares that you want Data Insight to monitor:
1. Login
POST /api?function=LOGIN
Request parameters
Name        Description                             Comment
username    Data Insight user name
domain      The domain to which the user belongs
password    The user's password
format      Format of the response output           Optional format=json
Request body
Do not supply a request body for this method.
Response
Login Success
If format=json is specified, then the authentication token is written on HTTP response output in JSON format.
HTTP/1.1 200 OK
Content-Type: application/json
Status: 200 OK
{"auth_token":"A2360DD2D9BB7284EF8BEB40E8DBA63F"}
If no format is specified, the authentication token is written on HTTP response output.
If login fails, HTTP status code 401 (Unauthorized) is returned.
2. Upload Events
POST /api?function=COLLECTOR&cmd=upload_events_sqlite&event_type=<type>
Request parameters
Name          Description                                                        Comment
MATRIX_AUTH   Authentication token
event_type    The type of events in the file that is uploaded on the Collector.  Optional (cifs|nfs)
Request body
The request can be an HTTP multi-part request or the request body can have the contents of the file.
Response
If the file upload is successful, returns a response with following structure:
HTTP/1.1 200 OK
Content-Type: application/json
Status: 200 OK
{"status_code":<code>,"status_msg":"<msg>"}
Status code 0 indicates success.
On failure, returns status code 500 (Internal Server Error) in case of an unexpected error.
Details of the file to be uploaded
The events file must be a SQLite DB file that has a single table, named events.
Table schema
Column name   Type      Constraints   Description
filer         TEXT      NOT NULL      Filer's address as added to the Data Insight configuration.
opcode        INTEGER   NOT NULL      An integer describing the event operation (for example, READ=3, WRITE=4). Refer to the Protobuf format for a complete set of values.
username      TEXT                    Username of the user for CIFS (optional). UID of the user in case of an NFS event.
domainname    TEXT                    Domain of the user for CIFS (optional). Blank for NFS.
sid           TEXT                    SID of the user for CIFS. Blank in case of NFS.
pathname      TEXT      NOT NULL      Path where the event occurred. Refer to the note below for format.
renamepath    TEXT                    Applicable in case of rename event.
type          TEXT                    Type of path (FOLDER=1, FILE=2).
ipaddr        TEXT                    IP address from where the path was accessed (optional).
timestamp     INTEGER   NOT NULL      Timestamp of event in seconds as UNIX epoch.
CREATE TABLE events (filer TEXT NOT NULL, opcode INTEGER NOT NULL, username TEXT,
domainname TEXT, sid TEXT,
pathname TEXT NOT NULL, renamepath TEXT, type TEXT, ipaddr TEXT,
timestamp INTEGER NOT NULL);
Note: For CIFS events, the SID value is mandatory; user name and domain name are optional.
For NFS events, SID should be blank, username should be the UID, and domain name should be blank.
For CIFS events, pathname should be the UNC path.
For NFS events, the pathname should be the absolute path of the file or the folder.
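An events file matching the schema above can be produced with any SQLite library. The following Python sketch creates the events table with the CREATE TABLE statement from this section and inserts rows supplied as tuples. The function name, the sample row, and the choice to store the path type as the string "2" in the TEXT column are assumptions for illustration; the sample values are borrowed from the JSON example in this chapter.

```python
import sqlite3

# CREATE TABLE statement as given in this section.
EVENTS_SCHEMA = """
CREATE TABLE events (filer TEXT NOT NULL, opcode INTEGER NOT NULL,
    username TEXT, domainname TEXT, sid TEXT,
    pathname TEXT NOT NULL, renamepath TEXT, type TEXT, ipaddr TEXT,
    timestamp INTEGER NOT NULL);
"""

# One CIFS READ event (opcode 3): SID set, user name and domain left empty.
# Assumption: the path type FILE=2 is stored as the string "2".
EXAMPLE_EVENT = (
    "10.209.89.3", 3, None, None,
    "S-1-5-21-617441397-4198358099-2716562547-104771",
    r"\\NAMEMCCIFS1\TestShare\DI3.0RU1\Data1",
    None, "2", "172.31.163.29", 1340003837,
)


def write_events_file(db_path, events):
    """Create a new SQLite file with a single events table and insert rows."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(EVENTS_SCHEMA)
        conn.executemany(
            "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
            events,
        )
        conn.commit()
    finally:
        conn.close()
```

The resulting file is what the upload_events_sqlite command expects as its request body or multi-part attachment; the timestamp in this format is in seconds.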
3. Push events in JSON or Google Protocol Buffers format
POST /api?function=COLLECTOR&cmd=push_events&input_format=<format>
Request parameters
Name           Description                                                   Comment
MATRIX_AUTH    Authentication token
input_format   The format in which the events are pushed to the Collector.   json|proto
Request body
The request body must contain the events list in the specified format, Google Protocol Buffers or JSON.
Response
Returns a response with following structure:
HTTP/1.1 200 OK
Content-Type: application/json
Status: 200 OK
{"status_code":<code>,"status_msg":"<msg>"}
Status code 0 indicates success.
On failure, returns status code 400 (BAD_REQUEST) for incorrect input format parameter.
Returns the status code 500 (Internal Server Error) in case of an unexpected error.
Google Protocol Buffers format for pushing events to the Collector
message AuditEventsListMessage {
optional int64 device_id = 1;
optional string device_name = 2;
repeated CifsEventMessage cifs_events = 3;
repeated NfsEventMessage nfs_events = 4;
}
message CifsEventMessage {
required AccessType opcode = 1;
required string unc_path = 2;
optional string rename_path = 3;
required PathType path_type = 4;
optional string sid = 5;
optional string username = 6;
optional string domain = 7;
required uint64 timestamp_msec = 8;
optional string ip_address = 9;
}
message NfsEventMessage {
required AccessType opcode = 1;
required string path = 2;
optional string rename_path = 3;
required PathType path_type = 4;
required int64 uid = 5;
optional int64 gid = 6;
optional string domain = 7;
required uint64 timestamp_msec = 8;
optional string ip_address = 9;
}
enum PathType {
UNKNOWN_PATHTYPE = -1;
FOLDER = 1;
FILE = 2;
}
enum AccessType {
CREATE = 1;
DELETE = 2;
READ = 3;
WRITE = 4;
RENAME = 5;
MKDIR = 8;
RMDIR = 9;
RENAMEDIR = 10;
SECURITY = 18;
SYMLINK = 19;
LINK = 20;
READLINK = 21;
OPEN = 200000;
}
Note: device_name is the name of the filer as added in Data Insight configuration.
CIFS and NFS events for a filer can be pushed by a single AuditEventsListMessage.
For a CIFS event, SID is mandatory; user name and domain name are optional. For an NFS event, SID should be blank, UID is mandatory, and domain should be blank.
LINK, SYMLINK, and READLINK are specific to NFS events only.
The AccessType parameter for events like permission change or ACL change is SECURITY.
JSON format for pushing events to the Collector
{
"deviceId": <Number>,
"deviceName": <String>,
"cifsEvents": [
<CIFS Event>
],
"nfsEvents": [
<NFS Event>
]
}
<CIFS Event>
{
"opcode": <String>,
"uncPath": <String>,
"renamePath": <String>,
"pathType": <String>,
"sid": <String>,
"username": <String>,
"domain": <String>,
"timestampMsec": <Number>,
"ipAddress": <String>
}
<NFS Event> -
{
"opcode": <String>,
"path": <String>,
"renamePath": <String>,
"pathType": <String>,
"uid": <Number>,
"gid": <Number>,
"domain": <String>,
"timestampMsec": <Number>,
"ipAddress": <String>
}
Note: opcode and pathType fields can take only a specific set of values. Refer to the protobuf enums for a description of values for each field; enum AccessType for the field opcode and enum PathType for the field pathType.
Example
{
"deviceId": 0,
"deviceName": "10.209.89.3",
"cifsEvents": [
{
"opcode": "RENAME",
"uncPath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\Data1",
"renamePath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\Data2 ",
"pathType": "FOLDER",
"sid": "S-1-5-21-617441397-4198358099-2716562547-104771",
"timestampMsec": 1340003837,
"ipAddress": "172.31.163.29"
},
{
"opcode": "CREATE",
"uncPath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\New Folder",
"pathType": "FOLDER",
"sid": "S-1-5-21-617441397-4198358099-2716562547-104771",
"timestampMsec": 1340003847,
"ipAddress": "172.31.163.29"
}
],
"nfsEvents": [
{
"opcode": "MKDIR",
"path": "\/openldaphome\/DIRU1",
"pathType": "FOLDER",
"uid": 0,
"gid": 0,
"domain": "0",
"timestampMsec": 1339680545
}
]
}
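A client can assemble this JSON body programmatically and reject invalid opcode or pathType values before sending. The following Python sketch does that, using the field names from the JSON format above and the value sets from the protobuf enums; the function names are hypothetical. Timestamps are passed through unchanged: the field is named timestampMsec, while the example values in this section appear to be second-based epochs, so confirm the expected unit before relying on this sketch.

```python
import json

# Allowed values, taken from the AccessType and PathType enums above.
ACCESS_TYPES = {"CREATE", "DELETE", "READ", "WRITE", "RENAME", "MKDIR",
                "RMDIR", "RENAMEDIR", "SECURITY", "SYMLINK", "LINK",
                "READLINK", "OPEN"}
PATH_TYPES = {"UNKNOWN_PATHTYPE", "FOLDER", "FILE"}


def _check(opcode, path_type):
    if opcode not in ACCESS_TYPES:
        raise ValueError(f"unknown opcode: {opcode}")
    if path_type not in PATH_TYPES:
        raise ValueError(f"unknown pathType: {path_type}")


def _drop_none(fields):
    """Leave out optional fields that were not supplied."""
    return {key: value for key, value in fields.items() if value is not None}


def cifs_event(opcode, unc_path, path_type, sid, timestamp_msec,
               rename_path=None, username=None, domain=None, ip_address=None):
    _check(opcode, path_type)
    return _drop_none({
        "opcode": opcode, "uncPath": unc_path, "renamePath": rename_path,
        "pathType": path_type, "sid": sid, "username": username,
        "domain": domain, "timestampMsec": timestamp_msec,
        "ipAddress": ip_address,
    })


def nfs_event(opcode, path, path_type, uid, timestamp_msec,
              gid=None, rename_path=None, domain=None, ip_address=None):
    _check(opcode, path_type)
    return _drop_none({
        "opcode": opcode, "path": path, "renamePath": rename_path,
        "pathType": path_type, "uid": uid, "gid": gid, "domain": domain,
        "timestampMsec": timestamp_msec, "ipAddress": ip_address,
    })


def events_payload(device_name, cifs_events=(), nfs_events=(), device_id=0):
    """Serialize the events list as the JSON request body."""
    return json.dumps({
        "deviceId": device_id,
        "deviceName": device_name,
        "cifsEvents": list(cifs_events),
        "nfsEvents": list(nfs_events),
    })
```

json.dumps takes care of escaping backslashes in UNC paths, so paths can be supplied in their natural form.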
4. Add shares
POST /api?function=COLLECTOR&cmd=add_shares&format=<format>
Request parameters
Name          Description                     Comment
MATRIX_AUTH   Authentication token
format        Format of the response output   proto|json
Request body
Supply JSON or Google Protocol Buffers formatted list of shares as input.
Response
On success, HTTP status code 200 is returned.
On failure to add shares, HTTP status code 500 (Internal Server Error) is returned.
ProtoBuf format for adding shares
message SharesListMessage
{
optional int64 device_id = 1;
optional string device_name = 2;
repeated ShareMessage shares = 3;
}
message ShareMessage
{
enum ShareType
{
CIFS = 0;
NFS = 1;
}
optional string shareName = 1;
optional string sharePath = 2;
optional ShareType shareType = 3 [default = CIFS];
}
JSON format for adding shares
Shares list
{
"deviceId": {number},
"deviceName": {string},
"shares": [
]
}
JSON format for adding shares
{
"shareName": {string},
"sharePath": {string},
"shareType": {string}
}
Note: The shareType parameter accepts only a specific set of values. For the possible set of values, refer to enum ShareType in the Protobuf definition.
Example
{
"deviceId": 0,
"deviceName": "10.209.111.193",
"shares": [
{
"shareName": "/openldaphome",
"sharePath": "/openldaphome",
"shareType": "NFS"
},
{
"shareName": "/nfstest",
"sharePath": "/nfstest",
"shareType": "NFS"
}
]
}
Note: Data Insight scans the shares that are added only when the user enables scanning and provides the Scanner credentials for the filer.
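The shares list for the add_shares call can be built in the same way as the events list. The following Python sketch assembles the JSON body and checks shareType against the ShareType enum values; the function names are hypothetical, and the default of CIFS mirrors the default in the Protobuf definition.

```python
import json

SHARE_TYPES = ("CIFS", "NFS")  # values of enum ShareType


def share(name, path, share_type="CIFS"):
    """Build one share entry, validating shareType against the enum."""
    if share_type not in SHARE_TYPES:
        raise ValueError(f"unknown shareType: {share_type}")
    return {"shareName": name, "sharePath": path, "shareType": share_type}


def shares_payload(device_name, shares, device_id=0):
    """Serialize the shares list as the JSON request body."""
    return json.dumps({
        "deviceId": device_id,
        "deviceName": device_name,
        "shares": list(shares),
    })


body = shares_payload("10.209.111.193", [
    share("/openldaphome", "/openldaphome", "NFS"),
    share("/nfstest", "/nfstest", "NFS"),
])
```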
Chapter 4: Creating custom scripts for remediation actions
This chapter includes the following topics:
■ About custom scripts
About custom scripts
You can use custom scripts to extend Data Insight functionality. You can use custom scripts to perform the following actions:
■ To create a remediation ticket.
■ To apply remediation actions based on Data Insight recommendations.
■ To define actions to manage data.
Data is supplied to the scripts through command-line arguments. The arguments vary based on what the script is used for. The scripts can be created in the .exe, .bat, .pl, or .vbs formats.
Data Insight handles custom scripts differently depending on the type of operation. The following list shows how Data Insight handles various types of scripts:
■ Custom scripts to create a remediation request.
Data Insight invokes the script by passing in two arguments:
custom_script.pl file_name <path_to_file_with_recommendation>
For example:
ticketing.pl file_name C:\DataInsight\data\workflow\tmp\PR_ticketing_1.txt
The second argument is the full path to a text file containing the permission recommendations. Each line in the text file contains one action and the variables required to perform that action. Lines are separated by a newline character. The script should read each line of the input file and open one or more remediation tickets as needed. If the script exits with a non-zero exit code, the action is considered to have failed. Each line in the file has the following format:
OP:<OPCODE> PARAM:VALUE; PARAM:VALUE; ...
For example:
OP:REMOVE_ACE USER:[email protected]; PATH:\\fileserver1\share1\path;
Refer to the next section for possible values for opcodes and their parameters.
■ Custom scripts to apply permission recommendations.
You can specify custom scripts to directly commit changes to Active Directory and CIFS file systems. You need to specify one script to make changes to Active Directory and one script to make changes to CIFS permissions. The recommendation is passed to the custom script as command-line arguments in the following format:
script.pl OP:<OPCODE> PARAM:VALUE PARAM:VALUE ...
The exact PARAM and VALUE depend on the opcode being passed. If the script exits with a non-zero code, Data Insight considers the operation to have failed. For this release, Data Insight recommendations consist only of removing user or group ACEs for paths, and removing users or members from AD groups. More operations will be supported in future releases.
For example:
AD.pl OP:DEL_GROUP_MEMBER AD_USER:user@domain TARGET_GROUP:group@domain;
Data Insight supplies the following opcode and arguments for Active Directory remediation:
OP:DEL_GROUP_MEMBER AD_GROUP:<group@domain>|AD_USER:<user@domain> TARGET_GROUP:<target_group@domain>
Data Insight supplies the following opcode and arguments for CIFS remediation:
OP:REMOVE_ACE GROUP:<group@domain>|USER:<user@domain> PATH:<unc_path>
■ Custom scripts to define specific tasks to manage data.
Data Insight invokes the script directly, passing the operation and variables as part of the command-line arguments. The path is the mandatory argument passed to the script. Other parameters passed to the script depend on how the Custom Action has been configured in the Management Console. The format of the command-line arguments passed to the custom script is:
script.pl path:<path> prop:val prop:val ...
For example:
archive_files.pl path:\\filer\share\path.txt size:25KB
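The line format of the recommendation file that is passed to a ticketing script (OP:<OPCODE> PARAM:VALUE; PARAM:VALUE; ...) can be parsed as sketched below. This is an illustrative Python sketch only: the function names are hypothetical, the user name jdoe@example is a made-up sample value, and the sketch assumes that parameter values never contain a semicolon. Each field is split on its first colon only, so that values such as Windows paths with a drive letter stay intact.

```python
def parse_recommendation_line(line):
    """Parse 'OP:<OPCODE> PARAM:VALUE; PARAM:VALUE; ...' into (opcode, params)."""
    head, _, rest = line.strip().partition(" ")
    prefix, _, opcode = head.partition(":")
    if prefix != "OP" or not opcode:
        raise ValueError("line does not start with OP:<OPCODE>: %r" % (line,))
    params = {}
    for field in rest.split(";"):
        field = field.strip()
        if not field:
            continue  # skip the empty piece after a trailing semicolon
        name, sep, value = field.partition(":")
        if not sep:
            raise ValueError("field is not PARAM:VALUE: %r" % (field,))
        params[name] = value
    return opcode, params


def read_recommendations(path):
    """Read the recommendation file, one (opcode, params) pair per line."""
    with open(path, "r") as handle:
        return [parse_recommendation_line(l) for l in handle if l.strip()]


# Hypothetical sample line in the documented format.
line = r"OP:REMOVE_ACE USER:jdoe@example; PATH:\\fileserver1\share1\path;"
opcode, params = parse_recommendation_line(line)
print(opcode, params)
```

A real script would call read_recommendations with the path received as its second argument, open the tickets, and exit with a non-zero code if any step fails, because Data Insight treats a non-zero exit code as a failed action.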
Data Insight supports the following properties that can be passed to the custom scripts:
Properties          Format
size                NNN KB|MB|GB. E.g. 34 KB
size_on_disk        NNN KB|MB|GB. E.g. 34 KB
created_by          user@domain. SID if user name cannot be resolved
created_on          milliseconds since Jan 1st 1970
last_modified_by    user@domain. SID if user name cannot be resolved
last_modified_on    milliseconds since Jan 1st 1970
last_accessed_by    user@domain. SID if user name cannot be resolved
last_accessed_on    milliseconds since Jan 1st 1970
data_owner          user@domain. SID if user name cannot be resolved
custodian           user@domain. SID if user name cannot be resolved. Multiple custodians are comma-separated
For detailed information about how to use custom scripts for data and permission remediation, see the Symantec Data Insight Administrator's Guide.
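A custom action script receives its input as path:<path> prop:val arguments, with values in the formats listed in the properties table. The following Python sketch shows one way to decode them. All function names are hypothetical. The sketch assumes that 1 KB equals 1024 bytes (this guide does not state the multiplier), that sizes are whole numbers, and that the millisecond timestamps are counted from Jan 1st 1970 UTC.

```python
import re
from datetime import datetime, timezone

# Assumption: binary multipliers; the guide does not state the unit size.
_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_args(args):
    """Turn ['path:<path>', 'prop:val', ...] into a dict; path is mandatory."""
    props = {}
    for arg in args:
        name, sep, value = arg.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError("argument is not prop:val: %r" % (arg,))
        props[name] = value
    if "path" not in props:
        raise ValueError("the mandatory path argument is missing")
    return props


def parse_size(text):
    """Convert 'NNN KB|MB|GB' (with or without a space) to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(KB|MB|GB)\s*", text)
    if match is None:
        raise ValueError("unrecognized size: %r" % (text,))
    return int(match.group(1)) * _UNITS[match.group(2)]


def to_datetime(ms_text):
    """Convert milliseconds since Jan 1st 1970 to an aware UTC datetime."""
    return datetime.fromtimestamp(int(ms_text) / 1000, tz=timezone.utc)


def split_custodians(value):
    """Split the comma-separated custodian property into a list."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Arguments from the archive_files.pl example above.
props = parse_args([r"path:\\filer\share\path.txt", "size:25KB"])
print(props["path"], parse_size(props["size"]))
```

In a real script, the list passed to parse_args would come from the command line (sys.argv[1:] in Python).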
Chapter 5: Data Inventory Report schema
This chapter includes the following topics:
■ Data Inventory report schema
Data Inventory report schema
The Data Inventory Report is used to extract information about paths from the Data Insight index. The output of this report is a SQLite database, which can be used for post-processing as needed. When you configure a report of this type, you can choose to have the output database copied to an external location where you plan to post-process the output.
file_inventory table
In this table, there is one row for each matching file that is found in the specified index dbs.
CREATE TABLE file_inventory (
xid INTEGER,
sid TEXT,
user_id INTEGER,
owner_account TEXT,
displayname TEXT,
owner_method TEXT,
bu_name TEXT,
bu_owner TEXT,
filer TEXT,
share TEXT,
dfs_server TEXT,
dfs_share TEXT,
dfs_path TEXT,
fid INTEGER,
path TEXT,
msu_type INTEGER,
interval INTEGER,
sensitive INTEGER,
msu_id INTEGER,
read_count INTEGER,
write_count INTEGER,
file_size INTEGER,
atime INTEGER,
ctime INTEGER,
mtime INTEGER,
fs_sid TEXT);
■ The xid column can be ignored, and should always be 1.
■ The sid is typically the Windows SID of the calculated owner of the file.
■ The owner_method column indicates the owner method that Data Insight used to calculate the owner.
■ The user_id is the foreign key into the fileuser table of the current version of the users.db stored in the DataInsight\Data\users folder. This is used for debug purposes only.
■ The owner_account, displayname, bu_name and bu_owner columns are other columns from the fileuser table.
■ The filer, share, path, dfs_server, dfs_share and dfs_path columns combine to give the path to the file. The fid column is the foreign key into the fentry table of the latest version of the index.db for this share. fid is used for debug purposes only.
■ The msu_type is an integer value describing the type of share. There are four possible values:
■ 1 – CIFS
■ 2 – SharePoint
■ 3 – NFS
■ 8 – DFS
■ The interval column is the foreign key into the intervals table below, based on the last access time of the file.
■ The msu_id is the foreign key into the msu table of the latest version of the config.db stored in the DataInsight\Data\conf folder.
■ The read_count and write_count columns are the aggregate numbers of audit events of each type over the total time period specified for this run of the report.
■ The file_size is the logical file size from the file system. The atime, ctime, and mtime columns are the metadata for the file, also pulled from the file system.
■ The fs_sid is the SID of the file system “owner” value from the file system metadata.
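Because the report output is a SQLite database, the file_inventory table can be post-processed with any SQLite client. The following Python sketch totals the file count and logical size per share and translates msu_type into the names listed above. The function name and the sample rows are hypothetical, and the demo table contains only the columns that the query needs rather than the full schema.

```python
import sqlite3

# msu_type codes as listed in this chapter.
MSU_TYPES = {1: "CIFS", 2: "SharePoint", 3: "NFS", 8: "DFS"}


def bytes_per_share(conn):
    """Return (filer, share, type name, file count, total bytes) per share."""
    rows = conn.execute(
        "SELECT filer, share, msu_type, COUNT(*), SUM(file_size) "
        "FROM file_inventory GROUP BY filer, share, msu_type "
        "ORDER BY filer, share"
    ).fetchall()
    return [
        (filer, share, MSU_TYPES.get(msu_type, "unknown"), count, total)
        for filer, share, msu_type, count, total in rows
    ]


# Demo against an in-memory database with hypothetical rows. Only the
# columns used by the query are created here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_inventory ("
    "filer TEXT, share TEXT, path TEXT, msu_type INTEGER, file_size INTEGER)"
)
conn.executemany(
    "INSERT INTO file_inventory VALUES (?, ?, ?, ?, ?)",
    [
        ("filer1", "share1", "/a.txt", 1, 100),
        ("filer1", "share1", "/b.txt", 1, 250),
        ("filer2", "export", "/c.txt", 3, 40),
    ],
)
summary = bytes_per_share(conn)
print(summary)
```

To run the same query against a real report, replace ":memory:" with the path to the output database and omit the CREATE TABLE and INSERT statements.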
lob table
This table consists of a list of distinct Lines of Business (LOBs). Other tables reference this table through the lob_id foreign key.
CREATE TABLE lob (
lob_id INTEGER PRIMARY KEY,
lob_name TEXT);
user_lob table
This table gives the mapping from users to the associated LOBs.
CREATE TABLE user_lob (
user_id INTEGER PRIMARY KEY,
lob_id INTEGER);
user_totals table
This table gives the total numbers of files, sensitive files, and so on for each user. In the final output, the msu_id column is displayed as empty. The user_id is the foreign key into the fileuser table of the current version of the users.db stored in the DataInsight\Data\users folder.
CREATE TABLE user_totals (
user_id INTEGER,
msu_id INTEGER,
total_files INTEGER,
total_bytes INTEGER,
sensitive_files INTEGER,
sensitive_bytes INTEGER);
user_interval_totals table
This table breaks out the information from the user_totals table over each interval specified from the input database. The interval_id is a foreign key to the intervals table.
CREATE TABLE user_interval_totals (
user_id INTEGER,
msu_id INTEGER,
interval_id INTEGER,
total_files INTEGER,
total_bytes INTEGER,
sensitive_files INTEGER,
sensitive_bytes INTEGER);
lob_totals table
Based on the mapping specified in the user_lob table, this table gives the total numbers for each LOB. In the final output, the msu_id column will be empty.
CREATE TABLE lob_totals (
lob_id INTEGER PRIMARY KEY,
msu_id INTEGER,
total_files INTEGER,
total_bytes INTEGER,
sensitive_files INTEGER,
sensitive_bytes INTEGER);
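The relationship between user_totals, user_lob, and lob_totals can be illustrated with a join. The Python sketch below rebuilds per-LOB totals from per-user totals in an in-memory database. It illustrates the mapping only and is not a description of how Data Insight computes lob_totals internally. The LOB names and all row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lob (lob_id INTEGER PRIMARY KEY, lob_name TEXT);
CREATE TABLE user_lob (user_id INTEGER PRIMARY KEY, lob_id INTEGER);
CREATE TABLE user_totals (
    user_id INTEGER, msu_id INTEGER, total_files INTEGER,
    total_bytes INTEGER, sensitive_files INTEGER, sensitive_bytes INTEGER);
-- Hypothetical sample rows; msu_id is left empty as in the final output.
INSERT INTO lob VALUES (1, 'Finance'), (2, 'Engineering');
INSERT INTO user_lob VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO user_totals VALUES
    (10, NULL, 5, 500, 1, 100),
    (11, NULL, 3, 300, 0, 0),
    (12, NULL, 7, 700, 2, 200);
""")

# Sum each user's totals into the LOB that user_lob maps the user to.
lob_rows = conn.execute("""
    SELECT l.lob_name,
           SUM(t.total_files), SUM(t.total_bytes),
           SUM(t.sensitive_files), SUM(t.sensitive_bytes)
    FROM user_totals t
    JOIN user_lob ul ON ul.user_id = t.user_id
    JOIN lob l ON l.lob_id = ul.lob_id
    GROUP BY l.lob_id, l.lob_name
    ORDER BY l.lob_id
""").fetchall()
print(lob_rows)
```

Because user_id is the primary key of user_lob, each user maps to at most one LOB, so no user's totals are counted twice in this join.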
lob_interval_totals table
This table breaks out the information from the lob_totals table over each interval specified from the input database. The interval_id is a foreign key into the intervals table.
CREATE TABLE lob_interval_totals (
lob_id INTEGER,
msu_id INTEGER,
interval_id INTEGER,
total_files INTEGER,
total_bytes INTEGER,
sensitive_files INTEGER,
sensitive_bytes INTEGER);
intervals table
This table gives the beginning and end of each interval as specified in the input database. The beginning and end times are specified as epoch numbers. For example, the time 0 is midnight on Jan 1, 1970 (UTC), and each higher number is one second after that.
CREATE TABLE IF NOT EXISTS intervals(
interval INTEGER, -- 4 => most recent
-- 0 => before interval
start INTEGER, -- start month of interval
end INTEGER); -- end month of interval
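Because the start and end columns hold epoch seconds, they usually need to be converted before they are shown to a reader. The Python sketch below lists each interval with its boundaries as UTC timestamps. The function names and the sample rows are hypothetical. The end column is quoted in the SQL as a precaution, because END is also an SQL keyword.

```python
import sqlite3
from datetime import datetime, timezone


def iso_utc(epoch_seconds):
    """Format epoch seconds as an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def describe_intervals(conn):
    """Return (interval, start, end) with the boundaries as UTC strings."""
    rows = conn.execute(
        'SELECT interval, start, "end" FROM intervals ORDER BY interval'
    ).fetchall()
    return [(ident, iso_utc(start), iso_utc(end)) for ident, start, end in rows]


# Demo with hypothetical one-day intervals in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE IF NOT EXISTS intervals '
    '(interval INTEGER, start INTEGER, "end" INTEGER)'
)
conn.executemany(
    "INSERT INTO intervals VALUES (?, ?, ?)",
    [(0, 0, 86400), (1, 86400, 172800)],
)
described = describe_intervals(conn)
for row in described:
    print(row)
```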
msu_info table
This table copies the data from the Dashboard database to indicate whether the msu is open. The msu_id column is a foreign key into the msu table of the latest version of the config.db stored in the DataInsight\Data\conf folder.
CREATE TABLE msu_info (
msu_id INTEGER PRIMARY KEY,
is_open INTEGER);
dashboard_info table
This table is similar to the msu_info table in that it copies information from the latest version of the Dashboard database into the report output database. There may be a slight mismatch between the numbers here and the totals from the user_totals table. This difference arises because the two sets of numbers are calculated at different times.
CREATE TABLE dashboard_info (
msu_id INTEGER PRIMARY KEY,
dir_files INTEGER,
dir_sens_files INTEGER,
dash_files INTEGER,
dash_sens_files INTEGER);
Report configuration parameters
One important setting for the Data Inventory report is the separate_dbs configuration setting. The separate_dbs setting forces the report to start a new db file after the specified number of rows have been inserted into the detail table. It indicates how many rows should be inserted into the details section of the report output database before the db is closed, renamed, and a new db is started. If the output file name specified to the report process is report_output.db, then the separate_dbs parameter creates files named report_output.db.0, report_output.db.1, and so on, every time the limit specified in the setting is reached. The current db file being written to is always report_output.db, and all of the summary data is written to this file.
When merge_rpt runs, it no longer copies rows from the file_inventory table into the final output db. It copies rows only from the user_totals and related tables, and then creates the lob_totals and related tables. As with the log_level setting, report.separate_dbs is checked first, and if it is not found, the separate_dbs setting is checked.
You need to set this property for each Indexer node, including the Management Server node. Issue the following commands on your Management Server for each node, replacing <nodeid> with the ID of the node (for example, 3 if the ID of your Indexer node is 3):
configdb -o -T node -k <nodeid> -J report.separate_dbs -j true
configdb -o -T node -k <nodeid> -J report.chunk_size -j 1000000
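When the report output is split as described above, a post-processing script has to visit the numbered files in order. The Python sketch below collects report_output.db.0, report_output.db.1, and so on in numeric order, followed by the base file that holds the summary data. The function name is hypothetical, and the sketch assumes that all of the files are in a single directory. A plain alphabetical sort would place .10 before .2, which is why the numeric suffix is parsed.

```python
import os
import re
import tempfile


def list_report_dbs(directory, base_name="report_output.db"):
    """Return chunk files in numeric order, followed by the base file."""
    pattern = re.compile(re.escape(base_name) + r"\.(\d+)")
    chunks = []
    for name in os.listdir(directory):
        match = pattern.fullmatch(name)
        if match:
            chunks.append((int(match.group(1)), name))
    ordered = [name for _, name in sorted(chunks)]
    if os.path.exists(os.path.join(directory, base_name)):
        ordered.append(base_name)  # summary data lives in the base file
    return ordered


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for suffix in ("", ".0", ".1", ".2", ".10"):
        open(os.path.join(tmp, "report_output.db" + suffix), "w").close()
    result = list_report_dbs(tmp)
print(result)
```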