
Symantec Data Insight Programmer's Reference Guide

4.0

June 2013

Symantec Proprietary and Confidential

Symantec Data Insight Programmer's Reference Guide

The software described in this book is furnished under a license agreement and may be used only in accordance with the terms of the agreement.

4.0

Documentation version: 4.0.0

Legal Notice

Copyright © 2013 Symantec Corporation. All rights reserved.

Symantec, the Symantec Logo, and the Checkmark Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.

This Symantec product may contain third party software for which Symantec is required to provide attribution to the third party (“Third Party Programs”). Some of the Third Party Programs are available under open source or free software licenses. The License Agreement accompanying the Software does not alter any rights or obligations you may have under those open source or free software licenses. Please see the Third Party Legal Notice Appendix to this Documentation or TPIP ReadMe File accompanying this Symantec product for more information on the Third Party Programs.

The product described in this document is distributed under licenses restricting its use, copying, distribution, and decompilation/reverse engineering. No part of this document may be reproduced in any form by any means without prior written authorization of Symantec Corporation and its licensors, if any.

THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED IN THIS DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.

The Licensed Software and Documentation are deemed to be commercial computer software as defined in FAR 12.212 and subject to restricted rights as defined in FAR Section 52.227-19 "Commercial Computer Software - Restricted Rights" and DFARS 227.7202, "Rights in Commercial Computer Software or Commercial Computer Software Documentation", as applicable, and any successor regulations. Any use, modification, reproduction release, performance, display or disclosure of the Licensed Software and Documentation by the U.S. Government shall be solely in accordance with the terms of this Agreement.

Symantec Corporation
350 Ellis Street
Mountain View, CA 94043

http://www.symantec.com

Technical Support

Symantec Technical Support maintains support centers globally. Technical Support’s primary role is to respond to specific queries about product features and functionality. The Technical Support group also creates content for our online Knowledge Base. The Technical Support group works collaboratively with the other functional areas within Symantec to answer your questions in a timely fashion. For example, the Technical Support group works with Product Engineering and Symantec Security Response to provide alerting services and virus definition updates.

Symantec’s support offerings include the following:

■ A range of support options that give you the flexibility to select the right amount of service for any size organization

■ Telephone and/or Web-based support that provides rapid response and up-to-the-minute information

■ Upgrade assurance that delivers software upgrades

■ Global support purchased on a regional business hours or 24 hours a day, 7 days a week basis

■ Premium service offerings that include Account Management Services

For information about Symantec’s support offerings, you can visit our website at the following URL:

www.symantec.com/business/support/

All support services will be delivered in accordance with your support agreement and the then-current enterprise technical support policy.

Contacting Technical Support

Customers with a current support agreement may access Technical Support information at the following URL:

www.symantec.com/business/support/

Before contacting Technical Support, make sure you have satisfied the system requirements that are listed in your product documentation. Also, you should be at the computer on which the problem occurred, in case it is necessary to replicate the problem.

When you contact Technical Support, please have the following information available:

■ Product release level

■ Hardware information
    ■ Available memory, disk space, and NIC information

■ Operating system
    ■ Version and patch level

■ Network topology
    ■ Router, gateway, and IP address information

■ Problem description:
    ■ Error messages and log files
    ■ Troubleshooting that was performed before contacting Symantec
    ■ Recent software configuration changes and network changes

Licensing and registration

If your Symantec product requires registration or a license key, access our technical support Web page at the following URL:

www.symantec.com/business/support/

Customer service

Customer service information is available at the following URL:

www.symantec.com/business/support/

Customer Service is available to assist with non-technical questions, such as the following types of issues:

■ Questions regarding product licensing or serialization

■ Product registration updates, such as address or name changes

■ General product information (features, language availability, local dealers)

■ Latest information about product updates and upgrades

■ Information about upgrade assurance and support contracts

■ Information about the Symantec Buying Programs

■ Advice about Symantec's technical support options

■ Nontechnical presales questions

■ Issues that are related to CD-ROMs, DVDs, or manuals

Support agreement resources

If you want to contact Symantec regarding an existing support agreement, please contact the support agreement administration team for your region as follows:

[email protected] and Japan

[email protected], Middle-East, and Africa

[email protected] America and Latin America

Contents

Technical Support

Chapter 1 About this guide
    How this guide is organized

Chapter 2 DataInsight Query Language (DQL)
    About Data Insight Query Language (DQL)
    DQL Objects/Tables
    About DQL Columns
        device Columns
        msu Columns
        user Columns
        groups Columns
        path Columns
        dfspath Columns
        owner Columns
        activity Columns
        permission Columns
        custodian Columns
    DQL Query Syntax
        FROM clause
        GET clause
        FORMAT clause
        IF clause
        USING clause
        HAVING clause
        GROUPBY clause
        SORTBY clause
        LIMIT clause
    DQL functions
    Example DQL queries

Chapter 3 Web API Specification
    Web API specification for generic Collector service

Chapter 4 Creating custom scripts for remediation actions
    About custom scripts

Chapter 5 Data Inventory Report schema
    Data Inventory report schema
        file_inventory table
        lob table
        user_lob table
        user_totals table
        user_interval_totals table
        lob_totals table
        lob_interval_totals table
        intervals table
        msu_info table
        dashboard_info table
        Report configuration parameters

Chapter 1 About this guide

This chapter includes the following topics:

■ How this guide is organized

How this guide is organized

This document contains a general description of the content and usage of the Data Insight Software Developer’s Kit (SDK). Each chapter introduces and discusses a Data Insight feature, its possible uses, and a description of how to use the application programming interface for custom operations. The SDK contains specific programming examples using these interfaces.

This guide provides an overview of the following Data Insight features that are accessible with the SDK:

■ DataInsight Query Language (DQL) - Use DQL to create queries for the purpose of creating customized reports. See “About Data Insight Query Language (DQL)”.

■ The generic device web API - Use the API to extend platform support for the storage devices that Data Insight monitors. See “Web API specification for generic Collector service”. For information about configuring a generic device in Data Insight and credentials required to monitor the device, see the Symantec Data Insight Administrator's Guide.

■ Custom scripts - Create scripts to define specific actions to handle remediation. See “About custom scripts”. To configure Data Insight to invoke these scripts to complete the custom actions, see the Symantec Data Insight Administrator's Guide.

■ Schema of the Data Inventory Report.

Chapter 2 DataInsight Query Language (DQL)

This chapter includes the following topics:

■ About Data Insight Query Language (DQL)

■ DQL Objects/Tables

■ About DQL Columns

■ DQL Query Syntax

■ DQL functions

■ Example DQL queries

About Data Insight Query Language (DQL)

Data Insight Query Language (DQL) is a structured language for retrieving the information that is stored in the Data Insight indices. Indices are the proprietary internal data stores that Data Insight uses to store information. DQL does not provide the full functional capability of SQL, but it is expressive enough to allow users to easily extract, group, sort, and aggregate data.

DQL is a query-only language. You cannot use DQL to modify the Data Insight indices. DQL queries are also protected by role-based access control, which means that you can only see the information that you have access to.

DQL Objects/Tables

With DQL, you can run a query on objects and retrieve other objects as results. If you are familiar with the SQL language, an object in DQL is similar to a table in SQL. The attributes of an object are similar to the columns of the table. The output of a DQL query is a relational database table with attribute values as column values.

The complete list of DQL tables, with a brief description of each, is as follows:

Table | Description
device | Describes the details of the storage devices or content repository servers that Data Insight monitors. For example, a NetApp or EMC filer, a Windows File Server, or a SharePoint web application.
msu | Describes the details of the Data Insight storage units. An msu is a unit of storage space, which can be a file share (in case of CIFS or NFS) or a site collection (in case of SharePoint).
path | Describes the details of the file or directory paths to individual msus.
dfspath | Describes the details of the DFS file or directory paths to individual msus.
owner | Describes the details of the computed owners of file or directory paths.
user | Describes the details of the users that are listed in directory services such as Active Directory, LDAP, or NIS+ directory server.
groups | Describes the details of the groups that are listed in directory services such as Active Directory, LDAP, or NIS+.
activity | Describes the details of the activity events on specific paths that are made by specific users at specific times. For example, an activity object can describe the following: file \\netapp1\mydocs\MarketResearch.doc was read by user John Smith at 1334123700 (Wed, 11th April 2012 05:55:00 GMT).
permission | Describes the details of the NTFS or UNIX permissions that are set on directory or file paths.
custodian | Describes the details of the custodians that are assigned to devices, msus, directories, or files.

In the list of objects above, the owner object differs from the rest of the objects: it is a computed object. Owner objects are not first-class objects that are stored in the Data Insight indices. They are computed at run time, depending on the method that is used to calculate file ownership.

About DQL Columns

Unlike a SQL table, whose columns can only contain a single value, a DQL table can have columns with multiple values. For example, the group Domain Users has multiple values for its column memberusers. A pair of square brackets around the column name indicates that the column is multi-valued.

With DQL, a table can have columns that refer to other tables. For example, the table groups has a column memberusers, which refers to rows from the user table. When you retrieve such reference columns, you need to specify which columns you want to retrieve from the referred table. For example, you cannot retrieve memberusers from groups without specifying which columns of the user table you are interested in. So, you can select memberusers.name or memberusers.sid, but not just memberusers.
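The rule for reference columns can be sketched in Python. The schema dictionary below is a hand-written, partial stand-in for the groups and user tables, and check_selection is a hypothetical helper that is not part of the SDK. It handles a single level of reference only, and it merely illustrates why memberusers.name is accepted while memberusers on its own is rejected.

```python
# Partial, hand-written model of two DQL tables (illustrative only).
# A value of None marks a plain column; a string names the referenced table.
SCHEMA = {
    "groups": {"sid": None, "name": None, "memberusers": "user"},
    "user": {"sid": None, "name": None, "login": None},
}


def check_selection(table, selection):
    """Return True if `selection` is a retrievable column of `table`."""
    column, _, subcolumn = selection.partition(".")
    if column not in SCHEMA[table]:
        return False
    referenced = SCHEMA[table][column]
    if referenced is None:
        # Plain column: it must be selected without a sub-column.
        return subcolumn == ""
    # Reference column: a column of the referenced table must be named.
    return subcolumn in SCHEMA[referenced]
```

With this model, check_selection("groups", "memberusers.sid") succeeds, while check_selection("groups", "memberusers") fails because no column of the user table is named.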

device Columns

Column | Type | Description
id | Integer | Unique identifier for this device.
name | String | Name of this device.
type | String | Type of device (NetApp, Celerra, WinNAS, SharePoint).
collector | String | Name of Collector node.
indexer | String | Name of Indexer node.
[custodians] | [Custodian Object] | List of custodians for this device.
capacity | Integer | Storage capacity of this device.
used_space | Integer | The total amount of space that all files and folders on this device consume.
share_count | Integer | Number of shares on this device.
open_share_count | Integer | Number of shares on this device that are marked as open.
open_share_data_size | Integer | Total size of all open shares on this device.
open_share_file_count | Integer | Total file count of all open shares on this device.
file_count | Integer | Total file count of all shares on this device.
sensitive_file_count | Integer | Number of sensitive files across all shares on this device.
folder_count | Integer | Total folder count of all shares on this device.
activity_count | Integer | Total activity count on this device (the activity count is calculated for the last six months).

msu Columns

Column | Type | Description
id | Integer | Unique identifier for this msu.
name | String | Name of this msu.
type | String | Type of msu (CIFS, NFSv3, SharePoint).
device | Device Object | Device that this msu belongs to.
indexer | String | Name of Indexer node.
indexdir | String | Path to index directory.
[custodians] | [Custodian Object] | List of custodians for this msu.
[permissions] | [Permission Object] | List of permissions for this msu (share-level permissions).
isopen | Integer | 1 if msu is open, otherwise 0.
activity_count | Integer | Total activity count in the last six months.
active_user_count | Integer | Number of users who were active in the last six months.
last_activity_time | Integer | Time of last recorded activity on this msu.
size | Integer | Total size of this msu.
active_data_size | Integer | Total size of all active files on this msu.
file_count | Integer | Number of files on this msu.
sensitive_file_count | Integer | Number of sensitive files on this msu.
folder_count | Integer | Number of folders on this msu.
most_active_user | User Object | User who is most active on this msu.

user Columns

Column | Type | Description
sid | String | Unique identifier of the user.
name | String | Full name of the user (e.g., John Smith).
login | String | Login of the user.
domain | String | Domain that the user belongs to.
firstname | String | First name of the user.
lastname | String | Last name of the user.
isdisabled | Integer | 1 if the user is disabled, 0 otherwise.
isdeleted | Integer | 1 if the user is deleted from AD/LDAP, 0 otherwise.
buname | String | Name of the business unit that this user belongs to.
buowner | String | Owner of the business unit that user belongs to.
[memberof] | [Group Object] | Groups of which this user is a member.
<custom-attr> | [String] | Custom attribute of the user. Replace <custom-attr> with the name of the custom attribute, for example, department. If the name contains special characters like -, *, %, ^, /, etc., enclose the name in quotes. For example, "E-mail".
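The quoting rule for custom attribute names can be sketched as a small Python helper. quote_custom_attr is a hypothetical name, not an SDK function, and the definition of "special character" used here (anything other than a letter, digit, or underscore) is an assumption, because the guide lists -, *, %, ^, and / only as examples.

```python
import re

# Assumption: any character other than a letter, digit, or underscore
# counts as "special". The guide lists -, *, %, ^ and / as examples only.
_PLAIN_NAME = re.compile(r"[A-Za-z0-9_]+")


def quote_custom_attr(name):
    """Wrap a custom attribute name in double quotes when it needs them."""
    if _PLAIN_NAME.fullmatch(name):
        return name
    return '"' + name + '"'
```

Under these assumptions, department is returned unchanged and E-mail is returned as "E-mail", matching the example in the table.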

groups Columns

Column | Type | Description
sid | String | Unique identifier of this group.
name | String | Name of this group.
domain | String | Domain of this group.
isdisabled | Integer | 1 if the group is disabled, 0 otherwise.
isdeleted | Integer | 1 if the group is deleted, 0 otherwise.
[memberof] | [Group Object] | Groups of which this group is a member.
[memberusers] | [User Object] | Users who are members of this group.
[membergroups] | [Group Object] | Groups that are members of this group.
<custom-attr> | [String] | Custom attribute of the group. Replace <custom-attr> with the name of the custom attribute, for example, location. If the name contains special characters like -, *, %, ^, /, etc., enclose the name in quotes. For example, "E-mail".

path Columns

Column | Type | Description
name | String | Name of path relative to the msu.
absname | String | Absolute name of the path containing device and share names, for example, \\filer1\share100\a\b.
id | Integer | Unique identifier for this path within the msu.
parent | Path Object | Parent path of this path.
type | String | DIR for directory, FILE for file.
device | Device Object | The device to which this path belongs.
msu | msu Object | The msu to which this path belongs.
size | Integer | Size of path in bytes. For directories, it is the size of all files under the entire subtree.
last_accessed | Integer | Timestamp of when this path was last accessed. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
last_modified | Integer | Timestamp of when this path was last modified.
created_on | Integer | Timestamp of when this path was created.
last_accessor | User Object | User who last accessed this path.
last_modifier | User Object | User who last modified this path.
creator | User Object | User who created this path.
creator_group | Group Object | Group creator of this path.
owner | Owner Object | Computed owner of this path.
isdeleted | Integer | 1 if the path is deleted, 0 otherwise.
depth | Integer | Depth of the path from the root of the share. For example, ‘/’ has a depth of 0, ‘/a’ has a depth of 1, ‘/a/b’ has a depth of 2.
activity_count | Integer | Total activity count on this path.
isopen | Integer | 1 if the path is open, 0 otherwise.
[open_reasons] | [String] | Reasons why the path is considered open.
[permissions] | [Permission Object] | List of permissions on this path.
[custodians] | [Custodian Object] | List of custodians for this path.
issensitive | Integer | 1 if the path is sensitive, 0 otherwise.
[filegroups] | [String] | List of filegroups for this path.
extension | String | File extension for this path. For example, PST, DOC, etc.
[dfsnames] | [String] | List of DFS names for this path.
[permitted_users] | [User Object] | List of users who have permissions to access this path.
permitted_users_count | Integer | Number of users who have permissions to access this path.
[active_users] | [User Object] | List of users who are active on this path.
active_users_count | Integer | Number of users who are active on this path.
[inactive_users] | [User Object] | List of users who are inactive on this path.
inactive_users_count | Integer | Number of users who are inactive on this path.
[dlp_policies] | [String] | List of DLP policies violated by this path.
iscontrol_point | Integer | 1 if the path is a control point, 0 otherwise.
[control_point_reasons] | [String] | Reasons why the path is considered a control point.
filesystem_owner | User Object | Owner specified by the NTFS file system.
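The definitions of depth and of directory size in the path table can be illustrated with a short Python sketch. Both functions are hypothetical helpers written for this illustration; they are not how Data Insight computes the columns. They assume share-relative names that use ‘/’ as the separator, as in the depth examples.

```python
def path_depth(name):
    """Depth of a share-relative path: '/' -> 0, '/a' -> 1, '/a/b' -> 2."""
    # Empty components (leading, trailing, or doubled slashes) are ignored.
    return len([part for part in name.split("/") if part])


def directory_size(dir_name, file_sizes):
    """Sum the sizes of all files anywhere under dir_name.

    file_sizes is a hypothetical mapping of share-relative file paths
    to sizes in bytes.
    """
    prefix = dir_name.rstrip("/") + "/"
    return sum(size for path, size in file_sizes.items()
               if path.startswith(prefix))
```

The prefix ends with a separator so that a sibling such as /ab is not counted as part of the subtree of /a.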

dfspath Columns

Column | Type | Description
name | String | Name of DFS path relative to the msu.
absname | String | Absolute name of the DFS path containing device and share names, for example, \\dfsfiler1\dfsshare100\a\b.
id | Integer | Unique identifier for this path within the msu.
parent | DFS Path Object | Parent DFS path of this DFS path.
physicalname | String | Absolute name of the physical path that this DFS path maps to.
type | String | DIR for directory, FILE for file.
device | Device Object | The device to which this DFS path belongs.
msu | msu Object | The msu to which this DFS path belongs.
size | Integer | Size of path in bytes. For directories, it is the size of all files under the entire subtree.
last_accessed | Integer | Timestamp of when this path was last accessed. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
last_modified | Integer | Timestamp of when this path was last modified.
created_on | Integer | Timestamp of when this path was created.
last_accessor | User Object | User who last accessed this path.
last_modifier | User Object | User who last modified this path.
creator | User Object | User who created this path.
creator_group | Group Object | Group creator of this path.
owner | Owner Object | Computed owner of this path.
isdeleted | Integer | 1 if the path is deleted, 0 otherwise.
depth | Integer | Depth of the path from the root of the share. For example, ‘/’ has a depth of 0, ‘/a’ has a depth of 1, ‘/a/b’ has a depth of 2.
activity_count | Integer | Total activity count on this path.
isopen | Integer | 1 if the path is open, 0 otherwise.
[open_reasons] | [String] | Reasons why the path is considered open.
[permissions] | [Permission Object] | List of permissions on this path.
[custodians] | [Custodian Object] | List of custodians for this path.
issensitive | Integer | 1 if the path is sensitive, 0 otherwise.
[filegroups] | [String] | List of filegroups for this path.
extension | String | File extension for this path. For example, PST, DOC, etc.
[permitted_users] | [User Object] | List of users who have permissions to access this path.
permitted_users_count | Integer | Number of users who have permissions to access this path.
[active_users] | [User Object] | List of users who are active on this path.
active_users_count | Integer | Number of users who are active on this path.
[inactive_users] | [User Object] | List of users who are inactive on this path.
inactive_users_count | Integer | Number of users who are inactive on this path.
[dlp_policies] | [String] | List of DLP policies violated by this path.
iscontrol_point | Integer | 1 if the path is a control point, 0 otherwise.
[control_point_reasons] | [String] | Reasons why the path is considered a control point.
filesystem_owner | User Object | Owner specified by the NTFS file system.

owner Columns

Column | Type | Description
path | Path Object | Path for which the owner is computed.
dfspath | DFS Path Object | DFS path for which the owner is computed.
user | User Object | The computed owner of the path.
read_count | Integer | Number of read accesses made by this user.
write_count | Integer | Number of write accesses made by this user.
method | String | The method that was used to compute this owner. Possible values are creator, read_count, write_count, rw_count, last_accessor, last_modifier, and parent_owner.
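The guide does not describe how Data Insight evaluates each method, so the following Python sketch is only one plausible reading of the method values. compute_owner and the layout of stats are hypothetical; rw_count is assumed to rank users by reads plus writes, and parent_owner is left out because it would need the directory tree.

```python
def compute_owner(method, stats):
    """Pick an owner for one path.

    stats is a hypothetical summary: 'creator', 'last_accessor' and
    'last_modifier' hold user names; 'reads' and 'writes' map user names
    to access counts.
    """
    if method in ("creator", "last_accessor", "last_modifier"):
        return stats[method]
    if method == "read_count":
        score = stats["reads"]
    elif method == "write_count":
        score = stats["writes"]
    elif method == "rw_count":
        # Assumption: rw_count ranks users by reads plus writes.
        users = set(stats["reads"]) | set(stats["writes"])
        score = {u: stats["reads"].get(u, 0) + stats["writes"].get(u, 0)
                 for u in users}
    else:
        raise ValueError("unsupported method: " + method)
    # Ties are broken by user name so the result is deterministic.
    return max(sorted(score), key=score.get)
```

The sketch shows why owner is a computed object: the same path can yield different owners depending on the method that is chosen.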

activity Columns

Column | Type | Description
timestamp | Integer | Timestamp of the activity. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
timerange | Integer | Number of seconds since timestamp that this activity event might have happened.
user | User Object | User who initiated this activity event.
path | Path Object | Path on which this activity event occurred.
dfspath | DFS Path Object | DFS path on which this activity event occurred.
opcode | Integer | Integer representing the activity event.
operation | String | String notation of the activity event (for example, read, write, create, delete, mkdir, rmdir).
count | Integer | Number of times this operation was performed in the timerange.
ipaddr | String | IP address from where the operation was performed.
rename_target | Path Object | For rename or move operations, the target path to which this path was renamed.
dfs_rename_target | DFS Path Object | For rename or move operations, the target DFS path to which this DFS path was renamed.
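The timestamp and timerange columns together describe a window in which an activity event may have occurred. The following Python sketch converts such a pair into a UTC interval. It assumes that the two values have already been read from an output table as plain integers; the function name activity_window is hypothetical and is not part of Data Insight.

```python
from datetime import datetime, timedelta, timezone

def activity_window(timestamp, timerange):
    """Return (start, end) as UTC datetimes for one activity row.

    timestamp: seconds elapsed since midnight UTC, January 1, 1970.
    timerange: number of seconds after timestamp in which the event
               might have happened.
    """
    start = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    end = start + timedelta(seconds=timerange)
    return start, end

# Illustrative values only: a one-hour window.
start, end = activity_window(1330851600, 3600)
print(start.isoformat(), end.isoformat())
```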

permission Columns

Column | Type | Description
object_type | String | Type of object on which this permission is set (msu, DIR).
path | Path Object | Path on which the permission is set.
dfspath | DFS Path Object | DFS path on which the permission is set.
msu | msu Object | The msu on which the permission is set.
trustee_type | String | Type of trustee (user, group).
user_trustee | User Object | Trustee of this permission.
group_trustee | Group Object | Trustee of this permission.
permission_mask | Integer | Permission bitmask.
readable_permission | String | List of readable permissions (read, write, full control, and so on).
type | String | Type of permission (GRANT, DENY).
isinherited | Integer | 1 if the permission is inherited from parent.
inheriting_type | String | Type of object from which this permission is inherited (msu, DIR).
inheriting_path | Path Object | Path from which this permission is inherited.
inheriting_dfspath | DFS Path Object | DFS path from which this permission is inherited.
inheriting_msu | msu Object | msu from which this permission is inherited.
appliesto | String | Inheritance settings for this permission (for example, this folder, all subfolders, only immediate files).

custodian Columns

Column | Type | Description
path | Path Object | Path on which the custodian is assigned.
dfspath | DFS Path Object | DFS path on which the custodian is assigned.
msu | msu Object | msu on which the custodian is assigned.
device | Device Object | Device on which the custodian is assigned.
dfslink | String | DFS link on which the custodian is assigned.
user | User Object | The assigned custodian of the path.
isinherited | Integer | 1 if the custodian is inherited from a parent (device, msu, dir, dfslink).
inheriting_type | String | Type of object from which the custodian is inherited (device, msu, dir, dfslink).
inheriting_path | Path Object | Path from which the custodian is inherited.
inheriting_dfspath | DFS Path Object | DFS path from which the custodian is inherited.
inheriting_msu | msu Object | msu from which the custodian is inherited.
inheriting_device | Device Object | Device from which the custodian is inherited.
inheriting_dfslink | String | DFS link from which the custodian is inherited.

DQL Query Syntax

The DQL query syntax and top-level grammatical constructs are as shown:

FROM <table>

GET <column expression> [AS alias], <column expression> [AS alias], ...

[IF <condition>]

[USING <definition>]

[FORMAT <column> AS (CSV|TABLE <tablename>) [<count>]]

[GROUPBY <column expression>, <column expression>, ...]

[HAVING <aggregate-condition>]

[SORTBY <column expression> [ASC|DESC]]

[LIMIT [<offset>,]<count>];

FROM clause

The FROM clause specifies the table from which DQL retrieves the data. DQL does not support joins as in SQL. You can specify only one table in the FROM clause.

GET clause

The GET clause specifies the columns (or expressions on columns) that you want to retrieve from the table that you specify in the FROM clause.

DQL tables can have columns that refer to other tables. For example, the table groups has a column memberusers, which refers to rows from the table user. When you retrieve such reference columns, you need to specify which columns you want to retrieve from the referred table. For example, you cannot retrieve memberusers from groups without specifying which columns of the user table you are interested in. So, you can select memberusers.name or memberusers.sid but not just memberusers.

The column names in the output table are determined by the expressions used in the GET clause. While displaying the output, DQL may optionally replace the period ( . ) with the underscore ( _ ). For example, for GET path.name, the output column name in the SQLite database becomes path_name.
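When you read a DQL output database from a script, it can help to derive the expected output column name from the GET expression. The sketch below applies the period-to-underscore rule described above. The helper name output_column_name is hypothetical, and the sketch assumes that the replacement is applied and that no AS alias is in effect.

```python
def output_column_name(get_expression):
    """Derive the output column name for a dotted GET expression.

    Assumes the period-to-underscore replacement described above is
    applied and that the expression carries no AS alias.
    """
    return get_expression.replace(".", "_")

for expression in ("path.name", "memberusers.sid", "active_users.memberof.name"):
    print(expression, "->", output_column_name(expression))
```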

FORMAT clause

Data Insight tables can contain multi-valued columns. For example, path contains a multi-valued column permissions. When you specify the columns in the GET clause, you also need to specify the manner in which you want their values to appear in the output database table. Use the FORMAT clause to control the format of the output for multi-valued columns. You can use two formatting options as shown below:

FORMAT <column> AS CSV

The above syntax displays the output values for a multi-valued column as a comma-separated list in a single column.

FORMAT <column> AS TABLE <tablename>

The above syntax displays the output values for a multi-valued column in a separate table. Each row of this table contains a reference to its corresponding row in the parent table.

The default value for the FORMAT clause is TABLE. If you do not provide a FORMAT clause in your query, DQL displays the contents of the multi-valued columns in separate tables, and the name of the multi-valued column is used as the default name of the table. For example, if you retrieve path permissions and do not specify the FORMAT clause, DQL displays the permissions of a path in a separate table called permissions.

Consider this example:

FROM groups

GET name, memberusers.sid, memberusers.name

FORMAT memberusers AS CSV

Since memberusers is a multi-valued column, the FORMAT clause on memberusers needs to be specified.

The above query creates an output table groups containing four columns: groups_rowid, name, memberusers_sid, and memberusers_name. The column groups_rowid is a default column present in all DQL output tables, containing an identification number for each row. The columns memberusers_sid and memberusers_name contain comma-separated lists of member user SIDs and names.

Example output table is as shown below:

groups

groups_rowid | name | memberusers_sid | memberusers_name
1 | Domain Users | S-1,S-2,S-10,S-11 | John,Jim,Paul,Steve
2 | HR_Global | S-10,S-12 | Paul,Jane
3 | HR_US | S-10 | Paul

Suppose that you change the query to:

FROM groups

GET name, memberusers.sid, memberusers.name

FORMAT memberusers AS TABLE memberusers

In this case, the output database contains two tables: groups and memberusers. The groups table has two columns: groups_rowid and name. The memberusers table has three columns: groups_rowid, memberusers_sid, and memberusers_name. The groups_rowid column in the memberusers table is a reference to the groups_rowid column from the groups table.

Example output tables are as shown below:

groups

groups_rowid | name
1 | Domain Users
2 | HR_Global
3 | HR_US

memberusers

groups_rowid | memberusers_sid | memberusers_name
1 | S-1 | John
1 | S-2 | Jim
1 | S-10 | Paul
1 | S-11 | Steve
2 | S-10 | Paul
2 | S-12 | Jane
3 | S-10 | Paul
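Because the TABLE format splits a multi-valued column into a child table, a consumer of the output database has to join the child rows back to their parent through groups_rowid. The following Python sketch rebuilds the two example tables in an in-memory SQLite database and performs that join. The column types used in the CREATE TABLE statements are assumptions for illustration; the actual types in a DQL output database may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Recreate the example output tables. Column types are assumed.
conn.executescript("""
CREATE TABLE groups (groups_rowid INTEGER, name TEXT);
CREATE TABLE memberusers (groups_rowid INTEGER,
                          memberusers_sid TEXT,
                          memberusers_name TEXT);
""")
conn.executemany("INSERT INTO groups VALUES (?, ?)",
                 [(1, "Domain Users"), (2, "HR_Global"), (3, "HR_US")])
conn.executemany("INSERT INTO memberusers VALUES (?, ?, ?)",
                 [(1, "S-1", "John"), (1, "S-2", "Jim"), (1, "S-10", "Paul"),
                  (1, "S-11", "Steve"), (2, "S-10", "Paul"),
                  (2, "S-12", "Jane"), (3, "S-10", "Paul")])

# Join each member row to its parent group through groups_rowid.
rows = conn.execute("""
    SELECT g.name, m.memberusers_name
    FROM groups AS g
    JOIN memberusers AS m ON m.groups_rowid = g.groups_rowid
    ORDER BY g.groups_rowid, m.rowid
""").fetchall()

members_by_group = {}
for group_name, user_name in rows:
    members_by_group.setdefault(group_name, []).append(user_name)

for group_name, users in members_by_group.items():
    print(group_name, ",".join(users))
conn.close()
```

Joining the per-group lists with commas yields the same values that the CSV format places in a single column.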

By default, DQL lists all memberusers of a group. Optionally, you can limit the number of memberusers listed using the FORMAT clause, as shown in the following query:

FROM groups

GET name, memberusers.sid, memberusers.name

FORMAT memberusers AS CSV 4

This limits the output table to a maximum of four member user values for each group. These four values are the first four members of the list.

Nested multi-valued columns

There may be situations where you need to specify nested multi-valued columns. For example, the path table has a multi-valued column active_users, which is a reference to the user table. The table user, in turn, has a multi-valued column memberof, which indicates the groups that a user belongs to. If you want to get all active users for a path and the groups that each active user belongs to, write your query as shown below.

FROM path

GET name, active_users.name, active_users.memberof.name

FORMAT active_users AS CSV AND

active_users.memberof AS CSV;

In this query's output table, the third column active_users_memberof_name lists all the groups of all the path's active users. For example, suppose that path /foo has active users Joe and Jane. Suppose that Joe belongs to groups HR and ALL-Employees, while Jane belongs to groups Finance and ALL-Employees. The output column for this query will then be HR, ALL-Employees, Finance, ALL-Employees.

Notice that you have a flat list of all group names in this column. You have lost information about which groups each of the active users belongs to. You only know that there is one active user who belongs to HR, two who belong to ALL-Employees, and one who belongs to Finance.
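The flattening described above can be reproduced outside DQL to see exactly what information survives. This Python sketch models the /foo example with a plain dictionary (the variable names are hypothetical) and shows that only per-group counts can be recovered from the flat list.

```python
from collections import Counter

# Groups of each active user of /foo, as in the example above.
groups_by_active_user = {
    "Joe": ["HR", "ALL-Employees"],
    "Jane": ["Finance", "ALL-Employees"],
}

# Nested multi-valued columns evaluate to one flat list.
flat_groups = [group
               for user in ("Joe", "Jane")
               for group in groups_by_active_user[user]]
csv_value = ", ".join(flat_groups)
print(csv_value)

# The user-to-group association is gone; only counts remain.
group_counts = Counter(flat_groups)
print(dict(group_counts))
```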

IF clause

The IF clause is an optional clause that you can use to specify a set of conditions on the rows that you want to retrieve. It is similar to the WHERE clause of SQL. DQL retrieves only those rows whose columns satisfy the conditions provided under the IF clause.

Operators

DQL supports the following binary operators that you can use to specify a condition:

■ Comparison operators: >, <, >=, <=, =, ==, !=, <>

■ Logical operators: AND, &&, OR, ||

■ Arithmetic operators: +, -, *, /, %

■ List containment operators: IN, NOT IN

Constants

DQL's IF clause supports specification of constants in operations. Constants can be either numeric or string. Some examples of supported column-related operations are as shown below:

■ IF size/1024 > 10

■ IF size = 10

■ IF name IN ("John", "Joe")

Note that string comparisons are case insensitive by default. To specify case sensitive or case insensitive comparisons, you can use the CASE SENSITIVE and CASE INSENSITIVE keywords.

■ IF name IN ("John", "Joe") CASE SENSITIVE

■ IF name = "John" CASE INSENSITIVE

Conditions on multi-valued columns

You can use the EACH or ANY prefixes to specify conditions on multi-valued columns. EACH specifies that every value of the multi-valued column must satisfy the condition, while ANY specifies that at least one value of the multi-valued column must satisfy the condition.

Suppose that you want to retrieve only those paths on which the user John isactive. You can write a query as shown below.

FROM path

GET name, active_users.name

IF ANY active_users.name = "John"

FORMAT active_users AS CSV;

Suppose that you want to retrieve paths on which either John or Joe is active. You can write a query (query a) as shown below.

FROM path

GET name, active_users.name

IF ANY active_users.name IN ("John","Joe")

FORMAT active_users AS CSV;

The above query retrieves the paths on which either John is one of the active users and/or Joe is one of the active users.

Suppose that you want to retrieve the paths that only have John and Joe as active users. You can write a query (query b) as shown below.

FROM path

GET name, active_users.name

IF EACH active_users.name IN ("John","Joe")

FORMAT active_users AS CSV;

The above query retrieves paths where the only active users are John and/or Joe.

Note that in query (a), you get the paths on which John or Joe is one of the active users, whereas in query (b), you get the paths on which John and/or Joe are the only active users.
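The difference between query (a) and query (b) corresponds to the difference between "at least one value matches" and "every value matches". The Python sketch below emulates both on a small, made-up set of paths. The helper names any_in and each_in are hypothetical. The sketch follows the default case-insensitive string comparison described earlier. How DQL treats a path with no active users under EACH is not stated here, so the sample data avoids empty lists.

```python
def _normalize(value, case_sensitive):
    return value if case_sensitive else value.lower()

def any_in(values, allowed, case_sensitive=False):
    """Emulate IF ANY column IN (...): at least one value matches."""
    allowed_set = {_normalize(a, case_sensitive) for a in allowed}
    return any(_normalize(v, case_sensitive) in allowed_set for v in values)

def each_in(values, allowed, case_sensitive=False):
    """Emulate IF EACH column IN (...): every value matches."""
    allowed_set = {_normalize(a, case_sensitive) for a in allowed}
    return all(_normalize(v, case_sensitive) in allowed_set for v in values)

# Made-up paths and their active users.
active_users = {
    "/share/a": ["John", "Jane"],
    "/share/b": ["John", "Joe"],
    "/share/c": ["joe"],
    "/share/d": ["Jane"],
}

query_a = [p for p, users in active_users.items()
           if any_in(users, ["John", "Joe"])]
query_b = [p for p, users in active_users.items()
           if each_in(users, ["John", "Joe"])]
print("query a:", query_a)
print("query b:", query_b)
```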

Conditions on nested multi-valued columns

Since nested multi-valued columns evaluate to a flat list, you can specify conditions on them using the ANY and EACH constructs as above.

For example, suppose that you want to retrieve those paths containing at least one active user belonging to group HR. You can write a query as shown below.

FROM path

GET name, active_users.memberof.name

IF ANY active_users.memberof.name = "HR"

FORMAT active_users.memberof AS CSV;

Suppose that you want to retrieve those paths containing active users who belong only to groups HR and/or FINANCE. You can write a query as shown below.

FROM path

GET name, active_users.memberof.name

IF EACH active_users.memberof.name IN ("HR", "FINANCE")

FORMAT active_users.memberof AS CSV;

Note that DQL by default uses the ANY construct if you do not specify an ANY/EACH construct.

USING clause

Values of certain columns, like owner, are computed at run-time based on some criteria. For example, to compute the owner of a file, you need to specify which methods (like read_count, rw_count, parent_owner) you want to use to determine the owner. When you determine the active users of a path, you need to specify the time range you want to consider for the activity.

You can use the USING clause to specify such functions that can be applied to obtain a column value.

The details of the DQL USING functions are as shown below.

Calculating the owner

calc_owner(start_time TEXT, end_time TEXT, date_format TEXT,

ordered_list_of_owner_methods TEXT)

Example usage in query:

FROM path

GET name, owner.user.name, owner.method, owner.read_count

USING owner AS calc_owner("2012-01-01", "2012-06-01", "YYYY-MM-DD",

"rw_count, read_count, last_accessor");

If you don't specify a USING function for owner, DQL uses a default time range of the last 6 months and a data owner ordering of rw_count, write_count, read_count, last_modifier, last_accessor, creator, parent_owner.
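If you generate queries from a script, the calc_owner arguments can be assembled from date objects and a method list. The following Python sketch builds the USING clause text shown in the example above. The helper name calc_owner_clause is hypothetical. The set of accepted method names is taken from the possible values listed for the method column of owner.

```python
from datetime import date

# Possible owner methods, as listed for the owner.method column.
OWNER_METHODS = {"creator", "read_count", "write_count", "rw_count",
                 "last_accessor", "last_modifier", "parent_owner"}

def calc_owner_clause(start, end, methods):
    """Build the text of a USING clause for the owner column."""
    unknown = [m for m in methods if m not in OWNER_METHODS]
    if unknown:
        raise ValueError("unknown owner method(s): " + ", ".join(unknown))
    return 'USING owner AS calc_owner("{}", "{}", "YYYY-MM-DD", "{}")'.format(
        start.strftime("%Y-%m-%d"),
        end.strftime("%Y-%m-%d"),
        ", ".join(methods),
    )

clause = calc_owner_clause(date(2012, 1, 1), date(2012, 6, 1),
                           ["rw_count", "read_count", "last_accessor"])
print(clause)
```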

Calculating the active_users

calc_active_users(start_time TEXT, end_time TEXT, date_format TEXT)

Example usage in query:

FROM path

GET name, active_users.name

USING active_users AS

calc_active_users("2012-01-01", "2012-06-01", "YYYY-MM-DD")

FORMAT active_users AS CSV;

If you don't specify a USING function for active_users, DQL uses a default time range of the last 6 months.

Calculating the active_users_count

get_active_users_count(start_time TEXT, end_time TEXT, date_format TEXT)

Example usage in query:

FROM path

GET name, active_users_count

USING active_users_count AS

get_active_users_count("2012-01-01", "2012-06-01", "YYYY-MM-DD");

If you don't specify a USING function for active_users_count, DQL uses a default time range of the last 6 months.

Calculating the inactive_users

calc_inactive_users(start_time TEXT, end_time TEXT, date_format TEXT)

Example usage in query:

FROM path

GET name, inactive_users.name

USING inactive_users AS

calc_inactive_users("2012-01-01", "2012-06-01", "YYYY-MM-DD")

FORMAT inactive_users AS CSV;

If you don't specify a USING function for inactive_users, DQL uses a default time range of the last 6 months for calculating inactivity.

Calculating the inactive_users_count

get_inactive_users_count(start_time TEXT, end_time TEXT, date_format TEXT)

Example usage in query:

FROM path

GET name, inactive_users_count

USING inactive_users_count AS

get_inactive_users_count("2012-01-01", "2012-06-01", "YYYY-MM-DD");

If you don't specify a USING function for inactive_users_count, DQL uses a default time range of the last 6 months for calculating inactivity.

Calculating the activity_count

get_activity_count(start_time TEXT, end_time TEXT, date_format TEXT)

Example usage in query:

FROM path

GET name, activity_count

USING activity_count AS

get_activity_count("2012-01-01 10:00", "2012-01-01 15:00",

"YYYY-MM-DD HH:mm");

If you don't specify a USING function for activity_count, DQL uses a default time range of the last 6 months for calculating activity.

HAVING clause

The HAVING clause is similar to the SQL HAVING clause and allows specification of conditions on aggregate functions. The syntax of conditions that can be specified in the HAVING clause is the same as that of the DQL IF clause.

Suppose that you want to retrieve the sum of the sizes of all shares for each filer. You can write a query for this as shown below:

FROM msu

GET filer.name, sum(size)

GROUPBY filer.name;

Now suppose that you want to select only those filers whose sum of share sizes is greater than 1 GB (1,073,741,824 bytes). Then you need to modify the previous query as:

FROM msu

GET filer.name, sum(size)

GROUPBY filer.name

HAVING sum(size) > 1073741824;
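The effect of GROUPBY followed by HAVING can be pictured as a two-step computation: first aggregate per group, then filter on the aggregate. The Python sketch below emulates the query above on made-up share sizes. The filer names and sizes are illustrative only.

```python
from collections import defaultdict

ONE_GB = 1024 ** 3  # 1,073,741,824 bytes, the threshold used above

# Made-up (filer name, share size in bytes) pairs.
shares = [
    ("filer1", 800_000_000),
    ("filer1", 500_000_000),
    ("filer2", 900_000_000),
]

# GROUPBY filer.name with sum(size)
total_size_by_filer = defaultdict(int)
for filer_name, size in shares:
    total_size_by_filer[filer_name] += size

# HAVING sum(size) > 1073741824
large_filers = {name: total
                for name, total in total_size_by_filer.items()
                if total > ONE_GB}
print(large_filers)
```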

GROUPBY clause

The GROUPBY clause is similar to the SQL GROUP BY clause. It enables you to aggregate the output rows into groups. Suppose that you want to retrieve the sum of the sizes of all shares for each filer. You can write a query for this as shown below.

FROM msu

GET filer.name, sum(size)

GROUPBY filer.name;

DQL supports the following aggregation functions:

■ sum

■ count

■ max

■ min

SORTBY clause

The SORTBY clause is similar to the SQL ORDER BY clause. It enables you to sort the rows of the output table based on their column values.

FROM msu

GET name, size

SORTBY size DESC;

If no sort order is specified, DQL defaults to ASC.

LIMIT clause

The LIMIT clause is similar to the SQL LIMIT clause and is used to limit the number of output rows.

LIMIT count [This will retrieve the first "count" rows]

LIMIT offset, count [This will retrieve "count" rows starting from "offset"]

Offset values start from 1.
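Because DQL offsets start from 1, they do not map directly onto zero-based list indexes. The following Python sketch emulates the two LIMIT forms on a list of rows. The function name limit is hypothetical. The sketch assumes that the row at position offset is itself included in the result.

```python
def limit(rows, count, offset=1):
    """Emulate LIMIT [offset,] count with 1-based offsets.

    Assumes the row at position `offset` is the first row returned.
    """
    if offset < 1:
        raise ValueError("offset values start from 1")
    first = offset - 1  # convert to a zero-based index
    return rows[first:first + count]

rows = ["r1", "r2", "r3", "r4", "r5", "r6"]
print(limit(rows, 2))            # LIMIT 2
print(limit(rows, 3, offset=2))  # LIMIT 2, 3
```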

DQL functions

DQL supports the following built-in functions:

upper(X)
    Converts string X to uppercase.

lower(X)
    Converts string X to lowercase.

strlen(X)
    Returns length of string X.

length(X)
    Returns number of items in list X.

substr(X, Y)
    Returns true if Y is a substring of X. The comparison is case-sensitive.

substri(X, Y)
    Returns true if Y is a substring of X. The comparison is case insensitive.

match(X, P)
    Returns true if X matches the regular expression pattern P. Regular expression matching is case-sensitive. Pattern P can be specified as patterns matching a single character or patterns matching multiple characters.

    You can refer to the following URLs for information on pattern matching:

    ■ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01

    ■ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_02

matchi(X, P)
    Returns true if X matches the regular expression pattern P. Regular expression matching is case insensitive. Pattern P can be specified as patterns matching a single character or patterns matching multiple characters.

    You can refer to the following URLs for information on pattern matching:

    ■ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01

    ■ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_02

datetime(D, F)
    Returns time in epoch for the string date D. The format in which date D is specified is indicated by the format string F. The options for F are:

    YYYY - 4 digit year
    MM - month of year (01 - 12)
    DD - date of month
    HH - hour (00 - 24)
    mm - minutes (00 - 59)
    ss - seconds (00 - 59)
    Z - timezone

    Example: datetime("2012-01-10 -0800", "YYYY-MM-DDZ")

formatdate(T, F)
    Converts time T in epoch to a string whose format is specified with string F. The options for F are the same as those used by datetime(D, F).
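To check epoch values outside DQL, the format options listed for datetime(D, F) can be mapped onto Python's strptime directives. The sketch below is an approximation, not the DQL implementation. The function names are hypothetical. It assumes UTC when the format has no Z option, because the time zone that DQL applies in that case is not stated here. Python's %H directive accepts hours 00 to 23 only.

```python
import re
from datetime import datetime, timezone

# DQL format options mapped to strptime/strftime directives (assumed mapping).
_DIRECTIVES = {"YYYY": "%Y", "MM": "%m", "DD": "%d",
               "HH": "%H", "mm": "%M", "ss": "%S", "Z": "%z"}
_TOKEN_RE = re.compile("YYYY|MM|DD|HH|mm|ss|Z")

def to_strptime_format(dql_format):
    """Translate a DQL format string in a single pass."""
    return _TOKEN_RE.sub(lambda m: _DIRECTIVES[m.group(0)], dql_format)

def dql_datetime(date_string, dql_format):
    """Approximate datetime(D, F): return seconds since the epoch."""
    parsed = datetime.strptime(date_string, to_strptime_format(dql_format))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: UTC
    return int(parsed.timestamp())

def dql_formatdate(epoch_seconds, dql_format):
    """Approximate formatdate(T, F), rendering the time in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(to_strptime_format(dql_format))

epoch = dql_datetime("2012/03/04 09:00", "YYYY/MM/DD HH:mm")
print(epoch, dql_formatdate(epoch, "YYYY/MM/DD HH:mm"))
```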

Example DQL queries

■ Get the name, size, active data size, percentage of data size that is active, openness, and number of active users for each share.

FROM msu

GET name, size, active_data_size,

(active_data_size*100/size) AS active_data_percent,

isopen, active_user_count;

■ Get the activity for all paths of share, share1, on March 4, 2012 between 9:00 A.M. and 5:00 P.M.

FROM activity

GET path.name, user.name, operation,

formatdate(timestamp, "YYYY/MM/DD HH:mm")

IF path.msu.name = "share1" AND

timestamp >= datetime("2012/03/04 09:00", "YYYY/MM/DD HH:mm")

AND timestamp <= datetime("2012/03/04 17:00",

"YYYY/MM/DD HH:mm");

Since the timestamp column of activity is in epoch format, convert it to a readable format using formatdate().

■ Get a list of all sensitive files from all shares of filer, filer1, sorted by size.

FROM path

GET name, issensitive, size

IF issensitive = 1 AND type = "FILE" AND device.name = "filer1"

SORTBY size DESC;

■ Get a list of all open paths and the reason why they are marked as open.

FROM path

GET name, msu.name, isopen, open_reasons

IF isopen = 1

FORMAT open_reasons AS CSV;

■ Get a list of all open paths and the reason why they are marked as open. Also, list the permissions on each open path.

FROM path

GET name, msu.name, isopen, open_reasons,

permissions.user_trustee.name, permissions.group_trustee.name,

permissions.readable_permission, permissions.isinherited,

permissions.inheriting_path.name

IF isopen = 1

FORMAT permissions AS TABLE permissions

AND open_reasons AS CSV;

■ Get a list of all users, their e-mail and department (custom attributes), and the groups that they belong to.

FROM user

GET name, sid, login, domain, "E-mail", department,

memberof.sid, memberof.name

FORMAT memberof AS TABLE memberof_groups;

■ Get a list of all directories and their owners.

FROM path

GET name, msu.name, owner.user.name, owner.method,

owner.read_count, owner.write_count

IF type = "DIR"

USING owner AS calc_owner("2012-01-01", "2012-06-01",

"YYYY-MM-DD","rw_count, last_modifier");

■ Get a list of all open paths and their inactive users.

FROM path

GET name, msu.name, isopen, inactive_users.name

IF isopen = 1

USING inactive_users AS calc_inactive_users("2012-01-01",

"2012-06-01","YYYY-MM-DD");

■ For each share, get the count of paths that have permissions set on Everyone

FROM permissions

GET msu.name, count(path.id) AS risk_path_count

IF object_type = "DIR" AND group_trustee.name = "Everyone"

AND isinherited = 0

GROUPBY msu.name

SORTBY risk_path_count DESC;

The condition isinherited = 0 ensures that the query returns only the paths that have permissions explicitly defined for Everyone, and not all the paths that simply inherit those permissions.

Chapter 3: Web API Specification

This chapter includes the following topics:

■ Web API specification for generic Collector service

Web API specification for generic Collector service

The web API for the Data Insight generic collector allows web clients to push events for the generic device filers configured in the Data Insight deployment. It also provides a method to add shares for the configured filers.

The web client communicates with the Data Insight Collector node using HTTPS requests. The HTTPS communication is based on one-way SSL authentication. The HTTP server runs with its unique self-signed SSL certificate. The SSL certificate is created on the server when the DataInsightGenericCollector service is configured on it. The authentication is complete when the Data Insight Collector node verifies the identity of the web client.

Data Insight Collector node uses the following mechanism to communicate with the web client:

1. The Data Insight server identifies the client using a login API request.

2. On successful login, the Data Insight server returns an authentication token as the response. The same token is inserted into an HTTP cookie called MATRIX_AUTH, which is valid for 30 minutes. If the login attempt is unsuccessful, an HTTP response code 401 is returned.

3. You must include the authentication token in each subsequent request to the Data Insight server, either in an HTTP request header called MATRIX_AUTH, in a cookie with the same name, or as an HTTP request input parameter with the same name.

4. Each token has an inactivity timeout interval of 30 minutes. The token expires if the client does not send a request for 30 minutes. If the Data Insight server restarts, the client must obtain a new authentication token by using the login API. Data Insight uses the standard HTTP status code 401 to convey that login is required. Data Insight also returns the HTTP status code 401 (Unauthorized) if the client does not have the correct privileges.

5. The user principal against which login is performed can be any valid Data Insight user with the Server Administrator role.

All URLs referenced in the documentation have the following base:

https://<hostname>:<port>/api

where <port> is the port number for the DataInsightGenericCollector service. The default value for port is 8585, and the port number is configurable through the Data Insight Management Console.

Use the following request calls to push events to the Data Insight Collector node and to add the shares that you want Data Insight to monitor:

1. Login

POST /api?function=LOGIN

Request parameters

Name | Description | Comment
username | Data Insight user name |
domain | The domain to which the user belongs |
password | The user's password |
format | Format of the response output | Optional. format=json

Request body

Do not supply a request body for this method.

Response

Login Success

If format=json is specified, then the authentication token is written to the HTTP response output in JSON format.

HTTP/1.1 200 OK

Content-Type: application/json

Status: 200 OK

{"auth_token":"A2360DD2D9BB7284EF8BEB40E8DBA63F"}

If no format is specified, the authentication token is written to the HTTP response output.

If login fails, HTTP status code 401 (Unauthorized) is returned.
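A client typically builds the login request from the parameters above and then reads auth_token from the JSON response. The Python sketch below constructs the request and parses a response body without contacting a server. The host name, user name, domain, and password are placeholders, and the function names are hypothetical. Sending the parameters in the query string is an assumption based on the instruction not to supply a request body. Because the server uses a self-signed certificate, a real call would also need an SSL context that trusts that certificate, which is not shown.

```python
import json
import urllib.parse
import urllib.request

def build_login_request(host, port, username, domain, password):
    """Build (but do not send) the LOGIN request."""
    query = urllib.parse.urlencode({
        "function": "LOGIN",
        "username": username,
        "domain": domain,
        "password": password,
        "format": "json",
    })
    url = "https://{}:{}/api?{}".format(host, port, query)
    return urllib.request.Request(url, method="POST")

def parse_auth_token(response_body):
    """Extract the token from a format=json login response."""
    return json.loads(response_body)["auth_token"]

# Placeholder values; 8585 is the documented default port.
login_request = build_login_request("collector.example.com", 8585,
                                    "admin", "EXAMPLE", "secret")
print(login_request.get_method(), login_request.full_url)

token = parse_auth_token('{"auth_token":"A2360DD2D9BB7284EF8BEB40E8DBA63F"}')
print(token)
```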

2. Upload Events

POST /api?function=COLLECTOR&cmd=upload_events_sqlite&event_type=<type>

Request parameters

Name | Description | Comment
MATRIX_AUTH | Authentication token |
event_type | The type of events in the file that is uploaded to the Collector. | Optional. cifs or nfs

Request body

The request can be an HTTP multi-part request, or the request body can contain the contents of the file.

Response

If the file upload is successful, returns a response with following structure:

HTTP/1.1 200 OK

Content-Type: application/json

Status: 200 OK

{"status_code":<code>,"status_msg":"<msg>"}

Status code 0 indicates success.

On failure, returns status code 500 (Internal Server Error) in case of an unexpected error.

Details of the file to be uploaded

The events file must be a SQLite DB file that has a single table, named events.

Table schema

Column name | Type | Constraints | Description
filer | TEXT | NOT NULL | Filer's address as added to the Data Insight configuration.
opcode | INTEGER | NOT NULL | An integer describing the event operation (for example, READ=3, WRITE=4). Refer to the Protobuf format for a complete set of values.
username | TEXT | | Username of the user for CIFS (optional). UID of the user in case of an NFS event.
domainname | TEXT | | Domain of the user for CIFS (optional). Blank for NFS.
sid | TEXT | | SID of the user for CIFS. Blank in case of NFS.
pathname | TEXT | NOT NULL | Path where the event occurred. Refer to the note below for the format.
renamepath | TEXT | | Applicable in case of a rename event.
type | TEXT | | Type of path (FOLDER=1, FILE=2).
ipaddr | TEXT | | IP address from where the path was accessed (optional).
timestamp | INTEGER | NOT NULL | Timestamp of the event in seconds as UNIX epoch.

CREATE TABLE events (filer TEXT NOT NULL, opcode INTEGER NOT NULL, username TEXT,

domainname TEXT, sid TEXT,

pathname TEXT NOT NULL, renamepath TEXT, type TEXT, ipaddr TEXT,

timestamp INTEGER NOT NULL);

Note: For CIFS events, the SID value is mandatory; user name and domain name are optional.

For NFS events, SID should be blank, username should be the UID, and domain name should be blank.

For CIFS events, pathname should be the UNC path.

For NFS events, the pathname should be the absolute path of the file or the folder.

3. Push events in JSON or Google Protocol Buffers format

POST /api?function=COLLECTOR&cmd=push_events&input_format=<format>

Request parameters

Name | Description | Comment
MATRIX_AUTH | Authentication token |
input_format | The format in which the events are pushed to the Collector. | json or proto

Request body

The request body must contain the events list in the specified format, Google Protocol Buffers or JSON.

Response

Returns a response with following structure:

HTTP/1.1 200 OK

Content-Type: application/json

Status: 200 OK

{"status_code":<code>,"status_msg":"<msg>"}

Status code 0 indicates success.

On failure, returns status code 400 (BAD_REQUEST) for an incorrect input format parameter.

Returns the status code 500 (Internal Server Error) in case of an unexpected error.

Google Protocol Buffers format for pushing events to the Collector

message AuditEventsListMessage {

optional int64 device_id = 1;

optional string device_name = 2;

repeated CifsEventMessage cifs_events = 3;

repeated NfsEventMessage nfs_events = 4;

}

message CifsEventMessage {

required AccessType opcode = 1;

required string unc_path = 2;

optional string rename_path = 3;

required PathType path_type = 4;

optional string sid = 5;

optional string username = 6;

optional string domain = 7;

required uint64 timestamp_msec = 8;

optional string ip_address = 9;

}

message NfsEventMessage {

required AccessType opcode = 1;

required string path = 2;

optional string rename_path = 3;

required PathType path_type = 4;

required int64 uid = 5;

optional int64 gid = 6;

optional string domain = 7;

required uint64 timestamp_msec = 8;

optional string ip_address = 9;

}

enum PathType {

UNKNOWN_PATHTYPE = -1;

FOLDER = 1;

FILE = 2;

}

enum AccessType {

CREATE = 1;

DELETE = 2;

READ = 3;

WRITE = 4;

RENAME = 5;

MKDIR = 8;

RMDIR = 9;

RENAMEDIR = 10;

SECURITY = 18;

SYMLINK = 19;

LINK = 20;

READLINK = 21;

OPEN = 200000;

}

Note: device_name is the name of the filer as added in the Data Insight configuration.

CIFS and NFS events for a filer can be pushed by a single AuditEventsListMessage.

For a CIFS event, SID is mandatory; user name and domain name are optional. For an NFS event, SID should be blank, UID is mandatory, and domain should be blank.

LINK, SYMLINK, and READLINK are specific to NFS events only.

The AccessType parameter for events like permission change or ACL change is SECURITY.

JSON format for pushing events to the Collector

{

"deviceId": <Number>,

"deviceName": <String>,

"cifsEvents": [

<CIFS Event>

],

"nfsEvents": [

<NFS Event>

]

}

<CIFS Event>

{

"opcode": <String>,

"uncPath": <String>,

"renamePath": <String>,

"pathType": <String>,

"sid": <String>,

"username": <String>,

"domain": <String>,

"timestampMsec": <Number>,

"ipAddress": <String>

}

<NFS Event>

{

"opcode": <String>,

"path": <String>,

"renamePath": <String>,

"pathType": <String>,

"uid": <Number>,

"gid": <Number>,

"domain": <String>,

"timestampMsec": <Number>,

"ipAddress": <String>

}

Note: The opcode and pathType fields can take only a specific set of values. Refer to the protobuf enums for the values allowed in each field: enum AccessType for the opcode field and enum PathType for the pathType field.

Example

{

"deviceId": 0,

"deviceName": "10.209.89.3",

"cifsEvents": [

{

"opcode": "RENAME",

"uncPath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\Data1",

"renamePath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\Data2 ",

"pathType": "FOLDER",

"sid": "S-1-5-21-617441397-4198358099-2716562547-104771",

"timestampMsec": 1340003837,

"ipAddress": "172.31.163.29"

},

{

"opcode": "CREATE",

"uncPath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\New Folder",

"pathType": "FOLDER",

"sid": "S-1-5-21-617441397-4198358099-2716562547-104771",

"timestampMsec": 1340003847,

"ipAddress": "172.31.163.29"

}

],

"nfsEvents": [

{

"opcode": "MKDIR",

"path": "\/openldaphome\/DIRU1",

"pathType": "FOLDER",

"uid": 0,

"gid": 0,

"domain": "0",

"timestampMsec": 1339680545

}

]

}
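A request body like the example above can be assembled programmatically. The following Python sketch is one possible way to do it; the helper names are hypothetical, and only the field names are taken from the JSON format in this section. Optional fields that are not supplied are left out of the event, as in the second CIFS event of the example.

```python
import json


def cifs_event(opcode, unc_path, path_type, sid, timestamp_msec,
               rename_path=None, username=None, domain=None, ip_address=None):
    """Build one CIFS event dict, leaving out optional fields that were not given."""
    event = {
        "opcode": opcode,
        "uncPath": unc_path,
        "pathType": path_type,
        "sid": sid,
        "timestampMsec": timestamp_msec,
    }
    optional = {"renamePath": rename_path, "username": username,
                "domain": domain, "ipAddress": ip_address}
    event.update({k: v for k, v in optional.items() if v is not None})
    return event


def audit_events_body(device_name, cifs_events=(), nfs_events=(), device_id=0):
    """Serialize the events for one filer into the JSON request body."""
    return json.dumps({
        "deviceId": device_id,
        "deviceName": device_name,
        "cifsEvents": list(cifs_events),
        "nfsEvents": list(nfs_events),
    })
```

Because json.dumps escapes backslashes, a UNC path can be passed as an ordinary Python string and is written with doubled backslashes in the body, as in the example.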

4. Add shares

POST /api?function=COLLECTOR&cmd=add_shares&format=<format>

Request parameters

Name          Description                     Comment

MATRIX_AUTH   Authentication token

format        Format of the response output   proto|json

Request body

Supply JSON or Google Protocol Buffers formatted list of shares as input.

Response

On success, HTTP status code 200 is returned.

On failure to add shares, HTTP status code 500 (Internal server error) is returned.

ProtoBuf format for adding shares

message SharesListMessage

{

optional int64 device_id = 1;

optional string device_name = 2;

repeated ShareMessage shares = 3;

}

message ShareMessage

{

enum ShareType

{

CIFS = 0;

NFS = 1;

}

optional string shareName = 1;

optional string sharePath = 2;

optional ShareType shareType = 3 [default = CIFS];

}

JSON format for adding shares

Shares list

{

"deviceId": {number},

"deviceName": {string},

"shares": [

]

}

Share

{

"shareName": {string},

"sharePath": {string},

"shareType": {string}

}

Note: The shareType parameter accepts only a specific set of values. For the possible values, refer to enum ShareType in the Protobuf definition.

Example

{

"deviceId": 0,

"deviceName": "10.209.111.193",

"shares": [

{

"shareName": "/openldaphome",

"sharePath": "/openldaphome",

"shareType": "NFS"

},

{

"shareName": "/nfstest",

"sharePath": "/nfstest",

"shareType": "NFS"

}

]

}
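As an illustration, the add_shares request could be prepared in Python as follows. This is a sketch under stated assumptions, not a definitive client: the function name and base URL are hypothetical, and passing MATRIX_AUTH in the query string and sending a JSON Content-Type header are assumptions, because this section lists MATRIX_AUTH as a request parameter without saying how it is transmitted.

```python
import json
import urllib.parse
import urllib.request


def build_add_shares_request(base_url, auth_token, device_name, shares, device_id=0):
    """Prepare (but do not send) a POST request for cmd=add_shares in JSON format."""
    query = urllib.parse.urlencode({
        "function": "COLLECTOR",
        "cmd": "add_shares",
        "format": "json",
        "MATRIX_AUTH": auth_token,  # assumption: token sent as a query parameter
    })
    body = json.dumps({
        "deviceId": device_id,
        "deviceName": device_name,
        "shares": shares,
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api?" + query,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},  # assumption
    )
```

The request can then be sent with urllib.request.urlopen. A 200 response indicates success; a 500 response surfaces as urllib.error.HTTPError.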

Note: Data Insight scans the shares that are added only when the user enables scanning and provides the Scanner credentials for the filer.

Chapter 4: Creating custom scripts for remediation actions

This chapter includes the following topics:

■ About custom scripts

About custom scripts

You can use custom scripts to extend Data Insight functionality. You can use the custom scripts to perform the following actions:

■ To create a remediation ticket.

■ To apply remediation actions based on Data Insight recommendations.

■ To define actions to manage data.

Data is supplied to the scripts via command line arguments. Arguments vary based on what the script is used for. The scripts can be created in the .exe, .bat, .pl, or .vbs formats.

Data Insight handles custom scripts differently depending on the type of operation. The following list shows how Data Insight handles various types of scripts:

■ Custom scripts to create a remediation request.

Data Insight invokes the script by passing in two arguments: custom_script.pl file_name <path_to_file_with_recommendation>

For example, ticketing.pl file_name C:\DataInsight\data\workflow\tmp\PR_ticketing_1.txt

The second argument is the full path to a text file containing the permission recommendations. Each line in the text file contains one action and the required variables to perform that action. Lines are separated by a newline character. The script should read each line of the input file and open one or more remediation tickets as needed. If the script exits with a non-zero exit code, the action is considered to have failed. Each line in the file has the following format:

OP:<OPCODE> PARAM:VALUE; PARAM:VALUE; ...

For example, OP:REMOVE_ACE USER:[email protected]; PATH:\\fileserver1\share1\path;

Refer to the next section for possible values for opcodes and their parameters.

■ Custom scripts to apply permission recommendations.

You can specify custom scripts to directly commit changes to Active Directory and CIFS file systems. You need to specify one script to make changes to Active Directory and one script to make changes to CIFS permissions. The recommendation is passed to the custom script as command line arguments in the following format:

script.pl OP:<OPCODE> PARAM:VALUE PARAM:VALUE ...

The exact PARAM and VALUE depend on the opcode being passed. If the script exits with a non-zero code, Data Insight considers the operation to have failed. For this release, Data Insight recommendations only consist of removing user or group ACEs for paths, and removing users or members from AD groups. More operations will be supported in future releases.

For example, AD.pl OP:DEL_GROUP_MEMBER AD_USER:user@domain TARGET_GROUP:group@domain;

Data Insight supplies the following opcode and arguments for Active Directory remediation:

OP:DEL_GROUP_MEMBER AD_GROUP:<group@domain>|AD_USER:<user@domain> TARGET_GROUP:<target_group@domain>

Data Insight supplies the following opcode and arguments for CIFS remediation:

OP:REMOVE_ACE GROUP:<group@domain>|USER:<user@domain> PATH:<unc_path>

■ Custom scripts to define specific tasks to manage data.

Data Insight invokes the script directly, passing the operation and variables as part of the command line arguments. Path is the mandatory argument passed to the script. Other parameters passed to the script depend on how the Custom Action has been configured in the Management Console. The format of the command line arguments passed to the custom script is:

script.pl path:<path> prop:val prop:val ...

For example, archive_files.pl path:\\filer\share\path.txt size:25KB
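All three script types receive NAME:VALUE tokens, either as lines in the recommendation file or as command line arguments. The following Python sketch shows one way to parse them. The function names are hypothetical, and the sketch assumes that values do not themselves contain a semicolon. Each token is split on its first colon only, so values such as UNC paths or C:\ paths are preserved.

```python
def parse_fields(tokens):
    """Turn tokens such as 'OP:REMOVE_ACE' or 'size:25KB' into a dict."""
    fields = {}
    for token in tokens:
        # Drop surrounding whitespace and the trailing ';' seen in the examples.
        token = token.strip().rstrip(";").strip()
        if not token:
            continue
        name, sep, value = token.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError("expected NAME:VALUE, got %r" % token)
        fields[name] = value
    return fields


def parse_recommendation_line(line):
    """Parse one 'OP:<OPCODE> PARAM:VALUE; PARAM:VALUE; ...' line from the file."""
    op_token, _, rest = line.strip().partition(" ")
    return parse_fields([op_token] + rest.split(";"))
```

A script invoked with command line arguments could call parse_fields(sys.argv[1:]) and exit with a non-zero code on any error, so that Data Insight treats the operation as failed.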

Data Insight supports the following properties that can be passed to the custom scripts:

Properties         Format

size               NNN KB|MB|GB. E.g. 34 KB

size_on_disk       NNN KB|MB|GB. E.g. 34 KB

created_by         user@domain. SID if user name cannot be resolved

created_on         milliseconds since Jan 1st 1970

last_modified_by   user@domain. SID if user name cannot be resolved

last_modified_on   milliseconds since Jan 1st 1970

last_accessed_by   user@domain. SID if user name cannot be resolved

last_accessed_on   milliseconds since Jan 1st 1970

data_owner         user@domain. SID if user name cannot be resolved

custodian          user@domain. SID if user name cannot be resolved. Multiple custodians are comma-separated
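A custom script usually needs to convert these property strings into native values. The following Python sketch covers the size and timestamp formats listed above. The function names are hypothetical, and treating 1 KB as 1024 bytes is an assumption, because this guide does not state which unit base Data Insight uses.

```python
import re
from datetime import datetime, timezone

# Assumption: binary units (1 KB = 1024 bytes).
_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def size_to_bytes(text):
    """Convert 'NNN KB|MB|GB' (with or without a space) to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(KB|MB|GB)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError("unrecognized size: %r" % text)
    return int(float(match.group(1)) * _UNITS[match.group(2).upper()])


def millis_to_datetime(text):
    """Convert milliseconds since Jan 1st 1970 to a UTC datetime."""
    return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc)


def split_custodians(text):
    """Split the comma-separated custodian property into a list of names or SIDs."""
    return [item.strip() for item in text.split(",") if item.strip()]
```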

For detailed information about how to use custom scripts for data and permission remediation, see the Symantec Data Insight Administrator's Guide.

Chapter 5: Data Inventory Report schema

This chapter includes the following topics:

■ Data Inventory report schema

Data Inventory report schema

The Data Inventory Report is used to extract information about paths from the Data Insight index. The output of this report is a SQLite database, which can be used for post-processing as needed. When configuring a report of this type, you can choose to have the output database copied to an external location where you plan to post-process the output.

file_inventory table

In this table, there is one row for each matching file that is found in the specified index dbs.

CREATE TABLE file_inventory (

xid INTEGER,

sid TEXT,

user_id INTEGER,

owner_account TEXT,

displayname TEXT,

owner_method TEXT,

bu_name TEXT,

bu_owner TEXT,

filer TEXT,

share TEXT,

dfs_server TEXT,

dfs_share TEXT,

dfs_path TEXT,

fid INTEGER,

path TEXT,

msu_type INTEGER,

interval INTEGER,

sensitive INTEGER,

msu_id INTEGER,

read_count INTEGER,

write_count INTEGER,

file_size INTEGER,

atime INTEGER,

ctime INTEGER,

mtime INTEGER,

fs_sid TEXT);

■ The xid column can be ignored, and should always be 1.

■ The sid is typically the Windows SID of the calculated owner of the file.

■ The owner_method column indicates the owner method that Data Insight used to calculate the owner.

■ The user_id is the foreign key into the fileuser table of the current version of the users.db stored in the DataInsight\Data\users folder. This is used for debug purposes only.

■ The owner_account, displayname, bu_name and bu_owner columns are other columns from the fileuser table.

■ The filer, share, path, dfs_server, dfs_share and dfs_path columns combine to give the path to the file. The fid column is the foreign key into the fentry table of the latest version of the index.db for this share. fid is used for debug purposes only.

■ The msu_type is an integer value describing the type of share. There are four possible values:

■ 1 – CIFS

■ 2 – SharePoint

■ 3 – NFS

■ 8 – DFS

■ The interval column is the foreign key into the intervals table below, based on the last access time of the file.

■ The msu_id is the foreign key into the msu table of the latest version of the config.db stored in the DataInsight\Data\conf folder.

■ The read_count and write_count columns are the aggregate number of audit events of each type over the total time period specified for this run of the report.

■ The file_size is the logical file size from the file system. The atime, ctime, and mtime columns are metadata for the file, also pulled from the file system.

■ The fs_sid is the SID of the file system “owner” value from the file system metadata.
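Because the report output is a SQLite database, the file_inventory table can be post-processed with any SQLite client. The following Python sketch totals files and bytes per calculated owner and translates msu_type using the four values listed above. The function name and the sample rows are made up for illustration, and the demo table contains only the columns that the query needs.

```python
import sqlite3

# msu_type values as listed in this section.
MSU_TYPES = {1: "CIFS", 2: "SharePoint", 3: "NFS", 8: "DFS"}


def size_by_owner(conn):
    """Return (owner_account, share type, file count, total bytes), largest first."""
    rows = conn.execute(
        "SELECT owner_account, msu_type, COUNT(*), SUM(file_size) "
        "FROM file_inventory GROUP BY owner_account, msu_type "
        "ORDER BY SUM(file_size) DESC"
    ).fetchall()
    return [(owner, MSU_TYPES.get(msu_type, "unknown"), count, total)
            for owner, msu_type, count, total in rows]


# Demo on an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_inventory "
             "(owner_account TEXT, msu_type INTEGER, file_size INTEGER)")
conn.executemany("INSERT INTO file_inventory VALUES (?, ?, ?)",
                 [("alice@corp", 1, 100), ("alice@corp", 1, 50), ("bob@corp", 3, 70)])
for row in size_by_owner(conn):
    print(row)
```

Against a real report, the connection would instead be opened on the output database file that the report produced.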

lob table

This table consists of a list of distinct Lines of Business (LOBs). Other tables reference this table through foreign keys.

CREATE TABLE lob (

lob_id INTEGER PRIMARY KEY,

lob_name TEXT);

user_lob table

This table gives the mapping from users to the associated LOBs.

CREATE TABLE user_lob (

user_id INTEGER PRIMARY KEY,

lob_id INTEGER);

user_totals table

This table gives the total numbers of files, sensitive files, etc. for each user. In the final output, the msu_id column is displayed as empty. The user_id is the foreign key into the fileuser table of the current version of the users.db stored in the DataInsight\Data\users folder.

CREATE TABLE user_totals (

user_id INTEGER,

msu_id INTEGER,

total_files INTEGER,

total_bytes INTEGER,

sensitive_files INTEGER,

sensitive_bytes INTEGER);

user_interval_totals table

This table breaks out the information from the user_totals table over each interval specified from the input database. The interval_id is a foreign key to the intervals table.

CREATE TABLE user_interval_totals (

user_id INTEGER,

msu_id INTEGER,

interval_id INTEGER,

total_files INTEGER,

total_bytes INTEGER,

sensitive_files INTEGER,

sensitive_bytes INTEGER);

lob_totals table

Based on the mapping specified in the user_lob table, this table gives the total numbers for each LOB. In the final output, the msu_id column will be empty.

CREATE TABLE lob_totals (

lob_id INTEGER PRIMARY KEY,

msu_id INTEGER,

total_files INTEGER,

total_bytes INTEGER,

sensitive_files INTEGER,

sensitive_bytes INTEGER);
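The lob and lob_totals tables are related through lob_id, so a readable per-LOB summary takes a single join. The following Python sketch uses the two table definitions from this chapter. The function name and the inserted rows are made-up sample data.

```python
import sqlite3


def lob_summary(conn):
    """Return (lob_name, total_files, total_bytes, sensitive_files), largest LOB first."""
    return conn.execute(
        "SELECT l.lob_name, t.total_files, t.total_bytes, t.sensitive_files "
        "FROM lob_totals AS t JOIN lob AS l ON l.lob_id = t.lob_id "
        "ORDER BY t.total_bytes DESC"
    ).fetchall()


# Demo on an in-memory database; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lob (lob_id INTEGER PRIMARY KEY, lob_name TEXT);
    CREATE TABLE lob_totals (lob_id INTEGER PRIMARY KEY, msu_id INTEGER,
        total_files INTEGER, total_bytes INTEGER,
        sensitive_files INTEGER, sensitive_bytes INTEGER);
    INSERT INTO lob VALUES (1, 'Finance'), (2, 'Engineering');
    INSERT INTO lob_totals VALUES (1, NULL, 10, 5000, 2, 900),
                                  (2, NULL, 40, 90000, 0, 0);
""")
for row in lob_summary(conn):
    print(row)
```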

lob_interval_totals table

This table breaks out the information from the lob_totals table over each interval specified from the input database. The interval_id is a foreign key into the intervals table.

CREATE TABLE lob_interval_totals (

lob_id INTEGER,

msu_id INTEGER,

interval_id INTEGER,

total_files INTEGER,

total_bytes INTEGER,

sensitive_files INTEGER,

sensitive_bytes INTEGER);

intervals table

This table gives the beginning and end of each interval as specified in the input database. The beginning and end times are specified as epoch numbers. For example, the time 0 would be midnight on Jan 1, 1970, and each higher number is one second after that.

CREATE TABLE IF NOT EXISTS intervals(

interval INTEGER, -- 4 => most recent

-- 0 => before interval

start INTEGER, -- start month of interval

end INTEGER); -- end month of interval
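Since file_inventory.interval is a foreign key into this table, files can be counted per interval with a join. The following Python sketch is illustrative: the function name and the sample rows are made up, and the demo file_inventory table contains only the two columns that the query needs. The end column is written in double quotes because END is also an SQL keyword.

```python
import sqlite3


def files_per_interval(conn):
    """Return (interval, start, end, file count) for every interval, oldest first."""
    return conn.execute(
        'SELECT i.interval, i.start, i."end", COUNT(f.fid) '
        "FROM intervals AS i "
        "LEFT JOIN file_inventory AS f ON f.interval = i.interval "
        'GROUP BY i.interval, i.start, i."end" ORDER BY i.interval'
    ).fetchall()


# Demo on an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE intervals (interval INTEGER, start INTEGER, "end" INTEGER);
    CREATE TABLE file_inventory (fid INTEGER, interval INTEGER);
    INSERT INTO intervals VALUES (0, 0, 1000), (4, 1000, 2000);
    INSERT INTO file_inventory VALUES (1, 4), (2, 4);
""")
for row in files_per_interval(conn):
    print(row)
```

The LEFT JOIN keeps intervals that contain no files, so they appear with a count of 0.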

msu_info table

This table copies the data from the Dashboard database to specify if the msu is open. The msu_id column is a foreign key into the msu table of the latest version of the config.db stored in the DataInsight\Data\conf folder.

CREATE TABLE msu_info (

msu_id INTEGER PRIMARY KEY,

is_open INTEGER);

dashboard_info table

This table is similar to the msu_info table in that it copies information from the latest version of the Dashboard database into the report output database. There may be a slight mismatch in the numbers here versus the totals from the user_totals table. This difference is due to the difference in the time at which each set of numbers is calculated.

CREATE TABLE dashboard_info (

msu_id INTEGER PRIMARY KEY,

dir_files INTEGER,

dir_sens_files INTEGER,

dash_files INTEGER,

dash_sens_files INTEGER);

Report configuration parameters

One important setting for the Data Inventory report is the separate_dbs configuration setting. It forces the report to start a new db file after the specified number of rows have been inserted into the detail table: once that many rows have been inserted into the details section of the report output database, the db is closed and renamed, and a new db is started. If the output file name specified to the report process is report_output.db, then the separate_dbs parameter creates files named report_output.db.0, report_output.db.1, etc. every time the limit specified in the setting is reached. The current db file being written to is always report_output.db, and this file is where all of the summary data is written.

When merge_rpt runs, it will no longer copy rows from the file_inventory table into the final output db. It will only copy rows from the user_totals, etc. tables, and then create the lob_totals, etc. tables. As with the log_level setting, report.separate_dbs is checked first, and if it is not found, the separate_dbs setting is checked.

You need to set this property for each Indexer node including the Management Server node. For example, if the ID of your Indexer node is 3, issue the following commands on your Management Server to set these properties for each node:

configdb -o -T node -k <nodeid> -J report.separate_dbs -j true

configdb -o -T node -k <nodeid> -J report.chunk_size -j 1000000
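When the output is split this way, a post-processing script has to visit every chunk file as well as the current db. The following Python sketch lists them in numeric order, based only on the naming described above (report_output.db.0, report_output.db.1, and so on). The function name is hypothetical, and the sketch assumes that the chunk files sit next to the current db file.

```python
from pathlib import Path


def report_db_files(output_db):
    """Return the numbered chunk files in numeric order, then the current db file."""
    output_db = Path(output_db)
    chunks = []
    for candidate in output_db.parent.glob(output_db.name + ".*"):
        suffix = candidate.name[len(output_db.name) + 1:]
        if suffix.isdigit():  # skip unrelated files such as report_output.db.bak
            chunks.append((int(suffix), candidate))
    return [path for _, path in sorted(chunks)] + [output_db]
```

Sorting on the integer suffix keeps report_output.db.10 after report_output.db.2, which a plain string sort would not.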
