product introductiondata sources, and big data etl processing. large-scale log analysis gaming...

33
Data Lake Insight Product Introduction Issue 01 Date 2020-08-06 HUAWEI TECHNOLOGIES CO., LTD.

Upload: others

Post on 16-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Data Lake Insight

Product Introduction

Issue 01

Date 2020-08-06

HUAWEI TECHNOLOGIES CO., LTD.

Page 2: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Copyright © Huawei Technologies Co., Ltd. 2020. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without priorwritten consent of Huawei Technologies Co., Ltd. Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.All other trademarks and trade names mentioned in this document are the property of their respectiveholders. NoticeThe purchased products, services and features are stipulated by the contract made between Huawei andthe customer. All or part of the products, services and features described in this document may not bewithin the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,information, and recommendations in this document are provided "AS IS" without warranties, guaranteesor representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in thepreparation of this document to ensure accuracy of the contents, but all statements, information, andrecommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.Address: Huawei Industrial Base

Bantian, LonggangShenzhen 518129People's Republic of China

Website: https://www.huawei.com

Email: [email protected]

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. i

Page 3: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Contents

1 What Is DLI?..............................................................................................................................1

2 Advantages of Serverless DLI Compared with Self-Built Hadoop............................... 3

3 Application Scenarios............................................................................................................. 5

4 Basic Concepts........................................................................................................................13

5 Constraints and Limitations on Using DLI.......................................................................15

6 Permissions Management................................................................................................... 16

7 Related Services.....................................................................................................................21

8 Quotas......................................................................................................................................24

9 Billing....................................................................................................................................... 25

Data Lake InsightProduct Introduction Contents

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. ii

Page 4: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

1 What Is DLI?

Data Lake Insight (DLI) is a Serverless big data compute and analysis service thatis fully compatible with Apache Spark and Apache Flink ecosystems and supportsbatch streaming. With multi-model engines, enterprises can use SQL statementsor programs to easily complete batch processing, stream processing, in-memorycomputing, and machine learning of heterogeneous data sources.

Description

You can query and analyze heterogeneous data sources such as CloudTable, RDS,and DWS on the cloud using multiple access methods, such as visualized interface,RESTful API, JDBC, ODBC, and Beeline. The data format is compatible with fivemainstream data formats: CSV, JSON, Parquet, Carbon, and ORC.

● Three basic functions– You can use standard SQL statements to query in SQL jobs. For details,

see Data Lake Insight SQL Syntax Reference.– Flink jobs support Flink SQL online analysis. Aggregation functions such

as Window and Join, geographic functions, and CEP functions aresupported. SQL is used to express service logic, facilitating serviceimplementation. For details, see Data Lake Insight SQL SyntaxReference.

– For spark jobs, fully-managed Spark computing can be performed. Youcan submit computing tasks through interactive sessions or in batchmode to analyze data in the fully managed Spark queues. For details, seeData Lake Insight API Reference.

● Federated analysis of heterogeneous data sources– Spark datasource connection: Data sources such as CloudTable, DWS,

RDS, and CSS can be accessed through DLI. For details, see Data LakeInsight User Guide.

– Interconnection with multiple cloud services is supported in Flink jobs toform a rich stream ecosystem. The DLI stream ecosystem consists of thecloud service ecosystems and open source ecosystems.

▪ Cloud service ecosystem: DLI can interconnect with other services inFlink SQL. You can directly use SQL statements to read and write

Data Lake InsightProduct Introduction 1 What Is DLI?

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 1

Page 5: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

data from various cloud services, such as DIS, OBS, CloudTable, MRS,RDS, SMN, and DCS.

▪ Open-source ecosystems: After connections to other VPCs areestablished through datasource connections, you can access all datasources and output targets (such as Kafka, HBase, and Elasticsearch)supported by Flink and Spark in your exclusive DLI queue.

For details, see the Data Lake Insight Developer Guide.● BI tool

– Interconnects with Yonghong BI to implement data analysis. For details,see the Data Lake Insight Developer Guide.

– Interconnects with the Tableau Desktop to implement data analysis. Fordetails, see the Data Lake Insight Developer Guide.

● Support of geometry query. For details, see the Data Lake Insight DeveloperGuide.

Accessing DLIA web-based service management platform is provided. You can access DLI usingthe management console or HTTPS-based application programming interfaces(APIs), or connect to the DLI server through a client such as JDBC or ODBC.

● Access through the management consoleYou can submit SQL, Spark, or Flink jobs on the DLI management console. Ifyou have registered with the public cloud, log in to the management consoleand choose Service List > EI Enterprise Intelligence > Data Lake Insight.

● Using APIsIf you need to integrate DLI on the public cloud into a third-party system forsecondary development, use APIs to access the service. For details, see DataLake Insight API Reference.

● JDBC or ODBCDLI can use JDBC or ODBC to connect to the server for data query. For details,see the Data Lake Insight Developer Guide.

● BeelineJobs can be submitted using Beeline. For details, see the Data Lake InsightDeveloper Guide.

● Spark-submitJobs can be submitted using Spark-submit. For details, see the Data LakeInsight Developer Guide.

Data Lake InsightProduct Introduction 1 What Is DLI?

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 2

Page 6: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

2 Advantages of Serverless DLI Comparedwith Self-Built Hadoop

DLI seamlessly migrates offline Spark applications to the cloud. Leveraging theopen-source Apache Spark and Flink ecosystems and APIs, DLI makes migrationeasier. DLI provides a high-scalable framework integrating batch and streamprocessing, allowing you to handle data analysis requests in countless scenarioswith ease. With a deeply optimized kernel and architecture, DLI delivers 100-foldperformance improvement compared with the MapReduce model. Your analysis isbacked by an industry-vetted 99.95% SLA. The storage and compute separationarchitecture enables flexible configuration of storage and compute resources ondemand, improving resource utilization and reducing costs.

Compared with self-built Hadoop clusters, Serverless DLI has the followingadvantages:

Table 2-1 Advantages comparison

Advantage

Dimension

Data Lake Insight Self-built Hadoop system

Lowcost

Capitalcost

Billing is based on the actualamount of data scanned orused CUH. Saving up to 50%costs.

Long-term resourceoccupation, causing severeresource waste and highcosts

Elasticscalability

Container-based Kubernetes,intelligent elastic scaling

N/A

O&Mfree

O&Mcost

Out-of-the-box, Serverlessarchitecture

Strong technical capabilitiesare required forconfiguration and O&M

Highavailability

Cross-AZ DR N/A

Data Lake InsightProduct Introduction

2 Advantages of Serverless DLI Compared with Self-Built Hadoop

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 3

Page 7: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Advantage

Dimension

Data Lake Insight Self-built Hadoop system

Easytouse

Learningcost

Low. The optimizationparameters are standardizedbased on 10 years' experiencein thousands of projects. Inaddition, DLI provides a GUIfor intelligent optimization.

High. Hundreds of tuningparameters need to belearned.

Supported datasources

● Cloud: OBS, RDS, DWS, CSS,MongoDB, and Redis

● On-premises: self-builtdatabases, MongoDB, andRedis

● Cloud: OBS● On-premises: HDFS

Ecosystemcompatibility

DLV, Tableau, Superset,Yonghong BI, and Fanruan BI

Big data ecosystem tool

Customimage

Supported. Dependencies canbe added as required to meetservice diversity requirements.

Not supported.

Workflowscheduling

Scheduling through Data LakeFactory (DLF) in DAYU

Self-built scheduling tools,such as Airflow

Multipleenterprise-leveltenants

Table-based permissionmanagement, providingcolumn level permissiongranularity.

File-based permissionmanagement

Highperformance

Performance

Higher performance with in-depth software and hardwareoptimization

Performance is the same asthat of Hadoop open-source versions

Data Lake InsightProduct Introduction

2 Advantages of Serverless DLI Compared with Self-Built Hadoop

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 4

Page 8: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

3 Application Scenarios

DLI is applicable to large-scale log analysis, federated analysis of heterogeneousdata sources, and big data ETL processing.

Large-scale Log Analysis● Gaming operation data analysis

Different departments of a game company analyze daily new logs via thegame data analysis platform to obtain required metrics and make decisionaccording to the obtained metric data. For example, the operationdepartment obtains required metric data, such as new players, active players,retention rate, churn rate, and payment rate, through the platform to learnthe current game status and determine follow-up actions. The placementdepartment obtains the channel sources of new players and active playersthrough the platform to determine the platforms for placement in the nextcycle.

● Advantages– Efficient Spark programming model: DLI uses Spark Streaming to directly

ingest data from DIS and perform preprocessing such as data cleaning.You only need to edit the processing logic, without the need to payattention to the multi-thread model.

– Easy to use: You can use standard SQL statements to compile metricanalysis logic without paying attention to the complex distributedcomputing platform.

– Pay-per-use: Log analysis is scheduled periodically based on the time-critical requirements. There is a long idle period between each twoscheduling operations. DLI adopts the pay-per-use billing mode, whichsaves the cost by more than 50% compared with the exclusive queuemode. DLI only bills you for the resources used for scheduling.

● The following services are recommended in this scenario:OBS, DIS, DWS, RDS

Data Lake InsightProduct Introduction 3 Application Scenarios

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 5

Page 9: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Figure 3-1 Gaming operation data analysis

Federated Analysis of Heterogeneous Data Sources● Digital service transformation for car company

Data Lake InsightProduct Introduction 3 Application Scenarios

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 6

Page 10: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

In the face of new competition pressures and changes in travel services, carcompanies build the IoV cloud platform and IVI OS to streamline Internetapplications and vehicle use scenarios, completing digital servicetransformation for car companies. This delivers better travel experience forvehicle owners, increases the competitiveness of car companies, and promotessales growth. For example, DLI can be used to collect and analyze dailyvehicle metric data (such as batteries, engines, tire pressure, and airbags), andgive feedback on maintenance suggestions to vehicle owners in time.

● Advantages– No need for migration in multi-source data analysis: RDS stores the basic

information about vehicles and vehicle owners, CloudTable stores real-time vehicle location and health status information, and DWS storesperiodic metric statistics. DLI allows federated analysis on data frommultiple sources without data migration.

– Tiered data storage: Car companies need to retain all historical data tosupport auditing and other services that requiring infrequent data access.Warm and cold data is stored in OBS and frequently accessed data isstored in CloudTable and DWS, reducing the overall storage cost.

– Rapid and agile alarm triggering: There are no special requirements forthe CPU, memory, hard disk space, and bandwidth.

● The following services are recommended in this scenario:DIS, CDM, OBS, DWS, RDS, and CloudTable

Data Lake InsightProduct Introduction 3 Application Scenarios

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 7

Page 11: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Figure 3-2 Digital service transformation for car company

Big Data ETL● Carrier big data analysis

Carriers typically require petabytes, or even exabytes of data storage, for bothstructured (base station details) and unstructured (messages andcommunications) data. They need to be able to access the data with

Data Lake InsightProduct Introduction 3 Application Scenarios

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 8

Page 12: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

extremely low data latency. Extracting value from this data efficiently is amajor challenge. DLI provides multi-mode engines such as batch processingand stream processing to break down data silos and perform unified dataanalysis.

● Advantages– Big Data ETL: You can enjoy TB to EB-level data governance capabilities

to quickly perform ETL processing on massive carrier data. Distributeddatasets are provided for batch processing.

– High Throughput, Low Latency: DLI uses the Dataflow model of ApacheFlink, a real-time computing framework. High-performance computingresources are provided to consume data from your created Kafka, DMSKafka, and MRS Kafka clusters. A single CU processes 1,000 to 20,000messages per second.

– Fine-grained Permissions Management: Your company may havenumerous departments, where data needs to be shared and isolated.Using DLI, you can apply for resource queues by tenant to isolatecomputing resources (CPUs and memory), ensuring job SLA. DLI supportstable- or column-level data permission control, allowing for secure accessfor different departments.

● The following services are recommended in this scenario:OBS, DIS, and DAYU

Data Lake InsightProduct Introduction 3 Application Scenarios

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 9

Page 13: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Figure 3-3 Carrier big data analysis

Geographic Big Data Analysis● Geographic big data analysis

Geographic big data usually has a large data volume. For example, globalsatellite remote sensing images might take up to petabytes of data. Besides,there are various types of data, including structured remote sensing imagegrid data, vector data, unstructured spatial location data, and 3D modelingdata. For this scenario, efficient mining tools or methods are essential.

● Advantages– Spatial Data Analysis Operators: With full-stack Spark capabilities and

rich Spark spatial data analysis Spatial Data Analysis Operators With full-stack Spark capabilities and rich Spark spatial data analysis algorithmoperators, DLI delivers comprehensive support for real-time processing ofdynamic streaming data with location attributes and offline batchprocessing. DLI can handle massive data, including structured remotesensing image data, unstructured 3D modeling, and laser point clouddata.

Data Lake InsightProduct Introduction 3 Application Scenarios

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 10

Page 14: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

– CEP SQL: DLI delivers geographical location analysis functions to analyzegeospatial data in real time. You can fulfill yaw detection and geo-fencing through SQL statements.

– Big Data Processing: DLI allows you to quickly migrate remote sensingimage data at the TB to EB scale to the cloud and perform image dataslicing to offer resilient distributed datasets (RDDs) for distributed batchcomputing.

● The following services are recommended in this scenario:DIS, CDM, DES, OBS, RDS, and CloudTable

Data Lake InsightProduct Introduction 3 Application Scenarios

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 11

Page 15: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Figure 3-4 Geographic big data analysis

Data Lake InsightProduct Introduction 3 Application Scenarios

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 12

Page 16: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

4 Basic Concepts

Tenant

DLI allows multiple organizations, departments, or applications to share resources.A logical entity, also called a tenant, is provided to use diverse resources andservices. A mode involving different tenants is called multi-tenant mode. A tenantcorresponds to a company. Multiple sub-users can be created under a tenant andare assigned different permissions.

Project

A project is a collection of resources accessible to services. In a region, an accountcan create multiple projects and assign different permissions to different projects.Resources used for different projects are isolated from one another. A project caneither be a department or a project team.

Database

The basic concept and usage of databases in DLI are similar to those of the Oracledatabase. The database is the basic unit of DLI management permissions, whichare granted on a per database basis.

In DLI, tables and databases are metadata containers that define underlying data.The metadata in the table shows the location of the data and specifies the datastructure, such as the column name, data type, and table name. A database is acollection of tables.

Metadata

Metadata is used to define data types. It describes information about the data,including the source, size, format, and other data features. In database fields,metadata interprets data content in the data warehouse.

Computing Resource

Queues in DLI are computing resources, which are the basis for using DLI. SQLjobs and Spark jobs performed by users require computing resources.

Data Lake InsightProduct Introduction 4 Basic Concepts

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 13

Page 17: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Storage ResourceStorage resources in DLI are used to store data of databases and DLI tables. Toimport data to DLI, storage resources must be prepared. The storage resourcesreflect the volume of data you are allowed to store in DLI.

SQL JobSQL job refers to the SQL statement executed in the SQL job editor. It serves asthe execution entity used for performing operations, such as importing andexporting data, in the SQL job editor.

Spark JobSpark jobs are those submitted by users through visualized interfaces and RESTfulAPIs. Full-stack Spark jobs are allowed, such as Spark Core, Dataset, Streaming,MLlib, and GraphX jobs.

CUCompute unit (CU) is the pricing unit of queues. 1 CU consists of 1 vCPU and 4 GBmemory. The computing capabilities of queues vary with queue specifications. Thehigher the specifications, the stronger the computing capability.

OBS Table, DLI Table, and CloudTable TableThe table type indicates the storage location of data.

● OBS table indicates that data is stored in the OBS bucket.● DLI table indicates that data is stored in the internal table of DLI.● CloudTable table indicates that data is stored in CloudTable.

You can create a table on DLI and associate the table with other services toachieve querying data from multiple data sources.

Data Lake InsightProduct Introduction 4 Basic Concepts

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 14

Page 18: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

5 Constraints and Limitations on UsingDLI

When using DLI, pay attention to the following restrictions:

● Recommended browsers for logging in to DLI:– Google Chrome 43.0 or later– Mozilla Firefox 38.0 or later– Internet Explorer 9.0 or later

● Restrictions on the SQL syntax:– You are not allowed to specify a storage path when creating a DLI table

using SQL statements.● Restrictions on the SQL statement length:

– Each SQL statement should contain less than 500,000 characters.– The size of each SQL statement must be less than 1 MB.

Data Lake InsightProduct Introduction 5 Constraints and Limitations on Using DLI

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 15

Page 19: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

6 Permissions Management

If you need to assign different permissions to employees in your enterprise toaccess your DLI resources, IAM is a good choice for fine-grained permissionsmanagement. IAM provides identity authentication, permissions management,and access control, helping you securely access your HUAWEI CLOUD resources.

With IAM, you can use your HUAWEI CLOUD account to create IAM users for youremployees, and assign permissions to the users to control their access to specificresource types. For example, some software developers in your enterprise need touse DLI resources but must not delete them or perform any high-risk operations.To achieve this result, you can create IAM users for the software developers andgrant them only the permissions required for using DLI resources.

If the HUAWEI CLOUD account has met your requirements, you do not need tocreate an independent IAM user for permission management. Then you can skipthis section. This will not affect other functions of DLI.

IAM can be used free of charge. You pay only for the resources in your account.For more information about IAM, see the IAM Service Overview.

DLI PermissionsBy default, new IAM users do not have permissions assigned. You need to add theusers to one or more groups, and attach permissions policies or roles to thesegroups. The users then inherit permissions from the groups to which they areadded. After authorization, the users can perform specified operations on DLIbased on the permissions.

DLI is a project-level service deployed and accessed in specific physical regions. Toassign DLI permissions to a user group, specify the scope as region-specificprojects and select projects for the permissions to take effect. If All projects isselected, the permissions will take effect for the user group in all region-specificprojects. When accessing DLI, the users need to switch to a region where theyhave been authorized to use cloud services.

Type: There are roles and policies.● Roles: A type of coarse-grained authorization mechanism that defines

permissions related to user responsibilities. This mechanism provides only alimited number of service-level roles for authorization. When using roles togrant permissions, you need to also assign other roles on which the

Data Lake InsightProduct Introduction 6 Permissions Management

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 16

Page 20: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

permissions depend to take effect. However, roles are not an ideal choice forfine-grained authorization and secure access control.

● Policies: A type of fine-grained authorization mechanism that definespermissions required to perform operations on specific cloud resources undercertain conditions. This mechanism allows for more flexible policy-basedauthorization, meeting requirements for secure access control. For example,you can grant DLI users only the permissions for managing a certain type ofcloud servers. For the API actions supported by DLI, see Permissions Policiesand Supported Actions.

Table 6-1 DLI system permissions

Role/PolicyName

Description PolicyType

TenantAdministrator

Tenant administrator● Job execution permissions for DLI resources.

After a database or a queue is created, the usercan use the ACL to assign rights to other users.

● Function scope: Project-level service.

System-definedrole

DLI ServiceAdmin

DLI administrator.● Job execution permissions for DLI resources.

After a database or a queue is created, the usercan use the ACL to assign rights to other users.

● Function scope: Project-level service.

System-definedrole

DLI ServiceUser

Common user of DLI.● Common user permissions for DLI. Users granted

these permissions cannot create databases orqueues. Other operations can be used only afterbeing assigned by the administrator.

● Function scope: Project-level service.

System-definedrole

Table 6-2 lists the common SQL operations supported by each system policy ofDLI. Choose proper system policies according to this table.

Table 6-2 Common operations supported by each system policy

Resources

Operation Description TenantAdministrator

DLIServiceAdmin

DLIServiceUser

Queue

DROP_QUEUE

Deleting aqueue

√ √ ×

SUBMIT_JOB Submitting thejob

√ √ ×

Data Lake InsightProduct Introduction 6 Permissions Management

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 17

Page 21: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Resources

Operation Description TenantAdministrator

DLIServiceAdmin

DLIServiceUser

CANCEL_JOB Terminating thejob

√ √ ×

GRANT_PRIVILEGE

Grantingpermissions tothe queue

√ √ ×

REVOKE_PRIVILEGE

Revokingpermissionsfrom the queue

√ √ ×

SHOW_PRIVILEGES

Viewing thequeuepermissions ofother users

√ √ ×

Database

DROP_DATABASE

Deleting adatabase

√ √ ×

CREATE_TABLE

Creating a table √ √ ×

CREATE_VIEW

Creating a view √ √ ×

EXPLAIN Explaining theSQL statementas an executionplan

√ √ ×

CREATE_ROLE

Creating a role √ √ ×

DROP_ROLE Deleting a role √ √ ×

SHOW_ROLES

Displaying arole

√ √ ×

GRANT_ROLE Binding a role √ √ ×

REVOKE_ROLE

Unbinding therole

√ √ ×

SHOW_USERS

Displaying thebindingrelationshipsbetween allroles and users

√ √ ×

GRANT_PRIVILEGE

Grantingpermissions tothe database

√ √ ×

Data Lake InsightProduct Introduction 6 Permissions Management

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 18

Page 22: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Resources

Operation Description TenantAdministrator

DLIServiceAdmin

DLIServiceUser

REVOKE_PRIVILEGE

Revokingpermissions tothe database

√ √ ×

SHOW_PRIVILEGES

Viewingdatabasepermissions ofother users

√ √ ×

DISPLAY_ALL_TABLES

Displaying tableinformation inthe database

√ √ ×

DISPLAY_DATABASE

Displayingdatabaseinformation

√ √ ×

CREATE_FUNCTION

Creating afunction

√ √ ×

DROP_FUNCTION

Deleting afunction

√ √ ×

SHOW_FUNCTIONS

Displaying allfunctions

√ √ ×

DESCRIBE_FUNCTION

Displayingfunction details

√ √ ×

Table

DROP_TABLE Deleting a table √ √ ×

SELECT Querying atable

√ √ ×

INSERT_INTO_TABLE

Inserting √ √ ×

ALTER_TABLE_ADD_COLUMNS

Adding acolumn

√ √ ×

INSERT_OVERWRITE_TABLE

Rewriting √ √ ×

ALTER_TABLE_RENAME

Renaming atable

√ √ ×

ALTER_TABLE_ADD_PARTITION

Addingpartitions to thepartition table

√ √ ×

Data Lake InsightProduct Introduction 6 Permissions Management

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 19

Page 23: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Resources

Operation Description TenantAdministrator

DLIServiceAdmin

DLIServiceUser

ALTER_TABLE_RENAME_PARTITION

Renaming atable partition

√ √ ×

ALTER_TABLE_DROP_PARTITION

Deletingpartitions froma partition table

√ √ ×

SHOW_PARTITIONS

Displaying allpartitions

√ √ ×

ALTER_TABLE_RECOVER_PARTITION

Restoring tablepartitions

√ √ ×

ALTER_TABLE_SET_LOCATION

Setting thepartition path

√ √ ×

GRANT_PRIVILEGE

Grantingpermissions tothe table

√ √ ×

REVOKE_PRIVILEGE

Revokingpermissionsfrom the table

√ √ ×

SHOW_PRIVILEGES

Viewing tablepermissions ofother users

√ √ ×

DISPLAY_TABLE

Displaying atable

√ √ ×

DESCRIBE_TABLE

Displaying tableinformation

√ √ ×

Data Lake InsightProduct Introduction 6 Permissions Management

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 20

Page 24: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

7 Related Services

OBS

OBS works as the data source and data storage system for DLI, and delivers thefollowing capabilities:

● Data source: DLI provides an API for you to import data from correspondingOBS paths to DLI tables.

● Data storage: You can create OBS tables on DLI. However, such tables onlystore metadata while data content is stored in corresponding OBS paths.

● Data backup: DLI provides an API for you to export the data in DLI to OBS forbackup.

● Query result storage: DLI provides an API for you to save routine query resultdata on OBS.

IAM

IAM authenticates access to DLI.

CTS

Cloud Trace Service (CTS) audits performed DLI operations.

Cloud Eye

Cloud Eye helps monitor job metrics for DLI, delivering status information in aconcise and efficient manner.

SMN

Simple Message Notification (SMN) can send notifications to users when a jobrunning exception occurs on DLI.

CDM

CDM migrates OBS data to DLI.

Data Lake InsightProduct Introduction 7 Related Services

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 21

Page 25: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

DISDIS imports data to DLI through streams.

CloudTableCloudTable works as the data source and data storage system for DLI, and deliversthe following capabilities:

● Data source: DLI allows you to import CloudTable data using DataFrame orSQL.

● Query result storage: DLI uses the SQL INSERT syntax to store query resultdata to CloudTable tables.

RDSRelational Database Service (RDS) works as the data source and data storagesystem for DLI, and delivers the following capabilities:● Data source: DLI allows you to import RDS data using DataFrame or SQL.● Query result storage: DLI uses the SQL INSERT syntax to store query result

data to RDS tables.

DWSData Warehouse Service (DWS) works as the data source and data storage systemfor DLI, and delivers the following capabilities:● Data source: DLI allows you to import DWS data using DataFrame or SQL.● Query result storage: DLI uses the SQL INSERT syntax to store query result

data to DWS tables.

CSSCSS works as the data source and data storage system for DLI, and delivers thefollowing capabilities:

● Data source: DLI allows you to import CSS data using DataFrame or SQL.● Query result storage: DLI uses the SQL INSERT syntax to store query result

data to CSS tables.

DCSDistributed Cache Service (DCS) works as the data source and data storage systemfor DLI, and delivers the following capabilities:

● Data source: DLI allows you to import DCS data using DataFrame or SQL.● Query result storage: DLI uses the SQL INSERT syntax to store query result

data to DCS tables.

DDSDocument Database Service (DDS) works as the data source and data storagesystem for DLI, and delivers the following capabilities:

Data Lake InsightProduct Introduction 7 Related Services

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 22

Page 26: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

● Data source: DLI allows you to import DCS data using DataFrame or SQL.● Query result storage: DLI uses the SQL INSERT syntax to store query result

data to DCS tables.

Data Lake InsightProduct Introduction 7 Related Services

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 23

Page 27: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

8 Quotas

Quotas are enforced for service resources on the platform to prevent unforeseenspikes in resource usage. Quotas can limit the number or amount of resourcesavailable to users,

If the existing resource quota cannot meet your service requirements, you canapply for a higher quota.

For details about how to view and increase quotas, see Quotas.

Data Lake InsightProduct Introduction 8 Quotas

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 24

Page 28: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

9 Billing

Billing ItemsThe billing of DLI includes storage billing and compute billing. For details aboutthe pricing details and examples of DLI, see Product Pricing Details. You can usethe DLI price calculator to quickly calculate an estimate price of a queue with yourdesired specifications.

Table 9-1 DLI billing items

Category

Billing Items Description

Billingforstorageresources

Data volumestored in theDLI table

● Storage uses are billed based on the amount of datastored in DLI (unit: GB).

● When estimating the storage cost, note that DLI willgenerally compress the original file size to 1/5. Thestorage bill is based on the size of the compressedfile.

● If data is stored on OBS, then any usage of storageresources will be billed by OBS, instead of DLI.

Billingforcomputeresources

CUH ● The bill is based on the compute unit (CU) used perhour.

● Compute unit (CU) is the pricing unit of queues. 1CU consists of 1 vCPU and 4 GB memory. Thecomputing capabilities of queues vary with queuespecifications. The higher the specifications, thestronger the computing capability.

● Usage is billed by the hour. For example, 58 minutesof usage will be rounded to the hour and billed.

● If you submit jobs in a pay-per-use queue you built,the bill is based on the CUH used.

Data Lake InsightProduct Introduction 9 Billing

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 25

Page 29: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Category

Billing Items Description

Volume ofdata scanned

● You are billed based on the amount of data scannedby each job (unit: GB).

● If you submit jobs in the default queue, the bill isbased on the amount of data scanned.

NO TE

● Billing by CUH and billing by data volume scanned are mutually exclusive. You canselect either of them as required. It is recommended that you choose the CUH billingmode to ensure clear cost accounting.

● Dedicated queues are in exclusive resource mode. You can use them when creatingenhanced datasource connection queues.

Billing ModeWith DLI, you can submit SQL, Flink, and Spark jobs.

Figure 9-1 Billing mode

● The billing of SQL jobs includes the storage billing and computing billing. Forpricing details and examples, see Product Pricing Details.

Data Lake InsightProduct Introduction 9 Billing

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 26

Page 30: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Table 9-2 Billing policy of SQL jobs

Category

Billing Mode

Billingforstorageresources

● If data is stored in DLI, the storage use is billed based on theamount of data stored in the DLI table.

● If data is stored on OBS, then any usage of storage resourceswill be billed by OBS, instead of DLI.

● The billing mode varies according to whether the usersubscribes to the storage package.

Billingforcomputeresources

Billed based on the CUH. 1 CU consists of 1 vCPU and 4 GBmemory.● Usage is billed by the hour. For example, 58 minutes of usage

will be rounded to the hour and billed.● If you submit jobs in a queue you built, the bill is based on the

CUH used.● The billing mode varies according to whether the user

subscribes to the CUH package.

Billed based on the amount of data scanned.● If you submit jobs in the default queue, the bill is based on the

amount of data scanned.● The billing mode varies depending on whether you have

purchased package billed by scanned data.

● Data of Spark jobs is stored on OBS. For DLI, only billing for compute

resources is supported. For pricing details and examples, see Product PricingDetails.

Table 9-3 Spark job billing policy

Category

Billing Mode

Billingforcomputeresources

Billed based on the CUH. 1 CU consists of 1 vCPU and 4 GBmemory.● Usage is billed by the hour. For example, 58 minutes of usage

will be rounded to the hour and billed.● The billing mode varies according to whether the user

subscribes to the CUH package.

● Data of Flink jobs is stored on OBS. For DLI, only billing for compute resources

is supported. For pricing details and examples, see Product Pricing Details.

Data Lake InsightProduct Introduction 9 Billing

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 27

Page 31: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

Table 9-4 Flink job billing policy

Category

Billing Mode

Billingforcomputeresources

Billed based on the CUH. 1 CU consists of 1 vCPU and 4 GBmemory.● Usage is billed by the hour. For example, 58 minutes of usage

will be rounded to the hour and billed.● The billing mode varies according to whether the user

subscribes to the CUH package.

For details about the pricing details and examples of DLI, see Product PricingDetails. You can use the DLI price calculator to quickly calculate an estimate priceof a queue with your desired specifications.

Generally, you are advised to create projects based on different service attributes.● Development project: Jobs in this project are mainly used for debugging

purposes. The operations can be random with small amount of data used. It isrecommended that you use the CUH billing mode to effectively control costs.

● Production project: Jobs in this project have already been debugged and putonline. The operations are relatively stable, so you can buy the CUH packagesto avoid idle resources.

Purchase DescriptionYou need to purchase queues before creating SQL, Spark, or Flink jobs.

To purchase queues, you can use either of the following ways:

● On the Overview page, click .

● In the navigation tree on the left of the SQL Editor page, click the tab

and click on the right.● In the upper right corner of the Queue Management page, click

.

The procedure for purchasing a queue is as follows:

1. Configure: Specify or select related parameters. For details about theparameters, see Creating a Queue in Data Lake Insight User Guide.

2. Confirm: Check whether the resource type configuration is correct.3. Payment: Click Next to purchase queues.

Data Lake InsightProduct Introduction 9 Billing

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 28

Page 32: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

NO TE

● To purchase a package, click buy a DLI package on the Buy Queue page. Alternatively,in the upper right corner of the Queue Management page, click Buy DLI Package.

The packages include:

● Scanned data volume

● CUH

● Storage

● After purchasing a package, you can choose Billing Center > My Packages on themanagement console to view the package.

Queue Specifications Modification

After creating a queue with required specifications, you can run jobs in the queue.During the running process, you cannot change the queue specifications.

Expiration

In DLI, you need to pay for computing and storage resources you use. Therefore,there is no limit on the validity period.

Overdue Payment and Renewal

If the billing mode is set to Pay-per-use during queue purchase, the systemdeducts bills by hour. If the account balance is insufficient, the system cannotdeduct bills for the latest hour, which causes arrears. The system automaticallyprompts you to renew the subscription.

If your account is in arrears, you cannot use DLI. DLI will reserve resources for youfor 15 days. If you pay the renewal usage within the retention period, you cancontinue using DLI. Otherwise, services will be stopped and resources will bereleased after the retention period expires.

NO TE

If you pay the renewal usage within the retention period, the outstanding amount isdeducted first.

The procedure is as follows:

1. On the top right of the management console, choose Billing > Renewal.

Figure 9-2 Renewal

Data Lake InsightProduct Introduction 9 Billing

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 29

Page 33: Product Introductiondata sources, and big data ETL processing. Large-scale Log Analysis Gaming operation data analysis Different departments of a game company analyze daily new logs

2. On the Renewals page, select the order to be renewed and click Renew inthe Operation column.

Bill QueryingOn the menu bar on the top right of the management console, click Billing to goto the Billing Center page, then click Bills > Dashboard. Alternatively, you canclick Billing Dashboard from the Billing drop-down list to view your expenditures.

Data Lake InsightProduct Introduction 9 Billing

Issue 01 (2020-08-06) Copyright © Huawei Technologies Co., Ltd. 30