Customer Contextual Awareness
Quick Start
Designing an Event on the GUI
Context
Terminal users perform various operations on their mobile phones. Carriers have different concerns in different scenarios. Events can therefore be designed to filter the data and identify the users or user behavior that carriers focus on.
Event attributes describe the user or user behavior information that carriers focus on.
Some service events have been preconfigured in the CAE.
Preconfigured events comply with the following specifications:
"Interface Specifications (PS Domain) for the Data Integration Server in the Log System of China Mobile V1.0.0" for events whose names contain the PS keyword, for example, SOURCE_PS_XXX.
"China Mobile Unified DPI Device Technical Specifications - Interface Specifications for the LTE Signaling Collection and Parsing Server V2.0.9" for events whose names contain the LTE keyword, for example, SOURCE_LTE_XXX.
"Interface Specifications (CS Domain) for the Data Integration Server in the Log System of China Mobile V1.0.0" for other preconfigured events.
For details about the preconfigured events, see Preconfigured Events.
Procedure
Step 1 Log in to the CAE.
Enter http://IP address:Port/console. The login URL varies depending on the installation mode.
Then, enter the login user name, password, and verification code. For details about the default password, see "Password Change Views" in the Password Change document.
The IP address is the one used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Event Management Center > Event Design.
The Query page is displayed, as shown in Figure 1-1.
Figure 1-1 Query page
On this page, you can query existing events and their status (online or offline). You can search
for an event by Event code, Name, or Status.
After an event is brought online, you can:
Select the event on the Ingestion Design page.
Create topics associated with the event in Kafka to store ingested events.
Step 4 Create an event type.
Click the add icon in Category on the left and create an event type.
(Optional) You can create some extension attributes for a type of events. For details, see
Adding Extension Attributes.
Step 5 Create an event.
Click Add. The event design page is displayed, as shown in Figure 1-2.
Figure 1-2 Event design page
Step 6 Configure basic event information.
Table 1-1 describes the basic event information.
Table 1-1 Basic event parameters

Event Code: Unique ID of an event, which can contain fewer than 50 characters. The code can contain only letters, digits, and underscores (_), and cannot start with a digit. Example: 1000001

Name: Unique name of an event, which can contain fewer than 200 characters. The name cannot contain the special characters !@#$%^&*()+=[]{}<>,./:;'\|~. Example: NewsPortal

Description: Event description, which is optional and can contain a maximum of 500 characters. The description cannot contain the special characters !@#$%^&*()+=[]{}<>,./:;'\|~. Example: News portal URL marketing

Type: Event type, which is used for service management. You can click the add icon in Category on the left and add a category. Example: InternetEvent

Event Separator: Event data delimiter. Example: ,

Event Line Separator: Line break in event data. Example: \n

Event Character Set: Event character set type. Example: UTF-8

Event Partition Number: Number of partitions of the Kafka topic associated with an event. The system automatically calculates the most suitable number of partitions for the current environment and uses it as the default value. Calculation rule: (Number of Kafka Broker nodes x Value of log.dirs in Kafka)/Number of event copies. Example: 2

Event Replication Factor: Number of copies of the Kafka topic associated with an event. The system automatically calculates the most suitable number of copies for the current environment and uses it as the default value. If there is only one Kafka Broker node, the number of copies is 1. If there are multiple nodes, the number of copies is 2. Example: 2

Related topic: Uses the default topic. After the event is brought online, the system automatically creates a topic whose name is in the sdi_${Event code} format in Kafka. Example: Use the default topic.
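The default partition and replication values follow the calculation rules in Table 1-1. The following minimal Java sketch illustrates the arithmetic with assumed values (3 Kafka Broker nodes, 2 directories in log.dirs, more than one Broker node); the CAE computes the actual defaults for the real environment.

public class DefaultTopicSettings {
    public static void main(String[] args) {
        int brokerNodes = 3;       // assumed number of Kafka Broker nodes
        int logDirs = 2;           // assumed number of directories in log.dirs
        int replicationFactor = brokerNodes > 1 ? 2 : 1;            // Event Replication Factor rule
        // Event Partition Number rule: (brokers * log.dirs) / number of event copies
        int partitions = (brokerNodes * logDirs) / replicationFactor;
        System.out.println("Default replication factor: " + replicationFactor);  // 2
        System.out.println("Default partition number: " + partitions);           // 3
    }
}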
Step 7 Add an event attribute.
Click Add on the "Event Attr" page.
The Event Attr dialog box is displayed.
Step 8 Configure basic attribute information.
Table 1-2 describes the basic attribute information.
Table 1-2 Basic attribute parameters

Attr Name: Attribute name, which can contain fewer than 50 characters. The name can contain only letters, digits, and underscores (_), and cannot start with a digit. Example: URL

Attr Type: Attribute type. Select a value from the drop-down list box. Example: Character string

Partition Attribute: Indicates whether to use the attribute as the partition key. An event can have only one partition key. Example: false

Remark: Remarks. Example: User access URL.

Association Query: Optional. It is used during the configuration of simple moment rules. For example, if this parameter is set to Website query, all values related to website query can be queried in Right Value of a moment rule, such as Baidu, Sina, Taobao, and JD. If Association Query for the URL attribute of event_01 is set to Website query and event_01 is used to create moments, the preceding four values can be associated in Right Value during the URL filtering rule configuration. For details, see 6.3.1.1.1 Setting a Moment Rule Design. Example: -

Attribute Format: Optional. This parameter is displayed when Attr Type is set to Date. Set this parameter to the time format, for example, yyyyMMdd.
yyyy: year
MM: month
dd: day
HH: hour in 24-hour format
hh: hour in 12-hour format
mm: minute
ss: second
timestamp: timestamp
For example, if the format is set to yyyy-MM-dd HH:mm:ss, the time is displayed as 2017-05-25 13:01:01. Example: yyyyMMdd
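The pattern letters above follow the Java SimpleDateFormat convention. The following minimal sketch (an illustration only, not CAE code) shows how the example pattern and value map to each other.

import java.text.SimpleDateFormat;
import java.util.Date;

public class AttrFormatDemo {
    public static void main(String[] args) throws Exception {
        // Parse a value written in the yyyy-MM-dd HH:mm:ss format from the example
        Date time = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse("2017-05-25 13:01:01");
        // Re-format it with the yyyyMMdd pattern used in the Example column
        System.out.println(new SimpleDateFormat("yyyyMMdd").format(time));  // 20170525
    }
}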
After configuring the information, click Confirm to go to the event design page.
You can repeat this step to add other attributes.
You can click the corresponding icons next to an added attribute to view, modify, or delete the attribute.
Step 9 Click Confirm.
The Query page is displayed.
Step 10 Bring the event online.
Select the event to be brought online and click Online, as shown in Figure 1-3.
You can select an event and click the corresponding icons to view, edit, or delete the event.
Figure 1-3 Bringing the event online
After an event is brought online, you can:
Select the event on the Ingestion Design page.
Create topics associated with the event in Kafka to store ingested events.
----End
Ingestion Process Design
This topic describes basic operations in the ingestion process, including creating a project,
configuring basic information, specifying an execution host, and editing the process.
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. The login URL varies depending on the installation mode.
Then, enter the login user name, password, and verification code. For details about the default password, see "Password Change Views" in the Password Change document.
The IP address is the one used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Streaming Studio > Streaming Processing Design from the
navigation tree at the upper part.
The ingestion process query page is displayed.
On this page, you can query existing ingestion processes and their status. You can search for a
process by name or status.
The ingestion process status is described as follows:
Published: indicates that the ingestion process has been released into a properties.properties file and deployed in the corresponding Flume node.
Draft: indicates that the ingestion process has been saved as a draft and may be incomplete.
Step 4 Create a project.
Click the expand icon in the navigation tree on the left to expand the project list.
Click the add icon at the upper part and create a project.
In the Create dialog box, enter a project name and a description, and click Save.
Click the created project in the project list. The ingestion process query page of the project is displayed.
To edit the newly created project, select the project and click the edit icon at the upper part.
To delete the newly created project, select the project and click the delete icon at the upper part.
Step 5 Click Add and add an ingestion process.
1. Select a basic operation item.
− Directly Create: directly creates an ingestion process.
− Created from Template: creates an ingestion process using a template.
This topic describes how to directly create an ingestion process. For details about how to create an ingestion process using a template, see Creating an Ingestion Process Using a Template.
2. Configure basic information.
Enter an ingestion process name and a description.
3. Select hosts.
Select one or more Flume nodes for executing the process.
4. Edit the process.
Drag corresponding diagram elements from the toolbar to the workspace. The diagram elements include Source, Channel, Sink, (optional) Interceptor, (optional) Channel Selector, and (optional) Sink Group.
You can double-click a diagram element to edit it and connect diagram elements using
lines.
Figure 1-4 shows a complete ingestion process.
Figure 1-4 Example of a complete ingestion process
For details about diagram elements in the toolbar, such as the Source, Sink, and Channel, see the corresponding topics.
You can click Save to save the process as a draft, or click Publish to release the process.
The ingestion process status is described as follows:
Published: indicates that the ingestion process has been released into a properties.properties file and deployed in the corresponding Flume node.
Draft: indicates that the ingestion process has been saved as a draft and may be incomplete.
----End
Reference
Multiple processes can be designed in an ingestion design, as shown in Figure 1-5 and Figure
1-6.
Figure 1-5 Process 1
Figure 1-6 Process 2
Service Application
Design Events
Before creating ingestion processes, you need to define the events of concern. Ingestion, filtering, and calculation operations can then be performed on these events.
In Quick Start, the complete process of creating an event is provided. An event can be created
in the following ways.
Importing Events in Batches Through the Event Editing Tool
Context
You can use the event editing tool to import events in batches, simplifying the event design
procedure.
If the event editing tool is used, the event design process consists of event design and event
attribute design.
Procedure
Step 1 Log in to the CAE.
Enter http://IP address:Port/console. The login URL varies depending on the installation mode.
Then, enter the login user name, password, and verification code. For details about the default password, see "Password Change Views" in the Password Change document.
The IP address is the one used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Event Management Center > Event Design.
The Query page is displayed, as shown in Figure 1-7.
Figure 1-7 Event query page
On this page, you can query existing events and their status (online or offline). You can search
for an event by Event code, Name, or Status.
After an event is brought online, you can:
Select the event on the Ingestion Design page.
Create topics associated with the event in Kafka to store ingested events.
Step 4 Download the event editing tool.
On the Query page, click Download preset event tool to download the tool.
Step 5 Edit events.
1. Open the EventPreset.xlsm tool downloaded in Step 4.
2. Design events.
Edit events in the event editing tool. For details, see 1.2.2.3 Editing Events.
Step 6 Export events.
Generate the event data file.
On the Event sheet of EventPreset.xlsm, click Generate Data File to export edited
events from the tool.
Generate the event attribute data file.
On the EventAttr sheet of EventPreset.xlsm, click Generate Data File to export edited
event attributes from the tool.
The generated data files are stored in the same directory as EventPreset.xlsm by default.
Step 7 Import events.
Click the browse icon next to Event import and select the .dat files.
Click Import to import the events.
Step 8 Bring the event online.
Select the event to be brought online and click Online.
You can select an event, click View to view the event information, click Edit to edit the event information, or click Delete to delete the event.
Figure 1-8 Bringing the event online
After an event is brought online, you can:
Select the event on the Ingestion Design page.
Create topics associated with the event in Kafka to store ingested events.
----End
Editing Events
Users can design events based on parameters described in this topic. Events designed in this
way can be imported in batches.
Editing Events
On the Event tab page of the preconfigured event editing tool, set parameters based on the
editing template. Table 1-3 describes the parameters.
Table 1-3 Event parameters

Mandatory parameters:

EventName: Event name, which must be unique. The name cannot contain the special characters !@#$%^&*()+=[]{}<>,./:;'\|~. Example: Internet access event

DataEncode: Event encoding method. Currently, only the CSV encoding method is supported. Example: CSV

DataCharset: Event character set type. Example: UTF-8

DataSeparator: Event data delimiter. Example: ,

DataLineFeed: Event data line break. Example: \n

Optional parameters:

EventCode: Unique ID of an event, which can contain fewer than 50 characters. Example: topic_1_event

Topic: Associated topic. After an event is brought online, a topic with the name specified by this parameter is automatically created in Kafka to store messages. Either EventCode or Topic needs to be set. After one of the two parameters is set, the system automatically generates the value of the other based on internal rules. Example: topic_1

EventTypeName: Event type, which is used for service management. Example: UVA

TopicReplicationFactor: Number of copies of the Kafka topic associated with an event. The system automatically calculates the most suitable number of copies for the current environment and uses it as the default value. If there is only one Kafka Broker node, the number of copies is 1. If there are multiple nodes, the number of copies is 2. Example: 2

TopicPartitionNumber: Number of partitions of the Kafka topic associated with an event. The system automatically calculates the most suitable number of partitions for the current environment and uses it as the default value. Calculation rule: (Number of Kafka Broker nodes x Value of log.dirs in Kafka)/Number of event copies. Example: 8

NOTE
If EventCode is left empty, the system automatically generates its value based on the topic. The generation rule is ${Topic}_event.
If Topic is left empty, the system automatically generates its value based on the value of EventCode. The generation rule is sdi_${EventCode}.
The value of EventName can contain Chinese characters, cannot contain special characters, and can contain fewer than 200 characters.
For events of the Universe Video Analytics, the recommended event type is UVA.
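A minimal sketch of the two generation rules in the note, using the example values from Table 1-3 (an illustration only; the CAE applies the rules internally):

public class NamingRules {
    public static void main(String[] args) {
        String topic = "topic_1";
        // EventCode left empty: generated as ${Topic}_event
        String eventCode = topic + "_event";                 // topic_1_event
        // Topic left empty: generated as sdi_${EventCode}
        String generatedTopic = "sdi_" + eventCode;          // sdi_topic_1_event
        System.out.println(eventCode + " / " + generatedTopic);
    }
}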
Editing Event Attributes
On the EventAttr tab page of the preconfigured event editing tool, set parameters based on
the editing template. Table 1-4 describes the parameters.
Table 1-4 Event attribute parameters

EventCode: Event code. The event code must exist on the Event tab page. Example: topic_1_event

AttrName: Attribute name, which can contain fewer than 50 characters. Ensure that each attribute name of an event is unique. Example: StartTime

Index: Attribute index, which indicates the location of an attribute. The index of each attribute of an event must be unique. Example: 1

AttrType: Attribute type. Example: String

IsPartitionKey: Indicates whether to use the attribute as the partition key. An event can have only one partition key. The value 1 indicates yes and the value 0 indicates no. Example: 0

Remark: (Optional) Attribute remarks. Example: Start Time

AttrDynQueryId: (Optional) It is used during the configuration of simple moment rules in the Digital Marketing system. Set it to the association ID, which can be queried from the t_cae_dyn_qry_config table in the Campaign database. For example, if this parameter is set to Website query, all values related to website query can be queried in Right Value of a moment rule, such as Baidu, Sina, Taobao, and JD. If Association Query for the URL attribute of event_01 is set to Website query and event_01 is used to create moments, the preceding four values can be associated in Right Value during the URL filtering rule configuration. For details, see 6.3.1.1.1 Setting a Moment Rule Design. Example: -

IsExtendAttr: Indicates whether the attribute is an additional attribute. The value 1 indicates yes and the value 0 indicates no. Example: 0

ExtendAttrValue: (Optional) Default value of the additional attribute. Example: -

AttrFormat: (Optional) If the attribute type is Date, set this parameter to the time format, for example, yyyyMMdd.
yyyy: year
MM: month
dd: day
HH: hour in 24-hour format
hh: hour in 12-hour format
mm: minute
ss: second
timestamp: timestamp
For example, if the format is set to yyyy-MM-dd HH:mm:ss, the time is displayed as 2017-05-25 13:01:01. Example: yyyyMMdd
Adding Extension Attributes
Context
A source event contains a limited set of attributes, while marketing may require other attributes. In this case, add extension attributes during event definition.
An extension attribute added to an event type applies to all events of this type.
Procedure
Step 1 Log in to the CAE.
Enter http://IP address:Port/console. The login URL varies depending on the installation mode.
Then, enter the login user name, password, and verification code. For details about the default password, see "Password Change Views" in the Password Change document.
The IP address is the one used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Event Management Center > Event Design.
The Query page is displayed, as shown in Figure 1-9.
Figure 1-9 Event query page
Step 4 Add an event type.
Click the add icon in Category on the left and create an event type.
(Optional) You can create some extension attributes for a type of events. Table 1-5 describes
the extension attribute parameters.
Table 1-5 Extension attribute parameters

Name: Attribute name, which can contain fewer than 50 characters. Example: isSys

Code: Unique ID of the extension attribute, which can contain fewer than 50 characters. The code cannot contain the special characters !@#$%^&*()+=[]{}<>,./:;'\|~. Example: isSys

Required: Indicates whether the extension attribute is mandatory. In this version, only the value Yes is supported. Example: Yes

Type Rule: Display type of the extension attribute. The options are as follows:
text: text type
select: drop-down list box type
radio: option button type
checkbox: check box type
datetime: date control type
textarea: text box type
For details, see Figure 1-10 and Figure 1-11. Example: For details, see Figure 1-10.

Default Value: Default value of the extension attribute, which is optional. Example: 1

Description: Description of the extension attribute, which is optional. Example: This attribute indicates whether the attribute is synchronous; it is required by the Campaign.
Figure 1-10 Configuring extension attributes
Figure 1-11 Extension attribute configuration effect
Table 1-6 Extension attribute description

isSuppression: This attribute maps to the DND function in the Campaign system.

custFlag: This attribute maps to the Customer Model function in the Campaign system.

isSync: This attribute maps to the Synchronization and Asynchronization function in the Campaign system.

permission(R, S): This attribute maps to the Marketing Permission function in the Campaign system. You can query values of this attribute from the data dictionary CAE.EVENT.CAMPAIGN.AGREE.TYPE. The permissionR and permissionS attributes are only used to demonstrate the display conditions of the radio and select styles. When the business process is configured, you must use the checkbox style.

datetime(1, 2, 3): This attribute has no business meaning. It demonstrates three display methods of datetime.
----End
Preconfigured Events
Context
Some service events have been preconfigured in the CAE.
Preconfigured events comply with the following specifications:
"Interface Specifications (PS Domain) for the Data Integration Server in the Log System of China Mobile V1.0.0" for events whose names contain the PS keyword, for example, SOURCE_PS_XXX.
"China Mobile Unified DPI Device Technical Specifications - Interface Specifications for the LTE Signaling Collection and Parsing Server V2.0.9" for events whose names contain the LTE keyword, for example, SOURCE_LTE_XXX.
"Interface Specifications (CS Domain) for the Data Integration Server in the Log System of China Mobile V1.0.0" for other preconfigured events.
Designing the Flume Event Ingestion Process
In Quick Start, the complete process of creating an ingestion process is provided. A process
can be created in the following ways.
Creating an Ingestion Process Using a Template
This topic describes how to create an ingestion process using a template.
Prerequisites
You have logged in to the foreground and designed events in Base Operation of the ingestion
process design.
Procedure
Step 1 Click Add to go to the page for adding an ingestion process.
1. Select a basic operation item.
Created from Template: creates an ingestion process using a template.
2. Configure basic information.
Enter an ingestion process name and a description.
3. Select a host.
Select one or more Flume nodes for executing the process.
4. Select a template.
Select a template.
5. Edit the process.
Figure 1-12 shows the template process.
You can edit the ingestion process using a template.
Drag corresponding diagram elements from the toolbar to the workspace. The diagram elements include Source, Channel, Sink, (optional) Interceptor, (optional) Channel Selector, and (optional) Sink Group.
Figure 1-12 Template process
For details about diagram elements in the toolbar, such as the Source, Sink, and Channel, see the corresponding topics.
You can click Save to save the process as a draft, or click Publish to release the process.
The ingestion process status is described as follows:
Published: indicates that the ingestion process has been released into a properties.properties file and deployed in the corresponding Flume node.
Draft: indicates that the ingestion process has been saved as a draft and may be incomplete.
----End
Creating an Ingestion Process by Importing
Ingestion processes configured in the CAE can be imported and exported.
Prerequisites
The ingestion processes have been created in another CAE.
Procedure
Step 1 Log in to the CAE where the ingestion processes have been created.
Step 2 Export the ingestion processes.
Select the processes to be exported and click Export, as shown in Figure 1-13.
Figure 1-13 Exporting the ingestion processes
A .zip file is exported, and saved to the local host.
Currently, only a single process can be imported.
Step 3 Log in to the CAE to which the ingestion processes are to be imported.
Step 4 Import the ingestion processes.
Click the icon next to File path, select the exported .zip file, and click Import, as shown in
Figure 1-14.
Figure 1-14 Importing the ingestion processes
----End
Data Source Configuration
The data source can be accessed in multiple modes during ingestion process design.
Spooling Directory
The data source is accessed by reading local files on the Flume node.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
The ingestion directory of the Spooling Directory Source diagram element cannot be deleted during the running of the Flume. Otherwise, the ingestion can be restored only after the Flume is restarted or the Flume properties file is updated.
The ingestion directory of the Spooling Directory Source diagram element cannot contain files with the same name.
Procedure
Step 1 Click the Spooling Directory Source diagram element on the toolbar and click the blank area
in the canvas.
Figure 1-15 Spooling Directory Source diagram element
Step 2 Double-click the Spooling Directory Source diagram element in the canvas to edit it.
Figure 1-16 Page for editing the Spooling Directory Source diagram element
Table 1-7 describes the parameters of the Spooling Directory Source diagram element.
Table 1-7 Parameters of the Spooling Directory Source diagram element

Node Name: User-defined name of the diagram element.

Whether the related event:
Related Event: associates events existing in Event Design or in the CAE database to facilitate insertion of data interceptors, for example, Field Filter and Field Encrypt. The number of accessed source data fields must be the same as the number of attributes of the associated event.
No Related Event: does not associate events and ingests the accessed original data.

Source Event: Associated event. Events in Event Design can be displayed only after being brought online.

Data Source Directory: Local directory for storing source data on the Flume node. The directory cannot be deleted while the Flume is running. Otherwise, the ingestion can be restored only after the Flume is restarted or the Flume properties file is updated.

Senior Optional Properties: You can configure advanced configuration items based on the descriptions on the GUI or use their default values. If the stored file is a .gz file, select the Compressed File Source parameter. A package can contain only one file.

Add: The Flume has a large number of parameters. If a parameter that a user requires is not available on the GUI, the user can define it. Ensure that the parameter exists in the matching Flume and is correctly set.
----End
SDTP
The Flume server can receive data transferred from the service system using the SDTP
protocol.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Procedure
Step 1 Click the Sdtp Source diagram element on the toolbar and click the blank area in the canvas.
Figure 1-17 Sdtp Source diagram element
Step 2 Double-click the Sdtp Source diagram element in the canvas to edit it.
Figure 1-18 Page for editing the Sdtp Source diagram element
Table 1-8 describes the parameters of the Sdtp Source diagram element.
Table 1-8 Parameters of the Sdtp Source diagram element

Node Name: User-defined name of the diagram element.

Whether the related event:
Related Event: associates events existing in Event Design or in the CAE database to facilitate insertion of data interceptors, for example, Field Filter and Field Encrypt. The number of accessed source data fields must be the same as the number of attributes of the associated event.
No Related Event: does not associate events and ingests the accessed original data.

Source Event: Associated event. Events in Event Design can be displayed only after being brought online.

Server Port: Server port of the SDTP socket, which is user-defined. Ensure that the port number is not in use. The check command is as follows:
netstat -an | grep port_number
If the command has output, the port number is in use.

SDTP Protocol Type: SDTP protocol type. The options are SDTP_DPILTE, SDTP_PS, SDTP_CS, and SDTP_SMSP3.

Major Version Number: Primary version number. The parameter does not need to be modified by default.

Minor Version Number: Subversion number. The parameter does not need to be modified by default.

User Name: Authentication user name for the client to connect to the SDTP service. The default value is recommended.

Password: Authentication password, which is encrypted. For details about the default password, see "Password Change Views" in the Password Change document. You can run the $HOME/manager/bin/encrypt.sh password command on the active node of the CAE Server to obtain the encrypted password.

Senior Optional Properties:
SDTP Authorization: Indicates whether to authenticate the user name and password. By default, the user name and password are authenticated. The value N indicates that the user name and password do not need to be verified.
sdtp_cdr: Indicates whether to process data that contains CDR tags, that is, the first 10 fields in the data are the CDR tags and the remaining fields are the data content. The data is processed by default. The value N indicates that the data does not need to be processed.
sdtp_eventseparator: Row separator. By default, a record is not separated into multiple rows. Generally, a row contains one data record. You can use the row separator to separate a record into multiple rows. The parameter is set to a hexadecimal number and is processed as the corresponding decimal number during data processing. For example, if this parameter is set to 0A, 10 is used as the row separator in the data.
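The sdtp_eventseparator value is interpreted as described above: the hexadecimal string is converted to its decimal character code. A small Java sketch of that conversion for the 0A example (an illustration only):

public class RowSeparatorDemo {
    public static void main(String[] args) {
        String sdtpEventSeparator = "0A";                     // hexadecimal value from the GUI
        int code = Integer.parseInt(sdtpEventSeparator, 16);  // 10 in decimal
        char separator = (char) code;                         // corresponding character
        System.out.println(code + " -> " + Character.getName(separator)); // 10 -> LINE FEED (LF)
    }
}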
----End
FTP/SFTP
The Flume server can receive data transferred from the service system using the FTP or SFTP protocol.
Prerequisites
You have logged in to the GUI and designed events in the process editing step of the ingestion process design.
Procedure
Step 1 Click FTP Source on the toolbar and click the blank area in the canvas.
Figure 1-19 FTP Source
Step 2 Double-click FTP Source in the canvas to edit it.
Figure 1-20 Page for editing the FTP Source diagram element
Table 1-9 describes the FTP Source parameters.
Table 1-9 Parameters of the FTP Source diagram element

Node Name: User-defined name of the diagram element.

Whether the related event:
Related Event: associates events existing in Event Design or in the CAE database to facilitate insertion of data interceptors, for example, Field Filter and Field Encrypt. The number of accessed source data fields must be the same as the number of attributes of the associated event.
No Related Event: does not associate events and ingests the accessed original data.

Source Event: Associated event. Events in Event Design can be displayed only after being brought online.

Protocol Type: Protocol used for transmitting source events. The options are FTP and SFTP.

Ftp Data source identification: Unique FTP server ID, which is used by the system to internally identify an FTP server. For details, see 1.1.2.1 Configuring the FTP Host.

known_hosts File Path: Path of the known_hosts file on the FTP server. The known_hosts file is used for SSH authentication.

Add: The Flume has a large number of parameters. If a parameter that a user requires is not available on the GUI, the user can define it. Ensure that the parameter exists in the matching Flume and is correctly set.
----End
Avro
The Flume server can receive data transferred from the service system using the AVRO
protocol.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Context
The Avro source and Avro sink must be used together. Currently, the application scenario is
internal delivery in the ingestion process, as shown in Figure 1-21.
Figure 1-21 Common scenario of the Avro source and Avro sink
After data ingestion, a part of data is distributed to the HDFS sink. Another part of data is
distributed to the Avro sink. The Avro source receives and filters the data and then distributes
the filtered data to the Kafka sink.
In such case, the Avro source and sink are used for internal distribution.
Procedure
Step 1 Click the Avro Source diagram element on the toolbar and click the blank area in the canvas.
Figure 1-22 Avro Source diagram element
Step 2 Double-click the Avro Source diagram element in the canvas to edit it.
Figure 1-23 Page for editing the Avro Source diagram element
Table 1-10 describes the parameters of the Avro Source diagram element.
Table 1-10 Parameters of the Avro Source diagram element

Node Name: User-defined name of the diagram element.

Whether the related event:
Related Event: associates events existing in Event Design or in the CAE database to facilitate insertion of data interceptors, for example, Field Filter and Field Encrypt. The number of accessed source data fields must be the same as the number of attributes of the associated event.
No Related Event: does not associate events and ingests the accessed original data.

Source Event: Associated event. Events in Event Design can be displayed only after being brought online.

AvroSource

Binded Port: Server port of the Avro socket, which is user-defined. Ensure that the port number is not in use. The check command is as follows:
netstat -an | grep port_number
If the command has output, the port number is in use.

DecompressAvroSource (The system receives compressed CDR files reported from sites or gateways through the Avro interface, decompresses and verifies the files, and reports retransmission messages.)

Enable SSL: Indicates whether to enable SSL. The value true indicates yes and the value false indicates no.

IP of Server Receiving Retransmission Notification: IP address of the server that receives retransmission instructions. When the source event data is incorrect, the CAE sends retransmission instructions to the server specified by the service.

Port of Server Receiving Retransmission Notification: Port that receives retransmission instructions.

Quality Statistics Upload Path in HDFS: HDFS path to which quality statistics are uploaded.

Local Directory for Storing Intermediate Statistical Result File: Local directory on the Flume node for storing the intermediate statistical result file.

Senior Optional Properties

Max. Working Threads: Maximum number of threads used for receiving data from the client or Avro sink.

Decompression Format for Transferred-in Data: If DecompressAvroSource is selected, this parameter does not need to be set. The .gz format is used by default. If AvroSource is selected, the open-source capability of the Avro Source is used. In this case, only the zlib format is supported. To receive data in zlib format, set this parameter to deflate.

SSL Keystore Path: Path of the SSL keystore file. This parameter is mandatory if SSL is enabled.

KeyStore Password: Keystore password. This parameter is mandatory if SSL is enabled.

Keystore Type in Use: Keystore type.

Excluded SSL/TLS Protocols: Exclusion list of SSL/TLS protocols. Use space characters to separate multiple values. SSLv3 is always excluded. Therefore, the default value is SSLv3.

Enable IP Filtering: Indicates whether to enable IP filtering for the Netty. The value true indicates yes and the value false indicates no.

Define IP Filtering Rule: IP filtering rule of the Netty.

Interface Receiving Retransmission Notification: The default value is IF_ReUpload.

Retry Times upon Retransmission Failure: Number of times that a retransmission instruction can be resent. The retransmission instruction is sent again when the last retransmission instruction fails to be sent. The default value is 3.

Retransmission Notification Sending Interval (s): Interval for sending retransmission instructions. A retransmission instruction is sent again after the specified time period when the last retransmission instruction fails to be sent. The default value is 120.

Retransmission Notification Timeout Interval (s): Timeout interval of retransmission instructions. The server specified by a site may fail to receive retransmission information due to network faults. This is the time interval applied after the last retransmission instruction fails to be sent. The default value is 60.

Secure HDFS User: User name for accessing the HDFS in secure mode. If this parameter is not set, the insecure mode is used.

Keytab File Directory of Secure HDFS User: Path of the keytab file for accessing the secure HDFS. If this parameter is not set, the insecure mode is used.

Statistics Period of File Statistical Item (minutes): Statistics period of the following statistical items: number of files uploaded through the interface, volume of data uploaded through the interface, number of files that have been retransmitted through the interface, number of files to be retransmitted through the interface, and number of records in the error CDR file. The default value is 1440 minutes (that is, one day).

Statistics Period of Record Statistical Item (minutes): Statistics period of the following statistical items: number of CDRs received through the interface and total traffic of CDRs received through the interface. The default value is 10 minutes.

Period for Generating Statistical Result to HDFS (minutes): The default value is 1440 minutes (that is, one day).
----End
(Optional) Data Processor Configuration
The CAE provides built-in plug-ins in the Flume to implement multiple source data
processing capabilities.
Field Projecting
During data ingestion, the Flume can project fields to meet service requirements.
Function Description
Figure 1-24 Example of the field projecting function
Set the first column in the target event to the value of column 4 in the source event.
Set the second column in the target event to the value of column 3 in the source event.
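As a rough Java sketch of that projection (target column 1 takes source column 4, target column 2 takes source column 3), assuming a hypothetical comma-separated source record; the real projection is performed by the Flume interceptor:

public class FieldProjectingDemo {
    public static void main(String[] args) {
        String[] source = "001,12441,Beijing,13900000001".split(",");
        // Target column 1 <- source column 4; target column 2 <- source column 3
        String[] target = {source[3], source[2]};
        System.out.println(String.join(",", target));  // 13900000001,Beijing
    }
}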
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Projection diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the Field Projecting diagram element using a line.
Figure 1-25 Field Projecting diagram element
Step 3 Double-click the Field Projecting diagram element in the canvas to edit it.
Figure 1-26 Page for editing the Field Projecting diagram element
Table 1-11 describes the parameters of the Field Projecting diagram element.
Table 1-11 Parameters of the Field Projecting diagram element

Node Name: User-defined name of the diagram element.

Input Event Attribute: Existing attributes of the source event.

Output Event Attribute: Output fields. Select existing attributes and click the import icon to add them to the Output Event Attribute list. The output event attribute sequence is determined by the import sequence. Icons are also provided to select all attributes, deselect the selected attributes, and deselect all attributes. In Figure 1-27, only the PhoneNum and Place attributes are exported, and they are exported in sequence.

Remove Space Characters Preceding or Following Field Value: Choose true or false.
----End
Field Extraction
When ingesting data, the Flume can distribute the record to different storage systems based on
some field values. For example, if eventID is set to 001, the record is distributed to the Kafka
system and other records are distributed to the HDFS system.
Function Description
Figure 1-27 Common Field Extraction example
Figure 1-28 Level-2 Field Extraction example
The function of level-2 distribution is to save data to specified directories in the HDFS by
category.
For example, data is first saved to different directories based on the value of Type. (The
directory is named after the field value by default.)
If the value of Type is 001, data is secondarily distributed by the value of Place.
If the value of Type is 002, data is secondarily distributed by the value of PhoneNum.
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Distribution diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the Field Extraction diagram element using a line.
Figure 1-29 Field Extraction diagram element
Step 3 Double-click the Field Extraction diagram element in the canvas to edit it.
Figure 1-30 Ordinary sorting example
Table 1-12 Parameters of the Ordinary sorting diagram element

Node Name: User-defined name of the diagram element.

First sorting field: Name of the field used for distribution.

First sorting field value: Field value used for sorting. Records whose sorting field value does not equal the value of this parameter are discarded.
Figure 1-31 Page for editing the Secondary sorting diagram element
Table 1-13 describes the parameters of the Secondary sorting diagram element.
Table 1-13 Parameters of the Secondary sorting diagram element

Node Name: User-defined name of the diagram element.

Ordinary sorting

First sorting field: Name of the field used for distribution.

First sorting field value: Field value used for sorting. Records whose sorting field value does not equal the value of this parameter are discarded.

Secondary sorting

Specify sorting value: Level-1 distribution field value for which secondary sorting is performed.

Secondary sorting field: Name of the field used for secondary sorting.

Secondary sorting field value: Field value used for secondary sorting. Records whose sorting field value does not equal the value of this parameter are discarded.

Senior Optional Properties: secondaryFieldsDefault: directory for storing the secondarily distributed data. For example, if secondary distribution is performed when the value of the field for first distribution is 11 and secondaryFieldsDefault is set to second, the storage path is 11/second/. By default, data is saved to the level-1 directory, that is, 11/.
Step 4 Use Field Extraction together with Channel Selector or HDFS Sink.
Use Field Extraction together with Channel Selector to complete the scenario shown in
Figure 1-32. For details, see 1.2.3.6.3 (Optional) Data Channel Selection Configuration.
Use Field Extraction together with HDFS Sink to complete the scenario shown in
Figure 1-33. The details are described later.
Step 5 Drag Memory Channel from the toolbar to the canvas and connect it to Field Extraction.
Step 6 Drag HDFS Sink from the toolbar to the canvas and connect it to Field Extraction.
Set HDFS Storage Path in the HDFS to /flume/test/LTE/%{first}/%{second}.
In the path, /flume/test/LTE/ indicates the path in the HDFS. Change it based on the site
requirements. For details, see 1.2.3.7.5 Using the Client to View and Create Files in the
HDFS.
In the path, %{first}/%{second} indicates that the level-1 distribution field value is used
as the level-1 file name and level-2 distribution field value is used as the secondary file
name.
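The %{first} and %{second} placeholders in the HDFS storage path are replaced with the level-1 and level-2 distribution field values carried in the event header. The following Java sketch illustrates the substitution only (it is not the Flume implementation), and the header values are hypothetical:

import java.util.HashMap;
import java.util.Map;

public class HdfsPathDemo {
    public static void main(String[] args) {
        String pathTemplate = "/flume/test/LTE/%{first}/%{second}";
        Map<String, String> header = new HashMap<>();
        header.put("first", "001");       // assumed level-1 distribution field value
        header.put("second", "Beijing");  // assumed level-2 distribution field value
        String path = pathTemplate;
        for (Map.Entry<String, String> e : header.entrySet()) {
            path = path.replace("%{" + e.getKey() + "}", e.getValue());
        }
        System.out.println(path);  // /flume/test/LTE/001/Beijing
    }
}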
----End
Field Backfill
You can use the Flume plug-in package provided by the CAE to collect data required for the
service system. The Flume adds fields to source events by searching and mapping the service
data cache file preset in the Flume.
Function Description
Figure 1-32 Field Backfill function example
The Flume adds fields from cache files to source events. In the preceding example, the
values of the Place field in the source event are updated according to cache table 1.
The Flume adds fields from cache files to source events. In the preceding example, the
PhoneNum field is added to the source event according to cache table 2.
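A simplified Java sketch of the backfill idea (not the CAE plug-in itself): a dimension table is loaded as a lookup map, and a source event field is updated when a matching key is found. The table content and field layout below are hypothetical.

import java.util.HashMap;
import java.util.Map;

public class FieldBackfillDemo {
    public static void main(String[] args) {
        // Cache (dimension) table: UserID -> Place, hypothetical content
        Map<String, String> cacheTable1 = new HashMap<>();
        cacheTable1.put("001", "Beijing");

        String[] event = {"001", "12441", "Unknown"};   // UserID, IMSI, Place
        String place = cacheTable1.get(event[0]);
        if (place != null) {
            event[2] = place;                           // update: replace the Place value
        }
        System.out.println(String.join(",", event));    // 001,12441,Beijing
    }
}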
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Association and Backfill diagram element on the toolbar and click the blank area in
the canvas.
Step 2 Connect the source to the Backfill diagram element using a line.
Figure 1-33 Field Backfill diagram element
Step 3 Double-click the Backfill diagram element in the canvas to edit it.
Figure 1-34 Page for editing the Field Backfill diagram element
Table 1-14 describes the parameters of the Field Backfill diagram element.
Table 1-14 Parameters of the Field Backfill diagram element

Node Name: User-defined name of the diagram element.

Dimension Tables: Click Add and enter the storage path of the cache table in table1. The table file must be stored in the Hadoop HDFS. For details about how to view and create an HDFS directory on the Hadoop client on the CAE server, see 1.2.3.7.5 Using the Client to View and Create Files in the HDFS. Multiple dimension tables can be added.

Backfill Rule: Click Add and configure how dimension table data is filled back to the source data.
Condition: Set this parameter in Backfill Condition.
Table Name: Select a table in Dimension Tables. The default value is tableN.
The Index of Table Field: Cache table column whose values are filled back to the data.
Target Field: Field to which the value is filled back. When a field is added, you can define the name of the field.
Conversion Type: Conversion type. The options are update (replacing the original field value) and append (adding a field).
In Figure 1-35, if Condition1 is met, the value of the second field in dimension table 1 is filled back to the Place field in the source data. If Condition2 is met, the value of the second field in dimension table 2 is added to the PhoneNum field in the source data.

Backfill Condition: Backfill condition. You can click Add and configure the condition for triggering backfill.
Condition Name: Condition name, which is user-defined.
Table Name: Dimension table name. The default value is tableN.
The Index of Table Field: Cache table column whose value is used as the comparison value.
Target Value Type: Target value type. The options are Constant and Source Event Attribute.
Target Value: Target value. Set this parameter to a constant or a source event attribute.
In Figure 1-35, Condition1 indicates that the backfill operation is triggered when the value of the first field in dimension table 1 is the same as that of UserID in the source event.

Base Information: Configures the source data delimiter.
----End
Field Standard
When ingesting data, the Flume can pre-process some data in the transferred source event to
standardize the data, allowing other components to obtain standard events.
Function Description
The following types of data standardization are supported:
For mobile number fields:
− Remove +86 or 0086 from the beginning of mobile numbers.
− Remove 0 from the beginning of mobile numbers that start with 0 and contain 12
digits.
− Remove space characters from the beginning and end of mobile numbers.
− Record abnormal data in the
/var/log/Bigdata/flume/flume/flumeExceptionData.log file.
For date fields:
Specify the date display format.
Standardization is not supported if time is represented using multiple fields.
Time zones, milliseconds, and nanoseconds are not considered.
The timestamp is supported.
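A minimal Java sketch of the mobile number rules listed above (remove a +86 or 0086 prefix, strip a leading 0 from 12-digit numbers, and trim spaces); the real plug-in additionally logs abnormal data to the file mentioned above:

public class PhoneNumberStandardDemo {
    static String normalize(String number) {
        number = number.trim();                            // remove leading and trailing spaces
        if (number.startsWith("+86")) {
            number = number.substring(3);                  // remove +86
        } else if (number.startsWith("0086")) {
            number = number.substring(4);                  // remove 0086
        }
        if (number.length() == 12 && number.startsWith("0")) {
            number = number.substring(1);                  // remove the leading 0
        }
        return number;
    }

    public static void main(String[] args) {
        System.out.println(normalize(" +8613900000001 "));  // 13900000001
        System.out.println(normalize("013900000001"));      // 13900000001
    }
}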
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Standardization diagram element on the toolbar and click the blank area in the
canvas.
Step 2 Connect the source to the Standardized diagram element using a line.
Figure 1-35 Field Standard diagram element
Step 3 Double-click the Field Standard diagram element in the canvas to edit it.
Figure 1-36 Page for editing the Field Standard diagram element
Table 1-15 describes the parameters of the Field Standard diagram element.
Table 1-15 Parameters of the Field Standard diagram element

Node Name: User-defined name of the diagram element.

Date Field Standard: Time format conversion. Configure the input and output time formats based on the information displayed on the page.
yyyy: year
MM: month
dd: day
HH: hour in 24-hour format
hh: hour in 12-hour format
mm: minute
ss: second
timestamp: timestamp
For example, if the format is set to yyyy-MM-dd HH:mm:ss, the time is displayed as 2017-05-25 13:01:01.
In the output type, replace indicates that the original field content is overwritten, and append indicates that a field whose name can be customized is added.

Phone Number Standard: Source event field that is used as the phone number field.

Whether or not throw out exception data: true indicates that abnormal data is deleted; false indicates that it is not deleted.
----End
Field Key
The structure of the source event ingested by the Flume consists of two parts: header and body.
The header contains tag information such as the timestamp and IP address of host that sends
messages. The body contains field names and values carried in events.
Function Description
When customizing data collection, the CAE can change the header information, add a Key field to the header, and assign the value of a source event field to Key. Events with the same value of Key are then written to the same Kafka partition, facilitating follow-up data processing by Kafka consumers.
For example, if the mobile number in source events is configured as Key, source events with the same mobile number are written to the same Kafka partition.
Figure 1-37 Field Key example
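This behavior relies on Kafka's key-based partitioning: records with equal keys are hashed to the same partition. The following is a rough illustration of the idea only (Kafka's default partitioner actually applies a murmur2 hash to the serialized key); the partition count and key value are assumed:

public class KeyPartitionDemo {
    public static void main(String[] args) {
        int numPartitions = 3;
        String key = "13900000001";   // mobile number used as the Key field
        // Simplified: the same key always maps to the same partition
        int partition = Math.floorMod(key.hashCode(), numPartitions);
        System.out.println("Records with key " + key + " go to partition " + partition);
    }
}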
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Use Field as Key diagram element on the toolbar and click the blank area in the
canvas.
Step 2 Connect the source to the FieldKey diagram element using a line.
Figure 1-38 Field Key diagram element
Step 3 Double-click the Field Key diagram element in the canvas to edit it.
Figure 1-39 Page for editing the Field Key diagram element
Table 1-16 describes the parameters of the Field Key diagram element.
Table 1-16 Parameters of the Field Key diagram element

Node Name: User-defined name of the diagram element.

Key Column Name in Header: Field name in the data header. The default value is key. The default partitioning algorithm in Kafka performs partitioning based on the key in the header. If this parameter is not set to key, Kafka cannot recognize the parameter and cannot perform partitioning based on it.

Retain Original headername: Indicates whether to overwrite the field configured in Key Column Name in Header in the header. The value true indicates yes and the value false indicates no.

Specify fields as key: Field in the body whose value is stored in the specified field in the header.
----End
Field Filter
When ingesting data, the Flume can filter the ingested data based on some field values or
based on the header value. The filter condition is an expression containing the number,
character string, and date fields.
Function Description
Figure 1-40 Field Filter example 1
In this example, data is filtered based on the Place field in the body of the source event.
Figure 1-41 Field Filter example 2
In this example, data is filtered based on the key value of Time in the header of the source
event.
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Field Filter diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the Field Filter diagram element using a line.
Figure 1-42 Field Filter diagram element
Step 3 Double-click the Field Filter diagram element in the canvas to edit it.
Figure 1-43 Page for editing the Field Filter diagram element
Table 1-17 describes the parameters of the Field Filter diagram element.
Table 1-17 Parameters of the Field Filter diagram element

Node Name: User-defined name of the diagram element.

Filter Type:
Filter Header: filters data based on the header.
Filter Body: filters data based on the body.

Data Style:
No_Key: normal data format, for example, 001,12441,Beijing,13900000001.
Key_Value: data format containing the attribute name, for example, userid:001,imsi:12441,place:Beijing,phonenum:13900000001.
If Key_Value is selected, you need to configure the delimiter for separating the field name from the field value in the basic configuration.

Filter Field: Existing fields on the page.

The running result of the expression must be Boolean: Filter condition expression.
Fields of the character type support the following expressions:
Fieldname.startsWith(String)
Fieldname.endsWith(String)
Fieldname.isEmpty()
Fieldname.length() >= int
Fieldname.in('1','2','3','4'...)
Example: PhoneNum.length() >= 11
Fields of the number type support the following expressions:
Integer type: Fieldname.in(1,2,3,4...)
Long type: Fieldname.in(1l,2l,3l,4l...)
Double type: Fieldname.in(1.0,2.0,3.0,4.0...)
Float type: Fieldname.in(1.0f,2.0f,3.0f,4.0f...)
Example: UserID.in(001,002)
Fields of the date type support the following expressions:
sysdate()
Fieldname.addMonths(int)
Fieldname.addDays(int)
Fieldname.addHours(int)
Fieldname.addMinutes(int)
Fieldname.addSeconds(int)
Example: Time.addMonths(3) >= 6
Supported common operators: ==, =, <=, +, -, &&, ||

Base Information: Configures the source data delimiter.
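The expressions in Table 1-17 follow Java-style string and number semantics. As an illustration only (the real filter evaluates the expression inside the Flume interceptor), the two table examples behave roughly as follows; the sample field values are hypothetical:

import java.util.Arrays;

public class FieldFilterDemo {
    public static void main(String[] args) {
        String phoneNum = "13900000001";
        String userId = "001";
        // PhoneNum.length() >= 11
        boolean keepByLength = phoneNum.length() >= 11;
        // UserID.in(001,002)
        boolean keepById = Arrays.asList("001", "002").contains(userId);
        System.out.println(keepByLength && keepById);  // true: the record passes the filter
    }
}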
----End
File Name
The structure of the source event ingested by the Flume consists of two parts: header and body.
The header contains tag information such as the timestamp and IP address of the host sending
the source event. The body contains fields and their values in the source event. This function
can be used to save tags required by users to the header.
Function Description
When customizing data ingestion using the CAE, you can change the source event header
information, add tags (Key in the header) to the header, and assign the name of the file storing
the source event to the tag. In this way, events with the same value of Key can be written to
the same filepath in the HDFS.
For example, if the source event file name is 20160301_Beijing_Micromarketing.txt, you
can save the file name, absolute file path, and information in the file name to the header, as
shown in Figure 1-44.
Figure 1-44 File Name example
Save the file path to the FilePath key.
Save the file name to the FileName key.
Save the first piece of information in the file name to the Time key.
Save the second piece of information in the file name to the Place key.
After the values of fields in the header are determined, events with the same field values can
be written into the same partition in the HDFS during data receiving. For example, set the
HDFS directory to /test/%{Place}/%{Time} for data storage.
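As a minimal illustration of this mapping (using the extraction shown in Figure 1-44), a source file named 20160301_Beijing_Micromarketing.txt produces the header keys Time=20160301 and Place=Beijing, so the HDFS storage path /test/%{Place}/%{Time} resolves to /test/Beijing/20160301 for every event read from that file.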
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the File Name diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the File Name diagram element using a line.
Figure 1-45 File Name diagram element
Step 3 Double-click the File Name diagram element in the canvas to edit it.
Figure 1-46 Page for editing the File Name diagram element
Table 1-18 describes the parameters of the File Name diagram element.
Table 1-18 Parameters of the File Name diagram element
Parameter: Description
Node Name: User-defined name of the diagram element.
Extraction Items: Fields added to the header.
New Field Name: Enter a name.
Location of the field after segmentation: the Nth piece of information in the file name. For example, if the file name is 20160301_Beijing_Micromarketing.txt and the parameter is set to 1, the information "20160301" is extracted.
You can click Add to add multiple extraction fields.
File Name Information Delimiter: Delimiter between the pieces of information in the file name.
For example, if the file name is 20160301_Beijing_Micromarketing.txt, the delimiter is the underscore (_).
The following characters cannot be used as delimiters: < > " '.
Key Storing File Name: New field in the header for storing the complete file name. The field name can be customized.
File Name Extension: File name extension. This parameter is optional and is used to distinguish the file name extension from the file name information when File Name Information Delimiter is set to the dot (.).
----End
Field Change
When ingesting data, the Flume can calculate a new value from the original value of a field and either replace the original value with the calculation result or output the result as an additional field.
Function Description
Figure 1-47 Field Change example 1
The name:James field in the source file is processed using the substring(2,4) function; the field is renamed and converted into newName:me, and the conversion result is added as a new field.
The value of the lac:12 field in the source file is increased by 1; the field is renamed and converted into newLac:13, and the new field replaces the original field.
Figure 1-48 Field Change example 2
Perform the lac+ci calculation on value 12 of the lac field in the source file and replace the
original field with the new field that is obtained.
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Field Change diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the Field Change diagram element using a line.
Figure 1-49 Field Change diagram element
Step 3 Double-click the Field Change diagram element in the canvas to edit it.
Figure 1-50 Page for editing the Field Change diagram element
Table 1-19 describes the parameters of the Field Change diagram element.
Table 1-19 Parameters of the Field Change diagram element
Parameter Description
Node Name User-defined diagram element.
Data Style: No_Key: normal data format, for example, James,12,34.
Key_Value: data format containing the attribute name, for
example, name:James,lac:12,ci:34.
If the parameter is set to Key_Value, you need to configure the
following information:
Delimiter between the field name and field value.
Whether to delete the field name from the source data. The value
true indicates yes and the output data is James,12,34. The value
false indicates no and the output data is
name:James,lac:12,ci:34.
Field Change Rule Field change rule. You can click Add and add a field change rule.
Rule Name: The default value is ruleN.
Expression: Specify a value in the Expression text box.
Convert Type: conversion type. The value replace indicates that
the original field value is replaced. If this value is used, you need
to set Convert Type Name. The value append indicates that a
field is added. If this value is used, you need to set Append Field
Name.
New Key: new key name. For example, if the parameter is set to
newName, the output data is newName:xxx.
This parameter is valid only when Data Style is set to
Key_Value.
Expression: The expression format is AttrName.Function.
After the configuration, click .
Base Information Configures the source data delimiter.
----End
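For example, based on the AttrName.Function expression format and the cases in Figure 1-47, a rule that appends a shortened name could use the expression name.substring(2,4) with Convert Type set to append and New Key set to newName, producing newName:me for the input name:James, while a rule that replaces the lac value could use an expression such as lac+1 with Convert Type set to replace. This is only a sketch of the documented examples; confirm the exact expression syntax accepted by the Expression text box on the GUI.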
Field Encrypt
This function collects and aggregates the CDRs of sites in real time and encrypts sensitive fields for statistics and analysis by the downstream service system.
Function Description
The sensitive fields in the files are encrypted using the SM4 or AES128 algorithm.
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Encryption diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the Encrypt diagram element using a line.
Figure 1-51 Field Encrypt diagram element
Step 3 Double-click the Encrypt diagram element in the canvas to edit it.
Figure 1-52 Page for editing the Field Encrypt diagram element
Table 1-20 describes the parameters of the Field Encrypt diagram element.
Table 1-20 Parameters of the Field Encrypt diagram element
Parameter Description
Node Name User-defined diagram element.
Encrypt Mode Encryption algorithm.
Base Information
Select fields to be encrypted and click to
import them to the list on the right.
The icon is used to select all fields.
The icon is used to deselect the selected fields.
The icon is used to deselect all fields.
Data Delimiter: Source data delimiter.
Time Field Location: Event field name in the source data.
Time Field Format: Time field format.
Key Obtaining Interval (minutes): Interval, in minutes, for obtaining the key again after the key fails to be obtained.
Key Obtaining REST Interface URL: URL of the REST interface for obtaining the key.
Parameters that need to be set when AES128 is used:
Authentication User: User name for authentication. For details, see the DG documentation.
Authentication User Password: Password for authentication. For details, see the DG documentation.
Parameters that need to be set when SM4 is used:
Key Validity Length Before and After Current Time: Number of months before and after the current month. Keys used during this period can be obtained.
For example, if the validity period of the current key is 20151201-20161230 and the current time is March 2016:
If the parameter is set to 2, the system obtains keys used in the period from January 1, 2016 to May 31, 2016.
If the parameter is set to 10, the system obtains keys used in the period from December 1, 2015 to December 30, 2016. (Because the total duration before and after the current time exceeds the maximum validity period of the key, the validity period of the key is used.)
----End
Data Channel Configuration
Memory
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Procedure
Step 1 Click the Memory Channel diagram element on the toolbar and click the blank area in the
canvas.
Figure 1-53 Memory Channel diagram element
Step 2 Double-click the Memory Channel diagram element in the canvas to edit it.
Table 1-21 describes the parameters of the Memory Channel diagram element.
Table 1-21 Parameters of the Memory Channel diagram element
Parameter Description
Node Name User-defined diagram element.
Senior Optional Properties You can configure advanced configuration items based
on description on the GUI or use default values for them.
----End
File
Prerequisites
You have logged in to the foreground and designed events in the process editing step during the ingestion process design.
Procedure
Step 1 Click File Channel on the toolbar and click the blank area in the canvas.
Figure 1-54 File Channel
Step 2 Double-click File Channel in the canvas to edit it.
Table 1-22 describes the File Channel parameters.
Table 1-22 Parameters of the File Channel diagram element
Parameter Description
Node Name User-defined diagram element.
Senior Optional Properties You can configure advanced configuration items based
on description on the GUI or use default values for them.
encrypt If it is set to Yes, set information including the key,
password, and password file.
----End
(Optional) Data Channel Selection Configuration
The Channel Selector diagram element determines channels into which a specific event
received by the source is written.
Function Description
Currently, no parser is provided for the Channel Selector diagram element. Follow
instructions in the properties.properties file to configure the Channel Selector diagram
element.
For details, see the Flume official documentation at https://flume.apache.org/FlumeUserGuide.html.
In the example, the Channel Selector diagram element is used to implement the following
function:
Figure 1-55 Common Field Extraction example
When ingesting data, the Flume can distribute records to different storage systems based on the values of certain fields. For example, if eventID is 001, the record is distributed to the Kafka system, and all other records are distributed to the HDFS system.
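For reference, this kind of distribution corresponds to a multiplexing channel selector in a Flume properties file. A minimal sketch follows; the agent, source, channel, and header names are placeholders rather than the values generated by the CAE:
# Route events whose eventID header value is 001 to the Kafka channel; everything else goes to the HDFS channel
agent.sources.src1.selector.type = multiplexing
agent.sources.src1.selector.header = eventID
agent.sources.src1.selector.mapping.001 = kafkaChannel
agent.sources.src1.selector.default = hdfsChannel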
Prerequisites
None.
Procedure
Step 1 Drag diagram elements from the toolbar to the canvas and connect them, as shown in Figure 1-56.
Figure 1-56 Channel Selector configuration example
Step 2 Double-click each diagram element to edit it.
1. Configure the source.
Figure 1-57 Configuring the source
Table 1-23 Parameters of the Spooling Directory Source diagram element
Parameter Description
spooldir Local directory for storing source data in the Flume.
The directory cannot be deleted during the running of the Flume. Otherwise, the ingestion can be restored only after the Flume is restarted or the Flume properties file is updated.
If the stored file is a .gz file, select the
GzFileDeserializer parameter. A package can have only
one file.
2. Configure the Field Extraction diagram element.
Figure 1-58 Configuring the Field Extraction diagram element
Table 1-24 Parameters of the Secondary sorting diagram element
Parameter Description
Ordinary sorting
New Field Name New field name defined by a user for conversion.
Field Name Name of the field used for distribution.
Secondary sorting
New field value Level-1 distribution field value for performing secondary
distribution.
Secondary sorting field New field name defined by a user for conversion.
Field Name Name of the field used for secondary distribution.
secondaryFieldsDefault: directory for storing the
secondarily distributed data.
For example, if the secondary distribution is performed
when the value of the field for first distribution is 11 and
secondaryFieldsDefault is set to second, the storage
path is 11/second/. By default, data is saved to the
level-1 directory, that is, 11/.
3. Configure the Channel Selector diagram element.
Figure 1-59 Configuring the Channel Selector diagram element
Table 1-25 Parameters of the Channel Selector diagram element
Parameter Description
type: Channel Selector mode. The value multiplexing indicates multi-channel distribution.
header: Field in the header or body used for distribution. In this example, the value first is the field alias configured in Field Extraction.
Default channel: In this example, the Channel_N1462272643037 channel is used when the value is not 001 (default). Channel_N1462272643037 is the value of Agent Name in the channel.
Mapping of header value and channel (Header Value and Channel Name): In this example, the Channel_N1462272643030 channel is used when the value is 001. Channel_N1462272643030 is the value of Agent Name in the channel.
4. Configure the channel.
Figure 1-60 Configuring channel 1
Figure 1-61 Configuring channel 2
5. Configure the Kafka Sink diagram element.
Figure 1-62 Configuring the Kafka Sink diagram element
Table 1-26 Parameters of the Kafka Sink diagram element
Parameter Description
Kafka Broker IP address of the Kafka Broker:Service port of the Kafka Broker.
Use commas (,) to separate multiple values.
The port number is the same as the value of port in the /opt/huawei/Bigdata/etc/*_**_Broker/server.properties file. The default port number is 21005.
Kafka Topic Topic for storing the event. You can select a value from the
drop-down list box.
Partitioning Method
Default: indicates the default partition method, that is, the partition
is performed based on the key in the header.
ConsistencyHash: indicates the consistency hash.
Random: indicates random partition.
Events Processed in Each Batch
Copies to Authorize Before Event Writing Success
6. Configure the HDFS Sink diagram element.
Figure 1-63 Configuring the HDFS Sink diagram element
Table 1-27 Parameters of the HDFS Sink diagram element
Parameter Description
HDFS Storage Path Storage path of events ingested by the Flume.
If the storage path is /tmp/flume_ide, the parent directory /tmp
of the path must be an existing HDFS directory. The /flume_ide
subdirectory can be defined by a user, and multiple levels of
subdirectories can be defined by a user. The CAE system will
automatically generate a user-defined subdirectory.
For details about how to view and create an HDFS directory on the Hadoop client, see Using the Client to View and Create Files in the HDFS.
Kerberos Principal
Kerberos File Path
When the Kerberos authentication function is enabled in the HDFS, the Kerberos Principal and Kerberos File Path parameters must be selected and correctly configured. Generally, the parameter settings are as follows:
Kerberos Principal: flume
Kerberos File Path: /opt/huawei/Bigdata/FusionInsight-Flume-*.*.*/flume/conf/flume.keytab
----End
Data Output Configuration (Sink)
This topic describes how to configure the mode for exporting ingested data, that is, configure
the sink.
Kafka
The Kafka is used to store events ingested by the Flume. You can specify the topic
corresponding to each event.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Procedure
Step 1 Click the Kafka Sink diagram element on the toolbar and click the blank area in the canvas.
Figure 1-64 Kafka Sink diagram element
Step 2 Double-click the Kafka Sink diagram element in the canvas to edit it.
Figure 1-65 Page for editing the Kafka Sink diagram element
Table 1-28 describes the parameters of the Kafka Sink diagram element.
Table 1-28 Parameters of the Kafka Sink diagram element
Parameter Description
Node Name User-defined diagram element.
Kafka Topic Topic for storing the event. You can select a value from the
drop-down list box.
Senior Optional Properties:
Partitioning Method:
Default: indicates the default partition method, that is, the partition is performed based on the key in the header.
ConsistencyHash: indicates the consistency hash.
Random: indicates random partition.
Events Processed in Each Batch
Copies to Authorize Before Event Writing Success
Add The Flume has a lot of parameters. If the parameter that a user
requires is not available on the GUI, the user can define it.
Ensure that the parameter exists in the matching Flume and is
correctly set.
----End
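After the ingestion process is started, one way to check that events reach the configured topic is to consume the topic with the Kafka client tool, in the same way as in the verification sections later in this document (the ZooKeeper address and topic name below are examples):
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic <topic name> --from-beginning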
HDFS
The HDFS is used to store events ingested by the Flume. You can specify the storage path of
the ingested data.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Procedure
Step 1 Click the HDFS Sink diagram element on the toolbar and click the blank area in the canvas.
Figure 1-66 HDFS Sink diagram element
Step 2 Double-click the HDFS Sink diagram element in the canvas to edit it.
Figure 1-67 Page for editing the HDFS Sink diagram element
Table 1-29 describes the parameters of the HDFS Sink diagram element.
Table 1-29 Parameters of the HDFS Sink diagram element
Parameter Description
Node Name User-defined diagram element.
HDFS Storage Path Storage path of events ingested by the Flume.
If the storage path is /tmp/flume_ide, the parent directory
/tmp of the path must be an existing HDFS directory. The
/flume_ide subdirectory can be defined by a user, and multiple
levels of subdirectories can be defined by a user. The CAE
system will automatically generate a user-defined subdirectory.
For details about how to view and create an HDFS directory on the Hadoop client, see 1.2.3.7.5 Using the Client to View and Create Files in the HDFS.
Kerberos Principal
Kerberos File Path
When the Kerberos authentication function is enabled in the HDFS, the Kerberos Principal and Kerberos File Path parameters must be selected and correctly configured. Generally, the parameter settings are as follows:
Kerberos Principal: flume
Kerberos File Path: /opt/huawei/Bigdata/FusionInsight-Flume-*.*.*/flume/conf/flume.keytab
Senior Optional Properties: You can configure advanced configuration items based on the description on the GUI or use default values for them.
Add The Flume has a lot of parameters. If the parameter that a user
requires is not available on the GUI, the user can define it.
Ensure that the parameter exists in the matching Flume and is
correctly set.
----End
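After data is ingested, the generated files can be checked on the Hadoop client, for example (using the example storage path above):
% hdfs dfs -ls /tmp/flume_ide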
Avro
The service system can use the Avro protocol to obtain data ingested by the Flume.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Context
The Avro source and Avro sink must be used together. Currently, the application scenario is
internal delivery in the ingestion process, as shown in Figure 1-68.
Figure 1-68 Common scenario of the Avro source and Avro sink
After data ingestion, a part of data is distributed to the HDFS sink. Another part of data is
distributed to the Avro sink. The Avro source receives and filters the data and then distributes
the filtered data to the Kafka sink.
In such case, the Avro source and sink are used for internal distribution.
Procedure
Step 1 Click the Avro Sink diagram element on the toolbar and click the blank area in the canvas.
Figure 1-69 Avro Sink diagram element
Step 2 Double-click the Avro Sink diagram element in the canvas to edit it.
Figure 1-70 Page for editing the Avro Sink diagram element
Table 1-30 describes the parameters of the Avro Sink diagram element.
Table 1-30 Parameters of the Avro Sink diagram element
Parameter Description
Node Name User-defined diagram element.
hostname Bound IP address or host name in Avro.
port Bound port number in Avro.
Senior Optional Properties You can configure advanced configuration items based
on description on the GUI or use default values for them.
Add The Flume has a lot of parameters. If the parameter that
a user requires is not available on the GUI, the user can
define it.
Ensure that the parameter exists in the matching Flume
and is correctly set.
----End
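For reference, the internal delivery described above corresponds to pairing an Avro sink in one agent with an Avro source in another: the sink's hostname and port must match the address that the source binds to. A minimal Flume-style sketch, with placeholder agent and element names and an example address:
# Sending side: Avro sink pointing at the receiving agent
agent1.sinks.avroOut.type = avro
agent1.sinks.avroOut.hostname = 10.0.0.5
agent1.sinks.avroOut.port = 4545
# Receiving side: Avro source bound to the same address and port
agent2.sources.avroIn.type = avro
agent2.sources.avroIn.bind = 10.0.0.5
agent2.sources.avroIn.port = 4545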
(Optional) Sink Group Configuration
The Sink Group is used for event output load balancing, ensuring high availability when a sink is unavailable.
Context
Currently, no parser is provided for the Sink Group diagram element. Follow instructions in
the properties.properties file to configure the Sink Group diagram element.
Prerequisites
None.
Procedure
Step 1 Drag diagram elements from the toolbar to the canvas and connect them, as shown in Figure 1-71.
Figure 1-71 Sink Group configuration example
Step 2 Double-click the Sink Group diagram element in the canvas to edit it.
Figure 1-72 Configuring the Sink Group diagram element
In this example, the Load-Balancing Sink processor is used as the sink processing mode. For details about the configuration of other modes, see the Flume official documentation at https://flume.apache.org/FlumeUserGuide.html.
Table 1-31 Parameters of the Sink Group diagram element
Parameter Description
type Type of the selected sink. In this example, the value is
load-balance.
selector Distribution mode. The options are as follows:
round_robin: distributes data by the sink order, for
example, sink 1, sink 2, sink 3...
random: distributes data randomly.
backoff Indicates whether to add a sink to the blacklist when it is
faulty.
maxTimeOut Maximum timeout interval.
----End
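Because no parser is provided, the parameters in Table 1-31 map directly onto Flume sink group properties. A minimal load-balancing sketch follows; the agent and sink names are placeholders, and note that the processor type is written load_balance in the Flume user guide:
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = sink1 sink2
agent.sinkgroups.g1.processor.type = load_balance
agent.sinkgroups.g1.processor.selector = round_robin
agent.sinkgroups.g1.processor.backoff = true
agent.sinkgroups.g1.processor.selector.maxTimeOut = 30000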
Using the Client to View and Create Files in the HDFS
The CAE server provides the Hadoop client. Users can directly use the client to view and
create files in the HDFS.
Step 1 Log in to a Hadoop client.
Step 2 Edit the environment variable.
Enter the bash mode and initialize the environment variable.
% source bigdata_env
Initialize the ticket.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
Step 3 Query the HDFS file directory.
% hdfs dfs -ls /
The following information is displayed:
Found 8 items
-rw-r--r-- 3 hdfs supergroup 0 2016-04-05 11:21
/PRE_CREATE_DIR.SUCCESS
drwxr-x--- - flume hadoop 0 2016-04-05 11:21 /flume
drwx------ - hbase hadoop 0 2016-04-05 11:21 /hbase
drwxrwxrwx - mapred hadoop 0 2016-04-05 11:21 /mr-history
drwxrwxrwx - spark supergroup 0 2016-04-07 16:00 /sparkJobHistory
drwxrwxrwx - hdfs hadoop 0 2016-04-05 11:43 /tmp
drwxrwxrwx - hdfs hadoop 0 2016-04-05 11:43 /user
drwxr-xr-x - ldapuserzh0405 supergroup 0 2016-04-07 16:19 /usr
Step 4 Create the /usr/test directory.
% hdfs dfs -mkdir /usr/test
Step 5 Query the /usr directory in the HDFS.
% hdfs dfs -ls /usr
The following information is displayed:
Found 4 items
drwxr-xr-x - ldapuser_ling0416 supergroup 0 2016-04-18 15:18 /usr/data1
drwxr-xr-x - ldapuserzh0405 supergroup 0 2016-04-08 15:37 /usr/streaming
drwxr-xr-x - ldapuser_tmy supergroup 0 2016-04-07 16:19 /usr/streaming_tmy
drwxr-xr-x - ldapuser_ling0416 supergroup 0 2016-04-18 15:18 /usr/test
The command for deleting the directory is hdfs dfs -rm -r /usr/test (the older form hadoop fs -rmr /usr/test also works but is deprecated).
----End
Designing Spark Streaming for Task Processing
You can manage and orchestrate Spark tasks on the page. On the management page, you can upload and instantiate Spark task rules, and start, stop, or delete Spark tasks. On the orchestration page, you can graphically orchestrate Spark task rules.
Creating Task Rules by Orchestration
In the CAE system, the Spark task rule is called rule, and the instantiated task is called task.
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. However, the login URL varies depending on the
installation mode.
Then, enter the login user name, password, and verification code. For details about the default
password, see "Password Change Views" in the Password Change.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 In the navigation bar on the upper part, choose Realtime Awareness > Streaming Studio >
Stream Computing Design.
The page Stream Computing Design is displayed, as shown in Figure 1-73.
Figure 1-73 Stream Computing Design page
Step 4 Create a project.
Click on the upper part and create a project, which facilitates the classification of Spark
rules.
In the Create dialog box, enter the project name and description, and click Save.
Click the created project in the project list. The ingestion process query page of the project is
displayed.
To edit the created project, you can select the project and click at the upper corner.
To delete the created project, you can select the project and click at the upper corner.
Step 5 Click Add to access the rule orchestration page.
1. Set Rule Name and Description.
2. Edit the rule.
Drag the corresponding Source, Interceptor, and Sink diagram elements from the
toolbar.
Double-click diagram elements to edit them and connect diagram elements by lines.
Figure 1-74 shows a complete process.
Figure 1-74 Rule design example
For details about diagram elements on the toolbar such as Source, Interceptor, and Sink, see the corresponding topics. Table 1-32 describes the configuration.
Table 1-32 Stream processing rule configuration
Type: Parameter
Source: Kafka, Socket
Interceptor: Projecting, Filter, Backfill, GroupBy, Accumulate
Sink: HDFS Sink, Kafka Sink
3. Click Publish to release the task rule.
Click Save to save the task rule. Click Return to return to the Realtime Application Management page.
After the release, the Realtime Application Management page is displayed, and a new
Spark rule is generated.
Step 6 Instantiate the generated Spark task rule.
Click instantiation in the Operation column next to a Spark rule name.
Set instantiated parameters.
Click Confirm to view the corresponding instantiated task name on the Task management
page.
----End
Creating Task Rules by Upload
Prerequisites
In the CAE system, the Spark task rule is called rule, and the instantiated task is called task.
JAR packages and XML files of Spark task rules have been developed. For details,
contact Huawei technical support to obtain the Customization Development Guide.
Compress the JAR packages and XML files.
Context
You can upload the JAR packages of Spark task rules on the CAE GUI. Then you can
instantiate the rule to generate an executable task.
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. However, the login URL varies depending on the
installation mode.
Then, enter the login user name, password, and verification code. For details about the default
password, see "Password Change Views" in the Password Change.
The IP address is used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Tenacies Manager page is displayed, as shown in Figure 1-75.
Figure 1-75 Tenacies Manager
Step 4 Click upload file and click .
Select the ZIP package of a Spark task rule.
Step 5 Click upload.
After the ZIP package is successfully uploaded, the corresponding rule is displayed.
Step 6 Instantiate the Spark task rule.
Click in the Operation column.
Set instantiated parameters.
Click Confirm. You can view the generated task in the Task management page.
----End
Configuring the Input Data Source (Source)
You can configure the data source (Kafka Source or Socket Source) to obtain data for Spark
calculation.
Kafka Source
You can specify the data in Kafka topic as the source data for calculation.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page.
Procedure
Step 1 Choose Source > Kafka on the toolbar and click the blank area in the canvas.
Figure 1-76 Kafka diagram element
Step 2 Double-click the Kafka diagram element in the canvas to edit it.
Figure 1-77 Page for editing the Kafka Source diagram element
Table 1-33 describes the parameters.
Table 1-33 Kafka Source parameter description
Item Description
Node Name User-defined diagram element.
Broker List Kafka Broker server list, which is automatically obtained from the
system and does not need to be configured.
Topic Names Subscribed topics from which source data are obtained for calculation.
Use commas (,) to separate multiple topics.
Data Encoding Method of encoding data in topics. Only string and byte array are
currently supported.
Data Separator Source data delimiter. Set the parameter based on the site requirements.
Field Names Name of each field in source data. Use commas (,) to separate multiple
fields.
Field Types Type of each field in source data. Use commas (,) to separate multiple
types. Each type maps to a field name.
Only string, int, double, long, float, and boolean are supported, and
only lowercase letters are supported.
----End
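For example, for source records such as Xiaoming|15|80 (the format used later by the MicroMarketing rule), the configuration could be Data Separator |, Field Names name,age,score, and Field Types string,int,double, so that each type maps to the field in the same position.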
Socket Source
You can obtain the source data for calculation from the source data server through a socket interface.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page.
Procedure
Step 1 Choose Source > Socket on the toolbar and click the blank area in the canvas.
Figure 1-78 Socket Source diagram element
Step 2 Double-click the Socket diagram element in the canvas to edit it.
Figure 1-79 Page for editing the Socket Source diagram element
Table 1-34 describes the parameters.
Table 1-34 Socket Source parameter description
Item Description
Node Name User-defined diagram element.
Server IP IP address of the Socket server.
Server Port Port number of the Socket server.
Data Separator Source data delimiter. Set the parameter based on the site
requirements.
Field Names Name of each field in source data. Use commas (,) to
separate multiple fields.
Field Types Type of each field in source data. Use commas (,) to
separate multiple types. Each type maps to a field name.
Only string, int, double, long, float, and boolean are
supported, and only lowercase letters are supported.
----End
Configuring the Calculation Method (Interceptor)
The CAE preconfigures algorithms in the Spark to provide multiple source data calculation capabilities.
Projection
You can extract meaningful fields from the source data by projection.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source has been configured.
Procedure
Step 1 Click the Projection diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the Source with the Projection diagram element by lines.
Figure 1-80 Projection diagram element
Step 3 Double-click the Projection diagram element in the canvas to edit it.
Figure 1-81 Page for editing the projection
Table 1-35 describes the parameters.
Table 1-35 Projection parameter description
Item Description
Node Name User-defined diagram element.
Input Event Attribute Existing attributes of the source event.
Output Event Attribute: Output fields.
Select existing attributes and click to import them to
the Output Event Attribute list. The output event
attribute sequence is determined by the import sequence.
The icon is used to select all attributes.
The icon is used to deselect the selected attributes.
The icon is used to deselect all attributes.
----End
Filtering
You can filter source data based on values of some fields in the input data. You can define the
filtering expression.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source has been configured.
Procedure
Step 1 Click the Filtering diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the Source with the Filtering diagram element by lines.
Figure 1-82 Filtering diagram element
Step 3 Double-click the Filtering diagram element in the canvas to edit it.
Figure 1-83 Page for editing the filtering
Table 1-36 describes the parameters.
Table 1-36 Filtering parameter description
Item Description
Node Name User-defined diagram element.
Filtering condition expression (the running result of the expression must be Boolean).
Fields of the character type support the following
expressions:
Fieldname.startsWith (String)
Fieldname.endsWith (String)
Fieldname.isEmpty ()
Fieldname.length()>= int
Fieldname.in('1','2','3','4'...)
Example: PhoneNum.length() >=11
Fields of the number type support the following
expressions:
Integer type: Fieldname.in(1,2,3,4...)
Long type: Fieldname.in(1l,2l,3l,4l...)
Double type: Fieldname.in(1.0,2.0,3.0,4.0...)
Float type: Fieldname.in(1.0f,2.0f,3.0f,4.0f...)
Example: UserID.in(001,002)
This version does not support the date-type field
expression.
The JEXL3 expressions are commonly supported, such
as ==, =, <=, >=, !=, +, -, &&, and ||.
----End
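For example, to keep only the events of users 001 and 002 whose PhoneNum starts with 139, an expression combining the forms above could be UserID.in(001,002) && PhoneNum.startsWith('139') (the field names are illustrative).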
Association and Backfill
You can add or update fields in the source data through the dimension table (cache table)
provided by the business.
Function Description
Figure 1-84 Example of the association and backfill function
The dimension table (cache table) contains event update fields. In the preceding figure,
the value of the Place field in the source event is updated according to the cache table 1.
The dimension table (cache table) contains event added fields. In the preceding figure,
the PhoneNum field is added to the source event according to the cache table 2.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source has been configured.
Procedure
Step 1 Click the Association and Backfill diagram element on the toolbar and click the blank area in
the canvas.
Step 2 Connect the Source with the Association and Backfill diagram element by lines.
Figure 1-85 Association and backfill diagram element
Step 3 Double-click the Association and Backfill diagram element in the canvas to edit it.
Figure 1-86 Page for editing the association and backfill
Table 1-37 describes the parameters.
Table 1-37 Association and backfill parameter description
Item Description
Node Name User-defined diagram element.
Table Name Click Add.
Configure the table name, path for storing the table, and data delimiter in the
table.
The table file must be stored in the Hadoop HDFS in advance and the
storage path is a path in the HDFS.
For details about how to view and create an HDFS directory on the Hadoop client on the CAE server, see 1.2.3.7.5 Using the Client to View and Create Files in the HDFS.
You can add multiple dimension tables.
Backfill Rule: Click Add and configure how to backfill the data in the dimension table to the source data.
Condition: Set this parameter to a condition configured in Backfill Condition.
Table Name: Set this parameter to a Dimension Table name. The default value is tableN.
Backfill Value SN in Dimension Table: cache table column whose values are filled back to the data. (The value starts from 1, not 0.)
Target Field: field to which a value is backfilled. If the value is backfilled to an added field, the name of the field can be defined by the user.
Conversion Type: Update the value of the original field or add a field.
In Figure 1-84, if "condition1" is met, the value of the second field in dimension table 1 is updated to the Place field in the source data. If "condition2" is met, the value of the second field in dimension table 2 is added to the PhoneNum field in the source data.
Backfill Condition: Click Add and configure the condition for triggering backfill.
Condition Name: condition name, which is defined by a user.
Table Name: dimension table name. The default value is tableN.
The Index of Table Field: dimension table column whose values are used for comparison.
Target Field: Select a source data field.
In Figure 1-84, condition 1 indicates that the backfill is triggered if the value of the first field in dimension table 1 is equal to the value of the Place field.
Base Information: Configures the source data delimiter.
----End
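As a minimal illustration (assuming a comma as the data delimiter in the table and the example in Figure 1-84), dimension table 1 stored in the HDFS could contain lines such as:
Beijing,Shanghai
Nanjing,Suzhou
where, for the backfill condition, the first column is compared with the Place field of the source event and, for the backfill rule, the second column is the value written back to the Place field when they match.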
Grouping
You can group the data based on the value of a field. After grouping, you can calculate the
data in the group by functions and export the calculation result to the Sink.
Function Description
Figure 1-87 Grouping function display
After grouping by UserType, calculate the sum of Score in the group and the maximum value
of Age.
The naming rule of the output result field is ${FunctionName}_${FieldName}, for example, sum_Score and max_Age.
The last column in the output result is the key field column, for example, UserType.
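A minimal worked example under these rules (field names are assumed): for input events with the fields UserType, Score, and Age
vip,10,25
vip,20,30
normal,5,40
grouping on UserType with the sum function applied to Score and the max function applied to Age produces output of the form
sum_Score,max_Age,UserType
30,30,vip
5,40,normal
with the key field column last, as described above.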
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source has been configured.
Procedure
Step 1 Click the GroupBy diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the Source with the GroupBy diagram element by lines.
Figure 1-88 Field distribution diagram element
Step 3 Double-click the GroupBy diagram element in the canvas to edit it.
Figure 1-89 Grouping example
Table 1-38 Grouping parameter description
Item Description
Node Name User-defined diagram element.
Fields To Group On: Name of the field based on which groups are configured. You can select multiple fields here. Events with the same values for all grouping fields will be classified into the same group.
For example, if the first and second fields are grouping
fields, the following two events will be classified into the
same group:
A,B,22,44,66,88
A,B,11,33,55,77
Available Groupby Functions Functions that are selected to calculate the data in the
group after grouping. You can select multiple functions.
Input Field of XX Field that is selected to calculate the data in the group by
functions after grouping.
Input Field Type of XX This parameter does not need to be set. The system will
automatically extract the configuration information from
the Source.
For the max, min, avg, and sum functions, the input parameter must be of the value type. For the count function, the input parameter can be of any type.
----End
Accumulation
See Typical Configuration Case: Marketing Event Upon Traffic Usage Saturation.
Configuring the Data Output Type (Sink)
You can configure the output method of calculation result data.
Exporting Data to HDFS
You can use the HDFS to store the processing result and specify the path for storing the
processing result.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source and Interceptor have been configured.
Procedure
Step 1 Choose Sink > HDFS on the toolbar and click the blank area in the canvas.
Figure 1-90 HDFS Sink diagram element
Step 2 Double-click the HDFS diagram element in the canvas to edit it.
Figure 1-91 Page for editing the HDFS Sink diagram element
Table 1-39 describes the parameters.
Table 1-39 HDFS Sink parameter description
Item Description
Node Name User-defined diagram element.
HDFS Storage Path Specified path for storing the result data. The default path is
/tmp/spark_ide.
The parent directory /tmp of the path must be the existing
directory in the HDFS system. The subdirectory /spark_ide can
be defined by the user, and the user-defined subdirectory can
have multiple levels. The CAE system will automatically
generate a user-defined subdirectory.
For details about how to view and create an HDFS directory on the Hadoop client, see 1.2.3.7.5 Using the Client to View and Create Files in the HDFS.
Senior Optional Properties: You can configure advanced configuration items based on the description on the GUI or use default values for them.
----End
Exporting Data to Kafka
You can use Kafka to store the processing result and specify the topic for storing the
processing result.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source and Interceptor have been configured.
Procedure
Step 1 Choose Sink > Kafka Sink on the toolbar and click the blank area in the canvas.
Figure 1-92 Kafka Sink diagram element
Step 2 Double-click the Kafka Sink diagram element in the canvas to edit it.
Figure 1-93 Page for editing the Kafka Sink diagram element
Table 1-40 describes the parameters.
Table 1-40 Kafka Sink parameter description
Item Description
Node Name User-defined diagram element.
Topic Names Specified topic name for storing the output data.
Output Data Separator Output data delimiter.
Senior Optional Properties
Event Batch Size The Kafka uses an asynchronous processing distribution
mechanism. This option indicates the number of records
processed in each batch.
Data Serializer Serialization method of output data, the default value is
kafka.serializer.StringEncoder.
Output Fields You can adjust the sequence of output fields and set
fields in sequence based on site requirements.
----End
Task Manager
Prerequisites
In the CAE, the Spark task rule is called rule, and the instantiated rule is called task.
Rules have been instantiated and corresponding tasks have been generated.
Context
You can manage tasks on the CAE GUI, including starting, stopping, deleting, and querying
tasks.
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. However, the login URL varies depending on the
installation mode.
Then, enter the login user name, password, and verification code. For details about the default
password, see "Password Change Views" in the Password Change.
The IP address is used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Application Management > Realtime Application Management.
The Realtime Application Management page is displayed, as shown in Figure 1-94.
Figure 1-94 Realtime Application Management
Step 4 Click Task Manager on the left.
The Task Manager page is displayed, as shown in Figure 1-95.
Figure 1-95 Task Manager
Step 5 Query tasks.
Use rule name for fuzzy query.
Use task name for conditional query.
Use status for conditional query.
1. Set the name or status of the task to be queried.
2. Click Query.
For the query by rule name, all tasks generated for the rule will be found.
You can configure search criteria to query tasks, and tasks meeting these criteria will be displayed, as shown in Figure 1-96.
Figure 1-96 Search criteria
Step 6 View task information.
1. Select the task to be viewed.
2. Click see. The tab page for displaying detailed task information is displayed.
Step 7 Change the task status.
Select a task and change its status.
You can click start to start the task, click stop to stop the task, and click Delete to delete the
task.
A stopped task can be restarted.
----End
Preconfigured Task Rules
The CAE preconfigures six Spark task rules that can be used for corresponding service
calculation.
Preconfigured rule overview
Table 1-41 Preconfigured rule overview
Rule Function Example
1.2.4.7.1 NetcatWordCount
Calculates the number of words in the input character strings within 10 minutes and the number of the occurrence times of each word. The Spark reads character strings from a specified Socket Server, calculates the number of words in the character strings, and records the calculation result to a specified directory in the HDFS (/spark/NetcatWordCount-xxxx.output, where xxxx indicates the timestamp).
Input data:
Hello World Hello
Output data:
Hello 2
World 1
KafkaWordCount Calculates the number of
words in the input character
strings within 10 minutes and
the number of the occurrence
times of each word.
The Spark reads character
strings from specified Kafka
topics, calculates the number
of words in the character
strings, and records the
calculation result to Kafka
topics.
MicroMarketing Performs Spark SQL
statements on input data and
sends the result to the Kafka
or Oracle database in Tianjin
micromarketing scenario.
Input data:
Xiaoming|15|80
Zhangsan|16|92
Lisi|14|85
The Spark SQL statement is
SELECT MAX(age) as name
from table1, which finds the
largest age from the input data.
Output data:
16
RoamingAwareness Calculates the number of users
who roam to a specified city
and are of the specified
roaming type and writes the
IMSIs of these users to the
Kafka.
Input data:
1|460023700411005|3540960531813501|8615879005652|10.211.76.190|6432|255|6299|221.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|1460441839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|666|6995|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/wechatgame/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/0|mmocgame.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i Build/IMM76I)|image/png|||6323|0|0|0|||2322300259|||China Mobile|China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile
Specify the roaming city to
Shenzhen and roaming type to 2.
Output data:
460023700411005
LocationRemainAw
areness
Finds the users who stay in the
target area for a period longer
than the specified period and
writes the IMSIs and location
codes of these users to the
Kafka.
Input data:
1|460023700411007|3540960531813501|8615879005652|10.211.76.190|200|0755|6299|221.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|1460441839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|15001301|25001301|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/wechatgame/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/0|mmocgame.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i Build/IMM76I)|image/png|||6323|0|0|0|||2322300259||||China Mobile||China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile
In the preceding information, data in bold indicates the LAC and RAC of users. A LAC and an RAC of a user identify the location of the user. If the input data within the period indicates that a user's location remains unchanged, the user meets the requirements.
Output data:
460023700411007,200,0755
The output data contains the IMSI and location code of the user who meets the requirements.
AppTrafficAwarene
ss
Finds out users whose traffic
usage generated for using a
specified app reaches the
specified threshold in a day or
month and writes the IMSI,
traffic usage, and the date
when the traffic usage reaches
the threshold to the Kafka.
Input data:
1|460023700411005|3540960531813501|8615879005652|10.211.76.190|200|0755|6299|221.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|1460441839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|15001301|25001301|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/wechatgame/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/0|mmocgame.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i Build/IMM76I)|image/png|||6323|0|0|0|||2322300259||||China Mobile||China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile
In the preceding information, data in bold indicates the upstream and downstream traffic, and the sum of them indicates the traffic usage generated when using the app.
Output data:
460023700411005,4.0002602E7,20
16-04-12
In the preceding information, "4.0002602E7" indicates the traffic usage generated for using the app, where "E7" indicates the seventh power of 10, and "2016-04-12" indicates the date when the traffic usage reaches the threshold. The unit of traffic usage is byte.
NetcatWordCount
This task calculates the number of words in the input character strings within 10 minutes and
the number of the occurrence times of each word. The Spark reads character strings from a
specified Socket Server, calculates the number of words in the character strings, and records
the calculation result to a specified directory in the HDFS
(/spark/NetcatWordCount-xxxx.output, where xxxx indicates the timestamp).
Input Parameter Description
Table 1-42 Input parameter description
Parameter Description Example
Application Properties
source.port Communication port of the Socket data source. 9999
source.ip IP address of the Socket data source. The default
value is 127.0.0.1. Change it based on the site
requirements.
10.0.0.1
Senior Optional Properties
driver-memory Memory allocated to the driver, in GB. The value
must be an integer. Increase the value if the
service logic is complex.
1
driver-cores Number of CPU cores allocated to the driver. The
value must be an integer.
1
executor-memory Memory allocated to one executor, in GB. The
value must be an integer. Increase the value of
this parameter if the service logic is complex.
1
executor-cores Number of CPUs allocated to one executor. The
value must be an integer. Increase the value if the
service logic is complex.
2
num-executors Number of executors allocated to the current
Spark task. The value must be an integer. The
value of this parameter determines Spark
calculation concurrency. Change the value based
on the calculation requirements.
1
Verification
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to the examples in Table 1-42.
Step 3 Send data on the Socket server.
Hello World Hello
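If no dedicated Socket server is available, a simple way to act as the data source during verification is to run a listener on the host and port configured in source.ip and source.port, for example with netcat (assumed to be installed on that host; it is not part of the CAE):
% nc -lk 9999
Hello World Hello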
Step 4 Use a client to log in to the HDFS and check output data in
/spark/NetcatWordCount-xxxx.output (where xxxx indicates the timestamp).
Hello 2
World 1
For details about how to view files in the HDFS, see 1.2.3.7.5 Using the Client to View and Create Files in the HDFS.
----End
KafkaWordCount
This task calculates the number of words in the input character strings within 10 minutes and
the number of the occurrence times of each word. The Spark reads character strings from
specified Kafka topics, calculates the number of words in the character strings, and records
the calculation result to logs.
Input Parameter Description
Table 1-43 Input parameter description
Parameter Description Example
Key
Application Properties
outputTopic: Topic to which results are generated. If multiple topics exist, use commas (,) to separate them. Example: KafkaWordCount_output
inputTopic: Input topic. If multiple topics exist, use commas (,) to separate them. Example: KafkaWordCount_input
Senior Optional Properties
driver-memory Memory allocated to the driver, in GB.
The value must be an integer. Increase the
value if the service logic is complex.
1
driver-cores Number of CPU cores allocated to the
driver. The value must be an integer.
1
executor-memory Memory allocated to one executor, in GB.
The value must be an integer. Increase the
value of this parameter if the service logic
is complex.
1
executor-cores Number of CPUs allocated to one
executor. The value must be an integer.
Increase the value if the service logic is
complex.
2
num-executors Number of executors allocated to the
current Spark task. The value must be an
integer. The value of this parameter
determines Spark calculation concurrency.
Change the value based on the calculation
requirements.
1
Verification
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to the examples in Table 1-43.
Step 3 Send data in the Kafka topic specified by inputTopic in Table 1-43.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
4. Send data in the source topic in the Kafka.
% kafka-console-producer.sh --broker-list
10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005 --topic KafkaWordCount_input
Input data: Hello World Hello
5. Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic
KafkaWordCount_output --from-beginning
The following information is displayed:
Hello 2
World 1
The information indicates that the Kafka has received the data.
----End
MicroMarketing
Performs Spark SQL statements on input data and sends the result to the Kafka or Oracle database in the Tianjin micromarketing scenario.
Input Parameter Description
Table 1-44 Input parameter description
Parameter Description Example
Key
Application Properties
outputTopic Topic to which results are
generated. If multiple topics exist,
use commas (,) to separate them.
MicroMarketing_output
inputTopic Input topic. If multiple topics
exist, use commas (,) to separate
them.
["MicroMarketing_input"]
windowSize Time window size of the Spark
Streaming, in seconds.
4
slideInterval Time window slide interval of the
Spark Streaming, in seconds.
1
sparkColumns: Field description in the input data. Example: [{"index":1,"columnType":"String","columnName":"name"},{"index":2,"columnType":"Long","columnName":"age"},{"index":3,"columnType":"Float","columnName":"score"}]
fieldSeperator Source data delimiter. |
outputDestination: System for storing output data, Kafka or Oracle database. Example: kafka
kafka.consumer.group: Specifies the consumer group of the Kafka. Example: test_group
sparkSql Spark SQL statement that
specifies the calculation logic.
SELECT MAX(age) as name
from table1
Senior Optional Properties
driver-memory Memory allocated to the driver, in
GB. The value must be an integer.
Increase the value if the service
logic is complex.
1
driver-cores Number of CPU cores allocated to
the driver. The value must be an
integer.
1
executor-memory: Memory allocated to one executor, in GB. The value must be an integer. Increase the value of this parameter if the service logic is complex. Example: 1
executor-cores Number of CPUs allocated to one
executor. The value must be an
integer. Increase the value if the
service logic is complex.
2
num-executors Number of executors allocated to
the current Spark task. The value
must be an integer. The value of
this parameter determines Spark
calculation concurrency. Change
the value based on the calculation
requirements.
1
Verification
The following example finds the maximum age from the input data.
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to the examples in Table 1-44.
Step 3 Send data in the Kafka topic specified by inputTopics.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
4. Send data in the source topic in the Kafka.
% kafka-console-producer.sh --broker-list
10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005 --topic MicroMarketing_input
Input data:
Xiaoming|15|80
Zhangsan|16|92
Lisi|14|85
5. Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic
MicroMarketing_output --from-beginning
The following information is displayed:
16
The information indicates that the Kafka has received the data.
----End
RoamingAwareness
Calculates the number of users who roam to a specified city and are of the specified roaming
type and writes the IMSIs of these users to the Kafka.
Input Parameter Description
Table 1-45 Input parameter description
Parameter Description Example
Key
Application Properties
outputTopic Topic to which results are
generated. If multiple topics exist,
use commas (,) to separate them.
RoamingAwareness_output
inputTopic Input topic. If multiple topics
exist, use commas (,) to separate
them.
RoamingAwareness_input
imsi.position.in.event: Location of the IMSI field in the input event. The value starts from 0. For example, if this parameter is set to 1, the IMSI field is the second field in the input event. Example: 1
roaming.type.position.in.event: Location of the roaming type field in the input event. Example: 64
roaming.in.city.position.in.event: Location of the roaming city field in the input event. Example: 63
roam.to.city: City to which users roam. Example: Philadelphia
roam.type: Roaming type, which is an integer. Example: 2
spark.checkpoint.path: Checkpoint path of the Spark. Example: checkpoint_test
spark.batch.interval: Batch processing interval, in milliseconds. Example: 1
event.field.separator: Field delimiter in source data. Example: |
consumerGroup: Specifies the consumer group of the Kafka. Example: test_group
Senior Optional Properties
driver-memory Memory allocated to the driver, in
GB. The value must be an integer.
Increase the value if the service
logic is complex.
1
driver-cores Number of CPU cores allocated to
the driver. The value must be an
integer.
1
executor-memor
y
Memory allocated to one executor,
in GB. The value must be an
integer. Increase the value of this
parameter if the service logic is
complex.
1
executor-cores Number of CPUs allocated to one
executor. The value must be an
integer. Increase the value if the
2
Customer Contextual Awareness
Parameter Description Example
service logic is complex.
kafka.consumer.
group
Specifies the consumer group of
the Kafka.
test_group
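The following Python sketch outlines the per-event filtering that RoamingAwareness performs conceptually. It is illustrative only; the field positions are 0-based and taken from the parameter examples above, and the Kafka input and output handling is omitted.

# Illustrative sketch only: per-event filter for the configured roaming city and type.
IMSI_POS, ROAM_CITY_POS, ROAM_TYPE_POS = 1, 63, 64
ROAM_TO_CITY = "Philadelphia"   # roam.to.city
ROAM_TYPE = "2"                 # roam.type
SEPARATOR = "|"                 # event.field.separator

def matching_imsi(event_line):
    """Return the IMSI if the event matches the configured roaming city and type, else None."""
    fields = event_line.split(SEPARATOR)
    if fields[ROAM_CITY_POS] == ROAM_TO_CITY and fields[ROAM_TYPE_POS] == ROAM_TYPE:
        return fields[IMSI_POS]   # written to the output topic by the real rule
    return None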
Verification
The following example finds users who roam to Philadelphia with roaming type 2 and records the IMSIs of these users.
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to the examples in Table 1-45.
Step 3 Send data in the source topic in the Kafka.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
4. Send data in the source topic in the Kafka.
% kafka-console-producer.sh --broker-list
10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005 --topic RoamingAwareness_input
Input data:
1|460023700411005|3540960531813501|8615879005652|10.211.76.190|6432|255|6299|22
1.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|14604
41839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|666|699
5|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/wechatgame
/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/0|mmocgam
e.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i
Build/IMM76I)|image/png|||6323|0|0|0|||2322300259|||China
Mobile|China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile phone
Table 1-46 describes fields in the preceding information.
Table 1-46 PS_HTTP_Event event
No. | Attribute | Type | Length | Description
1 | Interface | unsigned byte | 1 | Interface type. The options are as follows: 1: Gn; 2: reserve; 3: IuPS; 4: Gb.
2 | IMSI | string | 15 | User IMSI (in TBCD encoding format).
3 | IMEI | string | 16 | User IMEI (in TBCD encoding format).
4 | MSISDN | string | 24 | User number.
5 | USER_IP | unsignedInt | 16 | IP address of a user. If the user uses an IPv6 address, set this field to the IPv6 address (128 bits). If the user uses an IPv4 address (32 bits), convert it to an IPv6 address in which the first 10 bytes are all zeros and the middle two bytes are all f (hexadecimal). Both IPv4 and IPv6 addresses must be set in binary mode.
6 | LAC | int | 4 | LAC. The LAC is used for location management in the CS domain (voice service).
7 | RAC | int | 4 | RAC. The RAC is used for location management in the PS domain (data service).
8 | CID | int | 4 | CI (SAC, ECI).
9 | SGSN_C_IP | unsignedInt | 4 | IP address of the SGSN signaling plane.
10 | SGSN_U_IP | unsignedInt | 4 | IP address of the SGSN user plane.
11 | GGSN_C_IP | unsignedInt | 4 | IP address of the GGSN signaling plane.
12 | GGSN_U_IP | unsignedInt | 4 | IP address of the GGSN user plane.
13 | RAT | string | 32 | 0-None; 1-UTRAN; 2-GERAN; 3-WLAN; 4-GAN; 5-HSPA Evolution; 6-EUTRAN
14 | APN | string | 32 | -
15 | HTTP Service xDR Type Code | unsigned byte | 1 | All Fs.
16 | procedure ID | unsigned byte | 8 | All Fs.
17 | Start Time (ms) | dateTime | 8 | 1970/1/1 0:00
18 | End Time (ms) | dateTime | 8 | 1970/1/1 0:00
19 | App Category | unsigned byte | 2 | All Fs.
20 | App Subcategory | unsigned byte | 2 | All Fs.
21 | L4 Protocol | unsigned byte | 1 | All Fs.
22 | User Port | unsigned byte | 2 | 0
23 | Server IP | unsigned byte | 16 | 0
24 | Server Port | unsigned byte | 2 | 0
25 | Country Code | int | 4 | -1
26 | Network ID | int | 4 | -1
27 | Upstream Traffic | unsigned byte | 4 | 0
28 | Downstream Traffic | unsigned byte | 4 | 0
29 | Upstream IP Packet Count | unsigned byte | 4 | 0
30 | Downstream IP Packet Count | unsigned byte | 4 | 0
31 | Disordered Upstream TCP Packet Count | unsigned byte | 4 | 0
32 | Disordered Downstream TCP Packet Count | unsigned byte | 4 | 0
33 | Retransmitted Upstream TCP Packet Count | unsigned byte | 4 | 0
34 | Retransmitted Downstream TCP Packet Count | unsigned byte | 4 | 0
35 | UL_IP_FRAG_PACKETS | unsigned byte | 4 | 0
36 | DL_IP_FRAG_PACKETS | unsigned byte | 4 | 0
37 | Transaction Type | unsigned byte | 2 | All Fs.
38 | Transaction Response Code | unsigned byte | 2 | All Fs.
39 | HTTP Version | unsigned byte | 1 | All Fs.
40 | First HTTP Response Delay (ms) | unsigned byte | 4 | 0
41 | Last HTTP Content Packet Delay (ms) | unsigned byte | 4 | 0
42 | Last ACK Confirmation Packet Delay (ms) | unsigned byte | 4 | 0
43 | HOST | String | 128 | All Fs.
44 | URL | String | 256 | Request URL.
45 | X-Online-Host | String | 128 | All Fs.
46 | User-Agent | char | 64 | All Fs.
47 | HTTP_content_type | char | 64 | All Fs.
48 | refer_URI | char | 128 | All Fs.
49 | Cookie | char | - | All Fs.
50 | Content-Length | unsigned byte | 4 | 0
51 | Target Behavior | unsigned byte | 1 | All Fs.
52 | WTP Interruption Type | unsigned byte | 1 | -
53 | WTP Interruption Reason | unsigned byte | 1 | -
54 | title | String | 256 | Title field in an HTTP packet.
55 | keyword | String | 256 | Keyword field in an HTTP packet.
56 | ChargeID | unsignedInt | 2 | Charging information.
57 | Cell Type | String | 16 | Cell type (defined in thirteen scenarios on the network optimization platform).
58 | Coverage Area | String | 16 | Countryside, county, and city.
59 | Carrier | String | 16 | Carrier to which the user belongs.
60 | Country | String | 16 | Home country of a user.
61 | Home Province | String | 16 | Province to which a user belongs.
62 | Home City | String | 16 | City to which a user belongs.
63 | Roaming Province | String | 16 | Province to which a user roams.
64 | Roaming City | String | 16 | City to which a user roams.
65 | Roaming Type | String | 16 | User roaming type. The options are as follows: 1-International roaming; 2-Inter-province roaming; 3-Intra-province roaming; 4-Local
66 | SGSN Name | String | 16 | SGSN
67 | GGSN Name | String | 16 | GGSN
68 | BSC/RNC Name | String | 16 | BSC/RNC
69 | Terminal Manufacturer | String | 16 | Manufacturer of a user's terminal.
70 | Terminal Model | String | 16 | Model of a user terminal.
71 | Terminal Type | String | 16 | Type of a terminal, for example, 2G mobile phone, 3G mobile phone, 2G WAN card, 3G WAN card, or 3G notebook.
This Spark task requires only the second, sixty-fourth, and sixty-fifth fields (IMSI, Roaming City, and Roaming Type respectively).
5. Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic
RoamingAwareness_output --from-beginning
The following information is displayed:
460023700411005
This output indicates that Kafka has received the data.
----End
LocationRemainAwareness
Finds users who stay in the target area for longer than the specified period and writes the IMSIs and location codes of these users to Kafka.
Input Parameter Description
Table 1-47 Input parameter description
Parameter | Description | Example
Application Properties
outputTopic | Topic to which results are generated. If multiple topics exist, use commas (,) to separate them. | LocationRemainAwareness_output
inputTopic | Input topic. If multiple topics exist, use commas (,) to separate them. | LocationRemainAwareness_input
imsi.position.in.event | Location of the IMSI field in the input event. The value starts from 0. For example, if this parameter is set to 1, the IMSI field is the second field in the input event. | 1
rac.position.in.event | Location of the RAC field in the input event. | 6
lac.position.in.event | Location of the LAC field in the input event. | 5
monitor.remain.time | Stay duration, in minutes. | 30
monitor.location | Location code (province, city). | (200,0755)
spark.checkpoint.path | Checkpoint path of Spark. | checkpoint_test
spark.batch.interval | Batch processing interval, in milliseconds. | 1
event.field.separator | Field delimiter in source data. | | (vertical bar)
kafka.consumer.group | Consumer group of Kafka. | test_group
Senior Optional Properties
driver-memory | Memory allocated to the driver, in GB. The value must be an integer. Increase the value if the service logic is complex. | 1
driver-cores | Number of CPU cores allocated to the driver. The value must be an integer. | 1
executor-memory | Memory allocated to one executor, in GB. The value must be an integer. Increase the value if the service logic is complex. | 1
executor-cores | Number of CPUs allocated to one executor. The value must be an integer. Increase the value if the service logic is complex. | 2
num-executors | Number of executors allocated to the current Spark task. The value must be an integer. This value determines Spark calculation concurrency. Change it based on the calculation requirements. | 1
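The following Python sketch shows one way the stay-duration check could work conceptually. It is illustrative only; the field positions follow the parameter examples above, monitor.location is interpreted here as the (LAC, RAC) pair seen in the verification data, and the real rule's Spark checkpointing and windowing are omitted.

# Illustrative sketch only: tracks how long each IMSI has been observed in the
# monitored location and emits "IMSI,LAC,RAC" once the stay duration is reached.
import time

IMSI_POS, LAC_POS, RAC_POS = 1, 5, 6
MONITOR_LOCATION = ("200", "0755")        # monitor.location, read here as (LAC, RAC)
REMAIN_SECONDS = 30 * 60                  # monitor.remain.time is configured in minutes
SEPARATOR = "|"                           # event.field.separator

first_seen = {}  # IMSI -> time when the IMSI was first observed in the monitored location

def on_event(event_line, now=None):
    """Return 'IMSI,LAC,RAC' once an IMSI has stayed long enough, else None."""
    now = time.time() if now is None else now
    fields = event_line.split(SEPARATOR)
    imsi, lac, rac = fields[IMSI_POS], fields[LAC_POS], fields[RAC_POS]
    if (lac, rac) != MONITOR_LOCATION:
        first_seen.pop(imsi, None)        # the user left the area: restart the timer
        return None
    start = first_seen.setdefault(imsi, now)
    if now - start >= REMAIN_SECONDS:
        return f"{imsi},{lac},{rac}"
    return None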
Verification
The following example finds users who stay in the (200,0755) area for more than 30 minutes and writes the result to Kafka.
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to the examples in Table 1-47.
Step 3 Send data in the source topic in the Kafka.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
4. Send data in the source topic in the Kafka.
% kafka-console-producer.sh --broker-list
10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005 --topic LocationRemainAwareness_input
Send the following input data for 30 consecutive minutes:
1|460023700411007|3540960531813501|8615879005652|10.211.76.190|200|0755|6299|22
1.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|14604
41839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|1500130
1|25001301|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/w
echatgame/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/
0|mmocgame.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i
Build/IMM76I)|image/png|||6323|0|0|0|||2322300259|||China
Mobile|China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile phone
This Spark task requires only the second, sixth, and seventh fields (IMSI, LAC, and RAC respectively).
5. Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic
LocationRemainAwareness_output --from-beginning
The following information is displayed:
460023700411007,200,0755
This output indicates that Kafka has received the data.
----End
AppTrafficAwareness
Finds users whose traffic usage for a specified app reaches the specified threshold within a day or month, and writes the IMSI, traffic usage, and the date on which the threshold was reached to Kafka.
Input Parameter Description
Table 1-48 Input parameter description
Parameter | Description | Example
Application Properties
outputTopic | Topic to which the calculation result is recorded, which must be the same as that set in Key. | AppTrafficAwareness_output
inputTopic | Input topic. If multiple topics exist, use commas (,) to separate them. | AppTrafficAwareness_input
imsi.position.in.event | Location of the IMSI field in the input event. The value starts from 0. For example, if this parameter is set to 1, the IMSI field is the second field in the input event. | 1
app.id.position.in.event | Location of the APP_ID field in the input event. | 19
date.position.in.event | Location of the DATE field in the input event. | 16
down.flow.position.in.event | Location of the DOWN_FLOW field in the input event. | 27
up.flow.position.in.event | Location of the UP_FLOW field in the input event. | 26
appID | ID of the app to be monitored. | 110
checkpointPath | Checkpoint path of Spark. | checkpoint_test
app.id.to.monitor | ID of the app to be monitored. | 110
statistic.period | Statistics period, day or month. | day
flow.threshold | Traffic usage threshold, in MB. | 30
spark.checkpoint.path | Checkpoint path of Spark. | checkpoint_test
spark.batch.interval | Batch processing interval, in milliseconds. | 1
event.field.separator | Field delimiter in source data. | | (vertical bar)
kafka.consumer.group | Consumer group of Kafka. | test_group
Senior Optional Properties
driver-memory | Memory allocated to the driver, in GB. The value must be an integer. Increase the value if the service logic is complex. | 1
driver-cores | Number of CPU cores allocated to the driver. The value must be an integer. | 1
executor-memory | Memory allocated to one executor, in GB. The value must be an integer. Increase the value if the service logic is complex. | 1
executor-cores | Number of CPUs allocated to one executor. The value must be an integer. Increase the value if the service logic is complex. | 2
num-executors | Number of executors allocated to the current Spark task. The value must be an integer. This value determines Spark calculation concurrency. Change it based on the calculation requirements. | 1
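The following Python sketch outlines the daily accumulation logic conceptually. It is illustrative only; the field positions, the monitored app ID, and the 30 MB threshold follow the parameter examples above, the statistics period is assumed to be day, and Spark checkpointing is omitted.

# Illustrative sketch only: accumulates per-IMSI daily traffic for the monitored app
# and emits "IMSI,bytes,date" once the threshold is reached.
from datetime import datetime, timezone

IMSI_POS, DATE_POS, APP_ID_POS, UP_POS, DOWN_POS = 1, 16, 19, 26, 27
APP_ID_TO_MONITOR = "110"                      # app.id.to.monitor
FLOW_THRESHOLD_BYTES = 30 * 1024 * 1024        # flow.threshold, configured in MB
SEPARATOR = "|"                                # event.field.separator

usage = {}  # (IMSI, date) -> accumulated bytes

def on_event(event_line):
    """Return 'IMSI,bytes,date' when the monitored app's daily traffic reaches the threshold."""
    f = event_line.split(SEPARATOR)
    if f[APP_ID_POS] != APP_ID_TO_MONITOR:
        return None
    # The DATE field is a millisecond timestamp (Start Time); convert it to a calendar date.
    day = datetime.fromtimestamp(int(f[DATE_POS]) / 1000.0, tz=timezone.utc).strftime("%Y-%m-%d")
    key = (f[IMSI_POS], day)
    usage[key] = usage.get(key, 0) + int(f[UP_POS]) + int(f[DOWN_POS])
    if usage[key] >= FLOW_THRESHOLD_BYTES:
        return f"{f[IMSI_POS]},{usage[key]},{day}"
    return None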
Verification
The following example finds users whose traffic usage for the app with ID 110 reaches 30 MB in a day and writes the IMSI, traffic usage, and date for these users to Kafka.
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to examples in Table 1-48.
Step 3 Send data in the source topic in the Kafka.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
4. Send data in the source topic in the Kafka.
% kafka-console-producer.sh --broker-list
10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005 --topic AppTrafficAwareness_input
Input data:
1|460023700411005|3540960531813501|8615879005652|10.211.76.190|200|0755|6299|22
1.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|14604
41839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|1500130
1|25001301|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/w
echatgame/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/
0|mmocgame.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i
Build/IMM76I)|image/png|||6323|0|0|0|||2322300259|||China
Mobile|China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile phone
This Spark task requires only the second, seventeenth, twentieth, twenty-seventh, and twenty-eighth fields (IMSI, Start Time, App Category, Upstream Traffic, and Downstream Traffic respectively).
5. Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic
AppTrafficAwareness_output --from-beginning
The following information is displayed:
460023700411005,4.0002602E7,2016-04-12
This output indicates that Kafka has received the data.
In the preceding information, "4.0002602E7" (scientific notation, where "E7" indicates 10 to the seventh power) is the traffic usage generated by the app. The value is the sum of the twenty-seventh and twenty-eighth fields (Upstream Traffic and Downstream Traffic), in bytes.
"2016-04-12" is the date on which the traffic usage reached the threshold, converted from the value of the seventeenth field (Start Time).
----End
Typical Configuration Case: Marketing Event Upon Traffic Usage Saturation
The system monitors the traffic usage of a specific app, finds users whose traffic usage
reaches the threshold in the specified time, and sends the user information to the message
middleware for storage.
Context
Figure 1-97 Service of monitoring users' traffic usage
Traffic usage accumulation result format: Key field,Timestamp,Accumulated value
APP_ID1: app category; APP_ID2: app subcategory
Within the hour after 03:00:00 on May 6, the user's traffic usage does not reach the threshold, so the system clears the accumulated result and starts accumulation again. Within the hour after 04:00:00 on May 6, the user's traffic usage reaches the threshold, and the system generates the output data.
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. However, the login URL varies depending on the
installation mode.
Then, enter the login user name, password, and verification code. For details about the default
password, see "Password Change Views" in the Password Change.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Streaming Studio > Stream Computing Design.
The Stream Computing Design page is displayed, as shown in Figure 1-98.
Figure 1-98 Stream Computing Design
Step 4 Click Add. The rule orchestration page is displayed.
Set parameters. Figure 1-99 shows the parameter settings.
Figure 1-99 Rule orchestration
The following describes configurations of each module:
Kafka Source configuration
Figure 1-100 Kafka Source configuration
Table 1-49 Kafka Source parameters
Parameter | Value | Description
Topic Name | sumInput | Topic from which source data is read for calculation.
Data Type | string | Both the string and bytearray types are supported. Set this parameter based on the type of source data.
Data Separator | , | Set this parameter based on the source data.
Field Name | MSISDN,FLUX,APP_ID1,APP_ID2,TIME | Set this parameter based on the source data. Use commas (,) to separate multiple values.
Field Type | string,double,int,string,long | Set this parameter based on the source data. Use commas (,) to separate multiple values. The value options are as follows: string, int, double, long, float, and boolean. The value must be in lower case.
For a timestamp field, set Field Type to long.
Filter interceptor configuration
Figure 1-101 Filter interceptor configuration
Table 1-50 Filter interceptor parameters
Parameter | Value | Description
Filter expression (the running result of the expression must be Boolean) | APP_ID1==2 and (APP_ID2=='aiqiyi' or APP_ID2=='kugou') | Filters data whose app category is 2 and app subcategory is aiqiyi or kugou.
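The expression is evaluated once per record against the fields declared in the Kafka Source. A minimal Python sketch of the same condition follows (illustrative only; the record is assumed to be already parsed into the declared fields).

# Illustrative sketch only: the filter condition applied to one parsed record.
def keep(record):
    """record: dict keyed by the field names declared in the Kafka Source above."""
    return record["APP_ID1"] == 2 and record["APP_ID2"] in ("aiqiyi", "kugou")

# Example: a record for app category 2 and subcategory aiqiyi passes the filter.
print(keep({"MSISDN": "13800000001", "FLUX": 100.0, "APP_ID1": 2,
            "APP_ID2": "aiqiyi", "TIME": 1494010800}))   # True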
Accumulation interceptor configuration
Figure 1-102 Accumulation interceptor configuration
Table 1-51 Accumulation interceptor parameters
Parameter | Value | Description
Key Field | MSISDN | Group and accumulate values by MSISDN.
Accumulate Field | FLUX | Accumulate the FLUX field. The accumulate field can only be numeric, such as int, long, double, or float.
Trigger Threshold | 300 | When the accumulation result reaches the trigger threshold, accumulation stops and the result is output.
Clear cycle | Fixed time: 1 hour | Period for resetting the accumulation result. When a period ends, the accumulation result is set to 0, and accumulation starts from 0 in the next period. The options are as follows: Natural month; Natural day; Fixed time (unit: hour; only positive integers are supported; the maximum value is 31 x 24).
Whether to use timestamp field | Yes | Yes: trust the time in the source data and select a timestamp field of the long type. No: do not trust the time in the source data; use the system time of the CAE as the data generation time (in this case, the timestamp in the output data is also the system time).
The accumulation result is in Key field,Timestamp,Accumulated value format.
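The following Python sketch illustrates the accumulation behavior conceptually with the settings above (Key Field MSISDN, Trigger Threshold 300, Clear cycle of one hour, timestamps taken from the source data). It is illustrative only; the actual interceptor may align its clear cycle differently, and the streaming plumbing is omitted.

# Illustrative sketch only: per-key accumulation with a trigger threshold and a
# one-hour clear cycle, using timestamps from the source data.
TRIGGER_THRESHOLD = 300.0
CLEAR_CYCLE_SECONDS = 1 * 3600

state = {}  # MSISDN -> (cycle start timestamp, accumulated FLUX)

def on_record(msisdn, flux, timestamp):
    """Return 'key,timestamp,accumulated value' when the threshold is reached, else None."""
    start, total = state.get(msisdn, (timestamp, 0.0))
    if timestamp - start >= CLEAR_CYCLE_SECONDS:
        start, total = timestamp, 0.0          # clear cycle elapsed: reset the accumulation
    total += flux
    state[msisdn] = (start, total)
    if total >= TRIGGER_THRESHOLD:
        del state[msisdn]                      # result emitted; accumulation restarts
        return f"{msisdn},{timestamp},{total}"
    return None

# With the verification data below, only the records after 04:00 accumulate to 300,
# producing "13800000001,1494017400,300.0".
for flux, ts in [(100, 1494010800), (100, 1494014400), (100, 1494016200), (100, 1494017400)]:
    result = on_record("13800000001", flux, ts)
    if result:
        print(result)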
Kafka Sink configuration
Figure 1-103 Kafka Sink configuration
Table 1-52 Kafka Sink parameters
Parameter | Value | Description
Topic Name | sumOutput | Topic that stores the calculation result.
Output Data Separator | , | Separator of the output data.
Step 5 After the orchestration is complete, click Publish.
The Realtime Application Management page is displayed after a successful publishing.
Step 6 Instantiate the Spark task rule.
Click Operation > Instantiation next to the Spark rule.
Set instantiation parameters. Customize the task name, use the default values for other
parameters, and click Confirm.
In the instantiation configuration, the application.batch.milliseconds parameter indicates the time interval for the Spark to process each batch of tasks. The default value is 3000ms (3s).
The Task Manager page is displayed. You can view the instantiated task name on this page.
Step 7 Select the task and click "Start".
----End
Verification
Step 1 Log in to the Hadoop Client.
Step 2 Initialize the environment variable.
% source bigdata_env
Step 3 Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
Step 4 Send the data in the data source topic of Kafka.
% kafka-console-producer.sh --broker-list 10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005
--topic sumInput
Enter the following information:
13800000001,100,2,aiqiyi,1494010800
13800000001,100,2,kugou,1494014400
13800000001,100,2,aiqiyi,1494016200
13800000001,100,2,kugou,1494017400
Step 5 Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic sumOutput --from-beginning
The following information is displayed:
13800000001,1494017400,300.0
The preceding information indicates that the calculation is successful.
The traffic usage accumulation result is in Key field,Timestamp,Accumulated value format.
----End
Best Practices
Typical Configuration Cases: Location Event Log Collection
Principle
Location and communication signaling is ingested from the original service data provided by carriers and is used to identify users who stay in a specified location for a specified period, based on which the system conducts precision marketing.
Flume is a common CDR ingestion platform. The Contextual Awareness Engine is used to create events, specify event ingestion rules, and release the rules to Flume. Flume then ingests original CDRs based on the rules. CDR files ingested by Flume can be provided to the Campaign Management for real-time marketing.
Figure 1-104 shows the data direction for Commissioning Location Event Log Collection.
Figure 1-104 Data direction for Commissioning Location Event Log Collection
1. The Location_In event (indicating signaling CDRs from the A interface), the
Location_Basic event (indicating collected location data), and the Location_Out event
(indicating the result of Timing calculation) need to be created for the signaling ingestion
commissioning process.
2. The Flume ingests source data (Location_In events) from service signaling, filters the
ingested data to obtain the required data (Location_Basic events), and then stores the
required data to the Kafka.
3. The event rule template User Staying at Specified Area for a Time Period is used in
subsequent commissioning. The Location_Basic event is used as the input of location
moment rule templates.
4. After moment calculation, data meeting the requirements (Location_Out event) will be sent to the Campaign system for conducting related marketing activities.
The CAE system includes the following modules:
CAE Server: the Huawei-developed CAE Server performs unified task scheduling for the stream processing components at the bottom layer. It enables operators to process streaming data without paying attention to the components at the bottom layer, simplifying operations and improving processing efficiency.
Flume: connected Hadoop component, used for data ingestion.
Kafka: connected Hadoop component, used for data distribution.
Spark: connected Hadoop component, used for data computing.
Creating Location Events
Procedure
Step 1 Log in to the Universe as the user user.
1. Enter http://Floating IP address of the SLB:9010/console/login.action in the address box
of the browser and press Enter.
2. Enter the user name and password of the user user and click Login.
Step 2 Choose Data Governance.
Step 3 Choose Realtime Awareness > Event Management Center > Event Design.
Step 4 Click on the left and create an event type.
Figure 1-105 Creating an event type
The extension attribute isSync indicates whether an event is a synchronous event. This
attribute is used by the Campaign and the default value is 0.
Step 5 Create the basic location event Location_In in this directory. The event contains the userID,
MSISDN, IMSI, LacTac, and CI attributes.
Figure 1-106 Creating the Location_In event
This event is used for ingesting user location data. Leave Associated Query empty on the
Event Attr page.
Step 6 Create the basic location event Location_Basic in this directory. The event contains the
MSISDN, LacTac, and CI attributes.
Figure 1-107 Creating the Location_Basic event
This event is used for ingesting user location data. Leave Associated Query empty on the
Event Attr page.
Step 7 Create the output template event Location_Out. The event contains the MSISDN, LacTac,
CI, and Addr attributes.
Figure 1-108 Creating the Location_Out event
This event is used to define the location staying moment template. Leave Associated Query empty on the Event Attr page.
Step 8 Select the three new events and click Online.
----End
Creating the Event Ingestion Process
Procedure
Step 1 Log in to the Universe as the user user.
1. Enter http://Floating IP address of the SLB:9010/console/login.action in the address
box of the browser and press Enter.
2. Enter the user name and password of the user user and click Login.
Step 2 Choose Data Governance > Realtime Awareness > Streaming Studio > Streaming
Processing Design.
Step 3 Create the ingestion process collect_location under the FLUME node on the left.
1. Select the basic operation item Directly Create.
2. Configure basic information.
Set the process name to collect_location.
3. Select a Flume node.
Select the IP address of the Flume node that contains the prepared data.
4. Click to show the tool bar and edit the process.
5. Drag the Spooling Directory Source diagram element from the tool bar to the process
editing area and double-click the diagram element to edit it, as shown in Figure 1-109.
Figure 1-109 Configuring the Spooling Directory Source diagram element
− Source Event: Set it to the basic event Location_In.
− Data Source Directory: Set it to /opt/huawei/universe/data/location.
The directory levels in each Flume vary. The /opt/huawei/universe/data/location directory is used as an example. Modify it based on the site requirements. The omm user must have the read, write, and execute permissions on the data source directory.
6. Drag the Field Projecting diagram element and connect it to the Spooling Directory
Source diagram element. Double-click the Field Projecting diagram element and
configure it.
Figure 1-110 shows the configuration.
Figure 1-110 Configuring the Field Projecting diagram element
The Field Projecting diagram element is used to filter fields in the source data and find
out fields required by services.
Select the MSISDN, LacTac, and CI fields in sequence. Other fields will be filtered out.
7. Drag the Memory Channel diagram element, double-click it, and set Node Name.
8. Drag the Kafka Sink diagram element, double-click it, and set Node Name.
Figure 1-111 shows the configuration.
Figure 1-111 Configuring the Kafka Sink diagram element
Kafka Topic: Set it to the sdi_Location_Basic topic corresponding to the basic event.
9. Connect the diagram elements using connection lines, as shown in Figure 1-112.
Figure 1-112 Ingestion process
Step 4 Click Save to save the configuration.
Step 5 Click Release to release the process.
----End
Verification
Importing Test Data
Log in to the Flume node as the omm user and create the /opt/huawei/universe/data/location
directory.
> cd /opt/huawei/universe/data
> mkdir location
The omm user must have the read, write, and execute permissions on this directory.
Create a .txt file in the /opt/huawei/universe/data/location directory.
> vi location.txt
1,13810031351,460002198001011234,1000,1001
The directory levels in each FusionInsight vary. The /opt/huawei/universe/data directory is used as an example. Modify it based on the site requirements. In principle, the file storage directory must be the same as the data source directory configured in the ingestion flow, and the omm user must have the read, write, and execute permissions on this directory.
The delimiter in the .txt file must be the same as the input event delimiter.
The field sequence in the .txt file must be the same as the sequence of input event attributes.
Customer numbers must be from the prepared customer segment.
For the LacTac and CI fields, use data for which the mapping already exists.
Checking Data Ingestion
Log in to the Flume node as the omm user, go to the service data storage directory
/opt/huawei/universe/data/location, and check whether the file is suffixed
by .COMPLETED.
> ll
drwxr-xr-x 3 omm ficommon 4096 Nov 16 23:52 ./
drwxr-xr-x 4 omm ficommon 4096 Nov 17 00:53 ../
drwx------ 2 omm wheel 4096 Nov 16 19:23 .flumespool/
-rw------- 1 omm wheel 22 Nov 16 11:33 location.txt.COMPLETED
If yes, data ingestion is successful.
Checking Kafka Consumption
Use the Kafka client tool to check whether ingested data is received.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
4. Check whether the Kafka has received data.
% kafka-console-consumer.sh --zookeeper 10.0.0.1:24002/kafka --topic
sdi_Location_Basic --from-beginning
In the command, the IP address is that of the ZooKeeper.
The following information is displayed:
13810031351,1000,1001
This output indicates that Kafka has received the data.
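For reference, a minimal Python sketch of what the Field Projecting element did to the test record follows (illustrative only; the attribute order follows the Location_In and Location_Basic events defined earlier).

# Illustrative sketch only: project the Location_In test record down to the
# MSISDN, LacTac, and CI attributes kept by the Field Projecting element.
LOCATION_IN_FIELDS = ["userID", "MSISDN", "IMSI", "LacTac", "CI"]
PROJECTED_FIELDS = ["MSISDN", "LacTac", "CI"]

record = dict(zip(LOCATION_IN_FIELDS, "1,13810031351,460002198001011234,1000,1001".split(",")))
print(",".join(record[f] for f in PROJECTED_FIELDS))   # -> 13810031351,1000,1001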
Typical Configuration Example: Projection and Grouping Calculation
Context
Figure 1-113 Display of projection and grouping calculation functions
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. However, the login URL varies depending on the
installation mode.
Then, enter the login user name, password, and verification code. For details about the default
password, see "Password Change Views" in the Password Change.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 In the navigation bar on the upper part, choose Realtime Awareness > Streaming Studio >
Stream Computing Design.
The Stream Computing Design page is displayed, as shown in Figure 1-114.
Figure 1-114 Stream Computing Design page
Step 4 Click Add to access the rule orchestration page.
Complete the configuration as shown in Figure 1-115.
Figure 1-115 Rule orchestration
The configurations of each module are as follows:
Kafka Source configuration
Figure 1-116 Kafka Source configuration
Table 1-53 Kafka Source parameter description
Parameter | Value | Description
Topic Names | test_input | Source data for calculation is read from the test_input topic.
Data Encoding | string | The string and bytearray types are supported. Set this parameter based on the actual type of source data.
Data Separator | , | Set this parameter based on the source data.
Field Names | MSISDN,IMSI,Terminal_ID,UP_FLUX,DOWN_FLUX,SUM_FLUX,APP_ID | Set this parameter based on the source data. Use commas (,) to separate multiple values.
Field Types | string,string,string,int,int,int,string | Set this parameter based on the source data. Use commas (,) to separate multiple values.
Projection Interceptor configuration
Figure 1-117 Projection Interceptor configuration
Table 1-54 Projection Interceptor parameter description
Parameter | Value | Description
Output Event Attribute | MSISDN,IMSI,SUM_FLUX,APP_ID | Select the fields to be output and adjust the output sequence.
Grouping Interceptor configuration
Figure 1-118 Grouping Interceptor configuration
Table 1-55 Grouping Interceptor parameter description
Parameter | Value | Description
Fields To Group On | MSISDN | Grouping is performed based on the value of MSISDN.
Input Field of Sum | SUM_FLUX | The sum is calculated over the value of SUM_FLUX.
Kafka Sink configuration
Figure 1-119 Kafka Sink configuration
Table 1-56 Kafka Sink parameter description
Parameter | Value | Description
Topic Names | test_output | Topic for storing the calculation result data.
Output Data Separator | , | Specified output data delimiter.
Step 5 After the orchestration is complete, click Publish.
After the release, the Realtime Application Management page is displayed.
Step 6 Instantiate the generated Spark task rule.
Click Instantiation in the Operation column next to the Spark rule name.
Set instantiated parameters. Customize the task name, use the default values for other
parameters, and click Confirm.
The Task Manager page is displayed, on which the instantiated task name can be viewed.
Step 7 Select the created task and click "Start".
----End
Verifying the Result
Step 1 Log in to the Hadoop Client.
Step 2 Initialize the environment variable.
% source bigdata_env
Step 3 Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
Step 4 Send the data in the data source topic of Kafka.
% kafka-console-producer.sh --broker-list 10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005
--topic test_input
Input data:
13800000001,13800000001,211,100,200,300,001
13800000002,13800000002,111,150,200,350,001
13800000002,13800000002,111,350,200,550,002
13800000001,13800000001,211,120,220,340,003
13800000001,13800000001,211,120,120,240,002
13800000003,13800000001,211,120,120,240,002
Step 5 Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic test_output
--from-beginning
The system displays the following information.
880,13800000001
900,13800000002
240,13800000003
This output indicates that the calculation is successful.
The first column is the sum of the SUM_FLUX fields in each group, and the second column is the MSISDN value on which the grouping was performed.
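A minimal Python sketch of the same projection and grouping calculation applied to the input above follows (illustrative only; the Spark rule performs this per batch).

# Illustrative sketch only: projection to (MSISDN, IMSI, SUM_FLUX, APP_ID), then
# grouping on MSISDN and summing SUM_FLUX, as configured in the rule above.
from collections import defaultdict

FIELDS = ["MSISDN", "IMSI", "Terminal_ID", "UP_FLUX", "DOWN_FLUX", "SUM_FLUX", "APP_ID"]

input_lines = [
    "13800000001,13800000001,211,100,200,300,001",
    "13800000002,13800000002,111,150,200,350,001",
    "13800000002,13800000002,111,350,200,550,002",
    "13800000001,13800000001,211,120,220,340,003",
    "13800000001,13800000001,211,120,120,240,002",
    "13800000003,13800000001,211,120,120,240,002",
]

totals = defaultdict(int)
for line in input_lines:
    record = dict(zip(FIELDS, line.split(",")))                 # parse one source record
    projected = {k: record[k] for k in ("MSISDN", "IMSI", "SUM_FLUX", "APP_ID")}
    totals[projected["MSISDN"]] += int(projected["SUM_FLUX"])   # group by MSISDN and sum

for msisdn, total in totals.items():
    print(f"{total},{msisdn}")   # 880,13800000001 / 900,13800000002 / 240,13800000003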
----End