Customer Contextual Awareness
Quick Start
Designing an Event on the GUI
Context
Terminal users perform various operations on their mobile phones. Carriers have different concerns in different scenarios. Events can therefore be designed to filter the data and identify the users or user behavior that carriers focus on.
Event attributes describe the user or user behavior information that carriers focus on.
Some service events have been preconfigured in the CAE.
Preconfigured events comply with the following specifications:
"Interface Specifications (PS Domain) for the Data Integration Server in the Log System of China Mobile V1.0.0" for events whose names contain the PS keyword, for example, SOURCE_PS_XXX.
"China Mobile Unified DPI Device Technical Specifications - Interface Specifications for the LTE Signaling Collection and Parsing Server V2.0.9" for events whose names contain the LTE keyword, for example, SOURCE_LTE_XXX.
"Interface Specifications (CS Domain) for the Data Integration Server in the Log System of China Mobile V1.0.0" for other preconfigured events.
For details about the preconfigured events, see Preconfigured Events.
Procedure
Step 1 Log in to the CAE.
Enter http://IP address:Port/console. The login URL varies depending on the installation mode.
Then, enter the login user name, password, and verification code. For details about the default password, see "Password Change Views" in the Password Change document.
The IP address is the one used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Event Management Center > Event Design.
The Query page is displayed, as shown in Figure 1-1.
Figure 1-1 Query page
On this page, you can query existing events and their status (online or offline). You can search
for an event by Event code, Name, or Status.
After an event is brought online, you can:
Select the event on the Ingestion Design page.
Create topics associated with the event in Kafka to store ingested events.
Step 4 Create an event type.
Click the add icon in Category on the left and create an event type.
(Optional) You can create some extension attributes for a type of events. For details, see
Adding Extension Attributes.
Step 5 Create an event.
Click Add. The event design page is displayed, as shown in Figure 1-2.
Figure 1-2 Event design page
Step 6 Configure basic event information.
Table 1-1 describes the basic event information.
Table 1-1 Basic event parameters

Event Code: Unique ID of an event, which can contain fewer than 50 characters. The code can contain only letters, digits, and underscores (_), and cannot start with a digit. Example: 1000001

Name: Unique name of an event, which can contain fewer than 200 characters. The name cannot contain the special characters !@#$%^&*()+=[]{}<>,./:;'\|~. Example: NewsPortal

Description: Event description, which is optional and can contain a maximum of 500 characters. The description cannot contain the special characters !@#$%^&*()+=[]{}<>,./:;'\|~. Example: News portal URL marketing

Type: Event type, which is used for service management. You can click the add icon in Category on the left and add a category. Example: InternetEvent

Event Separator: Event data delimiter. Example: ,

Event Line Separator: Line break in event data. Example: \n

Event Character Set: Event character set type. Example: UTF-8

Event Partition Number: Number of partitions of the Kafka topic associated with an event. The system automatically calculates the most suitable number of partitions for the current environment and uses it as the default value. Calculation rule: (Number of Kafka Broker nodes x Value of log.dirs in Kafka)/Number of event copies. Example: 2

Event Replication Factor: Number of copies of the Kafka topic associated with an event. The system automatically calculates the most suitable number of copies for the current environment and uses it as the default value. If there is only one Kafka Broker node, the number of copies is 1. If there are multiple nodes, the number of copies is 2. Example: 2

Related topic: Uses the default topic. After the event is brought online, the system automatically creates a topic whose name is in the sdi_${Event code} format in Kafka. Example: Use the default topic.
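The default partition and replication values follow the calculation rules in Table 1-1. The following minimal Java sketch illustrates the arithmetic with assumed values (3 Kafka Broker nodes, 2 directories in log.dirs, more than one Broker node); the CAE computes the actual defaults for the real environment.

public class DefaultTopicSettings {
    public static void main(String[] args) {
        int brokerNodes = 3;       // assumed number of Kafka Broker nodes
        int logDirs = 2;           // assumed number of directories in log.dirs
        int replicationFactor = brokerNodes > 1 ? 2 : 1;            // Event Replication Factor rule
        // Event Partition Number rule: (brokers * log.dirs) / number of event copies
        int partitions = (brokerNodes * logDirs) / replicationFactor;
        System.out.println("Default replication factor: " + replicationFactor);  // 2
        System.out.println("Default partition number: " + partitions);           // 3
    }
}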
Step 7 Add an event attribute.
Click Add on the "Event Attr" page.
The Event Attr dialog box is displayed.
Step 8 Configure basic attribute information.
Table 1-2 describes the basic attribute information.
Table 1-2 Basic attribute parameters

Attr Name: Attribute name, which can contain fewer than 50 characters. The name can contain only letters, digits, and underscores (_), and cannot start with a digit. Example: URL

Attr Type: Attribute type. Select a value from the drop-down list box. Example: Character string

Partition Attribute: Indicates whether to use the attribute as the partition key. An event can have only one partition key. Example: false

Remark: Remarks. Example: User access URL.

Association Query: Optional. It is used during the configuration of simple moment rules. For example, if this parameter is set to Website query, all values related to website query can be queried in Right Value of a moment rule, such as Baidu, Sina, Taobao, and JD. If Association Query for the URL attribute of event_01 is set to Website query and event_01 is used to create moments, the preceding four values can be associated in Right Value during the URL filtering rule configuration. For details, see 6.3.1.1.1 Setting a Moment Rule Design. Example: -

Attribute Format: Optional. This parameter is displayed when Attr Type is set to Date. Set this parameter to the time format, for example, yyyyMMdd.
yyyy: year
MM: month
dd: day
HH: hour in 24-hour format
hh: hour in 12-hour format
mm: minute
ss: second
timestamp: timestamp
For example, if the format is set to yyyy-MM-dd HH:mm:ss, the time is displayed as 2017-05-25 13:01:01. Example: yyyyMMdd
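The pattern letters above follow the Java SimpleDateFormat convention. The following minimal sketch (an illustration only, not CAE code) shows how the example pattern and value map to each other.

import java.text.SimpleDateFormat;
import java.util.Date;

public class AttrFormatDemo {
    public static void main(String[] args) throws Exception {
        // Parse a value written in the yyyy-MM-dd HH:mm:ss format from the example
        Date time = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse("2017-05-25 13:01:01");
        // Re-format it with the yyyyMMdd pattern used in the Example column
        System.out.println(new SimpleDateFormat("yyyyMMdd").format(time));  // 20170525
    }
}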
After configuring the information, click Confirm to go to the event design page.
You can repeat this step to add other attributes.
You can click the corresponding icons next to an added attribute to view, modify, or delete the attribute.
Step 9 Click Confirm.
The Query page is displayed.
Step 10 Bring the event online.
Select the event to be brought online and click Online, as shown in Figure 1-3.
You can select an event and click the corresponding icons to view, edit, or delete the event.
Figure 1-3 Bringing the event online
After an event is brought online, you can:
Select the event on the Ingestion Design page.
Create topics associated with the event in Kafka to store ingested events.
----End
Ingestion Process Design
This topic describes basic operations in the ingestion process, including creating a project,
configuring basic information, specifying an execution host, and editing the process.
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. The login URL varies depending on the installation mode.
Then, enter the login user name, password, and verification code. For details about the default password, see "Password Change Views" in the Password Change document.
The IP address is the one used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Streaming Studio > Streaming Processing Design from the
navigation tree at the upper part.
The ingestion process query page is displayed.
On this page, you can query existing ingestion processes and their status. You can search for a
process by name or status.
The ingestion process status is described as follows:
Published: indicates that the ingestion process has been released into a properties.properties file and deployed in the corresponding Flume node.
Draft: indicates that the ingestion process has been saved as a draft and may be incomplete.
Step 4 Create a project.
Click the expand icon in the navigation tree on the left to expand the project list.
Click the add icon at the upper part and create a project.
In the Create dialog box, enter a project name and a description, and click Save.
Click the created project in the project list. The ingestion process query page of the project is displayed.
To edit the newly created project, select the project and click the edit icon at the upper part.
To delete the newly created project, select the project and click the delete icon at the upper part.
Step 5 Click Add and add an ingestion process.
1. Select a basic operation item.
− Directly Create: directly creates an ingestion process.
− Created from Template: creates an ingestion process using a template.
This topic describes how to directly create an ingestion process. For details about how to create an ingestion process using a template, see Creating an Ingestion Process Using a Template.
2. Configure basic information.
Enter an ingestion process name and a description.
3. Select hosts.
Select one or more Flume nodes for executing the process.
4. Edit the process.
Drag corresponding diagram elements from the toolbar to the workspace. The diagram elements include Source, Channel, Sink, (optional) Interceptor, (optional) Channel Selector, and (optional) Sink Group.
You can double-click a diagram element to edit it and connect diagram elements using
lines.
Figure 1-4 shows a complete ingestion process.
Figure 1-4 Example of a complete ingestion process
For details about diagram elements in the toolbar, such as the Source, Sink, and Channel, see the corresponding topics.
You can click Save to save the process as a draft, or click Publish to release the process.
The ingestion process status is described as follows:
Published: indicates that the ingestion process has been released into a properties.properties file and deployed in the corresponding Flume node.
Draft: indicates that the ingestion process has been saved as a draft and may be incomplete.
----End
Reference
Multiple processes can be designed in an ingestion design, as shown in Figure 1-5 and Figure
1-6.
Figure 1-5 Process 1
Figure 1-6 Process 2
Service Application
Design Events
Before creating ingestion processes, you need to define the events of concern. Ingestion, filtering, and calculation operations can then be performed on these events.
In Quick Start, the complete process of creating an event is provided. An event can be created
in the following ways.
Importing Events in Batches Through the Event Editing Tool
Context
You can use the event editing tool to import events in batches, simplifying the event design
procedure.
If the event editing tool is used, the event design process consists of event design and event
attribute design.
Procedure
Step 1 Log in to the CAE.
Enter http://IP address:Port/console. The login URL varies depending on the installation mode.
Then, enter the login user name, password, and verification code. For details about the default password, see "Password Change Views" in the Password Change document.
The IP address is the one used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Event Management Center > Event Design.
The Query page is displayed, as shown in Figure 1-7.
Figure 1-7 Event query page
On this page, you can query existing events and their status (online or offline). You can search
for an event by Event code, Name, or Status.
After an event is brought online, you can:
Select the event on the Ingestion Design page.
Create topics associated with the event in Kafka to store ingested events.
Step 4 Download the event editing tool.
On the Query page, click Download preset event tool to download the tool.
Step 5 Edit events.
1. Open the EventPreset.xlsm tool downloaded in Step 4.
2. Design events.
Edit events in the event editing tool. For details, see 1.2.2.3 Editing Events.
Step 6 Export events.
Generate the event data file.
On the Event sheet of EventPreset.xlsm, click Generate Data File to export edited
events from the tool.
Generate the event attribute data file.
On the EventAttr sheet of EventPreset.xlsm, click Generate Data File to export edited
event attributes from the tool.
The generated data files are stored in the same directory as EventPreset.xlsm by default.
Step 7 Import events.
Click the browse icon next to Event import and select the .dat files.
Click Import to import the events.
Step 8 Bring the event online.
Select the event to be brought online and click Online.
You can select an event, click View to view the event information, click Edit to edit the event information, or click Delete to delete the event.
Figure 1-8 Bringing the event online
After an event is brought online, you can:
Select the event on the Ingestion Design page.
Create topics associated with the event in Kafka to store ingested events.
----End
Editing Events
Users can design events based on parameters described in this topic. Events designed in this
way can be imported in batches.
Editing Events
On the Event tab page of the preconfigured event editing tool, set parameters based on the
editing template. Table 1-3 describes the parameters.
Table 1-3 Event parameters

Mandatory parameters:

EventName: Event name, which must be unique. The name cannot contain the special characters !@#$%^&*()+=[]{}<>,./:;'\|~. Example: Internet access event

DataEncode: Event encoding method. Currently, only the CSV encoding method is supported. Example: CSV

DataCharset: Event character set type. Example: UTF-8

DataSeparator: Event data delimiter. Example: ,

DataLineFeed: Event data line break. Example: \n

Optional parameters:

EventCode: Unique ID of an event, which can contain fewer than 50 characters. Example: topic_1_event

Topic: Associated topic. After an event is brought online, a topic with the name specified by this parameter is automatically created in Kafka to store messages. Either EventCode or Topic needs to be set. After one of the two parameters is set, the system automatically generates the value of the other based on internal rules. Example: topic_1

EventTypeName: Event type, which is used for service management. Example: UVA

TopicReplicationFactor: Number of copies of the Kafka topic associated with an event. The system automatically calculates the most suitable number of copies for the current environment and uses it as the default value. If there is only one Kafka Broker node, the number of copies is 1. If there are multiple nodes, the number of copies is 2. Example: 2

TopicPartitionNumber: Number of partitions of the Kafka topic associated with an event. The system automatically calculates the most suitable number of partitions for the current environment and uses it as the default value. Calculation rule: (Number of Kafka Broker nodes x Value of log.dirs in Kafka)/Number of event copies. Example: 8

NOTE
If EventCode is left empty, the system automatically generates its value based on the topic. The generation rule is ${Topic}_event.
If Topic is left empty, the system automatically generates its value based on the value of EventCode. The generation rule is sdi_${EventCode}.
The value of EventName can contain Chinese characters, cannot contain special characters, and can contain fewer than 200 characters.
For events of the Universe Video Analytics, the recommended event type is UVA.
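A minimal sketch of the two generation rules in the note, using the example values from Table 1-3 (an illustration only; the CAE applies the rules internally):

public class NamingRules {
    public static void main(String[] args) {
        String topic = "topic_1";
        // EventCode left empty: generated as ${Topic}_event
        String eventCode = topic + "_event";                 // topic_1_event
        // Topic left empty: generated as sdi_${EventCode}
        String generatedTopic = "sdi_" + eventCode;          // sdi_topic_1_event
        System.out.println(eventCode + " / " + generatedTopic);
    }
}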
Editing Event Attributes
On the EventAttr tab page of the preconfigured event editing tool, set parameters based on
the editing template. Table 1-4 describes the parameters.
Table 1-4 Event attribute parameters

EventCode: Event code. The event code must exist on the Event tab page. Example: topic_1_event

AttrName: Attribute name, which can contain fewer than 50 characters. Ensure that each attribute name of an event is unique. Example: StartTime

Index: Attribute index, which indicates the location of an attribute. The index of each attribute of an event must be unique. Example: 1

AttrType: Attribute type. Example: String

IsPartitionKey: Indicates whether to use the attribute as the partition key. An event can have only one partition key. The value 1 indicates yes and the value 0 indicates no. Example: 0

Remark: (Optional) Attribute remarks. Example: Start Time

AttrDynQueryId: (Optional) It is used during the configuration of simple moment rules in the Digital Marketing system. Set it to the association ID, which can be queried from the t_cae_dyn_qry_config table in the Campaign database. For example, if this parameter is set to Website query, all values related to website query can be queried in Right Value of a moment rule, such as Baidu, Sina, Taobao, and JD. If Association Query for the URL attribute of event_01 is set to Website query and event_01 is used to create moments, the preceding four values can be associated in Right Value during the URL filtering rule configuration. For details, see 6.3.1.1.1 Setting a Moment Rule Design. Example: -

IsExtendAttr: Indicates whether the attribute is an additional attribute. The value 1 indicates yes and the value 0 indicates no. Example: 0

ExtendAttrValue: (Optional) Default value of the additional attribute. Example: -

AttrFormat: (Optional) If the attribute type is Date, set this parameter to the time format, for example, yyyyMMdd.
yyyy: year
MM: month
dd: day
HH: hour in 24-hour format
hh: hour in 12-hour format
mm: minute
ss: second
timestamp: timestamp
For example, if the format is set to yyyy-MM-dd HH:mm:ss, the time is displayed as 2017-05-25 13:01:01. Example: yyyyMMdd
Adding Extension Attributes
Context
A source event contains a limited set of attributes, while marketing may require other attributes. In this case, add extension attributes during event definition.
An extension attribute added to an event type applies to all events of this type.
Procedure
Step 1 Log in to the CAE.
Enter http://IP address:Port/console. The login URL varies depending on the installation mode.
Then, enter the login user name, password, and verification code. For details about the default password, see "Password Change Views" in the Password Change document.
The IP address is the one used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Event Management Center > Event Design.
The Query page is displayed, as shown in Figure 1-9.
Figure 1-9 Event query page
Step 4 Add an event type.
Click the add icon in Category on the left and create an event type.
(Optional) You can create some extension attributes for a type of events. Table 1-5 describes
the extension attribute parameters.
Table 1-5 Extension attribute parameters

Name: Attribute name, which can contain fewer than 50 characters. Example: isSys

Code: Unique ID of the extension attribute, which can contain fewer than 50 characters. The code cannot contain the special characters !@#$%^&*()+=[]{}<>,./:;'\|~. Example: isSys

Required: Indicates whether the extension attribute is mandatory. In this version, only the value Yes is supported. Example: Yes

Type Rule: Display type of the extension attribute. The options are as follows:
text: text type
select: drop-down list box type
radio: option button type
checkbox: check box type
datetime: date control type
textarea: text box type
For details, see Figure 1-10 and Figure 1-11. Example: For details, see Figure 1-10.

Default Value: Default value of the extension attribute, which is optional. Example: 1

Description: Description of the extension attribute, which is optional. Example: This attribute indicates whether the attribute is synchronous; it is required by the Campaign.
Figure 1-10 Configuring extension attributes
Figure 1-11 Extension attribute configuration effect
Table 1-6 Extension attribute description

isSuppression: This attribute maps to the DND function in the Campaign system.

custFlag: This attribute maps to the Customer Model function in the Campaign system.

isSync: This attribute maps to the Synchronization and Asynchronization function in the Campaign system.

permission(R, S): This attribute maps to the Marketing Permission function in the Campaign system. You can query values of this attribute from the data dictionary CAE.EVENT.CAMPAIGN.AGREE.TYPE. The permissionR and permissionS attributes are only used to demonstrate the display conditions of the radio and select styles. When the business process is configured, you must use the checkbox style.

datetime(1, 2, 3): This attribute has no business meaning. It demonstrates three display methods of datetime.
----End
Preconfigured Events
Context
Some service events have been preconfigured in the CAE.
Preconfigured events comply with the following specifications:
"Interface Specifications (PS Domain) for the Data Integration Server in the Log System of China Mobile V1.0.0" for events whose names contain the PS keyword, for example, SOURCE_PS_XXX.
"China Mobile Unified DPI Device Technical Specifications - Interface Specifications for the LTE Signaling Collection and Parsing Server V2.0.9" for events whose names contain the LTE keyword, for example, SOURCE_LTE_XXX.
"Interface Specifications (CS Domain) for the Data Integration Server in the Log System of China Mobile V1.0.0" for other preconfigured events.
Designing the Flume Event Ingestion Process
In Quick Start, the complete process of creating an ingestion process is provided. A process
can be created in the following ways.
Creating an Ingestion Process Using a Template
This topic describes how to create an ingestion process using a template.
Prerequisites
You have logged in to the foreground and designed events in Base Operation of the ingestion
process design.
Procedure
Step 1 Click Add to go to the page for adding an ingestion process.
1. Select a basic operation item.
Created from Template: creates an ingestion process using a template.
2. Configure basic information.
Enter an ingestion process name and a description.
3. Select a host.
Select one or more Flume nodes for executing the process.
4. Select a template.
Select a template.
5. Edit the process.
Figure 1-12 shows the template process.
You can edit the ingestion process using a template.
Drag corresponding diagram elements from the toolbar to the workspace. The diagram elements include Source, Channel, Sink, (optional) Interceptor, (optional) Channel Selector, and (optional) Sink Group.
Figure 1-12 Template process
For details about diagram elements in the toolbar, such as the Source, Sink, and Channel, see the corresponding topics.
You can click Save to save the process as a draft, or click Publish to release the process.
The ingestion process status is described as follows:
Published: indicates that the ingestion process has been released into a properties.properties file and deployed in the corresponding Flume node.
Draft: indicates that the ingestion process has been saved as a draft and may be incomplete.
----End
Creating an Ingestion Process by Importing
Ingestion processes configured in the CAE can be imported and exported.
Prerequisites
The ingestion processes have been created in another CAE.
Procedure
Step 1 Log in to the CAE where the ingestion processes have been created.
Step 2 Export the ingestion processes.
Select the processes to be exported and click Export, as shown in Figure 1-13.
Figure 1-13 Exporting the ingestion processes
A .zip file is exported, and saved to the local host.
Currently, only a single process can be imported.
Step 3 Log in to the CAE to which the ingestion processes are to be imported.
Step 4 Import the ingestion processes.
Click the icon next to File path, select the exported .zip file, and click Import, as shown in
Figure 1-14.
Figure 1-14 Importing the ingestion processes
----End
Data Source Configuration
The data source can be accessed in multiple modes during ingestion process design.
Spooling Directory
The data source is accessed by reading local files on the Flume node.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
The ingestion directory of the Spooling Directory Source diagram element cannot be deleted during the running of the Flume. Otherwise, the ingestion can be restored only after the Flume is restarted or the Flume properties file is updated.
The ingestion directory of the Spooling Directory Source diagram element cannot contain files with the same name.
Procedure
Step 1 Click the Spooling Directory Source diagram element on the toolbar and click the blank area
in the canvas.
Figure 1-15 Spooling Directory Source diagram element
Step 2 Double-click the Spooling Directory Source diagram element in the canvas to edit it.
Figure 1-16 Page for editing the Spooling Directory Source diagram element
Table 1-7 describes the parameters of the Spooling Directory Source diagram element.
Table 1-7 Parameters of the Spooling Directory Source diagram element

Node Name: User-defined name of the diagram element.

Whether the related event:
Related Event: associates events existing in Event Design or in the CAE database to facilitate insertion of data interceptors, for example, Field Filter and Field Encrypt. The number of accessed source data fields must be the same as the number of attributes of the associated event.
No Related Event: does not associate events and ingests the accessed original data.

Source Event: Associated event. Events in Event Design can be displayed only after being brought online.

Data Source Directory: Local directory for storing source data on the Flume node. The directory cannot be deleted while the Flume is running. Otherwise, the ingestion can be restored only after the Flume is restarted or the Flume properties file is updated.

Senior Optional Properties: You can configure advanced configuration items based on the descriptions on the GUI or use their default values. If the stored file is a .gz file, select the Compressed File Source parameter. A package can contain only one file.

Add: The Flume has a large number of parameters. If a parameter that a user requires is not available on the GUI, the user can define it. Ensure that the parameter exists in the matching Flume and is correctly set.
----End
SDTP
The Flume server can receive data transferred from the service system using the SDTP
protocol.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Procedure
Step 1 Click the Sdtp Source diagram element on the toolbar and click the blank area in the canvas.
Figure 1-17 Sdtp Source diagram element
Step 2 Double-click the Sdtp Source diagram element in the canvas to edit it.
Figure 1-18 Page for editing the Sdtp Source diagram element
Table 1-8 describes the parameters of the Sdtp Source diagram element.
Table 1-8 Parameters of the Sdtp Source diagram element

Node Name: User-defined name of the diagram element.

Whether the related event:
Related Event: associates events existing in Event Design or in the CAE database to facilitate insertion of data interceptors, for example, Field Filter and Field Encrypt. The number of accessed source data fields must be the same as the number of attributes of the associated event.
No Related Event: does not associate events and ingests the accessed original data.

Source Event: Associated event. Events in Event Design can be displayed only after being brought online.

Server Port: Server port of the SDTP socket, which is user-defined. Ensure that the port number is not in use. The check command is as follows:
netstat -an | grep port_number
If the command has output, the port number is in use.

SDTP Protocol Type: SDTP protocol type. The options are SDTP_DPILTE, SDTP_PS, SDTP_CS, and SDTP_SMSP3.

Major Version Number: Primary version number. The parameter does not need to be modified by default.

Minor Version Number: Subversion number. The parameter does not need to be modified by default.

User Name: Authentication user name for the client to connect to the SDTP service. The default value is recommended.

Password: Authentication password, which is encrypted. For details about the default password, see "Password Change Views" in the Password Change document. You can run the $HOME/manager/bin/encrypt.sh password command on the active node of the CAE Server to obtain the encrypted password.

Senior Optional Properties:
SDTP Authorization: Indicates whether to authenticate the user name and password. By default, the user name and password are authenticated. The value N indicates that the user name and password do not need to be verified.
sdtp_cdr: Indicates whether to process data that contains CDR tags, that is, the first 10 fields in the data are the CDR tags and the remaining fields are the data content. The data is processed by default. The value N indicates that the data does not need to be processed.
sdtp_eventseparator: Row separator. By default, a record is not separated into multiple rows. Generally, a row contains one data record. You can use the row separator to separate a record into multiple rows. The parameter is set to a hexadecimal number and is processed as the corresponding decimal number during data processing. For example, if this parameter is set to 0A, 10 is used as the row separator in the data.
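The sdtp_eventseparator value is interpreted as described above: the hexadecimal string is converted to its decimal character code. A small Java sketch of that conversion for the 0A example (an illustration only):

public class RowSeparatorDemo {
    public static void main(String[] args) {
        String sdtpEventSeparator = "0A";                     // hexadecimal value from the GUI
        int code = Integer.parseInt(sdtpEventSeparator, 16);  // 10 in decimal
        char separator = (char) code;                         // corresponding character
        System.out.println(code + " -> " + Character.getName(separator)); // 10 -> LINE FEED (LF)
    }
}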
----End
FTP/SFTP
The Flume server can receive data transferred from the service system using the FTP or SFTP protocol.
Prerequisites
You have logged in to the GUI and designed events in the process editing step of the ingestion process design.
Procedure
Step 1 Click FTP Source on the toolbar and click the blank area in the canvas.
Figure 1-19 FTP Source
Step 2 Double-click FTP Source in the canvas to edit it.
Figure 1-20 Page for editing the FTP Source diagram element
Table 1-9 describes the FTP Source parameters.
Table 1-9 Parameters of the FTP Source diagram element

Node Name: User-defined name of the diagram element.

Whether the related event:
Related Event: associates events existing in Event Design or in the CAE database to facilitate insertion of data interceptors, for example, Field Filter and Field Encrypt. The number of accessed source data fields must be the same as the number of attributes of the associated event.
No Related Event: does not associate events and ingests the accessed original data.

Source Event: Associated event. Events in Event Design can be displayed only after being brought online.

Protocol Type: Protocol used for transmitting source events. The options are FTP and SFTP.

Ftp Data source identification: Unique FTP server ID, which is used by the system to internally identify an FTP server. For details, see 1.1.2.1 Configuring the FTP Host.

known_hosts File Path: Path of the known_hosts file on the FTP server. The known_hosts file is used for SSH authentication.

Add: The Flume has a large number of parameters. If a parameter that a user requires is not available on the GUI, the user can define it. Ensure that the parameter exists in the matching Flume and is correctly set.
----End
Avro
The Flume server can receive data transferred from the service system using the AVRO
protocol.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Context
The Avro source and Avro sink must be used together. Currently, the application scenario is
internal delivery in the ingestion process, as shown in Figure 1-21.
Figure 1-21 Common scenario of the Avro source and Avro sink
After data ingestion, a part of data is distributed to the HDFS sink. Another part of data is
distributed to the Avro sink. The Avro source receives and filters the data and then distributes
the filtered data to the Kafka sink.
In such case, the Avro source and sink are used for internal distribution.
Procedure
Step 1 Click the Avro Source diagram element on the toolbar and click the blank area in the canvas.
Figure 1-22 Avro Source diagram element
Step 2 Double-click the Avro Source diagram element in the canvas to edit it.
Figure 1-23 Page for editing the Avro Source diagram element
Table 1-10 describes the parameters of the Avro Source diagram element.
Table 1-10 Parameters of the Avro Source diagram element

Node Name: User-defined name of the diagram element.

Whether the related event:
Related Event: associates events existing in Event Design or in the CAE database to facilitate insertion of data interceptors, for example, Field Filter and Field Encrypt. The number of accessed source data fields must be the same as the number of attributes of the associated event.
No Related Event: does not associate events and ingests the accessed original data.

Source Event: Associated event. Events in Event Design can be displayed only after being brought online.

AvroSource

Binded Port: Server port of the Avro socket, which is user-defined. Ensure that the port number is not in use. The check command is as follows:
netstat -an | grep port_number
If the command has output, the port number is in use.

DecompressAvroSource (The system receives compressed CDR files reported from sites or gateways through the Avro interface, decompresses and verifies the files, and reports retransmission messages.)

Enable SSL: Indicates whether to enable SSL. The value true indicates yes and the value false indicates no.

IP of Server Receiving Retransmission Notification: IP address of the server that receives retransmission instructions. When the source event data is incorrect, the CAE sends retransmission instructions to the server specified by the service.

Port of Server Receiving Retransmission Notification: Port that receives retransmission instructions.

Quality Statistics Upload Path in HDFS: HDFS path to which quality statistics are uploaded.

Local Directory for Storing Intermediate Statistical Result File: Local directory on the Flume node for storing the intermediate statistical result file.

Senior Optional Properties

Max. Working Threads: Maximum number of threads used for receiving data from the client or Avro sink.

Decompression Format for Transferred-in Data: If DecompressAvroSource is selected, this parameter does not need to be set. The .gz format is used by default. If AvroSource is selected, the open-source capability of the Avro Source is used. In this case, only the zlib format is supported. To receive data in zlib format, set this parameter to deflate.

SSL Keystore Path: Path of the SSL keystore file. This parameter is mandatory if SSL is enabled.

KeyStore Password: Keystore password. This parameter is mandatory if SSL is enabled.

Keystore Type in Use: Keystore type.

Excluded SSL/TLS Protocols: Exclusion list of SSL/TLS protocols. Use space characters to separate multiple values. SSLv3 is always excluded. Therefore, the default value is SSLv3.

Enable IP Filtering: Indicates whether to enable IP filtering for the Netty. The value true indicates yes and the value false indicates no.

Define IP Filtering Rule: IP filtering rule of the Netty.

Interface Receiving Retransmission Notification: The default value is IF_ReUpload.

Retry Times upon Retransmission Failure: Number of times that a retransmission instruction can be resent. The retransmission instruction is sent again when the last retransmission instruction fails to be sent. The default value is 3.

Retransmission Notification Sending Interval (s): Interval for sending retransmission instructions. A retransmission instruction is sent again after the specified time period when the last retransmission instruction fails to be sent. The default value is 120.

Retransmission Notification Timeout Interval (s): Timeout interval of retransmission instructions. The server specified by a site may fail to receive retransmission information due to network faults. This is the time interval applied after the last retransmission instruction fails to be sent. The default value is 60.

Secure HDFS User: User name for accessing the HDFS in secure mode. If this parameter is not set, the insecure mode is used.

Keytab File Directory of Secure HDFS User: Path of the keytab file for accessing the secure HDFS. If this parameter is not set, the insecure mode is used.

Statistics Period of File Statistical Item (minutes): Statistics period of the following statistical items: number of files uploaded through the interface, volume of data uploaded through the interface, number of files that have been retransmitted through the interface, number of files to be retransmitted through the interface, and number of records in the error CDR file. The default value is 1440 minutes (that is, one day).

Statistics Period of Record Statistical Item (minutes): Statistics period of the following statistical items: number of CDRs received through the interface and total traffic of CDRs received through the interface. The default value is 10 minutes.

Period for Generating Statistical Result to HDFS (minutes): The default value is 1440 minutes (that is, one day).
----End
(Optional) Data Processor Configuration
The CAE provides built-in plug-ins in the Flume to implement multiple source data
processing capabilities.
Field Projecting
During data ingestion, the Flume can project fields to meet service requirements.
Function Description
Figure 1-24 Example of the field projecting function
Set the first column in the target event to the value of column 4 in the source event.
Set the second column in the target event to the value of column 3 in the source event.
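As a rough Java sketch of that projection (target column 1 takes source column 4, target column 2 takes source column 3), assuming a hypothetical comma-separated source record; the real projection is performed by the Flume interceptor:

public class FieldProjectingDemo {
    public static void main(String[] args) {
        String[] source = "001,12441,Beijing,13900000001".split(",");
        // Target column 1 <- source column 4; target column 2 <- source column 3
        String[] target = {source[3], source[2]};
        System.out.println(String.join(",", target));  // 13900000001,Beijing
    }
}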
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Projection diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the Field Projecting diagram element using a line.
Figure 1-25 Field Projecting diagram element
Step 3 Double-click the Field Projecting diagram element in the canvas to edit it.
Figure 1-26 Page for editing the Field Projecting diagram element
Table 1-11 describes the parameters of the Field Projecting diagram element.
Table 1-11 Parameters of the Field Projecting diagram element

Node Name: User-defined name of the diagram element.

Input Event Attribute: Existing attributes of the source event.

Output Event Attribute: Output fields. Select existing attributes and click the import icon to add them to the Output Event Attribute list. The output event attribute sequence is determined by the import sequence. Icons are also provided to select all attributes, deselect the selected attributes, and deselect all attributes. In Figure 1-27, only the PhoneNum and Place attributes are exported, and they are exported in sequence.

Remove Space Characters Preceding or Following Field Value: Choose true or false.
----End
Field Extraction
When ingesting data, the Flume can distribute the record to different storage systems based on
some field values. For example, if eventID is set to 001, the record is distributed to the Kafka
system and other records are distributed to the HDFS system.
Function Description
Figure 1-27 Common Field Extraction example
Figure 1-28 Level-2 Field Extraction example
The function of level-2 distribution is to save data to specified directories in the HDFS by
category.
For example, data is first saved to different directories based on the value of Type. (The
directory is named after the field value by default.)
If the value of Type is 001, data is secondarily distributed by the value of Place.
If the value of Type is 002, data is secondarily distributed by the value of PhoneNum.
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Distribution diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the Field Extraction diagram element using a line.
Figure 1-29 Field Extraction diagram element
Step 3 Double-click the Field Extraction diagram element in the canvas to edit it.
Figure 1-30 Ordinary sorting example
Table 1-12 Parameters of the Ordinary sorting diagram element

Node Name: User-defined name of the diagram element.

First sorting field: Name of the field used for distribution.

First sorting field value: Field value used for sorting. Records whose sorting field value does not equal the value of this parameter are discarded.
Figure 1-31 Page for editing the Secondary sorting diagram element
Table 1-13 describes the parameters of the Secondary sorting diagram element.
Table 1-13 Parameters of the Secondary sorting diagram element

Node Name: User-defined name of the diagram element.

Ordinary sorting

First sorting field: Name of the field used for distribution.

First sorting field value: Field value used for sorting. Records whose sorting field value does not equal the value of this parameter are discarded.

Secondary sorting

Specify sorting value: Level-1 distribution field value for which secondary sorting is performed.

Secondary sorting field: Name of the field used for secondary sorting.

Secondary sorting field value: Field value used for secondary sorting. Records whose sorting field value does not equal the value of this parameter are discarded.

Senior Optional Properties: secondaryFieldsDefault: directory for storing the secondarily distributed data. For example, if secondary distribution is performed when the value of the field for first distribution is 11 and secondaryFieldsDefault is set to second, the storage path is 11/second/. By default, data is saved to the level-1 directory, that is, 11/.
Step 4 Use Field Extraction together with Channel Selector or HDFS Sink.
Use Field Extraction together with Channel Selector to complete the scenario shown in
Figure 1-32. For details, see 1.2.3.6.3 (Optional) Data Channel Selection Configuration.
Use Field Extraction together with HDFS Sink to complete the scenario shown in
Figure 1-33. The details are described later.
Step 5 Drag Memory Channel from the toolbar to the canvas and connect it to Field Extraction.
Step 6 Drag HDFS Sink from the toolbar to the canvas and connect it to Field Extraction.
Set HDFS Storage Path in the HDFS to /flume/test/LTE/%{first}/%{second}.
In the path, /flume/test/LTE/ indicates the path in the HDFS. Change it based on the site
requirements. For details, see 1.2.3.7.5 Using the Client to View and Create Files in the
HDFS.
In the path, %{first}/%{second} indicates that the level-1 distribution field value is used
as the level-1 file name and level-2 distribution field value is used as the secondary file
name.
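The %{first} and %{second} placeholders in the HDFS storage path are replaced with the level-1 and level-2 distribution field values carried in the event header. The following Java sketch illustrates the substitution only (it is not the Flume implementation), and the header values are hypothetical:

import java.util.HashMap;
import java.util.Map;

public class HdfsPathDemo {
    public static void main(String[] args) {
        String pathTemplate = "/flume/test/LTE/%{first}/%{second}";
        Map<String, String> header = new HashMap<>();
        header.put("first", "001");       // assumed level-1 distribution field value
        header.put("second", "Beijing");  // assumed level-2 distribution field value
        String path = pathTemplate;
        for (Map.Entry<String, String> e : header.entrySet()) {
            path = path.replace("%{" + e.getKey() + "}", e.getValue());
        }
        System.out.println(path);  // /flume/test/LTE/001/Beijing
    }
}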
----End
Field Backfill
You can use the Flume plug-in package provided by the CAE to collect data required for the
service system. The Flume adds fields to source events by searching and mapping the service
data cache file preset in the Flume.
Function Description
Figure 1-32 Field Backfill function example
The Flume adds fields from cache files to source events. In the preceding example, the
values of the Place field in the source event are updated according to cache table 1.
The Flume adds fields from cache files to source events. In the preceding example, the
PhoneNum field is added to the source event according to cache table 2.
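A simplified Java sketch of the backfill idea (not the CAE plug-in itself): a dimension table is loaded as a lookup map, and a source event field is updated when a matching key is found. The table content and field layout below are hypothetical.

import java.util.HashMap;
import java.util.Map;

public class FieldBackfillDemo {
    public static void main(String[] args) {
        // Cache (dimension) table: UserID -> Place, hypothetical content
        Map<String, String> cacheTable1 = new HashMap<>();
        cacheTable1.put("001", "Beijing");

        String[] event = {"001", "12441", "Unknown"};   // UserID, IMSI, Place
        String place = cacheTable1.get(event[0]);
        if (place != null) {
            event[2] = place;                           // update: replace the Place value
        }
        System.out.println(String.join(",", event));    // 001,12441,Beijing
    }
}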
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Association and Backfill diagram element on the toolbar and click the blank area in
the canvas.
Step 2 Connect the source to the Backfill diagram element using a line.
Figure 1-33 Field Backfill diagram element
Step 3 Double-click the Backfill diagram element in the canvas to edit it.
Figure 1-34 Page for editing the Field Backfill diagram element
Table 1-14 describes the parameters of the Field Backfill diagram element.
Table 1-14 Parameters of the Field Backfill diagram element

Node Name: User-defined name of the diagram element.

Dimension Tables: Click Add and enter the storage path of the cache table in table1. The table file must be stored in the Hadoop HDFS. For details about how to view and create an HDFS directory on the Hadoop client on the CAE server, see 1.2.3.7.5 Using the Client to View and Create Files in the HDFS. Multiple dimension tables can be added.

Backfill Rule: Click Add and configure how dimension table data is filled back to the source data.
Condition: Set this parameter in Backfill Condition.
Table Name: Select a table in Dimension Tables. The default value is tableN.
The Index of Table Field: Cache table column whose values are filled back to the data.
Target Field: Field to which the value is filled back. When a field is added, you can define the name of the field.
Conversion Type: Conversion type. The options are update (replacing the original field value) and append (adding a field).
In Figure 1-35, if Condition1 is met, the value of the second field in dimension table 1 is filled back to the Place field in the source data. If Condition2 is met, the value of the second field in dimension table 2 is added to the PhoneNum field in the source data.

Backfill Condition: Backfill condition. You can click Add and configure the condition for triggering backfill.
Condition Name: Condition name, which is user-defined.
Table Name: Dimension table name. The default value is tableN.
The Index of Table Field: Cache table column whose value is used as the comparison value.
Target Value Type: Target value type. The options are Constant and Source Event Attribute.
Target Value: Target value. Set this parameter to a constant or a source event attribute.
In Figure 1-35, Condition1 indicates that the backfill operation is triggered when the value of the first field in dimension table 1 is the same as that of UserID in the source event.

Base Information: Configures the source data delimiter.
----End
Field Standard
When ingesting data, the Flume can pre-process some data in the transferred source event to
standardize the data, allowing other components to obtain standard events.
Function Description
The following types of data standardization are supported:
For mobile number fields:
− Remove +86 or 0086 from the beginning of mobile numbers.
− Remove 0 from the beginning of mobile numbers that start with 0 and contain 12
digits.
− Remove space characters from the beginning and end of mobile numbers.
− Record abnormal data in the
/var/log/Bigdata/flume/flume/flumeExceptionData.log file.
For date fields:
Specify the date display format.
Standardization is not supported if time is represented using multiple fields.
Time zones, milliseconds, and nanoseconds are not considered.
The timestamp is supported.
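A minimal Java sketch of the mobile number rules listed above (remove a +86 or 0086 prefix, strip a leading 0 from 12-digit numbers, and trim spaces); the real plug-in additionally logs abnormal data to the file mentioned above:

public class PhoneNumberStandardDemo {
    static String normalize(String number) {
        number = number.trim();                            // remove leading and trailing spaces
        if (number.startsWith("+86")) {
            number = number.substring(3);                  // remove +86
        } else if (number.startsWith("0086")) {
            number = number.substring(4);                  // remove 0086
        }
        if (number.length() == 12 && number.startsWith("0")) {
            number = number.substring(1);                  // remove the leading 0
        }
        return number;
    }

    public static void main(String[] args) {
        System.out.println(normalize(" +8613900000001 "));  // 13900000001
        System.out.println(normalize("013900000001"));      // 13900000001
    }
}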
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Standardization diagram element on the toolbar and click the blank area in the
canvas.
Step 2 Connect the source to the Standardized diagram element using a line.
Figure 1-35 Field Standard diagram element
Step 3 Double-click the Field Standard diagram element in the canvas to edit it.
Figure 1-36 Page for editing the Field Standard diagram element
Table 1-15 describes the parameters of the Field Standard diagram element.
Table 1-15 Parameters of the Field Standard diagram element

Node Name: User-defined name of the diagram element.

Date Field Standard: Time format conversion. Configure the input and output time formats based on the information displayed on the page.
yyyy: year
MM: month
dd: day
HH: hour in 24-hour format
hh: hour in 12-hour format
mm: minute
ss: second
timestamp: timestamp
For example, if the format is set to yyyy-MM-dd HH:mm:ss, the time is displayed as 2017-05-25 13:01:01.
In the output type, replace indicates that the original field content is overwritten, and append indicates that a field whose name can be customized is added.

Phone Number Standard: Source event field that is used as the phone number field.

Whether or not throw out exception data: true indicates that abnormal data is deleted; false indicates that it is not deleted.
----End
Field Key
The structure of the source event ingested by the Flume consists of two parts: header and body.
The header contains tag information such as the timestamp and IP address of host that sends
messages. The body contains field names and values carried in events.
Function Description
When customizing data collection, the CAE can change the header information, add a Key field to the header, and assign the value of a source event field to Key. Events with the same value of Key are then written to the same Kafka partition, facilitating follow-up data processing by Kafka consumers.
For example, if the mobile number in source events is configured as Key, source events with the same mobile number are written to the same Kafka partition.
Figure 1-37 Field Key example
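This behavior relies on Kafka's key-based partitioning: records with equal keys are hashed to the same partition. The following is a rough illustration of the idea only (Kafka's default partitioner actually applies a murmur2 hash to the serialized key); the partition count and key value are assumed:

public class KeyPartitionDemo {
    public static void main(String[] args) {
        int numPartitions = 3;
        String key = "13900000001";   // mobile number used as the Key field
        // Simplified: the same key always maps to the same partition
        int partition = Math.floorMod(key.hashCode(), numPartitions);
        System.out.println("Records with key " + key + " go to partition " + partition);
    }
}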
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Use Field as Key diagram element on the toolbar and click the blank area in the
canvas.
Step 2 Connect the source to the FieldKey diagram element using a line.
Figure 1-38 Field Key diagram element
Step 3 Double-click the Field Key diagram element in the canvas to edit it.
Figure 1-39 Page for editing the Field Key diagram element
Table 1-16 describes the parameters of the Field Key diagram element.
Table 1-16 Parameters of the Field Key diagram element

Node Name: User-defined name of the diagram element.

Key Column Name in Header: Field name in the data header. The default value is key. The default partitioning algorithm in Kafka performs partitioning based on the key in the header. If this parameter is not set to key, Kafka cannot recognize the parameter and cannot perform partitioning based on it.

Retain Original headername: Indicates whether to overwrite the field configured in Key Column Name in Header in the header. The value true indicates yes and the value false indicates no.

Specify fields as key: Field in the body whose value is stored in the specified field in the header.
----End
Field Filter
When ingesting data, the Flume can filter the ingested data based on some field values or
based on the header value. The filter condition is an expression containing the number,
character string, and date fields.
Function Description
Figure 1-40 Field Filter example 1
In this example, data is filtered based on the Place field in the body of the source event.
Figure 1-41 Field Filter example 2
In this example, data is filtered based on the key value of Time in the header of the source
event.
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Field Filter diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the Field Filter diagram element using a line.
Figure 1-42 Field Filter diagram element
Step 3 Double-click the Field Filter diagram element in the canvas to edit it.
Figure 1-43 Page for editing the Field Filter diagram element
Table 1-17 describes the parameters of the Field Filter diagram element.
Table 1-17 Parameters of the Field Filter diagram element

Node Name: User-defined name of the diagram element.

Filter Type:
Filter Header: filters data based on the header.
Filter Body: filters data based on the body.

Data Style:
No_Key: normal data format, for example, 001,12441,Beijing,13900000001.
Key_Value: data format containing the attribute name, for example, userid:001,imsi:12441,place:Beijing,phonenum:13900000001.
If Key_Value is selected, you need to configure the delimiter for separating the field name from the field value in the basic configuration.

Filter Field: Existing fields on the page.

The running result of the expression must be Boolean: Filter condition expression.
Fields of the character type support the following expressions:
Fieldname.startsWith(String)
Fieldname.endsWith(String)
Fieldname.isEmpty()
Fieldname.length() >= int
Fieldname.in('1','2','3','4'...)
Example: PhoneNum.length() >= 11
Fields of the number type support the following expressions:
Integer type: Fieldname.in(1,2,3,4...)
Long type: Fieldname.in(1l,2l,3l,4l...)
Double type: Fieldname.in(1.0,2.0,3.0,4.0...)
Float type: Fieldname.in(1.0f,2.0f,3.0f,4.0f...)
Example: UserID.in(001,002)
Fields of the date type support the following expressions:
sysdate()
Fieldname.addMonths(int)
Fieldname.addDays(int)
Fieldname.addHours(int)
Fieldname.addMinutes(int)
Fieldname.addSeconds(int)
Example: Time.addMonths(3) >= 6
Supported common operators: ==, =, <=, +, -, &&, ||

Base Information: Configures the source data delimiter.
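The expressions in Table 1-17 follow Java-style string and number semantics. As an illustration only (the real filter evaluates the expression inside the Flume interceptor), the two table examples behave roughly as follows; the sample field values are hypothetical:

import java.util.Arrays;

public class FieldFilterDemo {
    public static void main(String[] args) {
        String phoneNum = "13900000001";
        String userId = "001";
        // PhoneNum.length() >= 11
        boolean keepByLength = phoneNum.length() >= 11;
        // UserID.in(001,002)
        boolean keepById = Arrays.asList("001", "002").contains(userId);
        System.out.println(keepByLength && keepById);  // true: the record passes the filter
    }
}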
----End
File Name
The structure of the source event ingested by the Flume consists of two parts: header and body.
The header contains tag information such as the timestamp and IP address of the host sending
the source event. The body contains fields and their values in the source event. This function
can be used to save tags required by users to the header.
Function Description
When customizing data ingestion using the CAE, you can change the source event header
information, add tags (Key in the header) to the header, and assign the name of the file storing
the source event to the tag. In this way, events with the same value of Key can be written to
the same filepath in the HDFS.
For example, if the source event file name is 20160301_Beijing_Micromarketing.txt, you
can save the file name, absolute file path, and information in the file name to the header, as
shown in Figure 1-44.
Figure 1-44 File Name example
Save the file path to the FilePath key.
Save the file name to the FileName key.
Save the first piece of information in the file name to the Time key.
Save the second piece of information in the file name to the Place key.
After the values of fields in the header are determined, events with the same field values can
be written into the same partition in the HDFS during data receiving. For example, set the
HDFS directory to /test/%{Place}/%{Time} for data storage.
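As a minimal illustration of this mapping (using the extraction shown in Figure 1-44), a source file named 20160301_Beijing_Micromarketing.txt produces the header keys Time=20160301 and Place=Beijing, so the HDFS storage path /test/%{Place}/%{Time} resolves to /test/Beijing/20160301 for every event read from that file.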
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the File Name diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the File Name diagram element using a line.
Figure 1-45 File Name diagram element
Step 3 Double-click the File Name diagram element in the canvas to edit it.
Figure 1-46 Page for editing the File Name diagram element
Table 1-18 describes the parameters of the File Name diagram element.
Table 1-18 Parameters of the File Name diagram element
Parameter: Description
Node Name: User-defined name of the diagram element.
Extraction Items: Fields added to the header.
New Field Name: Enter a name.
Location of the field after segmentation: the Nth piece of information in the file name. For example, if the file name is 20160301_Beijing_Micromarketing.txt and the parameter is set to 1, the information "20160301" is extracted.
You can click Add to add multiple extraction fields.
File Name Information Delimiter: Delimiter between the pieces of information in the file name.
For example, if the file name is 20160301_Beijing_Micromarketing.txt, the delimiter is the underscore (_).
The following characters cannot be used as delimiters: < > " '.
Key Storing File Name: New field in the header for storing the complete file name. The field name can be customized.
File Name Extension: File name extension. This parameter is optional and is used to distinguish the file name extension from the file name information when File Name Information Delimiter is set to the dot (.).
----End
Field Change
When ingesting data, the Flume can calculate a new value from the original value of a field and either replace the original value with the calculation result or output the result as an additional field.
Function Description
Figure 1-47 Field Change example 1
The name:James field in the source file is processed using the substring(2,4) function; the field is renamed and converted into newName:me, and the conversion result is added as a new field.
The value of the lac:12 field in the source file is increased by 1; the field is renamed and converted into newLac:13, and the new field replaces the original field.
Figure 1-48 Field Change example 2
Perform the lac+ci calculation on value 12 of the lac field in the source file and replace the
original field with the new field that is obtained.
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step
during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Field Change diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the Field Change diagram element using a line.
Figure 1-49 Field Change diagram element
Step 3 Double-click the Field Change diagram element in the canvas to edit it.
Figure 1-50 Page for editing the Field Change diagram element
Table 1-19 describes the parameters of the Field Change diagram element.
Table 1-19 Parameters of the Field Change diagram element
Parameter Description
Node Name User-defined diagram element.
Data Style: No_Key: normal data format, for example, James,12,34.
Key_Value: data format containing the attribute name, for
example, name:James,lac:12,ci:34.
If the parameter is set to Key_Value, you need to configure the
following information:
Delimiter between the field name and field value.
Whether to delete the field name from the source data. The value
true indicates yes and the output data is James,12,34. The value
false indicates no and the output data is
name:James,lac:12,ci:34.
Field Change Rule Field change rule. You can click Add and add a field change rule.
Rule Name: The default value is ruleN.
Expression: Specify a value in the Expression text box.
Convert Type: conversion type. The value replace indicates that
the original field value is replaced. If this value is used, you need
to set Convert Type Name. The value append indicates that a
field is added. If this value is used, you need to set Append Field
Name.
New Key: new key name. For example, if the parameter is set to
newName, the output data is newName:xxx.
This parameter is valid only when Data Style is set to
Key_Value.
Expression: The expression format is AttrName.Function.
After the configuration, click .
Base Information Configures the source data delimiter.
----End
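For example, based on the AttrName.Function expression format and the cases in Figure 1-47, a rule that appends a shortened name could use the expression name.substring(2,4) with Convert Type set to append and New Key set to newName, producing newName:me for the input name:James, while a rule that replaces the lac value could use an expression such as lac+1 with Convert Type set to replace. This is only a sketch of the documented examples; confirm the exact expression syntax accepted by the Expression text box on the GUI.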
Field Encrypt
This function collects and aggregates the CDRs of sites in real time and encrypts sensitive fields for statistics and analysis by the downstream service system.
Function Description
The sensitive fields in the files are encrypted using the SM4 or AES128 algorithm.
Prerequisites
1. You have logged in to the foreground and designed events in the process editing step during the ingestion process design.
2. You have configured the source and selected associated events in the source.
Procedure
Step 1 Click the Encryption diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the source to the Encrypt diagram element using a line.
Figure 1-51 Field Encrypt diagram element
Step 3 Double-click the Encrypt diagram element in the canvas to edit it.
Figure 1-52 Page for editing the Field Encrypt diagram element
Table 1-20 describes the parameters of the Field Encrypt diagram element.
Table 1-20 Parameters of the Field Encrypt diagram element
Parameter Description
Node Name User-defined diagram element.
Encrypt Mode Encryption algorithm.
Base Information
Select fields to be encrypted and click to
import them to the list on the right.
The icon is used to select all fields.
The icon is used to deselect the selected fields.
The icon is used to deselect all fields.
Data Delimiter: Source data delimiter.
Time Field Location: Event field name in the source data.
Time Field Format: Time field format.
Key Obtaining Interval (minutes): Interval, in minutes, for obtaining the key again after the key fails to be obtained.
Key Obtaining REST Interface URL: URL of the REST interface for obtaining the key.
Parameters that need to be set when AES128 is used:
Authentication User: User name for authentication. For details, see the DG documentation.
Authentication User Password: Password for authentication. For details, see the DG documentation.
Parameters that need to be set when SM4 is used:
Key Validity Length Before and After Current Time: Number of months before and after the current month. Keys used during this period can be obtained.
For example, if the validity period of the current key is 20151201-20161230 and the current time is March 2016:
If the parameter is set to 2, the system obtains keys used in the period from January 1, 2016 to May 31, 2016.
If the parameter is set to 10, the system obtains keys used in the period from December 1, 2015 to December 30, 2016. (Because the total duration before and after the current time exceeds the maximum validity period of the key, the validity period of the key is used.)
----End
Data Channel Configuration
Memory
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Procedure
Step 1 Click the Memory Channel diagram element on the toolbar and click the blank area in the
canvas.
Figure 1-53 Memory Channel diagram element
Step 2 Double-click the Memory Channel diagram element in the canvas to edit it.
Table 1-21 describes the parameters of the Memory Channel diagram element.
Table 1-21 Parameters of the Memory Channel diagram element
Parameter Description
Node Name User-defined diagram element.
Senior Optional Properties You can configure advanced configuration items based
on description on the GUI or use default values for them.
----End
File
Prerequisites
You have logged in to the foreground and designed events in the process editing step during the ingestion process design.
Procedure
Step 1 Click File Channel on the toolbar and click the blank area in the canvas.
Figure 1-54 File Channel
Step 2 Double-click File Channel in the canvas to edit it.
Table 1-22 describes the File Channel parameters.
Table 1-22 Parameters of the File Channel diagram element
Parameter Description
Node Name User-defined diagram element.
Senior Optional Properties You can configure advanced configuration items based
on description on the GUI or use default values for them.
encrypt If it is set to Yes, set information including the key,
password, and password file.
----End
(Optional) Data Channel Selection Configuration
The Channel Selector diagram element determines channels into which a specific event
received by the source is written.
Function Description
Currently, no parser is provided for the Channel Selector diagram element. Follow
instructions in the properties.properties file to configure the Channel Selector diagram
element.
For details, see the Flume official documentation at https://flume.apache.org/FlumeUserGuide.html.
In the example, the Channel Selector diagram element is used to implement the following
function:
Figure 1-55 Common Field Extraction example
When ingesting data, the Flume can distribute records to different storage systems based on the values of certain fields. For example, if eventID is 001, the record is distributed to the Kafka system, and all other records are distributed to the HDFS system.
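For reference, this kind of distribution corresponds to a multiplexing channel selector in a Flume properties file. A minimal sketch follows; the agent, source, channel, and header names are placeholders rather than the values generated by the CAE:
# Route events whose eventID header value is 001 to the Kafka channel; everything else goes to the HDFS channel
agent.sources.src1.selector.type = multiplexing
agent.sources.src1.selector.header = eventID
agent.sources.src1.selector.mapping.001 = kafkaChannel
agent.sources.src1.selector.default = hdfsChannel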
Prerequisites
None.
Procedure
Step 1 Drag diagram elements from the toolbar to the canvas and connect them, as shown in Figure 1-56.
Figure 1-56 Channel Selector configuration example
Step 2 Double-click each diagram element to edit it.
1. Configure the source.
Figure 1-57 Configuring the source
Table 1-23 Parameters of the Spooling Directory Source diagram element
Parameter Description
spooldir Local directory for storing source data in the Flume.
The directory cannot be deleted during the running of the Flume. Otherwise, the ingestion can be restored only after the Flume is restarted or the Flume properties file is updated.
If the stored file is a .gz file, select the
GzFileDeserializer parameter. A package can have only
one file.
2. Configure the Field Extraction diagram element.
Figure 1-58 Configuring the Field Extraction diagram element
Table 1-24 Parameters of the Secondary sorting diagram element
Parameter Description
Ordinary sorting
New Field Name New field name defined by a user for conversion.
Field Name Name of the field used for distribution.
Secondary sorting
New field value Level-1 distribution field value for performing secondary
distribution.
Secondary sorting field New field name defined by a user for conversion.
Field Name Name of the field used for secondary distribution.
secondaryFieldsDefault: directory for storing the
secondarily distributed data.
For example, if the secondary distribution is performed
when the value of the field for first distribution is 11 and
secondaryFieldsDefault is set to second, the storage
path is 11/second/. By default, data is saved to the
level-1 directory, that is, 11/.
3. Configure the Channel Selector diagram element.
Figure 1-59 Configuring the Channel Selector diagram element
Table 1-25 Parameters of the Channel Selector diagram element
Parameter Description
type: Channel Selector mode. The value multiplexing indicates multi-channel distribution.
header: Field in the header or body used for distribution. In this example, the value first is the field alias configured in Field Extraction.
Default channel: In this example, the Channel_N1462272643037 channel is used when the value is not 001 (default). Channel_N1462272643037 is the value of Agent Name in the channel.
Mapping of header value and channel (Header Value and Channel Name): In this example, the Channel_N1462272643030 channel is used when the value is 001. Channel_N1462272643030 is the value of Agent Name in the channel.
4. Configure the channel.
Figure 1-60 Configuring channel 1
Figure 1-61 Configuring channel 2
5. Configure the Kafka Sink diagram element.
Figure 1-62 Configuring the Kafka Sink diagram element
Table 1-26 Parameters of the Kafka Sink diagram element
Parameter Description
Kafka Broker IP address of the Kafka Broker:Service port of the Kafka Broker.
Use commas (,) to separate multiple values.
The port number is the same as the value of port in the /opt/huawei/Bigdata/etc/*_**_Broker/server.properties file. The default port number is 21005.
Kafka Topic Topic for storing the event. You can select a value from the
drop-down list box.
Partitioning Method
Default: indicates the default partition method, that is, the partition
is performed based on the key in the header.
ConsistencyHash: indicates the consistency hash.
Random: indicates random partition.
Events Processed in Each Batch
Copies to Authorize Before Event Writing Success
6. Configure the HDFS Sink diagram element.
Figure 1-63 Configuring the HDFS Sink diagram element
Table 1-27 Parameters of the HDFS Sink diagram element
Parameter Description
HDFS Storage Path Storage path of events ingested by the Flume.
If the storage path is /tmp/flume_ide, the parent directory /tmp
of the path must be an existing HDFS directory. The /flume_ide
subdirectory can be defined by a user, and multiple levels of
subdirectories can be defined by a user. The CAE system will
automatically generate a user-defined subdirectory.
For details about how to view and create an HDFS directory on the Hadoop client, see Using the Client to View and Create Files in the HDFS.
Kerberos Principal
Kerberos File Path
When the Kerberos authentication function is enabled in the HDFS, the Kerberos Principal and Kerberos File Path parameters must be selected and correctly configured. Generally, the parameter settings are as follows:
Kerberos Principal: flume
Kerberos File Path: /opt/huawei/Bigdata/FusionInsight-Flume-*.*.*/flume/conf/flume.keytab
----End
Data Output Configuration (Sink)
This topic describes how to configure the mode for exporting ingested data, that is, configure
the sink.
Kafka
The Kafka is used to store events ingested by the Flume. You can specify the topic
corresponding to each event.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Procedure
Step 1 Click the Kafka Sink diagram element on the toolbar and click the blank area in the canvas.
Figure 1-64 Kafka Sink diagram element
Step 2 Double-click the Kafka Sink diagram element in the canvas to edit it.
Figure 1-65 Page for editing the Kafka Sink diagram element
Table 1-28 describes the parameters of the Kafka Sink diagram element.
Table 1-28 Parameters of the Kafka Sink diagram element
Parameter Description
Node Name User-defined diagram element.
Kafka Topic Topic for storing the event. You can select a value from the
drop-down list box.
Senior Optional Properties:
Partitioning Method:
Default: indicates the default partition method, that is, the partition is performed based on the key in the header.
ConsistencyHash: indicates the consistency hash.
Random: indicates random partition.
Events Processed in Each Batch
Copies to Authorize Before Event Writing Success
Add The Flume has a lot of parameters. If the parameter that a user
requires is not available on the GUI, the user can define it.
Ensure that the parameter exists in the matching Flume and is
correctly set.
----End
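After the ingestion process is started, one way to check that events reach the configured topic is to consume the topic with the Kafka client tool, in the same way as in the verification sections later in this document (the ZooKeeper address and topic name below are examples):
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic <topic name> --from-beginning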
HDFS
The HDFS is used to store events ingested by the Flume. You can specify the storage path of
the ingested data.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Procedure
Step 1 Click the HDFS Sink diagram element on the toolbar and click the blank area in the canvas.
Figure 1-66 HDFS Sink diagram element
Step 2 Double-click the HDFS Sink diagram element in the canvas to edit it.
Figure 1-67 Page for editing the HDFS Sink diagram element
Table 1-29 describes the parameters of the HDFS Sink diagram element.
Table 1-29 Parameters of the HDFS Sink diagram element
Parameter Description
Node Name User-defined diagram element.
HDFS Storage Path Storage path of events ingested by the Flume.
If the storage path is /tmp/flume_ide, the parent directory
/tmp of the path must be an existing HDFS directory. The
/flume_ide subdirectory can be defined by a user, and multiple
levels of subdirectories can be defined by a user. The CAE
system will automatically generate a user-defined subdirectory.
For details about how to view and create an HDFS directory on the Hadoop client, see 1.2.3.7.5 Using the Client to View and Create Files in the HDFS.
Kerberos Principal
Kerberos File Path
When the Kerberos authentication function is enabled in the HDFS, the Kerberos Principal and Kerberos File Path parameters must be selected and correctly configured. Generally, the parameter settings are as follows:
Kerberos Principal: flume
Kerberos File Path: /opt/huawei/Bigdata/FusionInsight-Flume-*.*.*/flume/conf/flume.keytab
Senior Optional Properties: You can configure advanced configuration items based on the description on the GUI or use default values for them.
Add The Flume has a lot of parameters. If the parameter that a user
requires is not available on the GUI, the user can define it.
Ensure that the parameter exists in the matching Flume and is
correctly set.
----End
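After data is ingested, the generated files can be checked on the Hadoop client, for example (using the example storage path above):
% hdfs dfs -ls /tmp/flume_ide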
Avro
The service system can use the Avro protocol to obtain data ingested by the Flume.
Prerequisites
You have logged in to the foreground and designed events in the process editing step during
the ingestion process design.
Context
The Avro source and Avro sink must be used together. Currently, the application scenario is
internal delivery in the ingestion process, as shown in Figure 1-68.
Figure 1-68 Common scenario of the Avro source and Avro sink
After data ingestion, a part of data is distributed to the HDFS sink. Another part of data is
distributed to the Avro sink. The Avro source receives and filters the data and then distributes
the filtered data to the Kafka sink.
In such case, the Avro source and sink are used for internal distribution.
Procedure
Step 1 Click the Avro Sink diagram element on the toolbar and click the blank area in the canvas.
Figure 1-69 Avro Sink diagram element
Step 2 Double-click the Avro Sink diagram element in the canvas to edit it.
Figure 1-70 Page for editing the Avro Sink diagram element
Table 1-30 describes the parameters of the Avro Sink diagram element.
Table 1-30 Parameters of the Avro Sink diagram element
Parameter Description
Node Name User-defined diagram element.
hostname Bound IP address or host name in Avro.
port Bound port number in Avro.
Senior Optional Properties You can configure advanced configuration items based
on description on the GUI or use default values for them.
Add The Flume has a lot of parameters. If the parameter that
a user requires is not available on the GUI, the user can
define it.
Ensure that the parameter exists in the matching Flume
and is correctly set.
----End
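For reference, the internal delivery described above corresponds to pairing an Avro sink in one agent with an Avro source in another: the sink's hostname and port must match the address that the source binds to. A minimal Flume-style sketch, with placeholder agent and element names and an example address:
# Sending side: Avro sink pointing at the receiving agent
agent1.sinks.avroOut.type = avro
agent1.sinks.avroOut.hostname = 10.0.0.5
agent1.sinks.avroOut.port = 4545
# Receiving side: Avro source bound to the same address and port
agent2.sources.avroIn.type = avro
agent2.sources.avroIn.bind = 10.0.0.5
agent2.sources.avroIn.port = 4545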
(Optional) Sink Group Configuration
The Sink Group is used for event output load balancing, ensuring high availability when a sink is unavailable.
Context
Currently, no parser is provided for the Sink Group diagram element. Follow instructions in
the properties.properties file to configure the Sink Group diagram element.
Prerequisites
None.
Procedure
Step 1 Drag diagram elements from the toolbar to the canvas and connect them, as shown in Figure 1-71.
Figure 1-71 Sink Group configuration example
Step 2 Double-click the Sink Group diagram element in the canvas to edit it.
Figure 1-72 Configuring the Sink Group diagram element
In this example, the Load-Balancing Sink processor is used as the sink processing mode. For details about the configuration of other modes, see the Flume official documentation at https://flume.apache.org/FlumeUserGuide.html.
Table 1-31 Parameters of the Sink Group diagram element
Parameter Description
type Type of the selected sink. In this example, the value is
load-balance.
selector Distribution mode. The options are as follows:
round_robin: distributes data by the sink order, for
example, sink 1, sink 2, sink 3...
random: distributes data randomly.
backoff Indicates whether to add a sink to the blacklist when it is
faulty.
maxTimeOut Maximum timeout interval.
----End
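Because no parser is provided, the parameters in Table 1-31 map directly onto Flume sink group properties. A minimal load-balancing sketch follows; the agent and sink names are placeholders, and note that the processor type is written load_balance in the Flume user guide:
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = sink1 sink2
agent.sinkgroups.g1.processor.type = load_balance
agent.sinkgroups.g1.processor.selector = round_robin
agent.sinkgroups.g1.processor.backoff = true
agent.sinkgroups.g1.processor.selector.maxTimeOut = 30000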
Using the Client to View and Create Files in the HDFS
The CAE server provides the Hadoop client. Users can directly use the client to view and
create files in the HDFS.
Step 1 Log in to a Hadoop client.
Step 2 Edit the environment variable.
Enter the bash mode and initialize the environment variable.
% source bigdata_env
Initialize the ticket.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
Step 3 Query the HDFS file directory.
% hdfs dfs -ls /
The following information is displayed:
Found 8 items
-rw-r--r-- 3 hdfs supergroup 0 2016-04-05 11:21
/PRE_CREATE_DIR.SUCCESS
drwxr-x--- - flume hadoop 0 2016-04-05 11:21 /flume
drwx------ - hbase hadoop 0 2016-04-05 11:21 /hbase
drwxrwxrwx - mapred hadoop 0 2016-04-05 11:21 /mr-history
drwxrwxrwx - spark supergroup 0 2016-04-07 16:00 /sparkJobHistory
drwxrwxrwx - hdfs hadoop 0 2016-04-05 11:43 /tmp
drwxrwxrwx - hdfs hadoop 0 2016-04-05 11:43 /user
drwxr-xr-x - ldapuserzh0405 supergroup 0 2016-04-07 16:19 /usr
Step 4 Create the /usr/test directory.
% hdfs dfs -mkdir /usr/test
Step 5 Query the /usr directory in the HDFS.
% hdfs dfs -ls /usr
The following information is displayed:
Found 4 items
drwxr-xr-x - ldapuser_ling0416 supergroup 0 2016-04-18 15:18 /usr/data1
drwxr-xr-x - ldapuserzh0405 supergroup 0 2016-04-08 15:37 /usr/streaming
drwxr-xr-x - ldapuser_tmy supergroup 0 2016-04-07 16:19 /usr/streaming_tmy
drwxr-xr-x - ldapuser_ling0416 supergroup 0 2016-04-18 15:18 /usr/test
The command for deleting the directory is hdfs dfs -rm -r /usr/test (the older form hadoop fs -rmr /usr/test also works but is deprecated).
----End
Designing Spark Streaming for Task Processing
You can manage and orchestrate Spark tasks on the page. On the management page, you can upload and instantiate Spark task rules, and start, stop, or delete Spark tasks. On the orchestration page, you can graphically orchestrate Spark task rules.
Creating Task Rules by Orchestration
In the CAE system, the Spark task rule is called rule, and the instantiated task is called task.
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. However, the login URL varies depending on the
installation mode.
Then, enter the login user name, password, and verification code. For details about the default
password, see "Password Change Views" in the Password Change.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 In the navigation bar on the upper part, choose Realtime Awareness > Streaming Studio >
Stream Computing Design.
The page Stream Computing Design is displayed, as shown in Figure 1-73.
Figure 1-73 Stream Computing Design page
Step 4 Create a project.
Click on the upper part and create a project, which facilitates the classification of Spark
rules.
In the Create dialog box, enter the project name and description, and click Save.
Click the created project in the project list. The ingestion process query page of the project is
displayed.
To edit the created project, you can select the project and click at the upper corner.
To delete the created project, you can select the project and click at the upper corner.
Step 5 Click Add to access the rule orchestration page.
1. Set Rule Name and Description.
2. Edit the rule.
Drag the corresponding Source, Interceptor, and Sink diagram elements from the
toolbar.
Double-click diagram elements to edit them and connect diagram elements by lines.
Figure 1-74 shows a complete process.
Figure 1-74 Rule design example
For details about diagram elements on the toolbar such as Source, Interceptor, and Sink, see the corresponding topics. Table 1-32 describes the configuration.
Table 1-32 Stream processing rule configuration
Type: Parameter
Source: Kafka, Socket
Interceptor: Projecting, Filter, Backfill, GroupBy, Accumulate
Sink: HDFS Sink, Kafka Sink
3. Click Publish to release the task rule.
Click Save to save the task rule. Click Return to return to the Realtime Application Management page.
After the release, the Realtime Application Management page is displayed, and a new
Spark rule is generated.
Step 6 Instantiate the generated Spark task rule.
Click instantiation in the Operation column next to a Spark rule name.
Set instantiated parameters.
Click Confirm to view the corresponding instantiated task name on the Task management
page.
----End
Creating Task Rules by Upload
Prerequisites
In the CAE system, the Spark task rule is called rule, and the instantiated task is called task.
JAR packages and XML files of Spark task rules have been developed. For details,
contact Huawei technical support to obtain the Customization Development Guide.
Compress the JAR packages and XML files.
Context
You can upload the JAR packages of Spark task rules on the CAE GUI. Then you can
instantiate the rule to generate an executable task.
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. However, the login URL varies depending on the
installation mode.
Then, enter the login user name, password, and verification code. For details about the default
password, see "Password Change Views" in the Password Change.
The IP address is used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Tenacies Manager page is displayed, as shown in Figure 1-75.
Figure 1-75 Tenacies Manager
Step 4 Click upload file and click .
Select the ZIP package of a Spark task rule.
Step 5 Click upload.
After the ZIP package is successfully uploaded, the corresponding rule is displayed.
Step 6 Instantiate the Spark task rule.
Click in the Operation column.
Set instantiated parameters.
Click Confirm. You can view the generated task in the Task management page.
----End
Configuring the Input Data Source (Source)
You can configure the data source (Kafka Source or Socket Source) to obtain data for Spark
calculation.
Kafka Source
You can specify the data in Kafka topic as the source data for calculation.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page.
Procedure
Step 1 Choose Source > Kafka on the toolbar and click the blank area in the canvas.
Figure 1-76 Kafka diagram element
Step 2 Double-click the Kafka diagram element in the canvas to edit it.
Figure 1-77 Page for editing the Kafka Source diagram element
Table 1-33 describes the parameters.
Table 1-33 Kafka Source parameter description
Item Description
Node Name User-defined diagram element.
Broker List Kafka Broker server list, which is automatically obtained from the
system and does not need to be configured.
Topic Names Subscribed topics from which source data are obtained for calculation.
Use commas (,) to separate multiple topics.
Data Encoding Method of encoding data in topics. Only string and byte array are
currently supported.
Data Separator Source data delimiter. Set the parameter based on the site requirements.
Field Names Name of each field in source data. Use commas (,) to separate multiple
fields.
Field Types Type of each field in source data. Use commas (,) to separate multiple
types. Each type maps to a field name.
Only string, int, double, long, float, and boolean are supported, and
only lowercase letters are supported.
----End
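For example, for source records such as Xiaoming|15|80 (the format used later by the MicroMarketing rule), the configuration could be Data Separator |, Field Names name,age,score, and Field Types string,int,double, so that each type maps to the field in the same position.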
Socket Source
You can obtain the source data for calculation from the source data server through a socket interface.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page.
Procedure
Step 1 Choose Source > Socket on the toolbar and click the blank area in the canvas.
Figure 1-78 Socket Source diagram element
Step 2 Double-click the Socket diagram element in the canvas to edit it.
Figure 1-79 Page for editing the Socket Source diagram element
Table 1-34 describes the parameters.
Table 1-34 Socket Source parameter description
Item Description
Node Name User-defined diagram element.
Server IP IP address of the Socket server.
Server Port Port number of the Socket server.
Data Separator Source data delimiter. Set the parameter based on the site
requirements.
Field Names Name of each field in source data. Use commas (,) to
separate multiple fields.
Field Types Type of each field in source data. Use commas (,) to
separate multiple types. Each type maps to a field name.
Only string, int, double, long, float, and boolean are
supported, and only lowercase letters are supported.
----End
Configuring the Calculation Method (Interceptor)
The CAE preconfigures algorithms in the Spark to provide multiple source data calculation capabilities.
Projection
You can extract meaningful fields from the source data by projection.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source has been configured.
Procedure
Step 1 Click the Projection diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the Source with the Projection diagram element by lines.
Figure 1-80 Projection diagram element
Step 3 Double-click the Projection diagram element in the canvas to edit it.
Figure 1-81 Page for editing the projection
Table 1-35 describes the parameters.
Table 1-35 Projection parameter description
Item Description
Node Name User-defined diagram element.
Input Event Attribute Existing attributes of the source event.
Output Event Attribute: Output fields.
Select existing attributes and click to import them to
the Output Event Attribute list. The output event
attribute sequence is determined by the import sequence.
The icon is used to select all attributes.
The icon is used to deselect the selected attributes.
The icon is used to deselect all attributes.
----End
Filtering
You can filter source data based on values of some fields in the input data. You can define the
filtering expression.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source has been configured.
Procedure
Step 1 Click the Filtering diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the Source with the Filtering diagram element by lines.
Figure 1-82 Filtering diagram element
Step 3 Double-click the Filtering diagram element in the canvas to edit it.
Figure 1-83 Page for editing the filtering
Table 1-36 describes the parameters.
Table 1-36 Filtering parameter description
Item Description
Node Name User-defined diagram element.
Filtering condition expression (the running result of the expression must be Boolean).
Fields of the character type support the following
expressions:
Fieldname.startsWith (String)
Fieldname.endsWith (String)
Fieldname.isEmpty ()
Fieldname.length()>= int
Fieldname.in('1','2','3','4'...)
Example: PhoneNum.length() >=11
Fields of the number type support the following
expressions:
Integer type: Fieldname.in(1,2,3,4...)
Long type: Fieldname.in(1l,2l,3l,4l...)
Double type: Fieldname.in(1.0,2.0,3.0,4.0...)
Float type: Fieldname.in(1.0f,2.0f,3.0f,4.0f...)
Example: UserID.in(001,002)
This version does not support the date-type field
expression.
The JEXL3 expressions are commonly supported, such
as ==, =, <=, >=, !=, +, -, &&, and ||.
----End
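For example, to keep only the events of users 001 and 002 whose PhoneNum starts with 139, an expression combining the forms above could be UserID.in(001,002) && PhoneNum.startsWith('139') (the field names are illustrative).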
Association and Backfill
You can add or update fields in the source data through the dimension table (cache table)
provided by the business.
Function Description
Figure 1-84 Example of the association and backfill function
The dimension table (cache table) contains event update fields. In the preceding figure,
the value of the Place field in the source event is updated according to the cache table 1.
The dimension table (cache table) contains event added fields. In the preceding figure,
the PhoneNum field is added to the source event according to the cache table 2.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source has been configured.
Procedure
Step 1 Click the Association and Backfill diagram element on the toolbar and click the blank area in
the canvas.
Step 2 Connect the Source with the Association and Backfill diagram element by lines.
Figure 1-85 Association and backfill diagram element
Step 3 Double-click the Association and Backfill diagram element in the canvas to edit it.
Figure 1-86 Page for editing the association and backfill
Table 1-37 describes the parameters.
Table 1-37 Association and backfill parameter description
Item Description
Node Name User-defined diagram element.
Table Name Click Add.
Configure the table name, path for storing the table, and data delimiter in the
table.
The table file must be stored in the Hadoop HDFS in advance and the
storage path is a path in the HDFS.
For details about how to view and create an HDFS directory on the Hadoop client on the CAE server, see 1.2.3.7.5 Using the Client to View and Create Files in the HDFS.
You can add multiple dimension tables.
Backfill Rule: Click Add and configure how to backfill the data in the dimension table to the source data.
Condition: Set this parameter to a condition configured in Backfill Condition.
Table Name: Set this parameter to a Dimension Table name. The default value is tableN.
Backfill Value SN in Dimension Table: cache table column whose values are filled back to the data. (The value starts from 1, not 0.)
Target Field: field to which a value is backfilled. If the value is backfilled to an added field, the name of the field can be defined by the user.
Conversion Type: Update the value of the original field or add a field.
In Figure 1-84, if "condition1" is met, the value of the second field in dimension table 1 is updated to the Place field in the source data. If "condition2" is met, the value of the second field in dimension table 2 is added to the PhoneNum field in the source data.
Backfill Condition: Click Add and configure the condition for triggering backfill.
Condition Name: condition name, which is defined by a user.
Table Name: dimension table name. The default value is tableN.
The Index of Table Field: dimension table column whose values are used for comparison.
Target Field: Select a source data field.
In Figure 1-84, condition 1 indicates that the backfill is triggered if the value of the first field in dimension table 1 is equal to the value of the Place field.
Base Information: Configures the source data delimiter.
----End
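As a minimal illustration (assuming a comma as the data delimiter in the table and the example in Figure 1-84), dimension table 1 stored in the HDFS could contain lines such as:
Beijing,Shanghai
Nanjing,Suzhou
where, for the backfill condition, the first column is compared with the Place field of the source event and, for the backfill rule, the second column is the value written back to the Place field when they match.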
Grouping
You can group the data based on the value of a field. After grouping, you can calculate the
data in the group by functions and export the calculation result to the Sink.
Function Description
Figure 1-87 Grouping function display
After grouping by UserType, calculate the sum of Score in the group and the maximum value
of Age.
The naming rule of the output result field is ${FunctionName}_${FieldName}, for example, sum_Score and max_Age.
The last column in the output result is the key field column, for example, UserType.
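A minimal worked example under these rules (field names are assumed): for input events with the fields UserType, Score, and Age
vip,10,25
vip,20,30
normal,5,40
grouping on UserType with the sum function applied to Score and the max function applied to Age produces output of the form
sum_Score,max_Age,UserType
30,30,vip
5,40,normal
with the key field column last, as described above.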
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source has been configured.
Procedure
Step 1 Click the GroupBy diagram element on the toolbar and click the blank area in the canvas.
Step 2 Connect the Source with the GroupBy diagram element by lines.
Figure 1-88 Field distribution diagram element
Step 3 Double-click the GroupBy diagram element in the canvas to edit it.
Figure 1-89 Grouping example
Table 1-38 Grouping parameter description
Item Description
Node Name User-defined diagram element.
Fields To Group On: Name of the field based on which groups are configured. You can select multiple fields here. Events with the same values for all grouping fields will be classified into the same group.
For example, if the first and second fields are grouping
fields, the following two events will be classified into the
same group:
A,B,22,44,66,88
A,B,11,33,55,77
Available Groupby Functions Functions that are selected to calculate the data in the
group after grouping. You can select multiple functions.
Input Field of XX Field that is selected to calculate the data in the group by
functions after grouping.
Input Field Type of XX This parameter does not need to be set. The system will
automatically extract the configuration information from
the Source.
For the max, min, avg, and sum functions, the input parameter must be of the value type. For the count function, the input parameter can be of any type.
----End
Accumulation
See Typical Configuration Case: Marketing Event Upon Traffic Usage Saturation.
Configuring the Data Output Type (Sink)
You can configure the output method of calculation result data.
Exporting Data to HDFS
You can use the HDFS to store the processing result and specify the path for storing the
processing result.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source and Interceptor have been configured.
Procedure
Step 1 Choose Sink > HDFS on the toolbar and click the blank area in the canvas.
Figure 1-90 HDFS Sink diagram element
Step 2 Double-click the HDFS diagram element in the canvas to edit it.
Figure 1-91 Page for editing the HDFS Sink diagram element
Table 1-39 describes the parameters.
Table 1-39 HDFS Sink parameter description
Item Description
Node Name User-defined diagram element.
HDFS Storage Path Specified path for storing the result data. The default path is
/tmp/spark_ide.
The parent directory /tmp of the path must be the existing
directory in the HDFS system. The subdirectory /spark_ide can
be defined by the user, and the user-defined subdirectory can
have multiple levels. The CAE system will automatically
generate a user-defined subdirectory.
For details about how to view and create an HDFS directory on the Hadoop client, see 1.2.3.7.5 Using the Client to View and Create Files in the HDFS.
Senior Optional Properties: You can configure advanced configuration items based on the description on the GUI or use default values for them.
----End
Exporting Data to Kafka
You can use Kafka to store the processing result and specify the topic for storing the
processing result.
Prerequisites
You have logged in to the foreground and designed events on the stream processing design
page, and the Source and Interceptor have been configured.
Procedure
Step 1 Choose Sink > Kafka Sink on the toolbar and click the blank area in the canvas.
Figure 1-92 Kafka Sink diagram element
Step 2 Double-click the Kafka Sink diagram element in the canvas to edit it.
Figure 1-93 Page for editing the Kafka Sink diagram element
Table 1-40 describes the parameters.
Table 1-40 Kafka Sink parameter description
Item Description
Node Name User-defined diagram element.
Topic Names Specified topic name for storing the output data.
Output Data Separator Output data delimiter.
Senior Optional Properties
Event Batch Size The Kafka uses an asynchronous processing distribution
mechanism. This option indicates the number of records
processed in each batch.
Data Serializer Serialization method of output data, the default value is
kafka.serializer.StringEncoder.
Output Fields You can adjust the sequence of output fields and set
fields in sequence based on site requirements.
----End
Task Manager
Prerequisites
In the CAE, the Spark task rule is called rule, and the instantiated rule is called task.
Rules have been instantiated and corresponding tasks have been generated.
Context
You can manage tasks on the CAE GUI, including starting, stopping, deleting, and querying
tasks.
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. However, the login URL varies depending on the
installation mode.
Then, enter the login user name, password, and verification code. For details about the default
password, see "Password Change Views" in the Password Change.
The IP address is used for logging in to the Universe.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Application Management > Realtime Application Management.
The Realtime Application Management page is displayed, as shown in Figure 1-94.
Figure 1-94 Realtime Application Management
Step 4 Click Task Manager on the left.
The Task Manager page is displayed, as shown in Figure 1-95.
Figure 1-95 Task Manager
Step 5 Query tasks.
Use rule name for fuzzy query.
Use task name for conditional query.
Use status for conditional query.
1. Set the name or status of the task to be queried.
2. Click Query.
For the query by rule name, all tasks generated for the rule will be found.
You can configure search criteria to query tasks, and tasks meeting these criteria will be displayed, as shown in Figure 1-96.
Figure 1-96 Search criteria
Step 6 View task information.
1. Select the task to be viewed.
2. Click see. The tab page for displaying detailed task information is displayed.
Step 7 Change the task status.
Select a task and change its status.
You can click start to start the task, click stop to stop the task, and click Delete to delete the
task.
A stopped task can be restarted.
----End
Preconfigured Task Rules
The CAE preconfigures six Spark task rules that can be used for corresponding service
calculation.
Preconfigured rule overview
Table 1-41 Preconfigured rule overview
Rule Function Example
1.2.4.7.1 NetcatWordCount
Calculates the number of words in the input character strings within 10 minutes and the number of the occurrence times of each word. The Spark reads character strings from a specified Socket Server, calculates the number of words in the character strings, and records the calculation result to a specified directory in the HDFS (/spark/NetcatWordCount-xxxx.output, where xxxx indicates the timestamp).
Input data:
Hello World Hello
Output data:
Hello 2
World 1
KafkaWordCount Calculates the number of
words in the input character
strings within 10 minutes and
the number of the occurrence
times of each word.
The Spark reads character
strings from specified Kafka
topics, calculates the number
of words in the character
strings, and records the
calculation result to Kafka
topics.
MicroMarketing Performs Spark SQL
statements on input data and
sends the result to the Kafka
or Oracle database in Tianjin
micromarketing scenario.
Input data:
Xiaoming|15|80
Zhangsan|16|92
Lisi|14|85
The Spark SQL statement is
SELECT MAX(age) as name
from table1, which finds the
largest age from the input data.
Output data:
16
RoamingAwareness Calculates the number of users
who roam to a specified city
and are of the specified
roaming type and writes the
IMSIs of these users to the
Kafka.
Input data:
1|460023700411005|3540960531813501|8615879005652|10.211.76.190|6432|255|6299|221.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|1460441839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|666|6995|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/wechatgame/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/0|mmocgame.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i Build/IMM76I)|image/png|||6323|0|0|0|||2322300259|||China Mobile|China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile
Specify the roaming city to
Shenzhen and roaming type to 2.
Output data:
460023700411005
LocationRemainAw
areness
Finds the users who stay in the
target area for a period longer
than the specified period and
writes the IMSIs and location
codes of these users to the
Kafka.
Input data:
1|460023700411007|3540960531813501|8615879005652|10.211.76.190|200|0755|6299|221.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|1460441839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|15001301|25001301|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/wechatgame/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/0|mmocgame.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i Build/IMM76I)|image/png|||6323|0|0|0|||2322300259||||China Mobile||China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile
In the preceding information, data in bold indicates the LAC and RAC of users. A LAC and an RAC of a user identify the location of the user. If the input data within the period indicates that a user's location remains unchanged, the user meets the requirements.
Output data:
460023700411007,200,0755
The output data contains the IMSI and location code of the user who meets the requirements.
AppTrafficAwarene
ss
Finds out users whose traffic
usage generated for using a
specified app reaches the
specified threshold in a day or
month and writes the IMSI,
traffic usage, and the date
when the traffic usage reaches
the threshold to the Kafka.
Input data:
1|460023700411005|3540960531813501|8615879005652|10.211.76.190|200|0755|6299|221.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|1460441839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|15001301|25001301|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/wechatgame/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/0|mmocgame.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i Build/IMM76I)|image/png|||6323|0|0|0|||2322300259||||China Mobile||China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile
In the preceding information, data in bold indicates the upstream and downstream traffic, and the sum of them indicates the traffic usage generated when using the app.
Output data:
460023700411005,4.0002602E7,20
16-04-12
In the preceding information, "4.0002602E7" indicates the traffic usage generated for using the app, where "E7" indicates the seventh power of 10, and "2016-04-12" indicates the date when the traffic usage reaches the threshold. The unit of traffic usage is byte.
NetcatWordCount
This task calculates the number of words in the input character strings within 10 minutes and
the number of the occurrence times of each word. The Spark reads character strings from a
specified Socket Server, calculates the number of words in the character strings, and records
the calculation result to a specified directory in the HDFS
(/spark/NetcatWordCount-xxxx.output, where xxxx indicates the timestamp).
Input Parameter Description
Table 1-42 Input parameter description
Parameter Description Example
Application Properties
source.port Communication port of the Socket data source. 9999
source.ip IP address of the Socket data source. The default
value is 127.0.0.1. Change it based on the site
requirements.
10.0.0.1
Senior Optional Properties
driver-memory Memory allocated to the driver, in GB. The value
must be an integer. Increase the value if the
service logic is complex.
1
driver-cores Number of CPU cores allocated to the driver. The
value must be an integer.
1
executor-memory Memory allocated to one executor, in GB. The
value must be an integer. Increase the value of
this parameter if the service logic is complex.
1
executor-cores Number of CPUs allocated to one executor. The
value must be an integer. Increase the value if the
service logic is complex.
2
num-executors Number of executors allocated to the current
Spark task. The value must be an integer. The
value of this parameter determines Spark
calculation concurrency. Change the value based
on the calculation requirements.
1
Verification
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to the examples in Table 1-42.
Step 3 Send data on the Socket server.
Hello World Hello
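If no dedicated Socket server is available, a simple way to act as the data source during verification is to run a listener on the host and port configured in source.ip and source.port, for example with netcat (assumed to be installed on that host; it is not part of the CAE):
% nc -lk 9999
Hello World Hello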
Step 4 Use a client to log in to the HDFS and check output data in
/spark/NetcatWordCount-xxxx.output (where xxxx indicates the timestamp).
Hello 2
World 1
For details about how to view files in the HDFS, see 1.2.3.7.5 Using the Client to View and Create Files in the HDFS.
----End
KafkaWordCount
This task calculates the number of words in the input character strings within 10 minutes and
the number of the occurrence times of each word. The Spark reads character strings from
specified Kafka topics, calculates the number of words in the character strings, and records
the calculation result to logs.
Input Parameter Description
Table 1-43 Input parameter description
Parameter Description Example
Key
Application Properties
outputTopic: Topic to which results are generated. If multiple topics exist, use commas (,) to separate them. Example: KafkaWordCount_output
inputTopic: Input topic. If multiple topics exist, use commas (,) to separate them. Example: KafkaWordCount_input
Senior Optional Properties
driver-memory Memory allocated to the driver, in GB.
The value must be an integer. Increase the
value if the service logic is complex.
1
driver-cores Number of CPU cores allocated to the
driver. The value must be an integer.
1
executor-memory Memory allocated to one executor, in GB.
The value must be an integer. Increase the
value of this parameter if the service logic
is complex.
1
executor-cores Number of CPUs allocated to one
executor. The value must be an integer.
Increase the value if the service logic is
complex.
2
num-executors Number of executors allocated to the
current Spark task. The value must be an
integer. The value of this parameter
determines Spark calculation concurrency.
Change the value based on the calculation
requirements.
1
Verification
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to the examples in Table 1-43.
Step 3 Send data in the Kafka topic specified by inputTopic in Table 1-43.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
4. Send data in the source topic in the Kafka.
% kafka-console-producer.sh --broker-list
10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005 --topic KafkaWordCount_input
Input data: Hello World Hello
5. Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic
KafkaWordCount_output --from-beginning
The following information is displayed:
Hello 2
World 1
The information indicates that the Kafka has received the data.
----End
MicroMarketing
Performs Spark SQL statements on input data and sends the result to the Kafka or Oracle database in the Tianjin micromarketing scenario.
Input Parameter Description
Table 1-44 Input parameter description
Parameter Description Example
Key
Application Properties
outputTopic Topic to which results are
generated. If multiple topics exist,
use commas (,) to separate them.
MicroMarketing_output
inputTopic Input topic. If multiple topics
exist, use commas (,) to separate
them.
["MicroMarketing_input"]
windowSize Time window size of the Spark
Streaming, in seconds.
4
slideInterval Time window slide interval of the
Spark Streaming, in seconds.
1
sparkColumns: Field description in the input data. Example: [{"index":1,"columnType":"String","columnName":"name"},{"index":2,"columnType":"Long","columnName":"age"},{"index":3,"columnType":"Float","columnName":"score"}]
fieldSeperator Source data delimiter. |
outputDestination: System for storing output data, Kafka or Oracle database. Example: kafka
kafka.consumer.group: Specifies the consumer group of the Kafka. Example: test_group
sparkSql Spark SQL statement that
specifies the calculation logic.
SELECT MAX(age) as name
from table1
Senior Optional Properties
driver-memory Memory allocated to the driver, in
GB. The value must be an integer.
Increase the value if the service
logic is complex.
1
driver-cores Number of CPU cores allocated to
the driver. The value must be an
integer.
1
executor-memory: Memory allocated to one executor, in GB. The value must be an integer. Increase the value of this parameter if the service logic is complex. Example: 1
executor-cores Number of CPUs allocated to one
executor. The value must be an
integer. Increase the value if the
service logic is complex.
2
num-executors Number of executors allocated to
the current Spark task. The value
must be an integer. The value of
this parameter determines Spark
calculation concurrency. Change
the value based on the calculation
requirements.
1
Verification
The following example finds the maximum age from the input data.
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to the examples in Table 1-44.
Step 3 Send data in the Kafka topic specified by inputTopics.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
4. Send data in the source topic in the Kafka.
% kafka-console-producer.sh --broker-list
10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005 --topic MicroMarketing_input
Input data:
Xiaoming|15|80
Zhangsan|16|92
Lisi|14|85
5. Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic
MicroMarketing_output --from-beginning
The following information is displayed:
16
The information indicates that the Kafka has received the data.
----End
RoamingAwareness
Calculates the number of users who roam to a specified city and are of the specified roaming
type and writes the IMSIs of these users to the Kafka.
Input Parameter Description
Table 1-45 Input parameter description
Parameter Description Example
Key
Application Properties
outputTopic Topic to which results are
generated. If multiple topics exist,
use commas (,) to separate them.
RoamingAwareness_output
inputTopic Input topic. If multiple topics
exist, use commas (,) to separate
them.
RoamingAwareness_input
imsi.position.in.event: Location of the IMSI field in the input event. The value starts from 0. For example, if this parameter is set to 1, the IMSI field is the second field in the input event. Example: 1
roaming.type.position.in.event: Location of the roaming type field in the input event. Example: 64
roaming.in.city.position.in.event: Location of the roaming city field in the input event. Example: 63
roam.to.city: City to which users roam. Example: Philadelphia
roam.type: Roaming type, which is an integer. Example: 2
spark.checkpoint.path: Checkpoint path of the Spark. Example: checkpoint_test
spark.batch.interval: Batch processing interval, in milliseconds. Example: 1
event.field.separator: Field delimiter in source data. Example: |
consumerGroup: Specifies the consumer group of the Kafka. Example: test_group
Senior Optional Properties
driver-memory Memory allocated to the driver, in
GB. The value must be an integer.
Increase the value if the service
logic is complex.
1
driver-cores Number of CPU cores allocated to
the driver. The value must be an
integer.
1
executor-memor
y
Memory allocated to one executor,
in GB. The value must be an
integer. Increase the value of this
parameter if the service logic is
complex.
1
executor-cores Number of CPUs allocated to one
executor. The value must be an
integer. Increase the value if the
2
Customer Contextual Awareness
Parameter Description Example
service logic is complex.
kafka.consumer.
group
Specifies the consumer group of
the Kafka.
test_group
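The following Python sketch outlines the per-event filtering that RoamingAwareness performs conceptually. It is illustrative only; the field positions are 0-based and taken from the parameter examples above, and the Kafka input and output handling is omitted.

# Illustrative sketch only: per-event filter for the configured roaming city and type.
IMSI_POS, ROAM_CITY_POS, ROAM_TYPE_POS = 1, 63, 64
ROAM_TO_CITY = "Philadelphia"   # roam.to.city
ROAM_TYPE = "2"                 # roam.type
SEPARATOR = "|"                 # event.field.separator

def matching_imsi(event_line):
    """Return the IMSI if the event matches the configured roaming city and type, else None."""
    fields = event_line.split(SEPARATOR)
    if fields[ROAM_CITY_POS] == ROAM_TO_CITY and fields[ROAM_TYPE_POS] == ROAM_TYPE:
        return fields[IMSI_POS]   # written to the output topic by the real rule
    return None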
Verification
The following example finds users who roam to Philadelphia with roaming type 2 and records the IMSIs of these users.
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to the examples in Table 1-45.
Step 3 Send data in the source topic in the Kafka.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
4. Send data in the source topic in the Kafka.
% kafka-console-producer.sh --broker-list
10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005 --topic RoamingAwareness_input
Input data:
1|460023700411005|3540960531813501|8615879005652|10.211.76.190|6432|255|6299|22
1.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|14604
41839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|666|699
5|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/wechatgame
/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/0|mmocgam
e.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i
Build/IMM76I)|image/png|||6323|0|0|0|||2322300259|||China
Mobile|China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile phone
Table 1-46 describes fields in the preceding information.
Table 1-46 PS_HTTP_Event event
No. | Attribute | Type | Length | Description
1 | Interface | unsigned byte | 1 | Interface type. The options are as follows: 1: Gn; 2: reserve; 3: IuPS; 4: Gb.
2 | IMSI | string | 15 | User IMSI (in TBCD encoding format).
3 | IMEI | string | 16 | User IMEI (in TBCD encoding format).
4 | MSISDN | string | 24 | User number.
5 | USER_IP | unsignedInt | 16 | IP address of a user. If the user uses an IPv6 address, set this field to the IPv6 address (128 bits). If the user uses an IPv4 address (32 bits), convert it to an IPv6 address in which the first 10 bytes are all zeros and the middle two bytes are all f (hexadecimal). Both IPv4 and IPv6 addresses must be set in binary mode.
6 | LAC | int | 4 | LAC. The LAC is used for location management in the CS domain (voice service).
7 | RAC | int | 4 | RAC. The RAC is used for location management in the PS domain (data service).
8 | CID | int | 4 | CI (SAC, ECI).
9 | SGSN_C_IP | unsignedInt | 4 | IP address of the SGSN signaling plane.
10 | SGSN_U_IP | unsignedInt | 4 | IP address of the SGSN user plane.
11 | GGSN_C_IP | unsignedInt | 4 | IP address of the GGSN signaling plane.
12 | GGSN_U_IP | unsignedInt | 4 | IP address of the GGSN user plane.
13 | RAT | string | 32 | 0-None; 1-UTRAN; 2-GERAN; 3-WLAN; 4-GAN; 5-HSPA Evolution; 6-EUTRAN
14 | APN | string | 32 | -
15 | HTTP Service xDR Type Code | unsigned byte | 1 | All Fs.
16 | procedure ID | unsigned byte | 8 | All Fs.
17 | Start Time (ms) | dateTime | 8 | 1970/1/1 0:00
18 | End Time (ms) | dateTime | 8 | 1970/1/1 0:00
19 | App Category | unsigned byte | 2 | All Fs.
20 | App Subcategory | unsigned byte | 2 | All Fs.
21 | L4 Protocol | unsigned byte | 1 | All Fs.
22 | User Port | unsigned byte | 2 | 0
23 | Server IP | unsigned byte | 16 | 0
24 | Server Port | unsigned byte | 2 | 0
25 | Country Code | int | 4 | -1
26 | Network ID | int | 4 | -1
27 | Upstream Traffic | unsigned byte | 4 | 0
28 | Downstream Traffic | unsigned byte | 4 | 0
29 | Upstream IP Packet Count | unsigned byte | 4 | 0
30 | Downstream IP Packet Count | unsigned byte | 4 | 0
31 | Disordered Upstream TCP Packet Count | unsigned byte | 4 | 0
32 | Disordered Downstream TCP Packet Count | unsigned byte | 4 | 0
33 | Retransmitted Upstream TCP Packet Count | unsigned byte | 4 | 0
34 | Retransmitted Downstream TCP Packet Count | unsigned byte | 4 | 0
35 | UL_IP_FRAG_PACKETS | unsigned byte | 4 | 0
36 | DL_IP_FRAG_PACKETS | unsigned byte | 4 | 0
37 | Transaction Type | unsigned byte | 2 | All Fs.
38 | Transaction Response Code | unsigned byte | 2 | All Fs.
39 | HTTP Version | unsigned byte | 1 | All Fs.
40 | First HTTP Response Delay (ms) | unsigned byte | 4 | 0
41 | Last HTTP Content Packet Delay (ms) | unsigned byte | 4 | 0
42 | Last ACK Confirmation Packet Delay (ms) | unsigned byte | 4 | 0
43 | HOST | String | 128 | All Fs.
44 | URL | String | 256 | Request URL.
45 | X-Online-Host | String | 128 | All Fs.
46 | User-Agent | char | 64 | All Fs.
47 | HTTP_content_type | char | 64 | All Fs.
48 | refer_URI | char | 128 | All Fs.
49 | Cookie | char | - | All Fs.
50 | Content-Length | unsigned byte | 4 | 0
51 | Target Behavior | unsigned byte | 1 | All Fs.
52 | WTP Interruption Type | unsigned byte | 1 | -
53 | WTP Interruption Reason | unsigned byte | 1 | -
54 | title | String | 256 | Title field in an HTTP packet.
55 | keyword | String | 256 | Keyword field in an HTTP packet.
56 | ChargeID | unsignedInt | 2 | Charging information.
57 | Cell Type | String | 16 | Cell type (defined in thirteen scenarios on the network optimization platform).
58 | Coverage Area | String | 16 | Countryside, county, and city.
59 | Carrier | String | 16 | Carrier to which the user belongs.
60 | Country | String | 16 | Home country of a user.
61 | Home Province | String | 16 | Province to which a user belongs.
62 | Home City | String | 16 | City to which a user belongs.
63 | Roaming Province | String | 16 | Province to which a user roams.
64 | Roaming City | String | 16 | City to which a user roams.
65 | Roaming Type | String | 16 | User roaming type. The options are as follows: 1-International roaming; 2-Inter-province roaming; 3-Intra-province roaming; 4-Local
66 | SGSN Name | String | 16 | SGSN
67 | GGSN Name | String | 16 | GGSN
68 | BSC/RNC Name | String | 16 | BSC/RNC
69 | Terminal Manufacturer | String | 16 | Manufacturer of a user's terminal.
70 | Terminal Model | String | 16 | Model of a user terminal.
71 | Terminal Type | String | 16 | Type of a terminal, for example, 2G mobile phone, 3G mobile phone, 2G WAN card, 3G WAN card, or 3G notebook.
This Spark task requires only the second, sixty-fourth, and sixty-fifth fields (IMSI, Roaming City, and Roaming Type respectively).
5. Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic
RoamingAwareness_output --from-beginning
The following information is displayed:
460023700411005
This output indicates that Kafka has received the data.
----End
LocationRemainAwareness
Finds users who stay in the target area for longer than the specified period and writes the IMSIs and location codes of these users to Kafka.
Input Parameter Description
Table 1-47 Input parameter description
Parameter | Description | Example
Application Properties
outputTopic | Topic to which results are generated. If multiple topics exist, use commas (,) to separate them. | LocationRemainAwareness_output
inputTopic | Input topic. If multiple topics exist, use commas (,) to separate them. | LocationRemainAwareness_input
imsi.position.in.event | Location of the IMSI field in the input event. The value starts from 0. For example, if this parameter is set to 1, the IMSI field is the second field in the input event. | 1
rac.position.in.event | Location of the RAC field in the input event. | 6
lac.position.in.event | Location of the LAC field in the input event. | 5
monitor.remain.time | Stay duration, in minutes. | 30
monitor.location | Location code (province, city). | (200,0755)
spark.checkpoint.path | Checkpoint path of Spark. | checkpoint_test
spark.batch.interval | Batch processing interval, in milliseconds. | 1
event.field.separator | Field delimiter in source data. | | (vertical bar)
kafka.consumer.group | Consumer group of Kafka. | test_group
Senior Optional Properties
driver-memory | Memory allocated to the driver, in GB. The value must be an integer. Increase the value if the service logic is complex. | 1
driver-cores | Number of CPU cores allocated to the driver. The value must be an integer. | 1
executor-memory | Memory allocated to one executor, in GB. The value must be an integer. Increase the value if the service logic is complex. | 1
executor-cores | Number of CPUs allocated to one executor. The value must be an integer. Increase the value if the service logic is complex. | 2
num-executors | Number of executors allocated to the current Spark task. The value must be an integer. This value determines Spark calculation concurrency. Change it based on the calculation requirements. | 1
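The following Python sketch shows one way the stay-duration check could work conceptually. It is illustrative only; the field positions follow the parameter examples above, monitor.location is interpreted here as the (LAC, RAC) pair seen in the verification data, and the real rule's Spark checkpointing and windowing are omitted.

# Illustrative sketch only: tracks how long each IMSI has been observed in the
# monitored location and emits "IMSI,LAC,RAC" once the stay duration is reached.
import time

IMSI_POS, LAC_POS, RAC_POS = 1, 5, 6
MONITOR_LOCATION = ("200", "0755")        # monitor.location, read here as (LAC, RAC)
REMAIN_SECONDS = 30 * 60                  # monitor.remain.time is configured in minutes
SEPARATOR = "|"                           # event.field.separator

first_seen = {}  # IMSI -> time when the IMSI was first observed in the monitored location

def on_event(event_line, now=None):
    """Return 'IMSI,LAC,RAC' once an IMSI has stayed long enough, else None."""
    now = time.time() if now is None else now
    fields = event_line.split(SEPARATOR)
    imsi, lac, rac = fields[IMSI_POS], fields[LAC_POS], fields[RAC_POS]
    if (lac, rac) != MONITOR_LOCATION:
        first_seen.pop(imsi, None)        # the user left the area: restart the timer
        return None
    start = first_seen.setdefault(imsi, now)
    if now - start >= REMAIN_SECONDS:
        return f"{imsi},{lac},{rac}"
    return None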
Verification
The following example finds users who stay in the (200,0755) area for more than 30 minutes and writes the result to Kafka.
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to the examples in Table 1-47.
Step 3 Send data in the source topic in the Kafka.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
4. Send data in the source topic in the Kafka.
% kafka-console-producer.sh --broker-list
10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005 --topic LocationRemainAwareness_input
Send the following input data for 30 consecutive minutes:
1|460023700411007|3540960531813501|8615879005652|10.211.76.190|200|0755|6299|22
1.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|14604
41839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|1500130
1|25001301|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/w
echatgame/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/
0|mmocgame.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i
Build/IMM76I)|image/png|||6323|0|0|0|||2322300259|||China
Mobile|China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile phone
This Spark task requires only the second, sixth, and seventh fields (IMSI, LAC, and RAC respectively).
5. Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic
LocationRemainAwareness_output --from-beginning
The following information is displayed:
460023700411007,200,0755
This output indicates that Kafka has received the data.
----End
AppTrafficAwareness
Finds users whose traffic usage for a specified app reaches the specified threshold within a day or month, and writes the IMSI, traffic usage, and the date on which the threshold was reached to Kafka.
Input Parameter Description
Table 1-48 Input parameter description
Parameter | Description | Example
Application Properties
outputTopic | Topic to which the calculation result is recorded, which must be the same as that set in Key. | AppTrafficAwareness_output
inputTopic | Input topic. If multiple topics exist, use commas (,) to separate them. | AppTrafficAwareness_input
imsi.position.in.event | Location of the IMSI field in the input event. The value starts from 0. For example, if this parameter is set to 1, the IMSI field is the second field in the input event. | 1
app.id.position.in.event | Location of the APP_ID field in the input event. | 19
date.position.in.event | Location of the DATE field in the input event. | 16
down.flow.position.in.event | Location of the DOWN_FLOW field in the input event. | 27
up.flow.position.in.event | Location of the UP_FLOW field in the input event. | 26
appID | ID of the app to be monitored. | 110
checkpointPath | Checkpoint path of Spark. | checkpoint_test
app.id.to.monitor | ID of the app to be monitored. | 110
statistic.period | Statistics period, day or month. | day
flow.threshold | Traffic usage threshold, in MB. | 30
spark.checkpoint.path | Checkpoint path of Spark. | checkpoint_test
spark.batch.interval | Batch processing interval, in milliseconds. | 1
event.field.separator | Field delimiter in source data. | | (vertical bar)
kafka.consumer.group | Consumer group of Kafka. | test_group
Senior Optional Properties
driver-memory | Memory allocated to the driver, in GB. The value must be an integer. Increase the value if the service logic is complex. | 1
driver-cores | Number of CPU cores allocated to the driver. The value must be an integer. | 1
executor-memory | Memory allocated to one executor, in GB. The value must be an integer. Increase the value if the service logic is complex. | 1
executor-cores | Number of CPUs allocated to one executor. The value must be an integer. Increase the value if the service logic is complex. | 2
num-executors | Number of executors allocated to the current Spark task. The value must be an integer. This value determines Spark calculation concurrency. Change it based on the calculation requirements. | 1
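The following Python sketch outlines the daily accumulation logic conceptually. It is illustrative only; the field positions, the monitored app ID, and the 30 MB threshold follow the parameter examples above, the statistics period is assumed to be day, and Spark checkpointing is omitted.

# Illustrative sketch only: accumulates per-IMSI daily traffic for the monitored app
# and emits "IMSI,bytes,date" once the threshold is reached.
from datetime import datetime, timezone

IMSI_POS, DATE_POS, APP_ID_POS, UP_POS, DOWN_POS = 1, 16, 19, 26, 27
APP_ID_TO_MONITOR = "110"                      # app.id.to.monitor
FLOW_THRESHOLD_BYTES = 30 * 1024 * 1024        # flow.threshold, configured in MB
SEPARATOR = "|"                                # event.field.separator

usage = {}  # (IMSI, date) -> accumulated bytes

def on_event(event_line):
    """Return 'IMSI,bytes,date' when the monitored app's daily traffic reaches the threshold."""
    f = event_line.split(SEPARATOR)
    if f[APP_ID_POS] != APP_ID_TO_MONITOR:
        return None
    # The DATE field is a millisecond timestamp (Start Time); convert it to a calendar date.
    day = datetime.fromtimestamp(int(f[DATE_POS]) / 1000.0, tz=timezone.utc).strftime("%Y-%m-%d")
    key = (f[IMSI_POS], day)
    usage[key] = usage.get(key, 0) + int(f[UP_POS]) + int(f[DOWN_POS])
    if usage[key] >= FLOW_THRESHOLD_BYTES:
        return f"{f[IMSI_POS]},{usage[key]},{day}"
    return None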
Verification
The following example finds users whose traffic usage for the app with ID 110 reaches 30 MB in a day and writes the IMSI, traffic usage, and date for these users to Kafka.
Step 1 Choose Realtime Awareness > Application Management > Realtime Application
Management.
The Realtime Application Management page is displayed.
Step 2 Click next to a preconfigured rule.
On the instantiation page, set parameters according to examples in Table 1-48.
Step 3 Send data in the source topic in the Kafka.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
4. Send data in the source topic in the Kafka.
% kafka-console-producer.sh --broker-list
10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005 --topic AppTrafficAwareness_input
Input data:
1|460023700411005|3540960531813501|8615879005652|10.211.76.190|200|0755|6299|22
1.177.147.230|221.177.147.230|221.177.151.163|221.177.151.169|2|cmnet|103|14604
41839|1460441839905|1460441871060|1|110|0|43426|117.169.71.158|80|460|0|1500130
1|25001301|10|9|0|0|0|0|0|0|6|200|3|1|1|600|mmocgame.qpic.cn|mmocgame.qpic.cn/w
echatgame/mEMdfrX5RU1ibNvae0bPXE6eyejGjTo1wicricDldmQ2iazRDV56uOc0B9L2QAudt0v0/
0|mmocgame.qpic.cn|Dalvik/1.6.0 (Linux; U; Android 4.0.4; GT-S7562i
Build/IMM76I)|image/png|||6323|0|0|0|||2322300259|||China
Mobile|China|791|0791||Shenzhen|2||||Samsung|S7562I|Mobile phone
This Spark task requires only the second, seventeenth, twentieth, twenty-seventh, and twenty-eighth fields (IMSI, Start Time, App Category, Upstream Traffic, and Downstream Traffic respectively).
5. Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic
AppTrafficAwareness_output --from-beginning
The following information is displayed:
460023700411005,4.0002602E7,2016-04-12
This output indicates that Kafka has received the data.
In the preceding information, "4.0002602E7" (scientific notation, where "E7" indicates 10 to the seventh power) is the traffic usage generated by the app. The value is the sum of the twenty-seventh and twenty-eighth fields (Upstream Traffic and Downstream Traffic), in bytes.
"2016-04-12" is the date on which the traffic usage reached the threshold, converted from the value of the seventeenth field (Start Time).
----End
Typical Configuration Case: Marketing Event Upon Traffic Usage Saturation
The system monitors the traffic usage of a specific app, finds users whose traffic usage
reaches the threshold in the specified time, and sends the user information to the message
middleware for storage.
Context
Figure 1-97 Service of monitoring users' traffic usage
Traffic usage accumulation result format: Key field,Timestamp,Accumulated value
APP_ID1: app category; APP_ID2: app subcategory
Within the hour after 03:00:00 on May 6, the user's traffic usage does not reach the threshold, so the system clears the accumulated result and starts accumulation again. Within the hour after 04:00:00 on May 6, the user's traffic usage reaches the threshold, and the system generates the output data.
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. However, the login URL varies depending on the
installation mode.
Then, enter the login user name, password, and verification code. For details about the default
password, see "Password Change Views" in the Password Change.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 Choose Realtime Awareness > Streaming Studio > Stream Computing Design.
The Stream Computing Design page is displayed, as shown in Figure 1-98.
Figure 1-98 Stream Computing Design
Step 4 Click Add. The rule orchestration page is displayed.
Set parameters. Figure 1-99 shows the parameter settings.
Figure 1-99 Rule orchestration
The following describes configurations of each module:
Kafka Source configuration
Figure 1-100 Kafka Source configuration
Table 1-49 Kafka Source parameters
Parameter | Value | Description
Topic Name | sumInput | Topic from which source data is read for calculation.
Data Type | string | Both the string and bytearray types are supported. Set this parameter based on the type of source data.
Data Separator | , | Set this parameter based on the source data.
Field Name | MSISDN,FLUX,APP_ID1,APP_ID2,TIME | Set this parameter based on the source data. Use commas (,) to separate multiple values.
Field Type | string,double,int,string,long | Set this parameter based on the source data. Use commas (,) to separate multiple values. The value options are as follows: string, int, double, long, float, and boolean. The value must be in lower case.
For a timestamp field, set Field Type to long.
Filter interceptor configuration
Figure 1-101 Filter interceptor configuration
Table 1-50 Filter interceptor parameters
Parameter | Value | Description
Filter expression (the running result of the expression must be Boolean) | APP_ID1==2 and (APP_ID2=='aiqiyi' or APP_ID2=='kugou') | Filters data whose app category is 2 and app subcategory is aiqiyi or kugou.
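The expression is evaluated once per record against the fields declared in the Kafka Source. A minimal Python sketch of the same condition follows (illustrative only; the record is assumed to be already parsed into the declared fields).

# Illustrative sketch only: the filter condition applied to one parsed record.
def keep(record):
    """record: dict keyed by the field names declared in the Kafka Source above."""
    return record["APP_ID1"] == 2 and record["APP_ID2"] in ("aiqiyi", "kugou")

# Example: a record for app category 2 and subcategory aiqiyi passes the filter.
print(keep({"MSISDN": "13800000001", "FLUX": 100.0, "APP_ID1": 2,
            "APP_ID2": "aiqiyi", "TIME": 1494010800}))   # True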
Accumulation interceptor configuration
Figure 1-102 Accumulation interceptor configuration
Table 1-51 Accumulation interceptor parameters
Parameter | Value | Description
Key Field | MSISDN | Group and accumulate values by MSISDN.
Accumulate Field | FLUX | Accumulate the FLUX field. The accumulate field can only be numeric, such as int, long, double, or float.
Trigger Threshold | 300 | When the accumulation result reaches the trigger threshold, accumulation stops and the result is output.
Clear cycle | Fixed time: 1 hour | Period for resetting the accumulation result. When a period ends, the accumulation result is set to 0, and accumulation starts from 0 in the next period. The options are as follows: Natural month; Natural day; Fixed time (unit: hour; only positive integers are supported; the maximum value is 31 x 24).
Whether to use timestamp field | Yes | Yes: trust the time in the source data and select a timestamp field of the long type. No: do not trust the time in the source data; use the system time of the CAE as the data generation time (in this case, the timestamp in the output data is also the system time).
The accumulation result is in Key field,Timestamp,Accumulated value format.
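The following Python sketch illustrates the accumulation behavior conceptually with the settings above (Key Field MSISDN, Trigger Threshold 300, Clear cycle of one hour, timestamps taken from the source data). It is illustrative only; the actual interceptor may align its clear cycle differently, and the streaming plumbing is omitted.

# Illustrative sketch only: per-key accumulation with a trigger threshold and a
# one-hour clear cycle, using timestamps from the source data.
TRIGGER_THRESHOLD = 300.0
CLEAR_CYCLE_SECONDS = 1 * 3600

state = {}  # MSISDN -> (cycle start timestamp, accumulated FLUX)

def on_record(msisdn, flux, timestamp):
    """Return 'key,timestamp,accumulated value' when the threshold is reached, else None."""
    start, total = state.get(msisdn, (timestamp, 0.0))
    if timestamp - start >= CLEAR_CYCLE_SECONDS:
        start, total = timestamp, 0.0          # clear cycle elapsed: reset the accumulation
    total += flux
    state[msisdn] = (start, total)
    if total >= TRIGGER_THRESHOLD:
        del state[msisdn]                      # result emitted; accumulation restarts
        return f"{msisdn},{timestamp},{total}"
    return None

# With the verification data below, only the records after 04:00 accumulate to 300,
# producing "13800000001,1494017400,300.0".
for flux, ts in [(100, 1494010800), (100, 1494014400), (100, 1494016200), (100, 1494017400)]:
    result = on_record("13800000001", flux, ts)
    if result:
        print(result)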
Kafka Sink configuration
Figure 1-103 Kafka Sink configuration
Table 1-52 Kafka Sink parameters
Parameter | Value | Description
Topic Name | sumOutput | Topic that stores the calculation result.
Output Data Separator | , | Separator of the output data.
Step 5 After the orchestration is complete, click Publish.
The Realtime Application Management page is displayed after a successful publishing.
Step 6 Instantiate the Spark task rule.
Click Operation > Instantiation next to the Spark rule.
Set instantiation parameters. Customize the task name, use the default values for other
parameters, and click Confirm.
In the instantiation configuration, the application.batch.milliseconds parameter indicates the time interval for the Spark to process each batch of tasks. The default value is 3000ms (3s).
The Task Manager page is displayed. You can view the instantiated task name on this page.
Step 7 Select the task and click "Start".
----End
Verification
Step 1 Log in to the Hadoop Client.
Step 2 Initialize the environment variable.
% source bigdata_env
Step 3 Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
Step 4 Send the data in the data source topic of Kafka.
% kafka-console-producer.sh --broker-list 10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005
--topic sumInput
Enter the following information:
13800000001,100,2,aiqiyi,1494010800
13800000001,100,2,kugou,1494014400
13800000001,100,2,aiqiyi,1494016200
13800000001,100,2,kugou,1494017400
Step 5 Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic sumOutput --from-beginning
The following information is displayed:
13800000001,1494017400,300.0
The preceding information indicates that the calculation is successful.
The traffic usage accumulation result is in Key field,Timestamp,Accumulated value format.
----End
Best Practices
Typical Configuration Cases: Location Event Log Collection
Principle
Location and communication signaling is ingested from the original service data provided by carriers and is used to identify users who stay in a specified location for a specified period, based on which the system conducts precision marketing.
Flume is a common CDR ingestion platform. The Contextual Awareness Engine is used to create events, specify event ingestion rules, and release the rules to Flume. Flume then ingests original CDRs based on the rules. CDR files ingested by Flume can be provided to the Campaign Management for real-time marketing.
Figure 1-104 shows the data direction for Commissioning Location Event Log Collection.
Figure 1-104 Data direction for Commissioning Location Event Log Collection
1. The Location_In event (indicating signaling CDRs from the A interface), the
Location_Basic event (indicating collected location data), and the Location_Out event
(indicating the result of Timing calculation) need to be created for the signaling ingestion
commissioning process.
2. The Flume ingests source data (Location_In events) from service signaling, filters the
ingested data to obtain the required data (Location_Basic events), and then stores the
required data to the Kafka.
3. The event rule template User Staying at Specified Area for a Time Period is used in
subsequent commissioning. The Location_Basic event is used as the input of location
moment rule templates.
4. After moment calculation, data meeting the requirements (Location_Out event) will be sent to the Campaign system for conducting related marketing activities.
The CAE system includes the following modules:
CAE Server: the Huawei-developed CAE Server performs unified task scheduling for the stream processing components at the bottom layer. It enables operators to process streaming data without paying attention to the components at the bottom layer, simplifying operations and improving processing efficiency.
Flume: connected Hadoop component, used for data ingestion.
Kafka: connected Hadoop component, used for data distribution.
Spark: connected Hadoop component, used for data computing.
Creating Location Events
Procedure
Step 1 Log in to the Universe as the user user.
1. Enter http://Floating IP address of the SLB:9010/console/login.action in the address box
of the browser and press Enter.
2. Enter the user name and password of the user user and click Login.
Step 2 Choose Data Governance.
Step 3 Choose Realtime Awareness > Event Management Center > Event Design.
Step 4 Click on the left and create an event type.
Figure 1-105 Creating an event type
The extension attribute isSync indicates whether an event is a synchronous event. This
attribute is used by the Campaign and the default value is 0.
Step 5 Create the basic location event Location_In in this directory. The event contains the userID,
MSISDN, IMSI, LacTac, and CI attributes.
Figure 1-106 Creating the Location_In event
This event is used for ingesting user location data. Leave Associated Query empty on the
Event Attr page.
Step 6 Create the basic location event Location_Basic in this directory. The event contains the
MSISDN, LacTac, and CI attributes.
Figure 1-107 Creating the Location_Basic event
This event is used for ingesting user location data. Leave Associated Query empty on the
Event Attr page.
Step 7 Create the output template event Location_Out. The event contains the MSISDN, LacTac,
CI, and Addr attributes.
Figure 1-108 Creating the Location_Out event
This event is used to define the location staying moment template. Leave Associated Query empty on the Event Attr page.
Step 8 Select the three new events and click Online.
----End
Creating the Event Ingestion Process
Procedure
Step 1 Log in to the Universe as the user user.
1. Enter http://Floating IP address of the SLB:9010/console/login.action in the address
box of the browser and press Enter.
2. Enter the user name and password of the user user and click Login.
Step 2 Choose Data Governance > Realtime Awareness > Streaming Studio > Streaming
Processing Design.
Step 3 Create the ingestion process collect_location under the FLUME node on the left.
1. Select the basic operation item Directly Create.
2. Configure basic information.
Set the process name to collect_location.
3. Select a Flume node.
Select the IP address of the Flume node that contains the prepared data.
4. Click to show the tool bar and edit the process.
5. Drag the Spooling Directory Source diagram element from the tool bar to the process
editing area and double-click the diagram element to edit it, as shown in Figure 1-109.
Figure 1-109 Configuring the Spooling Directory Source diagram element
− Source Event: Set it to the basic event Location_In.
− Data Source Directory: Set it to /opt/huawei/universe/data/location.
The directory levels in each Flume vary. The /opt/huawei/universe/data/location directory is used as an example. Modify it based on the site requirements. The omm user must have the read, write, and execute permissions on the data source directory.
6. Drag the Field Projecting diagram element and connect it to the Spooling Directory
Source diagram element. Double-click the Field Projecting diagram element and
configure it.
Figure 1-110 shows the configuration.
Figure 1-110 Configuring the Field Projecting diagram element
The Field Projecting diagram element is used to filter fields in the source data and find
out fields required by services.
Select the MSISDN, LacTac, and CI fields in sequence. Other fields will be filtered out.
7. Drag the Memory Channel diagram element, double-click it, and set Node Name.
8. Drag the Kafka Sink diagram element, double-click it, and set Node Name.
Figure 1-111 shows the configuration.
Figure 1-111 Configuring the Kafka Sink diagram element
Kafka Topic: Set it to the sdi_Location_Basic topic corresponding to the basic event.
9. Connect the diagram elements using connection lines, as shown in Figure 1-112.
Figure 1-112 Ingestion process
Step 4 Click Save to save the configuration.
Step 5 Click Release to release the process.
----End
Verification
Importing Test Data
Log in to the Flume node as the omm user and create the /opt/huawei/universe/data/location
directory.
> cd /opt/huawei/universe/data
> mkdir location
The omm user must have the read, write, and execute permissions on this directory.
Create a .txt file in the /opt/huawei/universe/data/location directory.
> vi location.txt
1,13810031351,460002198001011234,1000,1001
The directory levels in each FusionInsight vary. The /opt/huawei/universe/data directory is used as an example. Modify it based on the site requirements. In principle, the file storage directory must be the same as the data source directory configured in the ingestion flow, and the omm user must have the read, write, and execute permissions on this directory.
The delimiter in the .txt file must be the same as the input event delimiter.
The field sequence in the .txt file must be the same as the sequence of input event attributes.
Customer numbers must be from the prepared customer segment.
For the LacTac and CI fields, use data for which the mapping already exists.
Checking Data Ingestion
Log in to the Flume node as the omm user, go to the service data storage directory
/opt/huawei/universe/data/location, and check whether the file is suffixed
by .COMPLETED.
> ll
drwxr-xr-x 3 omm ficommon 4096 Nov 16 23:52 ./
drwxr-xr-x 4 omm ficommon 4096 Nov 17 00:53 ../
drwx------ 2 omm wheel 4096 Nov 16 19:23 .flumespool/
-rw------- 1 omm wheel 22 Nov 16 11:33 location.txt.COMPLETED
If yes, data ingestion is successful.
Checking Kafka Consumption
Use the Kafka client tool to check whether ingested data is received.
1. Log in to the Hadoop Client.
2. Initialize the environment variable.
% source bigdata_env
3. Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
4. Check whether the Kafka has received data.
% kafka-console-consumer.sh --zookeeper 10.0.0.1:24002/kafka --topic
sdi_Location_Basic --from-beginning
In the command, the IP address is that of the ZooKeeper.
The following information is displayed:
13810031351,1000,1001
This output indicates that Kafka has received the data.
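For reference, a minimal Python sketch of what the Field Projecting element did to the test record follows (illustrative only; the attribute order follows the Location_In and Location_Basic events defined earlier).

# Illustrative sketch only: project the Location_In test record down to the
# MSISDN, LacTac, and CI attributes kept by the Field Projecting element.
LOCATION_IN_FIELDS = ["userID", "MSISDN", "IMSI", "LacTac", "CI"]
PROJECTED_FIELDS = ["MSISDN", "LacTac", "CI"]

record = dict(zip(LOCATION_IN_FIELDS, "1,13810031351,460002198001011234,1000,1001".split(",")))
print(",".join(record[f] for f in PROJECTED_FIELDS))   # -> 13810031351,1000,1001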
Typical Configuration Example: Projection and Grouping Calculation
Context
Figure 1-113 Display of projection and grouping calculation functions
Procedure
Step 1 Log in to the foreground.
Enter http://IP address:Port/console. However, the login URL varies depending on the
installation mode.
Then, enter the login user name, password, and verification code. For details about the default
password, see "Password Change Views" in the Password Change.
Step 2 Choose Data Governance in the navigation tree at the upper part. The Data Governance
page is displayed.
Step 3 In the navigation bar on the upper part, choose Realtime Awareness > Streaming Studio >
Stream Computing Design.
The Stream Computing Design page is displayed, as shown in Figure 1-114.
Figure 1-114 Stream Computing Design page
Step 4 Click Add to access the rule orchestration page.
Complete the configuration as shown in Figure 1-115.
Figure 1-115 Rule orchestration
The configurations of each module are as follows:
Kafka Source configuration
Figure 1-116 Kafka Source configuration
Table 1-53 Kafka Source parameter description
Parameter | Value | Description
Topic Names | test_input | Source data for calculation is read from the test_input topic.
Data Encoding | string | The string and bytearray types are supported. Set this parameter based on the actual type of source data.
Data Separator | , | Set this parameter based on the source data.
Field Names | MSISDN,IMSI,Terminal_ID,UP_FLUX,DOWN_FLUX,SUM_FLUX,APP_ID | Set this parameter based on the source data. Use commas (,) to separate multiple values.
Field Types | string,string,string,int,int,int,string | Set this parameter based on the source data. Use commas (,) to separate multiple values.
Projection Interceptor configuration
Figure 1-117 Projection Interceptor configuration
Table 1-54 Projection Interceptor parameter description
Parameter | Value | Description
Output Event Attribute | MSISDN,IMSI,SUM_FLUX,APP_ID | Select the fields to be output and adjust the output sequence.
Grouping Interceptor configuration
Figure 1-118 Grouping Interceptor configuration
Table 1-55 Grouping Interceptor parameter description
Parameter | Value | Description
Fields To Group On | MSISDN | Grouping is performed based on the value of MSISDN.
Input Field of Sum | SUM_FLUX | The sum is calculated over the value of SUM_FLUX.
Kafka Sink configuration
Figure 1-119 Kafka Sink configuration
Table 1-56 Kafka Sink parameter description
Parameter | Value | Description
Topic Names | test_output | Topic for storing the calculation result data.
Output Data Separator | , | Specified output data delimiter.
Step 5 After the orchestration is complete, click Publish.
After the release, the Realtime Application Management page is displayed.
Step 6 Instantiate the generated Spark task rule.
Click Instantiation in the Operation column next to the Spark rule name.
Set instantiated parameters. Customize the task name, use the default values for other
parameters, and click Confirm.
The Task Manager page is displayed, on which the instantiated task name can be viewed.
Step 7 Select the created task and click "Start".
----End
Verifying the Result
Step 1 Log in to the Hadoop Client.
Step 2 Initialize the environment variable.
% source bigdata_env
Step 3 Initialize the ticket and log in to the client in secure mode.
% kinit <user name>
Generally, the admin user is used for login. Enter the user password as prompted.
Step 4 Send the data in the data source topic of Kafka.
% kafka-console-producer.sh --broker-list 10.0.0.1:21005,10.0.0.2:21005,10.0.0.3:21005
--topic test_input
Input data:
13800000001,13800000001,211,100,200,300,001
13800000002,13800000002,111,150,200,350,001
13800000002,13800000002,111,350,200,550,002
13800000001,13800000001,211,120,220,340,003
13800000001,13800000001,211,120,120,240,002
13800000003,13800000001,211,120,120,240,002
Step 5 Use the Kafka client tool to check whether result data is received.
% kafka-console-consumer.sh --zookeeper 10.0.0.4:24002/kafka --topic test_output
--from-beginning
The system displays the following information.
880,13800000001
900,13800000002
240,13800000003
This output indicates that the calculation is successful.
The first column is the sum of the SUM_FLUX fields in each group, and the second column is the MSISDN value on which the grouping was performed.
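A minimal Python sketch of the same projection and grouping calculation applied to the input above follows (illustrative only; the Spark rule performs this per batch).

# Illustrative sketch only: projection to (MSISDN, IMSI, SUM_FLUX, APP_ID), then
# grouping on MSISDN and summing SUM_FLUX, as configured in the rule above.
from collections import defaultdict

FIELDS = ["MSISDN", "IMSI", "Terminal_ID", "UP_FLUX", "DOWN_FLUX", "SUM_FLUX", "APP_ID"]

input_lines = [
    "13800000001,13800000001,211,100,200,300,001",
    "13800000002,13800000002,111,150,200,350,001",
    "13800000002,13800000002,111,350,200,550,002",
    "13800000001,13800000001,211,120,220,340,003",
    "13800000001,13800000001,211,120,120,240,002",
    "13800000003,13800000001,211,120,120,240,002",
]

totals = defaultdict(int)
for line in input_lines:
    record = dict(zip(FIELDS, line.split(",")))                 # parse one source record
    projected = {k: record[k] for k in ("MSISDN", "IMSI", "SUM_FLUX", "APP_ID")}
    totals[projected["MSISDN"]] += int(projected["SUM_FLUX"])   # group by MSISDN and sum

for msisdn, total in totals.items():
    print(f"{total},{msisdn}")   # 880,13800000001 / 900,13800000002 / 240,13800000003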
----End