
MapReduce Service

Best Practices

Date 2020-01-19


Contents

1 Using Spark to Analyze IoV Drivers' Driving Behavior

2 Using MRS to Analyze Traffic and Vehicle Status Data
2.1 Overview
2.2 Implementation Methods
2.2.1 Applying for VPCs
2.2.2 Creating MRS Streaming Clusters
2.2.3 Using Kafka to Collect Data in Real Time
2.2.4 Starting the Storm Program for Analyzing Fake Plate Vehicles
2.2.5 Starting the Storm Program to Calculate Checkpoint Traffic
2.2.6 Creating MRS Analysis Clusters
2.2.7 Analyzing Checkpoint Traffic and Vehicle Travel Paths
2.2.8 Displaying Data and Reports on the Web UI


1 Using Spark to Analyze IoV Drivers' Driving Behavior

This document is based on MapReduce Service (MRS) practices on HUAWEI CLOUD and walks you through using Spark to analyze drivers' driving behavior.

The content is organized as follows:

1. Scenario
2. Creating a Cluster
3. Preparing a Spark Sample Program and Sample Data
4. Creating a Job
5. Viewing the Job Execution Results

Scenario

Objective:

Understand basic functions of MRS. Use the Spark component of MRS to analyze drivers' driving behavior and obtain the driving behavior analysis result.

Scenario:

In this case, raw data is vehicle owners' driving behavior information, including abrupt acceleration, abrupt deceleration, neutral sliding, overspeed, and fatigue driving. With the analysis capability of the Spark component, you can collect statistics on the number of violations of each of these types that drivers committed in a specified period.

This document applies only to MRS 1.8.x. Create a cluster as instructed.
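The kind of statistic described in the scenario can be sketched in a few lines. This is plain Python rather than Spark, and it is only an illustration of the aggregation: the record layout (driver ID, timestamp, violation type) and the sample values are assumptions, because this document does not describe the format of the sample data or the logic inside driver_behavior.jar.

```python
from collections import Counter
from datetime import datetime

# Hypothetical record layout: (driver_id, timestamp, violation_type).
# The real sample data format is not described in this document.
RECORDS = [
    ("driver_a", "2020-01-01 08:00:00", "abrupt_acceleration"),
    ("driver_a", "2020-01-01 09:30:00", "overspeed"),
    ("driver_b", "2020-01-02 10:00:00", "overspeed"),
    ("driver_a", "2020-02-01 08:00:00", "overspeed"),
]


def count_violations(records, start, end):
    """Count violations per (driver, type) with start <= timestamp < end."""
    counts = Counter()
    for driver_id, ts, violation in records:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if start <= when < end:
            counts[(driver_id, violation)] += 1
    return counts


# Count only the violations recorded in January 2020.
january = count_violations(RECORDS, datetime(2020, 1, 1), datetime(2020, 2, 1))
for (driver_id, violation), n in sorted(january.items()):
    print(driver_id, violation, n)
```

The half-open interval (start included, end excluded) keeps adjacent periods from counting the same record twice.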

Creating a Cluster

Step 1 Log in to the HUAWEI CLOUD management console.


Step 2 Click Service List and choose EI Enterprise Intelligence > MapReduce Service.

Step 3 In the upper right corner of the page, click Buy Cluster. The Buy Cluster page is displayed.

Step 4 Go to the cluster purchase page of the new version and click Custom Config.

Configure cluster software according to Table 1-1.

Table 1-1 Software configuration

Parameter | Configuration
Region | Select AP-Hong Kong. NOTE: This document uses AP-Hong Kong as an example. If you want to perform operations in other regions, ensure that all operations are performed in the same region.
Cluster Name | mrs_demo
Cluster Version | Select MRS 1.8.10. NOTE: This document applies only to MRS 1.8.x. Select a proper version to create a cluster.
Cluster Type | Select Analysis cluster to analyze offline data.
Component | Select all components.
Kerberos Authentication | Click to disable Kerberos authentication.
Username | Name of the administrator of MRS Manager. admin is used by default.
Password | Password of the MRS Manager administrator.


Figure 1-1 Software configuration

Step 5 Click Next Configure Hardware.

Configure cluster hardware according to Table 1-2.

Table 1-2 Hardware configuration

Parameter | Configuration
Billing Mode | Pay-per-use
AZ | AZ2
VPC | Select the VPC in which you want to create the cluster and click View VPC to view the name and ID of the VPC. If no VPC is available, create one.
Subnet | Select the subnet in which you want to create the cluster. You can enter the VPC to view the name and ID of the subnet. If no subnet has been created under the VPC, click Create Subnet to create one.
Security Group | Select Auto Create.
EIP | Select Bind later.
Enterprise Project | Retain the default value.
Cluster HA | Retain the default value.
Cluster Node | Retain the default value.
Login Mode | Select Password.
Username | Name of the user for logging in to ECSs. The default username is root.
Password | Password for logging in to ECSs.


Figure 1-2 Hardware configuration

Step 6 Click Next Set Advanced Options, and skip parameter configuration on this page.

Step 7 Click Buy Now. A page is displayed indicating that the task has been submitted.

Step 8 Click Back to Cluster List to view the cluster status.

Cluster creation takes some time. The initial status of the cluster is Starting. After the cluster has been created successfully, the cluster status becomes Running.

----End


Preparing a Spark Sample Program and Sample Data

Step 1 Create an OBS bucket to store the Spark sample program, sample data, job execution results, and logs.

1. Log in to the HUAWEI CLOUD management console.
2. Click Service List and choose Storage > Object Storage Service.
3. Click Create Bucket to create a bucket named obs-demo-analysis-hwt4. Retain the default values for parameters such as Storage Class and Bucket Policy.

Step 2 Click obs-demo-analysis-hwt4. The bucket list page is displayed. In the left navigation pane, choose Objects. On the Objects tab page, click Create Folder to create the program and input folders, as shown in Figure 1-3.

Figure 1-3 Creating a folder

Step 3 Download the sample program driver_behavior.jar from https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/driver_behavior.jar to the local PC.

Step 4 Go to the program folder. Click Upload Object, and select the local driver_behavior.jar sample program.

Step 5 Click Upload to upload the sample program to the OBS bucket.

Step 6 Obtain Spark sample data from https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/detail-records.zip and decompress it.

Step 7 Go to the input folder. Click Upload Object, and select the Spark sample data stored on the local PC.

Step 8 Click Upload to upload the sample data to the OBS bucket.

----End

Creating a Job

Step 1 In the left navigation pane of the MRS management console, choose Clusters > Active Clusters. Click the mrs_demo cluster.


Step 2 On the cluster details page, click the Jobs tab and then click Create. The Create Job page is displayed.

Step 3 After configuring job parameters according to Table 1-3 (see also Figure 1-4), click OK to submit the job.

Table 1-3 Configuring job information

Parameter | Configuration
Type | Select SparkSubmit.
Name | Enter driver_behavior_task.
Program Path | Click OBS and select the driver_behavior.jar package uploaded in Preparing a Spark Sample Program and Sample Data.
Program Parameter | Select --class in Parameter, and enter com.huawei.bigdata.spark.examples.DriverBehavior in Value.
Parameters | Enter AK SK 1 Input path Output path. You can click OBS to select an input path, and enter an output path that does not exist, for example, s3a://obs-demo-analysis-hwt4/output/. See the NOTE below for how to obtain the AK/SK.
Service Parameter | This parameter is left blank by default. Retain the default settings.

NOTE
To obtain the AK/SK, perform the following steps:
1. Log in to the HUAWEI CLOUD management console.
2. Click the username in the upper right corner and choose My Credentials.
3. On the displayed My Credentials page, click the Access Keys tab.
4. Click Add Access Key to add a key. Enter the password and verification code as prompted. The browser automatically downloads the credentials.csv file. The file is in CSV format and separated by commas (,). In the file, the middle part is AK and the last part is SK.


Figure 1-4 Adding a job

Step 4 Click OK to start executing the program.

----End
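The Parameters value in Table 1-3 is five space-separated items in the order AK, SK, 1, input path, output path. The following sketch shows one way to assemble that string from a downloaded credentials.csv. It assumes that the credentials are on the last non-empty row of the file; the header names, the placeholder key values, and the s3a input path are illustrative assumptions, not values taken from a real account.

```python
import csv
import io


def read_ak_sk(csv_text):
    """Return (AK, SK) from the last non-empty row: middle field and last field."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    last = rows[-1]
    return last[len(last) // 2].strip(), last[-1].strip()


def build_job_parameters(ak, sk, input_path, output_path):
    """Join the five values in the order AK SK 1 input-path output-path."""
    return " ".join([ak, sk, "1", input_path, output_path])


# Placeholder file content; the header names and key values are made up.
sample = (
    "User Name,Access Key Id,Secret Access Key\n"
    "example_user,EXAMPLE_AK,EXAMPLE_SK\n"
)
ak, sk = read_ak_sk(sample)
params = build_job_parameters(
    ak,
    sk,
    "s3a://obs-demo-analysis-hwt4/input/",   # assumed path of the input folder
    "s3a://obs-demo-analysis-hwt4/output/",  # must not exist before the job runs
)
print(params)
```

Treat the SK like a password: avoid pasting it into shared notes or committing it to a repository.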

Viewing the Job Execution Results

Step 1 Go to the Jobs tab page to view job execution status.

Figure 1-5 Viewing job execution status

Step 2 Wait 1 to 2 minutes and log in to OBS Console. Go to the output path of the obs-demo-analysis-hwt4 bucket to view the execution result. Click Download in the Operation column of the generated CSV file to download the file to the local PC.

Figure 1-6 Viewing the job execution results


Step 3 Open the downloaded CSV file in Excel on the local PC, and classify the data in each column according to the fields defined in the program. The job execution results are obtained.

Figure 1-7 Execution result

----End


2 Using MRS to Analyze Traffic and Vehicle Status Data

2.1 Overview

Scenario

As vehicles become more common, road traffic problems such as congestion, frequent accidents, and violations of license regulations are becoming more serious.

With HUAWEI CLOUD computing and big data processing capabilities, you can analyze and mine traffic and vehicle conditions to assist transportation departments in mitigating traffic problems.

This document applies only to MRS 1.8.x. Create a cluster as instructed.

Figure 2-1 Solution architecture


Preparations

● Register a HUAWEI CLOUD account.

● Purchase MapReduce Service (MRS), Elastic Cloud Server (ECS), Virtual Private Cloud (VPC), and Object Storage Service (OBS).

Tutorial Tasks

1. Detection of vehicles with fake license plates

Collect and analyze road checkpoint data in real time. If two identical license plates are found within a certain period of time and scope, the license plate number, location, and time are immediately displayed on the map so that traffic police can investigate promptly.

2. Real-time checkpoint traffic monitoring

Collect and analyze road checkpoint data in real time. Collect information about vehicles at each checkpoint, and calculate which checkpoints are the most crowded.

3. Checkpoint traffic analysis

Collect traffic statistics of each checkpoint every hour based on massive data, and find the most crowded checkpoint of the day.

4. Vehicle travel path analysis

Perform checkpoint correlation analysis and outline a panoramic vehicle travel path view of the city.
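To make tasks 1 and 3 more concrete, the following sketch shows one possible reading of the fake plate rule and a simple busiest-checkpoint count. It is an illustration only: the sighting layout, the one-dimensional positions in km, and the rule itself (the same plate seen within a time limit at points at least a given distance apart) are assumptions, and the Storm and MapReduce programs used later in this document may work differently.

```python
from collections import Counter

# Hypothetical sighting layout: (plate, checkpoint, minute_of_day, position_km).
SIGHTINGS = [
    ("A12345", "cp1", 600, 0.0),
    ("A12345", "cp9", 605, 80.0),   # 80 km away only 5 minutes later
    ("B67890", "cp1", 610, 0.0),
    ("B67890", "cp2", 640, 12.0),   # plausible trip: 12 km in 30 minutes
]


def find_fake_plates(sightings, max_minutes, min_km):
    """Return plates seen twice within max_minutes at points at least min_km apart."""
    suspects = set()
    seen = {}  # plate -> list of earlier (minute, km) sightings
    for plate, _checkpoint, minute, km in sightings:
        for other_minute, other_km in seen.get(plate, []):
            close_in_time = abs(minute - other_minute) <= max_minutes
            far_apart = abs(km - other_km) >= min_km
            if close_in_time and far_apart:
                suspects.add(plate)
        seen.setdefault(plate, []).append((minute, km))
    return suspects


def busiest_checkpoint(sightings):
    """Return (checkpoint, count) for the checkpoint with the most sightings."""
    return Counter(cp for _, cp, _, _ in sightings).most_common(1)[0]


suspects = find_fake_plates(SIGHTINGS, max_minutes=30, min_km=30)
busiest = busiest_checkpoint(SIGHTINGS)
print(suspects, busiest)
```

Comparing every new sighting with all earlier sightings of the same plate is quadratic per plate; a streaming implementation would normally keep only the sightings that still fall inside the time window.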

Required Services

● MRS

MapReduce Service (MRS) provides enterprise-level big data clusters on the cloud. Tenants can fully control the clusters and run big data components such as Hadoop, Spark, HBase, Kafka, and Storm in the clusters.

● VPC

Virtual Private Cloud (VPC) is a secure, isolated, logical network on HUAWEI CLOUD. You can configure IP address segments, subnets, and security groups, assign EIPs, and allocate bandwidth in a VPC.

● ECS

Elastic Cloud Server (ECS) provides scalable, on-demand cloud servers for secure, flexible, and efficient application environments, ensuring reliable, uninterrupted services.

● OBS

Object Storage Service (OBS) is a stable, secure, efficient, and easy-to-use cloud storage service. With the Representational State Transfer (REST) application programming interface (API), OBS is able to store any amount and form of unstructured data.

2.2 Implementation Methods


2.2.1 Applying for VPCs

Scenario

A VPC is a secure, isolated, logical network environment.

Procedure

Step 1 Log in to the HUAWEI CLOUD management console.

Step 2 In Service List, choose Network > Virtual Private Cloud. In the upper right corner of the Network Console page, click Create VPC.

Step 3 Set VPC parameters as prompted and click Create Now.

This document uses AP-Hong Kong as an example. If you want to perform operations in other regions, ensure that all operations are performed in the same region.

● Region: Select AP-Hong Kong.
● Name: Set it to vpc-mrs-demo. You can also enter a new name according to the naming rules.
● CIDR Block: Use the default value.
● Enterprise Project: Select default.
● Tag: Use the default value.
● AZ: Select AZ2.
● Name of the subnet: Set it to subnet-mrs-demo. You can also enter a new subnet name according to the naming rules.
● CIDR Block of the subnet: Use the default value.
● Advanced Settings: Select Default.

Figure 2-2 shows the VPC parameter configurations.


Figure 2-2 Configuring VPC parameters

----End

2.2.2 Creating MRS Streaming Clusters

Creating an MRS Streaming Cluster

Step 1 Log in to the HUAWEI CLOUD management console and choose EI Enterprise Intelligence > MapReduce Service in Service List. The MRS management console is displayed.

Step 2 In the upper right corner of the MRS management console, click Buy Cluster. The Buy Cluster page is displayed.

Step 3 Go to the cluster purchase page of the new version and click Custom Config.

Step 4 Configure cluster software information.

This document uses AP-Hong Kong as an example. If you want to perform operations in other regions, ensure that all operations are performed in the same region.

Set the parameters as follows:

● Region: Select AP-Hong Kong.

● Cluster Name: Enter mrs_demo or specify a name according to naming rules.

● Cluster Version: Select MRS 1.8.10. This document applies only to MRS 1.8.x. Select a proper version to create a cluster.

● Cluster Type: Select Streaming cluster.

● Component: Select all components of the streaming cluster.

● Kerberos Authentication: Click to disable Kerberos authentication.

● Username: The default value is admin.

● Password: Set it to the password of the MRS Manager administrator.

Figure 2-3 Software configuration

Step 5 Click Next Configure Hardware.

Set the parameters as follows:

● Billing Mode: Select Pay-per-use.

● AZ: Select AZ2.

● VPC and Subnet: Select the VPC and subnet created in Applying for VPCs, respectively.

● Security Group: Use the default value Auto create.

● EIP: Bind later is selected by default.


● Enterprise Project: Select default.

● Cluster HA: Use the default settings, that is, enabled.

● Disk LVM: This function is disabled by default.

● Cluster Node: Use the default values of instance specifications for Master and Core nodes. Use the default values for the instance count as well as data disk type and size. Do not add Task nodes.

● Login Mode: Select Password. Enter a password and confirm the password for user root.

Figure 2-4 Hardware configuration

Step 6 Click Next Set Advanced Options, and skip parameter configuration on this page.


Step 7 Click Buy Now. A page is displayed indicating that the task has been submitted.

Step 8 Click Back to Cluster List. You can view the status of the cluster on the Active Clusters page.

Cluster creation takes some time. The initial status of the cluster is Starting. After the cluster has been created successfully, the cluster status becomes Running.

----End

Adding a Security Group Rule and Binding to an EIP

Step 1 Log in to the HUAWEI CLOUD management console and choose EI Enterprise Intelligence > MapReduce Service in Service List. The MRS management console is displayed.

Step 2 In the left navigation pane, choose Clusters > Active Clusters.

Step 3 On the Active Clusters page, click mrs_demo. On the Nodes tab page, locate the Master2 node and click the node name. The ECS details page is displayed.

Figure 2-5 Node information

Step 4 On the ECS details page, click the Security Groups tab and click a security group name. Click the security group ID in the expanded page. See Figure 2-6.

Figure 2-6 Security group


Step 5 On the displayed Security Groups page, click Add Inbound Rule and then click Add Rule. After the configuration, click OK.

To quickly experience MRS and reduce the rule restrictions, you are advised to set Protocol & Port to All for both inbound and outbound rules and Source to All. See Figure 2-7.

Figure 2-7 Adding an inbound rule

Step 6 On the ECS details page, click the EIPs tab and then click Bind EIP.

Figure 2-8 EIP

Step 7 On the Bind EIP page, select an available EIP and click OK. If no EIP is available, click View EIP to create one. The region of the new EIP must be the same as the region of the cluster. See Figure 2-9.


Figure 2-9 Binding an EIP

----End

2.2.3 Using Kafka to Collect Data in Real Time

Step 1 Use the MobaXterm tool to log in to the EIP of the master2 node. The login username is root. Set the parameters according to Figure 2-10 and click OK. Enter the password as prompted.

Download MobaXterm at https://mobaxterm.mobatek.net/.


Figure 2-10 MobaXterm configuration details

Step 2 After the login is successful, run the following command to switch to user root:

sudo -s

Step 3 Run the following command to configure environment variables:

source /opt/client/bigdata_env

Step 4 Log in to the MRS management console and choose Clusters > Active Clusters. Click mrs_demo to access the basic cluster information page. Click View next to Cluster Manager to access MRS Manager.

Figure 2-11 Clicking View

Step 5 On MRS Manager, choose Service > ZooKeeper > Instance to query the IP addresses of ZooKeeper instances. See Figure 2-12.


Figure 2-12 IP addresses of the ZooKeeper instances

Record any IP address of a ZooKeeper instance, for example, 192.168.0.111.

Step 6 Create a Kafka input topic.

The topic name is input_topic. Replace the ZooKeeper cluster IP address with the actual IP address.

kafka-topics.sh --create --zookeeper <IP address of the node where the ZooKeeper role instance resides:2181/kafka> --partitions 2 --replication-factor 2 --topic input_topic

Step 7 Run the wget command to download the upload_kafka_tool.tar.gz package to the master2 node, and then run the tar command to decompress the package:

wget https://mrs-obs-cn-north-4.obs.cn-north-4.myhuaweicloud.com/mrs-demon-samples/demon/upload_kafka_tool.tar.gz
tar -zxvf upload_kafka_tool.tar.gz

Step 8 In the dis.properties file in the upload_kafka_tool directory, change the value of broker_list to the IP addresses of the Kafka brokers in the cluster, and check whether the value of topic_name is the topic created in Step 6.

#Kafka configurations (topic creation command bin/kafka-topics.sh --create --partitions 1 --replication 2 --zookeeper 192.168.0.17:2181 --topic topic0929)
broker_list=192.168.0.220:9092,192.168.0.246:9092,192.168.0.242:9092
topic_name=input_topic

#Sink type (kafka or dis)
sink_type=kafka

Step 9 Run the following command to start the data generation process:

sh upload_kafka_tool/bin/linux/startProducer.sh

Step 10 Run the following commands in sequence to read data from input_topic and ensure that data has been written to Kafka:

source /opt/client/bigdata_env
kafka-console-consumer.sh --zookeeper <IP address of the node where the ZooKeeper role instance resides:2181/kafka> --topic input_topic --from-beginning

----End
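The check in Step 8 can also be done programmatically. The following sketch builds a broker_list value from a list of broker IP addresses and verifies it against the text of a properties file. The helper names are hypothetical, the sample text only mirrors the keys shown in Step 8, and the IP addresses are the example values from this document rather than the addresses of a real cluster.

```python
def make_broker_list(broker_ips, port=9092):
    """Format broker IPs as the comma-separated host:port list used by broker_list."""
    return ",".join(f"{ip}:{port}" for ip in broker_ips)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Sample text that mirrors the keys shown in Step 8.
SAMPLE = """\
broker_list=192.168.0.220:9092,192.168.0.246:9092,192.168.0.242:9092
topic_name=input_topic

#Sink type (kafka or dis)
sink_type=kafka
"""

props = parse_properties(SAMPLE)
expected = make_broker_list(["192.168.0.220", "192.168.0.246", "192.168.0.242"])
brokers_ok = props["broker_list"] == expected
topic_ok = props["topic_name"] == "input_topic"
print(brokers_ok, topic_ok)
```

In practice, the IP list would come from the Kafka instance page on MRS Manager, and the text would be read from the dis.properties file in the upload_kafka_tool directory.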


2.2.4 Starting the Storm Program for Analyzing Fake Plate Vehicles

Step 1 Use the MobaXterm tool to log in to the master2 node. The login username is root. Set the parameters according to Figure 2-13 and click OK. Enter the password as prompted.

Download link of MobaXterm: https://mobaxterm.mobatek.net/

Figure 2-13 MobaXterm configuration details

Step 2 Use WinSCP to upload the car_analysis.jar JAR file to the master2 node.

● Download link of car_analysis.jar: https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/car_analysis.jar

● Download link of WinSCP: http://winscp.net/

Step 3 Obtain the IP address of the node where the Kafka role instance resides.

On MRS Manager, choose Service > Kafka > Instance to query the IP addresses of Kafka instances. Record any IP address of a Kafka instance, for example, 192.168.0.111.

Step 4 Run the following commands to start the program for analyzing fake plate vehicles:

source /opt/client/bigdata_env
storm jar car_analysis.jar com.huawei.storm.hcc.SearchXCarTopology xcar input_topic fake_car_output <IP address of the node where the Kafka role instance resides:9092> 200 10 30 30

The parameters, in order, are as follows:

1. topology_name: indicates a topology name.
2. stream_name: indicates an input stream name, that is, the name of the Kafka topic that data is read from.
3. stream_name: indicates an output stream name, that is, the name of the Kafka topic that results are written to.
4. broker list: indicates the broker list of the Kafka cluster.
5. Size of the sliding window
6. Update frequency (unit: second); window length
7. Time in the fake plate vehicle rule (unit: minute)
8. Distance in the fake plate vehicle rule (unit: km)

The preceding command submits a Storm program. The program reads data from input_topic and checks whether there are fake plate vehicles. If fake plate vehicles are found, the program sends the result to the fake_car_output Kafka topic.

Step 5 After the program starts running, run the following commands to view the calculated fake plate vehicle data:

source /opt/client/bigdata_env
kafka-console-consumer.sh --zookeeper <IP address of the node where the ZooKeeper role instance resides:2181/kafka> --topic fake_car_output --from-beginning

----End
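Because the storm jar command in Step 4 takes eight positional arguments, it is easy to put a value in the wrong place. The following sketch names each position and assembles the command string. The class and field names are hypothetical labels chosen for this example; only the order of the values and the example values themselves come from Step 4. The broker address uses the example IP from Step 3.

```python
from dataclasses import dataclass


@dataclass
class FakePlateTopologyArgs:
    """Positional arguments of the fake plate topology, in command-line order."""
    topology_name: str
    input_topic: str
    output_topic: str
    broker_list: str
    window_size: int
    update_frequency_s: int
    rule_time_min: int
    rule_distance_km: int

    def to_command(self, jar="car_analysis.jar",
                   main_class="com.huawei.storm.hcc.SearchXCarTopology"):
        """Return the storm jar command line as a single string."""
        values = [
            self.topology_name, self.input_topic, self.output_topic,
            self.broker_list, self.window_size, self.update_frequency_s,
            self.rule_time_min, self.rule_distance_km,
        ]
        return " ".join(["storm", "jar", jar, main_class] + [str(v) for v in values])


args = FakePlateTopologyArgs(
    topology_name="xcar",
    input_topic="input_topic",
    output_topic="fake_car_output",
    broker_list="192.168.0.111:9092",  # example IP; use a real Kafka instance IP
    window_size=200,
    update_frequency_s=10,
    rule_time_min=30,
    rule_distance_km=30,
)
command = args.to_command()
print(command)
```

The printed string can be pasted into the terminal after source /opt/client/bigdata_env has been run.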

2.2.5 Starting the Storm Program to Calculate Checkpoint Traffic

Step 1 Use the MobaXterm tool to log in to the master2 node. The login username is root. Set the parameters according to Figure 2-14 and click OK. Enter the password as prompted.

Download MobaXterm at https://mobaxterm.mobatek.net/.

Figure 2-14 MobaXterm configuration details


Step 2 Use WinSCP to upload the car_analysis.jar JAR file to the master2 node.

Step 3 Run the following commands to start the program for calculating checkpoint traffic:

source /opt/client/bigdata_env
storm jar car_analysis.jar com.huawei.storm.hcc.SearchCarSumTopology xcar-sum input_topic sum_car_output <IP address of the Kafka cluster:9092> 60 5

The parameters, in order, are as follows:

1. topology_name: indicates a topology name.
2. stream_name: indicates an input stream name, that is, the name of the Kafka topic that data is read from.
3. stream_name: indicates an output stream name, that is, the name of the Kafka topic that results are written to.
4. broker list: indicates the broker list of the Kafka cluster.
5. Window size
6. Update frequency

The preceding command submits a Storm program. The program reads data from input_topic, calculates the vehicle traffic of each checkpoint, and writes the result to the sum_car_output Kafka topic.

Step 4 After the program starts running, run the following commands to view the calculated checkpoint traffic:

source /opt/client/bigdata_env
kafka-console-consumer.sh --zookeeper <IP address of the ZooKeeper cluster:2181/kafka> --topic sum_car_output --from-beginning

----End
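The window size (60) and update frequency (5) passed to the topology in Step 3 suggest a sliding-window count. The following sketch shows that idea in plain Python: every 5 time units, it counts the vehicles seen at each checkpoint during the last 60 time units. This document does not state the units or the exact window semantics, so treating both values as seconds and using a window that is open on the left are assumptions, and the event layout is hypothetical.

```python
from collections import Counter


def sliding_counts(events, window, slide, end_time):
    """Return [(t, Counter)] for t = slide, 2*slide, ..., counting events in (t - window, t]."""
    results = []
    t = slide
    while t <= end_time:
        counts = Counter(cp for ts, cp in events if t - window < ts <= t)
        results.append((t, counts))
        t += slide
    return results


# Hypothetical events: (timestamp_in_seconds, checkpoint).
EVENTS = [(1, "cp1"), (2, "cp1"), (4, "cp2"), (7, "cp1"), (63, "cp2")]

results = sliding_counts(EVENTS, window=60, slide=5, end_time=65)
for t, counts in results:
    print(t, dict(counts))
```

This version rescans all events for every window, which is fine for a demonstration. A streaming system such as Storm would instead update the counts incrementally as tuples arrive and expire.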

2.2.6 Creating MRS Analysis Clusters

Step 1 Log in to the HUAWEI CLOUD management console and choose EI Enterprise Intelligence > MapReduce Service in Service List. The MRS management console is displayed.

Step 2 In the upper right corner of the MRS management console, click Buy Cluster. The Buy Cluster page is displayed.

Step 3 Go to the cluster purchase page of the new version and click Custom Config.

Step 4 Configure cluster software information.

This document uses AP-Hong Kong as an example. If you want to perform operations in other regions, ensure that all operations are performed in the same region.

Set the parameters as follows:

● Region: Select AP-Hong Kong.

● Cluster Name: Enter mrs_analysis or specify a name according to naming rules.

● Cluster Version: Select MRS 1.8.10. This document applies only to MRS 1.8.x. Select a proper version to create a cluster.

● Cluster Type: Select Analysis cluster.

● Component: Select all components of the analysis cluster.

● Kerberos Authentication: Click to disable Kerberos authentication.

● Username: The default value is admin.

● Password: Set it to the password of the MRS Manager administrator.

Figure 2-15 Software configuration

Step 5 Click Next Configure Hardware.

Set the parameters as follows:

● Billing Mode: Select Pay-per-use.

● AZ: Select AZ2.

● VPC and Subnet: Select the VPC and subnet created in Applying for VPCs, respectively.

● Security Group: Use the default value Auto create.

● EIP: Bind later is selected by default.

● Enterprise Project: Select default.

● Cluster HA: Use the default settings, that is, enabled.

● Cluster Node: Use the default values of instance specifications for Master and Core nodes. Use the default values for the instance count as well as data disk type and size. Do not add Task nodes.

● Login Mode: Select Password. Enter a password and confirm the password for user root.

Figure 2-16 Hardware configuration

Step 6 Click Next Set Advanced Options, and skip parameter configuration on this page.

Step 7 Click Buy Now. The page is displayed showing that the task has been submitted.

Step 8 Click Back to Cluster List. You can view the status of the cluster on the Active Clusters page.

Cluster creation takes some time. The initial status of the cluster is Starting. After the cluster has been created successfully, the cluster status becomes Running.
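If you prefer to script the wait rather than refresh the page, the Starting-to-Running transition can be polled. The following Python sketch shows one way to do it. The get_status callable is hypothetical (this document does not describe an interface for querying cluster status), and the timeout and interval values are arbitrary defaults.

```python
import time


def wait_until_running(get_status, timeout_s=3600, interval_s=30, sleep=time.sleep):
    """Poll get_status() until it returns "Running" or the timeout expires.

    get_status is a hypothetical callable supplied by the caller; it must
    return the cluster status as a string such as "Starting" or "Running".
    """
    waited = 0
    while True:
        status = get_status()
        if status == "Running":
            return True
        if waited >= timeout_s:
            return False  # gave up before the cluster reached Running
        sleep(interval_s)
        waited += interval_s
```

The sleep parameter exists only so that the function can be exercised without real delays.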

----End

2.2.7 Analyzing Checkpoint Traffic and Vehicle Travel Paths

Step 1 Prepare data files and programs.

● Download link of mapreduce-example-normal.jar: https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/mapreduce-example-normal.jar

● co.csv: Download mapreduce-example-normal.jar and decompress it. Obtain co.csv from the mapreduce-example-normal\resource folder.

● dis.hcc-demo.0UWTDLOkFT2t5LzvJN9: Download mapreduce-example-normal.jar and decompress it. Obtain dis.hcc-demo.0UWTDLOkFT2t5LzvJN9 from the mapreduce-example-normal\resource folder.

Step 2 Upload the data and programs to OBS.

1. Log in to the HUAWEI CLOUD management console.

2. In Service List, choose Storage > Object Storage Service.

3. Click Create Bucket to create a bucket named mrscardata11.

mrscardata11 is only an example. The bucket name must be globally unique. Otherwise, the bucket fails to be created.

The region of the OBS bucket must be the same as the region of the MRS cluster. This document uses AP-Hong Kong as an example.

Figure 2-17 Creating a bucket

4. In the bucket list, click mrscardata11 to go to the Summary page. In the left navigation pane, click Objects. The page for uploading objects is displayed.

5. Click Create Folder to create the program, pro, output, input, and frontendcollector folders. Figure 2-18 shows the folders.

– program: stores mapreduce-example-normal.jar.

– pro: stores the co.csv file.

– input: stores dis.hcc-demo.0UWTDLOkFT2t5LzvJN9.

– output: stores job output files.

– frontendcollector: stores the output file recording vehicle travel paths.

Figure 2-18 Folder list

6. Go to the program folder and click Upload Object. In the Upload Object dialog box, click Add File, select the program downloaded in Step 1 from the local host, and click Upload. Wait until the object is successfully uploaded.

Figure 2-19 Uploading objects

– Go to the pro folder and upload the co.csv file.

– Go to the input folder and upload the dis.hcc-demo.0UWTDLOkFT2t5LzvJN9 file.
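The resulting bucket layout can be summarized as a mapping from folders to s3a:// paths, which is the form the job parameters later in this section use. The following Python sketch is illustrative only: the function and constant names are assumptions, and it does not upload anything.

```python
# Folder layout from the steps above: folder name -> file uploaded into it.
UPLOADS = {
    "program": "mapreduce-example-normal.jar",
    "pro": "co.csv",
    "input": "dis.hcc-demo.0UWTDLOkFT2t5LzvJN9",
}
# Folders that stay empty until the jobs write their results.
OUTPUT_FOLDERS = ("output", "frontendcollector")


def s3a_paths(bucket):
    """Return the s3a:// URI of every uploaded file and output folder."""
    paths = {
        folder: f"s3a://{bucket}/{folder}/{name}"
        for folder, name in UPLOADS.items()
    }
    for folder in OUTPUT_FOLDERS:
        paths[folder] = f"s3a://{bucket}/{folder}/"
    return paths


# mrscardata11 is the example bucket; replace it with your own bucket name.
paths = s3a_paths("mrscardata11")
```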

Step 3 Log in to the MRS management console. In the left navigation pane, choose Clusters > Active Clusters and click the cluster named mrs_analysis.

Step 4 On the cluster information page, click the Jobs tab. On the Jobs tab page, click Create. The Create Job page is displayed.

Step 5 On the Create Job page, set the following parameters according to Figure 2-20 and click OK. Wait until the job is successfully executed.

● Set Type to MapReduce.

● Set Name to Preprocess.

● Set Program Path to the path of the mapreduce-example-normal.jar program on OBS, for example, s3a://mrscardata11/program/mapreduce-example-normal.jar.

● In Parameters, enter the following information:

com.huawei.mrs.preprocess.Preprocess -Dfs.s3a.access.key=xxxxxx -Dfs.s3a.secret.key=xxxxxx s3a://mrscardata11/input/ /tmp/output/ 2018-06-10 2019-12-30 0 s3a://mrscardata11/pro/co.csv

– The time range that you enter in Parameters must include the time when you create the job. For example, if you created the job on June 15, 2018, the time range must start before and end after June 15, 2018.

– The OBS bucket name in the s3a://mrscardata11/input/ parameter must be replaced with the name of the bucket you create.

– The OBS bucket name in the s3a://mrscardata11/pro/co.csv parameter must be replaced with the name of the bucket you create.

– To obtain the AK/SK, perform the following steps:

1. Log in to the HUAWEI CLOUD management console.

2. Click the username in the upper right corner and choose My Credentials.

3. On the displayed My Credentials page, click the Access Keys tab.

4. Click Create Access Key to create a key. Enter the password and verification code as prompted. The browser automatically downloads the credentials.csv file. The file is in CSV format and separated by commas (,). In the file, the middle part is AK and the last part is SK.

● You do not need to set service parameters.

Figure 2-20 Parameter configuration of the Preprocess job
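The notes above can be combined into a small helper that reads the AK/SK from credentials.csv, checks the time range, and assembles the Parameters string. The following Python sketch is a minimal illustration under stated assumptions: the key is taken from the last non-empty row of the file, the sample header text is hypothetical, the time range is treated as strictly before and after the job date, and the output directory and start date are assumed to be separate arguments. Function names are illustrative.

```python
import csv
import io
from datetime import date


def read_ak_sk(csv_text):
    """Return (AK, SK) from the text of credentials.csv.

    Assumes the last non-empty row holds the key and that its middle
    field is the AK and its last field is the SK.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows or len(rows[-1]) < 3:
        raise ValueError("expected a row with at least three comma-separated fields")
    key_row = rows[-1]
    return key_row[len(key_row) // 2].strip(), key_row[-1].strip()


def preprocess_parameters(bucket, ak, sk, start, end, job_date):
    """Build the Parameters string for the Preprocess job.

    Raises ValueError unless the range starts before and ends after
    job_date (strict comparison is an assumption of this sketch).
    """
    if not (start < job_date < end):
        raise ValueError("time range must include the job creation date")
    return (
        "com.huawei.mrs.preprocess.Preprocess "
        f"-Dfs.s3a.access.key={ak} -Dfs.s3a.secret.key={sk} "
        f"s3a://{bucket}/input/ /tmp/output/ "
        f"{start.isoformat()} {end.isoformat()} 0 "
        f"s3a://{bucket}/pro/co.csv"
    )


# Hypothetical file content; real keys must never be committed or shared.
sample_csv = "User Name,Access Key Id,Secret Access Key\nalice,EXAMPLEAK,EXAMPLESK\n"
ak, sk = read_ak_sk(sample_csv)
params = preprocess_parameters(
    "mrscardata11", ak, sk,
    date(2018, 6, 10), date(2019, 12, 30), job_date=date(2018, 6, 15),
)
```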

Step 6 After the Preprocess job is executed, you can view that the job status is Completed and the execution result is Successful in the job list. See Figure 2-21.

Figure 2-21 Successful job execution

Step 7 Repeat Step 5 to create a job named Traffic_statics to analyze checkpoint traffic. See Figure 2-22.

● Set Type to MapReduce.

● Set Name to Traffic_statics.

● Set Program Path to the path of the mapreduce-example-normal.jar program on OBS, for example, s3a://mrscardata11/program/mapreduce-example-normal.jar.

● In Parameters, enter the following information:

com.huawei.mrs.portflowstatistics.FlowCollector -Dfs.s3a.access.key=xxxxxx -Dfs.s3a.secret.key=xxxxxx /tmp/output/ s3a://mrscardata11/output/

The OBS bucket name in the s3a://mrscardata11/output/ parameter must be replaced with the name of the bucket you create.

● You do not need to set service parameters.

Figure 2-22 Parameter configuration of the Traffic_statics job

Step 8 After the Traffic_statics job is executed, you can view that the job status is Completed and the execution result is Successful in the job list. See Figure 2-23.

Figure 2-23 Successful job execution

Step 9 Repeat Step 5 to create a job named Movement_path to analyze vehicle travel paths. See Figure 2-24.

● Set Type to MapReduce.

● Set Name to Movement_path.

● Set Program Path to the path of the mapreduce-example-normal.jar program on OBS, for example, s3a://mrscardata11/program/mapreduce-example-normal.jar.

● In Parameters, enter the following information:

com.huawei.mrs.pathflowstatistics.FrontEndCollector -Dfs.s3a.access.key=xxxxxx -Dfs.s3a.secret.key=xxxxxx /tmp/output/ s3a://mrscardata11/frontendcollector/

The OBS bucket name in the s3a://mrscardata11/frontendcollector/ parameter must be replaced with the name of the bucket you create.

● You do not need to set service parameters.

Figure 2-24 Parameter configuration of the Movement_path job

Step 10 After the Movement_path job is executed, you can view that the job status is Completed and the execution result is Successful in the job list. See Figure 2-25.

Figure 2-25 Successful job execution

----End
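The three jobs in this procedure form a simple pipeline: Preprocess appears to write intermediate data to /tmp/output/, and the two analysis jobs read from that directory and write to OBS. The following Python sketch records that data flow and checks the run order. The Job tuple and function names are illustrative, and treating /tmp/output/ as the Preprocess output directory is an inference from the job parameters, not something this document states outright.

```python
from collections import namedtuple

Job = namedtuple("Job", "name main_class source target")


def analysis_jobs(bucket):
    """Return the three MapReduce jobs in the order they are run above."""
    return [
        Job("Preprocess", "com.huawei.mrs.preprocess.Preprocess",
            f"s3a://{bucket}/input/", "/tmp/output/"),
        Job("Traffic_statics", "com.huawei.mrs.portflowstatistics.FlowCollector",
            "/tmp/output/", f"s3a://{bucket}/output/"),
        Job("Movement_path", "com.huawei.mrs.pathflowstatistics.FrontEndCollector",
            "/tmp/output/", f"s3a://{bucket}/frontendcollector/"),
    ]


def check_order(jobs, uploaded):
    """Verify that every job reads an uploaded path or an earlier job's target."""
    available = set(uploaded)
    for job in jobs:
        if job.source not in available:
            raise ValueError(
                f"{job.name} reads {job.source}, which nothing has produced yet"
            )
        available.add(job.target)
    return True


jobs = analysis_jobs("mrscardata11")
ordered_ok = check_order(jobs, ["s3a://mrscardata11/input/"])
```

Running the analysis jobs before Preprocess has completed would fail this check, which is why each step above waits for the previous job to finish.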

2.2.8 Displaying Data and Reports on the Web UI

Step 1 Purchase a Windows ECS and bind it to an EIP.

The region and AZ of the Windows ECS are AP-Hong Kong and AZ2, respectively. The VPC of the Windows ECS must be the same as the VPC of the MRS streaming cluster. For details, see Purchasing and Logging In to a Windows ECS.

This document uses AP-Hong Kong as an example. If you want to perform operations in other regions, ensure that all operations are performed in the same region.

Step 2 Log in to the Windows ECS remotely, and download and install JDK 8 and Tomcat 7 on Windows.

After JDK 8 is installed, you need to set environment variables.

Step 3 Download and copy the hccDemo.war package to the /webapps directory in the Tomcat installation path.

Download link of the hccDemo.war package: https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/hccDemo.war

Step 4 Start Tomcat to automatically decompress the hccDemo.war package to the webapps directory. The hccDemo directory is automatically generated.
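The copy in Step 3 can also be scripted. The following Python sketch copies the WAR package into the webapps directory of a Tomcat installation. The function name is illustrative, and the Tomcat installation path depends on where you installed Tomcat, so it is passed in rather than assumed.

```python
import shutil
from pathlib import Path


def deploy_war(war_file, tomcat_home):
    """Copy the WAR package into <tomcat_home>/webapps and return the new path."""
    webapps = Path(tomcat_home) / "webapps"
    if not webapps.is_dir():
        raise FileNotFoundError(f"{webapps} does not exist; check the Tomcat path")
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(war_file, webapps))
```

After the copy, starting Tomcat unpacks the package as described in Step 4.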

Step 5 Go to the webapps/hccDemo/WEB-INF/classes directory and modify the values of the configuration items in the app.conf file to the actual values. The comments in the app.conf file provide details about the configuration items.

The configuration items to be modified are fs.s3a.endpoint, ak, sk, queue, as well as ip1, ip2, and ip3 in kafka.bootstrap.servers.

# Endpoint corresponding to the OBS bucket storing the output of the MapReduce job for offline vehicle analysis. You can view the endpoint on the basic information page of the OBS bucket.
fs.s3a.endpoint=xx.xx.xx.xx
fs.s3a.path.style.access=true
fs.s3a.connection.ssl.enabled=false
# AK and SK
ak=xxxxxx
sk=xxxxxxxxxxxxxxxxxxxxxxxxxxx
# OBS path storing the output of the MapReduce job for offline vehicle travel path analysis
car.movemement.path=/frontendcollector/carmovement
# Name of the OBS bucket storing the output of the MapReduce job for offline vehicle travel path analysis
queue=cartest
# OBS path storing the output of the MapReduce job for offline checkpoint traffic analysis
traffic.stat.path=/output/flowsum
# OBS path storing the output of the MapReduce job for offline checkpoint traffic analysis
traffic.stat.time.interval.path=/output/flow

#Traffic stat configuration
# Checkpoint traffic analysis: High traffic threshold
traffic.stat.high=52354
# Checkpoint traffic analysis: Medium traffic threshold
traffic.stat.medium=45383
#set -1 to fetch all the records
traffic.total.record=1000
#Car movement configuration
# Vehicle travel path analysis: High threshold
car.movement.high=24239
# Vehicle travel path analysis: Low threshold

car.movement.low=18979
#set -1 to fetch all the records
car.movement.total.record=1000
#Hight Traffic report configuration
# Real-time traffic monitoring and analysis: High traffic threshold
high.traffic.high=25
# Real-time traffic monitoring and analysis: Low traffic threshold
high.traffic.medium=10
#General Configurations
# File of mappings between longitude and latitude coordinates and checkpoint names
coordinate.file=coordinate-shanghai-with-name.csv
# Indicates whether to use the hard-coded data created by yourself.
use.dummy.data=false
//kafka or dis
# kafka. Indicates whether real-time analysis data is through dis or kafka. The current value is kafka.
real.time.data.provider=kafka
#Kafka related changes
# Broker list of Kafka, separated by commas (,). Ensure that the security group rule of the corresponding port is enabled.
kafka.bootstrap.servers=ip1:9092,ip2:9092,ip3:9092
# Topic storing data of fake plate vehicles
kafka.fakeCar.topic=fake_car_output
# Topic storing checkpoint traffic data
kafka.highTraffic.topic=sum_car_output
kafka.fakeCar.partition=0
kafka.highTraffic.partition=0
kafka.highTraffic.max.record=1000
kafka.fakeCar.max.record=1000
kafka.poll.timeout=500
kafka.batchNumber=100
#property starting with kafka.property will be used while constructing KafkaConsumer
kafka.property.group.id=kafka
kafka.property.enable.auto.commit=true
kafka.property.auto.commit.interval.ms=1000
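Before restarting Tomcat, it can help to confirm that none of the items to be modified still holds a placeholder. The following Python sketch parses key=value lines from app.conf and reports the required items that are missing or unchanged. The function names and the placeholder list are assumptions based on the sample file above. The check cannot tell whether queue still holds the example bucket name cartest, so verify that value manually.

```python
PLACEHOLDERS = ("xx.xx.xx.xx", "xxxxxx", "ip1", "ip2", "ip3")
REQUIRED_KEYS = ("fs.s3a.endpoint", "ak", "sk", "queue", "kafka.bootstrap.servers")


def parse_conf(text):
    """Parse key=value lines, ignoring blank lines and # or // comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def unfinished_items(conf):
    """Return required keys that are missing or still hold a placeholder."""
    problems = []
    for key in REQUIRED_KEYS:
        value = conf.get(key, "")
        if not value or any(p in value for p in PLACEHOLDERS):
            problems.append(key)
    return problems


# Hypothetical, partially edited configuration used only for illustration.
sample = (
    "fs.s3a.endpoint=xx.xx.xx.xx\n"
    "ak=xxxxxx\n"
    "sk=EXAMPLESECRET\n"
    "# Name of the OBS bucket\n"
    "queue=mrscardata11\n"
    "kafka.bootstrap.servers=ip1:9092,ip2:9092,ip3:9092\n"
)
remaining = unfinished_items(parse_conf(sample))
```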

Step 6 Restart Tomcat.

Step 7 Enter http://{EIP}:8080/hccDemo in the address box of the browser to access the homepage of Big Data in Smart Transportation. See Figure 2-26.

Figure 2-26 Homepage of Big Data in Smart Transportation
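If the page does not load, a quick scripted check can show whether Tomcat is reachable at all. The following Python sketch builds the homepage URL from the EIP and probes it with the standard library. The function names are illustrative, and 192.0.2.10 is a documentation-only address standing in for your EIP.

```python
import urllib.error
import urllib.request


def homepage_url(eip, port=8080, context="hccDemo"):
    """Build the URL of the demo homepage from the EIP of the Windows ECS."""
    return f"http://{eip}:{port}/{context}"


def is_reachable(url, timeout=5):
    """Return True if an HTTP GET on url succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


url = homepage_url("192.0.2.10")  # placeholder EIP, replace with your own
```

If is_reachable returns False from outside the VPC, check that the security group of the ECS allows inbound traffic on port 8080.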

----End
