HIRP OPEN 2016 Big Data & Artificial Intelligence
Call for Proposals
Big Data & Artificial Intelligence
HIRP OPEN 2016
Copyright © Huawei Technologies Co., Ltd. 2015-2016. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.
Trademarks and Permissions
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective holders.
Confidentiality
All information in this document (including, but not limited to, interface protocols, parameters, flowcharts, and formulas) is the confidential information of Huawei Technologies Co., Ltd and its affiliates. Any and all recipients shall keep this document in confidence with the same degree of care as used for their own confidential information and shall not publish or disclose it, wholly or in part, to any other party without Huawei Technologies Co., Ltd's prior written consent.
Notice
Unless otherwise agreed by Huawei Technologies Co., Ltd, all the information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute the warranty of any kind, express or implied.
Distribution
Without the written consent of Huawei Technologies Co., Ltd, this document cannot be distributed except for the purpose of Huawei Innovation R&D Projects and only among those who have participated in Huawei Innovation R&D Projects.
Application Deadline: 09:00 A.M., 18th July, 2016 (Beijing Standard Time, GMT+8).
If you have any questions or suggestions about HIRP OPEN 2016, please send an email to [email protected]. We will reply as soon as possible.
Catalog
HIRPO20160601: Large Scale Heterogeneous Data Processing
HIRPO20160602: Research on Techniques for Financial Anti-Fraud System
HIRPO20160603: Research on Anomaly Detection for Multiple Dimensional Data
HIRPO20160604: Low Latency Storage for Stream Data
HIRPO20160605: Research on SDN&NFV Network Maintenance System Architecture and Technology
HIRPO20160606: Novel Algorithm Design and Use Cases for Data Stream Mining based on StreamDM
HIRPO20160607: Communication Network Model Research based on AI Technique
HIRPO20160608: Deep Learning based Robotic Perception
HIRPO20160609: Deep Learning based Human Visual Characteristics Research
HIRPO20160610: Deep Learning based Scene Understanding
HIRPO20160611: Manufacture Quality Risk Analysis & Prediction based on Test Data
HIRPO20160612: Behavior Analytics for Personalized Mobile Services
HIRPO20160601: Large Scale Heterogeneous Data
Processing
1 Theme: Big Data & Artificial Intelligence
2 Subject: resource scheduling
3 Background
Big data analytics has become a necessity for businesses worldwide.
Modern data centers host huge volumes of data, stored across a large
number of nodes with multiple storage devices, and process it using
thousands of cores. Cloud computing has been revolutionizing the IT
industry by adding flexibility to the way IT is consumed, enabling
organizations to pay only for the resources and services they use. In an
effort to reduce IT capital and operational expenditures, organizations of
all sizes are using clouds to provide the resources required to run their
applications. Organizations are using cloud platforms to run scalable
analyses on their data, gaining insight into the health of their systems and
the activities of their customers.
To improve the performance and cost-effectiveness of a data analytics
cluster in the cloud, the data analytics system should account for the
heterogeneity of the environment and workloads. Data analytics workloads
have heterogeneous resource demands: some workloads are CPU-intensive
whereas others are I/O-intensive. Some may be able to use special
hardware like GPUs to achieve dramatic performance gains.
It is also likely that the computing environment is heterogeneous. The cloud
consists of generations of servers with different capacities and performance;
therefore, various configurations of machines will be available. For example,
some machines are more suitable for storing large data whereas others run
computations faster.
The key question is how to schedule jobs on machines so that each receives
its “fair” share of resources to make progress while providing good
performance.
4 Scope
Possible research topics include:
Resource modeling, including:
o Support for describing the resources available in a heterogeneous environment;
o Support for describing resource requests in a heterogeneous environment;
o Support for mapping resources to resource requests;
o Support for describing data locality information;
o Support for describing environment topology information.
Resource information collection, including:
o Support for plug-and-play resources in a heterogeneous environment, e.g. a new resource type can be automatically recognized during resource information collection;
o Automatic collection of resource information in a heterogeneous environment;
o Automatic schema construction;
o Support for large scale (10K nodes).
Basic resource scheduling, including:
o Support for associating cost properties with resources;
o Support for associating performance/capability properties with resources;
o Support for a Cost Based Optimizer model for resource scheduling;
o Support for a fairness-based model for resource scheduling.
Intelligent resource scheduling, including:
o Prediction of workload resource consumption and runtime in a heterogeneous environment, for instance when running a workload on bare metal vs. VMs, ARM vs. x86, or private vs. public cloud, or with different allocation sizes, performance levels, etc.;
o SLA expressions that translate into workload resource and runtime demands, using business objectives rather than static resource requirements to express goals;
o Intelligent scheduling that chooses the best allocation by balancing multiple objectives (SLA in terms of time and performance, cost, etc.) in a heterogeneous environment.
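To make the cost-based scheduling idea above concrete, here is a minimal, hypothetical sketch of matching a resource request to heterogeneous machines. All machine specs, price constants, and function names are illustrative assumptions, not part of any Huawei system:

```python
# Hypothetical sketch of cost-based matching between a resource request
# and heterogeneous machines; names, specs, and prices are illustrative.

def feasible(machine, request):
    """A machine is feasible if it satisfies every requested capacity."""
    return all(machine.get(k, 0) >= v for k, v in request.items())

def cost_score(machine, price_per_core=1.0, price_per_gb=0.1):
    """Lower is better: we pay for what the machine offers, not what we
    need, so oversized machines are penalized (a crude Cost Based
    Optimizer)."""
    return machine["cpu"] * price_per_core + machine["mem_gb"] * price_per_gb

def schedule(machines, request):
    """Pick the cheapest feasible machine for the request, or None."""
    candidates = [m for m in machines if feasible(m, request)]
    return min(candidates, key=cost_score, default=None)

machines = [
    {"name": "arm-small", "cpu": 4,  "mem_gb": 8,   "gpu": 0},
    {"name": "x86-big",   "cpu": 32, "mem_gb": 128, "gpu": 0},
    {"name": "gpu-node",  "cpu": 16, "mem_gb": 64,  "gpu": 2},
]
best = schedule(machines, {"cpu": 4, "mem_gb": 8})
```

A real scheduler would also weigh fairness, data locality, and SLA objectives as listed above; this sketch only shows the feasibility-then-cost skeleton.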
5 Expected Outcome and Deliverables
Demo system of a resource management/scheduling system for a
heterogeneous environment;
Simulation tool for performance evaluation;
One or more patents.
6 Acceptance Criteria
The demo system should support complex resource types in a heterogeneous
environment and provide a way to add new resource types to the system
without code changes;
The system design must support a 10K-node scale, which can be
demonstrated via simulation;
One or more patent ideas accepted by Huawei.
7 Phased Project Plan
Expected project duration: 1 year.

Project Phase | Duration | Content | Objective | Output
Phase 1 | ~6 months | Technical analysis and solution design | Finish solution design | Solution design documents
Phase 2 | ~6 months | Demo system implementation | The enterprise Hadoop cluster can use public cloud computing resources smoothly | Prototype demo
HIRPO20160602: Research on Techniques for
Financial Anti-Fraud System
1 Theme: Big Data & Artificial Intelligence
2 Subject: financial fraud detection technology
3 Background
Financial fraud is a broad term with various potential meanings, but for our
purposes it can be defined as the intentional use of illegal methods or practices
for the purpose of obtaining financial gain. Credit card fraud is one of the most
common forms of financial fraud, and it takes many types, including
never-received fraud, account take-over fraud, lost-or-stolen fraud, counterfeit
fraud, and card-not-present fraud. In addition, fraudsters are inventive,
fast-moving people. They continually refine their methods, and as such there is
a requirement for detection methods to be able to evolve accordingly.
Therefore, detecting financial fraud is a difficult task.
Traditional anti-fraud methods are mostly based on rules defined by human
analysts who investigate cases using their intuition, experience, and domain
knowledge. But human rules require significant effort to keep up with the
fast-moving patterns (concept drift) of fraudulent activity by modifying existing
rules or adding new ones. Thus, machine learning based methods, which can
adaptively combat concept drift, have become important in both academic and
business organizations.
4 Scope
Problem to be resolved
Identify fraudulent credit card transactions using machine learning based
fraud detection methods. Two techniques should be studied:
P1: Anomaly detection techniques to identify abnormal transactions
from different aspects, e.g. account, equipment, location,
behavior, relationship, preference;
P2: Predictive modeling techniques based on the labeled data to
identify the fraudulent transactions, including, but not limited to, neural
networks, logistic regression, deep learning, ensemble methods.
Additionally, these techniques should handle three characteristics of
the data: concept drift, class imbalance, and cost-sensitivity.
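To illustrate how cost-sensitivity and class imbalance can be handled together, here is a minimal sketch of a cost-weighted logistic regression trained with SGD. The tiny dataset, the cost weights, and all names are illustrative assumptions, not Huawei's detection system:

```python
import math

# Hypothetical sketch for P2: a cost-sensitive classifier for imbalanced
# fraud data. Dataset, cost weights, and hyperparameters are illustrative.

def train(data, cost_pos=10.0, cost_neg=1.0, lr=0.1, epochs=200):
    """SGD on cost-weighted logistic loss: cost_pos > cost_neg makes a
    missed fraud (positive class) more expensive than a false alarm."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            cost = cost_pos if y == 1 else cost_neg
            g = cost * (p - y)                  # weighted log-loss gradient
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def score(w, b, x):
    """Fraud probability for one transaction: a dot product plus a sigmoid,
    cheap enough to fit a 20 ms budget even at 100 dimensions."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# One fraud among twenty legitimate transactions (class imbalance):
data = [([1.0, 0.9], 1)] + [([0.0, 0.0], 0), ([0.2, 0.1], 0)] * 10
w, b = train(data)
```

The same weighting idea carries over to the neural networks, deep learning, and ensemble methods named above; concept drift would additionally require periodic or incremental retraining.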
5 Expected Outcome and Deliverables
Design documents and a prototype demo of the solution meeting the
acceptance criteria in section 6.
6 Acceptance Criteria
For P1 and P2, scoring should finish within 20 milliseconds (ms)
for one transaction with 100 dimensions;
For P2, true positive rate > 90%; false positive rate < 5%; gross loss
rate < 0.2%.
7 Phased Project Plan
Expected project duration: 1 year.

Project Phase | Duration | Content | Objective | Output
Phase 1 | ~4 months | Finish tasks of P1 | (1) Scoring finished within 20 ms for one transaction with 100 dimensions | Algorithm design documents, prototype demo, and the code
Phase 2 | ~8 months | Finish tasks of P2 | (1) Scoring finished within 20 ms for one transaction with 100 dimensions; (2) true positive rate > 90%; false positive rate < 5%; gross loss rate < 0.2% | Algorithm design documents, prototype demo, and the code
HIRPO20160603: Research on Anomaly Detection for
Multiple Dimensional Data
1 Theme: Big Data & Artificial Intelligence
2 Subject: data anomaly detection
List of Abbreviations
DataCompass: Splunk-like platform for analyzing machine data
HDD: Hard Disk Drive
SVM: Support vector machine, a machine learning method
3 Background
Traditional tools are unable to handle the 3Vs (variety, velocity, volume) of
machine data, i.e. logs, configuration, and message queues; thus
DataCompass was designed. One of its important features is support for
interactive data analysis, which means DataCompass must have real-time or
near-real-time capability. DataCompass will be applied to the operation of a
public cloud (Deutsche Telekom). DataCompass requires statistical or
machine learning methods to automatically detect anomalies in the data and
reduce human effort.
This is not the only application domain that could benefit from such a
mechanism. Examples of such application domains are: banking, where
detecting payment behavior that deviates from normal customer patterns can
indicate fraud or money laundering schemes; and hardware failure, where
observing that certain metrics are indicators of near failure can predict that
physical machines in a data center will crash (e.g. HDD temperature
increasing continuously for over an hour might indicate that an HDD crash is
imminent).
From the point of view of analytics tools, the current state of the art is that
anomaly detection methods able to support interactive analysis focus only
on low-dimensional data. Methods that can handle high-dimensional data,
such as k-means or one-class SVM, incur a large computational cost and do
NOT have real-time or near-real-time capability. Therefore, new mechanisms
are needed to provide online detection of anomalies and abnormal patterns.
4 Scope
Finding patterns in multi-dimensional data that do not conform to expected
behavior, in real time or near real time, especially for high-dimensional data.
P1: Anomaly detection for 1-D time series.
P2: Anomaly detection for low-dimensional data (fewer than 100
dimensions), both without and with an order. Note that ordered
multi-dimensional data is also known as multiple time series.
P3: Anomaly detection for high-dimensional data (more than 100
dimensions), both without and with an order.
P4: Online anomaly detection for data with fewer than 20 dimensions.
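As a toy illustration of the online, per-point scoring that P1 and P4 call for, here is a minimal sliding-window z-score detector. The window size, warm-up length, and threshold are illustrative assumptions, not values from DataCompass:

```python
import math
from collections import deque

# Hypothetical sketch of online anomaly scoring for a 1-D time series (P1):
# maintain mean/variance over a sliding window and flag points whose
# z-score exceeds a threshold. Parameters are illustrative only.

class OnlineZScoreDetector:
    def __init__(self, window=50, threshold=3.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def score(self, x):
        """Return (is_anomaly, z). Cost is O(window) per point, so a single
        call comfortably fits a 10 ms scoring budget."""
        if len(self.buf) < 5:              # not enough history yet: warm up
            self.buf.append(x)
            return False, 0.0
        mean = sum(self.buf) / len(self.buf)
        var = sum((v - mean) ** 2 for v in self.buf) / len(self.buf)
        z = abs(x - mean) / math.sqrt(var + 1e-9)
        self.buf.append(x)
        return z > self.threshold, z

det = OnlineZScoreDetector()
values = [10 + 0.1 * (i % 3) for i in range(60)] + [10.1, 50.0]
flags = [det.score(v)[0] for v in values]      # only the final spike flags
```

Higher-dimensional variants (P2~P3) would replace the scalar z-score with, e.g., a Mahalanobis distance or a streaming clustering model, at correspondingly higher model-building cost.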
5 Expected Outcome and Deliverables
Design documents and a prototype demo of the solution meeting the
acceptance criteria in section 6.
6 Acceptance Criteria
For P1~P3, model building (learning) should finish within 1 s, 2.5 s,
and 5 s, respectively;
For P1~P4, scoring should finish within 10 milliseconds (ms);
For P1~P4, on data with known anomalies, true positive rate > 90%
and false positive rate < 5%.
7 Phased Project Plan
Expected project duration: 1 year.

Project Phase | Duration | Content | Objective | Output
Phase 1 | ~4 months | Finish tasks of P1 and P2 | (1) Model building finished within 1 and 2.5 seconds for P1 and P2, respectively; (2) scoring finished within 10 ms per record; (3) true positive rate > 90%; false positive rate < 5% | Algorithm design documents, prototype demo, and the code
Phase 2 | ~8 months | Finish tasks of P3 and P4 | (1) Model building finished within 5 seconds for P3; (2) scoring finished within 10 ms per record for both P3 and P4; (3) true positive rate > 90%; false positive rate < 5% | Algorithm design documents, prototype demo, and the code
HIRPO20160604: Low Latency Storage for Stream Data
1 Theme: Big Data & Artificial Intelligence
2 Subject: research on Hadoop HDFS and related
stream tools from the Big Data ecosystem
List of Abbreviations
HDFS: Hadoop Distributed File System, the de facto storage used for Big
Data processing.
Kafka: The message broker most commonly used for streaming.
RAMCloud: A storage system designed for super-high-speed storage for
large-scale datacenter applications.
3 Background
An increasing number of Big Data applications and scenarios need to deal
with increasing amounts of small data. This trend is easily observed in
domains like finance, weather forecasting, IoT, insurance, and social
networks. In many related applications and systems, such small items are
continuously collected from stream sources or received from other stream
processing computations. Even if the stream engines running the
applications process such stream data on the fly, by passing it through a
topology of stream operators, there is an increasing need to store such items
efficiently.
Unlike traditional storage, the main challenge raised when aiming to store
stream data is the large number of small items. Additionally, as the
computation required for such data is time-critical, the stored data needs to
be accessible with very high performance (i.e., low latency). This makes the
existing storage options, such as the ones available in the Hadoop
ecosystem, mostly unfit for such scenarios, as they cannot fully meet all
performance requirements out of the box. HDFS, the default Big Data
storage, was not designed as stream storage and thus cannot provide the
sub-second performance required by such applications. Using other solutions
that target streaming leads to hybrid architectures, which introduce extra
dependencies, increase O&M complexity, and can lead to less reliable
solutions. Moreover, some of these stream data management solutions, such
as Kafka, are not proper storage systems (e.g., they can only hold data
temporarily) and provide limited data access semantics (e.g., accessing data
only by its offset), which drastically reduce search performance.
All these issues point to the need for a dedicated low latency stream storage
solution. Such a solution should provide, on the one hand, traditional storage
functionality and, on the other hand, stream-like performance (i.e., low
latency IO access to items or ranges of items). This shows the necessity of
an extensive research study to explore architectural options for providing
such storage for stream data, either as a standalone component or as an
extension of an existing Big Data solution (e.g., HDFS). The latter option is
preferred considering the benefits of having a unified storage system for all
types of Big Data.
4 Scope
The goal of the project is to provide a low latency storage solution for stream
data. The ideal solution would propose an extension to HDFS that would give
the system high performance when storing stream data (e.g., billions of small
items) and when accessing the data either via scans or random access (e.g.,
millisecond or ideally nanosecond access time to retrieve data). This is
shown in Figure 1. To achieve this, we expect good practices from other
systems with demonstrated performance to be ported to HDFS (alongside
novel techniques where needed), such as Kafka's partitioning and ability to
deal with billions of items, and the RamCloud and DXRam techniques of
managing data in distributed caches to enable ns access times across large
collections of items. Additionally, as Big Data processing is quite often
migrated to the cloud, the solution must be compatible with such
infrastructures, able to provide performance similar to that in Big Data
clusters, and ultimately usable as a service.
The main research questions, issues and requirements to be addressed are:
How can HDFS be extended to support billions of small items?
How can it be tuned to enable high performance data access in large
collections of small items (e.g. ms or ns IO access for scans, range
queries and random access)?
How can the performance be guaranteed at increasing scales?
Can the stream storage (add-on) work as an application library to share
data across the nodes of a distributed application, as well as
standalone or within HDFS, and be accessible from other applications
(e.g. RMC, REST, API…)?
What are the best partitioning techniques, data placement and search
strategies for stream storage?
Can the solution work as a cloud service with the same or similar
performance?
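To make the partitioning and range-query questions above concrete, here is a toy in-memory sketch combining Kafka-style hash partitioning with a sorted per-partition index. It is purely illustrative (class names and the partition count are invented) and is an in-memory model, not an HDFS extension:

```python
import bisect

# Hypothetical sketch of a partitioned store for billions of small stream
# items: hash partitioning for write scalability plus a sorted per-partition
# key index so both random access and range queries stay fast.

class StreamStore:
    def __init__(self, partitions=4):
        self.partitions = partitions
        self.keys = [[] for _ in range(partitions)]   # sorted index per partition
        self.data = [dict() for _ in range(partitions)]

    def _part(self, key):
        return hash(key) % self.partitions

    def put(self, key, value):
        p = self._part(key)
        if key not in self.data[p]:
            bisect.insort(self.keys[p], key)          # keep index sorted for scans
        self.data[p][key] = value

    def get(self, key):
        """Random access: O(1) average via the per-partition hash map."""
        return self.data[self._part(key)].get(key)

    def range(self, lo, hi):
        """Range query: binary-search each partition's index, then merge."""
        out = []
        for p in range(self.partitions):
            i = bisect.bisect_left(self.keys[p], lo)
            j = bisect.bisect_right(self.keys[p], hi)
            out.extend((k, self.data[p][k]) for k in self.keys[p][i:j])
        return sorted(out)

store = StreamStore()
for i in range(100):
    store.put(f"item-{i:03d}", i)
```

A real solution would persist partitions on data nodes, replicate them, and keep the indexes in distributed caches (in the RamCloud/DXRam spirit); the sketch only shows why richer access semantics than Kafka's offset-only model matter for search performance.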
Figure 1. Overview of the various features to be added to HDFS to properly support stream storage.
5 Expected Outcome and Deliverables
The expected outcomes for the project are:
1) Evaluation study (i.e., in the form of a scientific paper) of existing Big Data
tools, their ability to support stream storage, and the performance of their
IO operations. The goal is to identify the best practices and techniques to
tackle the requirements for stream storage;
2) Architecture options for extending HDFS for stream storage while providing
low latency IO for scans, range queries and random access. The
architecture should be scalable to a variable number of data nodes;
3) The system should be designed to run at scale. As such, it needs to support
multiple metadata nodes (i.e., namenodes for HDFS). Additionally, as the
number of data nodes increases, the system should allow an increase in the
number of supported objects (e.g., 2x data nodes means 2x more objects)
with the same performance targets;
4) A library or a software system that implements the architecture of point 2;
5) If point 2 cannot be met, a new stand-alone solution should be provided
that can run both as a standalone solution and as an application library for
storing stream data, together with a strong scientific argument for why
HDFS cannot be extended to such scenarios.
6 Acceptance Criteria
The acceptance criteria with respect to the outcomes are:
1) The study meets the academic norms of paper/technical writing and
research analysis. The study considers at least 4 existing solutions
(e.g., Kafka, Kudu, HBase, DXRam, RamCloud, redis.io…) and
identifies limitations as well as best practices and techniques to be
used for stream storage;
2) The architecture is HDFS compatible, is able to support a billion+
objects, and enables at most millisecond access to elements for range
queries and random access. A description of the data partitioning
techniques and search strategies to be used, which are compatible
with HDFS. The solution can work with systems that have 1, 2 or more
namenodes;
3) A demo system implementing the architecture described in point 2,
which demonstrates the performance of storing at least 1 billion
objects and enables ms access for random access and range queries.
The system should demonstrate scalability by running the demo on
various setups (5, 10 and more nodes if available);
4) A thorough and strong argument for why point 2 cannot be met (why
HDFS cannot be extended to support stream storage), plus an
alternative design solution to point 2 that meets all the performance
requirements and can work both as a standalone solution and as an
application library;
5) The solution is demonstrated to work also on cloud platforms and as a
service (stream storage as a service).
7 Phased Project Plan
Expected project duration: 1 year.

Project Phase | Duration | Content | Objective | Output
Phase 1 | ~3 months | Evaluation of existing tools (Kafka, Kudu, Redis, RamCloud, DXRam) | Identify limitations of existing solutions; identify the best architectural options for stream storage | Report; architecture design guidelines
Phase 2 | ~2 months | Architecture design of stream storage | 1. Identify architecture options for HDFS to support both batch and stream storage; 2. If 1 cannot be met, provide an alternative solution; 3. If 2, identify solutions for running it both as a stand-alone service and as an application library | System architecture
Phase 3 | ~7 months | The storage for stream data system | Implement the system; implement Java connectors/APIs; implement external connectors (outside application domains) | The stream storage; APIs; connectors
HIRPO20160605: Research on SDN&NFV Network
Maintenance System Architecture and Technology
1 Theme: Big Data & Artificial Intelligence
2 Subject: SDN big data fault analysis
3 Background
1) Trend
SDN & NFV techniques introduce a powerful combination of changes that
bring networks into a new age, enabling automatic deployment of network
services, efficient and reliable network operation, and lower CAPEX and
OPEX. Globally, SDN & NFV networks will be widely applied in commercial
deployments from 2016 to 2020. Many carriers and internet companies will
deploy SDN & NFV networks in different scenarios according to their own
plans, and SDN & NFV networking will enter a mature period.
2) Challenges
For an automated SDN&NFV network, the maintenance system must be
automatic, visual, and intelligent. The maintenance system has many
components, such as a data collection system, a data storage & access
system, data visualization, data analysis, fault diagnosis, and fault recovery.
The key techniques are as follows:
Standardized data collection, including measurement data (KPIs) and
description data (system logs), enabling efficient data collection,
pre-processing, and data transformation.
Intelligent fault diagnosis algorithms. The algorithms detect faults and send
alarms when a single network unit fails, locate the faulty device within a
network or a link, check the correctness of configuration, and forecast
possible faults in the network or a device.
Intelligent network recovery system. The system combines expert experience
and machine learning methods, and can recommend recovery solutions when
the network is abnormal.
Unlike traditional networks, SDN&NFV networks are still evolving, and many
potential problems have not yet been exposed in commercial operation. The
goal of this project is to investigate and explore possible network maintenance
techniques based on big data analysis for future carrier networks. The
research will have a profound and valuable impact on evolving network
technology in both industry and academia.
4 Scope
The scope of the project covers two key directions: techniques for collecting
SDN&NFV network running data, and data-based fault diagnosis & recovery
techniques. The content of the research includes, but is not limited to, the
following parts:
1) Data collection:
Definition of standard data formats, including key performance
indicators (KPIs), logs, and configuration;
Design and analysis of the data collection system, including the system
framework, collection techniques, and data transformation techniques;
Design and analysis of the data storage system, including the system
framework, access methods, and data transfer techniques.
2) Data-based fault diagnosis & recovery:
Data-based fault diagnosis techniques at the network unit level;
Data-based fault diagnosis techniques at the network level;
Data-based fault forecasting techniques;
Fault correlation analysis techniques, analyzing the correlation between
different types of faults;
Intelligent fault recovery recommendation techniques, including
experience-based intelligent recommendation.
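As a toy illustration of data-based fault detection at the network-unit level, the sketch below keeps a per-KPI EWMA baseline and raises an alarm on large deviations. The KPI name, the smoothing factor, and the alarm band are all illustrative assumptions, not part of any carrier system:

```python
# Hypothetical sketch of KPI-based fault detection for a network unit:
# an EWMA baseline per KPI with an alarm when the deviation leaves a band.
# KPI names, alpha, and the band width are illustrative only.

class KpiMonitor:
    def __init__(self, alpha=0.2, band=0.5):
        self.alpha, self.band = alpha, band
        self.baseline = {}                  # KPI name -> EWMA of recent values

    def observe(self, kpi, value):
        """Update the baseline; return True if this sample triggers an alarm."""
        base = self.baseline.get(kpi)
        if base is None:                    # first sample seeds the baseline
            self.baseline[kpi] = value
            return False
        alarm = abs(value - base) > self.band * max(abs(base), 1.0)
        if not alarm:                       # only non-alarming samples update
            self.baseline[kpi] = (1 - self.alpha) * base + self.alpha * value
        return alarm

mon = KpiMonitor()
alarms = [mon.observe("cpu_load", v) for v in [0.30, 0.32, 0.31, 0.33, 0.95]]
```

A production system would correlate such per-KPI alarms across units and with log events (the fault correlation analysis above) before raising a network-level diagnosis.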
5 Expected Outcome and Deliverables
The deliverables of the project include, but are not limited to, the following:
1) Research reports on SDN&NFV data format standard definition, data
collection, data storage & access, fault diagnosis & recovery applications,
traffic patterns, scenarios, new carrier network architecture, network security,
etc.;
2) A possible prototype of an SDN&NFV data collection system, data storage &
access system, or fault diagnosis & recovery algorithms;
3) Publications in peer-reviewed journals or top-ranked conferences, and/or
inventions/patents on the SDN&NFV impact on carrier networks and
network-related technology innovation.
6 Phased Project Plan

Stage | Duration | Work description | Output | Evaluation Criteria
1 | ~3 months | Specify milestones; thesis proposal covering the whole research scope; routine technical & work progress meetings. | 1. A determined work plan for what should be done in this project and how to guarantee the success of the collaboration; 2. Thesis proposal; 3. Research report on one or more items described in section 5. | The documents are accepted by Huawei's Review Group.
2 | ~5 months | Continue the research work; academic paper writing; prototype design and coding; routine technical & work progress meetings. | 1. Research report on more items described in section 5; 2. At least one complete academic paper; 3. Prototype design document & source code (partial). | The design documents are accepted by Huawei's Review Group.
3 | ~4 months | Complete the research work; academic papers accepted by journals or top-ranked conferences; implement the prototype for demonstration and verification; routine technical & work progress meetings. | 1. Research report on all items described in section 5; 2. All papers complete; 3. Prototype complete. | 1. Finish the prototype implementation, completing the prototype's coding, testing, verification, and related report; 2. Hold an associated workshop or attend an SDN&NFV-related summit, giving an open talk or demonstration; 3. Papers published in peer-reviewed journals or top-ranked conferences.
HIRPO20160606: Novel Algorithm Design and Use
Cases for Data Stream Mining based on StreamDM
1 Theme: Big Data & Artificial Intelligence
2 Subject: stream mining/real-time machine learning
3 Background
Data mining techniques consume a large amount of resources since they
need many iterations during the learning phase, while data stream mining
techniques use only one pass over the data and are therefore more
challenging. Currently, more and more companies use stream mining to
process large quantities of data in real time and to build incremental models
that help business units. At the end of 2015, Huawei Noah's Ark Lab released
StreamDM, a new real-time machine learning library built on top of Spark
Streaming, including an SGD learner, Naïve Bayes, Hoeffding Tree,
CluStream, StreamKM++ and bagging. The motivation of StreamDM is to give
industry and researchers a fast solution for real-time data mining cases, and
we expect more people to contribute to StreamDM, both algorithms and use
cases.
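StreamDM itself is a Scala library on Spark Streaming; purely to illustrate the one-pass, incremental style its learners share, here is a minimal Python sketch (the class name, learning rate, and toy stream are invented for illustration):

```python
# Illustrative one-pass learner in the stream mining style: each example
# updates the model exactly once and is then discarded, so memory stays
# constant no matter how long the stream runs.

class OnePassSGD:
    """Minimal online perceptron-style SGD learner (illustrative only)."""
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if z >= 0 else 0

    def learn_one(self, x, y):
        """Update on a single example (test-then-train, as in prequential
        evaluation): no second pass over past data is ever needed."""
        err = y - self.predict(x)          # 0 if correct, +/-1 otherwise
        if err:
            self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * err

model = OnePassSGD(2)
stream = [([2.0, 1.0], 1), ([-1.5, -1.0], 0)] * 50   # toy labeled stream
for x, y in stream:
    model.learn_one(x, y)
```

StreamDM's actual learners (Hoeffding Tree, CluStream, etc.) follow this same learn-one-example-at-a-time contract, but are distributed over Spark Streaming micro-batches.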
4 Scope
1) We are seeking proposals of real business scenarios based on StreamDM.
These business scenarios should be challenging and interesting. They can be
deployed by universities or companies, and they will use StreamDM's current
algorithms or new algorithms implemented in and contributed to StreamDM in
the future;
2) We are seeking proposals to implement well-known distributed stream
mining algorithms based on StreamDM. The algorithms can be either
well-known or new, and can be, but are not limited to, classification,
clustering, frequent item mining or regression algorithms. The algorithms
should be distributed and incremental, implemented in and contributed to
StreamDM;
3) This project can accept only a small number of proposals, each with the
same funding. Proposals covering both 1) and 2) are extremely welcome, and
proposals with potential patents will be given extra funding.
5 Expected Outcome and Deliverables
Proposals of real business scenarios should include sample data, documents
and application code that can be contributed to StreamDM on GitHub, plus
performance comparisons with other stream machine learning APIs.
Proposals of algorithm implementations should include documents, code and
test code that can be contributed to StreamDM on GitHub, plus performance
comparisons with similar algorithms implemented in other stream machine
learning APIs.
6 Acceptance Criteria
The project proposal is accepted by Huawei's evaluation team;
The project deliverables are accepted by Huawei's evaluation team;
Documents and code are merged into StreamDM on GitHub.
7 Phased Project Plan
1) Proposals of real business scenarios
Phase 1 (~3 months): Detailed scenario description and use-case solution;
Phase 2 (~6 months): Code and detailed documentation;
Phase 3 (~3 months): Performance test results; pull request opened and merged into StreamDM on GitHub.
2) Proposals of algorithm implementation
Phase 1 (~3 months): Algorithm design documents;
Phase 2 (~6 months): Algorithm code and test code;
Phase 3 (~3 months): Performance test results; pull request opened and merged into StreamDM on GitHub.
If a proposal under 1) or 2) includes a patent, the patent application should be completed before T+9.
HIRPO20160607: Communication Network Model
Research based on AI Technique
1 Theme: Big Data & Artificial Intelligence
2 Subject: architecture and resource management
List of Abbreviations
AI: Artificial Intelligence
3 Background
With the development of machine learning, artificial intelligence has become a hot research area again. Another change in the communication world is that communication is shifting from relationships between humans to relationships between machines (M2M). Networks and their configuration are becoming more and more sophisticated, so AI-based technology is a preferred solution for network measurement and management. To apply this technology, analysis of the communication network model is essential.
4 Scope
Survey the use cases for AI technology in wireless communication networks;
Research communication network models using AI technology;
For a selected use case, provide a detailed algorithm design and analysis;
Verify the effect of the AI algorithm on the communication network.
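To make the "verify the AI algorithm effect" step concrete, here is a deliberately simple, illustrative sketch (not a proposed solution): flagging anomalous KPI samples, such as cell load, against a rolling baseline.

```python
# Illustrative sketch: flag anomalous network KPI samples (e.g., cell load)
# using a rolling mean and standard deviation. A real proposal would use
# far richer AI models; this only shows the measure-then-flag loop.
from collections import deque
from math import sqrt

def detect_anomalies(samples, window=5, k=3.0):
    """Return indices of samples more than k standard deviations from the
    rolling mean of the previous `window` samples."""
    history = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(samples):
        if len(history) == window:
            mean = sum(history) / window
            std = sqrt(sum((h - mean) ** 2 for h in history) / window)
            if std > 0 and abs(x - mean) > k * std:
                anomalies.append(i)
        history.append(x)
    return anomalies
```

For a stable load series with one spike, e.g. `[50, 51, 49, 50, 52, 51, 90, 50]`, the detector flags only the spike; the verification task in this project would measure how such flags affect network management decisions.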
5 Expected Outcome and Deliverables
1 survey report on key AI technologies for wireless communication;
1-2 research reports on key AI technologies, including candidate schemes for the best technology to use in wireless communication;
1 design/analysis and verification report on key AI technologies for wireless communication, such as system architecture;
1-2 patents and 1 publication submission.
6 Acceptance Criteria
Survey Report: comprehensive study of the subject;
Research Report/Design Report: the technical solution can be implemented and clear technological advancement can be demonstrated;
Patent Proposal: patent proposals are evaluated and accepted by Huawei's internal patent evaluation;
Publication: a paper written and submitted to a prestigious conference.
7 Phased Project Plan
Phase 1 (~2 months): Survey of key AI technologies for wireless communication, covering both industry and academia;
Phase 2 (~7 months): Research on key AI technologies for wireless communication, including architecture design, model selection, algorithm design, and so on;
Phase 3 (~3 months): Verification of the proposed architecture and technology.
HIRPO20160608: Deep Learning based Robotic
Perception
1 Theme: Big Data & Artificial Intelligence
2 Subject: computer vision
List of Abbreviations
GPU: Graphics Processing Unit
3 Background
The resurgence of neural networks, most prominently in the form of deep
learning (DL), has recently led to significant technological advances in image
understanding, speech recognition, and natural language processing. In
computer vision, supervised deep learning models of the Convolutional Neural Network (CNN) family have led to significant error reductions on large-scale classification tasks, due to their hierarchical nature and the directness of their feature and classifier learning. Since 2012, deep learning methods, in particular those based on CNNs, have greatly improved performance on traditional computer vision tasks including image classification, object recognition, object detection, edge detection, face recognition, image denoising/super-resolution, image quality assessment, tracking, and event recognition. The rapidly improving availability and accessibility of large-scale Internet images/videos, in particular from mobile platforms, together with GPU-powered massively parallel computing platforms, has greatly facilitated CNN training, making the training of tens of millions of CNN parameters practically feasible (training times range from a few hours to a couple of weeks).
Computer vision technologies have become increasingly mature and are used in real products including self-driving cars, image search, smartphone applications, surveillance and security, robotics, and smart home applications. Internet powerhouse companies have invested heavily in developing deep learning technologies, backed by large commitments of human, machine, and data resources.
It is evident that deep learning technologies have led to the recent
breakthroughs in both academia and industry, creating intelligent products that
greatly enhance the quality of human life. Therefore, continued advances in deep learning and computer vision are a must. Not only will areas like conventional mobile terminals and intelligent monitoring be enhanced by emerging deep learning technology; next-generation products such as household robots and driverless cars are expected to function primarily on the basis of deep learning.
Object detection and image recognition are considered as central problems in
computer vision. They are the building blocks for other complex vision systems
that consist of a suite of individual modules to make a real product. Specifically,
visual object recognition goes beyond determining whether the image contains
instances of certain object categories. It also refers to attributes of objects
such as location, pose and so on, making it a challenging task in computer
vision. Progress in CNN-based methods has sped up the development of image classification, object detection, and semantic segmentation. However, gaps still exist between what many of these methods can do and what real-world situations require, in terms of speed, performance, and power and memory demands. Thus, we are motivated to go deeper into the structure of deep models and optimize object detection and recognition algorithms. In addition, we also hope to transfer knowledge from seen to unseen object classes to improve the adaptability of future perception systems.
4 Scope
Strategic cooperation: Give regular academic and technical reports.
Efficient object detection and recognition: exploit structural properties of neural networks and develop an efficient deep learning based object detection and recognition algorithm that compromises neither speed nor accuracy.
Robot self-learning: explore unsupervised or weakly supervised learning algorithms to raise the intelligence level of robotic perception. For example, solve the unknown-category recognition task commonly encountered in robot scenarios, or enable the robot to learn to guide itself around the house.
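As one concrete building block of the detection pipelines discussed above, the sketch below shows standard greedy non-maximum suppression (NMS), which removes duplicate overlapping detections. It illustrates the kind of post-processing an efficient detector must run quickly; it is not a specific Huawei algorithm.

```python
# Minimal greedy non-maximum suppression (NMS), a standard post-processing
# step in object detection pipelines. Boxes are (x1, y1, x2, y2, score).

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it too much,
    and repeat with the remainder."""
    keep = []
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    while boxes:
        best = boxes.pop(0)
        keep.append(best)
        boxes = [b for b in boxes if iou(best, b) < iou_threshold]
    return keep
```

For example, of two heavily overlapping boxes with scores 0.9 and 0.8 plus one distant box, NMS keeps the 0.9 box and the distant one.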
5 Expected Outcome and Deliverables
Establish technology accumulation, research capabilities, and algorithm systems for deep learning based object detection and recognition.
1) Software and prototype deliverables: efficient object detection and recognition algorithm; robot self-learning algorithm and application system;
2) Document deliverables: research report on the efficient object detection and recognition algorithm; research report on the robot self-learning algorithm and application system; academic papers and patents;
3) Other work: a monthly teleconference for technical communication and progress briefing; a quarterly technical report.
6 Acceptance Criteria
1) Acceptance criteria: Achieve top performance on popular object detection
and recognition datasets, for example ImageNet and PASCAL VOC;
2) Demo scenario: A typical apartment is shown in the following figure. The room layout includes a living room, kitchen, and bedroom. The living room may contain a coffee table, sofa, TV cabinet, and other furniture; the kitchen a table and stove; the bedroom a bed and other furniture. There may also be objects such as plants and a TV in the living room, and small objects such as beverage cups and mineral water bottles on desks or tables. In some cases people may be moving around the house. The whole area may be larger than 200 m².
7 Phased Project Plan
Phase 1 (~6 months): Objective: develop an efficient object detection and recognition algorithm and achieve top performance on popular object detection and recognition datasets, for example ImageNet and PASCAL VOC. Deliverables: the code, systems, and instructions for the efficient object detection and recognition algorithm; scenario test at Huawei based on the designed demo scheme;
Phase 2 (~6 months): Objective: develop a robot self-learning algorithm and application system. Deliverables: the robot self-learning algorithm and application system; scenario test at Huawei based on the designed demo scheme.
HIRPO20160609: Deep Learning based Human Visual
Characteristics Research
1 Theme: Big Data & Artificial Intelligence
2 Subject: computer vision
List of Abbreviations
GPU: Graphics Processing Unit
3 Background
The resurgence of neural networks, most prominently in the form of deep
learning (DL), has recently led to significant technological advances in image
understanding, speech recognition, and natural language processing. In
computer vision, supervised deep learning models of the Convolutional Neural Network (CNN) family have led to significant error reductions on large-scale classification tasks, due to their hierarchical nature and the directness of their feature and classifier learning. Since 2012, deep learning methods, in particular those based on CNNs, have greatly improved performance on traditional computer vision tasks including image classification, object recognition, object detection, edge detection, face recognition, image denoising/super-resolution, image quality assessment, tracking, and event recognition. The rapidly improving availability and accessibility of large-scale Internet images/videos, in particular from mobile platforms, together with GPU-powered massively parallel computing platforms, has greatly facilitated CNN training, making the training of tens of millions of CNN parameters practically feasible (training times range from a few hours to a couple of weeks).
Computer vision technologies have become increasingly mature and are used in real products including self-driving cars, image search, smartphone applications, surveillance and security, robotics, and smart home applications. Internet powerhouse companies have invested heavily in developing deep learning technologies, backed by large commitments of human, machine, and data resources.
It is evident that deep learning technologies have led to the recent
breakthroughs in both academia and industry, creating intelligent products that
greatly enhance the quality of human life. Therefore, continued advances in deep learning and computer vision are a must. Not only will areas like conventional mobile terminals and intelligent monitoring be enhanced by emerging deep learning technology; next-generation products such as household robots and driverless cars are expected to function primarily on the basis of deep learning.
Research on human visual characteristics includes face detection and recognition, human detection, identification, tracking, behavior recognition, and age estimation. Both for future intelligent products and for entertainment, the study of human visual characteristics shows great value. On one hand, in smart home applications and a series of future scenarios, research on human visual characteristics offers the technical capabilities needed for human-computer interaction, intelligent services, and other high-level applications. On the other hand, the study of human visual characteristics also supplies some entertainment, which can to some extent attract users.
4 Scope
1) Body tracking: establish real-time body tracking technology and research capabilities for intelligent service robots;
2) Face attribute recognition: through the analysis of human attributes in images/videos, such as age, gender, clothing attributes, and expression, provide the necessary functions for high-level application scenarios like smart homes and intelligent robots;
3) Face detection/recognition: establish the capability for face learning and recognition in the home environment;
4) Human behavior recognition: recognize a variety of human behaviors, providing the basic capabilities needed for high-level applications like human-computer interaction and abnormal-behavior warning.
5 Expected Outcome and Deliverables
1) Provide functional modules for face detection and correlation filter tracking to support the human-following feature in the robot demo;
2) Establish technology accumulation, research capabilities, and algorithm systems for deep learning, including face detection, face recognition, human detection, human identification, human tracking, human behavior recognition, age estimation, facial expression recognition, and clothing assessment.
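The correlation filter tracking mentioned in deliverable 1) can be illustrated, in a much simplified form, by normalized cross-correlation template matching. Real correlation-filter trackers (e.g., MOSSE or KCF) operate in the frequency domain and update the filter online; this sketch only shows the underlying matching idea.

```python
# Much-simplified illustration of correlation-based tracking: locate a
# template in a frame by sliding normalized cross-correlation (NCC).
from math import sqrt

def ncc_locate(frame, template):
    """Return (row, col) of the best template match in a 2D grayscale
    frame, both given as lists of lists of floats."""
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    tvals = [v for row in template for v in row]
    tmean = sum(tvals) / len(tvals)
    tnorm = sqrt(sum((v - tmean) ** 2 for v in tvals))
    best, best_pos = -2.0, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = [frame[r + i][c + j] for i in range(th) for j in range(tw)]
            pmean = sum(patch) / len(patch)
            pnorm = sqrt(sum((v - pmean) ** 2 for v in patch))
            if tnorm == 0 or pnorm == 0:
                continue  # flat patch or template: correlation undefined
            score = sum((p - pmean) * (t - tmean)
                        for p, t in zip(patch, tvals)) / (pnorm * tnorm)
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos
```

A human-following tracker would re-run this search in each new frame around the previous target position; frequency-domain correlation filters do the same search in O(n log n) per frame.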
6 Acceptance Criteria
Support the human-following feature in the robot demo;
Work with the Huawei team at least one day per month.
7 Phased Project Plan
Phase 1 (~6 months): Objective: develop face detection/recognition and body tracking algorithms for the robot. Deliverables: the code, systems, and instructions for the developed algorithms; scenario test at Huawei based on the designed demo scheme;
Phase 2 (~6 months): Objective: develop face attribute recognition and human behavior recognition algorithms for robots. Deliverables: the systems and instructions for the developed algorithms; scenario test at Huawei based on the designed demo scheme.
HIRPO20160610: Deep Learning based Scene
Understanding
1 Theme: Big Data & Artificial Intelligence
2 Subject: computer vision
List of Abbreviations
GPU: Graphics Processing Unit
VQA: Visual Question Answering
3 Background
The resurgence of neural networks, most prominently in the form of deep
learning (DL), has recently led to significant technological advances in image
understanding, speech recognition, and natural language processing. In
computer vision, supervised deep learning models of the Convolutional Neural Network (CNN) family have led to significant error reductions on large-scale classification tasks, due to their hierarchical nature and the directness of their feature and classifier learning. Since 2012, deep learning methods, in particular those based on CNNs, have greatly improved performance on traditional computer vision tasks including image classification, object recognition, object detection, edge detection, face recognition, image denoising/super-resolution, image quality assessment, tracking, and event recognition. The rapidly improving availability and accessibility of large-scale Internet images/videos, in particular from mobile platforms, together with GPU-powered massively parallel computing platforms, has greatly facilitated CNN training, making the training of tens of millions of CNN parameters practically feasible (training times range from a few hours to a couple of weeks).
Computer vision technologies have become increasingly mature and are used in real products including self-driving cars, image search, smartphone applications, surveillance and security, robotics, and smart home applications. Internet powerhouse companies have invested heavily in developing deep learning technologies, backed by large commitments of human, machine, and data resources.
It is evident that deep learning technologies have led to the recent
breakthroughs in both academia and industry, creating intelligent products that
greatly enhance the quality of human life. Therefore, continued advances in deep learning and computer vision are a must. Not only will areas like conventional mobile terminals and intelligent monitoring be enhanced by emerging deep learning technology; next-generation products such as household robots and driverless cars are expected to function primarily on the basis of deep learning.
Humans constantly observe the structure of the environment that surrounds them. For example, when walking through a house, we recognize the objects within it and react accordingly. Such capabilities help us accomplish various tasks even in unfamiliar places. Building a system that can automatically perform scene understanding is a crucial prerequisite for a variety of applications, including robot navigation, semantic mapping, autonomous driving, and human-machine interaction. Therefore image semantic segmentation, as the fundamental component of scene understanding, is key to many high-level semantic applications. On one hand, semantic segmentation produces a highly compact representation of images; indexed with these representations, retrieval and processing can be made far more efficient. On the other hand, semantic segmentation lays the foundation for other applications such as object detection and scene captioning.
4 Scope
1) Research on semantic segmentation: investigate pixel-wise semantic segmentation of an image, facilitating object detection, semantic mapping, and high-level scene understanding;
2) Research on instance semantic segmentation: not only produce pixel-wise semantic segmentation of an image, but also differentiate between objects of the same category, i.e., instance semantic segmentation. This could be used in fine-grained scene understanding and interaction in the future;
3) Research on VQA application scenarios: estimate objects, object attributes, and object relationships in the scene based on visual analysis, and answer questions about the scene. Explore and design application scenarios for VQA systems in household environments.
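For reference, pixel-wise segmentation quality (scope items 1 and 2) is conventionally reported as mean intersection-over-union (mIoU) over classes; a minimal sketch of the metric:

```python
# Minimal per-class IoU / mean IoU computation, the standard metric for
# pixel-wise semantic segmentation. Labels are flat lists of class ids.

def mean_iou(pred, truth, num_classes):
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # skip classes absent from both prediction and truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For instance segmentation, the same IoU idea is applied per object instance rather than per class when matching predicted and ground-truth masks.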
5 Expected Outcome and Deliverables
Algorithm, system, and technical reports for semantic image segmentation;
Algorithm, system, and technical reports for instance semantic segmentation;
Design report for the visual question answering application system;
Academic papers and patents;
A monthly teleconference for technical communication and progress briefing;
A quarterly technical report delivered at Huawei.
6 Phased Project Plan
Phase 1 (~4 months): Develop a fast image semantic segmentation algorithm labelling no fewer than 20 classes of common household items; achieve top performance on popular semantic segmentation datasets;
Phase 2 (~4 months): Develop an instance-level semantic segmentation algorithm labelling no fewer than 20 classes of common household items; achieve top performance on popular semantic segmentation datasets;
Phase 3 (~4 months): Develop a visual question answering application system.
HIRPO20160611: Manufacture Quality Risk Analysis &
Prediction based on Test Data
1 Theme: Big Data & Artificial Intelligence
2 Subject: predictive analysis
3 Background
Production volume has grown significantly, and cycle time has become much shorter.
Currently, the manufacturing test quality control system is designed for troubleshooting and fast tracking after a problem occurs. We hope to enhance the system so that it can identify potential risks in advance and eliminate them in time.
4 Scope
Through real-time analysis of product test data, incoming material test data, equipment status data, and test software information, predict the risk of potential quality fluctuations in advance;
When a quality problem occurs in the production process, automatically identify the key factors behind the abnormal fluctuations.
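As a deliberately simple illustration of the first scope item (the project is expected to propose much richer data mining models), the sketch below scores new test measurements by their drift from the historical distribution of passing units, in the manner of a Shewhart control chart:

```python
# Illustrative sketch for risk prediction from test data: score each new
# test measurement by how far it drifts from the historical mean of good
# units (a Shewhart-style control check). Real proposals would build richer
# models over product, material, equipment, and test-software data.
from math import sqrt

def risk_scores(history, new_samples):
    """Return |z|-scores of new measurements against historical data."""
    mean = sum(history) / len(history)
    std = sqrt(sum((x - mean) ** 2 for x in history) / len(history))
    return [abs(x - mean) / std for x in new_samples]

def flag_risks(history, new_samples, threshold=3.0):
    """Indices of new samples whose drift exceeds `threshold` sigmas."""
    return [i for i, z in enumerate(risk_scores(history, new_samples))
            if z > threshold]
```

A production system would maintain such baselines per test station and per parameter, and raise a warning before the drift becomes a yield problem.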
5 Expected Outcome and Deliverables
Technical reports including the business analysis and data mining model for the test-process quality control system;
A predictive analysis system with source code and documentation.
6 Acceptance Criteria
Provide sample models for 2-3 products;
A risk catch ratio higher than 70%;
A false (error) warning ratio less than 30%.
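Assuming "risk catch ratio" means the fraction of real quality problems that were warned about (recall) and "error warning ratio" means the fraction of issued warnings that were false (both readings are our assumptions about the intended definitions), the two acceptance metrics can be computed as:

```python
# Sketch of the two acceptance metrics, under the ASSUMED definitions:
# "risk catch ratio"   = caught real problems / all real problems (recall)
# "error warning ratio" = false warnings / all warnings issued.

def catch_ratio(warned, actual):
    """Fraction of actual problem units that received a warning."""
    return sum(1 for u in actual if u in warned) / len(actual)

def error_warning_ratio(warned, actual):
    """Fraction of issued warnings that were not real problems."""
    return sum(1 for u in warned if u not in actual) / len(warned)
```

For example, warning on units {1, 2, 3, 4} when the real problems are {2, 3, 4, 5} gives a catch ratio of 0.75 (above the 70% bar) and an error warning ratio of 0.25 (below the 30% bar).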
7 Phased Project Plan
Phase 1 (~4 months): Business analysis; data analysis, data cleaning, model training, and model optimization based on sample data;
Phase 2 (~3 months): Validate the model using historical data from real products;
Phase 3 (~5 months): Deploy the model to the production environment; training and handover to the development team.
HIRPO20160612: Behavior Analytics for Personalized
Mobile Services
1 Theme: Big Data & Artificial Intelligence
2 Subject: others
List of Abbreviations
OTT: Over the Top
POC: Proof of Concept
UE: User Equipment
3 Background
Considering users' everyday reliance on smartphones, there is an ever-increasing demand for personalized mobile services in many aspects of human life, such as health, education, transportation, and shopping. Thanks to the spurt of mobile big data, e.g., call logs and location footprints, as well as the wealth of sensors in mobile phones, e.g., gyroscope, accelerometer, and light sensors, such personalized mobile services have become possible. For example, a data-driven approach can predict a person's emotional state from smartphone usage data and/or location footprints and devise emotion-aware recommendations for shopping.
The abundance of mobile data on UEs is significantly more beneficial for understanding human behavior than the social network data that has been widely studied. Mobile data reflects real-world human behavior, such as mobility, call logs, and location. This can be dramatically different from social network data, which captures only human actions in the cyber-world, and these are often faked or contrary to actual behavior.
Furthermore, mobile data is collected in a passive sensing fashion and does not disturb normal life activities. This is especially preferable to conventional survey-based studies in psychology, which are usually expensive in time and money and not suitable for long-term behavior analysis.
The unique advantages of mobile data are the foundation of successful
personalized mobile services. Mobile operators can utilize their available data
and/or cooperate with OTT providers to obtain additional UE sensor data in
order to provide personalized services and enhance user experience. This can reduce customer churn and improve customer loyalty. According to the Harvard Business School, increasing customer retention rates by 5 percent increases profits by 25 to 95 percent.
4 Scope
Research on methodologies for understanding human behavior: investigate possible data sources and how to mine them; focus on one or more behaviors.
Research on applications of behavioral understanding: investigate how to use behavioral understanding to provide personalized mobile services; focus on one or more example services.
Prototype of such a system: Huawei will provide the vUIC platform if necessary, and prototyping on top of it will extend MBB network intelligence to the UE.
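As one hedged example of a mobility feature that "understanding human behavior" could start from, the radius of gyration summarizes how far a user typically roams from the centroid of their visited locations:

```python
# Sketch of a simple human-mobility feature from a location footprint:
# the radius of gyration, i.e., the RMS distance of visited points from
# their centroid. Points are (x, y) in any planar unit (e.g., km); real
# GPS traces would first be projected from latitude/longitude.
from math import sqrt

def radius_of_gyration(points):
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)
```

Features like this, computed over call logs or GPS footprints, are typical inputs to the behavior models the scope items above call for.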
5 Expected Outcome and Deliverables
Technical reports of general requirements and main components of
personalized service platform;
Solution proposal for understanding human behavior;
Solution proposal for providing personalized service using the behavioral
understanding;
A working prototype of such a system.
6 Acceptance Criteria
A detailed report covering items 1, 2, and 3 in the Scope section, and a working prototype for proof of concept.
7 Phased Project Plan
Phase 1 (~2 months): Survey of existing personalized mobile services, focusing on data sources, mining methodologies, and applications;
Phase 2 (~2 months): Solution proposal for understanding human behavior and its potential personalized services;
Phase 3 (~5 months): Collection of user data from mobile phones;
Phase 4 (~3 months): A prototype for POC of such a system.
Huawei can provide lab time on the vUIC platform to facilitate the prototype POC.