machine learning for wireless networks @bestcom2016
Machine learning for improving wireless network performance
Merima Kulin, Eli De Poorter, Dirk Deschrijver, Tom Dhaene and Ingrid Moerman
Internet Based Communication Networks and Services research group (IBCN) - IDLab
Department of Information Technology (INTEC)
Ghent University - imec
BESTCOM, 21.10.2016, Louvain
Machine learning for wireless networks
• Introduction
• Data-driven design: examples
• Data science in wireless networks: a tutorial
• Conclusion
Why machine learning?
• Gartner's Hype Cycle for Emerging Technologies, 2015 and 2016
• Computational power
• Massive amounts of data
• Unprecedented advances in ML
Why machine learning?
Data is the new oil!
What kind of data are wireless networks generating?
Wireless networks as data sources: IoT, network monitoring, cognitive radio
What is data science?
Data science (diagram labels): machine learning, data mining, data analysis, ML algorithm selection, model evaluation, pre-processing, …
Kulin, Merima, Carolina Fortuna, Eli De Poorter, Dirk Deschrijver, and Ingrid Moerman. "Data-Driven Design of Intelligent Wireless Networks: An Overview and Tutorial." Sensors 16, no. 6 (2016): 790.
• Machine learning vs.
• Data mining vs.
• Data science
“Data science is the study of generalizable extraction of knowledge from data”.
Data mining/Machine learning approaches
• Regression
• Classification
• Clustering
• Anomaly detection
Regression
X -> Regression -> Y
Regression: example
Application area: Localization
Vanheel, F.; Verhaevert, J.; Laermans, E.; Moerman, I.; Demeester, P. Automated linear regression tools improve RSSI WSN localization in multipath indoor environment. EURASIP J. Wirel. Commun. Netw. 2011, 2011, 1–27.
RSSI -> Regression -> distance
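The RSSI-to-distance mapping on this slide can be sketched with ordinary least squares. This is only a minimal illustration, not the tooling from the cited paper: it assumes a log-distance path loss model (RSSI roughly linear in log10 of the distance), and the calibration measurements and the function names `fit_line` and `rssi_to_distance` are made up for the example.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def rssi_to_distance(rssi, a, b):
    """Invert rssi = a + b * log10(d) to estimate the distance d."""
    return 10 ** ((rssi - a) / b)

# Hypothetical calibration data: distances in metres, measured RSSI in dBm.
distances = [1.0, 2.0, 4.0, 8.0, 16.0]
rssi = [-40.0, -46.5, -52.0, -58.5, -64.0]

# Assumption: RSSI is linear in log10(distance), so regress on the log.
a, b = fit_line([math.log10(d) for d in distances], rssi)
estimate = rssi_to_distance(-55.0, a, b)
print(f"intercept={a:.2f} dBm, slope={b:.2f} dB/decade, d(-55 dBm)={estimate:.2f} m")
```

In a multipath indoor environment the fit would normally be repeated per anchor node or per room, since a single line rarely describes every link.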
Classification
X -> Classifier -> Y, where Y is one of the classes C1, C2, C3
Classification: example
Application area: System recognition
RSSI -> Classifier -> Zigbee, WiFi, Bluetooth or Microwave
Zheng, Xiaolong, et al. "ZiSense: towards interference resilient duty cycling in wireless sensor networks." Proceedings of the 12th ACM Conference on Embedded Network Sensor Systems. ACM, 2014.
Clustering
X -> Clustering -> ? (groups C1, C2, C3)
Clustering: example
Application area: System identification
Shetty, N.; Pollin, S.; Pawełczak, P. Identifying spectrum usage by unknown systems using experiments in machine learning. In Proceedings of the 2009 IEEE Wireless Communications and Networking Conference, Budapest, Hungary, 5–8 April 2009.
X -> Clustering -> ? (Zigbee, WiFi, Noise)
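To make the clustering idea concrete, here is a small k-means (Lloyd's algorithm) sketch. k-means is used only as a familiar example and is not claimed to be the method of the cited paper. The two features (received power in dBm, occupied bandwidth in MHz), the sample values and the choice of initial centroids are all assumptions made for illustration.

```python
def kmeans(points, centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) on tuples of floats.

    Returns (centroids, labels); labels[i] is the cluster index of points[i].
    """
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels

# Hypothetical spectrum observations: (received power in dBm, bandwidth in MHz).
points = [
    (-60.0, 2.0), (-62.0, 2.0), (-58.0, 2.1),     # narrowband signals
    (-50.0, 20.0), (-48.0, 22.0), (-52.0, 21.0),  # wideband signals
    (-95.0, 0.1), (-94.0, 0.2), (-96.0, 0.1),     # near the noise floor
]
# Assumption: one initial centroid is taken from each visually distinct group.
centroids, labels = kmeans(points, [points[0], points[3], points[6]])
print(labels)
```

The algorithm only returns unnamed groups. Attaching names such as Zigbee, WiFi or Noise to the clusters is a separate step that needs domain knowledge.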
Anomaly detection
X -> Anomaly detection -> Y/? (C1, C2, C3)
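One simple way to illustrate anomaly detection is a z-score test against a baseline learned from normal behaviour. This is a generic sketch rather than a method taken from the talk. The monitored quantity (packet loss per minute), the sample values and the threshold of 3 standard deviations are assumptions.

```python
import statistics

def fit_baseline(normal_values):
    """Learn the mean and sample standard deviation of normal behaviour."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomaly(value, mean, std, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(value - mean) / std > threshold

# Hypothetical per-minute packet-loss percentages seen during normal operation.
normal = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1, 0.9, 1.0]
mean, std = fit_baseline(normal)

# New observations to be checked against the learned baseline.
new_values = [1.1, 0.7, 6.5]
flags = [is_anomaly(v, mean, std) for v in new_values]
print(flags)
```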
Which DM/ML method can you use?
Application area: ?
X: measurements from a device -> Y: device type
Answer: Classification
Uluagac, A. Selcuk, et al. "A passive technique for fingerprinting wireless devices with wired-side observations." Communications and Network Security (CNS), 2013 IEEE Conference on. IEEE, 2013.
The knowledge discovery process
Step 1: Understanding the problem domain
• Problem formulation: fingerprinting wireless devices; identifying devices and device types is a classification problem
• Goal: a new solution for Network Access Control to enhance network security
• Assumptions: packet generation is influenced by hardware architecture (CPU, DMA, L1/L2 cache, ..)
• Hypothesis: devices and/or device types can be identified based on statistical properties of their traffic flows
• Data collection: analyze inter-arrival times (IATs) from several devices
The knowledge discovery process
Step 2: Understanding the data
• Collected data: IAT traces
• Validate the data: is the selected data a representative sample for solving the formulated problem?
• Validate the hypothesis: is the stated hypothesis true, and is the selected data mining task likely to prove it?
Device fingerprinting
• Data collection
  • Data repositories (e.g. CRAWDAD)
  • Run experiments on testbed facilities
  • Collect data in situ
Overall: 94 files; total ~137 mil.; mean ~1.46 mil.; std ~1.3 mil.
Devices: Dell, iPad, iPhone, Nokia
Device fingerprinting
Step 2: Understanding the data
• Visual techniques: boxplot, PDF and CDF, scatter plots, histograms
• Computational techniques: five-number summary; standard deviation, variance, skewness; coefficient of determination (R2); coefficient of correlation; …
Device fingerprinting
Step 2: Understanding the data
• Visual techniques
  • PDF, time-series, histograms…
Device fingerprinting
Step 2: Understanding the data
• Computational techniques
  • 5-num summary
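The computational techniques named on these slides can be tried out with Python's standard `statistics` module. The sketch below is illustrative only. The IAT values are invented, the quartiles use the module's "inclusive" method (other quartile conventions give slightly different values), and `five_number_summary` and `skewness` are helper names introduced here.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) of a sequence of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def skewness(values):
    """Population skewness: the third standardized moment."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

# Hypothetical inter-arrival times in milliseconds for one device.
iats_ms = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 20.0]

print("five-number summary:", five_number_summary(iats_ms))
print("std:", statistics.pstdev(iats_ms), "variance:", statistics.pvariance(iats_ms))
print("skewness:", skewness(iats_ms))
```

A single large inter-arrival time (20 ms here) barely moves the quartiles but pulls the skewness strongly positive, which is why both kinds of summary are worth inspecting.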
Device fingerprinting
Step 2: Understanding the data
• Computational techniques
  • Coefficient of determination (R2)
  • Analysis for device identification
How much can data from one Dell Notebook tell about the data from other Dell Notebooks?
[Plot axes: DN2, DN3]
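One possible way to put a number on that question is sketched below. It is an assumption about how R2 might be computed, not a description of the analysis behind the slide. Each trace is reduced to a normalised histogram over shared bin edges, and R2 is taken as the squared Pearson correlation between the two histograms. The traces `dn2` and `dn3`, the bin edges and the helper names are hypothetical.

```python
def histogram(values, edges):
    """Relative frequency of values in each [edges[i], edges[i+1]) bin.

    Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical IAT traces (milliseconds) from two notebooks of the same model.
dn2 = [0.5, 1.2, 1.4, 1.6, 2.3, 2.5, 3.1, 1.8, 1.1, 4.2]
dn3 = [0.7, 1.3, 1.5, 1.9, 1.7, 2.2, 2.8, 3.5, 2.9, 4.8]
edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

h2, h3 = histogram(dn2, edges), histogram(dn3, edges)
r2 = r_squared(h2, h3)
print(f"R2 between DN2 and DN3 histograms: {r2:.3f}")
```

A value close to 1 would suggest that one device's IAT distribution explains most of the other's, which supports grouping them under one device type.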
The knowledge discovery process
Step 3: Data pre-processing
• Raw data: traces of IAT data points
• Features extraction: turns the raw data into feature vectors
• Training data: feature vectors used as training examples
Device fingerprinting
Data pre-processing
• Features extraction
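Feature extraction from an IAT trace can be sketched as follows, under the assumption (consistent with the "feature vectors of histogram bins" used in the data mining step) that each training example is the normalised histogram of a window of consecutive IATs. The window size, bin edges, trace values and function names are hypothetical.

```python
def histogram_features(window, edges):
    """Normalised histogram of one window of IATs.

    Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in window:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts) or 1  # avoid division by zero if nothing fell in a bin
    return [c / total for c in counts]

def extract_training_examples(trace, label, window_size, edges):
    """Cut a trace into non-overlapping windows and build (features, label) pairs."""
    examples = []
    for start in range(0, len(trace) - window_size + 1, window_size):
        window = trace[start:start + window_size]
        examples.append((histogram_features(window, edges), label))
    return examples

# Hypothetical IAT trace (milliseconds) for one device, labelled with its type.
trace = [0.4, 1.2, 1.5, 2.7, 0.9, 1.1, 3.2, 1.8, 0.2, 2.1, 2.4, 1.3]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
examples = extract_training_examples(trace, "iPad", window_size=4, edges=edges)
for features, label in examples:
    print(label, features)
```

With real traces of millions of IATs, the window size controls the trade-off between the number of training examples and how stable each histogram is.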
The knowledge discovery process
Step 4: Data mining
• Training data: feature vectors of histogram bins
• ML models
  • Neural network (HL=6, α=0.1, learned weights)
  • k-Nearest Neighbors (K=1)
  • Decision trees
  • Logistic regression
  • …
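Of the models listed, k-Nearest Neighbors with K=1 is the easiest to show in a few lines. The sketch below is a minimal standard-library version. The feature vectors, labels and the function name `nearest_neighbour_predict` are invented, and Euclidean distance is an assumed choice of metric.

```python
import math

def nearest_neighbour_predict(train, query):
    """Return the label of the training example closest to `query` (k-NN with K=1)."""
    _, label = min(train, key=lambda example: math.dist(example[0], query))
    return label

# Hypothetical training examples: (histogram feature vector, device type).
train = [
    ([0.70, 0.20, 0.10], "Dell"),
    ([0.65, 0.25, 0.10], "Dell"),
    ([0.20, 0.60, 0.20], "iPad"),
    ([0.10, 0.30, 0.60], "Nokia"),
]

# Classify an unseen feature vector by its single nearest neighbour.
prediction = nearest_neighbour_predict(train, [0.25, 0.55, 0.20])
print(prediction)
```

1-NN has no training phase beyond storing the examples, so all of its cost is paid at prediction time.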
Device fingerprinting
Step 5: Performance evaluation
• Test data: test set of feature vectors
• Performance indication: RMSE, MAE, R2…; Precision, Recall, Confusion matrix…
• Algorithm selection: k-fold cross validation
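Algorithm selection with k-fold cross validation can be sketched as below. The two candidates (a 1-nearest-neighbour classifier and a majority-class baseline), the example data and all function names are assumptions chosen to keep the illustration short. A real comparison would use the actual candidate models and shuffle the examples before splitting.

```python
import math
from collections import Counter

def one_nn(train, query):
    """1-nearest-neighbour prediction using Euclidean distance."""
    return min(train, key=lambda example: math.dist(example[0], query))[1]

def majority_class(train, query):
    """Baseline that ignores the query and predicts the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def k_fold_accuracy(examples, k, predict):
    """Mean accuracy of `predict` over k folds (fold i holds examples i, i+k, i+2k, ...)."""
    scores = []
    for fold in range(k):
        test = examples[fold::k]
        train = [ex for i, ex in enumerate(examples) if i % k != fold]
        correct = sum(predict(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k

# Hypothetical (feature vector, device type) examples.
examples = [
    ([0.70, 0.20, 0.10], "Dell"), ([0.20, 0.60, 0.20], "iPad"),
    ([0.68, 0.22, 0.10], "Dell"), ([0.22, 0.58, 0.20], "iPad"),
    ([0.72, 0.18, 0.10], "Dell"), ([0.18, 0.62, 0.20], "iPad"),
]
results = {
    "1-NN": k_fold_accuracy(examples, 3, one_nn),
    "majority class": k_fold_accuracy(examples, 3, majority_class),
}
best = max(results, key=results.get)
print(results, "-> selected:", best)
```

Every example is used for testing exactly once, so the averaged score is less dependent on one lucky or unlucky split than a single train/test partition.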
Device fingerprinting
Step 5: Performance evaluation
• Performance evaluation
  • Confusion matrix
  • Accuracy, Precision, Recall, F1-score
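The metrics named on this slide all follow from the confusion matrix. The sketch below computes them for a hypothetical set of true and predicted device types. The labels and predictions are invented, and precision, recall and F1-score are reported per class by treating one class at a time as the positive class.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1-score with `positive` as the positive class."""
    cm = confusion_matrix(y_true, y_pred)
    tp = cm[(positive, positive)]
    fp = sum(n for (t, p), n in cm.items() if p == positive and t != positive)
    fn = sum(n for (t, p), n in cm.items() if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical test-set labels and classifier outputs.
y_true = ["Dell", "Dell", "Dell", "iPad", "iPad", "Nokia", "Nokia", "Nokia"]
y_pred = ["Dell", "Dell", "iPad", "iPad", "iPad", "Nokia", "Dell", "Nokia"]

print("accuracy:", accuracy(y_true, y_pred))
for device in ("Dell", "iPad", "Nokia"):
    print(device, per_class_metrics(y_true, y_pred, device))
```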
Device fingerprinting
Step 5: Performance evaluation
• Device type classification results
Device fingerprinting
Step 5: Performance evaluation
• Model tuning: neural networks
More details about how to tune your algorithm can be found in:
Kulin, Merima, Carolina Fortuna, Eli De Poorter, Dirk Deschrijver, and Ingrid Moerman. "Data-Driven Design of Intelligent Wireless Networks: An Overview and Tutorial." Sensors 16, no. 6 (2016): 790.
Conclusion
• Data-driven network design can be used for
  • Failure detection
  • Systems recognition
  • Performance optimization…
• Data traces are valuable
  • Consider releasing data traces after use
• Need for increased collaboration
  • Network experts, testbed experts, data mining experts, statisticians, wireless communication experts, etc.