Are there any Areas Artificial Intelligence couldn’t be applied to?
Dr Cyril Onwubiko, C-MRiC.ORG
Invited Guest Lecture, Kingston University, London (KU), Surbiton, Surrey, UK
22nd November 2019
Abstract
AI, ML, DL and RL are being applied to most everyday aspects of our lives, ranging from Autonomous Vehicles (self-driving cars), Social Media and Cyber Security to Digital Imagery for Clinical Diagnosis.
It is hard to identify what, and where, AI could not be applied.
Context
Rules Engines, Expert Systems, Knowledge Graphs, or Symbolic AI, collectively known as Good Old-Fashioned AI (GOFAI).
New AI, e.g. Neural Networks, Deep Neural Networks, NLP, Reinforcement Learning and Deep Reinforcement Learning (New-AI).
Building Blocks
Artificial Intelligence (AI) is a branch of Computer Science that ‘trains’ computers to behave intelligently by carrying out tasks that normally require human intelligence, e.g. visual perception, speech recognition, etc. It encompasses ML, DL, RL, DRL, NLP and stochastic processes, decision and predictive analysis. AI is simply “the Science and Engineering of making intelligent machines” (John McCarthy).
Deep Learning (DL) is a subset of Machine Learning that does well at identifying patterns in unstructured data, and is used mostly in Computer Vision for image identification, object detection and image classification. Simply put, it means more accuracy, more math, and more compute because of the deep hidden layers.
Machine Learning (ML) is a branch of AI generally regarded as programs that alter themselves.
ML has the ability to modify itself when exposed to more data and/or different data sets, e.g. Neural Networks.
Building Blocks
Deep Reinforcement Learning (DRL) is a subset of RL that applies Deep Learning. Reinforcement learning is usually described using concepts such as agents, environments, states, actions, rewards and penalties.
Reinforcement Learning (RL) is a branch of AI that focuses on goal-oriented algorithms that solve complex tasks through ‘reward’ and ‘penalty’, akin to Game Theory.
These algorithms are rewarded when they make good decisions and penalised when they make bad ones – hence the word reinforcement. E.g. AlphaGo
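The agent, environment, state, action and reward vocabulary above can be illustrated with a minimal tabular Q-learning sketch. The five-cell corridor environment, the reward values (+1 for reaching the goal, -1 for bumping into the wall) and the hyperparameters are assumptions made for illustration and are not taken from the lecture; systems such as AlphaGo use far more elaborate methods.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells with the goal at the right end.
N_STATES = 5
ACTIONS = (-1, +1)          # index 0 = move left, index 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Apply one move and return (next_state, reward, done)."""
    nxt = state + action
    if nxt < 0:
        return state, -1.0, False   # penalty: bumped into the left wall
    if nxt == GOAL:
        return nxt, 1.0, True       # reward: reached the goal
    return nxt, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values with epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                        # explore
            else:
                a = 0 if q[state][0] > q[state][1] else 1   # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            # Reinforcement: move the estimate towards reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


def greedy_policy(q):
    """Best learned action for every non-goal state."""
    return ["left" if row[0] > row[1] else "right" for row in q[:GOAL]]
```

Calling `greedy_policy(train())` returns the action the agent prefers in each non-goal cell once the rewards and penalties have been propagated back through the table.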
Example of A Deep Neural Network for DL
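As a rough companion to the figure, the sketch below pushes one input vector through a small network with two hidden layers. The layer sizes, random weights, ReLU activation and softmax output are illustrative assumptions, not details from the lecture, and no training is performed.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer (untrained, for illustration)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Hidden layers use ReLU; the final layer produces class probabilities."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases))


rng = random.Random(0)
# 4 inputs -> two hidden layers of 8 neurons -> 3 output classes
layers = [init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 3, rng)]
probs = forward([0.5, -1.2, 3.0, 0.1], layers)
```

Each extra hidden layer adds another round of weighted sums, which is where the "more math, more compute" cost of depth comes from.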
Some AI Use Cases
• Voice Recognition
• Autonomous Cars
• Voice Search
• Cyber Security
• Fraud Detection
• Social Media
• Predictive Analysis
• Credit Rating
• Network Distribution
• Preference Recommendation
• Analytics
• Facial Recognition
• Image Recognition
• Motion Detection
• Threat Detection
• Sentiment Analysis
• Imagery Identification
• Resource Analysis
• Games
• Load Balancing
General Principles & Concepts
© 2019 C-MRIC
ML Problem Space
• ML Algorithms
• Feature Engineering
• Data Set (Training & Testing)
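The "Training & Testing" part of the data set can be sketched as a simple holdout split. The helper name `holdout_split`, the 80/20 ratio and the fixed seed are assumptions chosen for illustration; the lecture does not prescribe a particular split.

```python
import random


def holdout_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and return (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The test portion is held back and never shown to the learner during training.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labelled data: (feature value, label) pairs.
data = [(i, i % 2) for i in range(10)]
train_set, test_set = holdout_split(data)
```

Keeping the test portion separate lets the chosen algorithm be evaluated on examples it has not seen.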
Data Sets
• Ground truth data is still an issue
• Data has become more readily available than in the previous decade, e.g.
• ImageNet
• MNIST
• Kaggle
• CRAN
• UCI (UC Irvine) Data Set
• CTU-13
• VisualData
Publicly available
Artificially generated
https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
Feature Engineering
Feature engineering, while generally known to boost classification metrics, creates a need for substantial investment of expensive and scarce data science expertise. We find that reliance on domain expertise and feature engineering severely inhibits the feasibility of applying existing correlation and filtering methods in practice [2].
[2] Egon Kidmose, PhD Thesis, Network-based Detection of Malicious Activities: A Corporate Network Perspective, 2018
• Cost – Expensive
• Domain Experts / Domain Knowledge
• Data Scientist
• Complex and Convoluted
• Tedious
Parsers / Plugins
User Data Fields Features
Features, Reliability & Relevance
[3] SSDeep Project – context triggered piecewise hashes (CTPH) - https://ssdeep-project.github.io/ssdeep/index.html
Indicators of Compromise (IoC)
• Constantly evolving features
• Feature Reliability & Relevance
Features | Features | Features | Features
IP Address (IPv4 or IPv6) | FilePath | FileHash-SHA256 | Encrypt-AES256
Domain | CIDR | FileHash-IMPHASH | Encrypt-AES128
Hostname | CVE | Mutex | Encrypt-AES224
FQDN | Email | SSDeep / CTPH [3] | Encrypt-DES
URL | FileHash-MD5 | GeoIP | Encrypt-Unknown
URI | FileHash-SHA1 | DNS | YARA
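To show how a parser might turn raw strings into some of the indicator types listed above, here is a minimal sketch using only Python's standard `ipaddress` and `re` modules. The function name `classify_indicator`, the regular expressions and the ordering of the checks are simplifying assumptions; real IoC parsers handle many more types and edge cases.

```python
import ipaddress
import re

# Hex digest lengths for the hash types in the table (illustrative subset).
HASH_LENGTHS = {32: "FileHash-MD5", 40: "FileHash-SHA1", 64: "FileHash-SHA256"}
HEX_RE = re.compile(r"^[0-9a-fA-F]+$")
CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)
URL_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DOMAIN_RE = re.compile(r"^(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}$")


def classify_indicator(value):
    """Return a coarse indicator type for a raw string, or 'Unknown'."""
    value = value.strip()
    try:
        return f"IPv{ipaddress.ip_address(value).version}"
    except ValueError:
        pass
    if "/" in value:
        try:
            ipaddress.ip_network(value, strict=False)
            return "CIDR"
        except ValueError:
            pass
    if HEX_RE.match(value) and len(value) in HASH_LENGTHS:
        return HASH_LENGTHS[len(value)]
    if CVE_RE.match(value):
        return "CVE"
    if URL_RE.match(value):
        return "URL"
    if EMAIL_RE.match(value):
        return "Email"
    if DOMAIN_RE.match(value):
        return "Domain"
    return "Unknown"
```

Because indicators evolve constantly, a rule set like this needs regular maintenance, which is part of the reliability and relevance concern noted above.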
Applications in Cyber Security
• User behaviour analysis
• Insider threat detection
• Malware family identification
• Network traffic profiling
• Malware detection
• Spam filtering
• Network anomaly detection
• C2 detection
Learning approaches:
• Unsupervised (no labels)
• Supervised (with labels)
• Incremental (learn continuously)
• Batch (learn only once or in discrete steps)
The problem one is trying to solve dictates the data, features and ML algorithms used [4].
[4] Scott Miserendino, BluVector Inc.
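As one point in that design space, the sketch below combines the "supervised (with labels)" and "incremental (learn continuously)" options for spam filtering: a tiny naive Bayes classifier that updates its word counts one labelled message at a time. The class name, the smoothing choices and the example messages are assumptions for illustration only.

```python
import math
from collections import Counter


class IncrementalNaiveBayes:
    """Minimal multinomial naive Bayes that learns one labelled message at a time."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def learn(self, text, label):
        """Supervised, incremental update: no retraining over earlier messages."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def predict(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Smoothed log prior so a class with no examples yet is still scoreable.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(counts.values())
            for word in text.lower().split():
                # Add-one smoothing; the extra 1 reserves mass for unseen words.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)


model = IncrementalNaiveBayes()
model.learn("win free prize now", "spam")
model.learn("free money click now", "spam")
model.learn("meeting agenda for monday", "ham")
model.learn("project report attached", "ham")
```

A batch learner would instead be fitted once on the full labelled set, and an unsupervised approach would have to work without the `label` argument altogether.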
Cyber Security Case Studies
• Malware Detection
• Malware detection, network-based attack detection, code detection, and C2 & Bots
• Profiling & Security Analytics
• User & Entity profiling, behavioural analytics, big data, security and web analytics etc
• Cyber Security Operations Centre
• Security Monitoring, Flow Analysis, Log Collection & Collation, Correlation, Analysis, and Cyber Incident Management and ‘Human-in-the-loop’ etc
Malware Detection
• Intrusion detection systems (IDS) using Rule-based, Heuristic and Machine Learning (Supervised & Unsupervised Learning) approaches
• Detect Bot / C2 (Command & Control)
• Correlating Intrusion detection alerts on Bot malware infections
• Content inspection – content is inspected against feature set of malignant (malicious) content
• Temporal heuristic & behavioural analysis
• Featureless Engineering for Anomaly detection
Featureless Engineering & without Domain Expertise
• A case study of Filtering & Correlation of IDS Alerts
• IDSs produce high volumes of logs/alerts and raise many false positives
• Reliability / Trust?
• Alert ≠ Incident
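The "Alert ≠ Incident" point can be illustrated with a simple time-window correlation sketch: alerts from the same source are grouped, and only groups showing several distinct signatures are escalated. This is a plain rule-based heuristic written for illustration, not the featureless method of the case study; the `Alert` fields, the 300-second window and the two-signature threshold are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float   # seconds since some reference time
    src_ip: str
    signature: str


def _is_incident(group, min_signatures):
    """A group is escalated only if it shows enough distinct signatures."""
    return len({a.signature for a in group}) >= min_signatures


def correlate(alerts, window=300.0, min_signatures=2):
    """Group alerts per source IP by time gap and return candidate incidents."""
    by_source = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        by_source[alert.src_ip].append(alert)

    incidents = []
    for items in by_source.values():
        group = [items[0]]
        for alert in items[1:]:
            if alert.timestamp - group[-1].timestamp <= window:
                group.append(alert)          # same burst of activity
            else:
                if _is_incident(group, min_signatures):
                    incidents.append(group)
                group = [alert]              # start a new burst
        if _is_incident(group, min_signatures):
            incidents.append(group)
    return incidents
```

Isolated or repeated single-signature alerts are filtered out, so an analyst sees fewer, richer candidate incidents rather than every raw alert.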
Conclusions
• AI, ML, DL & DRL have been applied to solve many real-life problems ranging from autonomous vehicles to malware detection to correlation of alerts and behavioural analytics.
• Cloud computing, advances in GPU, and possibly Quantum Computing, and the availability of Test data sets, e.g. ImageNet, MNIST etc have helped industrialise AI applications.
Conclusions
• AI does not have all the answers! Context information is elusive with AI/ML-based solutions.
• AI will always find something; whether what it has found is correct, or can be explained, is still a cause for concern, e.g. biased algorithms, ‘black-box’ models and unrelated contextual interpretations.
• Human-in-the-loop is, and will continue to be, part of most applications of AI.
• We foresee cooperative and collaborative Human-to-AI applications.
Cyber Science 2020 Conference
https://www.c-mric.com/conferences
Thank You!
Centre for Multidisciplinary Research,
Innovation and Collaboration
(C-MRiC.ORG)
@CMRiCORG
www.C-MRiC.ORG