Peter Halland Haro, SINTEF
9 Buzzwords in 40 minutes
8 May 2019, Konferansesenteret Meet Ullevaal, Oslo
• Data Centre
• Cloud Computing
• Edge vs. Cloud
• High performance computing
• Big Data
• (I)IoT in Built data
• Data Analytics / Data Science
• AI and ML
• Blockchain
Addressed Buzzwords – your choice
DATA CENTRE
Data Centre
• A data center is simply a facility composed of:
  • Networked computers and storage
• That businesses or other organizations use to:
  • Organize,
  • Process,
  • Store and disseminate
• Large amounts of data
Data Centre types and services
• Data centre services support implementation, operation and enhancement of data centre.
• Data centres have evolved into a mix of alternate data centre types which can be classified into the following matrix:
Capability                      | Reseller | Colocation | Owned and Operated
Direct access to infrastructure | ✕        | ✕          | ✓
Root access to the server       | ✕        | ✓          | ✓
Direct support                  | ✕        | Maybe      | ✓
Custom infrastructure           | ✕        | ✓          | ✓
Real application optimizations  | ✕        | ✓          | ✓
Data centre facilities and service models [1]
• In-House Data Centre – Larger organizations, and those in the technology industry, design, build and operate their own facilities.
• Colocation – A multi-tenant data centre, colocation space can be sold to enterprises by the rack, cabinet or cage. Customers still maintain control over their hardware but outsource facility and internal systems maintenance to the provider.
• Wholesale Data Centre – Providers sell data centre space in larger capacities than a colocation model and typically have fewer customers. "The landlord" provides facility maintenance to the tenant.
• Dedicated Hosting – The provider operates and/or rents server capacity to single customers. No additional services are provided and the customer maintains full control over the server beyond maintenance.
• Managed Hosting – In a managed hosting facility, the provider operates servers and storage for its customers, as well as provides additional administrative and engineering services.
• Examples include database administration, operating system administration, managed security services, managed storage, application management services, disaster recovery, systems monitoring and remote management. The hardware may be owned by the customer or the provider.
• Shared Hosting – Customers share server capacity in facilities operated by the shared hosting provider. To deploy services, these providers create a user interface overlaying the physical server. This interface provides multi-tenant applications to help customers configure their services.
However, numerous service models exist and the line between different types of operations can become blurred.
[1] https://cyrusone.com/corporate-blog/understanding-the-different-types-of-data-center-facilities/
Data center for your needs
• There exists a multitude of data center types and configurations
• Furthermore, these are often defined by extended SLAs depending on the data-center tier
• The choice needs to be an informed decision based on your specific needs
CLOUD COMPUTING
Cloud Computing
• Cloud computing is a general term for the delivery of hosted services over the internet.
• Cloud computing enables companies to consume a compute resource, such as a virtual machine (VM), storage or an application, as a utility -- just like electricity -- rather than having to build and maintain computing infrastructures in house.
https://searchdatacenter.techtarget.com/definition/cloud-computing
Cloud computing
• In general, there are three cloud computing characteristics that are common among all cloud-computing vendors:
1. The back-end of the application (especially hardware) is completely managed by a cloud vendor.
2. A user only pays for services used (memory, processing time and bandwidth, etc.).
3. Services are scalable
• Many cloud computing advancements are closely related to virtualization or "containerisation".
• The ability to pay on demand and scale quickly is largely a result of cloud computing vendors being able to pool resources that may be divided among multiple clients.
• It is common to categorize cloud computing services as:
• Infrastructure as a service (IaaS),
• Platform as a service (PaaS) or
• Software as a service (SaaS).
https://www.techopedia.com/definition/2/cloud-computing
EDGE COMPUTING
Edge vs cloud computing
• Edge data centres are generally smaller facilities that extend the edge of the network to deliver cloud computing resources and cached streaming content to local end users.
• For internet of things (IoT) networks, edge data centres also serve as clearing houses for data being generated by IoT devices that requires additional processing but is too time sensitive to be transmitted back to a centralized cloud server.
• Since they are positioned closer to the end users, they can deliver faster services with minimal latency.
• Although many of them are managed remotely with very little on the ground staff, they should form an important part of the local network.
• That improved performance should not come at an increased cost.
• Edge computing doesn’t deliver better service by laying better cables or boosting power;
• It’s simply a more efficient architecture for transferring and processing data because it can deliver content quickly to local users with minimal latency.
https://www.vxchnge.com/blog/what-is-an-edge-data-center
HIGH PERFORMANCE COMPUTING (HPC)
Peter Haro
What is high performance computing?
• No official definition, various different interpretations.
• “A computing system exhibiting high-end performance capabilities and resource capacities within practical constraints of technology, cost, power, and reliability.”
• “Large, very fast mainframe used especially for scientific computations.”
• Basically, something that allows for many computations in parallel
How do we perform HPC?
Clusters
Supercomputers
What separates a supercomputer from a cluster?
What’s the difference from a cluster?
• Distinction is a bit subjective
• Presented as a single system (or mainframe) to the user
• Individual servers are stripped down to absolute basics
• Typically cannot operate as an independent computer, separate from the rest of the system
HPC Clusters
Goal is to have network fast enough to ignore… why?
• All cores ideally are “close” enough to appear to be on one machine
• Keep the problem FLOP (or at least node) limited as much as possible
• High throughput networks, e.g. InfiniBand HDR
• Bandwidth: 600 Gb/s = 75 GB/s
• Latency: Varies on topology and location in network (~1-10 microseconds)
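The bandwidth conversion above, and the reason latency matters so much for small messages, can be checked with a few lines of Python. This is a rough sketch using the common first-order model time = latency + size / bandwidth. The 600 Gb/s link speed is the figure from the slide. The 5 microsecond latency (picked from the 1-10 microsecond range above) and the message sizes are illustrative assumptions, not measurements.

```python
# First-order network cost model: time = latency + size / bandwidth.
# The link speed comes from the slide; the latency and message sizes
# are illustrative assumptions.

BITS_PER_BYTE = 8


def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert a link speed from gigabits/s to gigabytes/s."""
    return gbit_per_s / BITS_PER_BYTE


def transfer_time_s(size_bytes: float, bandwidth_gbyte_s: float, latency_s: float) -> float:
    """Estimated time to move one message across the link."""
    return latency_s + size_bytes / (bandwidth_gbyte_s * 1e9)


bandwidth = gbit_to_gbyte_per_s(600)  # 75.0 GB/s
latency = 5e-6                        # assumed: 5 microseconds

small = transfer_time_s(8, bandwidth, latency)    # one 8-byte value
large = transfer_time_s(1e9, bandwidth, latency)  # a 1 GB buffer

print(f"Link bandwidth: {bandwidth} GB/s")
print(f"8 B message:  {small * 1e6:.3f} microseconds (latency dominated)")
print(f"1 GB message: {large * 1e3:.3f} milliseconds (bandwidth dominated)")
```

Under this model, sending many tiny messages costs almost pure latency, which is one reason HPC codes try to keep work FLOP or node limited rather than network limited.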
Cloud vs HPC?
Cloud Computing
• Multi-Tenancy
• Dynamic Environments
• Virtualization
• SLAs / SLOs
HPC/AI/DL Applications
• High performance interconnect
• Performance isolation
• Predictability
HPC in the built sector
• Consume meteorologic data
• Search, simulations and emulations
• Analysis and computation of construction materials, e.g. reinforced concrete building structures
• Live data treatment
• HPC + edge computing -> Future
• Interactive real-time data visualization
BIG DATA
Peter Haro
Big data
• Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it
The Big Data Vs
• Volume – Data at Rest: terabytes to exabytes of existing data to process
  • Sales data, physical data, repositories
• Velocity – Data in Motion: streaming data, requiring milliseconds to respond
  • Real-time data, sensors, traceability, surveillance
• Variety – Data in Many Forms: structured, unstructured, text, multimedia, …
  • Sensors, analogue, digital, unstructured data and more
• Veracity – Data in Doubt: uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception
  • Insufficient reports, partial results, incorrect data registration and more
• Value – Data into Money: business models can be associated with the data
Adapted from a post by Michael Walker on 28 November 2012
Gather Process Visualize
Figure from
Technology
• Big Data
• Machine learning
• Internet of Things (IoT)
• Edge computing
• Cloud computing
Services in the built sector
• Visualizing historical data
• Long-term decision support
• Onsite operations decision support
• Predictive maintenance
Big data must provide the added value
Possible data sources in Built data
• Historical meteorological data
• Satellites
• Operation-data
• Sensor data
• BIM systems
• Traffic
• Community (energy sharing systems etc)
• Market data (sales notes)
Data collection is demanding
(I)IOT IN THE BUILT DATA ENVIRONMENT
Peter Haro
IOT and IIOT are defined as machines or “things” that are connected with other “things” in a distributed connected platform
IOT VS IIOT
IOT & IIOT
IOT IIOT
Similar technology drivers
• Small inexpensive sensors and actuators
• Low cost processing power and data storage
• Smart devices
• Connectivity
Different business drivers
• Consumer
  • Adequate technology
  • Modest security
  • Cost sensitive
• Industry
  • QoS
  • Robust security
  • Various specialized equipment
  • ROI necessary
Applications in Built data sector
• Predictive and preventive maintenance
• Health and safety
• Detect sensor failures using local consensus algorithms on-chip
• Supply chain management – Asset tracking and monitoring
• Improving operational efficiency
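One simple way to picture the "local consensus" idea for detecting sensor failures is to let each sensor compare its reading with the median of its neighbours. The sketch below is only an illustration of that idea, not the algorithm used by the author or by any specific chip. The function name, the sensor names, the readings and the tolerance are all hypothetical.

```python
from statistics import median
from typing import Dict, List


def flag_faulty_sensors(readings: Dict[str, float], tolerance: float) -> List[str]:
    """Return sensors whose reading deviates from the peers' median by more than tolerance."""
    faulty = []
    for name, value in readings.items():
        peers = [v for other, v in readings.items() if other != name]
        if not peers:
            continue  # a lone sensor has no peers to agree or disagree with
        if abs(value - median(peers)) > tolerance:
            faulty.append(name)
    return faulty


# Hypothetical temperature readings (degrees C) from four sensors in one room.
room = {"s1": 21.2, "s2": 21.5, "s3": 20.9, "s4": 35.0}
print(flag_faulty_sensors(room, tolerance=2.0))
```

The median is used instead of the mean so that a single wildly wrong sensor cannot drag the group's reference value towards itself.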
DATA ANALYTICS / DATA SCIENCE
Bjørn Magnus Mathiesen
• What can this data tell you?
  • Descriptive analytics: What happened?
  • Predictive analytics: What will happen?
  • Prescriptive analytics: What should be done? What can make X happen? (Why did that happen?)
Data science
• Data Management
  • Store, update, delete, add - privacy
• Data Retrieval
  • Getting data from sources to managed storage, getting orthogonal data
• Data and Metadata
  • Data describing the data (how big, how fast, how detailed, how verified)
Data science
• Data Sharing / Liberation
  • Sharing your data can increase its value to you, and increase the value of others' data (win-win)
Data science
Data visualization:
• Mean x 9
• Mean y 7.5
• Var 11
• Corr 0.816
• Y= 3 + 0.5X
CC-BY-SA Wikipedia
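The summary statistics listed above match Anscombe's quartet, a well-known set of small datasets that share nearly identical statistics but look completely different when plotted. Assuming that is the figure shown on the slide, the sketch below recomputes the numbers for the first two datasets of the quartet. It uses only the standard library, and the variance is the sample variance of x.

```python
from math import sqrt

# Anscombe's quartet, datasets I and II (they share the same x values).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def summary(xs, ys):
    """Mean of x and y, sample variance of x, correlation, and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "intercept": mean_y - slope * mean_x,
        "slope": slope,
    }


for name, ys in (("I", Y1), ("II", Y2)):
    stats = summary(X, ys)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Both datasets give practically the same numbers, yet one is a noisy straight line and the other a smooth curve, which is the argument for always plotting the data.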
Data visualization 2
revolutionaryanalytics.com, "DataSaurus" (first created by Alberto Cairo)
• How to treat data
• What to keep
• What the data can tell you
Data Analytics
2017
• Data can help you steer
• Data is valuable (to many)
ARTIFICIAL INTELLIGENCE & MACHINE LEARNING
Artificial Intelligence and ML
Artificial Intelligence
towardsdatascience.com
Clustering
Dimensionality reduction
Machine learning
Category | Problem type | Example methods | Example
Supervised learning (predictive models based on input and output data) | Classification | Support vector machines, naïve Bayes, decision trees, random forest | Medical imaging
Supervised learning | Regression | Ensemble methods, decision trees, random forest, neural networks | Predicting prices
Unsupervised learning (discover internal representation from input data only) | Clustering | K-means, density based, hierarchical | Customer segmentation
Unsupervised learning | Association | Association rules | Market basket analysis
Reinforcement learning (decision optimisation) | Classification | Approximate policy systems, recommender systems | Optimized marketing
Reinforcement learning | Control | Reinforcement learning in control loops | Driverless cars
Supervised learning
[Figure: labelled training examples (Duck, Duck, Rabbit, Hedgehog) are used to build a predictive model; the model then assigns a label (Duck) to a new, unlabelled example (?)]
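The supervised learning idea on this slide (labelled animals train a predictive model, which then labels a new animal) can be sketched with a tiny k-nearest-neighbours classifier. This is a minimal illustration, not the model behind the figure. The two features (weight and number of legs) and all values are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (weight in kg, number of legs) -> label.
training_data = [
    ((1.1, 2), "duck"),
    ((1.3, 2), "duck"),
    ((2.0, 4), "rabbit"),
    ((0.8, 4), "hedgehog"),
]

# A new, unlabelled animal: 1.2 kg and two legs.
print(knn_predict(training_data, (1.2, 2), k=3))
```

The important point is that the labels are part of the training data: the model never has to discover what a duck is, it only has to generalise from labelled examples.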
Unsupervised learning
Unsupervised learning
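By contrast, unsupervised learning gets no labels at all and has to find structure in the input data alone. The sketch below is a plain k-means clustering loop, shown as one example of the clustering methods mentioned in this talk. The data points are hypothetical, and using the first k points as starting centroids is a simplification chosen to keep the example deterministic.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # simple deterministic start: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


# Hypothetical 2-D measurements that form two visible groups.
data = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.8, 1.1), (8.2, 7.9), (7.9, 8.3)]
centroids, clusters = k_means(data, k=2)
print(centroids)
print(clusters)
```

The algorithm groups the points without ever being told what the groups mean; interpreting the clusters (for example as customer segments) is left to a human.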
Performance
Widanapathirana et al. 2012
K-nearest neighbours
Support vector machine
An example of bias
Source: Ribeiro, Singh, Guestrin (2016), “Why Should I Trust You?”: Explaining the Predictions of Any Classifier
Predicted: Wolf, True: Wolf
Predicted: Wolf, True: Wolf
Predicted: Wolf, True: Wolf
Predicted: Husky, True: Husky
Predicted: Wolf, True: Husky
Predicted: Husky, True: Husky
Machine Learning vs Data science
• ML: F(A) = B
• Can be used in systems such as websites, apps etc.
A B
• Homes with 3 bedrooms are more expensive than homes with 2 bedrooms of similar size
• Newly renovated homes have a 15% premium
Source: Andrew Ng, deeplearning.ai, “AI for Everyone”
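The "ML: F(A) = B" view can be made concrete with a very small example: learn a function F from example pairs (A, B), then call it on new inputs. The sketch below fits a straight line by least squares. The housing data is entirely hypothetical, and a straight line is just the simplest possible choice of F.

```python
def fit_line(inputs, outputs):
    """Least-squares fit of outputs = intercept + slope * inputs; returns the fitted F."""
    n = len(inputs)
    mean_a = sum(inputs) / n
    mean_b = sum(outputs) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(inputs, outputs))
    spread = sum((a - mean_a) ** 2 for a in inputs)
    slope = covariance / spread
    intercept = mean_b - slope * mean_a

    def f(a):
        return intercept + slope * a

    return f


# Hypothetical examples: A = floor area in square metres, B = price in thousands.
areas = [50, 70, 90, 110]
prices = [2500, 3500, 4500, 5500]

F = fit_line(areas, prices)  # "training": learn F from (A, B) pairs
print(F(80))                 # "serving": F could now sit behind a website or an app
```

The output of machine learning here is the callable F itself, whereas the data science bullets above are insights meant for people to read and act on.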
Machine Learning vs Data science
• Many methods exist
• Off-the-shelf often not enough
• Develop new methods by combining data-driven and model based methods
Hybrids
Artificial Intelligence & Machine Learning
2017
• AI/ML can automate
• AI/ML can predict
• AI/ML can increase productivity
• AI/ML will disrupt
BLOCKCHAIN
Peter Haro
Blockchain as a universal solution?
What is Distributed Ledger Technology?
• Defined as a consensus of replicated, shared, and synchronized data in a distributed network – Think distributed database
• Blockchain is a specific instance of DLT, where the data is grouped into «blocks» and the data is append-only
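The "blocks" and "append-only" properties can be illustrated with a minimal hash chain, where every block stores the hash of the block before it. This is only a sketch of the data structure. It deliberately leaves out everything that makes a real distributed ledger hard (networking, consensus between nodes, digital signatures), and the block layout and record texts are hypothetical.

```python
import hashlib
import json


def block_hash(index, data, previous_hash):
    """SHA-256 over the block's contents, including the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "previous_hash": previous_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append-only: a new block always points at the current last block."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "previous_hash": previous_hash,
            "hash": block_hash(index, data, previous_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i else "0" * 64
        if block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["previous_hash"]):
            return False
    return True


ledger = []
append_block(ledger, "concrete batch 17 delivered")  # hypothetical records
append_block(ledger, "concrete batch 17 approved")
print(is_valid(ledger))

ledger[0]["data"] = "concrete batch 99 delivered"    # tamper with history
print(is_valid(ledger))
```

Because each hash covers the previous hash, changing an old record invalidates the chain from that block onward, which is what makes the history tamper-evident.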
What is new about Blockchain?
• Solves a known problem within distributed systems, the double-spend problem, without using a central authority.
  • Prevents the same funds from being spent twice
• Introduces logical clocks through cryptographic means
• Byzantine fault tolerance
  • Errors in the system must be discovered by other nodes in the system, not limited to crashes
  • Particularly important in safety-critical applications such as airplanes and nuclear power plants
• Trust even in hostile environments
Blockchain solves everything!?
• «There is no silver bullet»
• Subject to the CAP theorem – either CP or AP
• Can be very slow, depending on the blockchain type and size
• Not efficient as general-purpose data storage
• Often outcompeted by traditional solutions
So why use blockchain?
• Transparent transaction history
• Immutability
• «Secure» logging facilities due to the consensus algorithms – Harder to attack
• Trackability throughout the system(s)
• Reducing costs – Less work with third parties
• Risk reduction, especially with regard to system manipulation, fraud and other criminal activity
Built data
• Blockchain will not change our current view of the built data sector
• Low-hanging fruit for added value:
  • Complete traceability of the supply chain
  • Approval of production platforms with respect to the given SLAs for running software using smart contracts
  • Allows platforms to automatically connect to systems and other platforms, using the aforementioned contracts to approve the demands on the platforms
• Document and regulate access to protected resources
• Especially good for supply chain management
Technology for a better society