
  • Copernicus Data Access Benchmark

    Service Design Presentation

    ESA ESRIN, 2019/10/15, Nicola Cafaro

    Ref. Service Design Document V0.5 (September 2019)

  • Benchmarking strategy and service design

    The Project: “Copernicus Space Component Worldwide Sentinels Data Access Benchmark”

    Objectives: to support ESA in improving the Sentinels data accessibility by providing

    • a comprehensive and objective overview of the conditions actually experienced by users when trying to discover, view, download and otherwise locally manipulate the Sentinels data, also in comparison to comparable third-party providers

    • the information needed to bridge the gap between the performance measured locally at the hubs and at the DIAS/s and the performance reported by different user communities

    • information useful to improve the services, to address users’ concerns and to interpret the performances as well as the anomalies reported by the users

  • Benchmarking strategy and service design

    Implementation: to design, implement and operate a service for the continuous benchmarking of the Sentinels data access services, through:

    • Development, testing and integration of the Test tools suite
    • Creation and maintenance of a Test Sites Network infrastructure
    • Orchestration of the Test Scenarios execution
    • Collection and processing of the results, i.e. evolving from Quality of Service (QoS) to Quality of Experience (QoE)
    • Analysis and Reporting
    • Regular calibration and validation

  • Benchmarking service deliverables

    Deliverables of the benchmarking service:

    • Open distribution delivery:
      • Service Design document (methodology)
      • SW test tools (GitHub)

    • Limited distribution delivery:
      • Summary results (Copernicus Governance)
      • Specific report results (Target Site Service Provider)
      • Raw data results, log files (Target Site Service Provider)

  • Highlights on project challenges

    • The test sites chosen represent a statistical sample of the user base of the Copernicus access services, and such a sample is intrinsically limited

    • The number of test sites foreseen for the full-scale configuration is a trade-off between representativeness, cost and complexity (e.g.: commercial provider availability of VMs on specific geolocations)

    • The target sites have a limited number of concurrent accesses per user (limit on concurrent access from different test sites)

    • The QoE indicators are sensitive to the parameters used to build them (tuning of metric thresholds and weights)

  • Benchmarking strategy and service design

    Team:

    • Exprivia – Prime contractor
      – Service Design
      – Service Set-up
      – Service Operations

    • Terradue – Subcontractor
      – Development of test tools
      – Support to Service Transition
      – Support to Service Operations

    • Italtel – Consultant
      – Network Engineering

  • Benchmarking strategy and service design

    Timeline (schedule diagram): Service Set-up (6M), Service Operations (6M), Service Maintenance & Evolutions (6M); QoE parameters tuning.

    Status: still NOT in operation (results are preliminary!)

  • Deliverables

  • The benchmarking strategy and the service design

    ID | Test Scenario / Test Case | Objective and Functionality

    TS01 | Simple data search and single download |

    TC101 | Service Reachability | Test the target site general availability

    TC201 | Basic query | Test a simple filtered search (e.g. mission, product type...) and validate the results

    TC301 | Single remote online download | Test the download service of the target site for online data

    TS02 | Complex data search and bulk download |

    TC202 | Complex Query (geo-time filter) | Test a complex filtered search (e.g. geometry, acquisition period, ingestion date) and validate the results

    TC302 | Multiple remote online download | Test the download capacity of the target site for online data

    TS03 | Systematic periodic data search and related remote data download |

    TC203 | Specific query (handle multiple results pages) | Test a specific filtered search (e.g. geometry, acquisition period, ingestion date) with many results pages and validate the results

    TC303 | Remote Bulk download | Test the download capacity of the target site for downloading data in bulk

  • The benchmarking strategy and the service design

    ID | Test Scenario / Test Case | Objective and Functionality

    TS04 | Offline data download (in progress) |

    TC204 | Offline data query | Test a simple filtered search (e.g. mission, product type...) for querying only offline data and validate the results

    TC304 | Offline download | Test the capacity of the target site for downloading offline data

    TS05 | Coverage Analysis |

    TC501 | Catalogue Coverage | Test and evaluate the catalogue coverage of a target site by collection

    TC502 | Online Data Coverage | Test and evaluate the catalogue online coverage of a target site by collection

    TS06 | Data Latency Analysis |

    TC601 | Data Operational Latency Analysis [Time Critical] | Test and evaluate the data latency of a target site by collection

    TC602 | Data Availability Latency Analysis | Test and evaluate the availability data latency of a target site by collection with respect to a reference target site

  • The benchmarking strategy and the service design

    ID | Test Scenario / Test Case | Objective and Functionality

    TS11 | Cloud Services: simple local data search and single local download on single virtual machines (in progress) |

    TC211 | Basic query from cloud services | Same as TC201, except it is used in the cloud services test scenarios

    TC311 | Single remote online download from cloud services | Same as TC301, except it is used in the cloud services test scenarios

    TC411 | Cloud Services Single Virtual Machine Provisioning | Test the cloud services capacity of the target site for provisioning a single virtual machine

    TS12 | Cloud Services: complex local data search and multiple local download on multiple virtual machines (in progress) |

    TC212 | Complex query from cloud services | Same as TC202, except it is used in the cloud services test scenarios

    TC312 | Multiple remote online download from cloud services | Same as TC302, except it is used in the cloud services test scenarios

    TC412 | Cloud Services Multiple Virtual Machine Provisioning | Test the cloud services capacity of the target site for provisioning multiple virtual machines

  • The benchmarking strategy and the service design

    Hybrid Cloud Service Architecture (diagram)

  • The benchmarking strategy and the service design

    The driving questions

    • Q1 - What are the available Copernicus Sentinels collections? How rich is the data offer?
    • Q2 - How reactive is the target site in responding to the user’s requests?
    • Q3 - How quickly can users find (and visualize) the desired Sentinels products?
    • Q4 - How quickly can users download the identified Sentinels product/s of interest?
    • Q5 - Where applicable, how efficiently can users access and process Sentinels data in a cloud environment? How costly is this service?
    • Q6 - How stable are the above measured performances along time?
    • Q7 - How stable are the above measured performances with respect to the users’ load?
    • Q8 - How variable are the above measured performances with respect to the user location in Europe and around the world?

  • The benchmarking strategy and the service design

    From QoS to QoE

    Each question is transformed into a QoE indicator. Each QoE indicator is a weighted average of APDEX metrics:

    $QoE = \sum_{k=1}^{n} w_k \cdot \mathrm{APDEX}(m_k)$

    $\mathrm{APDEX}(m_k) = \dfrac{N_{\text{Satisfied Counts}} + \tfrac{1}{2}\,N_{\text{Tolerating Counts}}}{\text{Total Counts}}$
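    As a purely illustrative aside (not part of the original slides), the QoS-to-QoE computation above can be sketched in a few lines of Python; the thresholds, weights and sample values below are placeholders, not the calibrated benchmark parameters.

```python
# Minimal sketch of the APDEX -> QoE computation described above.
# Thresholds, weights and sample measurements are illustrative placeholders.

def apdex(samples, satisfied_t, frustrated_t):
    """APDEX = (satisfied + tolerating / 2) / total, scaled to 0..100."""
    satisfied = sum(1 for s in samples if s <= satisfied_t)
    tolerating = sum(1 for s in samples if satisfied_t < s <= frustrated_t)
    total = len(samples)
    return 100.0 * (satisfied + tolerating / 2.0) / total if total else 0.0

def qoe(metrics):
    """QoE = sum_k w_k * APDEX(m_k); the weights are expected to sum to 1."""
    return sum(w * apdex(samples, sat, fru) for samples, sat, fru, w in metrics)

# Hypothetical indicator built from two metrics (response times in s, error rates in %).
response_times = [0.3, 0.5, 0.9, 2.1, 0.2]
error_rates = [0.5, 0.8, 12.0]
indicator = qoe([
    (response_times, 0.4, 1.6, 0.6),   # satisfied below 0.4 s, frustrated above 1.6 s
    (error_rates, 1.0, 10.0, 0.4),     # satisfied below 1 %, frustrated above 10 %
])
print(f"QoE indicator: {indicator:.1f}")
```

    Because the raw samples are archived, the same computation can be re-run with new thresholds, which is the re-calibration property highlighted on the next slide.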

  • The benchmarking strategy and the service design

    Advantages of APDEX

    • APDEX (Application Performance Index) introduces 2 thresholds that mirror user expectations and preferences (-> perceived quality)

    • It simplifies further computations
    • It allows an easy re-calibration of the benchmark as new results are collected -> archived raw data can be re-processed after the choice of new thresholds
    • It is an open standard, widely used in the evaluation of online services
    • Ref: https://www.apdex.org/


  • The benchmarking strategy and the service design

    The QoE Classification

  • The benchmarking strategy and the service design

    Results visibility

    • All raw benchmarking data and log results are provided to the target site

    • For every run of each user scenario, there is a JSON-structured file containing all related test case information (execution datetime, timespan, input test data set, etc.) and the results, including every metric and its measured values

    • This dataset, along with the benchmarking methodology described in the SDD document, makes it possible to reproduce the APDEX results of the report
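    As an illustration only, a per-run result file of the kind described above might look like the hypothetical sketch below; the field names are invented, and the authoritative schema is the one defined in the SDD.

```python
# Hypothetical shape of a per-run JSON result file. Field names are invented
# for illustration; the authoritative structure is defined in the SDD.
import json

run_result = {
    "testScenario": "TS01",
    "testSite": "eu-ref-01",
    "targetSite": "colhub",
    "testCases": [
        {
            "id": "TC201",
            "startedAt": "2019-10-01T08:00:00Z",   # execution datetime
            "durationSec": 12.4,                   # timespan of the test case
            "inputDataSet": {"mission": "Sentinel-2", "productType": "S2MSI1C"},
            "metrics": {
                "avgResponseTime": 0.62,   # seconds
                "peakResponseTime": 1.35,  # seconds
                "errorRate": 0.0,          # percent
            },
        },
    ],
}
print(json.dumps(run_result, indent=2))
```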

  • Q1: Collections Richness

    Question: What are the available Copernicus Sentinels collections? How rich is the data offer?

    Elementary Metrics of Relevance | User Satisfaction Zone | User Frustration Zone | Weight

    #M015 dataCoverage | Not Applicable | Not Applicable | 0.3
    #M023 onlineDataRate | Not Applicable | Not Applicable | 0.6
    #M014 avgDataOperationalLatency | Less than 6 hours | More than 24 hours | 0.1

    Collections Richness Classes:

    A (90 < x ≤ 100): The site has a huge and varied data offer, which equals the one of the Open Hub. The collections are aligned with minimum delay with respect to their availability.

    B (60 < x ≤ 90): The site has a remarkable alignment with respect to the Open Hub data offer.

    C (40 < x ≤ 60): The site has an average offer with respect to the Open Hub, either in terms of completeness or latency of the alignment with respect to the Open Hub.

    D (10 < x ≤ 40): The site has little alignment with respect to the Open Hub data offer.

    E (0 < x ≤ 10): The site offers a minimum fraction of the overall Sentinels products. This might indicate a very focused collection (e.g. only a given product type, only one single country, or a strict rolling policy).

    How is it computed?
    The “richness” score for a given target site is computed by sending requests to its catalogue to retrieve the total number of products for given product types, geographic regions or pre-defined time spans, and by comparing them with those of the Open Access Hub.

    Tuning in progress
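    As a purely illustrative sketch of this comparison (the collection names and counts are made up, and the catalogue requests are assumed to have been issued already):

```python
# Sketch of the "richness" comparison: given product counts per collection on
# the target site and on the reference Open Access Hub, derive a coverage
# percentage per collection. The counts below are illustrative placeholders.

def data_coverage(target_counts, reference_counts):
    """Percentage of the reference offer that the target site also provides."""
    coverage = {}
    for collection, ref_count in reference_counts.items():
        tgt_count = target_counts.get(collection, 0)
        coverage[collection] = (
            100.0 * min(tgt_count, ref_count) / ref_count if ref_count else 0.0
        )
    return coverage

reference = {"S1 GRD": 12000, "S2 L1C": 30000, "S3 OLCI": 8000}  # Open Access Hub
target = {"S1 GRD": 11800, "S2 L1C": 21000}                      # target site

for collection, pct in data_coverage(target, reference).items():
    print(f"{collection}: {pct:.1f}% of the reference offer")
```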

  • Q1: Collections Richness

    Question: What are the available Copernicus Sentinels collections? How rich is the data offer?

    Q1: preliminary values computed with only dataCoverage metric

  • Q1: Collections Richness

    Question: What are the available Copernicus Sentinels collections? How rich is the data offer?

    Q1 details: colhub3 last month coverage by collection (NTC product types)

  • Q2: Reactivity

    Question: How reactive is the Sentinels data access portal in responding to user requests?

    How is it computed?
    The target site general reactivity is evaluated by performing requests to the target site and measuring:
    - the time needed for the site to respond to the user request (both average and peak times are taken into account)
    - the associated error rates (i.e. when the site is not responding at all).

    Elementary Metrics of Relevance | User Satisfaction Zone | User Frustration Zone | Weight

    #M001 Average Response Time | Less than 0.4 s | More than 1.6 s | 0.4
    #M002 Peak Response Time | Less than 0.8 s | More than 3.2 s | 0.2
    #M003 Error Rate | Less than 1% | More than 10% | 0.4
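    For illustration only, a reactivity probe measuring these metrics could be sketched as follows; the endpoint URL, request count and timeout are placeholders, not the benchmark configuration.

```python
# Minimal sketch of a reactivity probe: send n requests to a catalogue endpoint
# and derive average/peak response time and error rate. Placeholder URL.
import time
import urllib.error
import urllib.request

def probe(url, n=20, timeout=10):
    times, errors = [], 0
    for _ in range(n):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read(1024)          # touch the response body
        except (urllib.error.URLError, OSError):
            errors += 1                  # failed requests feed the error rate
        times.append(time.monotonic() - start)
    return {
        "avgResponseTime": sum(times) / len(times),   # seconds
        "peakResponseTime": max(times),               # seconds
        "errorRate": 100.0 * errors / n,              # percent
    }

print(probe("https://example.org/catalogue/search?q=*"))
```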

    Site Reactivity Classes:

    A (90 < QoE ≤ 100): The site is extremely reactive to the users’ requests, providing very quick responses and negligible error rates.

    B (60 < QoE ≤ 90): The site has very good reactivity and very low error rates.

    C (40 < QoE ≤ 60): The site has good reactivity and low error rates.

    D (10 < QoE ≤ 40): The site has low reactivity and high error rates.

    E (0 < QoE ≤ 10): The site has very low reactivity to the users’ requests. It presents large response times that may be coupled with unacceptable error rates.

  • Q2: Reactivity

    Question: How reactive is the Sentinels data access portal in responding to user requests?

    Q2: Last 14 days, EU-Reference Test Sites

  • Q2: Reactivity

    Question: How reactive is the Sentinels data access portal in responding to user requests?

    Q2 details: Last 14 days, EU-Reference Test Sites, avgResponseTime

  • Q3: Discoverability

    Question: How quickly can users find (and visualize) the desired Sentinels products?

    How is it computed?
    Multiple concurrent remote requests are sent to the target site using different queries, from basic ones (e.g. by mission, by product type…) to more complex ones (e.g. by geo-location, time interval, ingestion date…). Queries are executed with random combinations of options over pre-defined databases of e.g. geographic areas, product types and time intervals. View service performances will be introduced later in the project.
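    A purely illustrative sketch of such randomised query generation is given below; the option values and the filter syntax are made up and would depend on the target site API and on the project's Mission Configuration Dictionary.

```python
# Sketch of randomised query generation: combine options drawn from small
# pre-defined databases of missions/product types, areas and time windows.
# Values and filter syntax are illustrative only.
import random
from datetime import date, timedelta

MISSIONS = {"Sentinel-1": ["GRD", "SLC"], "Sentinel-2": ["S2MSI1C", "S2MSI2A"]}
AREAS = [
    "POLYGON((6.6 36.6, 18.8 36.6, 18.8 47.1, 6.6 47.1, 6.6 36.6))",   # roughly Italy
    "POLYGON((2.5 49.4, 7.2 49.4, 7.2 53.7, 2.5 53.7, 2.5 49.4))",     # roughly Benelux
]

def random_query(rng=random):
    mission = rng.choice(list(MISSIONS))
    product_type = rng.choice(MISSIONS[mission])
    wkt = rng.choice(AREAS)
    end = date(2019, 10, 1) - timedelta(days=rng.randint(0, 90))
    start = end - timedelta(days=rng.randint(1, 14))
    # Generic filter expression; the real syntax depends on the target site API.
    return (f"platformname:{mission} AND producttype:{product_type} "
            f'AND footprint:"Intersects({wkt})" '
            f"AND beginposition:[{start}T00:00:00Z TO {end}T23:59:59Z]")

print(random_query())
```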

    Elementary Metrics of Relevance | User Satisfaction Zone | User Frustration Zone | Weight

    #M001 Average Response Time | Less than 1 s | More than 4 s | 0.3
    #M002 Peak Response Time | Less than 2 s | More than 8 s | 0.1
    #M003 Error Rate | Less than 1% | More than 10% | 0.3
    #M012 Results Error Rate | Less than 1% | More than 10% | 0.3

    Discoverability Classes:

    A (90 < QoE ≤ 100): The site is extremely quick in finding and visualising the desired data.

    B (60 < QoE ≤ 90): The site has a very good speed in finding and visualising the desired data.

    C (40 < QoE ≤ 60): The site has average performances in finding and visualising the desired data.

    D (10 < QoE ≤ 40): The site is slow in finding and visualising the desired data.

    E (0 < QoE ≤ 10): The site is extremely slow in finding and visualising the desired data.


  • Q3: Discoverability

    Question: How quickly can users find (and visualize) the desired Sentinels products?

    Q3: Last 14 days, EU-Reference Test Sites

  • Q3: Discoverability

    Question: How quickly can users find (and visualize) the desired Sentinels products?

    Q3 details: Last 14 days, EU-Reference Test Sites, avgResponseTime

  • Q3: Discoverability

    Question: How quickly can users find (and visualize) the desired Sentinels products?

    Q3 details: Last 14 days, colhub, EU-Reference Test Sites, errorRate, resultsErrorRate

  • Q4: Download

    Question: How quickly can users download the identified Sentinels product/s of interest?

    How is it computed?
    The download performances are evaluated through single or multiple concurrent remote HTTP web requests to retrieve several product files from a set of selected URLs, where the URLs for download are obtained from the results of the search scenarios. Only online products are taken into account; offline products’ download will be introduced later in the project.
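    As an illustrative sketch only, the throughput measurement for a single product URL could look like the following; the URL is a placeholder, and in the benchmark the download URLs come from the search scenario results.

```python
# Sketch of a single download measurement: fetch a product URL and compute
# the average throughput in MB/s. Placeholder URL; no concurrency shown here.
import time
import urllib.request

def measure_download(url, chunk_size=1 << 20, timeout=30):
    start = time.monotonic()
    total_bytes = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.monotonic() - start
    return {
        "bytes": total_bytes,
        "elapsedSec": elapsed,
        "avgThroughputMBps": (total_bytes / 1e6) / elapsed if elapsed else 0.0,
    }

print(measure_download("https://example.org/products/some-product.zip"))
```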

    Elementary Metrics of Relevance | User Satisfaction Zone | User Frustration Zone | Weight

    #M001 Average Response Time | Less than 0.4 s | More than 1.6 s | 0.2
    #M002 Peak Response Time | Less than 0.8 s | More than 3.2 s | 0.1
    #M003 Error Rate | Less than 1% | More than 10% | 0.2
    #M004 Average Throughput | More than 3 MB/s | Less than 0.4 MB/s | 0.5

    Download Classes:

    A (90 < Download ≤ 100): Excellent download capabilities. Products are downloaded with the highest throughput, with no or very limited errors and minimum latency.

    B (60 < Download ≤ 90): Very good download capabilities. Products are downloaded with high throughput, with limited errors and limited latency.

    C (40 < Download ≤ 60): Fair download capabilities. Products are downloaded with average throughput, limited errors and limited latency.

    D (10 < Download ≤ 40): Poor download capabilities. Products are downloaded with low throughput, many errors and/or long latency.

    E (0 < Download ≤ 10): Download capabilities are definitely not acceptable.

  • Q4: Download

    Question: How quickly can users download the identified Sentinels product/s of interest?

    Q4: Last 14 days, EU-Reference Test Sites

  • Q4: Download

    Question: How quickly can users download the identified Sentinels product/s of interest?

    Q4 details: Last 35 days, scihub, EU-Reference Test Sites

  • Q4: Download

    Question: How quickly can users download the identified Sentinels product/s of interest?

    Q4 details: Last 14 days, EU-Reference Test Sites

  • Q8: Geographic Variability

    Question: How variable are the above measured performances with respect to the user location in Europe and around the world?

    How is it computed?
    Via user operations from different test sites of the network and by computing the QoE indicators over each connection.

    Note: current network being expanded!

  • Q8: Geographic Variability

    Question: How variable are the above measured performances with respect to the user location in Europe and around the world?

  • Q8: Geographic Variability

    Question: How variable are the above measured performances with respect to the user location in Europe and around the world?

  • Available in the next delivery

    Q5: Cloud Computing
    Question: Where applicable, how efficiently can users access and process Sentinels data in a cloud environment? How costly is this service?

    Q6: Time Stability
    Question: How stable are the above measured performances along time?

    Q7: Load Stability
    Question: How stable are the above measured performances with respect to the users’ load?

  • QoE thresholds/weights tuning


    The QoE thresholds and weights will be refined after analysis of results and interactions with domain experts and stakeholders

  • Calibration: Test Sites Network

    Test sites calibration is periodically undertaken with the objective to:
    • regularly monitor the health status and the performances of the test sites
    • select the set of trusted reference benchmark sites (“goldmarks”, the EU-Ref Test Sites) that are used to compute the scores for all the target sites.

    The calibration is based on the computation of a specific Calibration Quality Indicator (C1) providing an estimate of the overall quality of a given test site. This is based on the speedtest network (https://www.speedtest.net/it/speedtest-servers). The methodology is being refined to identify suitable speedtest servers to be used as a reference.

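    For illustration only, a basic test-site measurement against the speedtest network could be sketched as below, assuming the third-party speedtest-cli Python package; how such measurements are combined into the C1 indicator is not specified here.

```python
# Sketch of a calibration measurement against the speedtest network, assuming
# the third-party `speedtest-cli` package (pip install speedtest-cli).
# The combination of these values into the C1 indicator is not shown.
import speedtest

def calibration_measurement():
    st = speedtest.Speedtest()
    st.get_best_server()          # pick a nearby reference speedtest server
    download_bps = st.download()
    upload_bps = st.upload()
    return {
        "downloadMbps": download_bps / 1e6,
        "uploadMbps": upload_bps / 1e6,
        "pingMs": st.results.ping,
        "server": st.results.server.get("host"),
    }

print(calibration_measurement())
```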

  • Validation: User Scenarios

    • The user operations via API are performed automatically. A set of “typical queries” is launched from each test site on the catalogue API of each target site.

    • The queries are generated randomly using a pre-defined Mission Configuration Dictionary. The randomization process aims at ensuring that:

    – the discovery and download performances of a target site are properly benchmarked (i.e. caching is avoided by launching a new query each time), and

    – the functionality is properly exercised over a wide spectrum of filtering capacity and results accuracy.

    – If the service is run over a sufficiently large period and in combination with a fairly significant load factor, the randomization process ensures that these objectives are met whilst maintaining a fair exercise of all target sites.

  • Way forward

    Next steps:
    • Completion of User Scenarios to include missing functionalities to be benchmarked (e.g. off-line download, cloud processing, view service)
    • Enhancement of the Test Sites Network (up to 30 sites foreseen, including selected locations in the proximity of the core target sites)
    • Consolidation of the Test Sites calibration procedure (e.g. analysis of speedtest servers)
    • Consolidation of QoE parameters (thresholds and weights)
    • Validation of User Scenarios and benchmarking operations

    Next deliveries:
    • SDD Version 1.0: end of 2019
    • SDD Annex – Benchmarking SW Suite, first release on GitHub: beginning of 2020

    Service Evolutions:
    • Expansion of the Test Sites Network (e.g. to cover specific ESA and/or user measurement needs)
    • Expansion of the Target Sites being benchmarked
    • Expansion of benchmarking user scenarios (?)

  • Way forward


    Thank you
