asterios katsifodimos saturday, may 23, 2015 high performance computing systems lab university of...

35
Asterios Katsifodimos Saturday, June 18, 2022 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides based on: “AMGA metadata catalog with use case by Tony Calanducci

Upload: maurice-byrd

Post on 18-Dec-2015

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Asterios KatsifodimosTuesday, April 18, 2023

High Performance Computing systems Lab

University of Cyprus

The AMGA metadata catalog – An Overview

Slides based on:

“AMGA metadata catalog with use cases”

by Tony Calanducci

Page 2: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Outline

Background and Motivation for AMGA

Interface, Architecture and Implementation

Metadata Replication/Federation on AMGA

Use cases

Page 3: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

ARDA proposed an interface for Metadata access on the GRID Based on requirements of LHC experiments Designed jointly with the gLite/EGEE team

Adopted as the official EGEE Metadata Interface Endorsed by PTF (Project Technical Forum of EGEE)

Released on December 07 in gLite 3.1(update 10) All the release process was made by HPCL - University of

Cyprus testing, test scripts, automatic configuration scripts, preparation for

gLite environment Initial release: glite-AMGA_postgres Upcoming release(April 08): glite-AMGA_oracle

now in preproduction services Releases are officially supported by EGEE

Since the first release

Arda Metadata Grid Application (history)

Page 4: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Metadata on the GRID

Metadata is data about data e.g. On a Data Grid: information about files

Describe files Locate files based on their contents

AMGA makes DB access a simple task on the Grid Many Grid applications need structured data Many applications require only simple schemas

Can be modelled as metadata Main advantage: better integration with the Grid

environment Metadata Service is a Grid component Grid security Hide DB heterogeneity

Page 5: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Metadata user requirements I want to

store some information about files In a structured way

query a system about those information keep information about jobs

I want my jobs to have read/write access to those information

have easy access to structured data using my proxy certificate

NOT use a database

Page 6: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

AMGA Features Dynamic Schemas

Schemas can be modified at runtime by client Create, delete schemas Add, remove attributes

Metadata organised as an hierarchy Collections can contain sub-collections Analogy to file system:

Collection Directory; Entry File; attribute inode information

Flexible Queries SQL-like query language Joins between schemas Example

QUERY EXAMPLE:

selectattr /gLibrary:FileName /gLibrary:Author ‘/gLibrary:FILE=/gLAudio:FILE \ and like(/gLibrary:FileName,“%.mp3")‘

Page 7: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Metadata Concepts

Some Concepts in AMGA: Metadata - List of attributes associated with entries Attribute – key/value pair with type information

Type – The type (int, float, string,…) Name/Key – The name of the attribute Value - Value of an entry's attribute

Schema – A set of attributes Entry – Lives in a schema – assigns values to

attributes Collection – A set of entries associated with a

schema Think of schemas as tables, attributes as columns,

entries as rows

Page 8: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

AMGA data organization

Relational schema AMGA(hierarchy)

/HOSPITAL//HOSPITAL/

PATIENTS/

PATIENTS/

DOCTORS/

DOCTORS/

johnjohn george

george

#name sickness

age

john malaria 68

george otitis 84 sickness

otitis

age 84

Attributes

Entries

Schema/Directory

Schema/Directory

TABLE: PATIENTS

#name

PATIENTS

DOCTORS

TABLE: HOSPITAL

Collection

#type

people_group

people_group

Page 9: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

AMGA Implementation

C++ multiprocess server Runs on any Linux flavour

Backends Oracle, MySQL, PostgreSQL, SQLite

Two frontends TCP Streaming

High performance Client API for: C++, Java, Python, Perl,

Ruby

SOAP Interoperability

Also implemented as standalone Python library Data stored on filesystem

Metadata Server

MDServer

SOAP

TCP Streaming

PostgreSQL

Oracle

SQLite

Client

Client

MySQL

Python Interpreter

Metadata Python

APIClient

filesystem

Page 10: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

AMGA Security

Unix style permissions user-group-others (e.g. rwxr--r--)

ACLs – per-collection or per-entry. Secure connections – SSL Client Authentication based on

Username/password General X509 certificates Grid-proxy certificates

Access control via a Virtual Organization Management System (VOMS)

Authenticate with X509 Cert VOMS-Cert

with Group & Role information

VOMS-Cert

Resource management

AMGAOracle

VOMS

Page 11: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Accessing AMGA

TCP Streaming Front-end mdcli & mdclient and C++ API (md_cli.h,

MD_Client.h) Java Client API* and command line*

(mdjavaclient.sh & mdjavacli.sh) Python* & PHP* Client API

SOAP Frontend (WSDL) C++ gSOAP AXIS (Java)* ZSI (Python)* *(also under Windows)

Page 12: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Python API example

Page 13: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

AMGA Internals – Backend translation To better understand how AMGA works

Example: $mkdir /hpcl

INSERT INTO schema(id,name) VALUES(“/hpcl”,”dir2”);

CREATE TABLE dir2; $addattr /hpcl id int

ALTER TABLE dir2 ADD COLUMN "user:id" integer;

AMGA DB Backend

Collections Tables

Entries Rows

Attributes Columns

Page 14: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

AMGA Internals – TCP-Streaming

Designed for scalability Asynchronous operation

Reading from DB and sending data to client

Response sent to client in chunks No limit on the maximum

response size

Example: TCP Streaming Text based protocol (like SMTP,

POP3,…) Response streamed to client

Client Server Database

<operation> Create DB cursor

[data]

[data]

[data]

[data]

[data]

[data]

[data]

[data]

StreamingStreaming

Client: listattr entry

Server: 0entryvalue1value2…<EOT>

Page 15: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Metadata Replication 1/2

Motivation Scalability – Support hundreds/thousands of concurrent users Geographical distribution – Hide network latency Reliability – No single point of failure DB Independent replication – Heterogeneous DB systems Disconnected computing – Off-line access (laptops)

Architecture Asynchronous replication Master-slave – Writes only allowed on the master Replication at the application level

Replicate Metadata commands, not SQL → DB independence Partial replication – supports replication of only sub-trees of the

metadata hierarchy

Page 16: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Metadata Replication 2/2

MetadataCommands

RedirectedCommands

Full replication Partial replication

Federation Proxy

Page 17: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Importing existing data

Suppose that you have the data A reasonable question would be:

Can I use my existing database data?? The answer is YES

Importing data to AMGA Pretty simple Connect a database to AMGA

Execute the import command import table directory

Ready to go!

Page 18: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Using AMGA along with an LFC LFC uses a database backend(commonly

MySQL) AMGA integration on an LFC

Work on LFC’s database Logical File names in LFC collections,entries in

AMGA Very nice for managing files & directories

Every new file entry is also put into AMGA BUT

Currently broken feature The AMGA developers are working on it

Page 19: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Conclusion (uses cases follow)

AMGA – Metadata Service of gLite Part of gLite 3.1

Officially Supported from EGEE Useful for simplified DB access Integrated on the Grid environment

Security (voms proxies, globus proxies) Replication/Federation features Tests show good performance/scalability AMGA Web Site

http://amga.web.cern.ch/amga/

Page 20: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

A generic use case

1. Use Storage Elements for storing files2. Use LFN’s(Logical File Names) for having a

file name (storing them on an LFC)3. Use AMGA to store metadata about files4. Query AMGA using complex queries about

files I want all files that have:

type=image AND size > 6kb AND description LIKE “%breast%cancer%”

5. Use results to retrieve only specific files

Page 21: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

AMGA usage examples

Biomed: Medical Data Manager

Deployed on EGEE production grid

gMOD

Deployed on GILDA

Page 22: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Biomed: Medical Data Manager

ImagesGUID Date

PatientID Doctor

DoctorName Hospital

Patient

Store and access medical images exploiting metadata on the Grid

Strong security requirements Patient data is sensitive Data must be encrypted Metadata access must be restricted

to authorized users AMGA used as metadata server

Demonstrates authentication and encrypted access Used as a simplified DB NO ENCRYPTION on DB Backend – Anyone interested?

More details at: http://www.i3s.unice.fr/~johan/mdm/mdm-051013.pdf

Page 23: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

gMOD: grid Movie On Demand

gMOD provides a Video-On-Demand service User chooses among a list of video and the chosen one

is streamed in real time to the video client of the user’s workstation

For each movie a lot of details (Title, Runtime, Country, Release Date, Genre, Director, Case, Plot Outline) are stored and users can search a particular movie querying on one or more attributes

Two kind of users can interact with gMOD: TrailersManagers that can administer the db of movies (uploading new ones and attaching metadata to them); GILDA VO users (guest) can browse, search and choose a movie to be streamed.

Page 24: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

gMOD screenshot

gMOD is accesible through the Genius Portal (https://glite-tutor.ct.infn.it)

Selecting from left side menu: VO Services/gMOD

Page 25: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

gMOD under the hood

Built on top of gLite services + GENIUS web portal: Storage Elements, sited in different places, physically

contain the movie files LFC, the File Catalogue, keeps track in which Storage

Element a particular movie is located AMGA is the repository of the detailed information for

each movie, and makes possible queries on them The Virtual Organization Membership Service (VOMS)

is used to assign the right role to the different users The Workload Management System (WMS) is

responsible to retrieve the chosen movie from the right Storage Element and stream it over the network down to the user’s desktop or laptop

Page 26: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

gMOD interactions

VOMS

LFCCatalogue

MetadataCatalogue

WN WN

WN

CE

Storage Elements

User

Genius Portal

Workload Management System

get RoleAMGA

Page 27: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

The End

Questions - Discussion

Page 28: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Backup Slides

Page 29: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

AMGA Web Interface

Page 30: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

AMGA Web Interface

Page 31: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Metadata Schema Management

Page 32: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Entry Management

Page 33: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

ACL Management

Page 34: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

QBE like Query Engine

Page 35: Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides

Query Result