oak meeting 18/09/2014
Post on 23-Jun-2015
Fall 2014 OAK project meeting
OAK
Plan
• History and environment
• Members
• Ongoing grants
• Main research themes
• Practical information
History and environment
OAK INRIA project-team
• « Database optimizations and architectures for complex large data »
• Created as a « team » in 2012, as a « project-team » (full status) in April 2013.
• Recall:
– INRIA (project-) teams are created based on a proposed scientific program, for four years. They may be renewed once.
– Yearly activity report (end of the year! :-) )
– Evaluation by an external committee every four years (2015)
• Personnel from INRIA and U. Paris Sud
• Subset of the Paris Sud team LaHDaK (« Large and Heterogeneous Data and Knowledge »).
OAK members
INRIA: Ioana Manolescu (DR)
U. Paris Sud faculty: Nicole Bidoit (Pr), Bogdan Cautis (Pr), Benoît Groz (MdC)
External faculty: Dario Colazzo (Pr, U. Dauphine), François Goasdoué (Pr, U. Rennes 1), Melanie Herschel (Pr, U. Stuttgart)
Post-docs: Francesca Bugiotti, Soudip Roy Chowdhury, Ioana Ileana (1/11/2014)
Inria engineers: Benjamin Djahandideh, Juan Alvaro Munoz Naranjo
Interns: E. Akbari, S. Cebiric (1/11/2014)
PhD students: Raphaël Bonaque (BC/FG/IM), Damian Bursztyn (FG/IM), Jesús Camacho-Rodríguez (DC/IM), Paul Lagrée (BC/O. Cappé, 1/10/14), Alexandra Roatiş (DC/FG/IM), Aikaterini Tzompanaki (NB/MH), Stamatis Zampetakis (FG/IM)
OAK, 09/2014
Current grants
Datalyse
• 2013-2016
• PhD thesis of Damian. Also: Ioana, Sejla
• With: Business & Decision / Eolas, LIG, Université de Lille, Univ. Joseph Fourier @ Grenoble, LIRMM Montpellier
• Partners: Les Mousquetaires, City of Grenoble
• Project goal: Big Data Warehousing technologies for exploiting in-house DW data (client database, …) together with external (less structured) data.
• Our work: parallel RDF data analysis tools; adaptive hybrid stores (cloud and non-cloud, RDBMS, NoSQL, …)
Datalyse, a few more details
Our research contributes to:
1. Uniform storage layer platforms for heterogeneous data (relational, RDF, social…)
– Indexes and views
2. RDF data analysis tools (lenses)
ODIN
• 2014-2018, « Open Data Intelligence ». Financed by Direction Générale de l’Armement (DGA). Start: 1/10/2014
• Elham. Also: Alexandra, François, Ioana
• With: SemSoft, U. Rennes 1 (ENSSAT – home team of François Goasdoué)
• Project goal: a suite of tools for integrating, cleaning, and analyzing open (RDF) data
• Our work: completing a full analytics stack for RDF
– (DW basis for RDF laid out in Alexandra’s thesis)
– Follow-up: OLAP operations on OD analytical queries (cubes)
• Other work in the project: RDF data quality etc.
KIC Europa
• In the third and final year
• Juan. Also: Dario, Ioana, Jesús; Francesca
• Project goal: platform for massively parallel processing of large data volumes, building on the TU Berlin project Stratosphere (→ Apache Flink)
• Our work: massively parallel processing of XML queries
Main ongoing research (grant-based or free…)
Database techniques for the Semantic Web
• Efficient RDF query answering
– Damian’s PhD thesis
• Models and tools for RDF data warehousing (WaRG)
– Alex’s PhD thesis, Sejla and Elham’s work
• Massively parallel RDF data management (CliqueSquare)
– Stamatis’ PhD thesis
– Benjamin
Efficient massively parallel processing of Web data
• (CliqueSquare)
• PAXQuery: translating XML queries into plans of massively parallel operators (within Flink)
– Jesús’ PhD thesis; Juan
• PigReuse: detecting and reusing repeated subexpressions in PigLatin scripts (within Apache’s Pig project)
– Jesús’ PhD thesis; Soudip’s post-doc (until now)
Crowd-sourcing and graphs
• Optimizing crowd-sourced queries: skyline operators, inference and traceability
• Graph query evaluation and reasoning:
– Path queries
– Rewriting
– Indexes
Parallel XML and provenance
• ANDROMEDA: Evaluating Queries and Updates on Big XML Documents (based on static and dynamic partitioning)
Who: Dario Colazzo, Carlo Sartiani and also Nicole Bidoit, Federico Ulliana, Alessandra Solimando
• Missing-Answers Problem: query debugging and fixing
– query-based explanations for why-not questions
– Nautilus Platform
Who: PhD of Aikaterini Tzompanaki with Nicole Bidoit and Melanie Herschel
Hybrid and redundant stores
• Application data is heterogeneous (format, complexity, …)
• Great variety of storage systems (traditional DBMSs, NoSQL stores, graph/RDF stores, document stores, key-value stores… Centralized vs. distributed)
• Goal:
– Provide applications with access to their data in the data source’s native model, on top of heterogeneous stores
– Automatically determine which data fragments to store where (transparent to the application)
• Alin, Damian, Francesca, Ioana I., Ioana M.
Of OAK code(s)
All slides from Sept 16 to be added on a SlideShare account
(possibly the OAK seminar one)
Software (1)
• Well-rounded, standalone systems:
– ViP2P: view-based management of XML in a P2P network
– AMADA (data management in the Amazon cloud)
– Nautilus
– PAXQuery
– PigReuse
• Strongly reused RDF-related modules
– Conjunctive queries: parser, equivalence testing, translation to SQL
– RDF loading into an RDBMS, dictionary encoding, indexing; RDF saturation
– Some of these have many versions and are under consolidation / cleanup
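As a rough illustration of the dictionary encoding step listed among the RDF modules, here is a minimal Python sketch. It is not the OAK code: the function name `dictionary_encode` and the sample triples are hypothetical, and the sketch only assumes the usual idea of replacing each distinct RDF term by a small integer before loading triples into a store.

```python
def dictionary_encode(triples):
    """Replace each distinct RDF term by an integer identifier.

    Hypothetical helper for illustration; returns the term-to-id map,
    the reverse map, and the encoded triples.
    """
    term_to_id = {}
    encoded = []
    for triple in triples:
        row = []
        for term in triple:
            # Assign the next free integer the first time a term is seen.
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id)
            row.append(term_to_id[term])
        encoded.append(tuple(row))
    id_to_term = {i: t for t, i in term_to_id.items()}
    return term_to_id, id_to_term, encoded


# Made-up sample data (subject, property, object).
sample_triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "ex:Person"),
]
term_to_id, id_to_term, encoded = dictionary_encode(sample_triples)

# Decoding restores the original triples.
decoded = [tuple(id_to_term[i] for i in row) for row in encoded]
```

The encoded triples can then be stored in an integer-only table, with the dictionary kept in a separate table for decoding query results.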
Software (2)
• Tuple toolkit to be extracted from ViP2P / PAXQuery?
– (Nested) tuples; metadata
– Physical operators (iterators)
– Logical operators
– Visualization(s), GUI
– For teaching and/or as a starter’s kit for future project development
• RDF generator by Stamatis → open sourced?
Software (3)
Other software (systems dependent on RDF reused modules)
– RDF optimized reformulation
– CliqueSquare → to be open-sourced soon
– WaRG
Admin / daily life issues
Who does what
• Web site: should be up to date!
– Useful internal pages (INRIA credentials)
• Web master: Raphael
– Re-organization ongoing
– Anyone can create/edit pages
• Signal accepted papers to the Web master:
– They go in the « News » category (the most recent 4-5 appear on the main OAK page)
– Papers in major venues also go in « Main results »
– An e-mail is sent to oak.news
Who does what: mailing lists
• Ioana
• We have
– oak.permanent
– oak.phd
– oak.eng-postdoc
– oak.interns
– oak.ext
– oak.main = permanent U phd U eng-postdoc U ext
– oak.all = oak.main U interns
– oak.seminar = all U external friends
– oak.news = all U « other » external friends (Serge Abiteboul, Tova Milo, Victor Vianu, Claire David …)
• All lists are private. Should this change?
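The list definitions above are plain set unions, which can be sketched in Python as follows. The addresses and the `expand_lists` helper are hypothetical placeholders for illustration; only the union structure comes from the slide.

```python
def expand_lists(base, seminar_friends, news_friends):
    """Build the derived lists from the five base lists (hypothetical helper)."""
    lists = dict(base)
    # oak.main = permanent U phd U eng-postdoc U ext
    lists["oak.main"] = (base["oak.permanent"] | base["oak.phd"]
                         | base["oak.eng-postdoc"] | base["oak.ext"])
    # oak.all = oak.main U interns
    lists["oak.all"] = lists["oak.main"] | base["oak.interns"]
    # oak.seminar = all U external friends
    lists["oak.seminar"] = lists["oak.all"] | seminar_friends
    # oak.news = all U "other" external friends
    lists["oak.news"] = lists["oak.all"] | news_friends
    return lists


# Made-up addresses, one or two per base list.
base_lists = {
    "oak.permanent": {"perm1@example.org", "perm2@example.org"},
    "oak.phd": {"phd1@example.org"},
    "oak.eng-postdoc": {"eng1@example.org"},
    "oak.interns": {"intern1@example.org"},
    "oak.ext": {"ext1@example.org"},
}
lists = expand_lists(
    base_lists,
    seminar_friends={"friend1@example.org"},
    news_friends={"friend2@example.org"},
)
```

With this structure, adding someone to a base list is enough for the address to reach every derived list; note that interns are in oak.all but not in oak.main.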
Who does what: OAK seminars
• Internal talks, or guest talks
• Generally 2 pm Friday afternoon; not always
• Jesús used to handle this. Replacement?
• Task:
– Get the title+abstract
– Reserve a room through the LRI Web site
– Advertise on oak.seminar (cc speaker) and on the OAK web site (it must appear in the calendar)
– (Mark in OAK shared agenda)
– Get the slides, upload them in the SlideShare account www.slideshare.net/INRIA-OAK
Who does what
• Hardware inventory: Benjamin
– Maintains the respective page on the OAK Web site
• Add new hardware w/ the respective user
• Try to get rid of old hardware no one uses any more
• Suggest purchases
• OAK agenda: Ioana
• OAK cluster agenda: many people (Benjamin, Stamatis, Ioana…)
• OAK blog: Ioana and many people
Upcoming events
• PhD defenses: Alexandra 22/9, Jesús 25/9
• Group lunch & photo: 23/9
• INRIA activity report 2014: put all your papers in HAL (http://hal.inria.fr); there is an OAK affiliation which includes INRIA, LRI, Univ Paris Sud, CNRS, …
• OAK seminars (typically Fridays at 2 pm)
– Oct 1: Tamer Ozsu
– Oct 24: Defense Ioana Ileana (morning @ Telecom), seminar Dan Olteanu (afternoon @ PCRI)
• Oct 10: HDR Fabian Suchanek @ Telecom
• Oct 13-15: BDA @ Autrans
• Oct 22-24: Visit Alin Deutsch (UCSD)
• December 2: Forum STIC