EU COST Action CM1404: WG4 - Efficient Data Exchange


Post on 23-Jan-2018


TRANSCRIPT

Page 1: EU COST Action CM1404: WG4 - Efficient Data Exchange

Sustainable and Smart Energy Carriers

for Decentralised Energy Production

PRESENTATION

Data Mining Challenges in Distributed Generation

Edward S. Blurock

Blurock Consulting AB

(previously with

Malmö University: Computer Science Dept.

Lund University: Combustion Physics, Energy Sciences

Research Institute for Symbolic Computation

University of California, Irvine: Thesis, Computational Chemistry)

bottom line: a career in (chemical) modelling

(using data, AI and machine learning/data mining …)

Page 2:

(SLIGHTLY) REVISED TITLE

Data Mining Challenges in Distributed Generation

Community(?)

Combustion Community(?)

Data Mining Challenges from the widely distributed generated data from the scientific community,

specifically for those dealing with all aspects of combustion

Page 3:

WHAT WE ARE TALKING ABOUT

Data

Theme: you have to have data available before you can do something with it

Page 4:

PRESENTATION

• Introduction (with disclaimers and revisions)

• Motivation:

  • Data exchange moving into the clouds

• WG4:

  • Standard definition for data collection and mining toward a virtual chemistry of Smart Energy Carriers

• WG4 Task Force:

  • Toward efficient data exchange in the combustion community

Page 5:

DATA PERSPECTIVE: DATA EXCHANGE

• Data is the backbone of modern scientific research

• Exchange of data is paramount to successful interaction between research groups

OPEN DATA

[Diagram labels: Publications and conferences; Data exchanged between researchers (email, etc.); Virtual Research Environment; papers; data files; clouds (infrastructures)]

Page 6:

TOWARD A VIRTUAL SCIENTIFIC ENVIRONMENT

We are not alone in this development (maybe a bit behind)

Page 7:

DATA PERSPECTIVE GOALS: SOCIAL NETWORK

Need tools to promote efficient data sharing within the community

Page 8:

DATA PERSPECTIVE: MANY SOURCES

Need to accommodate the varied data that needs to be handled

Page 9:

DATA PERSPECTIVE: INTERRELATIONSHIPS

There is no such thing as an isolated data point

Page 10:

DATA PERSPECTIVE: QUALITY CONTROL

Reproducibility

Reliability

Accountability

Due to accountability requirements (financial incentives): data managing tools are already being used

An important aspect of interdependency of data is quality control (calculation of sensitivity or error bars)

Efficient data exchange and availability (beyond just published data) is the key
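As an illustration of the "error bars" side of quality control, the following is a minimal Python sketch. It assumes the simplest case, repeated measurements of one quantity, and reports the mean with its standard error. The function name and the sample readings are hypothetical and not taken from the presentation.

```python
import math
import statistics


def mean_with_error_bar(values):
    """Return (mean, standard error of the mean) for repeated measurements."""
    if len(values) < 2:
        raise ValueError("need at least two measurements for an error bar")
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 in the denominator).
    stdev = statistics.stdev(values)
    return mean, stdev / math.sqrt(len(values))


# Hypothetical repeated mole-fraction readings at one temperature.
readings = [0.0587, 0.0563, 0.0575, 0.0581]
mean, sem = mean_with_error_bar(readings)
print(f"{mean:.4f} +/- {sem:.4f}")
```

An error bar of this kind can only be recomputed by others if the individual readings are exchanged, not just the published mean.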

Page 11:

ACCOUNTABILITY: ELECTRONIC LAB NOTEBOOKS (ELN)

In other fields (pharma), accountability has financial motivations (patents) and has led to the development and use of ELNs

Page 12:

TOWARDS EFFICIENT DATA EXCHANGE

SMARTCATS

WG4

Page 13:

PRESENTATION

• Introduction (with disclaimers and revisions)

• Motivation:

  • Data exchange moving into the clouds

• WG4:

  • Standard definition for data collection and mining toward a virtual chemistry of Smart Energy Carriers

• WG4 Task Force:

  • Toward efficient data exchange in the combustion community

Page 14:

WG4 SUMMARY

WG4 can be summarized in one word: DATA

Management of data: How do we keep track of, exchange and manage all the data that is generated by the SMARTCATS community?

Use of data: How can we efficiently use the immense amount of data that the SMARTCATS community generates?

Page 15:

WG4: TITLE

Standard definition

for data collection and mining

toward a virtual chemistry of smart carriers

Page 16:

WG4 CHALLENGE

The main challenge of this WG is to provide a forum for all experts in the combustion community to formulate a common set of requirements for a universal combustion database that is not only capable of efficiently storing the vast amount of raw data generated by experiments and modeling but also, more importantly, efficiently accessible for future use and maintenance.

Page 18:

WG4: AIMS

• Identification of the main requirements and tools for the development of databases, software and mathematical tools for data collection and handling as well as chemistry optimization using data mining techniques.

• Definition of “crucial” experiments and simulations, uncertainty and sensitivity analysis in combustion modeling

Page 19:

DEFINITIONS, REQUIREMENTS AND TOOLS

Page 20:

WG4: INCREASED DIALOG ABOUT DATA

Page 21:

WG4: DATA PERSPECTIVE

Definition of specific sets of prerequisites and goals for the establishment of a combustion database that will allow efficient electronic communication of combustion-related data.

Page 22:

PRESENTATION

• Introduction (with disclaimers and revisions)

• Motivation:

  • Data exchange moving into the clouds

• WG4:

  • Standard definition for data collection and mining toward a virtual chemistry of Smart Energy Carriers

• WG4 Task Force:

  • Toward efficient data exchange in the combustion community

Page 23:

WG4 TASK FORCE

Goal: To use the expertise within the action to promote efficient data exchange among combustion researchers

First task: Cataloging

1. State of the art (in and out of the community)

2. Data within the community

Page 24:

GOALS OF PHASE I: CATALOGING

• Roles and Perspectives:

  • For each role/perspective catalog a prioritised set of requirements, expectations and desires

• Data to Disseminate

  • For each apparatus and tool, outline (in words, mainly) the data that could/should be available, from raw data to final published results.

• Current efforts (inside the Action and outside)

  • Catalog how different groups are storing data

  • Catalog other data handling from other disciplines

  • Projects/proposals/discussions having to do with data

Page 25:

PERSPECTIVES AND ROLES

• User: Interested in using the tools to acquire and use the data.

  • In this role, the user is interested in accessing data in a convenient and efficient way. The user is also interested in what data is available.

• Generator: Generates data, both experimental and theoretical.

  • The first focus is how much, in which detail and in what form the data should/can be disseminated.

  • An important aspect of this is to make this as painless and efficient as possible so as to not generate more burden.

• Software/Database Developer: Developer of the tool.

  • From User: How and in what form the data can be accessed.

  • From Generator: Incorporating their data into whatever system they are developing.

Page 26:

LEVELS OF DATA

• Public: tables or figures within the text of the publication, or as more detailed information in supplementary material

• Preliminary: Data leading up to published data

• Experiment: Data directly from the device, uninterpreted and unedited.

• Intermediate: Data that has been processed, but is basically very device dependent and not necessarily useful to others. In a sense, this is only useful within the research group.

• Collaboration: Data that is useful to exchange among (knowledgeable?) colleagues and collaborators
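The levels above could be encoded so that software can decide how widely a data set may be shared by default. The sketch below is only one possible encoding. The names `DataLevel`, `Breadth`, `DEFAULT_BREADTH` and `may_share` are hypothetical, and the breadth assigned to the Experiment and Preliminary levels is an assumption, since the slide does not state it.

```python
from enum import Enum


class DataLevel(Enum):
    PUBLIC = "public"
    PRELIMINARY = "preliminary"
    EXPERIMENT = "experiment"
    INTERMEDIATE = "intermediate"
    COLLABORATION = "collaboration"


class Breadth(Enum):
    # Ordered from the narrowest to the widest audience.
    WITHIN_GROUP = 1
    COLLABORATORS = 2
    PUBLIC = 3


# Default audience per level. Only the PUBLIC, INTERMEDIATE and COLLABORATION
# entries follow from the level descriptions; the other two are assumptions.
DEFAULT_BREADTH = {
    DataLevel.PUBLIC: Breadth.PUBLIC,
    DataLevel.COLLABORATION: Breadth.COLLABORATORS,
    DataLevel.INTERMEDIATE: Breadth.WITHIN_GROUP,
    DataLevel.EXPERIMENT: Breadth.WITHIN_GROUP,    # assumption
    DataLevel.PRELIMINARY: Breadth.COLLABORATORS,  # assumption
}


def may_share(level, audience):
    """True if data at `level` may be shared with `audience` by default."""
    return audience.value <= DEFAULT_BREADTH[level].value


print(may_share(DataLevel.INTERMEDIATE, Breadth.PUBLIC))
```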

Page 27:

PARTICULAR FOCUS

•Preliminary:

•Data leading up to published data

• Accessibility

• Usefulness

• Characterisation

• Breadth of exchange: Public, collaborators, within group…

Page 28:

CATALOGING

WHAT data is to be cataloged, and its availability, has to be determined first (a major goal of the first phase)

HOW the data is to be stored is of secondary importance

• Catalog with respect to particular devices and models

• Within each:

  • What are the data types and forms

  • Characterisation of the data

  • Quality of the data

  • Usefulness
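A catalog along these lines could start as a flat list of records grouped by device or model. The following Python sketch assumes one free-text field per item in the list above. The class name, field names and sample entries are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    device: str            # apparatus or model the data comes from
    data_type: str         # what kind of data it is
    form: str              # how it currently exists, e.g. "XML" or "spreadsheet"
    characterisation: str  # free-text description of the data
    quality: str           # free-text note on quality
    usefulness: str        # free-text note on who could use it


def group_by_device(entries):
    """Group catalog entries by the device or model they belong to."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.device].append(entry)
    return dict(grouped)


# Hypothetical entries, for illustration only.
entries = [
    CatalogEntry("jet stirred reactor", "species mole fractions", "XML",
                 "mole fraction versus temperature", "published", "model validation"),
    CatalogEntry("jet stirred reactor", "raw detector signal", "instrument file",
                 "uninterpreted output", "unedited", "within group"),
    CatalogEntry("kinetic model", "mechanism", "text file",
                 "reactions and rate parameters", "published", "simulation"),
]
for device, items in group_by_device(entries).items():
    print(device, len(items))
```

Free-text fields keep the burden on data generators low in a first cataloging phase. Controlled vocabularies could replace them once the catalog shows which values actually occur.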

Page 29:

HOW: SECONDARY FOCUS

• De facto standards

  • In moving towards electronic representation, the community is already in the process of establishing standards

• Convenience:

  • Researchers generate data and ‘store’ it in the most convenient form available to them (generation of data is the primary concern).

• Software:

  • As long as the format is ‘consistent’, intelligent software can interpret it and then convert to another ‘standard’ form.

HOW the data is to be stored is of secondary importance
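The point about consistent formats can be sketched in a few lines of Python: if every property is written as `Name: value units`, a small parser can read it and emit XML. The element and attribute names below are illustrative and do not follow any official schema. The sample lines are also illustrative.

```python
import xml.etree.ElementTree as ET


def parse_properties(text):
    """Parse 'Name: value [units]' lines into (name, value, units) tuples."""
    props = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and lines that are not properties
        name, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue  # a name without a value
        props.append((name.strip(), parts[0], " ".join(parts[1:])))
    return props


def to_xml(props):
    """Serialise the parsed properties as a small XML fragment."""
    root = ET.Element("commonProperties")
    for name, value, units in props:
        prop = ET.SubElement(root, "property", name=name.lower(), units=units)
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


sheet = """\
Pressure: 106.7 kPa
Volume: 30 cm3
Residence Time: 2 s
"""
print(to_xml(parse_properties(sheet)))
```

The parser only works because the input is consistent. A line such as `Pressure = 106.7` would be silently skipped here, which is why consistency matters more than the particular format chosen.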

Page 30:

DIFFERENT ‘STANDARDS’: TWO TYPICAL FORMATS

<?xml version="1.0" encoding="utf-8"?>
<experiment>
  <fileAuthor>Chemical Kinetics Laboratory, Institute of Chemistry, ELTE, Budapest, Hungary</fileAuthor>
  <fileVersion>
    <major>1</major>
    <minor>0</minor>
  </fileVersion>
  <ReSpecThVersion>
    <major>1</major>
    <minor>0</minor>
  </ReSpecThVersion>
  <bibliographyLink preferredKey="N. Leplat, P. Dagaut, C. Togbe, J. Vandooren, Combust. Flame 158 (2011) 705-725, Fig. 9, C3H6 not taken"/>
  <apparatus>
    <kind>stirred reactor</kind>
  </apparatus>
  <experimentType>Jet stirred reactor measurement</experimentType>
  <commonProperties>
    <property description="" label="P" name="pressure" units="atm">
      <value>1</value></property>
    <property description="" label="V" name="volume" units="cm3">
      <value>30</value></property>
    <property description="" label="tau" name="residence time" units="s">
      <value>0.07</value></property>
    <property name="initial composition">
      <component><speciesLink preferredKey="C2H5OH" />
        <amount units="mole fraction">0.002</amount></component>
      <component><speciesLink preferredKey="O2" />
        <amount units="mole fraction">0.024</amount></component>
      <component><speciesLink preferredKey="N2" />
        <amount units="mole fraction">0.974</amount></component>
    </property>

Table 1

Experiment Type: Jet stirred reactor measurement
Paper Title: Oxidation of Cyclohexane in a Jet-Stirred Reactor

Common Properties
Pressure: 106.7 kPa
Volume: 30 cm3
Phi: 0.5
Residence Time: 2 s
Fuel inlet mole fraction: 0.0067
Temperature range: 500 - 1100 K

Inlet mole composition
CH3CHO   0.0067   mole fraction
H2       0.0345   mole fraction
N2       0.9      mole fraction

Temperature(K)/Mole Fraction   H2   O2         CO   CO2
500                            0    5.87E-02   0    0
525                            0    5.63E-02   0    0

XML format from ReSpecTh

Spreadsheet: CloudFlame

State of the art: what is in use now….
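To show what reading such an XML file involves, here is a minimal Python sketch using the standard library. The fragment is a shortened version of the excerpt on this slide, with closing tags added so that it is well-formed. It is not a complete ReSpecTh file, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened fragment in the style of the excerpt; the closing tags for
# <commonProperties> and <experiment> were added to make it well-formed.
FRAGMENT = """\
<experiment>
  <experimentType>Jet stirred reactor measurement</experimentType>
  <commonProperties>
    <property label="P" name="pressure" units="atm"><value>1</value></property>
    <property label="V" name="volume" units="cm3"><value>30</value></property>
    <property label="tau" name="residence time" units="s"><value>0.07</value></property>
    <property name="initial composition">
      <component><speciesLink preferredKey="C2H5OH" /><amount units="mole fraction">0.002</amount></component>
      <component><speciesLink preferredKey="O2" /><amount units="mole fraction">0.024</amount></component>
      <component><speciesLink preferredKey="N2" /><amount units="mole fraction">0.974</amount></component>
    </property>
  </commonProperties>
</experiment>
"""


def read_common_properties(xml_text):
    """Return (scalars, composition) read from the <commonProperties> element."""
    root = ET.fromstring(xml_text)
    scalars, composition = {}, {}
    for prop in root.find("commonProperties").findall("property"):
        value = prop.find("value")
        if value is not None:
            # Scalar property: keep the numeric value together with its units.
            scalars[prop.get("name")] = (float(value.text), prop.get("units"))
        for comp in prop.findall("component"):
            key = comp.find("speciesLink").get("preferredKey")
            composition[key] = float(comp.find("amount").text)
    return scalars, composition


scalars, composition = read_common_properties(FRAGMENT)
print(scalars)
print(composition)
```

Because units travel with each value as an attribute, a reader can compare this record with one that reports pressure in kPa, as the spreadsheet example does, without guessing.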

Page 31:

Review the Hierarchical Data Format (HDF5) used by the PrIMe database; the hierarchy enables extension (new groups).

Page 32:

XML DATA REPRESENTATION: WE ARE NOT ALONE

XML is the language of the internet == many tools for its manipulation

Important note: Though understandable for humans, not necessarily convenient to generate: need tools

Gaining ground in scientific computing

Page 33:

SOFTWARE SOLUTION SUPPORTING MANY FORMATS

From a software technical point of view: interchange between formats

Example in computational chemistry
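One common way to support many formats is to give each format a reader and a writer to a shared in-memory record, so that N formats need N readers and N writers rather than a converter for every pair. The Python sketch below illustrates this with two toy formats, JSON and two-column CSV. It is a generic illustration, not the tool shown on the slide, and all names in it are hypothetical.

```python
import csv
import io
import json


def read_json(text):
    return json.loads(text)


def write_json(record):
    return json.dumps(record, sort_keys=True)


def read_csv(text):
    # Two-column "name,value" rows; every value is kept as a string.
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if row}


def write_csv(record):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for key in sorted(record):
        writer.writerow([key, record[key]])
    return buf.getvalue()


# Registry: supporting a new format means adding one reader and one writer.
READERS = {"json": read_json, "csv": read_csv}
WRITERS = {"json": write_json, "csv": write_csv}


def convert(text, source, target):
    """Convert between formats via the common record (a plain dict here)."""
    return WRITERS[target](READERS[source](text))


print(convert('{"pressure": "1", "volume": "30"}', "json", "csv"))
```

A real interchange tool would need a richer common record than a flat dictionary, for example one that carries units and provenance, but the registry structure stays the same.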

Page 34:

DATABASES WITHIN THE ACTION

http://respecth.chem.elte.hu/respecth

http://www.chemicalkinetics.info

http://primekinetics.org/

Page 35:

OUTPUT

Through input from actors in the SMARTCATS action: a white paper on data within the combustion community

We need YOUR input