scientists whoe stare at data - mathworks · scientists who stare at data | dipl.-ing. jan grüner...
TRANSCRIPT
Scientists who stare at data
Dipl.-Ing. Jan Grüner | MathWorks Automotive Conference 2019
About
TU Berlin // FVB
• Chair of Naturalistic Driving Oberservation for Energetic
Optimisation and Accident Avoidance
• Focus on Electromobility
www.fvb.tu-berlin.de
Scientists who stare at data | Dipl.-Ing. Jan Grüner
2
Myself
• Research Assistant
• Matlab since 2009
• Incl. several years of teaching Matlab
• PHD Thesis: “Usage behavior of hybrid vehicles”
Matlab Projects
• IDCB (Create individual driving cycles from user data)
• Database Toolbox (wrapper functions for MySQL Database
Connector)
• AMPERE (Usage behavior of hybrid vehicles)
• WeatherDB (Weather DB / API to make use of data provided
by the DWD)
We are living in the information processing age
We record everything & everywhere
• Cameras
• Cars
• Smart-X (Phones, Homes)
• IoT
Scientists who stare at data | Dipl.-Ing. Jan Grüner
3
We can store huge amounts of data
• Fast storage
• Large storage
We can transmit data from A to B
• Worldwide
• Instantly / Fast
• Everything is connected
We have the computational power to analyze it
• Server
• Cluster
• Databases
• Software
The challenge is no longer getting data, it‘s processing it
Devices, Services,
DataProcessing Result
(and making sense of)
• With endless possibilities
comes complexity
• Complexity creates chaos
• Data formats
• Competing standards
• Methods
• A huge amount of
information
Scientists who stare at data | Dipl.-Ing. Jan Grüner
4
In tech / data we (blindly) trust
Scientists who stare at data | Dipl.-Ing. Jan Grüner
5
The pessimistic view on hardware /
services
• Sensors are cheap, the device isn‘t
• Updates?
• Security?
• Testing?
• Documentation?
• Support?
• May change at any timeConsecutive datastrem
Wrong unit
Question the data and your work (and everyone else)
The situation
• Assumptions
• Documentation
• Human error
Scientists who stare at data | Dipl.-Ing. Jan Grüner
6
Rules of data processing
• Assume the data is broken
• Mistakes will happen
• Re-check your data / existing processes regularly
"
• The RMS value is inserted into a circular list of 401 entries
• No event is fired until this list is initially filled
• TBD should this list be persisted, or rebuilt after each
reboot?
"
• (Variable-)Names will not change over time
• The documentation is complete
Updates everywhere
Bugs everywhere
➔ Can become false over time
➔ Outdated (best case)
➔ nobody is prefect
Workflow (an example)
Why Matlab
• University (Everything is DIY)
• You want it, you build it
• You built it, you run / manage it
• Maintenance / teaching experience
• Easy to learn / to understand (for students)
• Avoid multiple languages
• Complexity
• Knowledge required
• Development
• Easy to visualize data along the way
• Ready to use toolboxes (or create your own)
Scientists who stare at data | Dipl.-Ing. Jan Grüner
7
Import (a simple) (CSV)
• Column names, units, limits
• Create overallmappingarchitecture
Pre-Analysis
• Remove duplicates
• Data format
Challenges• Read header
• Map data
• Process
Import
• Set datatypes
• Store in database
Storage
Scientists who stare at data | Dipl.-Ing. Jan Grüner
8
GetMD5()
MATLAB to MySQL
➔ MySQL Database Connector
Order, names, units (SI)
• Missing columns
• Changing column names
• No fixed column order
• Changing data units
Validation & Fixes
ValidationValidation
LiteratureLiterature
ValuesValues
MapsMaps
DataData
FormulaFormula
Integration / DifferentiationIntegration /
Differentiation
Literature
• Not always correct either
• Contradicting sources
Data
• Requires Formulas
• Documentation?
• Generally known?
• Signal can be estimated with “basic” math
Scientists who stare at data | Dipl.-Ing. Jan Grüner
9
Literature
Scientists who stare at data | Dipl.-Ing. Jan Grüner
10
Data Literature A
Literature B Estimation
And now?
A: M
AT
TH
É,
R., 2
01
2c. O
pe
l A
mp
era
-E
lectr
ic P
rop
uls
ion
an
d E
ne
rgy S
tora
ge
Syste
m,
Ele
ktr
ik / E
lektr
on
ik in
Hybri
d-
un
d E
lektr
ofa
hrz
eu
ge
n u
nd
ele
ktr
isch
es E
ne
rgie
ma
na
ge
me
nt,
Deu
tsch
lan
d
B: G
RE
BE
, U
.D. u
nd
L.T
. N
ITZ
, 2
01
1.
Vo
lte
c–
Das A
ntr
ieb
ssyste
m f
ür
Che
vro
let V
olt u
nd
Op
el A
mp
era
, M
TZ
-M
oto
rte
ch
nis
ch
e Z
eitsch
rift
, 7
2(5
), 3
42
-35
1. IS
SN
00
24
-85
25
Literature
#Formula Literature
(1)𝑅𝑃𝐴 =
1
𝑥𝑎𝑖
+ ∗ 𝑣𝑖(Bratt und Ericsson, 2000, S. 5, The
European Comission, 2016, S. 12)
(2)𝑅𝑃𝐴 =
1
𝑥න𝑣 ∗ 𝑎+ (Ericsson, 2000, S. 11)
(3)
𝑅𝑃𝐴 = ෑ
𝑘 = 1
𝑒𝑛𝑑
𝑣 𝑘 ∗ 𝑎 𝑘 /𝑚𝑒𝑎𝑛 𝑣(Blanco-Rodriguez, Vagnoni und
Holderbaum, 2016, S. 653)
Scientists who stare at data | Dipl.-Ing. Jan Grüner
11
a : accelerationv : velocityx : distance
Relative Positive Acceleration (RPA)
• Descriptor for the dynamics of a
(driving) cycle
• Assumptions make the difference
• Documentation
(1)
BR
AT
T, H
. u
nd
E. E
RIC
SS
ON
, H
g.,
20
00
. M
ea
su
rin
gve
hic
led
rivin
gp
att
ern
s–
estim
atin
gth
ein
flu
en
ce
ofd
iffe
ren
t m
ea
su
rin
gin
terv
als
. L
un
d, S
ch
we
de
n.
(1)
Co
mis
sio
nR
eg
ula
tio
n (
EU
) 2
01
6/6
46
[o
nlin
e].
In
: O
ffic
ial J
ou
rna
l o
fth
eE
uro
pe
an
Un
ion
[Z
ug
riff
am
: 2
3.
Fe
bru
ar
20
19
]. h
ttp
s:/
/eu
r-le
x.e
uro
pa
.eu
/eli/r
eg
/20
16
/64
6/o
j
(2)
ER
ICS
SO
N,
E.,
Hg
., 2
00
0. D
rivin
gp
att
ern
in u
rba
n a
rea
s-
de
scri
ptive
an
aly
sis
an
d in
itia
l p
red
ictio
nm
od
el. L
un
d, S
ch
we
de
n:
Lu
nd
In
stitu
te o
fT
ech
no
log
y. B
ulle
tin
. 1
85
.
(3)
BL
AN
CO
-RO
DR
IGU
EZ
, D
., G
. V
AG
NO
NI u
nd
B.
HO
LD
ER
BA
UM
, 2
01
6.
EU
6 C
-Se
gm
en
t D
iese
l ve
hic
les, a
ch
alle
ng
ing
se
gm
en
tto
me
et
RD
E a
nd
WL
TP
re
qu
ire
me
nts
. IF
AC
-Pa
pe
rsO
nL
ine,
49
(11
), 6
49
-65
6. IS
SN
24
05
89
63
.
Data
Velocity
• Formula
• 𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦𝐸𝑆𝑇 =2∗π∗𝑟𝑤ℎ𝑒𝑒𝑙∗𝑠𝑝𝑒𝑒𝑑𝐸𝑀𝐵
𝑟𝑎𝑡𝑖𝑜𝐴∗𝑟𝑎𝑡𝑖𝑜𝐵
• (expected) Value range
• 0 : 170 km/h
• Conditions
• Validation only possible in a specific electric mode
Scientists who stare at data | Dipl.-Ing. Jan Grüner
12
Data
Acceleration
• Formula
• 𝑎𝑐𝑐𝑒𝑙𝑒𝑟𝑎𝑡𝑖𝑜𝑛𝐸𝑆𝑇 =𝜕𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦
𝜕𝑡𝑖𝑚𝑒
• (expected) Value range
• -10 m/s² : +10 m/s²
Scientists who stare at data | Dipl.-Ing. Jan Grüner
13
Data
Scientists who stare at data | Dipl.-Ing. Jan Grüner
14
(fixed) Data
Steps (so far)
• Look at all data
• RMSE values for identification
• Original vs. Estimate histograms
• Literature (if available)
• Look at individual data
• Plot individual trips
• Correct / Fix
• Your code
• Assumptions
Scientists who stare at data | Dipl.-Ing. Jan Grüner
15
Handling outliers
(monitor) Data
Scientists who stare at data | Dipl.-Ing. Jan Grüner
16
Bugs
• Sensor stopped working correctly for unknown reasons
• Bug reported in January (still not fixed)
Steps
• ?
Lessons learned
know
validatefix
work
• There will be new issues
• Re-check your (raw-)data & assumptions
• Make sure you are not the problem
• Test your code
• Review your code
• Make proper bug reports
• Provide examples (figures if possible!)
• Don’t expect the bugs to get fixed
• Documentation saves time
• Use it whenever possible
• Read it properly
• Create your own (and keep it up to date)
Scientists who stare at data | Dipl.-Ing. Jan Grüner
17
Lessons learned
• Nobody is perfect, neither is software
• Nothing is ever really finished
• Expect things to change / go wrong
• Look at your data and understand what you are looking at
• Know what your data is supposed to look like
• Never stop learning
• Decent hardware is a big help
Unexpected MATLAB lessons (over the years)
• read(datastore) doesn‘t always read the whole file
• webread() may not always return a variable (or produce an
error)
• use wget() instead of mget()
• Different behaviour of the same function in different releases
• MATLAB and Linux work best from a terminal in software
mode (best guess: buggy gfx driver)
• Never stop upgrading to the next release
Scientists who stare at data | Dipl.-Ing. Jan Grüner
18
End
Scientists who stare at data | Dipl.-Ing. Jan Grüner
19