scott pickett – ww informix technical sales

© 2011 IBM Corporation, June 27, 2022. Scott Pickett – WW Informix Technical Sales. For questions about this presentation contact: [email protected]. TimeSeries

Upload: wilson

Post on 13-Jan-2016

Page 1: Scott Pickett – WW Informix Technical Sales

© 2011 IBM Corporation, April 21, 2023

Scott Pickett – WW Informix Technical Sales

For questions about this presentation contact: [email protected]

TimeSeries

Page 2: Scott Pickett – WW Informix Technical Sales

Agenda

Time Series Real Time Loader

Page 3: Scott Pickett – WW Informix Technical Sales

Key Points

Describe two complex extenders:
– TimeSeries
– Real Time Loader

Why time series data can be a problem.

How TimeSeries solves these problems.

Applications.

Why streaming data can be a problem.

How the RealTime Loader solves these problems.

Page 4: Scott Pickett – WW Informix Technical Sales

TimeSeries

TimeSeries:
– What is a time series?
• Regular
• Irregular
– Where are they used?
– What are they used for?
– How are they used?

Page 5: Scott Pickett – WW Informix Technical Sales

Regular TimeSeries Data

Timestamps follow a regularly repeating pattern of intervals (daily, hourly, and so on).

There might be breaks in the pattern:
– A work week is data captured for 5 days in a row, then nothing for 2 days.

Only one piece of data per interval:
– An interval that has not been inserted into has the value NULL.
– Intervals not inserted into still occupy some space.
• Heuristic: intervals not inserted into at the end of the series do not take up space.

Can be thought of as an array.

Optimized to return data by offset, not by timestamp.
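The array view of a regular time series can be sketched in a few lines of Python. This is only an illustration of the idea, not the Informix storage format: the class name `RegularSeries` and its methods are hypothetical, and the calendar is assumed to be a plain fixed interval with no 'off' periods.

```python
from datetime import datetime, timedelta


class RegularSeries:
    """Toy regular time series: one slot per fixed interval, held in a list."""

    def __init__(self, origin: datetime, interval: timedelta):
        self.origin = origin
        self.interval = interval
        self.values = []  # slots never written hold None (NULL)

    def offset_of(self, stamp: datetime) -> int:
        # A timestamp maps to an array offset by plain arithmetic.
        offset = (stamp - self.origin) // self.interval
        if offset < 0:
            raise ValueError("timestamp is before the origin")
        return offset

    def put(self, stamp: datetime, value) -> None:
        i = self.offset_of(stamp)
        # Skipped intervals are padded with None; nothing is kept past the last element.
        if i >= len(self.values):
            self.values.extend([None] * (i + 1 - len(self.values)))
        self.values[i] = value

    def get(self, stamp: datetime):
        i = self.offset_of(stamp)
        return self.values[i] if i < len(self.values) else None


series = RegularSeries(datetime(2006, 1, 1), timedelta(days=1))
series.put(datetime(2006, 1, 1), 12.34)
series.put(datetime(2006, 1, 4), 12.56)
```

After the two `put` calls the list holds four slots, two of them NULL, which mirrors the point that skipped intervals in the middle occupy space while intervals after the last element do not.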

Page 6: Scott Pickett – WW Informix Technical Sales

Irregular Time Series Data

Data in an irregular time series does not have a regularly repeating pattern of intervals:
– Any interval may have zero or more pieces of data.

Missing data takes no space on disk:
– There really is no concept of missing data.

The only efficient way to access data is by timestamp:
– You can retrieve the Nth piece of data, but the code does a linear search.

Data can be stair stepped:
– Value persists until the next value arrives, for example stock prices.

Data can be discrete points:
– Value is valid only at the given time, for example heart beats.
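The difference between stair-stepped and discrete irregular data can be sketched as a lookup over timestamps kept in sorted order. This is a minimal illustration under stated assumptions: `IrregularSeries` and `value_at` are hypothetical names, and a sorted Python list stands in for whatever index the server really uses.

```python
from bisect import bisect_right
from datetime import datetime


class IrregularSeries:
    """Toy irregular time series: parallel lists kept sorted by timestamp."""

    def __init__(self):
        self.stamps = []
        self.values = []

    def put(self, stamp: datetime, value) -> None:
        # Insert in timestamp order so lookups can use binary search.
        i = bisect_right(self.stamps, stamp)
        self.stamps.insert(i, stamp)
        self.values.insert(i, value)

    def value_at(self, stamp: datetime, stair_step: bool = True):
        # Find the latest element at or before the requested time.
        i = bisect_right(self.stamps, stamp) - 1
        if i < 0:
            return None  # nothing recorded yet
        if stair_step or self.stamps[i] == stamp:
            return self.values[i]
        return None  # discrete data is valid only at its exact timestamp
```

With `stair_step=True` a price recorded at 09:30:00 is still the answer at 09:30:00.5; with `stair_step=False` the same query returns nothing, which is the heart-beat case.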

Page 7: Scott Pickett – WW Informix Technical Sales

The TimeSeries "Problem"

Time series are usually stored as "tall and thin" tables with a very large number of rows.

Need fast access to an ordered set of rows.

May need one index to enforce uniqueness and another for index-only reads, so more space can be used for indexes than for data.

Can be difficult to write SQL to work with the data.

Page 8: Scott Pickett – WW Informix Technical Sales

TimeSeries in Traditional RDBMS

Irregular time series of stock "ticks".

Typical volumes are 50M per day and increasing.

Could access by "Stock" and "Timestamp" but often need an index on all columns for index-only reads.

Stock  Timestamp                  Price   Volume
ABC    2006-01-01 09:30:00.00000   12.34     100
XYZ    2006-01-01 09:30:00.01000    9.34    1000
ABC    2006-01-01 09:30:00.19000   12.44      10
KKK    2006-01-01 09:30:01.05000  194.00   10000
ABC    2006-01-01 09:30:01.70000   12.54     250

Storage                     Size
Data                        35 MB
Index on stock + timestamp  28 MB
Index on all columns        43 MB

Page 9: Scott Pickett – WW Informix Technical Sales

TimeSeries in Traditional RDBMS

Can be difficult to write SQL to process tick data:

– Applying stock splits:

select t.tstamp, t.price * s.fac, t.vol / s.fac
from tick t, split s
where t.stock = s.stock
and t.tstamp::datetime year to day = s.tstamp
and t.stock = "ABC"
and t.tstamp::date between '2006-01-01' and '2006-01-05'

– Calculating Volume Weighted Average Price (VWAP):

select stock, tstamp::datetime year to hour, sum(price * vol) / sum(vol)
from tick
where stock = "ABC"
and tstamp::date between '2006-01-01' and '2006-01-05'
group by 1, 2
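For readers who want to see what the VWAP query computes, the same hourly grouping can be sketched in Python. The function name `hourly_vwap` and the sample ticks are illustrative assumptions, not part of the presentation or of any Informix API.

```python
from collections import defaultdict
from datetime import datetime


def hourly_vwap(ticks):
    """ticks: iterable of (timestamp, price, volume). Returns {hour_start: vwap}."""
    notional = defaultdict(float)  # sum(price * vol) per hour
    volume = defaultdict(int)      # sum(vol) per hour
    for stamp, price, vol in ticks:
        # Truncate to the hour, like the cast to "datetime year to hour".
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        notional[hour] += price * vol
        volume[hour] += vol
    return {hour: notional[hour] / volume[hour] for hour in volume if volume[hour]}


# Made-up sample ticks for one stock within a single hour.
sample_ticks = [
    (datetime(2006, 1, 1, 9, 30, 0), 12.34, 100),
    (datetime(2006, 1, 1, 9, 30, 0, 190000), 12.44, 10),
    (datetime(2006, 1, 1, 9, 30, 1, 700000), 12.54, 250),
]
vwap_by_hour = hourly_vwap(sample_ticks)
```

The result is keyed by the start of each hour, which plays the role of the `group by 1, 2` in the SQL for a single stock.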

Page 10: Scott Pickett – WW Informix Technical Sales

TimeSeries Solution (1 of 7)

Store timeseries elements as an ordered set of elements:
– Uses less space because the "key" is factored out and the time can be represented in 4 bytes (regular).
– Access is as fast as (or faster than) an index-only read, with no additional index.
– SQL can be made much simpler.

Page 11: Scott Pickett – WW Informix Technical Sales

TimeSeries Solution (2 of 7)

Store timeseries data as:
– Start time.
– Calendar information.
– Ordered set of elements.

Stock  Ticks
ABC    2006-01-01, daily, {(12.34,1234567), (12.56,950000), (12.34,5555567),..}
XYZ    2003-09-01, daily, {(9.34,890567), (9.56,989000), (9.40,1000780),..}
KKK    2006-05-05, daily, {(199.08,678900), (198.55,345000), (198.12,850250),..}

Page 12: Scott Pickett – WW Informix Technical Sales

TimeSeries Solution (3 of 7)

How a timeseries is stored:
– Start time and calendar name.
– Index (btree) of the first element in each page.
– Each page contains all the elements between 2 timestamps.

Storage                     Size
Data                        35 MB
Index on stock + timestamp  28 MB
Index on all columns        43 MB
Timeseries                  30 MB

– Traditional methods use 2 to 3 times more space, or even more depending on the number of NULL values.

Page 13: Scott Pickett – WW Informix Technical Sales

TimeSeries Solution (4 of 7)

Performance:
– Slower for singleton selects.
– Faster accessing sets of data.
– Much faster combining timeseries.

[Table: relative timings for Traditional (1), Traditional (2), and Timeseries. Column labels as extracted: Singleton, 1,000 consec., f(t1, t2), 1,000 consec., Aggregate 80,000. Cell values as extracted, with row and column assignment lost: 1000, 3, 1, 400, 2, 1, 2, .5, 1, 1, 650, 5.]

– Traditional (1) = index on key and all columns.
– Traditional (2) = index on key columns.

Page 14: Scott Pickett – WW Informix Technical Sales

TimeSeries Solution (5 of 7)

Much simpler SQL. Consider the "stock split"/calibration problem.

Traditional:

select t.tstamp, t.price * s.fac, t.vol / s.fac
from tick t, split s
where t.stock = s.stock
and t.tstamp::datetime year to day = s.tstamp
and t.stock = "ABC"
and t.tstamp::date between '2006-01-01' and '2006-01-05'

With TimeSeries:

select func(ticks, splits, '2006-01-01', '2006-01-05')
from tick
where stock = "ABC"

Page 15: Scott Pickett – WW Informix Technical Sales

TimeSeries Solution (6 of 7)

Much simpler SQL: the VWAP problem.

With TimeSeries:

select stock, vwap(ticks, '2006-01-01', '2006-01-05', '01:00:00')
from tick
where stock = "ABC"

Traditional:

select stock, tstamp::datetime year to hour, sum(price * vol) / sum(vol)
from tick
where stock = "ABC"
and tstamp::date between '2006-01-01' and '2006-01-05'
group by 1, 2

Page 16: Scott Pickett – WW Informix Technical Sales

TimeSeries Solution (7 of 7)

Other Timeseries features:
– Calendar support.
– VTI to make time series look like traditional tables.
– C API.
– Java Class Library.

Easy to combine with your own or 3rd party code.

Page 17: Scott Pickett – WW Informix Technical Sales

TimeSeries Practicalities (1 of 6)

Calendar Patterns – stored in the CalendarPatterns table:
– All time series are associated with these.
– Used to bound the times at which timeseries data is recorded:
• Data recorded during 'on' times.
• Data not recorded during 'off' times.
• Data recorded for the interval specified, within the bounds of the calendar definition.

A Monday to Friday 5 day work week can be represented as 1 day off (Sunday), 5 days on, 1 day off (Saturday):
– insert into CalendarPatterns values ('Five_Day_Work_Week', '{1 off, 5 on, 1 off}, day');
– Note the braces '{ }' above.
– The trailing day is the interval: a single unit of time.
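How an on/off pattern turns a date into an element offset can be sketched as follows. This is a hedged illustration rather than the server's algorithm: `PATTERN`, `expand`, and `offset_of` are hypothetical names, and the series origin is assumed to coincide with the pattern start, which must fall on a Sunday for this pattern.

```python
from datetime import date

# Runs of (count, is_on) for the pattern '{1 off, 5 on, 1 off}, day'.
PATTERN = [(1, False), (5, True), (1, False)]


def expand(pattern):
    """Flatten the runs into one on/off flag per interval of the pattern."""
    return [is_on for count, is_on in pattern for _ in range(count)]


def offset_of(day: date, pattern_start: date, pattern=PATTERN):
    """Return the element offset for `day`, or None when the calendar is off."""
    flags = expand(pattern)
    days = (day - pattern_start).days
    if days < 0:
        raise ValueError("day is before the pattern start")
    full_cycles, position = divmod(days, len(flags))
    if not flags[position]:
        return None  # 'off' interval: no slot exists for this day
    # 'On' intervals in completed cycles, plus those earlier in this cycle.
    return full_cycles * sum(flags) + sum(flags[:position])


start = date(2010, 1, 3)  # a Sunday
```

With that start date, Monday 4 January maps to offset 0, Friday 8 January to offset 4, and Monday 11 January to offset 5, while the weekend days map to no slot at all.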

Page 18: Scott Pickett – WW Informix Technical Sales

TimeSeries Practicalities (2 of 6)

How do I create a data type that defines each element in a time series?

– Must first create a row type whose first column is the timestamp, which must be a datetime year to fraction(5) value:

create row type stockVals (
  tstamp datetime year to fraction(5),
  price decimal(16,2),
  high decimal(16,2),
  low decimal(16,2),
  median decimal(16,2),
  volume integer,
  trades integer,
  …..
);

– Each timestamp must be unique.
– Row types cannot contain these types: blob, clob, text, serial.

Page 19: Scott Pickett – WW Informix Technical Sales

TimeSeries Practicalities (3 of 6)

Once a calendar pattern is established, create the calendar that uses it in the CalendarTable table:

insert into CalendarTable (c_name, c_calendar)
values ('Trading Week', 'startdate(2010-01-01 00:00:00.00000), pattstart(2010-01-07 00:00:00.00000), pattname(Five_Day_Work_Week)');

Regular timeseries calendars are used to guide the mapping between the timestamp and the offset.

– Represented by the Calendar data type, consisting of:
• Start date of the calendar.
• Calendar pattern: the repeating pattern of valid and invalid recording times over the life of the calendar.
• Interval: the calibration of the calendar pattern.
• Starting date of the calendar pattern, which:
– Must be >= the calendar start date, and
– Must be less than one calendar pattern length after the calendar start.

Page 20: Scott Pickett – WW Informix Technical Sales

TimeSeries Practicalities (4 of 6)

Containers are required data structures that hold data for one or more time series.

You cannot mix data for regular and irregular time series in the same container.

How do I create a container to store a time series?
– execute procedure TSContainerCreate('ts_ctrname', 'dbspace1', 'stockVals', 40000, 40000);

A container is made up of index pages and data pages.

Containers allow time series data to be spread across many disk partitions.

Page 21: Scott Pickett – WW Informix Technical Sales

TimeSeries Practicalities (5 of 6)

How do I create a table that includes a time series?

create table dailyStocks (
  stockId integer,
  abbrev char(4),
  stockData timeseries(stockVals));

How do I create a time series record?

insert into dailyStocks values (901, 'IBM', TSCreate('Trading Week', '2010-01-03 00:00:00.00000', 20, 0, 0, 'stockCon'));

Page 22: Scott Pickett – WW Informix Technical Sales

TimeSeries Practicalities (6 of 6)

How do I add data to a time series?

update dailyStocks
set stockData = PutElem(stockData, row(NULL::datetime year to fraction(5), 3.3, 4.4, 2.2, 3.0, 123456, 789, NULL)::stockVals)
where abbrev = 'XYZ';

How do I query a row with a time series?

select clip(stockData, current - interval(1) month to month, NULL) from dailyStocks;

Get all IBM data since the beginning of 2001:

select clip(series, '2001-01-01 00:00:00.00000', current) from S_P_500 where name = 'IBM';

Page 23: Scott Pickett – WW Informix Technical Sales

Building Applications with TimeSeries

Several interfaces are available:
– SQL
– VTI
– SPL
– Java
– C-API

It’s a toolkit approach! Allow people to build their own analytics.

Page 24: Scott Pickett – WW Informix Technical Sales

TimeSeries SQL Interface

TimeSeries data is usually accessed through user-defined routines (UDRs) from SQL. Some of these are:
– Clip(): clip a range of a time series and return it.
– GetLastElem(), GetFirstElem(): return the last (first) element in the time series.
– Apply(): apply a predicate and expression to a range of elements in a time series.
– AggregateBy(): change the interval of a time series using an aggregate function.
– SetContainerName(): move a time series from one container to another.
– BulkLoad(): load data into a time series from a file.

See Appendix A for a list of all SQL interface UDRs.

Page 25: Scott Pickett – WW Informix Technical Sales

TimeSeries VTI Interface

Makes time series data look like standard relational data:
– Useful for programs that can’t handle objects.
– Useful when an application connects with ODBC.

There is a small (10%) penalty for using VTI.

Restrictions:
– A VTI table can only reference one time series column from the base table.
– No secondary indexes are allowed.

SQL to create a VTI table:
– execute procedure tscreatevirtualtab('t_vti', 't');

Page 26: Scott Pickett – WW Informix Technical Sales

TimeSeries C-API Interface

Client and server versions of the API.

Treats a time series like a table (sort of).

Functions to:
– Open and close a time series.
– Scan a time series between 2 timestamps.
– Create a time series.
– Retrieve, insert, delete, update.

Plus another 70 functions defined. See Appendix C for a list of these.

Page 27: Scott Pickett – WW Informix Technical Sales

Stored Procedure (SPL) Example

-- count non-null elements in a time series
create function spl_test(arg lvarchar) returning integer
  define var daybar;
  define cnt integer;
  let cnt = 0;
  foreach execute function transpose((select series from S_P_500 where name = arg)) into var
    let cnt = cnt + 1;
  end foreach
  return cnt;
end function;

Page 28: Scott Pickett – WW Informix Technical Sales

TimeSeries Application

Existing applications:
– Finance (Capital Markets).
– Some in Manufacturing and Environmental.
– Smart Metering / Electrical Grid Usage.

Possible applications:
– Telco:
• Real Time Cell Phone Tower Data.
• Sales/sales analysis.
– Energy:
• Oil Field Geology data.
– Water/Sewer System Usage.
– Satellite Telemetry.
– Assembly Line Q/A & Analysis.
– Your application?

Page 29: Scott Pickett – WW Informix Technical Sales

Real Time Loader

Page 30: Scott Pickett – WW Informix Technical Sales

RealTime Loader (1)

Next problem after the time series "problem" is solved.

Large number of ticks generated at market open/close and in reaction to events.

Money is made at these times.

Need to make the data available as soon as possible.

[Chart: ticks per minute (0 to 60,000) against time (hours)]

Edison Stock Ticker - 1892

Page 31: Scott Pickett – WW Informix Technical Sales

RealTime Loader (2)

Traditionally data can be loaded very quickly by loading it in batches. This causes problems:
– It takes time for the data to become available.
– The table can be locked while the data is loaded.

The RealTime Loader solves this problem by holding the data in memory and making it available to the SQL layer as if it were already in the database:
– Allows real-time analysis of data as it comes through the feeds.
• Uncommitted and committed data reads are possible.

Page 32: Scott Pickett – WW Informix Technical Sales

RealTime Loader Architecture

[Diagram components: Data Feeds, RTL (two instances), RTL Memory, Database Memory, Database Server, Client]

Page 33: Scott Pickett – WW Informix Technical Sales

RealTime Loader Performance (1 of 3)

Load performance of traditional systems depends on the system. Rates of 1,000s of ticks per second are reasonable, but latency is high because the data is batched and then loaded.

With the RTL the data is made accessible within a few ms.

There are 2 load rates to consider:
– How fast the data can be added to shared memory.
– How fast the data can be loaded into the database.

Page 34: Scott Pickett – WW Informix Technical Sales

RealTime Loader Performance (2 of 3)

How fast can the data be added to shared memory?
– Rates of 10,000s of ticks per CPU second are common.

How fast can the data be loaded into the database?
– This is roughly the same as the traditional RDBMS solution.

The system can run at the "top" rate until the shared memory allocated to RTL is full.

Page 35: Scott Pickett – WW Informix Technical Sales

RealTime Loader Performance (3 of 3)

The peaks don’t last very long, so the system can be sized for something just above the average rate.

Traditional systems have to be sized to handle the peak rate, or the user has to accept a high latency during the crucial market open/close periods.
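The sizing argument can be made concrete with a small backlog simulation. It is only a sketch: `max_backlog` is a hypothetical helper and the arrival figures are invented for illustration, not measurements from the presentation.

```python
def max_backlog(arrivals_per_minute, drain_per_minute):
    """Largest number of ticks waiting in the buffer for a fixed drain rate."""
    backlog = 0
    peak = 0
    for arrived in arrivals_per_minute:
        # Ticks that cannot be written to the database this minute stay buffered.
        backlog = max(0, backlog + arrived - drain_per_minute)
        peak = max(peak, backlog)
    return peak


# Invented burst at market open followed by a quiet period (ticks per minute).
arrivals = [60_000, 50_000, 20_000, 10_000, 10_000, 10_000, 10_000, 10_000]
average_rate = sum(arrivals) / len(arrivals)

buffer_needed = max_backlog(arrivals, 25_000)  # sized just above the average
no_buffer_needed = max_backlog(arrivals, 60_000)  # sized for the peak
```

In this made-up trace a drain rate slightly above the average works as long as the shared memory can hold the largest backlog, whereas draining at the peak rate needs no buffer but far more database capacity.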

[Chart: ticks per minute (0 to 60,000) against time (hours)]

Page 36: Scott Pickett – WW Informix Technical Sales

RTL Data Access

Access can be through the usual SQL layer.

Even faster access through an API that gets the data directly from the shared memory segment.

The SQL layer can see all the ticks, or just those in RTL, or just those in the database.

[Diagram: Ticks in RTL Shared Memory; Ticks in Database]

Page 37: Scott Pickett – WW Informix Technical Sales

RealTime Loader Analytics

The RealTime Loader can do some simple aggregation and statistics on the incoming data, making the results available to SQL or through the API:

– VWAP: Volume Weighted Average Price
• = total amount spent on a security / total number of shares traded.
• This needs all of the share purchase history, stored unbounded by time.

– EWAP: Exponentially Weighted Moving Average Price
• A share price moving average in which the previous moving average is still considered, but is damped by the newer, more recent data.
• Only needs 3 pieces of data:
– the old and new MAP data.
– plus the damping factor.

– max, min etc.
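The reason an exponentially weighted average needs so little state can be shown with the standard update rule. This sketch assumes the usual textbook formulation, new = damping * price + (1 - damping) * old; the presentation does not give the exact formula the RealTime Loader uses, and the function names are hypothetical.

```python
def ewma_update(old_average: float, new_price: float, damping: float) -> float:
    """One update step; damping in (0, 1] is the weight given to the newest price."""
    if not 0.0 < damping <= 1.0:
        raise ValueError("damping must be in (0, 1]")
    return damping * new_price + (1.0 - damping) * old_average


def ewma(prices, damping: float) -> float:
    """Fold a price stream into a single average without storing the history."""
    average = prices[0]  # seed with the first observation
    for price in prices[1:]:
        average = ewma_update(average, price, damping)
    return average
```

Each step uses only the previous average, the incoming price, and the damping factor, in contrast to VWAP, where the running totals must cover the whole purchase history.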

Page 38: Scott Pickett – WW Informix Technical Sales

RealTime Loader Applications

Stock ticker trades data analysis.

Gaming.

Telco:
– Cell Phone Tower Call Analysis Apps.
– Cell Phone Account Apps.

24 x 7 Satellite Telemetry Data Applications:
– Millions of records per minute .......

Geological Analysis Applications:
– Oil Exploration Data.

Land Resource Analysis Applications.

Water/Sewer Flow Data.

Page 39: Scott Pickett – WW Informix Technical Sales

Conclusion

TimeSeries allows time series data to be:
– Efficiently stored.
– Quickly accessed.
– Conveniently queried.

The RealTime Loader allows time series data to be:
– Quickly stored.
– Accessed by applications with the minimum of delay.

Appendix D has more information on a recent successful POC and customer win; it is worth a read.

Page 40: Scott Pickett – WW Informix Technical Sales

Questions?

Page 41: Scott Pickett – WW Informix Technical Sales

Time Series Improvements 11.70

Page 42: Scott Pickett – WW Informix Technical Sales

Time Series – Delete Performance Improvements

Delete Performance Improvements:
– When a user deletes rows from a table that contains a TimeSeries, the TimeSeries needs to be deleted. Since a TimeSeries can be very large, improved delete performance is important for providing a good response time to users. The improvement grows as more records are deleted:
• < 30,000 --> 2X faster
• 100,000 --> 3X
• 1,000,000 --> 18X
• 3,000,000 --> 101X

Page 43: Scott Pickett – WW Informix Technical Sales

Time Series – Container Improvements

Container Improvements:
– A TimeSeries is stored in a container. We removed the requirement of having to identify or even create containers when inserting a new TimeSeries into a table. This makes using TimeSeries more user-friendly.
– The container name limit was raised from 18 characters to 128 characters to allow customers to be more descriptive in the naming of their containers.
– Container statistics functions were added so that System Administrators can get a better view of their space utilization and therefore can better plan for future upgrades.

Page 44: Scott Pickett – WW Informix Technical Sales

Container Pools

New column (pool) in the TSContainerTable table.

autopool: default pool of containers created by auto-create container.

Create containers (as before):
TSContainerCreate(<container_name>, <dbspace>, <initial_size_kb>, <next_size_kb>);

Add a container to a pool:
TSContainerSetPool(<container_name>, <pool_name>);
– Pool created automatically.
– A container can be in only one pool at a time.

Remove a container from its current pool:
TSContainerSetPool(<container_name>);

Page 45: Scott Pickett – WW Informix Technical Sales

Container Selection

Search autopool for a set of candidate containers.

Select the container with the greatest amount of free space.

Page 46: Scott Pickett – WW Informix Technical Sales

Pool Policy: TSContainerPoolRoundRobin

Select a container from a pool in a round robin manner and return the name.

Example*:

INSERT INTO smartmeters(meter_id, rawreadings)
VALUES ('met00001', 'origin(2006-01-01 00:00:00.00000), calendar(smartmeter), regular, threshold(0), container(TSContainerPoolRoundRobin(readings)), [(33070,-13.00,100.00,9.98e+34), (19347,-4.00,100.00,1.007e+35), (17782,-18.00,100.00,9.83e+34)]');

Position kept in memory.

* Note: the call to TSContainerPoolRoundRobin is a string.
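The round-robin policy itself is simple enough to sketch outside the database. The class below is a hypothetical Python analogue, not the TSContainerPoolRoundRobin implementation; the container names are invented, and the in-memory iterator stands in for the "position kept in memory" remark.

```python
from itertools import cycle


class RoundRobinPool:
    """Hands out container names from a pool in strict rotation."""

    def __init__(self, containers):
        names = list(containers)
        if not names:
            raise ValueError("pool is empty")
        # The rotation position lives only in memory, so it restarts with the process.
        self._rotation = cycle(names)

    def pick(self) -> str:
        """Return the name of the next container in the rotation."""
        return next(self._rotation)


pool = RoundRobinPool(["readings_c1", "readings_c2", "readings_c3"])
first_four = [pool.pick() for _ in range(4)]
```

Successive inserts would then land in the first, second, and third container in turn before wrapping around to the first again, which spreads new time series evenly across the pool.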

Page 47: Scott Pickett – WW Informix Technical Sales

Pool Policy: TSContainerPoolRoundRobin

function TSContainerPoolRoundRobin(
  tabname lvarchar,
  colname lvarchar,
  subtype lvarchar,
  irregular integer,
  poolname lvarchar)
returns lvarchar;

tabname: Table name where the TS value is being inserted.
colname: TS column name where a TS value is being inserted.
subtype: Subtype of the TS value (row type) being inserted.
irregular: Kind of TS (0 – regular, 1 – irregular, integer).
poolname: Name of the container pool to use.

Page 48: Scott Pickett – WW Informix Technical Sales

Pool Policy: Create your own

function <pool_policy_name>(
  tabname lvarchar,
  colname lvarchar,
  subtype lvarchar,
  irregular integer
  [, user_data lvarchar])
returns lvarchar;

Tailor the container selection policy to fit your environment.

Page 49: Scott Pickett – WW Informix Technical Sales

Time Series – New Container Functions (1)

All functions take a container name or NULL:
– If NULL, cover all containers.

TSContainerTotalPages
– Total number of pages allocated to a container.

TSContainerTotalUsed
– Number of pages containing TimeSeries data.

TSContainerNElems
– Total number of elements stored in a container.

TSContainerPctUsed
– (used * 100) / total

TSContainerUsage
– Returns used, nelems, and total.

Page 50: Scott Pickett – WW Informix Technical Sales

Time Series – New Container Functions (2)

EXECUTE FUNCTION TSContainerTotalUsed(NULL);

pages 2029

EXECUTE FUNCTION TSContainerTotalPages(NULL);

total 2169

EXECUTE FUNCTION TSContainerNElems(NULL);

elements 241907

EXECUTE FUNCTION TSContainerPctUsed(NULL);

percent 93.545

EXECUTE FUNCTION TSContainerUsage(NULL);

pages  slots   total
2029   241907  2169
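The sample outputs are consistent with the (used * 100) / total definition of TSContainerPctUsed, which can be checked with a few lines of Python. The helper name `pct_used` is hypothetical; only the page counts come from the slide.

```python
def pct_used(used_pages: int, total_pages: int) -> float:
    """Percentage of allocated pages that hold TimeSeries data."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return used_pages * 100 / total_pages


# Page counts taken from the sample output: 2029 used out of 2169 allocated.
percent = round(pct_used(2029, 2169), 3)
```

Rounded to three decimal places this gives 93.545, matching the value reported by the sample TSContainerPctUsed call.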

Page 51: Scott Pickett – WW Informix Technical Sales

Time Series – VTI Improvements

Virtual Table Interface (VTI) Improvements:
– The VTI view of a table containing a TimeSeries now gives the user more complete information on the column’s characteristics:
• Allow nulls in non-primary keys.
• More than one “view” can be created on a TimeSeries.
• Expression pushdown (keyword substitution).

A VTI view can include pre-processing with TimeSeries functions, giving users exactly the view they want in a relational format with the performance of the TimeSeries API functions.
– Examples:
• Creating an hourly moving average.
• Getting the max value per day.

Page 52: Scott Pickett – WW Informix Technical Sales

TS VTI Improvements

Make the NOT NULL constraint on the base table visible through VTI.

Use only the primary key columns to locate the appropriate row to update:
– Benefit: Allow NULLs in non-primary key columns.
– If the primary key of the base table is altered, any TS VTI tables based on that base table will need to be recreated.
– If there is no primary key, use all non-TS columns.

New TSVTMode flags:
– TS_VTI_UPDATE_NONKEY_NOT_NULLS (32)
• Only the non-null updatable columns of the insert row will be added to the set clause.
– TS_VTI_UPDATE_NONKEY_INCLUDE_NULLS (64)
• All the updatable columns of the insert row will be added to the set clause.

Page 53: Scott Pickett – WW Informix Technical Sales

VTI Expression Pushdown

Provide pre-processed views of TimeSeries

Example: “Create an hourly moving average and get the max value per day” (Table: smartmeters, column: readings)

EXECUTE PROCEDURE TSCreateExpressionVirtualTab(
  'smartmeters_expr1_vti',
  'smartmeters',
  "AggregateBy('max($value)', 'smartmeter_daily', Apply('TSRunningAvg($energy, 4)', $ts_begin_time, $ts_end_time, readings)::TimeSeries(onereal_row), 0)",
  'onereal_row');

Page 54: Scott Pickett – WW Informix Technical Sales

VTI Expression Pushdown: Result

SELECT * FROM smartmeters_expr1_vti
WHERE t >= '2011-01-01 00:00:00.00000'::datetime year to fraction(5)
AND t <= '2011-01-02 23:59:59.99999'::datetime year to fraction(5);

meter_id  t                          value
met00000  2011-01-01 00:00:00.00000  30.25000000000
met00000  2011-01-02 00:00:00.00000  29.50000000000
met00001  2011-01-01 00:00:00.00000  29.75000000000
met00001  2011-01-02 00:00:00.00000  31.00000000000
. . .

Page 55: Scott Pickett – WW Informix Technical Sales

VTI: TSCreateExpressionVirtualTab

More than one “view” can be created on a TimeSeries.

Very similar to TSCreateVirtualTab but a read-only interface.

TSCreateExpressionVirtualTab(
  VirtualTable VARCHAR(128,1),
  BaseTablename VARCHAR(128,1),
  expression LVARCHAR,
  subtype VARCHAR(128,1),
  TSVTMode INTEGER DEFAULT 0,
  TSColName VARCHAR(128,1) DEFAULT NULL);

Page 56: Scott Pickett – WW Informix Technical Sales

VTI Expression Pushdown

Keyword substitution:
– $ts_begin_time, $ts_end_time

Used to substitute the time from the runtime query into the expression view.

Example expression:

Apply('TSRunningAvg($energy, 4)', $ts_begin_time, $ts_end_time, readings)::TimeSeries(onereal_row)

Query including:

WHERE t >= '2011-01-01 00:00:00.00000'::datetime year to fraction(5)
AND t <= '2011-01-02 23:59:59.99999'::datetime year to fraction(5);

Page 57: Scott Pickett – WW Informix Technical Sales

Time Series – All IDS Platform Support

Increased Platform Support:
– Customers now have more choices when it comes to platform support to run Informix with TimeSeries:
• Red Hat Linux 32 and 64 bit
• Mac OS 10
• Linux on pSeries
• Sol x86_64
– TimeSeries is now built in for all platforms for IDS 11.70xC3:
• https://www-304.ibm.com/support/docview.wss?rs=630&uid=swg27013343&S_CMP=rnav

Page 58: Scott Pickett – WW Informix Technical Sales

Time Series – New Standard Calendars

General Usage Improvements:
– Customers don't need to define commonly used calendars. They can now use new, built-in, predefined calendars based on the following intervals of time:

Calendar Interval  Calendar Name
1 minute           ts_1min
15 minutes         ts_15min
30 minutes         ts_30min
1 hour             ts_1hour
1 day              ts_1day
1 week             ts_1week
1 month            ts_1month

Page 59: Scott Pickett – WW Informix Technical Sales

Time Series - XML Generation (1)

Customers can now use a standard function format to exchange messages using XML. With this capability, they can easily transform TimeSeries data into XML messages.

Syntax:

CREATE FUNCTION TSToXML(
  doctype lvarchar,
  id lvarchar,
  ts TimeSeries,
  output_max integer DEFAULT 0)
RETURNS lvarchar;

Page 60: Scott Pickett – WW Informix Technical Sales

Time Series - XML Generation (2)

doctype: Name of the topmost XML element.
id: Identification for the TimeSeries.
ts: TimeSeries to process.
output_max: Max output size. The default of 0 represents 32KB; minimum of 4096, maximum of 2GB.

Output additional element: <AllData>
<AllData>1</AllData>: All data was returned.
<AllData>0</AllData>: The TS data was truncated to fit the output max.

Still a valid XML document.

Page 61: Scott Pickett – WW Informix Technical Sales

XML Generation Example

SELECT TSToXML('meterdata', esi_id,
  Clip(actual,
    '2010-09-08 12:00:00'::datetime year to second,
    '2010-09-08 13:00:00'::datetime year to second))
FROM meters
WHERE esi_id = '22505613346851835';

<meterdata>
  <id>2250561334</id>
  <AllData>1</AllData>
  <meter_data>
    <tstamp>2010-09-08 12:15:00.00000</tstamp>
    <value>0.9170000000</value>
  </meter_data>
  <meter_data>
  . . .
</meterdata>

Page 62: Scott Pickett – WW Informix Technical Sales

Time Series (7)

The ‘stores demo’ database has been updated with smart meter information, so customers now have a ready-to-use environment to experiment with TimeSeries and increase their understanding of the technology.

Multi-Cultural Support:
– Time Series now supports multiple locales.

AggregateBy accumulator too small:
– Accumulators for smaller integer types have been increased for the AVG function to prevent overflow.
– This is a bug fix.

Page 63: Scott Pickett – WW Informix Technical Sales

Questions?

Page 64: Scott Pickett – WW Informix Technical Sales

Logo

Page 65: Scott Pickett – WW Informix Technical Sales

Appendix A - Time Series Built In Routines

Page 66: Scott Pickett – WW Informix Technical Sales

Appendix – TimeSeries Built In Routines (1 of 9)

Task type: Get info from a timeseries.
– GetOrigin: Get the origin.
– GetInterval: Get the interval.
– GetCalendar: Get the calendar.
– GetCalendarName: Get the calendar name.
– GetContainerName: Get the container name.
– GetMetaData: Get the user defined metadata.
– GetMetaTypeName: Get the metadata type.
– IsRegular: Determine whether a time series is regular.
– InstanceId: Get the instance Id if the timeseries is stored in a container.

Task type: Convert between a timestamp and an offset.
– GetIndex: Return the offset, given the timestamp.
– GetStamp: Return the timestamp, given the offset.

Task type: Count the number of elements.
– GetNelems: Return the number of elements.
– ClipGetCount: Get the number of elements between two timestamps.

Page 67: Scott Pickett – WW Informix Technical Sales

Appendix - Time Series Built in Routines (2 of 9)

Task type: Select individual elements.
– GetElem: Get the element associated with a timestamp.
– GetLastValid: Get the element at or before a timestamp.
– GetNextValid: Get the element after a timestamp.
– GetPreviousValid: Get the element before a timestamp.
– GetNthvalid: Get the element at a specified position.
– GetFirstElem: Get the first element.
– GetLastElem: Get the last element.
– GetLastNonNull: Get the last non null element.
– GetNextNonNull: Get the next non null element.

Page 68: Scott Pickett – WW Informix Technical Sales

Appendix - Time Series Built in Routines (3 of 9)

Task type: Modify elements or a set of elements.
– PutElem: Add or update a single element.
– PutElemNoDups: Add or update a single element.
– PutNthElem: Add or update a single element at a given offset.
– PutSet: Add or update an entire set.
– DelElem: Delete an element at a given timepoint.
– DelClip: Delete all elements in a specified time range.
– DelTrim: Delete all elements in a specified time range.
– InsElem: Insert an element.
– InsSet: Insert a set.
– UpdElem: Update an element.
– UpdSet: Update a set.
– PutTimeSeries: Put every element of one time series into another time series.

Task type: Modify metadata.
– UpdMetadata: Update user-defined metadata.

Page 69: Scott Pickett – WW Informix Technical Sales

Appendix - Time Series Built in Routines (4 of 9) Make Elements visible or invisible to a scan.

Make an element invisible. HideElem

Make a range of elements invisible. HideRange

Make an element visible. RevealElem

Make a range of elements visible. RevealRange

Check for null or hidden elements.

Determine if an element is hidden. ElemIsHidden

Determine if an element is null. ElemIsNull

Extract and use part of a time series.

Extract a period between two timestamps to a set of values and run an expression or function on every entry.

Apply

Extract data between two timepoints. Clip

Clip a certain number of elements. ClipCount

Extract a period that includes a given time. WithinC

Extract a period starting or ending at a given time.

WithinR

Apply a new calendar to a time series.

Apply a calendar. ApplyCalendar

Page 70: Scott Pickett – WW Informix Technical Sales

Appendix - Time Series Built in Routines (5 of 9)

Create and load time series.

Load data from a client file. BulkLoad

Create a regular empty time series, a regular populated time series, or a regular time series with metadata. TSCreate

Create an irregular empty time series, an irregular populated time series, or an irregular time series with metadata. TSCreateIrr

Find the intersection or union of time series.

Build the intersection of multiple time series and optionally clip the result.

Intersect

Build the union of multiple time series and optionally clip the result.

Union

Iterator functions. Convert time series data to tabular form. Transpose

Aggregate Functions. Return a list (collection of rows) containing all elements in a time series.

TSSetToList

Return a list of columns in a timeseries. TSColNameToList

Return a list of columns in a timeseries. TSColNumToList

Return a list containing the columns of a time series plus non time series columns.

TSRowToList

Return a list containing the columns of a time series plus non time series columns.

TsRowNameToList

Return a list containing the columns of a time series plus non time series columns.

TsRowNumToList

Page 71: Scott Pickett – WW Informix Technical Sales

Appendix - Time Series Built in Routines (6 of 9) Used within the apply function to perform statistical calculations on a time series.

Perform a sum over a time series type. Sum

Sum SMALLFLOAT or DOUBLE PRECISION values.

TSAddPrevious

Compute the decay function. TSDecay

Compute a running average over a specified number of values.

TSRunningAvg

Compute a running correlation between two time series over a specified number of values.

TsRunningCor

Compute a running median over a specified number of values.

TsRunningMed

Compute a running sum over a specified number of values.

TSRunningSum

Compute a running variance over a specified number of values.

TSRunningVar

Compare SMALLFLOAT or DOUBLE PRECISION values.

TSCmp

Return a previously saved value. TSPrevious
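The running calculations listed above can be illustrated outside the database. The following is a minimal Python sketch of a running average over a specified number of values; the function name `running_avg` and the handling of the first few elements (averaging whatever values are available so far) are assumptions for illustration, not the documented behavior of TSRunningAvg.

```python
from collections import deque

def running_avg(values, window):
    """Average of the most recent `window` values at each position.

    Assumption: before `window` values have been seen, the average is taken
    over the values available so far.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` values
    result = []
    for v in values:
        recent.append(v)
        result.append(sum(recent) / len(recent))
    return result

# Hypothetical meter readings, window of 2 values.
smoothed = running_avg([1, 2, 3, 4], 2)
```

A running sum, median, or variance follows the same pattern with a different reduction over the window.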

Page 72: Scott Pickett – WW Informix Technical Sales

Appendix - Time Series Built in Routines (7 of 9)

Perform an arithmetic operation on one or two time series.

Add two time series together. Plus

Subtract one time series from another. Minus

Multiply one time series by another. Times

Divide one time series by another. Divide

Raise the first argument to the power of the second.

Pow

Get the absolute value. Abs

Exponentiate the time series. Exp

Get the natural logarithm of a time series. Logn

Get the modulus or remainder of a division of one time series by another.

Mod

Negate the time series; bound to the unary - operator.

Negate

Return the argument; bound to the unary + operator.

Positive

Page 73: Scott Pickett – WW Informix Technical Sales

Appendix - Time Series Built in Routines (8 of 9)

Perform an arithmetic operation on one or two time series (cont'd).

Round the time series to the nearest whole number.

Round

Get the square root of the time series. Sqrt

Get the cosine of the time series. Cos

Get the sine of the time series. Sin

Get the tangent of the time series. Tan

Get the arc cosine of the time series. Acos

Get the arc sine of the time series. Asin

Get the arc tangent of the time series. Atan

Get the arc tangent for two time series. Atan2

Apply a binary function to a pair of time series, or to a time series and a compatible row type or number.

ApplyBinaryTsOp

Apply a unary function to a time series. ApplyUnaryTsOp

Apply another function to a set of time series.

ApplyOpTsSet

Reset the origin. Reset the origin. SetOrigin

Page 74: Scott Pickett – WW Informix Technical Sales

Appendix - Time Series Built in Routines (9 of 9)

The following routines are used only with regular time series:
– Lag
– PutNthElem
– TSCreate

The TSCreateIrr function is used only with irregular time series.

Aggregate values in a time series.

Aggregate values in a time series. AggregateBy

Aggregate values in a time series over a specified time range.

AggregateRange

Create a time series that lags. Create a time series that lags the source time series by a given offset.

Lag (regular only)

Manage containers. Create a container. TsContainerCreate

Destroy a container. TsContainerDestroy

Set the container name. SetContainerName

Page 75: Scott Pickett – WW Informix Technical Sales

Appendix B – Calendar Pattern Routines

Page 76: Scott Pickett – WW Informix Technical Sales

Appendix - Calendar Pattern Routines Return the intersection of two calendar patterns.

Calendar pattern that has every interval on that was on in both calendar patterns; the rest are off. If the given patterns do not have the same interval unit, the pattern with the larger interval unit is expanded to match the other.

AndOp

Obtain the start date of the pattern for a given calendar.

Returns the start dates of the calendar patterns for each calendar. CalPattStartDate

Convert a calendar pattern from a smaller interval unit to a larger one.

Collapses the given calendar pattern into destination units, which must have a larger interval unit than that of the given calendar pattern, for example, converting hours to days.

Collapse

Convert a calendar pattern from a larger interval unit to a smaller one.

Converts the given calendar pattern into the destination units, which must have a smaller interval unit than that of the given calendar pattern.

Expand

Return an inverted calendar intervals pattern.

Turns all on intervals off and all off intervals on in the given calendar pattern.

NotOp

Returns the union of the two calendar patterns.

This function returns a calendar pattern that has every interval on that was on in either of the calendar patterns; the rest are off. If the two patterns have different sizes of interval units, the resultant pattern has the smaller of the two intervals.

OrOp
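The intersection, union, and inversion of calendar patterns described above behave like element-wise boolean operations. The following is a minimal Python sketch that models a pattern as a list of on/off flags; the function names are hypothetical, and the sketch assumes both patterns already use the same interval unit and length, so the expansion step mentioned above is not modeled.

```python
def and_op(a, b):
    """Intervals that are on in both patterns (intersection)."""
    return [x and y for x, y in zip(a, b)]

def or_op(a, b):
    """Intervals that are on in either pattern (union)."""
    return [x or y for x, y in zip(a, b)]

def not_op(a):
    """Turn all on intervals off and all off intervals on."""
    return [not x for x in a]

# One week in day units, Monday first (hypothetical example patterns).
work_week = [True, True, True, True, True, False, False]
mon_wed_fri = [True, False, True, False, True, False, False]

both = and_op(work_week, mon_wed_fri)
either = or_op(work_week, mon_wed_fri)
weekend = not_op(work_week)
```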

Page 77: Scott Pickett – WW Informix Technical Sales

Appendix – Calendar Routines

Return the intersection of two calendars.

Returns a calendar that has every interval on that was on in both calendars; the rest are off. The resultant calendar takes the later of the two start dates and the later of the two pattern start dates. If the two calendars have different size interval units, the resultant calendar has the smaller of the two intervals.

AndOp

Return the number of valid intervals in the named calendar between the two timestamps.

CalIndex

Return a set of valid timestamps within a range.

There are two syntaxes: the first syntax specifies the range as between two given timestamps. The second syntax specifies the number of valid timestamps to return after a given timestamp.

CalRange

Return the timestamp at a given number of calendar intervals after a given timestamp.

CalStamp

Returns a DATETIME value containing the start date of a given calendar name.

CalStartDate

Return a calendar that is the result of two others combined with the OR operator

This function returns a calendar that has every interval on that was on in either calendar; the rest are off. The resultant calendar takes the earlier of the two start dates and the earlier of the two pattern start dates. If the two calendars have different sizes of interval units, the resultant calendar has the smaller of the two intervals.

OrOp

Page 78: Scott Pickett – WW Informix Technical Sales

Appendix C - Time Series API Information

Page 79: Scott Pickett – WW Informix Technical Sales

Appendix - Time Series APIs

Application Programmer Interfaces for a Time Series. Two types:

– Client version – tsfeapi
  • Sample location: $INFORMIXDIR/extend/TimeSeries.4.07.UC7/lib/tsfeapi.a (Unix/Linux)
  • Similar directory for Windows.
  • Operates on a private copy of the time series datum.
  • Changed data must be written back into the database.
    – Function ts_copy is used prior to the execution of a prepared statement.

– Server version – tsbeapi
  • Sample location: $INFORMIXDIR/extend/TimeSeries.4.07.UC7/lib/tsbeapi.a (Unix/Linux)
  • Similar directory for Windows.
  • Operates directly on the time series datum.
    – Direct execution of prepared/unprepared statements.

Header file – tseries.h
– Must be included whenever using tsbeapi or tsfeapi.

Page 80: Scott Pickett – WW Informix Technical Sales

Appendix - Time Series API – Data Structures

ts_timeseries – the header for time series data.

ts_tscan – allows you to look at no more than two time series elements at a time.

ts_tsdesc – a structure containing a time series and the data structures it works with.

ts_tselem – a pointer to one element of a time series.

Page 81: Scott Pickett – WW Informix Technical Sales

Appendix – Time Series API Routines Task type Description API Routine SQL Routine

Open and close a time series. Open a time series. ts_open()

Close a time series. ts_close()

Return a pointer to the time series associated with the given time series descriptor.

ts_get_ts()

Create and copy a time series. Create a time series. ts_create() TSCreate, TSCreateIrr

Create a time series with metadata.

ts_create_with_metadata()

TSCreate, TSCreateIrr

Copy a time series. ts_copy()

Free all memory associated with a time series created with ts_copy() or ts_create().

ts_free()

Copy all elements of one time series into another.

ts_put_ts() PutTimeSeries

Page 82: Scott Pickett – WW Informix Technical Sales

Appendix – Time Series API Routines (cont'd)

Task type Description API Routine SQL Routine

Scan a time series. Scan a time series. ts_begin_scan()

Retrieve the next element from a scan.

ts_next()

End a scan. ts_end_scan()

Find the timestamp of the last element returned by the ts_next().

ts_current_timestamp()

Return the offset for the last element returned by ts_next().

ts_current_offset()

Make elements visible or invisible to a scan.

Make an element invisible.

ts_hide_elem() HideElem

Make an element visible.

ts_reveal_elem() RevealElem

Page 83: Scott Pickett – WW Informix Technical Sales

Appendix – Time Series API Routines (cont'd)

Task type Description API Routine SQL Routine

Select individual elements from a time series.

Get the element closest to a given timestamp.

ts_closest_elem()

Get the element associated with a given timestamp.

ts_elem() GetElem

Get the element at a specified position.

ts_nth_elem() GetNthElem

Get the first element. ts_first_elem() GetFirstElem

Get the last element. ts_last_elem() GetLastElem

Find the next element after a given timestamp.

ts_next_valid() GetNextValid

Find the last element before a given timestamp.

ts_previous_valid() GetPreviousValid

Find the last element at or before a given timestamp.

ts_last_valid()

Page 84: Scott Pickett – WW Informix Technical Sales

Appendix – Time Series API Routines (cont'd)

Task type Description API Routine SQL Routine

Update a time series. Insert an element. ts_ins_elem() InsElem

Update an element. ts_upd_elem() UpdElem

Delete an element. ts_del_elem() DelElem

Put an element in a place specified by a timestamp.

ts_put_elem(), ts_put_elem_no_dups()

PutElem, PutElemNoDups

Append an element. ts_put_last_elem()(regular only)

Put an element in a place specified by an offset.

ts_put_nth_elem()(regular only)

Modify metadata. Update metadata. ts_update_metadata() UpdMetaData

Convert between an index and a timestamp.

Convert timestamp to index.

ts_index() GetIndex

Convert index to timestamp.

ts_time() GetStamp

Page 85: Scott Pickett – WW Informix Technical Sales

Appendix – Time Series API Routines (cont'd)

Task type Description API Routine SQL Routine

Transform an element. Create an element from an array of values and nulls.

ts_make_elem(), ts_make_elem_rowdesc(), ts_make_elem_no_dups()

Convert an MI_ROW value to an element.

ts_row_to_elem()

Convert an element to a MI_ROW value.

ts_elem_to_row()

Free memory from a time series element created by ts_make_elem() or ts_row_to_elem()

ts_free_elem()

Extract column data from an element.

Get a column from an element by name.

ts_get_col_by_name()

Get a column from an element by number.

ts_get_col_by_number()

Pull columns from an element into values and nulls arrays.

ts_get_all_cols()

Create and perform calculations with timestamps.

Compare two timestamps. ts_datetime_cmp()

Get fields from a timestamp. ts_get_stamp_fields()

Create a timestamp. ts_make_stamp()

Page 86: Scott Pickett – WW Informix Technical Sales

Appendix – Time Series API Routines (cont'd)

Task type Description API Routine SQL Routine

Create and perform calculations with timestamps (cont'd).

Calculate the number of intervals between two timestamps.

ts_tstamp_difference()

Subtract N intervals from a timestamp.

ts_tstamp_minus()

Add N intervals to a timestamp.

ts_tstamp_plus()

Get information about element data.

Find the number of a column. ts_col_id()

Return the number of columns contained in each element.

ts_get_col_cnt()

Get type information for a column specified by number.

ts_get_colinfo_number()

Get type information for a column specified by name.

ts_get_colinfo_name()

Determine if an element is hidden.

TS_ELEM_HIDDEN()

Determine if an element is NULL.

TS_ELEM_NULL()

Page 87: Scott Pickett – WW Informix Technical Sales

Appendix – Time Series API Routines (cont'd)

Task type Description API Routine SQL Routine

Get information about a timeseries.

Get the name of calendar associated with a timeseries.

ts_get_calname() GetCalendarName

Return the number of elements in a timeseries.

ts_nelems() GetNelems

Return the flags associated with the timeseries.

ts_get_flags()

Get the name of the container.

ts_get_containername() GetContainerName

Determine if the timeseries is in a container.

TS_IS_INCONTAINER()

Get the origin of the timeseries.

ts_get_origin() GetOrigin

Get the metadata associated with the timeseries.

ts_get_metadata() GetMetaData

Determine if the timeseries is irregular.

TS_IS_IRREGULAR()

Page 88: Scott Pickett – WW Informix Technical Sales

Appendix – Time Series API Routines (cont'd)

Task type Description API Routine SQL Routine

Get information about a calendar.

Return the number of valid intervals between two timestamps.

ts_cal_index() CalIndex

Return all valid timepoints between two timestamps.

ts_cal_range() CalRange

Return a specified number of timestamps starting at a given timestamp.

ts_cal_range_index()

Return the timestamp at a given number of intervals after a given timestamp.

ts_cal_stamp() CalStamp

Page 89: Scott Pickett – WW Informix Technical Sales

Appendix D – Oncor POC Win Details and a bit more on TimeSeries

Page 90: Scott Pickett – WW Informix Technical Sales

About Oncor and their Solution

Who is Oncor?

Oncor is a regulated electric distribution and transmission business that uses superior asset management skills to provide reliable electricity delivery to consumers. Oncor operates the largest distribution and transmission system in Texas, providing power to 3 million electric delivery points over more than 120,000 miles of distribution and 14,000 miles of transmission lines. (from the Oncor website)

Current Solution:
– Architected by IBM GBS:
  • Primary architect Keith Hall.
– A number of different components used including Oracle.
– They currently manage about 1 million smart meters.
– Every meter generates one record every 15 minutes.
– ERCOT (Electric Reliability Council of Texas) requires data be kept for 25 months.

25 months * 1 million meters every 15 minutes = 73 billion records
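The 73 billion figure can be checked with simple arithmetic. The following is a minimal Python sketch; the average month length (365.25 / 12 days) is an assumption, since the slide does not say how the 25 months were converted to days.

```python
METERS = 1_000_000
READINGS_PER_DAY = 24 * 60 // 15      # one record every 15 minutes = 96 per day
DAYS = round(25 * 365.25 / 12)        # 25 months, assumed average month length

records = METERS * READINGS_PER_DAY * DAYS
print(f"{records / 1e9:.1f} billion records")
```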

Page 91: Scott Pickett – WW Informix Technical Sales

Why Did they Contact Us?

Oracle is barely keeping up with the data:
– It is taking Oracle about 7 hours to read in the data for 1 million meters.
– The ERCOT reports are taking between 2 and 7 hours to run.
– Some problems with reading and writing data simultaneously.

Oncor is planning to expand:
– In the next 1 to 2 years they expect to manage 3.5 million meters:
  • Their 73 billion records stored becomes 220 billion records stored.
– Even if Oracle scales linearly there will not be enough time in the day to read and process all this data.

Oncor is looking for ways to cut their costs:
– Oracle is currently using about 1.3 TB to store 90 days worth of data.
– They would like to do more with the hardware they have:
  • Process the data more quickly to allow other applications to run against the data.
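The "not enough time in the day" point follows from linear scaling of the load time. The following is a minimal Python sketch of that arithmetic; it assumes the 7-hour read covers one day of data for 1 million meters and that load time grows in direct proportion to the number of meters.

```python
HOURS_PER_MILLION_METERS = 7.0    # Oracle read time quoted for 1 million meters
PLANNED_METERS_MILLIONS = 3.5     # expected meter count in 1 to 2 years

# Linear scaling assumption: time grows in proportion to meter count.
load_hours = HOURS_PER_MILLION_METERS * PLANNED_METERS_MILLIONS
exceeds_one_day = load_hours > 24

print(f"{load_hours} hours to read one day of data; exceeds 24 hours: {exceeds_one_day}")
```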

Page 92: Scott Pickett – WW Informix Technical Sales

Design Decisions

Why they picked Informix:
– Well-established low cost and low administration.
– High performance.
– Security and High Availability also of interest.

Why they used the native time series support in Informix:
– Provides high level of compression.
– Provides continuous loading of data with minimal impact on concurrent queries.
– Provides very high degree of parallelism on reads and writes.
– Time series tool kit allows custom analytics to be written.
– No other RDBMS has native time series support.

Page 93: Scott Pickett – WW Informix Technical Sales

Results of the POC with Informix

Informix takes about 18 minutes to read in the data for 1 million meters:
– Oracle is taking about 7 hours.

Informix takes about 6 minutes to run each ERCOT report and about 25 seconds if the data is already cached:
– Oracle is taking from 2 to 7 hours depending on the report.

Data space used by Informix is about 350GB:
– Oracle is using about 1.3TB.

Results were very linear:
– Better results if you increase CPUs and storage.
– If less performance is acceptable then CPUs and storage can be reduced.

Page 94: Scott Pickett – WW Informix Technical Sales

Next Steps

Oncor feels they can save a tremendous amount of money with Informix due to:
– Reduced storage requirements (350GB vs 1.3TB per million meters for 90 days).
– Ability to do more work on less hardware:
  • They currently run on Sun hardware.

As of Aug 14, 2010 Oncor has been given a price for an Informix system to handle 3.5 million meters.

Oncor is deciding whether they want to expand Informix to other systems:
– Customer Portal.
– Some additional billing applications currently running on Oracle.
– They are also now considering high availability and whether they should have one or more secondary servers.

Page 95: Scott Pickett – WW Informix Technical Sales

POC Details

Simulation:
– 90 days worth of meter data for 1 million meters.
– 200 locations.
– 500 feeders.
– 34 substations.

Hardware used in POC:
– Power7 with 2 sockets each with 8 cores.
– 64 bit SUSE Linux 11.
– 128 GB of memory:
  • Memory actually needed = 44GB, although could probably be less.
– 6 disks dedicated to the database, 2 additional for OS and LSE staging:
  • Disk space actually used by the database = about 350GB.
– Additional disks for the operating system and staging area for files.

Software:
– Informix Ultimate edition.
– Informix Time Series.

Page 96: Scott Pickett – WW Informix Technical Sales

Data Generation

We started with one data file (LSE file) with data for 9960 meters.

We duplicated this file by adding a suffix to the meter id (esi_id) and changing the date.

End result is 1 million meters for 90 days.

Because the data is duplicated, aggregates for each meter look the same.

Page 97: Scott Pickett – WW Informix Technical Sales

Informix Time Series Schema

The Meter table looks like this:

    create table meters (
        esi_id       char(64) not null primary key,
        suffix       char(32),
        location     char(16),
        feeder       char(16),
        sub_station  char(16),
        dbspace      varchar(128),
        container    varchar(128),
        actual       timeseries(meter_data),
        estimated    timeseries(meter_data),
        valid        timeseries(update_day)
    );

A Meter reading looks like this:

    create row type meter_data (
        tstamp  datetime year to fraction(5),
        value   decimal(14,3)
    );

An update (correction) record looks like:

    create row type update_day (
        tstamp       datetime year to fraction(5),
        last_update  datetime year to fraction(5)
    );

Hierarchy is sub_station->feeder->meter. There are also tables for location, sub_station and feeder not shown above.

Page 98: Scott Pickett – WW Informix Technical Sales

Primary Use Cases

Load 90 days worth of data for 1 million meters from LSE files:
– Original set of LSE files massaged to generate 1 million distinct meters.

6-day ERCOT Settlement Extract:
– Show support for the ERCOT settlement processes by creating an LSE file consisting of every record (every meter) for operating day - 6 (the calendar day that occurred 6 days prior to the current day). Must be able to extract and create the LSE files for 1M meters for a specific day.

22-Day Update ERCOT Settlement Extract:
– Show support for the ERCOT settlement processes by creating LSE files consisting of every record that has had a consumption interval record update since the prior extract / pull (6-Day). Only extract the last or most current update for each meter, so if a meter has been updated four times, only the last / current record is sent. All 96 15-minute intervals are sent each time as well.

Missing Record ERCOT Settlement Extract:
– Show support for the ERCOT settlement processes by creating an LSE file consisting of only the meter IDs and dates that are provided in a missing meter ID file from ERCOT. The dates will be as far back as 90 days and no more recent than 28 days back in time.

Page 99: Scott Pickett – WW Informix Technical Sales

Other Use Cases

Determine the count and the list of meter ID's for all meters with missing intervals and / or register reads on a given day.

Determine the 90 day history for a given meter (90 record pull for same meter).

Determine the count and list of meter IDs that exceeded a given high interval value for a given day or given time period (multiple days). For example, count and list of meters that had interval value of 12 or higher for a given period of time.

Determine list of meters that have 5 consecutive or more days with estimated values only (no actual interval reads during a 5 day or more period).

Page 100: Scott Pickett – WW Informix Technical Sales

Results: Load 90 days worth of data for 1 million meters from LSE files

Query syntax:
– Custom loader process to read LSE data.
– Uses Smart Meter Loader UDR for high speed loading.

Performance:
– 6 Parallel streams - 18 minutes to load 1 million meters for 1 day.

Scalability:
– Depends on parallelism of disk write performance.
– Increase disks and CPUs for higher speed/more data.

Page 101: Scott Pickett – WW Informix Technical Sales

Results: 6-day ERCOT Settlement Extract

Query syntax:
– The LSE file is generated from the base query:

    SELECT esi_id,
           daily_output (actual, estimated, $db_start_date)
    FROM meters;

– daily_output() is a custom UDR that merges the actual & estimated interval data into a single day.

Performance:
– 6 Streams - 6 minutes 30 seconds to extract 1 day (1 million meters).

Scalability:
– Depends on parallelism of disk read performance.
– Increase disks and CPUs for higher speed/more data.

Page 102: Scott Pickett – WW Informix Technical Sales

Results: 22-Day Update ERCOT Settlement Extract

Query syntax:
– Uses standard Timeseries functions and a stored procedure (SPL) routine:

    SELECT apply ('"'||trim(esi_id)||'", ($last_update)',
               'was_updated($last_update,$tstamp)',
               '${day} 00:00:00.00000'::datetime year to fraction(5),
               '${day} 00:00:00.00000'::datetime year to fraction(5) + ${period} units day,
               m.valid)::timeseries(esi_one_day)
    FROM meters m
    WHERE dbspace = 'cont${c}'
      AND getnelems (apply ('$last_update',
               'was_updated($last_update,$tstamp)',
               '${day} 00:00:00.00000'::datetime year to fraction(5),
               '${day} 00:00:00.00000'::datetime year to fraction(5) + ${period} units day,
               m.valid)::timeseries(one_day)) > 0

Performance:
– 22 day - 6 Streams 4 minutes.
– 90 day - 6 Streams 11 minutes.

Scalability:
– Depends on period and number of meters.
– Increase in speed possible with custom UDR.

Page 103: Scott Pickett – WW Informix Technical Sales

Results: Missing Record ERCOT Settlement Extract

Query syntax:
– Joins meter+date info to LSE extract query.

Performance:
– 4000 readings for a particular day in 6 seconds – when data is cached.
– 4000 readings in 24 seconds for 90 days – when data is not cached.
– Depends on number of meters+days and whether data is in memory already.
– Directly proportional to settlement extract.

Scalability:
– Depends on parallelism of disk read performance.
– Increase disks and CPUs for higher speed/more data.

Page 104: Scott Pickett – WW Informix Technical Sales

Results: Missing Intervals and / or Register Reads on a Given Day

Query syntax:
– Uses a custom UDR to join actual and estimated readings.

    SELECT esi_id
    FROM meters
    WHERE ClipGetCount(tagged_reading (actual, estimated,
              '$day 00:00'::datetime year to minute,
              '$day 23:45'::datetime year to minute
          )::timeseries(tagged_reading)) < 96
      AND dbspace = 'cont${c}'

Performance:
– 6 Streams - 6 minutes 10 seconds all meters, 1 day.

Scalability:
– Depends on parallelism of disk read performance.
– Increase disks and CPUs for higher speed/more data.
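The test in the query above relies on a full day holding 96 intervals of 15 minutes, from 00:00 through 23:45. The following is a minimal Python sketch of the same check done in memory; the function names and the dictionary layout of the readings are assumptions for illustration and do not reproduce the custom tagged_reading UDR.

```python
from datetime import date, datetime, timedelta

INTERVAL = timedelta(minutes=15)
INTERVALS_PER_DAY = 96

def expected_stamps(day):
    """All 15-minute timestamps for one day: 00:00, 00:15, ..., 23:45."""
    start = datetime(day.year, day.month, day.day)
    return [start + i * INTERVAL for i in range(INTERVALS_PER_DAY)]

def meters_with_missing(readings, day):
    """readings maps meter id -> set of timestamps that have a reading
    (actual or estimated). Returns meter ids with fewer than 96 for the day."""
    expected = set(expected_stamps(day))
    return sorted(m for m, stamps in readings.items()
                  if len(expected & stamps) < INTERVALS_PER_DAY)

# Hypothetical data: meter "B" is missing the 12:00 interval.
day = date(2010, 1, 1)
full_day = set(expected_stamps(day))
readings = {"A": full_day, "B": full_day - {datetime(2010, 1, 1, 12, 0)}}
missing = meters_with_missing(readings, day)
```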

Page 105: Scott Pickett – WW Informix Technical Sales

Results: Determine the 90 Day History for a Given Meter

Query:
– Uses standard Timeseries functions and a user defined aggregate row type:

    SELECT AggregateBy('sum($value), min($value), max($value), avg($value)',
               'cal1day', actual, 0,
               '2010-01-01 00:00:00.00000'::datetime year to fraction(5),
               '2010-03-31 23:45:00.00000'::datetime year to fraction(5))
           ::timeseries(agg_stats1)
    FROM meters
    WHERE esi_id = '80937225860964649#1'

Performance:
– Single query - 0.04 seconds for 1 meter with 90 day aggregation.

Scalability:
– Not investigated, but will depend on whether time series is already in memory.
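The query above rolls the 15-minute readings of one meter up to one row per day with sum, min, max, and avg. The following is a minimal Python sketch of that daily roll-up; the function name `aggregate_by_day` and the sample readings are hypothetical, and the sketch groups by calendar date rather than by a named calendar such as cal1day.

```python
from collections import defaultdict
from datetime import date, datetime

def aggregate_by_day(readings):
    """readings: iterable of (timestamp, value) pairs for one meter.
    Returns {day: (sum, min, max, avg)} with one entry per calendar day."""
    buckets = defaultdict(list)
    for stamp, value in readings:
        buckets[stamp.date()].append(value)
    return {day: (sum(vals), min(vals), max(vals), sum(vals) / len(vals))
            for day, vals in sorted(buckets.items())}

# Hypothetical readings: three intervals on Jan 1 and one on Jan 2.
sample = [
    (datetime(2010, 1, 1, 0, 0), 1.0),
    (datetime(2010, 1, 1, 0, 15), 2.0),
    (datetime(2010, 1, 1, 0, 30), 3.0),
    (datetime(2010, 1, 2, 0, 0), 4.0),
]
daily = aggregate_by_day(sample)
```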

Page 106: Scott Pickett – WW Informix Technical Sales

Result: Determine the count and list the number of meter IDs that exceeded a given high interval value for a given day

Query:
– Uses standard Timeseries functions:

    SELECT Apply ('("'||trim(esi_id)||'") , $value',
               '(($value) > ${limit})',
               '${day} 00:00:00.00000'::datetime year to fraction(5),
               '${day} 23:45:00.00000'::datetime year to fraction(5) + ${period} units day,
               actual)::timeseries(esi_reading)
    FROM meters
    WHERE getnelems (apply ('$value',
               '$value > ${limit}',
               '${day} 00:00:00.00000'::datetime year to fraction(5),
               '${day} 23:45:00.00000'::datetime year to fraction(5) + ${period} units day,
               actual)::timeseries(meter_data)) > 1
      AND dbspace = 'cont${c}'

Performance:
– 6 Streams - 5 minutes 40 seconds all meters for 1 day.

Scalability:
– Depends on number of days to be assessed.
– Depends on parallelism of disk read performance.
– Increase disks and CPUs for higher speed/more data.

Page 107: Scott Pickett – WW Informix Technical Sales

Results: Determine list of meters that have 5 consecutive or more days with estimated values only

Query:
– Joins to a list of required dates.
– Filter function is faster but less selective.
– Uses standard Timeseries functions and a stored procedure (SPL) routine.

    SELECT esi_id, start_date,
           ClipGetCount (Apply ('non_null_value ($0.value, $1.value)',
               start_date,
               extend ((start_date + $days units day), year to minute) - 15 units minute,
               estimated, actual)::timeseries(single_reading))
    FROM meters, date_list
    WHERE dbspace = 'cont${c}'
      AND ClipGetCount (estimated, start_date,
               extend ((start_date + $days units day), year to minute) - 15 units minute) = 480
      AND start_date BETWEEN '$start' AND '$end'

Performance:
– 6 Streams - 17 minutes for all meters in 5 out of 30 days.

Scalability:
– Depends on start/end number of days to be scanned.
– Depends on parallelism of disk read performance.
– Increase disks and CPUs for higher speed/more data.
– Custom UDR would be faster.
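The constant 480 in the query above is 5 days of 96 intervals each. The following is a minimal Python sketch of one way to read the test: a meter qualifies for a window when every interval in the window has an estimated value and none has an actual value. The function name and the set-based data layout are assumptions for illustration; the sketch does not reproduce the non_null_value routine used on the slide.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)
INTERVALS_PER_DAY = 96

def estimated_only(estimated, actual, start, days=5):
    """True when the window [start, start + days) holds a full set of
    estimated readings (days * 96) and no actual readings at all."""
    end = start + timedelta(days=days)
    est_in_window = {t for t in estimated if start <= t < end}
    act_in_window = {t for t in actual if start <= t < end}
    return len(est_in_window) == days * INTERVALS_PER_DAY and not act_in_window

# Hypothetical meter with 5 full days of estimated readings.
start = datetime(2010, 1, 1)
est = {start + i * INTERVAL for i in range(5 * INTERVALS_PER_DAY)}
only_estimates = estimated_only(est, set(), start)
has_one_actual = estimated_only(est, {start}, start)
```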

Page 108: Scott Pickett – WW Informix Technical Sales

Some Notes

Oncor is very enthusiastic about Informix:
– They feel having time series support built into an RDBMS is a perfect fit for smart meter applications.
– They have mentioned they would like to write a white paper with us.
– The GBS team feels Oncor would be very interested in promoting this solution at IOD, for instance in one of the keynotes.

Page 109: Scott Pickett – WW Informix Technical Sales

What is TimeSeries Data?

Time series data is:
– A set of data where each item is time-stamped:
  • Think of an array where each element is indexed by time.

"Give me the Jan 1st element from time series X"

Most useful when a range of data is normally read:

"Give me the Jan 1st thru Jan 10th elements from time series X"

Access to one time series is usually completed before moving to the next time series.
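The "array indexed by time" idea can be sketched in a few lines of Python. The class below is a toy model of a regular time series, assuming a fixed interval with no calendar gaps; the class and method names are hypothetical and only loosely mirror routines such as GetIndex, GetStamp, GetElem, and Clip.

```python
from datetime import datetime, timedelta

class RegularSeries:
    """Toy regular time series: values[i] belongs to origin + i * interval."""

    def __init__(self, origin, interval, values):
        self.origin = origin
        self.interval = interval
        self.values = list(values)

    def index_of(self, stamp):
        # Timestamp -> offset (whole intervals since the origin).
        return (stamp - self.origin) // self.interval

    def stamp_of(self, index):
        # Offset -> timestamp.
        return self.origin + index * self.interval

    def element(self, stamp):
        return self.values[self.index_of(stamp)]

    def clip(self, begin, end):
        # Inclusive range of elements between two timestamps.
        return self.values[self.index_of(begin): self.index_of(end) + 1]

# Daily series "X" for January 2010 with placeholder values 0..30.
x = RegularSeries(datetime(2010, 1, 1), timedelta(days=1), range(31))
jan1 = x.element(datetime(2010, 1, 1))
jan1_to_jan10 = x.clip(datetime(2010, 1, 1), datetime(2010, 1, 10))
```

Because the position of an element follows from its timestamp, a range read is a single contiguous slice rather than one lookup per row.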

Page 110: Scott Pickett – WW Informix Technical Sales

Key Strengths of Informix TimeSeries

Performance:
– Extremely fast data access:
  • Data clustered on disk to reduce I/O.
– Handles operations hard or impossible to do in standard SQL.

Space Savings:
– Can be over 50% space savings over standard relational layout.

Toolkit approach allows users to develop their own algorithms:
– Algorithms run in the database to leverage buffer pool for speed.

Conceptually closer to how users think of time series.

Page 111: Scott Pickett – WW Informix Technical Sales

Relational Schema for Smart Meters Data

Smart_Meters (primary key: mtr_id, date)

mtr_id  date  Col1     Col2     ...  ColN
1       Mon   Value 1  Value 2  ...  Value N
1       Tue   Value 1  Value 2  ...  Value N
1       Wed   Value 1  Value 2  ...  Value N
...     ...   ...      ...      ...  ...
13      Mon   Value 1  Value 2  ...  Value N
13      Tue   Value 1  Value 2  ...  Value N
13      Wed   Value 1  Value 2  ...  Value N
...     ...   ...      ...      ...  ...

Page 112: Scott Pickett – WW Informix Technical Sales

Same Table using an Informix TimeSeries Schema

Smart_Meters

mtr_id (int)  Series timeseries(mtr_data)
1             [(Mon, v1, ...)(Tue, v1, ...)]
2             [(Mon, v1, ...)(Tue, v1, ...)]
3             [(Mon, v1, ...)(Tue, v1, ...)]
4             [(Mon, v1, ...)(Tue, v1, ...)]


TimeSeries Data is stored outside the Table in “Containers”

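[Diagram: the Smart_Meters table, with columns mtr_id (int) and Series (timeseries(mtr_data)) and rows for mtr_id 1 through 8. The series data for these rows is stored outside the table in Container “A”, Container “B”, and Container “C”.]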


What a Container Looks Like

Each time series has a unique ID. This ID plus the time stamp is used to search the B-tree.

[Diagram: a B-tree and a row of data pages labeled MTR1, MTR1, MTR2, MTR3, MTR4, MTR5.]

Each data page holds sorted data for exactly one time series.
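A rough Python sketch of this lookup follows. It is a toy model under stated assumptions, not the Informix implementation: a sorted list stands in for the B-tree, each index entry is keyed by series ID plus the first time stamp on a page, and `Container`, `load`, `find`, and `PAGE_SIZE` are hypothetical names.

```python
from bisect import bisect_right

PAGE_SIZE = 3  # hypothetical number of elements per data page


class Container:
    """Toy container: data pages plus a sorted index standing in for the B-tree."""

    def __init__(self):
        self.pages = []  # each page holds sorted (timestamp, value) pairs of ONE series
        self.index = []  # sorted (series_id, first_timestamp_on_page, page_no) entries

    def load(self, series_id, elements):
        """Split one series into pages and index the first time stamp of each page."""
        elements = sorted(elements)
        for start in range(0, len(elements), PAGE_SIZE):
            page = elements[start:start + PAGE_SIZE]
            self.pages.append(page)
            self.index.append((series_id, page[0][0], len(self.pages) - 1))
        self.index.sort()

    def find(self, series_id, timestamp):
        """Search the index by (series ID, time stamp), then scan the one page found."""
        key = (series_id, timestamp, float("inf"))
        pos = bisect_right(self.index, key) - 1  # last entry at or before the key
        if pos < 0 or self.index[pos][0] != series_id:
            return None
        page = self.pages[self.index[pos][2]]
        for ts, value in page:
            if ts == timestamp:
                return value
        return None


# Hypothetical data: integer time stamps, two meters sharing one container.
c = Container()
c.load(1, [(t, t * 1.0) for t in range(1, 8)])  # 7 elements -> 3 pages
c.load(2, [(t, t * 2.0) for t in range(1, 5)])  # 4 elements -> 2 pages
```

With this layout, `c.find(1, 5)` consults the index once and then reads a single page, and the index holds one entry per page rather than one per element.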


TimeSeries Space Savings Example

[Chart: storage size by component]

Data                        35 MB
Index on stock + timestamp  28 MB
Index on all columns        43 MB
Timeseries                  30 MB

Time series takes much less space than traditional relational storage:

– For instance, for 30 MB of time series storage, relational will take either:

• 30 MB of data + 28 MB of index on stock and timestamp = 58 MB on disk.

• 30 MB of data + 43 MB of index on all columns = 73 MB on disk.

– In either case time series is a huge savings.

The reason for this is that:

– The time series does not require a separate index.

– Only the earliest item on each time series page is indexed.

– Missing data takes no space.

Savings could be greater if there are many NULL values.
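The arithmetic behind the two comparisons can be checked with a short Python calculation. It uses the figures quoted in the bullets above (30 MB of time series storage against 58 MB and 73 MB of relational storage); the function name `savings_pct` is a hypothetical helper for this example.

```python
def savings_pct(timeseries_mb, relational_mb):
    """Percentage of disk space saved by the time series layout."""
    return 100.0 * (1 - timeseries_mb / relational_mb)


timeseries_mb = 30
relational_key_index_mb = 30 + 28   # data + index on stock and timestamp
relational_full_index_mb = 30 + 43  # data + index on all columns

saving_vs_key_index = round(savings_pct(timeseries_mb, relational_key_index_mb), 1)
saving_vs_full_index = round(savings_pct(timeseries_mb, relational_full_index_mb), 1)

print(saving_vs_key_index)   # 48.3
print(saving_vs_full_index)  # 58.9
```

On these figures the saving is roughly 48% against the smaller index and roughly 59% against the index on all columns.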


TimeSeries Performance

Performance:

– Slower for singleton selects.

– Faster accessing sets of data.

– Much faster combining time series.

[Table: relative cost for Timeseries, Traditional (1), and Traditional (2) across the operations Singleton, 1,000 consec., f(t1, t2) 1,000 consec., and Aggregate 80,000. Values as extracted, in source order: 1000, 3, 1, 400, 2, 1, 2, .5, 1, 1, 650, 5.]

– Traditional (1) = index on key and all columns.

– Traditional (2) = index on key columns.

(Values in the table are relative to each other, not in seconds.)


Virtual Table Interface Makes Time Series Data Appear Relational

TimeSeries Table: Smart_meter

mtr_id (int)  Series (timeseries(mtr_data))
1             [(Mon, v1, ...)(Tue, v1, ...)]
2             [(Mon, v1, ...)(Tue, v1, ...)]
3             [(Mon, v1, ...)(Tue, v1, ...)]
4             [(Mon, v1, ...)(Tue, v1, ...)]
5             [(Mon, v1, ...)(Tue, v1, ...)]
6             [(Mon, v1, ...)(Tue, v1, ...)]
7             [(Mon, v1, ...)(Tue, v1, ...)]
8             [(Mon, v1, ...)(Tue, v1, ...)]

TimeSeries Virtual Table: SM_vt

mtr_id  date  col_1    col_2    ...
1       Mon   Value 1  Value 2  ...
1       Tue   Value 1  Value 2  ...
1       Wed   Value 1  Value 2  ...
...     ...   ...      ...      ...
3       Mon   Value 1  Value 2  ...
3       Tue   Value 1  Value 2  ...
3       Wed   Value 1  Value 2  ...
...     ...   ...      ...      ...

The virtual table is created from the TimeSeries table with:

execute procedure TSCreateVirtualTab('SM_vt', 'Smart_meter');
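What the virtual table presents can be mimicked in Python by flattening each meter's series into one row per element. This sketch only illustrates the shape of the result; it is not how the Virtual Table Interface is implemented, and the function name `virtual_rows` and the sample readings are hypothetical.

```python
def virtual_rows(base_table):
    """Yield one relational-style row per time series element."""
    for mtr_id, series in base_table:
        for stamp, col_1, col_2 in series:
            yield (mtr_id, stamp, col_1, col_2)


# Base table: one row per meter, each holding a whole series (made-up values).
smart_meter = [
    (1, [("Mon", 1.0, 2.0), ("Tue", 1.5, 2.5)]),
    (3, [("Mon", 7.0, 8.0)]),
]

# Relational view of the same data, analogous to SM_vt.
sm_vt = list(virtual_rows(smart_meter))
```

The rows are generated on demand from the stored series, which matches the idea that the virtual table is a view over the TimeSeries table rather than a second copy of the data.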