Scott Pickett – WW Informix Technical Sales

© 2011 IBM Corporation

April 21, 2023

For questions about this presentation contact: [email protected]

TimeSeries

© 2010 IBM Corporation

Agenda

Time Series

Real Time Loader


Key Points

Describe two complex extenders.

– TimeSeries

– Real Time Loader

Why time series data can be a problem.

How TimeSeries solves these problems.

Applications.

Why streaming data can be a problem.

How the RealTime Loader solves these problems.


TimeSeries

TimeSeries:
– What is a time series?
• Regular
• Irregular
– Where are they used?
– What are they used for?
– How are they used?



Regular TimeSeries Data

Timestamps follow a regularly repeating pattern of intervals: daily, hourly, and so on.

There might be breaks in the pattern:
– A work week is data captured for 5 days in a row, then nothing for 2 days.

Only one piece of data per interval:
– If an interval has not been inserted into, it has the value NULL.
– Intervals not inserted into occupy some space.
• Heuristic: intervals not inserted into at the end of a series do not take up space.

Can be thought of as an array. Optimized to return data by offset, not by timestamp.
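The array analogy can be made concrete with a small sketch. This is a toy model in Python, not the Informix storage format: the class name `RegularSeries` and its methods are hypothetical, and the calendar is assumed to be a plain daily interval with no off days.

```python
from datetime import datetime, timedelta


class RegularSeries:
    """Toy regular time series: a start time, a fixed interval, and an array."""

    def __init__(self, start, interval):
        self.start = start
        self.interval = interval
        self.values = []  # values[i] belongs to start + i * interval

    def offset_of(self, stamp):
        # Timestamp -> array offset is pure arithmetic; no search is needed.
        return (stamp - self.start) // self.interval

    def put(self, stamp, value):
        i = self.offset_of(stamp)
        # Skipped intervals become None (NULL) and do occupy slots.
        self.values.extend([None] * (i + 1 - len(self.values)))
        self.values[i] = value

    def get_nth(self, i):
        # Trailing intervals that were never written take no space at all.
        return self.values[i] if i < len(self.values) else None


series = RegularSeries(datetime(2010, 1, 1), timedelta(days=1))
series.put(datetime(2010, 1, 1), 12.34)
series.put(datetime(2010, 1, 4), 12.56)
print(series.values)
```

The two skipped days show up as `None` slots between the written values, while nothing is stored past the last written interval.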


Irregular Time Series Data

Data in an irregular time series does not have a regularly repeating pattern of intervals:
– Any interval may have zero or more pieces of data.

Missing data takes no space on disk:
– There really is no concept of missing data.

The only efficient way to access data is by timestamp:
– You can retrieve the Nth piece of data, but the code does a linear search.

Data can be stair stepped:
– Value persists until the next value arrives, for example stock prices.

Data can be discrete points:
– Value is valid only at the given time, for example heart beats.
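The difference between stair-stepped and discrete reads can be sketched as follows. This is an illustrative Python model only; `IrregularSeries` and its method names are hypothetical and do not correspond to Informix routines.

```python
from bisect import bisect_right
from datetime import datetime


class IrregularSeries:
    """Toy irregular series: parallel sorted lists of timestamps and values."""

    def __init__(self):
        self.stamps = []
        self.values = []

    def put(self, stamp, value):
        i = bisect_right(self.stamps, stamp)
        self.stamps.insert(i, stamp)
        self.values.insert(i, value)

    def at_or_before(self, stamp):
        """Stair-step read: the latest value at or before `stamp`."""
        i = bisect_right(self.stamps, stamp)
        return self.values[i - 1] if i else None

    def exactly_at(self, stamp):
        """Discrete read: a value only if one was recorded exactly at `stamp`."""
        i = bisect_right(self.stamps, stamp)
        if i and self.stamps[i - 1] == stamp:
            return self.values[i - 1]
        return None


ticks = IrregularSeries()
ticks.put(datetime(2006, 1, 1, 9, 30, 0), 12.34)
ticks.put(datetime(2006, 1, 1, 9, 30, 1, 700000), 12.54)
probe = datetime(2006, 1, 1, 9, 30, 1)
print(ticks.at_or_before(probe), ticks.exactly_at(probe))
```

Lookup by timestamp is a binary search here, which mirrors why timestamp access is the efficient path for irregular data.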


The TimeSeries "Problem"

Time series are usually stored as "tall – thin" tables with a very large number of rows.

Need fast access to an ordered set of rows.

May need one index to enforce uniqueness and another for index-only reads, so more space is used for indexes than for data.

Can be difficult to write SQL to work with the data.


TimeSeries in Traditional RDBMS

Irregular time series of stock "ticks".

Typical volumes are 50M per day and increasing.

Could access by "Stock" and "Timestamp" but often need an index on all columns for index only read.

Stock  Timestamp                  Price   Volume
ABC    2006-01-01 09:30:00.00000   12.34     100
XYZ    2006-01-01 09:30:00.01000    9.34    1000
ABC    2006-01-01 09:30:00.19000   12.44      10
KKK    2006-01-01 09:30:01.05000  194.00   10000
ABC    2006-01-01 09:30:01.70000   12.54     250

Storage                     Size
Data                        35 Mb
Index on stock + timestamp  28 Mb
Index on all columns        43 Mb


TimeSeries in Traditional RDBMS

Can be difficult to write SQL to process tick data:

– Applying stock splits:

select t.tstamp, t.price * s.fac, t.vol / s.fac
from tick t, split s
where t.stock = s.stock
  and t.tstamp::datetime year to day = s.tstamp
  and t.stock = "ABC"
  and t.tstamp::date between '2006-01-01' and '2006-01-05'

– Calculating Volume Weighted Average Price (VWAP):

select stock, tstamp::datetime year to hour, sum(price * vol) / sum(vol)
from tick
where stock = "ABC"
  and tstamp::date between '2006-01-01' and '2006-01-05'
group by 1, 2
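To make the VWAP arithmetic explicit, here is a minimal Python sketch of the same grouping: bucket ticks by hour, then divide the total amount spent by the total shares traded. The sample rows and the function name `hourly_vwap` are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# (stock, timestamp, price, volume); made-up sample rows
ticks = [
    ("ABC", datetime(2006, 1, 1, 9, 30, 0), 12.00, 100),
    ("ABC", datetime(2006, 1, 1, 9, 45, 0), 13.00, 300),
    ("ABC", datetime(2006, 1, 1, 10, 5, 0), 14.00, 50),
    ("XYZ", datetime(2006, 1, 1, 9, 30, 0), 9.34, 1000),
]


def hourly_vwap(rows, stock):
    """Return {hour: sum(price * volume) / sum(volume)} for one stock."""
    spent = defaultdict(float)
    shares = defaultdict(int)
    for name, stamp, price, volume in rows:
        if name != stock:
            continue
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        spent[hour] += price * volume
        shares[hour] += volume
    return {hour: spent[hour] / shares[hour] for hour in sorted(spent)}


result = hourly_vwap(ticks, "ABC")
for hour, vwap in result.items():
    print(hour, vwap)
```

For the 09:00 bucket this gives (12.00 × 100 + 13.00 × 300) / 400 = 12.75.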


TimeSeries Solution (1 of 7)

Store timeseries elements as an ordered set of elements:
– Uses less space because the "key" is factored out and the time can be represented in 4 bytes (regular).
– Access is as fast as (or faster than) an index-only read, with no additional index.
– SQL can be made much simpler.



Store timeseries data as:
– Start time.
– Calendar information.
– Ordered set of elements.

Stock  Ticks
ABC    2006-01-01, daily, {(12.34,1234567), (12.56,950000), (12.34,5555567),..}
XYZ    2003-09-01, daily, {(9.34,890567), (9.56,989000), (9.40,1000780),..}
KKK    2006-05-05, daily, {(199.08,678900), (198.55,345000), (198.12,850250),..}



How a timeseries is stored:
– Start time and calendar name.
– Index (btree) of the first element in each page.
– Each page contains all the elements between 2 timestamps.

Timeseries: 30 Mb



Data: 35 Mb

– Traditional methods use 2 to 3 times more space, or even more, depending on the number of NULL values.



Performance:
– Slower for singleton selects.
– Faster accessing sets of data.
– Much faster combining timeseries.

[Table: Timeseries compared with Traditional (1) and Traditional (2) for these workloads: Singleton; 1,000 consec.; f(t1, t2) 1,000 consec.; Aggregate 80,000]

– Traditional (2) = index on key columns.
– Traditional (1) = index on key and all columns.



Much simpler SQL. Consider the "stock split"/calibration problem.

Traditional:

select t.tstamp, t.price * s.fac, t.vol / s.fac
from tick t, split s
where t.stock = s.stock
  and t.tstamp::datetime year to day = s.tstamp
  and t.stock = "ABC"
  and t.tstamp::date between '2006-01-01' and '2006-01-05'

TimeSeries:

select func(ticks, splits, '2006-01-01', '2006-01-05')
from tick
where stock = "ABC"



Much simpler SQL: the VWAP problem.

TimeSeries:

select stock, vwap(ticks, '2006-01-01', '2006-01-05', '01:00:00')
from tick
where stock = "ABC"

Traditional:

select stock, tstamp::datetime year to hour, sum(price * vol) / sum(vol)
from tick
where stock = "ABC"
  and tstamp::date between '2006-01-01' and '2006-01-05'
group by 1, 2



Other Timeseries features:

– Calendar support.

– VTI to make time series look like traditional tables.

– C API.

– Java Class Library.

Easy to combine with own or 3rd party code.


TimeSeries Practicalities (1 of 6)

Calendar Patterns – stored in the CalendarPatterns table

– All time series are associated with these.

– Used to bound the times at which timeseries data is recorded:

• Data recorded during 'on' times.

• Data not recorded during 'off' times.

• Data recorded for the interval specified within the bounds.

• Within the calendar definition.

A Monday to Friday 5 day work week can be represented as 1 day off (Sunday), 5 days on, 1 day off (Saturday):

insert into CalendarPatterns values ('Five_Day_Work_Week', '{1 off, 5 on, 1 off}, day');

– Note the braces '{ }' above.
– The trailing day is the interval: a single unit of time.
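How an on/off pattern bounds recording times can be sketched in Python. This is an illustration only: `parse_pattern` and `is_on` are hypothetical helpers, and the pattern is assumed to start on a Sunday (2010-01-03), matching the Sunday-first description above.

```python
import re
from datetime import date, timedelta


def parse_pattern(text):
    """Expand '{1 off, 5 on, 1 off}' into a list of booleans, one per interval."""
    flags = []
    for count, state in re.findall(r"(\d+)\s+(on|off)", text):
        flags.extend([state == "on"] * int(count))
    return flags


def is_on(day, pattern_start, flags):
    # The pattern repeats indefinitely from pattern_start.
    return flags[(day - pattern_start).days % len(flags)]


week = parse_pattern("{1 off, 5 on, 1 off}")
sunday = date(2010, 1, 3)  # assumed pattern start, a Sunday
days = [sunday + timedelta(days=i) for i in range(7)]
on_days = [d for d in days if is_on(d, sunday, week)]
print([d.strftime("%a") for d in on_days])
```

Only the five weekdays come back as recording days; the Sunday and Saturday slots are off.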



How do I create a data type that defines each element in a time series?
– Must first create a row type whose first column is the timestamp, which must be a datetime year to fraction(5) value:

create row type stockVals (
    tstamp datetime year to fraction(5),
    price decimal(16,2),
    high decimal(16,2),
    low decimal(16,2),
    median decimal(16,2),
    volume integer,
    trades integer,
    …..
);

– Each timestamp must be unique.
– Row types cannot contain the types blob, clob, text, or serial.



Once a calendar pattern is established, create the calendar that uses it in the table CalendarTable:

insert into CalendarTable (c_name, c_calendar)
values ('Trading Week', 'startdate(2010-01-01 00:00:00.00000), pattstart(2010-01-07 00:00:00.00000), pattname(Five_Day_Work_Week)');

Regular timeseries calendars are used to guide the mapping between the timestamp and the offset.
– Represented by the Calendar data type, consisting of:
• Start date of the calendar.
• Calendar pattern: the repeating pattern of valid and invalid recording intervals for the duration of the data type.
• Interval: the calibration of the calendar pattern.
• Starting date of the calendar pattern:
  – Must be >= the calendar start date, and
  – Less than one calendar pattern length after the calendar start.



Containers are required data structures that hold data for one or more time series.

You cannot mix data for regular and irregular time series in the same container.

How do I create a container to store a time series?

execute procedure TSContainerCreate("ts_ctrname", "dbspace1", "stockVals", 40000, 40000);

A container is made up of index pages and data pages.

Containers allow time series data to be spread onto many disk partitions.



How do I create a table that includes a time series?

create table dailyStocks (
    stockId integer,
    abbrev char(4),
    stockData timeseries(stockVals)
);

How do I create a time series record?

insert into dailyStocks values (901, 'IBM', TSCreate('Trading Week', '2010-01-03 00:00:00.00000', 20, 0, 0, 'stockCon'));



How do I add data to a time series?

update dailyStocks
set stockData = PutElem(stockData, row(NULL::datetime year to fraction(5), 3.3, 4.4, 2.2, 3.0, 123456, 789, NULL)::stockVals)
where abbrev = 'XYZ';

How do I query a row with a time series?

select clip(stockData, current - interval(1) month to month, NULL) from dailyStocks;

Get all IBM data since the beginning of 2001:

select clip(series, '2001-01-01 00:00:00.00000', current) from S_P_500 where name = 'IBM';
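What a clip does to an ordered series can be sketched in Python. This is a conceptual model, not the Informix routine: the function below assumes inclusive bounds and treats `None` as an open end (like the NULL in the query above), and the sample elements are made up.

```python
from bisect import bisect_left, bisect_right


def clip(elements, begin=None, end=None):
    """Return the slice of (timestamp, value) pairs between two bounds.

    elements must be sorted by timestamp. Bounds are assumed inclusive;
    None leaves that side open.
    """
    stamps = [stamp for stamp, _ in elements]
    lo = 0 if begin is None else bisect_left(stamps, begin)
    hi = len(elements) if end is None else bisect_right(stamps, end)
    return elements[lo:hi]


# Made-up daily closes keyed by ISO date strings, which sort chronologically.
series = [("2000-12-29", 84.0), ("2001-01-02", 85.5), ("2001-01-03", 94.6)]
since_2001 = clip(series, "2001-01-01")
print(since_2001)
```

Because the elements are already ordered, the clip is two binary searches and a slice rather than a filtered scan.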


Building Applications with TimeSeries

Several interfaces are available:
– SQL
– VTI
– SPL
– Java
– C-API

It’s a toolkit approach! It allows people to build their own analytics.


TimeSeries SQL Interface

TimeSeries data is usually accessed through user-defined routines (UDRs) from SQL. Some of these are:
– Clip() - clip a range of a time series and return it.
– LastElem(), FirstElem() - return the last (first) element in the time series.
– Apply() - apply a predicate and expression to a range of elements in a time series.
– AggregateBy() - change the interval of a time series using an aggregate function.
– SetContainerName() - move a time series from one container to another.
– BulkLoad() - load data into a time series from a file.

See Appendix A for a list of all the SQL interface UDRs.


TimeSeries VTI Interface

Makes time series data look like standard relational data:
– Useful for programs that can’t handle objects.
– Useful when the application connects with ODBC.

There is a small (10%) penalty for using VTI.

Restrictions:
– A VTI table can only reference one time series column from the base table.
– No secondary indices are allowed.

SQL to create a VTI table:

execute procedure tscreatevirtualtab('t_vti', 't');


TimeSeries C-API Interface

Client and server versions of the API.

Treats a time series like a table (sort of).

Functions to:
– Open and close a time series.
– Scan a time series between 2 timestamps.
– Create a time series.
– Retrieve, insert, delete, update.

Plus another 70 functions defined. See Appendix C for a list of these.


Stored Procedure (SPL) Example

-- count non-null elements in a time series
create function spl_test(arg lvarchar) returning integer
    define var daybar;
    define cnt integer;
    let cnt = 0;
    foreach execute function transpose((select series from S_P_500 where name = arg)) into var
        let cnt = cnt + 1;
    end foreach
    return cnt;
end function;


TimeSeries Application

Existing applications:
– Finance (Capital Markets).
– Some in Manufacturing and Environmental.
– Smart Metering / Electrical Grid Usage.

Possible applications:
– Telco:
• Real Time Cell Phone Tower Data.
• Sales/sales analysis.
– Energy:
• Oil Field Geology data.
– Water/Sewer System Usage.
– Satellite Telemetry.
– Assembly Line Q/A & Analysis.
– Your application?


Real Time Loader


RealTime Loader (1)

Next problem after the time series "problem" is solved.

A large number of ticks is generated at market open/close and in reaction to events.

Money is made at these times.

Need to make the data available as soon as possible.

[Figure: ticks per minute (0 to 60,000) plotted against time in hours]

Edison Stock Ticker - 1892


RealTime Loader (2)

Traditionally, data can be loaded very quickly by loading it in batches. This causes problems:
– It takes time for the data to become available.
– The table can be locked while the data is loaded.

The RealTime Loader solves this problem by holding the data in memory and making it available to the SQL layer as if it were already in the database:
– Allows real-time analysis of data as it comes through the feeds.
• Uncommitted and committed data reads are possible.


RealTime Loader Architecture

[Diagram components: Data Feeds, RTL, RTL Memory, Database Memory, Database Server, Client]


RealTime Loader Performance (1 of 3)

Load performance of traditional systems depends on the system. Rates of 1,000s of ticks per second are reasonable, but latency is high because the data is batched and then loaded.

With the RTL the data is made accessible within a few ms.

There are 2 load rates to consider:
– How fast the data can be added to shared memory.
– How fast the data can be loaded into the database.



How fast can the data be added to shared memory?
– Rates of 10,000s per CPU second are common.

How fast can the data be loaded into the database?
– This is roughly the same as the traditional RDBMS solution.

The system can run at the "top" rate until the shared memory allocated to RTL is full.



The peaks don’t last very long, so the system can be sized for something just above the average rate.

Traditional systems have to be sized to handle the peak rate or the user has to accept a high latency during the crucial market open/close periods.
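The sizing argument can be checked with a small simulation. This is a sketch under stated assumptions: the per-minute feed below is made up, the drain rate into the database is taken to be constant, and `peak_backlog` is a hypothetical helper that reports how many ticks would have to wait in memory at the worst moment.

```python
def peak_backlog(arrivals_per_min, drain_per_min):
    """Largest number of ticks waiting in memory, given a steady drain rate."""
    backlog = peak = 0
    for arrived in arrivals_per_min:
        backlog = max(0, backlog + arrived - drain_per_min)
        peak = max(peak, backlog)
    return peak


# Made-up feed: quiet, a burst at the open, quiet, a burst at the close.
feed = [5_000] * 10 + [60_000] * 5 + [5_000] * 40 + [50_000] * 5
average = sum(feed) / len(feed)
drain = 20_000  # sized just above the average, well below the 60,000 peak

print(round(average), peak_backlog(feed, drain), peak_backlog(feed, 60_000))
```

With the drain sized just above the average rate, the backlog grows during each burst and empties again in the quiet period, so the memory buffer only has to hold the worst burst. Sizing the drain for the peak rate removes the backlog entirely, which is what a system without the in-memory buffer would need to do to avoid latency.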

[Figure: ticks per minute (0 to 60,000) plotted against time in hours]


RTL Data Access

Access can be through the usual SQL layer.

Even faster access is available through an API that gets the data directly from the shared memory segment.

The SQL layer can see all the ticks, or just those in RTL, or just those in the database.

Ticks in RTL Shared Memory

Ticks in Database


RealTime Loader Analytics

The RealTime Loader can do some simple aggregation and statistics on the incoming data, making the results available to SQL or through the API:
– VWAP – Volume Weighted Average Price
• = total amount spent on a security / total number of shares traded.
• This needs all of the share purchase history, unbounded by time.
– EWAP – Exponentially Weighted Moving Average Price
• A share price moving average where previous moving average data is considered but dampened by the effect of the new, more recent moving average.
• Only needs 3 pieces of data:
  – the old and new MAP data.
  – plus the damping factor.
– max, min etc.
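Both statistics can be maintained incrementally as ticks arrive. The following Python sketch shows one common formulation; it is not the RealTime Loader's implementation. `RunningVwap` and `ewma_step` are hypothetical names, and the damping convention (a weight between 0 and 1 applied to the newest value) is an assumption.

```python
class RunningVwap:
    """VWAP over the whole history, kept as two running totals."""

    def __init__(self):
        self.spent = 0.0   # sum of price * volume over all ticks so far
        self.shares = 0    # sum of volume over all ticks so far

    def add(self, price, volume):
        self.spent += price * volume
        self.shares += volume
        return self.spent / self.shares


def ewma_step(old_average, new_value, damping):
    """One exponentially weighted update from three inputs.

    damping is assumed to lie in (0, 1]; larger values weight the
    newest observation more heavily.
    """
    return damping * new_value + (1.0 - damping) * old_average


vwap = RunningVwap()
first = vwap.add(10.0, 100)
second = vwap.add(20.0, 100)
smoothed = ewma_step(10.0, 20.0, 0.25)
print(first, second, smoothed)
```

The VWAP totals cover every trade seen so far, which is the sense in which it depends on the full history, while the exponentially weighted update needs only the previous average, the new value, and the damping factor.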


RealTime Loader Applications

Stock ticker trades data analysis.

Gaming.

Telco:
– Cell Phone Tower Call Analysis Apps.
– Cell Phone Account Apps.

24 x 7 Satellite Telemetry Data Applications:
– Millions of records per minute .......

Geological Analysis Applications:
– Oil Exploration Data.

Land Resource Analysis Applications.

Water/Sewer Flow Data.


Conclusion

TimeSeries allows time series data to be:
– Efficiently stored.
– Quickly accessed.
– Conveniently queried.

The RealTime Loader allows time series data to be:
– Quickly stored.
– Accessed by applications with the minimum of delay.

Appendix D has more information on a recent successful POC and customer win; it is worth a read.


Questions?


Time Series Improvements 11.70


Time Series – Delete Performance Improvements

Delete Performance Improvements:
– When a user deletes rows from a table that contains a TimeSeries, the TimeSeries needs to be deleted. Since a TimeSeries can be very large, improving delete performance is important to provide a good response time to users.
– Performance improves as more records are deleted:
• < 30,000 --> 2X faster
• 100,000 --> 3X
• 1,000,000 --> 18X
• 3,000,000 --> 101X


Time Series – Container Improvements

Container Improvements:
– A TimeSeries is stored in a container. We removed the requirement of having to identify or even create containers when inserting a new TimeSeries into a table. This makes using TimeSeries more user-friendly.
– The container name limit was raised from 18 characters to 128 characters to allow customers to be more descriptive in the naming of their containers.
– Container statistics functions were added so that System Administrators can get a better view of their space utilization and can therefore better plan for future upgrades.


Container Pools

New column (pool) in the TSContainerTable table.

autopool: default pool of containers created by auto-create container.

Create containers (as before):

TSContainerCreate(<container_name>, <dbspace>, <initial_size_kb>, <next_size_kb>);

Add a container to a pool:

TSContainerSetPool(<container_name>, <pool_name>);

– Pool created automatically.
– A container can be in only one pool at a time.

Remove a container from its current pool:

TSContainerSetPool(<container_name>);


Container Selection

Search autopool for a set of candidate containers.

Select the container with the greatest amount of free space.
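The two selection steps can be sketched as follows. This is an illustrative Python model, not server code: the dictionary fields (`name`, `pool`, `total_pages`, `used_pages`) and the function `pick_container` are hypothetical, and free space is assumed to be total pages minus used pages.

```python
def pick_container(containers, pool="autopool"):
    """Return the name of the pool member with the most free pages, or None."""
    candidates = [c for c in containers if c["pool"] == pool]
    if not candidates:
        return None
    roomiest = max(candidates, key=lambda c: c["total_pages"] - c["used_pages"])
    return roomiest["name"]


# Made-up container statistics.
containers = [
    {"name": "ctr_a", "pool": "autopool", "total_pages": 1000, "used_pages": 900},
    {"name": "ctr_b", "pool": "autopool", "total_pages": 1000, "used_pages": 400},
    {"name": "ctr_c", "pool": "readings", "total_pages": 5000, "used_pages": 0},
]
chosen = pick_container(containers)
print(chosen)
```

The container in the other pool is ignored even though it has the most free space, because only members of the searched pool are candidates.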


Pool Policy: TSContainerPoolRoundRobin

Select a container from a pool in a round robin manner and return the name.

Example*:

INSERT INTO smartmeters(meter_id, rawreadings)
VALUES ('met00001', 'origin(2006-01-01 00:00:00.00000), calendar(smartmeter), regular, threshold(0), container(TSContainerPoolRoundRobin(readings)), [(33070,-13.00,100.00,9.98e+34), (19347,-4.00,100.00,1.007e+35), (17782,-18.00,100.00,9.83e+34)]');

Position kept in memory.

* Note: the call to TSContainerPoolRoundRobin is inside a string.


Pool Policy: TSContainerPoolRoundRobin

function TSContainerPoolRoundRobin(
    tabname lvarchar,
    colname lvarchar,
    subtype lvarchar,
    irregular integer,
    poolname lvarchar)
returns lvarchar;

tabname: Table name where the TS value is being inserted.
colname: TS column name where a TS value is being inserted.
subtype: Subtype (row type) of the TS value being inserted.
irregular: Kind of TS (integer: 0 – regular, 1 – irregular).
poolname: Name of the container pool to use.
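A round robin policy with its position kept in memory can be sketched in a few lines of Python. This is a conceptual illustration only: `RoundRobinPool` is a hypothetical class, and the container names are made up.

```python
from itertools import cycle


class RoundRobinPool:
    """Hand out container names in turn; the position lives only in memory."""

    def __init__(self, names):
        self._next = cycle(list(names))

    def pick(self):
        return next(self._next)


pool = RoundRobinPool(["readings_1", "readings_2", "readings_3"])
picks = [pool.pick() for _ in range(5)]
print(picks)
```

Because the position is not persisted, a fresh pool object starts again from the first container, which is the practical consequence of keeping the position in memory.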


Pool Policy: Create your own

function <pool_policy_name>(
    tabname lvarchar,
    colname lvarchar,
    subtype lvarchar,
    irregular integer
    [, user_data lvarchar])
returns lvarchar;

Tailor the container selection policy to fit your environment.


Time Series – New Container Functions (1)

All functions take a container name or NULL:
– If NULL, cover all containers.

TSContainerTotalPages
– Total number of pages allocated to a container.

TSContainerTotalUsed
– Number of pages containing TimeSeries data.

TSContainerNElems
– Total number of elements stored in a container.

TSContainerPctUsed
– (used * 100) / total

TSContainerUsage
– Returns used, nelems, and total.


Time Series – New Container Functions (2)

EXECUTE FUNCTION TSContainerTotalUsed(NULL);

pages
2029

EXECUTE FUNCTION TSContainerTotalPages(NULL);

total
2169

EXECUTE FUNCTION TSContainerNElems(NULL);

elements
241907

EXECUTE FUNCTION TSContainerPctUsed(NULL);

percent
93.545

EXECUTE FUNCTION TSContainerUsage(NULL);

pages  slots   total
2029   241907  2169


Time Series – VTI Improvements

Virtual Table Interface (VTI) Improvements:
– The VTI view of a table containing a TimeSeries now gives the user more complete information on the column’s characteristics:
• Allow nulls in non-primary keys.
• More than one “view” can be created on a TimeSeries.
• Expression pushdown (keyword substitution).

A VTI view can include pre-processing with TimeSeries functions that gives users the exact view they want to see in a relational format while getting the performance of the TimeSeries API functions.
– Examples:
• Creating an hourly moving average.
• Getting the max value per day.


TS VTI Improvements

Make the NOT NULL constraint on base table visible through VTI.

Use only the primary key columns to locate the appropriate row to update:

– Benefit: Allow NULLs in non-primary key columns.
– If the primary key of the base table is altered, any TS VTI tables based on that base table will need to be recreated.
– If there is no primary key, use all non-TS columns.

New TSVTMode flags:
– TS_VTI_UPDATE_NONKEY_NOT_NULLS (32)
• Only not-null updatable columns of the insert row will be added to the set clause.
– TS_VTI_UPDATE_NONKEY_INCLUDE_NULLS (64)
• All the updatable columns of the insert row will be added to the set clause.


VTI Expression Pushdown

Provide pre-processed views of TimeSeries

Example: “Create an hourly moving average and get the max value per day” (Table: smartmeters, column: readings)

EXECUTE PROCEDURE TSCreateExpressionVirtualTab(
    'smartmeters_expr1_vti',
    'smartmeters',
    "AggregateBy('max($value)', 'smartmeter_daily', Apply('TSRunningAvg($energy, 4)', $ts_begin_time, $ts_end_time, readings)::TimeSeries(onereal_row), 0)",
    'onereal_row');


VTI Expression Pushdown: Result

SELECT * FROM smartmeters_expr1_vti
WHERE t >= '2011-01-01 00:00:00.00000'::datetime year to fraction(5)
  AND t <= '2011-01-02 23:59:59.99999'::datetime year to fraction(5);

meter_id  t                          value
met00000  2011-01-01 00:00:00.00000  30.25000000000
met00000  2011-01-02 00:00:00.00000  29.50000000000
met00001  2011-01-01 00:00:00.00000  29.75000000000
met00001  2011-01-02 00:00:00.00000  31.00000000000
. . .


VTI: TSCreateExpressionVirtualTab

More than one “view” can be created on a TimeSeries.

Very similar to TSCreateVirtualTab but a read-only interface.

TSCreateExpressionVirtualTab(
    VirtualTable VARCHAR(128,1),
    BaseTablename VARCHAR(128,1),
    expression LVARCHAR,
    subtype VARCHAR(128,1),
    TSVTMode INTEGER DEFAULT 0,
    TSColName VARCHAR(128,1) DEFAULT NULL);


VTI Expression Pushdown

Keyword substitution:
– $ts_begin_time, $ts_end_time

Used to substitute the time from the runtime query into the expression view.

Example expression:

Apply('TSRunningAvg($energy, 4)', $ts_begin_time, $ts_end_time, readings)::TimeSeries(onereal_row)

Query including:

WHERE t >= '2011-01-01 00:00:00.00000'::datetime year to fraction(5)
  AND t <= '2011-01-02 23:59:59.99999'::datetime year to fraction(5);
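The effect of the substitution can be illustrated with plain string replacement. This Python sketch only shows the idea; how the server actually performs the substitution is not described in the slides, and `substitute_bounds` is a hypothetical helper.

```python
def substitute_bounds(expression, begin, end):
    """Replace the two time keywords with quoted timestamp literals."""
    return (
        expression
        .replace("$ts_begin_time", f"'{begin}'")
        .replace("$ts_end_time", f"'{end}'")
    )


expr = "Apply('TSRunningAvg($energy, 4)', $ts_begin_time, $ts_end_time, readings)"
bound = substitute_bounds(
    expr, "2011-01-01 00:00:00.00000", "2011-01-02 23:59:59.99999"
)
print(bound)
```

Only the two time keywords are replaced; column references such as `$energy` are left for the expression itself to resolve.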


Time Series – All IDS Platform Support

Increased Platform Support:
– Customers now have more choices when it comes to platforms on which to run Informix with TimeSeries:
• Redhat Linux 32 and 64 bit
• Mac OS 10
• Linux on pSeries
• Sol x86_64
– TimeSeries is now built in for all platforms for IDS 11.70xC3:
• https://www-304.ibm.com/support/docview.wss?rs=630&uid=swg27013343&S_CMP=rnav



Time Series – New Standard Calendars

General Usage Improvements:
– Customers don't need to define commonly used calendars. They can now use new, built-in, predefined calendars based on the following intervals of time:

Calendar Interval  Calendar Name
1 minute           ts_1min
15 minutes         ts_15min
30 minutes         ts_30min
1 hour             ts_1hour
1 day              ts_1day
1 week             ts_1week
1 month            ts_1month


Time Series - XML Generation (1)

Customers can now use a standard function format to exchange messages using XML. With this capability, they can easily transform TimeSeries data into XML messages.

Syntax:

CREATE FUNCTION TSToXML(
    doctype lvarchar,
    id lvarchar,
    ts TimeSeries,
    output_max integer DEFAULT 0)
RETURNS lvarchar;


Time Series - XML Generation (2)

doctype: Name of the topmost XML element.
id: Identification for the TimeSeries.
ts: TimeSeries to process.
output_max: Max output size; the default of 0 represents 32KB, with a minimum of 4096 and a maximum of 2GB.

Output includes an additional element: <AllData>
– <AllData>1</AllData>: All data was returned.
– <AllData>0</AllData>: The TS data was truncated to fit the output max.

Still a valid XML document.


XML Generation Example

SELECT TSToXML('meterdata', esi_id,
    Clip(actual,
        '2010-09-08 12:00:00'::datetime year to second,
        '2010-09-08 13:00:00'::datetime year to second))
FROM meters
WHERE esi_id = '22505613346851835';

<meterdata>
    <id>2250561334</id>
    <AllData>1</AllData>
    <meter_data>
        <tstamp>2010-09-08 12:15:00.00000</tstamp>
        <value>0.9170000000</value>
    </meter_data>
    <meter_data>
    . . .
</meterdata>


Time Series (7)

The ‘stores demo’ database has been updated with smart meter information so that customers now have a ready to use environment to experiment with TimeSeries and increase their understanding of the technology.

Multi-Cultural Support:

– Time Series now supports multiple locales.

AggregateBy accumulator too small:
– Accumulators for smaller integer types have been increased for the AVG function to prevent overflow.
– This is a bug fix.


Questions?




Appendix A - Time Series Built In Routines


Appendix – TimeSeries Built In Routines (1 of 9)

Task Type / Description: Routine Name

Get info from a timeseries.
– Get the origin: GetOrigin
– Get the interval: GetInterval
– Get the calendar: GetCalendar
– Get the calendar name: GetCalendarName
– Get the container name: GetContainerName
– Get the user defined metadata: GetMetaData
– Get the metadata type: GetMetaTypeName
– Determine whether a time series is regular: IsRegular
– Get the instance Id if the timeseries is stored in a container: InstanceId

Convert between a timestamp and an offset.
– Return the offset, given the timestamp: GetIndex
– Return the timestamp, given the offset: GetStamp

Count the number of elements.
– Return the number of elements: GetNelems
– Get the number of elements between two timestamps: ClipGetCount


Appendix - Time Series Built in Routines (2 of 9)

Select individual elements.
– Get the element associated with a timestamp: GetElem
– Get the element at or before a timestamp: GetLastValid
– Get the element after a timestamp: GetNextValid
– Get the element before a timestamp: GetPreviousValid
– Get the element at a specified position: GetNthElem
– Get the first element: GetFirstElem
– Get the last element: GetLastElem
– Get the last non null element: GetLastNonNull
– Get the next non null element: GetNextNonNull


Appendix - Time Series Built in Routines (3 of 9)

Modify elements or a set of elements.
– Add or update a single element: PutElem
– Add or update a single element: PutElemNoDups
– Add or update a single element at a given offset: PutNthElem
– Add or update an entire set: PutSet
– Delete an element at a given timepoint: DelElem
– Delete all elements in a specified time range: DelClip
– Delete all elements in a specified time range: DelTrim
– Insert an element: InsElem
– Insert a set: InsSet
– Update an element: UpdElem
– Update a set: UpdSet
– Put every element of one time series into another time series: PutTimeSeries

Modify metadata.
– Update user-defined metadata: UpdMetadata


Appendix - Time Series Built in Routines (4 of 9)

Make elements visible or invisible to a scan.
– Make an element invisible: HideElem
– Make a range of elements invisible: HideRange
– Make an element visible: RevealElem
– Make a range of elements visible: RevealRange

Check for null or hidden elements.
– Determine if an element is hidden: ElemIsHidden
– Determine if an element is null: ElemIsNull

Extract and use part of a time series.
– Extract a period between two timestamps to a set of values and run an expression or function on every entry: Apply
– Extract data between two timepoints: Clip
– Clip a certain number of elements: ClipCount
– Extract a period that includes a given time: WithinC
– Extract a period starting or ending at a given time: WithinR

Apply a new calendar to a time series.
– Apply a calendar: ApplyCalendar


Appendix - Time Series Built in Routines (5 of 9)

Create and load time series.
– Load data from a client file: BulkLoad
– Create a regular empty time series, a regular populated time series, or a regular time series with metadata: TsCreate
– Create an irregular empty time series, an irregular populated time series, or an irregular time series with metadata: TsCreateIrr

Find the intersection or union of time series.
– Build the intersection of multiple time series and optionally clip the result: Intersect
– Build the union of multiple time series and optionally clip the result: Union

Iterator functions.
– Convert time series data to tabular form: Transpose

Aggregate functions.
– Return a list (collection of rows) containing all elements in a time series: TSSetToList
– Return a list of columns in a timeseries: TSColNameToList
– Return a list of columns in a timeseries: TSColNumToList
– Return a list containing the columns of a time series plus non time series columns: TSRowToList, TsRowNameToList, TsRowNumToList


Appendix - Time Series Built in Routines (6 of 9)

Used within the Apply function to perform statistical calculations on a time series.
– Perform a sum over a time series type: Sum
– Sum SMALLFLOAT or DOUBLE PRECISION values: TSAddPrevious
– Compute the decay function: TSDecay
– Compute a running average over a specified number of values: TSRunningAvg
– Compute a running correlation between two time series over a specified number of values: TsRunningCor
– Compute a running median over a specified number of values: TsRunningMed
– Compute a running sum over a specified number of values: TSRunningSum
– Compute a running variance over a specified number of values: TSRunningVar
– Compare SMALLFLOAT or DOUBLE PRECISION values: TSCmp
– Return a previously saved value: TSPrevious


Appendix - Time Series Built in Routines (7 of 9)

Perform an arithmetic operation on one or two time series.
– Add two time series together: Plus
– Subtract one time series from another: Minus
– Multiply one time series by another: Times
– Divide one time series by another: Divide
– Raise the first argument to the power of the second: Pow
– Get the absolute value: Abs
– Exponentiate the time series: Exp
– Get the natural logarithm of a time series: Logn
– Get the modulus or remainder of a division of one time series by another: Mod
– Negate the time series: Negate
– Return the argument (bound to the unary + operator): Positive


Appendix - Time Series Built in Routines (8 of 9)

Perform an arithmetic operation on one or two time series (cont'd).
– Round the time series to the nearest whole number: Round
– Get the square root of the time series: Sqrt
– Get the cosine of the time series: Cos
– Get the sine of the time series: Sin
– Get the tangent of the time series: Tan
– Get the arc cosine of the time series: Acos
– Get the arc sine of the time series: Asin
– Get the arc tangent of the time series: Atan
– Get the arc tangent for two time series: Atan2
– Apply a binary function to a pair of time series, or to a time series and a compatible row type or number: ApplyBinaryTsOp
– Apply a unary function to a time series: ApplyUnaryTsOp
– Apply another function to a set of time series: ApplyOpTsSet

Reset the origin.
– Reset the origin: SetOrigin


Appendix - Time Series Built in Routines (9 of 9)

The following routines are used only with regular time series:
– Lag
– PutNthElem
– TSCreate

The TSCreateIrr function is used only with irregular time series.

Aggregate values in a time series.
– Aggregate values in a time series: AggregateBy
– Aggregate values in a time series over a specified time range: AggregateRange

Create a time series that lags.
– Create a time series that lags the source time series by a given offset: Lag (regular only)

Manage containers.
– Create a container: TsContainerCreate
– Destroy a container: TsContainerDestroy
– Set the container name: SetContainerName


Appendix B – Calendar Pattern Routines


Appendix - Calendar Pattern Routines

Return the intersection of two calendar patterns: AndOp
– Calendar pattern that has every interval on that was on in both calendar patterns; the rest are off. If the given patterns do not have the same interval unit, the pattern with the larger interval unit is expanded to match the other.

Obtain the start date of the pattern for a given calendar: CalPattStartDate
– Returns the start dates of the calendar patterns for each calendar.

Convert a calendar pattern from a smaller interval unit into a larger one: Collapse
– Collapses the given calendar pattern into destination units, which must have a larger interval unit than that of the given calendar pattern, for example converting hours to days.

Convert a calendar pattern from a larger interval unit into a smaller one: Expand
– Converts the given calendar pattern into the destination units, which must have a smaller interval unit than that of the given calendar pattern.

Return an inverted calendar pattern: NotOp
– Turns all on intervals off and all off intervals on in the given calendar pattern.

Return the union of two calendar patterns: OrOp
– Returns a calendar pattern that has every interval on that was on in either of the calendar patterns; the rest are off. If the two patterns have different sizes of interval units, the resultant pattern has the smaller of the two intervals.
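The pattern operations above behave like element-wise boolean logic over on/off flags. The Python sketch below models that idea only; the function names mirror the routine names for readability but are not the Informix routines, and both patterns are assumed to share the same interval unit and length unless `expand` is applied first.

```python
def and_op(a, b):
    """On only where both patterns are on."""
    return [x and y for x, y in zip(a, b)]


def or_op(a, b):
    """On where either pattern is on."""
    return [x or y for x, y in zip(a, b)]


def not_op(a):
    """Flip every interval: on becomes off, off becomes on."""
    return [not x for x in a]


def expand(pattern, factor):
    """Larger unit to smaller unit: each interval becomes `factor` copies."""
    return [flag for flag in pattern for _ in range(factor)]


work_week = [False, True, True, True, True, True, False]  # {1 off, 5 on, 1 off}
weekend = not_op(work_week)
print(and_op(work_week, weekend))
print(or_op(work_week, weekend))
print(expand([True, False], 2))
```

A pattern and its inverse never overlap and together cover every interval, which is a quick sanity check on the three logical operations.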


Appendix – Calendar Routines

AndOp: Return the intersection of two calendars. Returns a calendar that has every interval on that was on in both calendars; the rest are off. The resultant calendar takes the later of the two start dates and the later of the two pattern start dates. If the two calendars have different size interval units, the resultant calendar has the smaller of the two intervals.

CalIndex: Return the number of valid intervals in the named calendar between the two timestamps.

CalRange: Return a set of valid timestamps within a range. There are two syntaxes: the first syntax specifies the range as between two given timestamps. The second syntax specifies the number of valid timestamps to return after a given timestamp.

CalStamp: Return the timestamp at a given number of calendar intervals after a given timestamp.

CalStartDate: Return a DATETIME value containing the start date of a given calendar name.

OrOp: Return a calendar that is the result of two others combined with the OR operator. Returns a calendar that has every interval on that was on in either calendar; the rest are off. The resultant calendar takes the earlier of the two start dates and the earlier of the two pattern start dates. If the two calendars have different sizes of interval units, the resultant calendar has the smaller of the two intervals.
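The two CalRange syntaxes described above can be sketched in Python. This is a rough model under stated assumptions: a calendar is represented as a start date, a repeating on/off pattern and a fixed interval length, and the helper names (`valid_stamps`, `cal_range`, `cal_range_count`) are hypothetical, not Informix routines. Boundary handling (inclusive ranges) is also an assumption.

```python
from datetime import datetime, timedelta
from itertools import islice

def valid_stamps(start, pattern, step):
    """Yield the timestamps of 'on' intervals, repeating the pattern forever.
    The pattern must contain at least one True or this never yields."""
    i = 0
    while True:
        if pattern[i % len(pattern)]:
            yield start + i * step
        i += 1

def cal_range(start, pattern, step, begin, end):
    """First syntax: all valid timestamps between two timestamps (inclusive)."""
    out = []
    for ts in valid_stamps(start, pattern, step):
        if ts > end:
            break
        if ts >= begin:
            out.append(ts)
    return out

def cal_range_count(start, pattern, step, begin, n):
    """Second syntax: the first n valid timestamps at or after `begin`."""
    later = (ts for ts in valid_stamps(start, pattern, step) if ts >= begin)
    return list(islice(later, n))

# A work-week calendar with daily intervals, starting Monday 2010-01-04.
cal_start = datetime(2010, 1, 4)
work_week = [True] * 5 + [False] * 2
one_day = timedelta(days=1)

span = cal_range(cal_start, work_week, one_day,
                 datetime(2010, 1, 8), datetime(2010, 1, 12))
next_two = cal_range_count(cal_start, work_week, one_day,
                           datetime(2010, 1, 9), 2)
```

In this model a CalIndex-style count is simply `len(cal_range(...))`, and a CalStamp-style lookup is the last element of `cal_range_count(...)`.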


Appendix C - Time Series API Information


Appendix - Time Series APIs

Application Programmer Interfaces for a Time Series. Two types:

– Client version – tsfeapi
• Sample Location: $INFORMIXDIR/extend/TimeSeries.4.07.UC7/lib/tsfeapi.a (Unix/Linux)
• Similar directory for Windows.
• Operates on a private copy of timeseries datum.
• Changed data must be written back into the database.
– Function ts_copy is used prior to the execution of a prepared statement.

– Server version – tsbeapi
• Sample Location: $INFORMIXDIR/extend/TimeSeries.4.07.UC7/lib/tsbeapi.a (Unix/Linux)
• Similar directory for Windows.
• Operates directly on timeseries datum.
– Direct execution of prepared/unprepared statements.

Header file – tseries.h
– Must be included whenever using tsbeapi or tsfeapi.


Appendix - Time Series API – Data Structures

ts_timeseries – The header for a time series.

ts_tscan – Allows you to look at no more than two time series elements at a time.

ts_tsdesc – A structure containing a time series and the data structures it works with.

ts_tselem – A pointer to one element of a time series.


Appendix – Time Series API Routines

Task type Description API Routine SQL Routine

Open and close a time series. Open a time series. ts_open()

Close a time series. ts_close()

Return a pointer to the time series associated with the given time series descriptor.

ts_get_ts()

Create and copy a time series. Create a time series. ts_create() TSCreate, TSCreateIrr

Create a time series with metadata.

ts_create_with_metadata()

TSCreate, TSCreateIrr

Copy a time series. ts_copy()

Free all memory associated with a time series created with ts_copy() or ts_create().

ts_free()

Copy all elements of one time series into another.

ts_put_ts() PutTimeSeries


Appendix – Time Series API Routines (cont'd)

Task type Description API Routine SQL Routine

Scan a time series. Scan a time series. ts_begin_scan()

Retrieve the next element from a scan.

ts_next()

End a scan. ts_end_scan()

Find the timestamp of the last element returned by the ts_next().

ts_current_timestamp()

Return the offset for the last element returned by ts_next().

ts_current_offset()

Make elements visible or invisible to a scan.

Make an element invisible.

ts_hide_elem() HideElem

Make an element visible.

ts_reveal_elem() RevealElem




Select individual elements from a time series.

Get the element closest to a given timestamp.

ts_closest_elem()

Get the element associated with a given timestamp.

ts_elem() GetElem

Get the element at a specified position.

ts_nth_elem() GetNthElem

Get the first element. ts_first_elem() GetFirstElem

Get the last element. ts_last_elem() GetLastElem

Find the next element after a given timestamp.

ts_next_valid() GetNextValid

Find the last element before a given timestamp.

ts_previous_valid() GetPreviousValid

Find the last element at or before a given timestamp.

ts_last_valid()




Update a time series. Insert an element. ts_ins_elem() InsElem

Update an element. ts_upd_elem() UpdElem

Delete an element. ts_del_elem() DelElem

Put an element in a place specified by a timestamp.

ts_put_elem(), ts_put_elem_no_dups()

PutElem, PutElemNoDups

Append an element. ts_put_last_elem()(regular only)

Put an element in a place specified by an offset.

ts_put_nth_elem()(regular only)

Modify metadata. Update metadata. ts_update_metadata() UpdMetaData

Convert between an index and a timestamp.

Convert timestamp to index.

ts_index() GetIndex

Convert index to timestamp.

ts_time() GetStamp


Appendix – Time Series API Routines (cont'd)

Task type Description API Routine SQL Routine

Transform an element. Create an element from an array of values and nulls.

ts_make_elem(), ts_make_elem_rowdesc(), ts_make_elem_no_dups()

Convert an MI_ROW value to an element.

ts_row_to_elem()

Convert an element to a MI_ROW value.

ts_elem_to_row()

Free memory from a time series element created by ts_make_elem() or ts_row_to_elem()

ts_free_elem()

Extract column data from an element.

Get a column from an element by name.

ts_get_col_by_name()

Get a column from an element by number.

ts_get_col_by_number()

Pull columns from an element into values and nulls arrays.

ts_get_all_cols()

Create and perform calculations with timestamps.

Compare two timestamps. ts_datetime_cmp()

Get fields from a timestamp. ts_get_stamp_fields()

Create a timestamp. ts_make_stamp()




Create and perform calculations with timestamps (cont'd).

Calculate the number of intervals between two timestamps.

ts_tstamp_difference()

Subtract N intervals from a timestamp.

ts_tstamp_minus()

Add N intervals to a timestamp.

ts_tstamp_plus()

Get information about element data.

Find the number of a column. ts_col_id()

Return the number of columns contained in each element.

ts_get_col_cnt()

Get type information for a column specified by number.

ts_get_colinfo_number()

Get type information for a column specified by name.

ts_get_colinfo_name()

Determine if an element is hidden.

TS_ELEM_HIDDEN()

Determine if an element is NULL.

TS_ELEM_NULL()




Get information about a timeseries.

Get the name of calendar associated with a timeseries.

ts_get_calname() GetCalendarName

Return the number of elements in a timeseries.

ts_nelems() GetNelems

Return the flags associated with the timeseries.

ts_get_flags()

Get the name of the container.

ts_get_containername() GetContainerName

Determine if the timeseries is in a container.

TS_IS_INCONTAINER()

Get the origin of the timeseries.

ts_get_origin() GetOrigin

Get the metadata associated with the timeseries.

ts_get_metadata() GetMetaData

Determine if the timeseries is irregular.

TS_IS_IRREGULAR()




Get information about a calendar.

Return the number of valid intervals between two timestamps.

ts_cal_index() CalIndex

Return all valid timepoints between two timestamps.

ts_cal_range() CalRange

Return a specified number of timestamps starting at a given timestamp.

ts_cal_range_index()

Return the timestamp at a given number of intervals after a given timestamp.

ts_cal_stamp() CalStamp


Appendix D – Oncor POC Win Details and a bit more on TimeSeries


About Oncor and their Solution

Who is Oncor?

Oncor is a regulated electric distribution and transmission business that uses superior asset management skills to provide reliable electricity delivery to consumers. Oncor operates the largest distribution and transmission system in Texas, providing power to 3 million electric delivery points over more than 120,000 miles of distribution and 14,000 miles of transmission lines.

Current Solution:
– Architected by IBM GBS:
• Primary architect Keith Hall.
– A number of different components used including Oracle.
– They currently manage about 1 million smart meters.
– Every meter generates one record every 15 minutes.
– ERCOT (Electric Reliability Council of Texas) requires data be kept for 25 months.

25 months * 1 million meters every 15 minutes = 73 billion records
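The 73 billion figure can be checked with a few lines of Python. This is a back-of-envelope sketch; the average month length of 365.25 / 12 days is an assumption, and `records_stored` is a hypothetical helper name.

```python
# One reading every 15 minutes gives 96 intervals per meter per day.
INTERVALS_PER_DAY = 24 * 60 // 15

# Assumption: an average month of 365.25 / 12 = 30.4375 days.
DAYS_PER_MONTH = 365.25 / 12

def records_stored(meters: int, months: int) -> float:
    """Approximate number of interval records kept for the retention period."""
    return meters * INTERVALS_PER_DAY * DAYS_PER_MONTH * months

total = records_stored(meters=1_000_000, months=25)
print(f"{total / 1e9:.1f} billion records")
```

With 1 million meters and 25 months this comes to roughly 73 billion records, in line with the figure above.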

(from the Oncor website)


Why Did they Contact Us?

Oracle is barely keeping up with the data:
– It is taking Oracle about 7 hours to read in the data for 1 million meters.
– The ERCOT reports are taking between 2 and 7 hours to run.
– Some problems with reading and writing data simultaneously.

Oncor is planning to expand:
– In the next 1 to 2 years they expect to manage 3.5 million meters:
• Their 73 billion records stored becomes 220 billion records stored.
– Even if Oracle scales linearly there will not be enough time in the day to read and process all this data.

Oncor is looking for ways to cut their costs:
– Oracle is currently using about 1.3 TB to store 90 days worth of data.
– They would like to do more with the hardware they have:
• Process the data more quickly to allow other applications to run against the data.


Design Decisions

Why they picked Informix:
– Low cost/Low administration well established.
– High performance.
– Security and High Availability also of interest.

Why they used the native time series support in Informix:
– Provides high level of compression.
– Provides continuous loading of data with minimal impact on concurrent queries.
– Provides very high degree of parallelism on reads and writes.
– Time series tool kit allows custom analytics to be written.
– No other RDBMS has native time series support.


Results of the POC with Informix

Informix takes about 18 minutes to read in the data for 1 million meters:
– Oracle is taking about 7 hours.

Informix takes about 6 minutes to run each ERCOT report and about 25 seconds if the data is already cached:
– Oracle is taking from 2 to 7 hours depending on the report.

Data space used by Informix is about 350GB:
– Oracle is using about 1.3TB.

Results were very linear:
– Better results if you increase CPUs and storage.
– If less performance is acceptable then CPUs and storage can be reduced.


Next Steps

Oncor feels they can save a tremendous amount of money with Informix due to:
– Reduced storage requirements (350GB vs 1.3TB per million meters for 90 days).
– Ability to do more work on less hardware:
• They currently run on Sun hardware.

As of Aug 14, 2010 Oncor has been given a price for an Informix system to handle 3.5 million meters.

Oncor is deciding whether they want to expand Informix to other systems:
– Customer Portal.
– Some additional billing applications currently running on Oracle.
– They are also now considering high availability and whether they should have one or more secondary servers.


POC Details

Simulation:
– 90 days worth of meter data for 1 million meters.
– 200 locations.
– 500 feeders.
– 34 substations.

Hardware used in POC:
– Power7 with 2 sockets each with 8 cores.
– 64 bit SUSE Linux 11.
– 128 GB of memory:
• Memory actually needed = 44GB, although could probably be less.
– 6 disks dedicated to the database, 2 additional for OS and LSE staging:
• Disk space actually used by the database = about 350GB.
– Additional disks for the operating system and staging area for files.

Software:
– Informix Ultimate edition.
– Informix Time Series.


Data Generation

We started with one data file (LSE file) with data for 9960 meters.

We duplicated this file by adding a suffix to the meter id (esi_id) and changing the date.

The end result is 1 million meters for 90 days.

Because we are duplicating data, the aggregates for each meter look the same.
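The duplication step can be sketched as follows. This is an illustration only: the "#n" suffix format is an assumption (modeled on the esi_id that appears in a later query), `duplicate_meters` is a hypothetical helper, and the real process also shifted dates, which is not shown here.

```python
import math

BASE_METERS = 9960          # meters in the original LSE file
TARGET_METERS = 1_000_000   # meters wanted for the simulation

def duplicate_meters(base_ids, copies):
    """Yield new meter ids by appending a numeric suffix to each base id."""
    for n in range(1, copies + 1):
        for esi_id in base_ids:
            yield f"{esi_id}#{n}"

# Number of copies of the base file needed to reach at least 1 million meters.
copies_needed = math.ceil(TARGET_METERS / BASE_METERS)

sample = list(duplicate_meters(["A", "B"], 2))
```

Because every copy carries the same readings, per-meter aggregates are identical across copies, which matters when interpreting the query results.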


Informix Time Series Schema

The Meter table looks like this:

create table meters (
    esi_id      char(64) not null primary key,
    suffix      char(32),
    location    char(16),
    feeder      char(16),
    sub_station char(16),
    dbspace     varchar(128),
    container   varchar(128),
    actual      timeseries(meter_data),
    estimated   timeseries(meter_data),
    valid       timeseries(update_day)
)

A Meter reading looks like this:

create row type meter_data (
    tstamp datetime year to fraction(5),
    value  decimal(14,3)
);

An update (correction) record looks like:

create row type update_day (
    tstamp      datetime year to fraction(5),
    last_update datetime year to fraction(5)
);

Hierarchy is sub_station->feeder->meter. There are also tables for location, sub_station and feeder not shown above.


Primary Use Cases

Load 90 days worth of data for 1 million meters from LSE files:
– Original set of LSE files massaged to generate 1 million distinct meters.

6-day ERCOT Settlement Extract:
– Show support for the ERCOT settlement processes by creating an LSE file consisting of every record (every meter) for operating day - 6 (the calendar day that occurred 6 days prior to the current day). Must be able to extract and create the LSE files for 1M meters for a specific day.

22-Day Update ERCOT Settlement Extract:
– Show support for the ERCOT settlement processes by creating LSE files consisting of every record that has had a consumption interval record update since the prior extract / pull (6-Day). Only extract the last or most current update for each meter, so if a meter has been updated four times, only the last / current record is sent. All 96 15-minute intervals are sent each time as well.

Missing Record ERCOT Settlement Extract:
– Show support for the ERCOT settlement processes by creating an LSE file consisting of only the meter IDs and dates that are provided in a missing meter ID file from ERCOT. The dates will be as far back as 90 days and no more recent than 28 days back in time.


Other Use Cases

Determine the count and the list of meter IDs for all meters with missing intervals and / or register reads on a given day.

Determine the 90 day history for a given meter (90 record pull for same meter).

Determine the count and list of meter IDs that exceeded a given high interval value for a given day or given time period (multiple days). For example, count and list of meters that had an interval value of 12 or higher for a given period of time.

Determine the list of meters that have 5 or more consecutive days with estimated values only (no actual interval reads during a period of 5 days or more).


Results: Load 90 days worth of data for 1 million meters from LSE files

Query syntax:
– Custom loader process to read LSE data.
– Uses Smart Meter Loader UDR for high speed loading.

Performance:
– 6 Parallel streams - 18 minutes to load 1 million meters for 1 day.

Scalability:
– Depends on parallelism of disk write performance.
– Increase disks and CPUs for higher speed/more data.


Results: 6-day ERCOT Settlement Extract

Query syntax:
– The LSE file is generated from the base query:

SELECT esi_id,
       daily_output (actual, estimated, $db_start_date)
FROM meters;

– daily_output() is a custom UDR that merges the actual & estimated interval data into a single day.

Performance:
– 6 Streams - 6 minutes 30 seconds to extract 1 day (1 million meters).

Scalability:
– Depends on parallelism of disk read performance.
– Increase disks and CPUs for higher speed/more data.


Results: 22-Day Update ERCOT Settlement Extract

Query syntax:
– Uses standard Timeseries functions and a stored procedure (SPL) routine:

SELECT apply ('"'||trim(esi_id)||'", ($last_update)',
              'was_updated($last_update,$tstamp)',
              '${day} 00:00:00.00000'::datetime year to fraction(5),
              '${day} 00:00:00.00000'::datetime year to fraction(5) + ${period} units day,
              m.valid)::timeseries(esi_one_day)
FROM meters m
WHERE dbspace = 'cont${c}'
  AND getnelems (apply ('$last_update',
              'was_updated($last_update,$tstamp)',
              '${day} 00:00:00.00000'::datetime year to fraction(5),
              '${day} 00:00:00.00000'::datetime year to fraction(5) + ${period} units day,
              m.valid)::timeseries(one_day)) > 0

Performance:
– 22 day - 6 Streams 4 minutes
– 90 day - 6 Streams 11 minutes

Scalability:
– Depends on period and number of meters.
– Increase in speed possible with custom UDR.


Results: Missing Record ERCOT Settlement Extract

Query syntax:
– Joins meter+date info to LSE extract query.

Performance:
– 4000 readings for a particular day in 6 seconds – when data is cached.
– 4000 readings in 24 seconds for 90 days – when data is not cached.
– Depends on number of meters+days and whether data is in memory already.
– Directly proportional to settlement extract.

Scalability:
– Depends on parallelism of disk read performance.
– Increase disks and CPUs for higher speed/more data.


Results: Missing Intervals and / or Register Reads on a Given Day

Query syntax:
– Uses a custom UDR to join actual and estimated readings.

SELECT esi_id
FROM meters
WHERE ClipGetCount(tagged_reading (actual, estimated,
                   '$day 00:00'::datetime year to minute,
                   '$day 23:45'::datetime year to minute)::timeseries(tagged_reading)) < 96
  AND dbspace = 'cont${c}'

Performance:
– 6 Streams - 6 minutes 10 seconds all meters, 1 day

Scalability:
– Depends on parallelism of disk read performance.
– Increase disks and CPUs for higher speed/more data.
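The "< 96" test in the query reflects the 96 fifteen-minute intervals expected per meter per day. The same check can be sketched in Python, assuming readings are held in a dictionary keyed by meter id; the data layout and helper names are hypothetical and only mirror the logic, not the UDR itself.

```python
from datetime import datetime, timedelta

INTERVALS_PER_DAY = 96            # 24 hours * 4 readings per hour
STEP = timedelta(minutes=15)

def day_slots(day):
    """All expected reading timestamps for one day, 00:00 through 23:45."""
    return [day + i * STEP for i in range(INTERVALS_PER_DAY)]

def meters_missing_intervals(readings, day):
    """Return the ids of meters with fewer than 96 readings on `day`."""
    slots = set(day_slots(day))
    missing = []
    for esi_id, stamps in readings.items():
        if len(slots & set(stamps)) < INTERVALS_PER_DAY:
            missing.append(esi_id)
    return sorted(missing)

day = datetime(2010, 1, 1)
readings = {
    "meter-a": day_slots(day),        # complete day
    "meter-b": day_slots(day)[:-4],   # last hour missing
}
incomplete = meters_missing_intervals(readings, day)
```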


Results: Determine the 90 Day History for a Given Meter

Query:
– Uses standard Timeseries functions and a user defined aggregate row type:

SELECT AggregateBy('sum($value), min($value), max($value), avg($value)',
                   'cal1day', actual, 0,
                   '2010-01-01 00:00:00.00000'::datetime year to fraction(5),
                   '2010-03-31 23:45:00.00000'::datetime year to fraction(5))
       ::timeseries(agg_stats1)
FROM meters
WHERE esi_id = '80937225860964649#1'

Performance:
– Single query - 0.04 seconds for 1 meter with 90 day aggregation.

Scalability:
– Not investigated, but will depend on whether time series is already in memory.
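What the daily aggregation produces can be sketched in Python: one (sum, min, max, avg) row per calendar day. This is a conceptual model only; `aggregate_by_day` is a hypothetical helper and does not reproduce AggregateBy's calendar handling.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate_by_day(readings):
    """Group (timestamp, value) pairs by calendar day and return
    {day: (sum, min, max, avg)} in date order."""
    buckets = defaultdict(list)
    for stamp, value in readings:
        buckets[stamp.date()].append(value)
    return {
        day: (sum(vals), min(vals), max(vals), sum(vals) / len(vals))
        for day, vals in sorted(buckets.items())
    }

step = timedelta(minutes=15)
first_day = datetime(2010, 1, 1)
readings = [(first_day + i * step, 1.0) for i in range(96)]
readings += [(datetime(2010, 1, 2, 0, 0), 4.0),
             (datetime(2010, 1, 2, 0, 15), 2.0)]

daily = aggregate_by_day(readings)
```

For a 90 day range this yields at most 90 rows per meter.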


Results: Determine the count and list of meter IDs that exceeded a given high interval value for a given day

Query:
– Uses standard Timeseries functions:

SELECT Apply ('("'||trim(esi_id)||'") , $value',
              '(($value) > ${limit})',
              '${day} 00:00:00.00000'::datetime year to fraction(5),
              '${day} 23:45:00.00000'::datetime year to fraction(5) + ${period} units day,
              actual)::timeseries(esi_reading)
FROM meters
WHERE getnelems (apply ('$value',
              '$value > ${limit}',
              '${day} 00:00:00.00000'::datetime year to fraction(5),
              '${day} 23:45:00.00000'::datetime year to fraction(5) + ${period} units day,
              actual)::timeseries(meter_data)) > 1
  AND dbspace = 'cont${c}'

Performance:
– 6 Streams - 5 minutes 40 seconds all meters for 1 day.

Scalability:
– Depends on number of days to be assessed.
– Depends on parallelism of disk read performance.
– Increase disks and CPUs for higher speed/more data.


Results: Determine list of meters that have 5 or more consecutive days with estimated values only

Query:
– Joins to a list of required dates.
– Filter function is faster but less selective.
– Uses standard Timeseries functions and a stored procedure routine.

SELECT esi_id, start_date,
       ClipGetCount (Apply ('non_null_value ($0.value, $1.value)',
                            start_date,
                            extend ((start_date + $days units day), year to minute) - 15 units minute,
                            estimated, actual)::timeseries(single_reading))
FROM meters, date_list
WHERE dbspace = 'cont${c}'
  AND ClipGetCount (estimated, start_date,
                    extend ((start_date + $days units day), year to minute) - 15 units minute) = 480
  AND start_date BETWEEN '$start' AND '$end'

Performance:
– 6 Streams - 17 minutes for all meters in 5 out of 30 days.

Scalability:
– Depends on start/end number of days to be scanned.
– Depends on parallelism of disk read performance.
– Increase disks and CPUs for higher speed/more data.
– Custom UDR would be faster.
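The constant 480 in the filter is 5 days times 96 intervals per day. The underlying test, a window fully covered by estimated readings with no actual readings, can be sketched in Python. The sets of timestamps and the `estimated_only` helper are assumptions for illustration; the real query works on the time series columns directly.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)
INTERVALS_PER_DAY = 96

def window_slots(start, days):
    """Expected 15-minute timestamps for `days` consecutive days."""
    return [start + i * STEP for i in range(days * INTERVALS_PER_DAY)]

def estimated_only(actual, estimated, start, days=5):
    """True if every slot in the window has an estimated reading
    and no slot has an actual reading."""
    slots = window_slots(start, days)
    fully_estimated = all(t in estimated for t in slots)
    no_actuals = not any(t in actual for t in slots)
    return fully_estimated and no_actuals

start = datetime(2010, 1, 1)
estimated = set(window_slots(start, 5))
all_estimated = estimated_only(set(), estimated, start)
one_actual = estimated_only({start + STEP}, estimated, start)
```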


Some Notes

Oncor is very enthusiastic about Informix:
– They feel having time series support built into an RDBMS is a perfect fit for smart meter applications.
– They have mentioned they would like to write a white paper with us.
– The GBS team feels Oncor would be very interested in promoting this solution at IOD, for instance in one of the keynotes.


What is TimeSeries Data?

Time series data is:
– A set of data where each item is time-stamped:
• Think of an array where each element is indexed by time.

“Give me the Jan 1st element from time series X”

Most useful when a range of data is normally read:

“Give me the Jan 1st thru Jan 10th elements from time series X”

Access to one time series is usually completed before moving to the next time series.
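The "array indexed by time" idea can be sketched with a sorted list of timestamps and binary search. This is a conceptual model only; `element_at` and `elements_between` are hypothetical helpers, not part of the TimeSeries API.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

def element_at(stamps, values, when):
    """Return the value stored at exactly `when`, or None if absent."""
    i = bisect_left(stamps, when)
    if i < len(stamps) and stamps[i] == when:
        return values[i]
    return None

def elements_between(stamps, values, first, last):
    """Return the values with first <= timestamp <= last as one slice."""
    lo = bisect_left(stamps, first)
    hi = bisect_right(stamps, last)
    return values[lo:hi]

# Time series "X": one value per day for the first 15 days of January 2010.
stamps = [date(2010, 1, 1) + timedelta(days=i) for i in range(15)]
values = [float(i) for i in range(15)]

jan_1 = element_at(stamps, values, date(2010, 1, 1))
jan_1_to_10 = elements_between(stamps, values, date(2010, 1, 1), date(2010, 1, 10))
```

The range read returns one contiguous slice, which is why reading a range is the case where this layout pays off most.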


Key Strengths of Informix TimeSeries

Performance:
– Extremely fast data access:
• Data clustered on disk to reduce I/O.
– Handles operations hard or impossible to do in standard SQL.

Space Savings:
– Can be over 50% space savings over standard relational layout.

Toolkit approach allows users to develop their own algorithms:
– Algorithms run in the database to leverage buffer pool for speed.

Conceptually closer to how users think of time series.


Relational Schema for Smart Meters Data

Table: Smart_Meters (primary key: mtr_id, date)

mtr_id  date  Col1     Col2     ...  ColN
1       Mon   Value 1  Value 2  ...  Value N
1       Tue   Value 1  Value 2  ...  Value N
1       Wed   Value 1  Value 2  ...  Value N
...     ...   ...      ...      ...  ...
13      Mon   Value 1  Value 2  ...  Value N
13      Tue   Value 1  Value 2  ...  Value N
13      Wed   Value 1  Value 2  ...  Value N
...     ...   ...      ...      ...  ...


Same Table using an Informix TimeSeries Schema

Table: Smart_Meters

mtr_id (int)  Series timeseries(mtr_data)
1             [(Mon, v1, ...)(Tue, v1…)]
2             [(Mon, v1, ...)(Tue, v1…)]
3             [(Mon, v1, ...)(Tue, v1…)]
4             [(Mon, v1, ...)(Tue, v1…)]
…             …


TimeSeries Data is stored outside the Table in “Containers”

[Diagram: the Smart_Meters table (columns mtr_id and Series, mtr_id 1 through 8) with its time series data held in Container “A”, Container “B” and Container “C”.]


What a Container Looks Like

Data Pages: MTR1 | MTR1 | MTR2 | MTR3 | MTR4 | MTR5
– Each data page holds sorted data for exactly one time series.

BTREE:
– Each time series has a unique ID. This plus the time stamp is used to search the btree.
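A lookup keyed on (time series ID, timestamp) over pages that each hold one series can be sketched in Python. This is a simplified model, not the actual container format: a sorted list with binary search stands in for the btree, one index entry per page is assumed, and timestamps are plain integers.

```python
from bisect import bisect_right

# Each page holds sorted (timestamp, value) rows for exactly one series.
# Pages are ordered by (series id, first timestamp on the page).
pages = [
    (1, [(10, "a"), (20, "b")]),
    (1, [(30, "c")]),
    (2, [(10, "d")]),
]

# Assumed sparse index: one key per page, the earliest row on that page.
index = [(series_id, rows[0][0]) for series_id, rows in pages]

def find(series_id, stamp):
    """Locate the page via the index, then scan that page for the row."""
    pos = bisect_right(index, (series_id, stamp)) - 1
    if pos < 0 or pages[pos][0] != series_id:
        return None
    for row_stamp, value in pages[pos][1]:
        if row_stamp == stamp:
            return value
    return None
```

Here `find(1, 20)` lands on the first MTR1 page and `find(1, 30)` on the second; a key that sorts before a series' first page falls onto another series' page and is rejected.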


TimeSeries Space Savings Example

[Chart labels: Timeseries 30 Mb; Data 35 Mb]

Time series takes much less space than traditional relational storage:
– For instance, for 30MB of time series storage, relational will take either:
• 30MB of data + 28MB of index on stock and timestamp = 58MB on disk.
• 30MB of data + 43MB of index on all columns = 73MB on disk.
– In either case time series is a huge savings.

The reason for this is that:
– The time series does not require a separate index.
– Only the earliest item on each time series page is indexed.
– Missing data takes no space.

Savings could be more if there are many NULL values.
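The savings implied by these figures can be worked out directly. A short Python sketch, using only the numbers quoted above; `savings` is a hypothetical helper name.

```python
def savings(timeseries_mb, relational_data_mb, relational_index_mb):
    """Fraction of disk space saved by the time series layout."""
    relational_total = relational_data_mb + relational_index_mb
    return 1 - timeseries_mb / relational_total

# Index on stock and timestamp: 30MB + 28MB = 58MB relational.
key_index = savings(30, 30, 28)

# Index on all columns: 30MB + 43MB = 73MB relational.
full_index = savings(30, 30, 43)

print(f"{key_index:.0%} and {full_index:.0%} saved")
```

That is roughly 48% saved against the key-only index and roughly 59% against the all-column index.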


TimeSeries Performance

Performance:
– Slower for singleton selects.
– Faster accessing sets of data.
– Much faster combining timeseries.

[Table: relative timings for Timeseries, Traditional (1) and Traditional (2) across four operations: Singleton, 1,000 consec., Aggregate 80,000, and f(t1, t2) on 1,000 consec. Values as extracted, cell positions not recoverable: 1000, 3, 1, 400, 2, 1, 2, .5, 1, 1, 650, 5.]

– Traditional (2) = index on key columns
– Traditional (1) = index on key and all columns

(values in table are relative to each other, not in seconds)


Virtual Table Interface Makes Time Series Data Appear Relational

TimeSeries Table: Smart_meter

mtr_id  Series
1       [(Mon, v1, ...)(Tue, v1…)]
2       [(Mon, v1, ...)(Tue, v1…)]
3       [(Mon, v1, ...)(Tue, v1…)]
4       [(Mon, v1, ...)(Tue, v1…)]
5       [(Mon, v1, ...)(Tue, v1…)]
6       [(Mon, v1, ...)(Tue, v1…)]
7       [(Mon, v1, ...)(Tue, v1…)]
8       [(Mon, v1, ...)(Tue, v1…)]

TimeSeries Virtual Table: SM_vt

mtr_id  date  col_1    col_2
1       Mon   Value 1  Value 2
1       Tue   Value 1  Value 2
1       Wed   Value 1  Value 2
...     ...   ...      ...
3       Mon   Value 1  Value 2
3       Tue   Value 1  Value 2
3       Wed   Value 1  Value 2
...     ...   ...      ...

Execute procedure tscreatevirtualtable('SM_vt', 'Smart_meter');
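The effect of the virtual table, one relational row per time series element, can be sketched with a Python generator. This models only the shape of the result; `virtual_rows` is a hypothetical helper and says nothing about how the Virtual Table Interface is implemented.

```python
def virtual_rows(table):
    """Flatten (mtr_id, series) rows into (mtr_id, date, col_1, col_2, ...) rows."""
    for mtr_id, series in table:
        for stamp, *cols in series:
            yield (mtr_id, stamp, *cols)

# One row per meter, each holding its whole series.
smart_meter = [
    (1, [("Mon", 1.0, 2.0), ("Tue", 1.5, 2.5)]),
    (3, [("Mon", 0.5, 0.7)]),
]

sm_vt = list(virtual_rows(smart_meter))
```

The base table keeps one row per meter; only the view of it is expanded.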
