kdb

Contents

Articles
• Startingkdbplus/introduction
• Startingkdbplus/qlanguage
• Startingkdbplus/ipc
• Startingkdbplus/tables
• Startingkdbplus/hdb
• Startingkdbplus/rdb

References
• Article Sources and Contributors
• Image Sources, Licenses and Contributors

Article Licenses
• License




Startingkdbplus/introduction

1.1 Overview

This is a quick start guide to kdb+ from Kx Systems, aimed primarily at those learning independently. It covers system installation, the q environment, q IPC, kdb+ tables and typical databases, and where to find more material. After completing it, you should be able to follow Jeff Borror's tutorials Q for Mortals and Kdb+ For Mortals, and the wiki Reference and Cookbook pages.

One caution: you can learn kdb+ reasonably well by independent study, but for a serious evaluation of the product you need the help of a kdb+ consultant. This is because kdb+ is typically used for very demanding applications that require experience to set up properly. Contact Kx Systems or one of its partners for help with such evaluations.

1.2 Kdb+

The kdb+ system is both a database and a programming language:

• kdb+: the database (k database plus)
• q: the programming language for working with kdb+

Both kdb+ and q are written in the k programming language. You do not need to know k to work with kdb+, but you will occasionally see references to it. For example, q is defined in the distributed script q.k.

1.3 Resources

Kx wiki

The Kx wiki is the best resource for learning kdb+, and includes:
• Jeff Borror's tutorials Q for Mortals and Kdb+ For Mortals
• a cookbook of common tasks
• a reference on the built-in functions
• an svn repository with user- and Kx-contributed code

Kx HTML Pages

Some older, but still useful, HTML pages are at kx.com/documentation.php [1]. See in particular Dennis Shasha's Kdb+ Database and Language Primer [2].

Other Web Pages
• the Knowledge_Base Kdb [3] page has a good overview

Discussion groups
• the main discussion forum is the k4 listbox [4]. It is available only to licensed customers; please use a work email address to apply for access.
• the Kdb+ Personal Developers [5] forum is an open Google discussion group for users of the trial system.


Additional Files

The kx.com/q [6] directory has various supporting files, for example the script sp.q referenced in this guide (which is also included with the trial system). These files are also copied to the svn repository, so for example, the sp.q script can also be found at kdb+.

1.4 Install Trial System

If you do not already have access to a licensed copy of kdb+, get the 32-bit trial version from kx.com/Developers [7]. It is limited to a 32-bit address space, has a 2-hour timeout, and expires every 90 days. Otherwise, it is a complete system and ideal for learning kdb+.

When unzipping the install, take care to retain the folder structure.

1.5 Directory Layout

Kdb+ can be installed anywhere, but typically into a q subdirectory. For example, in Linux/Mac:

~/q / main q directory (under home)

~/q/l32 / location of l32 executable

in Windows:

c:\q / main q directory

c:\q\w32 / location of w32 executable

If you need to install q elsewhere, set the environment variable QHOME to point to the new directory. If QHOME is not defined, kdb+ defaults to $HOME/q on Unix-based systems, and c:\q on Windows.

To run the system, see the instructions in the next section, 2. Q Language.

1.6 Example Files

Two sets of scripts are referenced in this guide:

1. The trial system is distributed with the following example scripts in the main directory:
• sp.q - the Suppliers and Parts sample database
• trade.q - a stock trades sample database
If you do not have these scripts, get them from kx.com/q [6] and save them in your q directory.

2. Other examples are in the svn start directory. To install them, download start.zip and unzip it in the q directory, creating the directory q/start.


1.7 GUI

kdb+ has only a console interface, but there are some GUIs:
• the most popular is Charlie Skelton's Studio for kdb+, a cross-platform execution environment. It is worth having this available even if you use another interface.
• First Derivatives [8] offer their clients the qIDE development system
• Q and K Development Tools [9] has an Eclipse plugin
• Q Insight Pad [10] is an IDE for Windows
• Qconsole is an IDE using GTK


References
[1] http://kx.com/documentation.php
[2] http://www.kx.com/q/d/primer.htm
[3] http://www.thalesians.com/finance/index.php/Knowledge_Base/Databases/Kdb
[4] http://www.listbox.com/subscribe/?listname=k4
[5] http://groups.google.com/group/personal-kdbplus
[6] http://www.kx.com/q
[7] http://kx.com/Developers/software.php
[8] http://www.firstderivatives.com
[9] http://www.qkdt.org
[10] http://www.qinsightpad.com

Startingkdbplus/qlanguage

2.1 Overview

Q is the programming system for working with kdb+. It corresponds to SQL for traditional databases but, unlike SQL, q is a powerful programming language in its own right.

Q is an interpreted language. Q expressions can be entered and executed in the q console, or loaded from a q script, which is a text file with extension .q.

You need at least some familiarity with q to use kdb+. Try following the examples here in the q console interface. Also, ensure that you have the example files installed.

The following wiki pages will also be useful:
• Function Reference
• Data Types
• System Commands
• Command Line Parameters


2.2 Loading q

You load q by changing to the main q directory, then running the q executable. Do not just click the q executable in the file explorer: this will load q, but in the wrong directory.

It is best to create a startup script, which can also do other preprocessing such as setting environment variables; see the examples q.sh and q.bat in the start [1] directory.

• in Windows, enter in a command window:

c:

cd \q

w32\q.exe

or create a batch file with the contents below, which allows parameters to be passed to q:

c:

cd \q

w32\q.exe %*

• in Linux/Mac, it is usual to call the q executable under rlwrap to support line recall and editing. In the console:

..$ cd ~/q

~/q$ rlwrap l32/q

The following Linux shell script changes to the q directory, selects the appropriate directory for 32- or 64-bit, then loads q under rlwrap:

#!/bin/bash

cd ~/q

if [ "$(uname -m)" == "x86_64" ]; then p=l64; else p=l32; fi

rlwrap $p/q "$@"

2.3 First Steps

Once q is loaded, you can enter expressions for execution:

q)2 + 3

5

q)2 + 3 4 7

5 6 9

You can confirm that you are in the main q directory by calling a directory list command, e.g.
• in Windows:

q)\dir *.q

"sp.q"

"trade.q"

...

• in Linux/Mac:

q)\ls *.q

"sp.q"

"trade.q"


...

Command line parameters are described on the Command Line Parameters wiki page. For example:

..$ q profile.q -p 5001

• loads the script profile.q at startup. This can in turn load other scripts.
• sets the listening port to 5001

At any prompt, enter \\ to exit q.

2.4 Console Modes

The usual prompt is q). Sometimes a different prompt is shown; you need to understand why, and how to return to the standard prompt.

1. If a function is suspended, the prompt has two or more ")". In this case, enter a single \ to reduce one level of suspension, and repeat until the prompt becomes q). For example:

q)f:{2+x} / define function f

q)f `sym / function call fails with symbol argument

{2+x} / and is left suspended

'type

+

2

`sym

q))\ / prompt becomes q)). Enter \ to return to usual prompt

q)

2. If there is no suspension, a single \ toggles between q and k modes:

q)count each (1 2;"abc") / q expression for length of each list item

2 3

q)\ / toggle to k mode

#:'(1 2;"abc") / equivalent k expression

2 3

\ / toggle back to q mode

q)

3. If you change namespace, the prompt includes the namespace (see namespace directory). For example:

q)\d .h / change to h namespace

q.h)\d . / change back to root namespace

q)


2.5 Error Messages

Error messages are terse. The format is a single quote, followed by the error text:

q)1 2 + 10 20 30 / cannot add 2 numbers to 3 numbers

'length

q)2 + "hello" / cannot add number to character

'type
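Because errors are signalled rather than returned, it can help to trap them while experimenting. The sketch below uses protected evaluation: @[f;x;h] applies the unary function f to x and, if an error is signalled, calls h with the error text instead; .[f;args;h] does the same for a function of several arguments. The names r1 to r3 and the handler text are illustrative only.

```q
/ @[f;x;h] applies the unary function f to x; if f signals an error,
/ the handler h is called with the error text and its result is returned
r1:@[{2+x};`sym;{"caught: ",x}]          / "caught: type"
/ .[f;args;h] does the same for a function taking several arguments
r2:.[+;(1 2;10 20 30);{"caught: ",x}]    / "caught: length"
/ when nothing fails, the handler is not used and f's result comes back
r3:@[{2+x};3;{"caught: ",x}]             / 5
```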

2.6 Introductory Examples

To gain experience with the language, enter the following examples and explain the results. Also experiment with similar expressions. The / marks a comment, which should not be entered.

q)x:2 5 4 7 5

q)x

2 5 4 7 5

q)count x

5

q)8 # x

2 5 4 7 5 2 5 4

q)2 3 # x

2 5 4

7 5 2

q)sum x

23

q)sums x

2 7 11 18 23

q)distinct x

2 5 4 7

q)reverse x

5 7 4 5 2

q)x within 4 10

01111b

q)x where x within 4 10

5 4 7 5

q)y:(x;"abc") / list of lists

q)y

2 5 4 7 5

"abc"

q)count y

2

q)count each y

5 3

The following has a function definition, where x represents the argument:

q)f:{2 + 3 * x}

q)f 5

17


q)f til 5

2 5 8 11 14

Q makes essential use of a symbol datatype:

q)a:`toronto / symbol

q)b:"toronto" / character string

q)count a

1

q)count b

7

q)a="o"

'type

q)b="o"

0101001b

q)a~b / a is not the same as b

0b

q)a~`$b / `$b converts b to symbol

1b
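Since each symbol is an atom, a list of symbols behaves like any other list, and converting between symbols and strings is short. A small sketch; the list of city names is made up for illustration.

```q
c:`toronto`paris`rome        / a list of symbols; each item is an atom
m:c=`paris                   / 010b: a symbol compares as a single value
s:string c                   / ("toronto";"paris";"rome"): symbols to strings
n:count `$"new york"         / 1: a string, even one with a space, casts to one symbol
```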

2.7 Data Structures

The basic q data structures are atoms (singletons) and lists. Other data structures, such as dictionaries and tables, are built from lists. For example, a simple table is just a list of column names associated with a list of corresponding column values, each of which is a list.

q)item:`nut / atom (singleton)

q)items:`nut`bolt`cam`cog / list

q)sales: 6 8 0 3 / list

q)prices: 10 20 15 20 / list

q)(items;sales;prices) / list of lists

nut bolt cam cog

6 8 0 3

10 20 15 20

q)dict:`items`sales`prices!(items;sales;prices) / dictionary

q)dict

items | nut bolt cam cog

sales | 6 8 0 3

prices| 10 20 15 20

q)tab:flip dict / table

q)tab

items sales prices

------------------

nut 6 10

bolt 8 20


cam 0 15

cog 3 20

q)1!tab / keyed table

items| sales prices

-----| ------------

nut | 6 10

bolt | 8 20

cam | 0 15

cog | 3 20
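Because dictionaries and tables are built from lists, they can be indexed much like lists. A short sketch that repeats the definitions above so it runs on its own; the result names (d1, t1, r, v, k) are illustrative.

```q
/ definitions repeated from above so the sketch stands alone
items:`nut`bolt`cam`cog; sales:6 8 0 3; prices:10 20 15 20
dict:`items`sales`prices!(items;sales;prices)
tab:flip dict
d1:dict`sales         / 6 8 0 3: a dictionary is indexed by key
t1:tab`prices         / 10 20 15 20: a column is indexed by name
r:tab 1               / `items`sales`prices!(`bolt;8;20): a row is a dictionary
v:tab[2;`prices]      / 15: row 2 (counting from 0), column prices
k:(1!tab)`cog         / `sales`prices!3 20: a keyed table is indexed by its key
```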

The table created above is an ordinary variable in the q workspace, and could be written to disk. In general, you create kdb+ tables in memory and then write them to disk.

Since it is a table, you can use SQL-like query expressions on it:

q)select from tab where prices < 20

items sales prices

------------------

nut 6 10

cam 0 15

2.8 Functions, Verbs, Adverbs

Functions take arguments on their right. Verbs take arguments on their left and right, as in * (multiplication). Adverbs take function or verb arguments (on their left) and produce derived functions or verbs. In practice, the term function is used for both functions and verbs, except where the distinction is relevant. For example:

q)sales * prices / verb: *

60 160 0 60

q)sum sales * prices / function: sum

280

q)sumamt:{sum x*y} / define function: sumamt

q)sumamt[sales;prices]

280

q)(sum sales*prices) % sum sales / calculate weighted average

16.47059

q)sales wavg prices / built-in verb: wavg

16.47059

q)sales , prices / verb: , join lists

6 8 0 3 10 20 15 20

q)sales ,' prices / adverb: ' join lists in pairs

6 10

8 20

0 15

3 20
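Besides ' (each), the adverbs / (over) and \ (scan) are used constantly. A brief sketch reusing the sales and prices lists above; the results in the comments follow from those values.

```q
sales:6 8 0 3; prices:10 20 15 20
a:(+/) sales              / 17: over (/) folds + through the list, like sum
b:(+\) sales              / 6 14 14 17: scan (\) keeps each partial result, like sums
c:sales {x+y}' prices     / 16 28 15 23: each (') applies a lambda to pairs of items
e:{x*y}/[1 2 3 4]         / 24: over works with any two-argument function
```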

Functions can apply to dictionaries and tables:


q)-2 # tab

items sales prices

------------------

cam 0 15

cog 3 20

Functions can be used within queries:

q)select items,sales,prices,amount:sales*prices from tab

items sales prices amount

-------------------------

nut 6 10 60

bolt 8 20 160

cam 0 15 0

cog 3 20 60
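Two close relatives of select are update, which returns the table with columns added or changed, and exec, which returns plain values rather than a table. A sketch using the same tab; the names t2, tot and big are illustrative.

```q
tab:([]items:`nut`bolt`cam`cog;sales:6 8 0 3;prices:10 20 15 20)
t2:update amount:sales*prices from tab     / tab with an added column amount
tot:exec sum sales*prices from tab         / 280: exec returns a value, not a table
big:exec items from tab where prices>=20   / `bolt`cog
```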

2.9 Scripts

A q script is a plain text file with extension .q, which contains q expressions that are executed when it is loaded. For example, load the script sp.q and display the "s" table that it defines:

q)\l sp.q / load script

q)s / display table s

s | name status city

--| -------------------

s1| smith 20 london

s2| jones 10 paris

s3| blake 30 paris

s4| clark 20 london

s5| adams 30 athens

Within a script, a line that contains a single / starts a comment block. A line with a single \ ends the comment block or, if there is none, exits the script.

A script can contain multi-line definitions. Any line that is indented is assumed to be a continuation of the previous line. Blank lines, superfluous blanks, and comment lines (those beginning with /) are ignored in determining this. For example, if a script has contents:

a:1 2
/ this is a comment line
  3
 + 4
b:"abc"

Then loading this script would define a and b as:

q)a

5 6 7 / i.e. 1 2 3 + 4


q)b

"abc"

2.10 Q Queries

Q queries are similar to SQL, though often much simpler:

q)\l sp.q

q)select from p where weight=17

p | name color weight city

--| ------------------------

p2| bolt green 17 paris

p3| screw blue 17 rome

SQL statements can be entered if prefixed with s):

q)s)select * from p where color in (red,green) / SQL query

p | name color weight city

--| -------------------------

p1| nut red 12 london

p2| bolt green 17 paris

p4| screw red 14 london

p6| cog red 19 london

The Q equivalent would be:

q)select from p where color in `red`green

Similarly, compare:

q)select distinct p,s.city from sp

s)select distinct sp.p,s.city from sp,s where sp.s=s.s

and

q)select from sp where s.city=p.city

s)select sp.s,sp.p,sp.qty from s,p,sp where sp.s=s.s and sp.p=p.p and p.city=s.city

Note that the dot notation in q automatically references the appropriate table.

Q results can have lists in the rows:

q)select qty by s from sp

s | qty

--| -----------------------

s1| 300 200 400 200 100 400

s2| 300 400

s3| ,200

s4| 100 200 300

ungroup will flatten the result:


q)ungroup select qty by s from sp

s qty

------

s1 300

s1 200

s1 400

s1 200

...

Calculations can be performed on the intermediate results:

q)select countqty:count qty,sumqty:sum qty by p from sp

p | countqty sumqty

--| ---------------

p1| 2 600

p2| 4 1000

p3| 1 400

p4| 2 500

p5| 2 500

p6| 1 100
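The foreign-key dot notation also works for grouping and filtering, so a supplier's city can be used without an explicit join. A sketch, assuming sp.q is in the current directory; the totals in the comments were worked out by hand from the sp data shown in 4.3.

```q
\l sp.q                                  / defines s, p and sp
/ total quantity by supplier city: s.city follows sp's foreign key into table s
byCity:select sum qty by s.city from sp  / city london: 2200, city paris: 900
/ shipments from suppliers based in London (suppliers s1 and s4)
lon:select from sp where s.city=`london   / 9 rows
```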


References
[1] http://code.kx.com/wsvn/code/contrib/cburke/start


Startingkdbplus/ipc

3. Q IPC

3.1 Overview

A production kdb+ system may have several q processes, possibly on several machines. These communicate via TCP/IP. Any q process can communicate with any other q process, as long as it is accessible on the network and is listening for connections.
• a server process listens for connections and processes any requests
• a client process initiates the connection and sends commands to be executed

Client and server can be on the same machine or on different machines. A process can be both a client and a server.

A communication can be synchronous (wait for a result to be returned) or asynchronous (no wait and no result returned).

3.2 Initialize Server

A q server is initialized by specifying the port to listen on, with either a command line parameter or a session command:

..$ q -p 5001 / command line

q)\p 5001 / session command

3.3 Communication Handle

A communication handle is a symbol that starts with : and has the form:

`:[server]:port

where server is optional, and port is a port number. The server need not be given if it is on the same machine as the client. Examples:

`::5001 / server on same machine as client

`:genie:5001 / server on machine genie

`:198.168.1.56:5001 / server on given IP address

`:www.example.com:5001 / server at www.example.com

The function hopen starts a connection and returns an integer connection handle. This handle is used for all subsequent client requests. For example:

q)h:hopen `::5001

q)h "3?20"

1 12 9

q)hclose h
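A handle can also be built from a host name and port held in variables, and a failed connection can be trapped rather than left to signal an error. A sketch, assuming a server may or may not be listening on localhost:5001 (both values are placeholders):

```q
host:"localhost"; port:5001             / placeholders for a real server
hdl:hsym `$host,":",string port         / `:localhost:5001
/ hopen signals an error if nothing is listening; trap it and return a null instead
h:@[hopen;hdl;{-1"could not connect: ",x; 0N}]
if[not null h; show h"til 3"; hclose h]
```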


3.4 Synchronous/Asynchronous

Where the connection handle is used as returned by hopen (a positive integer), the client request is synchronous. In this case, the client waits for the result from the server before continuing execution, and the value returned is the result of the request.

Where the negative of the connection handle is used, the client request is asynchronous. In this case, the request is sent to the server, but the client neither waits for nor gets a result from the server. This is done when the client does not need a result.

For example:

q)h:hopen `::5001

q)(neg h) "a:3?20" / send asynchronously, no result

q)(neg h) "a" / again no result

q)h "a" / synchronous, with result

0 17 14

3.5 Message Formats

There are two message formats:
• a string containing any q expression to be executed on the server
• a list (function; arg1; arg2; ...) where the function is to be applied with the given arguments

For example:

q)h "1 2 3 +/ 10 20" / send q expression

31 32 33

q)h (+/;1 2 3;10 20) / send function and its arguments

31 32 33
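On the server side, incoming messages are handled by .z.pg (synchronous) and .z.ps (asynchronous). A minimal sketch that logs the sender's handle (.z.w) and then evaluates the message with value, which accepts either message format. It is an illustration only, not a secure configuration, since it executes whatever the client sends.

```q
/ run on the server, e.g. started with:  q -p 5001
/ .z.w is the handle of the client that sent the current message
.z.pg:{[x] -1"sync message from handle ",string .z.w; value x}   / synchronous
.z.ps:{[x] -1"async message from handle ",string .z.w; value x}  / asynchronous
```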

3.6 Http Connections

A q server can also be accessed via HTTP. To try this, run a q server on your machine with port 5001. Then open a web browser and go to http://localhost:5001 [1]. You can now see the names defined in the base context.


References
[1] http://localhost:5001


Startingkdbplus/tables

4.1 Overview

A basic understanding of the internal structure of tables is needed to work with kdb+. The structure is actually quite simple, but very different from conventional databases.

This section gives a quick overview, followed by an explanation of the sp.q script, and then a typical table for stock data. After completing this, you should read the page kdbplus database, which has a detailed comparison of kdb+ and conventional RDBMS.

Kdb+ tables are created out of lists. A table with no key columns is essentially a list of column names associated with a list of corresponding column values, each of which is a list. A table with key columns is internally built from a pair of tables: the key columns associated with the non-key columns.

Kdb+ tables are created in memory, and then written to disk if required. When written to disk, smaller tables can be stored in a single file, while larger tables are usually partitioned in some way. The partitioning can be seen when viewing the file directories, but the table is treated as a single object within a q process.
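The description of keyed tables above can be checked in the console: a keyed table is a dictionary from a table of key columns to a table of the remaining columns. A sketch reusing the small table from 2.7; the variable names are illustrative.

```q
tab:([]items:`nut`bolt`cam`cog;sales:6 8 0 3;prices:10 20 15 20)
kt:1!tab            / key the table on its first column, items
tt:type tab         / 98h: an unkeyed table
tk:type kt          / 99h: a keyed table is a dictionary...
kk:key kt           / ...from a table of the key columns (items)
kv:value kt         / ...to a table of the other columns (sales, prices)
same:kt~kk!kv       / 1b: associating the two tables with ! rebuilds kt
cd:flip tab         / an unkeyed table flips back to a dictionary of columns
```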

4.2 Creating Tables

There are two ways of creating a table. One way explicitly associates lists of column names and data; the other uses a q expression that specifies the column names and initial values. The second method also permits each column's datatype to be given, and so is particularly useful when a table is created with no data.

• create table by association:

q)tab:flip `items`sales`prices!(`nut`bolt`cam`cog;6 8 0 3;10 20 15 20)

q)tab

items sales prices

------------------

nut 6 10

bolt 8 20

cam 0 15

cog 3 20

• create table by specifying column names and initial values:

q)tab2:([]items:`nut`bolt`cam`cog;sales:6 8 0 3;prices:10 20 15 20)

q)tab~tab2 / tab and tab2 are identical

1b

The form for the second method, for a table with j primary keys and n columns in total, is:

t:([c1:v1;...;cj:vj] cj+1:vj+1;...;cn:vn)

Here table t is defined with column names ci, and corresponding values vi. The square brackets are for primary keys,and are required even if there are no primary keys.
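As an instance of this form with j=1 and n=3, the sketch below keys the table on items; it also creates an empty table, where the square brackets are still present although there are no keys. The column types chosen for the empty table are just an example.

```q
/ j=1 key column (items) and n=3 columns in total
kt:([items:`nut`bolt`cam`cog] sales:6 8 0 3; prices:10 20 15 20)
ks:keys kt          / ,`items: the key column names
rw:kt`bolt          / `sales`prices!8 20: the row for key bolt
/ no keys: the square brackets are still required
e:([] items:`symbol$(); sales:`int$(); prices:`float$())
mt:meta e           / column types s, i and f
```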


4.3 Suppliers and Parts

The script sp.q defines C.J. Date's Suppliers and Parts database. You can view this script in an editor to see the definitions. Load the script with:

q)\l sp.q

Table s

Table s has a primary key column, also called s, given as a list of symbols which should be unique. Note that in this example, the name "s" is used both for the table and the primary key column, but this is not required. The remaining columns are of type symbol, integer and symbol.

s:([s:`s1`s2`s3`s4`s5]
 name:`smith`jones`blake`clark`adams;
 status:20 10 30 20 30;
 city:`london`paris`paris`london`athens)

Display in q:

q)s

s | name status city

--| -------------------

s1| smith 20 london

s2| jones 10 paris

s3| blake 30 paris

s4| clark 20 london

s5| adams 30 athens

Note that the column types are set from the data given. If this were first created as an empty table, say table "t", then the column types could be defined explicitly as follows:

q)t:([s:`$()]name:`$();status:"I"$();city:`$())

Insert a row:

q)`t insert (`s1;`smith;20;`london)

,0

q)t

s | name status city

--| -------------------

s1| smith 20 london

Table p

Table p is created much like table s. As before, the table name and primary key name are the same:

p:([p:`p1`p2`p3`p4`p5`p6]
 name:`nut`bolt`screw`screw`cam`cog;
 color:`red`green`blue`red`blue`red;
 weight:12 17 17 14 12 19;
 city:`london`paris`rome`london`paris`london)

Display in q:


q)p

p | name color weight city

--| -------------------------

p1| nut red 12 london

p2| bolt green 17 paris

p3| screw blue 17 rome

p4| screw red 14 london

p5| cam blue 12 paris

p6| cog red 19 london

Table sp

Table sp is defined with no primary key. Columns s and p reference tables s and p respectively as foreign keys. The syntax for specifying another table's primary key as a foreign key is:

`tablename$data

The definition of sp is:

sp:([]
 s:`s$`s1`s1`s1`s1`s4`s1`s2`s2`s3`s4`s4`s1;
 p:`p$`p1`p2`p3`p4`p5`p6`p1`p2`p2`p2`p4`p5;
 qty:300 200 400 200 100 100 300 400 200 200 300 400)

Display in q:

q)sp

s p qty

---------

s1 p1 300

s1 p2 200

s1 p3 400

s1 p4 200

s4 p5 100

...
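The foreign keys can be inspected and followed directly. In this sketch (assuming sp.q is in the current directory), fkeys lists which tables the columns of sp refer to, the dot notation resolves each supplier's name, and an attempt to enumerate a value that is not a key of s is trapped; the value `s9 is made up to trigger that error.

```q
\l sp.q
fk:fkeys sp                 / `s`p!`s`p: column s refers to table s, column p to table p
nm:exec s.name from sp      / supplier name for each shipment, via the foreign key
/ an enumeration only accepts keys of its target table; `s9 is not a key of s
bad:@[{`s$x};`s9;{"rejected: ",x}]
```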

4.4 Stock Data

The following is a typical layout populated with random data. The definitions are in script trades.q in the start [1] directory. Load as:

q)\l start/trades.q

A trade table might include: date, time, symbol, price, size, condition code:

q)trades:([]date:`date$();time:`time$();sym:`symbol$();price:`real$();size:`int$();cond:`char$())

q)`trades insert (2010.02.21;10:03:54.347;`IBM;20.83e;40000;"N")

q)`trades insert (2010.02.21;10:04:05.827;`MSFT;88.75e;2000;"B")

q)trades


date time sym price size cond

---------------------------------------------

2010.02.21 10:03:54.347 IBM 20.83 40000 N

2010.02.21 10:04:05.827 MSFT 88.75 2000 B

The ? verb will generate random data:

q)syms:`IBM`MSFT`UPS`BAC`AAPL

q)tpd:100 / trades per day

q)day:5 / number of days

q)cnt:count syms / number of syms

q)len:tpd*cnt*day / total number of trades

q)date:2010.02.21+len?day

q)time:"t"$raze (cnt*day)#enlist 09:30:00+15*til tpd

q)time+:len?1000

q)sym:len?syms

q)price:len?100e

q)size:100*len?1000

q)cond:len?" ABCDENZ"

q)trades:0#trades / empty trades table

q)`trades insert (date;time;sym;price;size;cond)

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

..

q)trades:`date`time xasc trades / sort on time within date

q)5#trades

date time sym price size cond

------------------------------------------------

2010.02.21 09:30:00.766 UPS 70.38 46900 A

2010.02.21 09:30:00.801 IBM 89.24799 58600 N

2010.02.21 09:30:00.942 UPS 38.4812 54600 A

2010.02.21 09:30:15.116 IBM 25.56824 55700 A

2010.02.21 09:30:15.224 MSFT 75.97006 23800 E
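Once trades is populated, typical queries aggregate by symbol or by time bucket. The sketch below uses a five-row table with made-up values so that the results can be checked by hand; it keeps only the time, sym, price and size columns.

```q
/ made-up trades: three for IBM and two for MSFT
t:([]time:09:30:00.000 09:31:10.000 09:36:00.000 09:32:00.000 09:40:00.000;sym:`IBM`IBM`IBM`MSFT`MSFT;price:10 12 11 20 22f;size:100 300 100 200 200)
/ per symbol: trade count, volume-weighted average price and high
bySym:select n:count i, vwap:size wavg price, hi:max price by sym from t
/ IBM: n 3, vwap (10*100+12*300+11*100)%500 = 11.4, hi 12; MSFT: n 2, vwap 21, hi 22
/ last price in each 5-minute bar for each symbol
bars:select last price by sym, 5 xbar time.minute from t   / 4 rows
```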



Startingkdbplus/hdb

5.1 Overview

A historical database (hdb) holds data from before today. Its tables are stored on disk, since they are much too large to fit in memory. Each new day's records are added to the hdb at the end of the day.

Typically, large tables in the hdb (such as daily tick data) are stored splayed, i.e. each column is stored in its own file; see cookbook/splayed tables and kdb+formortals/splayed. Typically also, large tables are stored partitioned by date. Very large databases may be further partitioned into segments, using par.txt.

These storage strategies give the best efficiency for searching and retrieval. For example, the database can be written over several drives. Also, partitions can be allocated to slave threads so that queries over a range of dates can be run in parallel. The exact setup would be customized for each installation.

For example, a simple partitioning scheme on a single disk might be as follows. Here, the daily and master tables are small enough to be written to single files, while the trade and quote tables are splayed and partitioned by date:

[Figure: tree.png]
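A minimal sketch of producing a splayed, date-partitioned table from q, assuming a scratch directory demodb (a hypothetical name) and a two-row trade table. set with a trailing / writes a table splayed, and .Q.en first enumerates its symbol columns against a sym file in the database root, which splayed tables require. This illustrates the storage format only; it is not the end-of-day procedure used by buildhdb.q or kdb+tick.

```q
/ hypothetical database root and a tiny one-day trade table
db:`:demodb
trade:([]time:09:30:00.000 09:30:01.000;sym:`IBM`MSFT;price:55.65 88.75;size:74 2000)
/ .Q.en enumerates the symbol column against demodb/sym; the trailing /
/ makes set write the table splayed, one file per column, in partition 2010.02.21
`:demodb/2010.02.21/trade/ set .Q.en[db;trade]
/ loading the root maps the partitions (and changes the working directory to it);
/ date then appears as a virtual column taken from the directory names
\l demodb
r:select from trade where date=2010.02.21
```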

5.2 Sample Database

The script buildhdb.q in the start [1] directory will build a sample hdb. It builds a month's random data in directory start/db, and takes a few seconds to run. To do so, load q then:

q)\l start/buildhdb.q

To load the database, either start q with an argument of the database directory:

..$ q start/db

or load the database within a q session:

q)\l start/db

In q (actual values may vary):

q)count trade

342102j


q)count quote

1709919j

q)t:select from trade where date=last date, sym=`IBM

q)count t

1041

q)5#t

date time sym price size stop cond ex

---------------------------------------------------

2010.12.31 09:30:00.055 IBM 55.65 74 0 N N

2010.12.31 09:30:00.114 IBM 55.66 72 1 W N

2010.12.31 09:30:01.970 IBM 55.56 37 0 T N

2010.12.31 09:30:03.091 IBM 55.56 41 1 R N

2010.12.31 09:30:06.930 IBM 55.57 89 0 B N

q)select count i by date from trade

date | x

----------| -----

2010.12.01| 14991

2010.12.02| 14705

2010.12.03| 14817

2010.12.06| 14877

...

q)select[5] cnt:count i,sum size,last price,wprice:size wavg price by 5 xbar time.minute from t

minute| cnt size price wprice

------| -----------------------

09:30 | 42 2138 55.24 55.37768

09:35 | 22 1329 55.32 55.35988

09:40 | 23 1279 55.2 55.25091

09:45 | 16 716 54.99 55.13633

09:50 | 24 1187 54.82 54.83702

q)select[-5] open:first price,lo:min price,hi:max price,close:last price by 10 xbar time.minute from t

minute| open lo hi close

------| -----------------------

15:10 | 55.64 55.43 55.64 55.56

15:20 | 55.56 55.54 55.95 55.95

15:30 | 55.88 55.61 55.99 55.74

15:40 | 55.81 55.8 56.18 55.86

15:50 | 55.84 55.84 56.5 56.38
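The bucketing in these queries comes from xbar, which rounds each value down to a multiple of its left argument, and wavg, which weights its right argument by its left. A small check with made-up values:

```q
m:5 xbar 09:31 09:34 09:35 09:49     / 09:30 09:30 09:35 09:45: down to 5-minute buckets
n:10 xbar 3 17 20 29                 / 0 10 20 20: the same rule on plain numbers
w:2 3 wavg 10 20                     / 16f: (2*10+3*20)%2+3
```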


5.3 Sample Segmented Database

The buildhdb.q script can be customized to build a segmented database. In practice, database segments should be on separate drives, but for illustration, the segments are here written to a single drive. Both the database root and the location of the database segments need to be specified.

For example, edit the first few lines of the script as follows:

dst:`:start/dbs / new database root

dsp:`:/dbss / database segments directory

dsx:5 / number of segments

bgn:2007.01.01 / set 4 years data

end:2010.12.31

Ensure that the directory given in dsp is created, writable and empty, then load the modified script, which should now take a minute or so. This should write the partitioned data to subdirectories of dsp, and create a par.txt file like:

/dbss/d0

/dbss/d1

/dbss/d2

/dbss/d3

/dbss/d4
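The par.txt file is plain text with one segment directory per line, so it can also be written from q. A sketch, assuming the segment paths above and that the database root start/dbs already exists and is writable:

```q
/ one line per segment directory, matching the listing above
segs:"/dbss/d",/:string til 5    / ("/dbss/d0";"/dbss/d1";"/dbss/d2";"/dbss/d3";"/dbss/d4")
`:start/dbs/par.txt 0: segs       / assumes start/dbs exists and is writable
```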

Restart q, and load the segmented database:

q)\l start/dbs

q)(count quote), count trade

81258538 16248124j

q)select cnt:count i,sum size,size wavg price from trade where date in 2007.11.19+til 5, sym=`IBM

cnt size price

--------------------

4213 227283 47.12981



Startingkdbplus/rdb

6.1 Overview

A real-time database (rdb) stores today's data. Typically, it is stored in memory during the day, and written out to the historical database (hdb) at the end of the day. Storing the rdb in memory results in extremely fast update and query performance.

At a minimum, the rdb machine should have RAM of at least 4 times the expected data size, so for 5 GB of data per day it should have at least 20 GB of RAM. In practice, much larger RAM might be used.

6.2 Data Feeds

Data feeds can be any market or other time series data. A feedhandler converts the data stream into a format suitable for writing to kdb+. Feedhandlers are usually written in a compiled language, such as C or C++.

In the example described here, the data feed is generated at random by a q process.

6.3 Tickerplant

The data feed could be written directly to the rdb. More often, it is written to a q process called a tickerplant, which may run several actions whenever data is received, for example:
• write all incoming records to a log file
• push all data to the rdb
• push all or subsets of the data to other processes
• run any other q code that should be executed as new data arrives

Other processes subscribe to a tickerplant to receive new data, and each specifies what data should be sent (all or a selection).

The kdb+tick [1] product from Kx is a tickerplant that is recommended for production systems with large volumes of real-time data.
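The push model described above can be sketched in a few lines of q. This is a hypothetical minimal tickerplant, not kdb+tick and not the start/tick scripts: subscribers register with a synchronous call, and each batch from the feed is forwarded to them asynchronously as a call to their own upd function. The names tp.q, sub, subs and upd are assumptions for this sketch.

```q
/ hypothetical minimal tickerplant, e.g. started as:  q tp.q -p 5010
subs:`int$()                              / handles of subscribed clients
/ a client subscribes with a synchronous call h"sub[]"; .z.w is its handle
sub:{subs::distinct subs,.z.w;}
/ forget a client when its connection closes
.z.pc:{subs::subs except x;}
/ the feed calls upd[table;data]; forward the same call to every subscriber,
/ asynchronously (negative handle), so each runs its own upd[table;data]
upd:{[t;x] {[h;t;x] neg[h](`upd;t;x)}[;t;x] each subs;}
```

A subscriber would define its tables and its own upd (for example upd:insert, which appends each batch to the named table), then connect with h:hopen `::5010 and call h"sub[]". There is no logging, no selection of tables or symbols, and no end-of-day handling.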

6.4 Example

The scripts in start/tick [2] run a simple tickerplant/rdb configuration. Note that they are not suitable for production use (there is no logging, error handling, end-of-day rollover, etc.).

The layout is:

                  feed
                   |
              tickerplant
        /     /    |    \     \     \
      rdb  vwap  hlcv   tq   last  show
      /\    /\    /\    /\    /\
         ... client applications ...

Here, feed is a demo feedhandler that generates random trades and quotes and sends them to the tickerplant. In practice, this would be replaced by real feedhandlers.

The tickerplant gets data from feed and pushes it to clients that have subscribed. Once the data is written, it is discarded.


The rdb, vwap, hlcv, tq and last processes are databases that have subscribed to the tickerplant. Note that these databases can be queried by a client application.
• rdb has all of today's data
• vwap has volume-weighted averages for selected stocks
• hlcv has high, low, close, volume for selected stocks
• tq has a trade and quote table for selected stocks. Each row is a trade joined with the most recent quote.
• last has the last entries for each stock in the trade and quote tables

The show process displays the incoming feed for selected stocks.

Note that all the client processes load the same script file cx.q, with a parameter that selects the corresponding code for the process in that file. Alternatively, each process could load its own script file, but since the definitions tend to be very short, it is convenient to use a single script for all. See c.q [3] for more examples (written for kdb+tick).
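The single-script approach can be sketched by reading the process name from the command line. In q, .z.x holds the arguments that follow the script name (system options such as -p are not included). The script name roles.q, the role names and their init functions are hypothetical; cx.q itself may be organized differently.

```q
/ hypothetical roles.q, run e.g. as:  q roles.q rdb -p 5011
/ .z.x holds the arguments after the script name (options such as -p are excluded)
role:$[count .z.x; `$first .z.x; `rdb]          / default to rdb when no argument
init:`rdb`show!({-1"starting as rdb";};{-1"starting as show";})
if[not role in key init; '"unknown role: ",string role]
init[role][]                                     / run the code for this role
```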

6.5 Running the Demo

The start/tick [2] scripts run the demo, which should display each q process in a separate window. If necessary, update them for the actual directories used.

In Windows, call start/tick/run.bat. In Linux/Gnome, call start/tick/run.sh. In any other system, either modify the scripts for your environment or start the processes manually, as described in the next section.

The calls starting each process are essentially:

1. tickerplant - the parameter ticker.q is the script defining the tickerplant, and the port is 5010:

..$ q ticker.q -p 5010

2. feed - connects to the tickerplant and sends a new batch every 107 milliseconds:

..$ q feed.q localhost:5010 -t 107

3. rdb - the parameter cx.q defines the realtime database and its own listening port (similarly for other databases):

..$ q cx.q rdb -p 5011

4. show - the show process, which does not need a port:

..$ q cx.q show

6.6 Running Processes Manually

If the run scripts are unsuitable for your system, you can call each process manually. In each case, open a new terminal window, change to the q directory and enter the appropriate command. The tickerplant should be started first.

For example, on a Mac, for each of the following commands, open a new terminal, change to ~/q, then call the command:

m32/q start/tick/ticker.q -p 5010

m32/q start/tick/feed.q localhost:5010 -t 107

m32/q start/tick/cx.q rdb -p 5011

Refer to run1.sh for the remaining processes.


6.7 Process Examples

Set focus on the last window, and view the trade table. Note that each time the table is viewed, it shows the latest data:

q)trade

sym | time price size stop cond ex

----| ------------------------------------

AIG | 14:26:48.844 27.62 18 0 Z O

DELL| 14:26:49.058 11.83 57 0 K N

DOW | 14:26:49.058 19.81 69 1 G O

...

Set focus on the vwap window, and view the vwap table. Note that the price column actually holds the running total of price*size for each stock. This can be updated much more efficiently than storing all prices and sizes:

q)vwap

sym | price size

----| -------------

IBM | 42153.14 998

MSFT| 51620.66 717

To get the correct weighted average price:

q)select price%size,size by sym from vwap

sym | price size

----| --------------

IBM | 41.74374 31824

MSFT| 73.38304 28612
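Keeping running totals makes each update cheap: for every new batch, only sum(price*size) and sum(size) per symbol are added, and the average is computed on demand. A sketch of such an update, assuming a vwap table keyed by sym as above; the function name updvwap and the sample batches are made up, and the update relies on + adding keyed tables on matching keys and appending new keys, as it does for dictionaries.

```q
/ running totals keyed by sym: price holds sum of price*size, size holds sum of size
vwap:([sym:`symbol$()] price:`float$(); size:`long$())
/ hypothetical update: totals for the new batch are added to vwap on matching keys
updvwap:{[x] vwap::vwap+select price:sum price*size, size:sum size by sym from x;}
updvwap ([]sym:`IBM`MSFT`IBM; price:10 20 30f; size:1 2 3)
updvwap ([]sym:enlist`IBM; price:enlist 40f; size:enlist 4)
/ IBM: (10*1+30*3+40*4)%1+3+4 = 260%8 = 32.5; MSFT: 40%2 = 20
res:select price%size, size by sym from vwap
```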


References
[1] http://kx.com/kdb+tick.php
[2] http://code.kx.com/wsvn/code/contrib/cburke/start/tick
[3] http://kx.com/q/tick/c.q

Article Sources and Contributors

Startingkdbplus/introduction  Source: http://code.kx.com/mediawiki/index.php?oldid=2528  Contributors: Chris Burke, Simon Garland

Startingkdbplus/qlanguage  Source: http://code.kx.com/mediawiki/index.php?oldid=2530  Contributors: Chris Burke, Colm Earley

Startingkdbplus/ipc  Source: http://code.kx.com/mediawiki/index.php?oldid=2481  Contributors: Chris Burke

Startingkdbplus/tables  Source: http://code.kx.com/mediawiki/index.php?oldid=2289  Contributors: Charlie Skelton, Chris Burke

Startingkdbplus/hdb  Source: http://code.kx.com/mediawiki/index.php?oldid=2274  Contributors: Chris Burke

Startingkdbplus/rdb  Source: http://code.kx.com/mediawiki/index.php?oldid=2285  Contributors: Chris Burke

Image Sources, Licenses and Contributors

File:tree.png  Source: http://code.kx.com/mediawiki/index.php?title=File:Tree.png  License: unknown  Contributors: -

License

Terms and conditions: TermsAndConditions