Cloud computing in the seventies. Or the discovery of hash based relational algebra

Kjell Bratbergsengen
Department of Computer and Information Science (IDI)
The Norwegian University of Science and Technology (NUST/NTNU)

HiNC 3, Stockholm 19.10.2010


Outline

1. The foundation, the database technology group
2. The relational model, hard to run fast
3. First attempt, the ASTRA project, together with Norsk Data
4. However, hash based relational algebra algorithms were discovered
5. TechRa, realization on a mono computer
6. Had to build parallel hardware ourselves
7. The flourishing period
8. Commercialisation
9. What we achieved
10. And learnt?

The database group

• A small database group was formed at NTH and SINTEF in 1972, led by the author, then assistant professor at NTH.
• The group built several file and database systems right from its beginning:
– AKUFIL, for the SINTEF acoustics research group
– INDSEQ, for CASCADE
– Ra1, for BIBSYS
– Ra2, for Kongsberg Vaapenfabrik
– Ra3, back end database computer

What about CODASYL?

• Familiar with CODASYL systems
• SIBAS was developed by SI
• Adopted by ND
• No room for another system of this type in Norway

The ASTRA project

• In 1970 Edgar F. Codd published his seminal paper "A Relational Model of Data for Large Shared Data Banks".
• Relational algebra is the "instruction set" of relational database systems.
• Instructions:
– selection: evaluate a search expression
– projection: find records with equal key values
– join, intersection, union, difference, division and aggregation: likewise find records with equal key values
• Operands are tables, and so are the results. Algorithms for relational algebra were normally nested loop or sort based. These methods performed terribly on large operands.
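To make the cost of the nested loop approach concrete, here is a minimal sketch in Python. The record layout (lists of dicts) and all names are illustrative assumptions, not taken from any of the systems mentioned in the talk. Every record of one operand is compared with every record of the other, so the number of comparisons is the product of the operand sizes.

```python
def nested_loop_join(a, b, key_a, key_b):
    """Join two lists of dict records by comparing every pair of records."""
    result = []
    comparisons = 0
    for rec_a in a:
        for rec_b in b:
            comparisons += 1
            if rec_a[key_a] == rec_b[key_b]:
                result.append({**rec_a, **rec_b})
    return result, comparisons


# Hypothetical toy operands; the cost grows as len(a) * len(b).
employees = [
    {"emp": "Ada", "dept": 1},
    {"emp": "Bo", "dept": 2},
    {"emp": "Cy", "dept": 1},
]
departments = [{"dept": 1, "name": "DB"}, {"dept": 2, "name": "HW"}]

rows, comparisons = nested_loop_join(employees, departments, "dept", "dept")
print(len(rows), comparisons)  # prints: 3 6
```

With operands of 10,000 and 1,000 records the same loop would perform 10,000,000 comparisons, which is why the method scales so poorly.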

Efficient execution only on specialized hardware?

• Very slow execution, based on nested loops or sorting
• David DeWitt's join test: R = A join B, with A of 10,000 records of 182 bytes each and B of 1,000 records of 182 bytes each (2 MB in total)
• Over the next 15 years, a number of research projects tried to find effective ways to implement the relational model
• Because the algorithms were so time consuming, almost all projects involved building some type of new hardware: parallel disks, or machines with smart or, in particular, associative memory
• The Norsk Data project was also based on parallel hardware

Norsk Data saw a market for selling more hardware

• A parallel computer
• Distributed tables
• A research proposal to the Norwegian Science Foundation: the ASTRA project, in 1977
• Norsk Data was to make the hardware
• We were to develop a relational database system for parallel computers
• The database data was to be stored in main memory. Soon as cheap as disk memory???

Main memory soon as cheap as disk space? (1977)

• But disks became cheaper too!
• Price per MB: 2000 NOK in 1982, 0.07 øre in 2010
• Blue line: 1 kr today, 60 øre in one year

[Chart with a logarithmic axis from 1 to 10,000,000; legend: Series2, Series3]
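The two price points on this slide can be checked against the "60 øre in one year" trend line with a little arithmetic. The sketch below assumes a constant yearly price ratio between 1982 and 2010, which is a simplification introduced here, not a claim from the talk.

```python
# Figures taken from the slide: price per MB (1 NOK = 100 øre).
price_1982_nok = 2000.0
price_2010_nok = 0.07 / 100  # 0.07 øre expressed in NOK
years = 2010 - 1982

total_factor = price_1982_nok / price_2010_nok
# Constant yearly price ratio that would connect the two data points.
yearly_ratio = (price_2010_nok / price_1982_nok) ** (1 / years)

print(f"total price drop: about {total_factor:,.0f} times")
print(f"average yearly ratio: {yearly_ratio:.2f}")
```

Under that assumption the yearly ratio comes out close to 0.59, which agrees well with a line where 1 kr today becomes 60 øre in one year.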

The problem of efficient parallel execution

• Easy to parallelize selection. What about join?
• C = A0*B0 + A0*B1 + A0*B2 + A0*B3 + A0*B4 + ... + An*Bn
• It is not that bad: a result is produced only if two records have equal values on the key of the operation.
• Use of hashing:
– Equal key values -> equal hash values
– Different key values -> most often different hash values, sometimes the same hash value
– For sure: different hash values -> different key values
• Move potentially matching records to the same node -> redistribute
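The redistribution idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ASTRA implementation: nodes are simulated as lists in one process, Python's built-in `hash` stands in for the hash formula, and all function names are invented for the example. Because equal keys always hash to the same node, each node only has to join its own two fragments.

```python
from collections import defaultdict


def node_of(key, num_nodes):
    """Map a key to a node; equal keys always land on the same node."""
    return hash(key) % num_nodes


def redistribute(fragments, key, num_nodes):
    """Send every record to the node chosen by the hash of its join key."""
    out = [[] for _ in range(num_nodes)]
    for fragment in fragments:
        for rec in fragment:
            out[node_of(rec[key], num_nodes)].append(rec)
    return out


def local_hash_join(a, b, key):
    """Join the two fragments that ended up on one node."""
    table = defaultdict(list)
    for rec in a:
        table[rec[key]].append(rec)
    return [{**ra, **rb} for rb in b for ra in table.get(rb[key], [])]


def parallel_join(a_fragments, b_fragments, key, num_nodes):
    a_nodes = redistribute(a_fragments, key, num_nodes)
    b_nodes = redistribute(b_fragments, key, num_nodes)
    result = []
    for node in range(num_nodes):  # each iteration could run on its own node
        result.extend(local_hash_join(a_nodes[node], b_nodes[node], key))
    return result


# Hypothetical tables, initially spread over two nodes each.
a = [[{"k": 1, "x": "a1"}, {"k": 2, "x": "a2"}], [{"k": 3, "x": "a3"}]]
b = [[{"k": 2, "y": "b2"}], [{"k": 3, "y": "b3"}, {"k": 4, "y": "b4"}]]
joined = parallel_join(a, b, "k", num_nodes=4)
print(sorted(rec["k"] for rec in joined))
```

No pair A_i * B_j with i different from j has to be examined after redistribution, which removes most of the terms in the sum above.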

What is hashing?

• A large value range (the key) is reduced to a smaller value range using some mathematical formula.
• The smaller value range is often called the address space. Its size is the number of physical resource units available for storing the records with the (large) key.
• The resource unit could be a table entry, a disk or memory block, or a node in a parallel computer.
• Different key values transformed to the same address are called synonyms.
• The transformation formula should meet two requirements:
– All possible key values must fall within the address space (absolute requirement)
– The keys should be evenly distributed within the address range (beneficial)
• Example: the key value "HYPERCUBES" could be transformed to the value "adr" using the formula:
adr = MOD((HYPE) + (CUBE) + (S___), N)
– MOD(X, N) ≡ X - (X//N)*N, the remainder after dividing by N

The search for a scalable redistribution network

• Rings
• Arrays (two dimensional rings)
• Cubes (three dimensional rings)
• Hypercubes
• N = 2^D
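In a hypercube with N = 2^D nodes, node addresses are D-bit numbers and two nodes are linked when their addresses differ in exactly one bit. The sketch below illustrates that property in Python. The routing scheme shown (fix the differing bits one dimension at a time) is the common textbook approach and is an assumption here, not necessarily what the group's machines used.

```python
def neighbours(node: int, dims: int) -> list[int]:
    """Nodes whose address differs from `node` in exactly one bit."""
    return [node ^ (1 << d) for d in range(dims)]


def route(src: int, dst: int, dims: int) -> list[int]:
    """Path from src to dst, correcting one differing address bit per hop."""
    path = [src]
    current = src
    for d in range(dims):
        if (current ^ dst) & (1 << d):
            current ^= 1 << d
            path.append(current)
    return path


print(neighbours(5, 3))  # prints: [4, 7, 1]
print(route(0, 7, 3))    # prints: [0, 1, 3, 7]
```

Each node needs only D links, and no message travels more than D hops, which is what makes the topology attractive when the node count grows.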

The ASTRAL Language

• ASTRAL = Database tables + SIMULA
• SIMULA was extended with a new data type: table
• Associative indexes (selection expressions)
• Persistent data (the outer block)
• Transaction handling and recovery
• Major contributors were Tore Amble and Tor Stålhane.
• A compiler and a simple run time system were made
• Evaluated in Smidt and Brodie:
• The ASTRA project was abandoned in 1981.

After ASTRA came TechRa

• The result from ASTRA was hash based algebra methods, a classical divide and conquer algorithm
• Also best on mono computers
• SYSSCAN Mapping (a KV subsidiary) needed a database system for storing geographical data
• TechRa: the first relational database system using hash based algebra
• Sequences for coordinates, images and text (better than blobs)
• Implemented by people from Kvatro (a KV branch office in Trondheim), RUNIT and IDI.
• Kvatro marketed TechRa successfully for more than 20 years
• Hash based algorithms stood the test!
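The divide and conquer idea also works on a single machine: hash the operands into buckets, then process each bucket on its own, since records that could match always share a bucket. The Python sketch below shows this for duplicate removal and for difference. It is an illustration only; TechRa's actual algorithms are not described in the slides, records are assumed to be hashable tuples, and the function names and bucket count are invented for the example.

```python
def partition(records, num_buckets):
    """Divide: equal records always fall into the same bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for rec in records:
        buckets[hash(rec) % num_buckets].append(rec)
    return buckets


def distinct(records, num_buckets=8):
    """Projection-style duplicate removal, one bucket at a time."""
    result = []
    for bucket in partition(records, num_buckets):
        result.extend(set(bucket))  # conquer: each bucket is handled alone
    return result


def difference(a, b, num_buckets=8):
    """Records of a that do not occur in b, bucket by bucket."""
    result = []
    buckets_a = partition(a, num_buckets)
    buckets_b = partition(b, num_buckets)
    for bucket_a, bucket_b in zip(buckets_a, buckets_b):
        seen = set(bucket_b)
        result.extend(rec for rec in bucket_a if rec not in seen)
    return result


print(sorted(distinct([(1, "a"), (2, "b"), (1, "a")])))
print(sorted(difference([(1,), (2,), (3,)], [(2,)])))
```

Choosing the bucket count so that one bucket fits in main memory is the usual motivation for the partitioning step when the operands themselves do not fit.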

We had to build the hardware ourselves

• The parallel algorithms were still theory
• 1980-83: I worked on a VLSI chip design for realizing hypercube networks
• In 1983 we came across dual ported memory chips, 1 KB DPRAM. Good enough?
• Three PCs were connected with DPRAM during the Christmas holidays in 1983. My first parallel computer.
• We got NFR support, and a parallel computer with 8 nodes was built to demonstrate parallel algebra
• We promised to do the DeWitt test in less than 3 seconds; we did it in 1.8 seconds.
• The same test was run on several computers:
– TechRa (hash): 17 seconds on ND540, 32 seconds on VAX 11/750
– Ingres (sorting): 156 seconds on VAX 11/750

CROSS8

• CROSS8 was the name of this first operational parallel computer; it was finished in 1985.
• It had a direct connection, one common dual port RAM of 1024 bytes, between each pair of nodes, which explains its name.
• Each node was a single board computer with a SCSI disk.
• Each node also shared a DPRAM with a supervising PC, which controlled the whole system.
• We found DPRAM to be a very flexible and useful tool for developing new hardware and software.
• A curiosity: during start up, the DPRAM was mapped in as boot memory; after start up the same area was used for transferring messages.
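A direct connection between every pair of nodes is simple for 8 nodes but grows quadratically, which is one way to see why a hypercube is the more scalable network. The short Python sketch below compares the link counts. The function names are illustrative, and the extra DPRAM each node shared with the supervising PC is not counted.

```python
from math import comb


def full_mesh_links(n: int) -> int:
    """One shared DPRAM per pair of nodes, as described for CROSS8."""
    return comb(n, 2)


def hypercube_links(n: int) -> int:
    """Each of n = 2**d nodes has d links; every link joins two nodes."""
    d = n.bit_length() - 1
    assert 2 ** d == n, "a hypercube needs a power-of-two node count"
    return n * d // 2


for n in (8, 16, 64):
    print(n, full_mesh_links(n), hypercube_links(n))
```

For 8 nodes the full mesh needs 28 links against 12 for a hypercube; at 64 nodes the gap is 2016 against 192.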

CROSS8 and dual ported RAM

5 parallel computers were built:

• Delta, the Christmas project
• CROSS8
• We built 3 parallel hypercube computers:
– HC16-186,
– HC16-386 and
– HC64-486, only 4 nodes built
• HC16-186 and HC16-386 were fairly stable systems. They were used to improve the database algorithms, and a large number of parallel algorithms were developed and tested by our PhD and master students.
• We held the world record in sorting for a period; in 1989, 100 MB was sorted in 180 seconds on HC16-186.

HC16-186 node and some results

HC16-386

The HCL years

• From 1985 to 1990
• We had an infrastructure: NFR gave us enough to keep two researchers and material for one new computer every 18 months
• Having a group also means a social arena
• A number of PhDs and masters are in key positions today.

More projects, master's and PhD theses

Our supercomputer effort

• We could have built a parallel supercomputer
• Not only a fantasy
• We built a 386-node with a vector unit based on a Weitek chip (master thesis, Eirik Knutsen)
• 32 MFLOPS in every node
• Was not well liked by those who planned to buy a supercomputer
• Used and improved by Hadland

Commercialisation

• In 1989 we were ready to commercialize; HypRa AS was established
• This author was board chairman
• The 7 project participants were the initial shareholders
• Telenor was interested in our product:
– Parallel computers are good at searching and data mining
• Telenor shifted interest: mobile communication systems needed a fault tolerant, high capacity transaction system
• However, this required more research
• … and more funding

The golden rule

• Those who have the gold, rule
• More capital meant less influence for the founders
• New chairman of HypRa AS
• Loss of control
• Loss of resources
• Left HypRa AS after two years

Clustra AS

• New owners and new profiles required new company names.
• HypRa became Teleserve, which became Clustra
• Fault tolerant
• Continuously available
• Low latency
• High capacity
• Transaction system
• Became a reality after more years of research and development, and more than 40 MNOK from overseas investors
• However, that is not my story

What we achieved

• Efficient execution of relational algebra
• We broke the one machine performance barrier for database systems
• A world record in sorting
• Great confidence in programming parallel computers
• Establishing Trondheim as a center for advanced database technology
• VLDB was held in Trondheim in 2005
• Yahoo and Google established branch offices in Trondheim
• SUN bought Clustra, Oracle bought SUN, Oracle is in Trondheim now

We educated great students

• During the 5 intensive years at the Hypercube laboratory we produced a large number of master candidates with hands-on experience in data management and parallel computers.
• We covered a broad field, from VLSI design to parallel programming and algorithms and advanced database technology
• Candidate production: a reason for international companies to establish branch offices in Trondheim (Yahoo, Google, Oracle).
• Many are in key positions today

and learnt …

• Commercialization is very risky.
– It took away people,
– the lab, and
– financing.
– The database group was never reestablished to the same level.
• Industrial partner
– Write fool proof contracts
– Conflicts of interest easily develop
• A new company should do as much as possible within the department
– Do not move out of the department until too large
– Innovation should be an inspiration for colleagues and students
– Offices with lights off are not …
– We have had the same experience from several spin offs; FAST is the most prominent
• Commercialization is very inspiring
– Have the students participate
– Keep it in the department (again)
– Use it for raising interest and recruitment
– Computer science is lucky: we do not need FDA approval for our products!

A long way from lab to old commercial systems

• We discovered hash methods for relational algebra in 1978-79
• Vaguely published in 1980 (Non numerical methods, in Monterey, CA)
• Published at VLDB Singapore 1984; David DeWitt at SIGMOD the same year (mono computer)
• Parallel computers from 1987
• Hash algorithms in DB2, Oracle and Sybase announced at SIGMOD 1994
• Cloud computing from 2006
• We should have published more
• Our tradition or culture:
• "Publishing a good idea is wasting a good idea"

Acknowledgements

• Torgrim Gjelsvik was the first researcher at HCL. He did most of the practical work building new machines and writing the low level software.
• Ole John Aske was the second researcher at HCL, and he worked mainly on developing software.
• Øystein Torbjørnsen, Eirik Knutsen, Petter Moe and Olav Sandstå were also part of the HCL kernel.
• A large number of students did their project work and master's theses within the lab
• Thanks also to the Norwegian Science Foundation, who financed both the ASTRA and the Hypercube projects.