Upload: abril-maple

Post on 14-Dec-2015


Page 1: Homework 2 What is the role of the secondary database that we have to create? What is the role of the secondary database that we have to create?  A relational

Homework 2

What is the role of the secondary database that we have to create?

A relational DBMS supports multiple index structures on a table:
- Create a B-tree index on the Salary attribute of Emp.
- Create a Hash index on SS# of Emp.
- When an application deletes a record from the Emp table (say, using the hash index given the SS# of the employee to be fired), all index structures are updated.

The same concept exists with a relational storage manager (Berkeley DB). Create primary and secondary databases and associate them with one another:
- Create a B+-tree primary database on Salary.
- Create a Hash secondary database on SS#.
- Associate the two indexes together.
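A minimal plain-Python sketch of what the association buys you, assuming an Emp record with hypothetical fields ssn, name, and salary. A sorted list stands in for the B+-tree primary on Salary and a dict stands in for the Hash secondary on SS#. This only mimics the behavior; it is not the Berkeley DB API. In Berkeley DB's C API the link is created with DB->associate(), after which a delete issued through the secondary also removes the primary record and its secondary entries.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Emp:
    ssn: str
    name: str
    salary: int


class EmpStore:
    """Hypothetical sketch (not the Berkeley DB API): an ordered 'primary' on
    Salary plus an associated hash 'secondary' on SS#."""

    def __init__(self):
        self._keys = []    # sorted (salary, ssn) pairs: stands in for the B+-tree primary
        self._rows = {}    # (salary, ssn) -> Emp: the primary records
        self._by_ssn = {}  # ssn -> (salary, ssn): the hash secondary maps to primary keys

    def put(self, emp):
        # Replace any existing record for this SS# so both structures stay in sync.
        self.delete_by_ssn(emp.ssn)
        key = (emp.salary, emp.ssn)
        bisect.insort(self._keys, key)
        self._rows[key] = emp
        self._by_ssn[emp.ssn] = key

    def get_by_ssn(self, ssn):
        key = self._by_ssn.get(ssn)
        return None if key is None else self._rows[key]

    def delete_by_ssn(self, ssn):
        # Find the record through the secondary, then remove it from every structure.
        key = self._by_ssn.pop(ssn, None)
        if key is None:
            return False
        del self._keys[bisect.bisect_left(self._keys, key)]
        del self._rows[key]
        return True

    def salary_range(self, lo, hi):
        # Range scan on the ordered primary: employees with lo <= salary <= hi (integers).
        i = bisect.bisect_left(self._keys, (lo,))
        j = bisect.bisect_left(self._keys, (hi + 1,))
        return [self._rows[k] for k in self._keys[i:j]]


store = EmpStore()
store.put(Emp("111", "Ann", 50000))
store.put(Emp("222", "Bob", 70000))
store.put(Emp("333", "Cy", 60000))
store.delete_by_ssn("222")  # fire Bob using his SS#
print([e.name for e in store.salary_range(0, 100000)])  # ['Ann', 'Cy']
```

The delete goes through the hash structure (a point lookup by SS#), yet the Salary-ordered scan no longer sees Bob: that consistency is the role the associated secondary plays.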

Homework 2

When we verify that all of our records have been stored correctly, is it sufficient to just count the number of retrieved records, or do we have to keep our own copy of all the records and verify each row in the database against our "shadow" copy?

Either approach is acceptable.
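To see what each approach does and does not catch, here is a small sketch assuming the retrieved records and the shadow copy are both held as Python dicts mapping key to value (the function names are hypothetical). A count can match even when a record is missing, an extra one appeared, and a value is wrong; the row-by-row comparison reports all three.

```python
def verify_by_count(retrieved, expected_count):
    """Weak check: only confirms how many records came back."""
    return len(retrieved) == expected_count


def verify_against_shadow(retrieved, shadow):
    """Stronger check: compare every (key, value) pair with a shadow copy.

    Both arguments are dicts mapping key -> value. Returns (missing, extra, wrong).
    """
    missing = sorted(k for k in shadow if k not in retrieved)
    extra = sorted(k for k in retrieved if k not in shadow)
    wrong = sorted(k for k in shadow if k in retrieved and retrieved[k] != shadow[k])
    return missing, extra, wrong


shadow = {b"k1": b"v1", b"k2": b"v2", b"k3": b"v3"}
retrieved = {b"k1": b"v1", b"k2": b"XX", b"k4": b"v4"}
print(verify_by_count(retrieved, len(shadow)))   # True: the count alone looks fine
print(verify_against_shadow(retrieved, shadow))  # ([b'k3'], [b'k4'], [b'k2'])
```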

Hint 1: Start by configuring a very small amount of memory for your main-memory database (say, 10 MB) to minimize the time required to debug your program. Once your program is stable, scale up to a large amount of memory.

Hint 2: Do not be surprised if Berkeley DB stores 12 MB of data into 10 MB; read the documentation carefully!

Google’s Bigtable

Shahram Ghandeharizadeh
Computer Science Department
University of Southern California

Overall Architecture

Shared-nothing architecture consisting of thousands of nodes!
- A node is an off-the-shelf, commodity PC.

Google File System

Google’s Bigtable Data Model

Google’s Map/Reduce Framework

Yahoo’s Pig Latin

…………..

Bigtable

A data model (a schema).
- A sparse, distributed, persistent, multi-dimensional sorted map.
- Data is partitioned across the nodes seamlessly.
- The map is indexed by a row key, a column key, and a timestamp.
- Each value in the map is an uninterpreted array of bytes:

(row: byte[ ], column: byte[ ], time: int64) → byte[ ]
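A toy, single-process model of the signature (row, column, time) → byte[ ] may help. The class name is hypothetical, and a sorted Python list stands in for Bigtable's distributed sorted storage. Only written cells are stored (sparse), and keys sort by row, then column, then decreasing timestamp (the timestamp is negated in the sort key).

```python
import bisect


class SparseSortedMap:
    """Toy model of (row: bytes, column: bytes, time: int) -> bytes.

    Only cells that were written are stored (sparse); keys are kept sorted
    by row, then column, then decreasing timestamp.
    """

    def __init__(self):
        self._keys = []  # sorted list of (row, column, -timestamp)
        self._vals = {}  # (row, column, -timestamp) -> value bytes

    def put(self, row, column, ts, value):
        key = (row, column, -ts)
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, row, column, ts):
        return self._vals.get((row, column, -ts))

    def scan(self):
        # Yields (row, column, timestamp, value) in the map's sort order.
        for row, column, neg_ts in self._keys:
            yield row, column, -neg_ts, self._vals[(row, column, neg_ts)]


m = SparseSortedMap()
m.put(b"com.cnn.www", b"contents:", 5, b"<html>v5")
m.put(b"com.cnn.www", b"contents:", 6, b"<html>v6")
m.put(b"com.bbc.www", b"anchor:cnn.com", 9, b"BBC")
for cell in m.scan():
    print(cell)
```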

Rows

A row key is an arbitrary string.
- Typically 10-100 bytes in size, up to 64 KB.

Every read or write of data under a single row key is atomic.

Data is maintained in lexicographic order by row key.

The row range for a table is dynamically partitioned. Each partition (row range) is called a tablet.
- Unit of distribution and load balancing.
- Objective: make read operations single-sited! E.g., in Webtable, pages in the same domain are grouped together by reversing the hostname components of the URLs: com.google.maps instead of maps.google.com.
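The effect of reversing hostname components can be checked directly: with plain hostnames, the google.com pages are separated by www.cnn.com in lexicographic order; with reversed components, they become adjacent row keys and therefore tend to land in the same tablet. The helper name below is hypothetical.

```python
def row_key_for_host(host):
    """Reverse hostname components so pages of one domain sort together."""
    return ".".join(reversed(host.split(".")))


hosts = ["maps.google.com", "www.cnn.com", "mail.google.com", "www.google.com"]
print(sorted(hosts))
# ['mail.google.com', 'maps.google.com', 'www.cnn.com', 'www.google.com']
print(sorted(row_key_for_host(h) for h in hosts))
# ['com.cnn.www', 'com.google.mail', 'com.google.maps', 'com.google.www']
```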

Column Families

Column keys are grouped into sets called column families.

A column family must be created before data can be stored under any column key in that family.

Hundreds of static column families.
- Syntax is family:key, e.g., Language:English, Language:German, etc.
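A sketch of the "create the family first" rule, with a hypothetical Table class: a write to family:key is rejected unless the family already exists, while individual keys within a family need no declaration.

```python
class Table:
    """Toy table: a column family must exist before a family:key column is written."""

    def __init__(self):
        self.families = set()
        self.cells = {}  # (row, "family:key") -> value

    def create_family(self, family):
        self.families.add(family)

    def write(self, row, column, value):
        family, sep, _key = column.partition(":")
        if not sep:
            raise ValueError(f"column {column!r} is not of the form family:key")
        if family not in self.families:
            raise KeyError(f"column family {family!r} has not been created")
        self.cells[(row, column)] = value


t = Table()
t.create_family("Language")
t.write("doc1", "Language:English", b"yes")
t.write("doc1", "Language:German", b"no")  # new keys in an existing family are fine
try:
    t.write("doc1", "Anchor:cnn.com", b"CNN")  # family "Anchor" was never created
except KeyError as err:
    print("rejected:", err)
```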

Timestamps

64-bit integers, assigned by:
- Bigtable: real time in microseconds,
- Client application: when unique timestamps are a necessity.

Items in a cell are stored in decreasing timestamp order.

The application specifies how many versions (n) of data items are maintained in a cell. Bigtable garbage-collects obsolete versions.
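A sketch of one cell under a "keep the newest n versions" setting, with a hypothetical Cell class: versions stay in decreasing timestamp order regardless of arrival order, and anything beyond the newest n is garbage-collected on each write.

```python
class Cell:
    """Toy cell: versions kept in decreasing timestamp order; only the newest n survive GC."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        self.garbage_collect()

    def garbage_collect(self):
        # Drop obsolete versions beyond the newest n.
        del self.versions[self.max_versions:]

    def latest(self):
        return self.versions[0][1] if self.versions else None


cell = Cell(max_versions=2)
for ts, v in [(100, b"a"), (300, b"c"), (200, b"b")]:  # arrival order differs from time order
    cell.put(ts, v)
print(cell.versions)  # [(300, b'c'), (200, b'b')]
```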

Bigtable

Used in different applications supported by Google.

Application 1: Google Analytics

Enables webmasters to analyze traffic patterns at their web sites. Statistics such as:
- Number of unique visitors per day and the page views per URL per day,
- Percentage of users that made a purchase given that they earlier viewed a specific page.

How?
- A small JavaScript program that the webmaster embeds in their web pages.
- Every time the page is visited, the program is executed.
- The program records the following information about each request:
  - User identifier
  - The page being fetched

Application 1: Google Analytics (Cont…)

Two of the Bigtables:

Raw click table (~200 TB)
- A row for each end-user session.
- Row name includes the website’s name and the time at which the session was created.
- Sessions that visit the same web site are clustered together and sorted in chronological order.
- Compression factor of 6-7.

Summary table (~20 TB)
- Stores predefined summaries for each web site.
- Generated from the raw click table by periodically scheduled MapReduce jobs. Each MapReduce job extracts recent session data from the raw click table.
- Row name includes the website’s name, and the column families hold the aggregate summaries.
- Compression factor of 2-3.

Single-sited
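One way to build a row name that gives "same site together, then chronological" is to append a zero-padded session start time to the site name. The separator, microsecond resolution, and 20-digit padding below are assumptions for illustration, not the format Google Analytics uses. Padding matters: without it, "9" sorts after "10" lexicographically.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def session_row_key(site, session_start):
    """Hypothetical row name: site name + zero-padded session start time in microseconds."""
    micros = (session_start - EPOCH) // timedelta(microseconds=1)
    return f"{site}#{micros:020d}"


t0 = datetime(2008, 1, 1, 9, 0, tzinfo=timezone.utc)
keys = [
    session_row_key("example.com", t0 + timedelta(seconds=90)),
    session_row_key("other.org", t0),
    session_row_key("example.com", t0),
]
for k in sorted(keys):
    print(k)  # example.com sessions first, oldest first; then other.org
```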

Application 2: Google Earth & Maps

Functionality: pan, view, and annotate satellite imagery at different resolution levels.

One Bigtable stores raw imagery (~70 TB):
- Each row name identifies a geographic segment. Names are chosen to ensure adjacent geographic segments are clustered together.
- A column family maintains the sources of data for each segment.

There are different sets of tables for serving client data, e.g., an index table.

Single-sited

Application 3: Personalized Search

Records user queries and clicks across Google properties.

Users browse their search histories and request personalized search results based on their historical usage patterns.

One Bigtable:
- Row name is the userid.
- A column family is reserved for each action type, e.g., web queries, clicks.
- User profiles are generated using MapReduce. These profiles personalize live search results.
- Replicated geographically to reduce latency and increase availability.

Single-sited

Bigtable API

Implements interfaces to:
- Create and delete tables and column families,
- Modify cluster, table, and column family metadata such as access control rights,
- Write or delete values in Bigtable,
- Look up values from individual rows,
- Iterate over a subset of the data in a table,
- Perform atomic read-modify-write (R-M-W) sequences on data stored under a single row key (no support for transactions across multiple rows).
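The single-row atomicity guarantee can be illustrated with a per-row lock: concurrent read-modify-write sequences on one row are serialized, while nothing coordinates updates that span rows. RowStore and its methods are hypothetical names, and a Python lock per row is only a stand-in for however a tablet server actually serializes row mutations.

```python
import threading
from collections import defaultdict


class RowStore:
    """Toy store: read-modify-write is atomic per row key; no multi-row transactions."""

    def __init__(self):
        self._rows = defaultdict(dict)             # row -> {column: value}
        self._locks = defaultdict(threading.Lock)  # one lock per row key
        self._locks_guard = threading.Lock()       # protects creation of per-row locks

    def _lock_for(self, row):
        with self._locks_guard:
            return self._locks[row]

    def read_modify_write(self, row, column, fn, default=None):
        # All R-M-W sequences on one row are serialized by that row's lock.
        with self._lock_for(row):
            new_value = fn(self._rows[row].get(column, default))
            self._rows[row][column] = new_value
            return new_value

    def get(self, row, column):
        with self._lock_for(row):
            return self._rows[row].get(column)


store = RowStore()


def worker():
    for _ in range(1000):
        store.read_modify_write("com.example", "stats:hits", lambda v: v + 1, default=0)


threads = [threading.Thread(target=worker) for _ in range(8)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(store.get("com.example", "stats:hits"))  # 8000: no lost updates
```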

Function Shipping

Similar to Gamma, Bigtable is based on function shipping.

Yay! Very smart!

Assumptions

Uses GFS to store log and data files.

Bigtable processes share the same machines with processes from other applications.
- A shared cluster of commodity PCs.

A cluster management system:
- Schedules jobs,
- Manages resources on shared machines,
- Monitors PC status and handles failures.

Building Blocks

Google File System
- High availability.

SSTable
- A key/value database.

Chubby
- Name space.

SSTable

A database similar to a BDB database:
- Stores and retrieves key/data pairs.
- Key and data are arbitrary byte arrays.
- Cursors iterate over key/value pairs given a selection predicate (exact match or range).
- Configurable to use either a persistent store (disk) or main memory.

An SSTable is stored in GFS.
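A sketch of the two kinds of selection the SSTable cursor supports, assuming an in-memory, immutable list of sorted key/value pairs (the class name is hypothetical, and this ignores the on-disk block format). An exact match is a binary search; a range predicate is a binary search to the start followed by a forward scan.

```python
import bisect


class ToySSTable:
    """Hypothetical immutable, sorted key/value table held in memory."""

    def __init__(self, pairs):
        items = sorted(pairs)
        self._keys = [k for k, _ in items]
        self._vals = [v for _, v in items]

    def get(self, key):
        # Exact-match predicate: binary search for the key.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None

    def range_cursor(self, lo, hi):
        # Range predicate lo <= key < hi: seek to lo, then scan forward in key order.
        i = bisect.bisect_left(self._keys, lo)
        while i < len(self._keys) and self._keys[i] < hi:
            yield self._keys[i], self._vals[i]
            i += 1


sst = ToySSTable([(b"cherry", b"3"), (b"apple", b"1"), (b"banana", b"2"), (b"date", b"4")])
print(sst.get(b"banana"))                     # b'2'
print(list(sst.range_cursor(b"b", b"d")))     # [(b'banana', b'2'), (b'cherry', b'3')]
```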

Bigtable: Hybrid Range Partitioning [VLDB’90]

To minimize the impact of load imbalance, construct more ranges (HN) than nodes (N), e.g., 10 ranges for a 5-node system; H = 2.
- H is higher in practice; 10 in the experimental section of the paper.

A range is called a tablet. A tablet is represented as:
- A set of SSTable files.
- A set of redo points, which are pointers into any commit logs that may contain data for the tablet.

Example: 10 ranges on 5 nodes (H = 2):

Node 1: 0-10, 51-60
Node 2: 11-20, 61-70
Node 3: 21-30, 71-80
Node 4: 31-40, 81-90
Node 5: 41-50, 91-100
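A sketch of the HN-ranges idea on integer keys 1..100, assuming equal-width ranges and a round-robin initial placement (nodes are numbered from 0 here). It reproduces the pairing in the example above (1-10 with 51-60, and so on), and shows the payoff: a single hot range can be moved to another node without re-splitting the key space. The function names are hypothetical.

```python
import bisect


def make_ranges(lo, hi, num_nodes, h):
    """Split integer keys lo..hi into h * num_nodes equal-width ranges (tablets).

    Assumes the key space divides evenly into h * num_nodes ranges.
    """
    count = h * num_nodes
    width = (hi - lo + 1) // count
    return [(lo + i * width, lo + (i + 1) * width - 1) for i in range(count)]


def round_robin(ranges, num_nodes):
    """Initial placement: range i goes to node i mod N."""
    return {r: i % num_nodes for i, r in enumerate(ranges)}


def node_for_key(key, ranges, placement):
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, key) - 1  # last range starting at or before key
    return placement[ranges[i]]


ranges = make_ranges(1, 100, num_nodes=5, h=2)  # 1-10, 11-20, ..., 91-100
placement = round_robin(ranges, 5)              # node 0 holds 1-10 and 51-60, etc.
print(node_for_key(57, ranges, placement))      # 0
placement[(51, 60)] = 3  # if 51-60 becomes hot, move just that range to node 3
print(node_for_key(57, ranges, placement))      # 3
```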

Chubby

A persistent and distributed lock service.
- Consists of 5 active replicas; one replica is the master and serves requests.
- The service is functional when a majority of the replicas are running and in communication with one another, i.e., when there is a quorum.

Implements a name service that consists of directories and files.
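The quorum arithmetic for 5 replicas: a strict majority is 3, so the service survives the loss of any 2 replicas but not 3. The function names are hypothetical.

```python
def quorum_size(replicas):
    """Smallest strict majority of the replicas."""
    return replicas // 2 + 1


def service_available(replicas, reachable):
    """Functional only while a majority of replicas are up and talking to each other."""
    return reachable >= quorum_size(replicas)


print(quorum_size(5))                                 # 3
print([service_available(5, up) for up in range(6)])  # [False, False, False, True, True, True]
```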

Software Infrastructure

1. A Bigtable library linked to every client.
2. Many tablet servers.
   - Tablet servers are added and removed dynamically.
   - Ten to a thousand tablets are assigned to a tablet server.
   - Each tablet is typically 100-200 MB in size.
3. One master server responsible for:
   - Assigning tablets to tablet servers,
   - Detecting the addition and deletion of tablet servers,
   - Balancing tablet-server load,
   - Garbage collection of files in GFS.

Clients communicate directly with tablet servers for reads/writes.

Location of Tablets (Ranges)

A 3-level hierarchy:
- 1st level: A file stored in Chubby contains the location of the root tablet, i.e., a directory of ranges (tablets) and associated metadata. The root tablet never splits.
- 2nd level: Each metadata tablet contains the location of a set of user tablets.
- 3rd level: A set of SSTable identifiers for each tablet.

Analysis:
- Each metadata row stores ~1 KB of data.
- With 128 MB tablets, the three-level hierarchy addresses 2^34 tablets (2^61 bytes in 128 MB tablets).
- 2^61 bytes is about 2.3 × 10^18 bytes, roughly 2 exabytes (about 2,300 petabytes).
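The capacity numbers follow from the slide's figures, assuming metadata tablets are also limited to 128 MB and each metadata row (about 1 KB) points to one tablet at the next level:

```latex
\begin{aligned}
\frac{128\ \text{MB}}{1\ \text{KB}} &= \frac{2^{27}}{2^{10}} = 2^{17}\ \text{rows per metadata tablet} \\
2^{17} \times 2^{17} &= 2^{34}\ \text{user tablets (root} \to \text{metadata} \to \text{user)} \\
2^{34} \times 2^{27}\ \text{bytes} &= 2^{61}\ \text{bytes} \approx 2.3 \times 10^{18}\ \text{bytes}
\end{aligned}
```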

Client/Master

Client caches tablet locations.
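A sketch of a client-side location cache, assuming each cached entry records a tablet's half-open row range [start, end) and its server, and that a hypothetical lookup function (standing in for walking the 3-level hierarchy) is called only on a miss. Stale entries and invalidation are not modeled.

```python
import bisect


class TabletLocationCache:
    """Hypothetical client cache: row range [start, end) -> tablet server."""

    def __init__(self, lookup_fn):
        self._lookup_fn = lookup_fn  # returns (start_row, end_row or None, server) for a row
        self._starts = []            # sorted start rows of cached tablets
        self._entries = {}           # start_row -> (end_row, server); end None = unbounded
        self.misses = 0

    def locate(self, row):
        i = bisect.bisect_right(self._starts, row) - 1
        if i >= 0:
            end_row, server = self._entries[self._starts[i]]
            if end_row is None or row < end_row:
                return server  # cache hit: no trip through the location hierarchy
        self.misses += 1
        start_row, end_row, server = self._lookup_fn(row)
        if start_row not in self._entries:
            bisect.insort(self._starts, start_row)
        self._entries[start_row] = (end_row, server)
        return server


TABLETS = [("", "g", "ts1"), ("g", "n", "ts2"), ("n", None, "ts3")]  # made-up layout


def fake_metadata_lookup(row):
    for start, end, server in TABLETS:
        if start <= row and (end is None or row < end):
            return start, end, server
    raise KeyError(row)


cache = TabletLocationCache(fake_metadata_lookup)
results = [cache.locate(r) for r in ["banana", "cherry", "kiwi", "zebra", "banana"]]
print(results, cache.misses)  # ['ts1', 'ts1', 'ts2', 'ts3', 'ts1'] 3
```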

Bigtable and Chubby

Bigtable uses Chubby to:
- Ensure there is at most one active master at a time,
- Store the bootstrap location of Bigtable data (root tablet),
- Discover tablet servers and finalize tablet server deaths,
- Store Bigtable schema information (column family information),
- Store access control lists.

If Chubby becomes unavailable for an extended period of time, Bigtable becomes unavailable.

Placement of Tablets

A tablet is assigned to one tablet server at a time.

The master maintains:
- The set of live tablet servers,
- The current assignment of tablets to tablet servers (including the unassigned ones).

Chubby keeps track of tablet servers:
- A tablet server creates, and acquires an exclusive (X) lock on, a uniquely named file in a specific Chubby directory (the server directory).
- The master monitors the server directory to discover tablet servers.
- A tablet server stops processing requests if it loses its X lock (e.g., due to network partitioning).
- A tablet server will try to obtain an X lock on its uniquely named file as long as the file exists.
- If the uniquely named file of a tablet server no longer exists, the tablet server kills itself and goes back to a free pool, to be assigned tablets by the master.

Placement of Tablets

The master detects when a tablet server is in the free pool.
- How? The master periodically probes each tablet server for the status of its lock.
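A single-process sketch of the lock protocol on these two slides. ToyChubby, the "/servers/" path, and the method names are all hypothetical; the real protocol involves more steps (and real failure detection) that this omits. A tablet server stops serving when it loses its lock and retries while its file exists; the master probes each server's lock status, deletes the file of a server that lost its lock, and marks that server's tablets unassigned; the server then notices its file is gone and exits to the free pool.

```python
class ToyChubby:
    """Hypothetical stand-in for Chubby: named files, each with at most one exclusive lock holder."""

    def __init__(self):
        self.holder = {}  # path -> lock holder, or None if unlocked; key exists iff the file exists

    def create_and_lock(self, path, owner):
        # Create the file if needed, then try to take its exclusive (X) lock.
        if self.holder.get(path) not in (None, owner):
            return False
        self.holder[path] = owner
        return True

    def expire(self, path):
        # Models a lapsed session (e.g. a network partition): the file stays, the lock is gone.
        if path in self.holder:
            self.holder[path] = None

    def delete(self, path):
        self.holder.pop(path, None)


class TabletServer:
    def __init__(self, name, chubby):
        self.name, self.chubby = name, chubby
        self.path = "/servers/" + name  # hypothetical path in the server directory
        self.alive = True
        self.serving = chubby.create_and_lock(self.path, name)

    def holds_lock(self):
        # What the server reports when the master probes it.
        return self.chubby.holder.get(self.path) == self.name

    def tick(self):
        if self.path not in self.chubby.holder:
            self.serving = self.alive = False  # file gone: kill itself, return to the free pool
        elif not self.holds_lock():
            # Lock lost: stop serving until the lock on its own file is obtained again.
            self.serving = self.chubby.create_and_lock(self.path, self.name)


def master_probe(chubby, servers, assignment):
    """Hypothetical master step: reclaim tablets from servers that lost their lock."""
    for s in servers:
        if not s.holds_lock():
            chubby.delete(s.path)  # the server notices on its next tick and exits
            for tablet, owner in assignment.items():
                if owner == s.name:
                    assignment[tablet] = None  # unassigned, to be placed on a live server


chubby = ToyChubby()
a, b = TabletServer("ts-a", chubby), TabletServer("ts-b", chubby)
assignment = {"t1": "ts-a", "t2": "ts-b", "t3": "ts-b"}
chubby.expire(b.path)  # ts-b loses its lock
master_probe(chubby, [a, b], assignment)
b.tick()
print(assignment, b.alive)  # {'t1': 'ts-a', 't2': None, 't3': None} False
```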

Master

Should the master die, a new master is initiated. The master executes the following steps:

Client Write & Read Operations

A write operation arrives at a tablet server:
- The server ensures the client has sufficient privileges for the write operation (Chubby),
- A log record is written to the commit log file,
- Once the write commits, its contents are inserted into the memtable.

A read operation arrives at a tablet server:
- The server ensures the client has sufficient privileges for the read operation (Chubby),
- The read is performed on a merged view of (a) the SSTables that constitute the tablet, and (b) the memtable.
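A sketch of both paths on one hypothetical tablet. A dict of client permissions stands in for the Chubby-held access control lists, a list stands in for the commit log in GFS, and SSTables are plain dicts ordered oldest first. A write is logged before it reaches the memtable; a read consults the memtable, then SSTables from newest to oldest, so a deletion marker in a newer source hides an older value.

```python
TOMBSTONE = object()  # deletion marker written by delete()


class ToyTablet:
    """Hypothetical tablet: commit log + memtable + immutable SSTables (oldest first)."""

    def __init__(self, acl, sstables=None):
        self.acl = acl                  # client -> set of allowed ops (stands in for Chubby ACLs)
        self.commit_log = []            # stands in for the commit log file in GFS
        self.memtable = {}              # recent writes, kept in memory
        self.sstables = sstables or []  # list of dicts, oldest first

    def _check(self, client, op):
        if op not in self.acl.get(client, set()):
            raise PermissionError(f"{client} may not {op}")

    def write(self, client, key, value):
        self._check(client, "write")
        self.commit_log.append((key, value))  # 1) append a log record
        self.memtable[key] = value            # 2) once committed, apply it to the memtable

    def delete(self, client, key):
        self.write(client, key, TOMBSTONE)

    def read(self, client, key):
        self._check(client, "read")
        # Merged view: the memtable first, then SSTables from newest to oldest.
        for source in [self.memtable, *reversed(self.sstables)]:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None


acl = {"alice": {"read", "write"}, "bob": {"read"}}
tablet = ToyTablet(acl, sstables=[{"k1": "old", "k2": "v2"}])
tablet.write("alice", "k1", "new")
tablet.delete("alice", "k2")
print(tablet.read("bob", "k1"), tablet.read("bob", "k2"))  # new None
try:
    tablet.write("bob", "k3", "x")
    bob_rejected = False
except PermissionError:
    bob_rejected = True
```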

Write Operations

As writes execute, the size of the memtable increases. Once the memtable reaches a threshold:
- The memtable is frozen,
- A new memtable is created,
- The frozen memtable is converted to an SSTable and written to GFS.

This minor compaction minimizes the memory usage of the tablet server and reduces recovery time in the presence of crashes (checkpoints).

Merging compaction (in the background) reads a few SSTables and the memtable to produce one SSTable. (The input SSTables and memtable are discarded.)

Major compaction rewrites all SSTables into exactly one SSTable (containing no deletion entries).
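A sketch of the three compactions, with dicts standing in for the memtable and SSTables (oldest first) and a sentinel object as the deletion entry. The function names are hypothetical, and the minor compaction is triggered by hand rather than by a size threshold. The detail worth noticing: a merging compaction must keep deletion entries, because older SSTables outside the merge may still hold the deleted key; only a major compaction, which sees every SSTable, can drop them.

```python
DELETED = object()  # deletion entry (tombstone) for this sketch


def minor_compaction(memtable, sstables):
    """Freeze the memtable, write it out as a new sorted SSTable, start a fresh memtable."""
    frozen = dict(sorted(memtable.items()))
    return {}, sstables + [frozen]


def merging_compaction(memtable, sstables, k):
    """Merge the k newest SSTables and the memtable into one SSTable.

    Deletion entries are kept: older SSTables outside the merge may still hold the key.
    """
    cut = max(0, len(sstables) - k)
    merged = {}
    for table in sstables[cut:] + [memtable]:  # oldest to newest, so newer values win
        merged.update(table)
    return {}, sstables[:cut] + [dict(sorted(merged.items()))]


def major_compaction(sstables):
    """Rewrite all SSTables into exactly one SSTable with no deletion entries."""
    merged = {}
    for table in sstables:  # oldest to newest
        merged.update(table)
    return [{key: v for key, v in sorted(merged.items()) if v is not DELETED}]


memtable = {"b": "2", "a": DELETED}
memtable, sstables = minor_compaction(memtable, [{"a": "1", "c": "3"}])
memtable["c"] = "30"
memtable, sstables = merging_compaction(memtable, sstables, k=1)
sstables = major_compaction(sstables)
print(sstables)  # [{'b': '2', 'c': '30'}]
```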

System Performance

Experiments involving random reads (from GFS and main memory) and writes, sequential reads and writes, and scans.
- Scan: a single RPC fetches a large sequence of values from the tablet server.

Random Reads

Sequential Reads

A read request is for 1000 bytes.

Sequential reads perform better because a tablet server caches the 64 KB SSTable block (from GFS) and uses it to serve the next 64 read requests.
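A rough model of why the cached 64 KB block helps: 65,536 / 1000 ≈ 65, so one block fetch covers roughly the next 64 sequential 1000-byte reads, while random reads over a large file almost never hit the cached block. The model assumes the server caches only the most recently fetched block, which is a simplification for illustration.

```python
import random

BLOCK_SIZE = 64 * 1024  # SSTable block size (64 KB)
READ_SIZE = 1000        # bytes per read request


def gfs_block_fetches(offsets):
    """Count GFS block fetches when the server caches only the most recent block.

    Simplified model (an assumption): each request reads READ_SIZE bytes starting at `offset`.
    """
    fetches, cached = 0, None
    for offset in offsets:
        first = offset // BLOCK_SIZE
        last = (offset + READ_SIZE - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            if block != cached:
                fetches += 1
                cached = block
    return fetches


n = 6553  # about 100 blocks' worth of 1000-byte reads
sequential = gfs_block_fetches(range(0, n * READ_SIZE, READ_SIZE))
rng = random.Random(42)
file_size = 1 << 30  # 1 GB file, so random reads rarely land in the cached block
random_reads = gfs_block_fetches(rng.randrange(0, file_size - READ_SIZE) for _ in range(n))
print(sequential, random_reads)  # sequential == 100; random_reads is roughly n
```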

Random Reads from Memory

Random reads from memory avoid the overhead of fetching a 64 KB block from GFS.
- Data is mapped onto the memory of the tablet server.

Writes

A tablet server appends all incoming writes to a single commit log and uses group commit to stream these writes to GFS efficiently.
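A sketch of group commit for a single per-server commit log, assuming records are tagged with their tablet and flushed in fixed-size batches (a real implementation would also flush on a timer; that is omitted here). Each flush stands for one write to GFS, so 1050 writes become 11 storage round trips. The class is hypothetical.

```python
class GroupCommitLog:
    """Hypothetical sketch: buffer log records and write them to storage in batches."""

    def __init__(self, max_batch):
        self.max_batch = max_batch
        self.pending = []
        self.flushes = []  # each entry is one batched write (one round trip to storage)

    def append(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.pending:
            self.flushes.append(list(self.pending))
            self.pending.clear()


log = GroupCommitLog(max_batch=100)
for i in range(1050):
    log.append((f"tablet-{i % 7}", f"row{i}"))  # one log shared by all tablets on the server
log.flush()
print(len(log.flushes))  # 11 writes instead of 1050
```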

Scale-up

As the number of tablet servers is increased by a factor of 500:
- Performance of random reads from memory increases by a factor of 300.
- Performance of scans increases by a factor of 260.

Why?

Assigning 3 requests (R1, R2, R3) to 3 nodes:
- 6 ideal cases, one request per node: R1 R2 R3, R1 R3 R2, R2 R1 R3, R2 R3 R1, R3 R1 R2, R3 R2 R1.
- 21 other cases, in which some node receives two or three requests: {R1, R2, R3} on a single node, or a pair such as {R1, R3}, {R2, R3}, or {R2, R1} on one node with the remaining request on another.

27 ways to assign 3 requests to the 3 nodes!

Brain Teaser

Given N servers and M requests, compute:
- The probability of M/N requests per node.
- The number of ways M requests may map onto N servers and the probability of each scenario.
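One way to approach the teaser, assuming each request picks a server independently and uniformly at random: there are N^M equally likely assignments, each with probability 1/N^M, and when N divides M the number giving exactly M/N requests per node is the multinomial coefficient M! / ((M/N)!)^N. The sketch checks that closed form against brute-force enumeration; for M = N = 3 it reproduces the 27 / 6 / 21 split above.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def balanced_assignments_brute_force(m, n):
    """Enumerate all n**m assignments; count those giving every server m/n requests."""
    total = balanced = 0
    for assignment in product(range(n), repeat=m):  # server chosen by each request
        total += 1
        if all(assignment.count(s) == m // n for s in range(n)):
            balanced += 1
    return total, balanced


def balanced_assignments_formula(m, n):
    """Closed form: n**m assignments in total; m! / ((m/n)!)**n of them are balanced."""
    assert m % n == 0, "m/n requests per server requires n to divide m"
    return n ** m, factorial(m) // factorial(m // n) ** n


total, balanced = balanced_assignments_formula(3, 3)
print(total, balanced, total - balanced, Fraction(balanced, total))  # 27 6 21 2/9
print(balanced_assignments_brute_force(6, 3))                        # (729, 90)
```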

Reward for correct answer:Reward for correct answer:

Data Shipping?

Data shipping will saturate resources.
- Do not be fooled by this discussion, because Bigtable has “function shipping” (not reported in the evaluation section).

Lessons

Many types of errors in a real system.

Delay adding new features until it is clear how the new feature will be used.

Very important to have eyes that can see:

Conclusion

From the very first lecture of this semester:From the very first lecture of this semester: