Eugene Kurpilyansky, "Indexing on top of..."

Post on 25-May-2015

Indexing Cassandra data in SQL-storage

Kurpilyansky Eugene

SKB Kontur

December 9th, 2013

What do we want?

Suppose we want to store objects of different types in Cassandra.

Any object has a primary string key.

Cassandra is well-suited for use as a key-value storage.

But we usually want to search among all objects of the same type by some criterion.

Search results must be consistent and reflect the current state of the database.

How can we implement a storage that satisfies these requirements?

Using native Cassandra indexes

We can use native Cassandra indexes.

Advantages

There is no need to support additional storage.

Disadvantages

Every custom query may require a new CF structure for effective searching.

SQL-indexes are more efficient than Cassandra's indexes.

There are many complex index types (e.g. full-text search indexing).

Using synchronization with SQL-storage

Main idea

Run an IndexService application that synchronizes data in SQL-storage with data in Cassandra (constantly, in a background thread).

To perform a search, we make a query to IndexService, which returns the search result after finishing the SQL-storage synchronization process.

Implementation of EventLog

Create event log

One event per write request or delete request.

The event log is sorted by event time.

Synchronizing SQL-storage with Cassandra

Implementation of EventLog

Event

    string EventId;
    long Timestamp;
    string ObjectId;

interface IEventLog

    void AddEvent(Event event);
    IEnumerable<Event> GetEvents(long fromTicks);

New implementation of IObjectStorage

Before writing or deleting an object, call the method IEventLog.AddEvent.
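
The write path described above can be sketched as follows. This is a minimal illustration in Python rather than the C#-style signatures on the slide. `InMemoryEventLog`, `LoggingObjectStorage`, the dict that stands in for Cassandra, and the choice to store the object with the same timestamp as its event are all assumptions made for the example, not details given in the talk.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: int   # ticks; nanoseconds since the epoch here (an assumption)
    object_id: str


class InMemoryEventLog:
    """Hypothetical stand-in for IEventLog; keeps events in a list."""

    def __init__(self):
        self._events = []

    def add_event(self, event):
        self._events.append(event)

    def get_events(self, from_ticks):
        # Events strictly newer than from_ticks, ordered by time.
        newer = [e for e in self._events if e.timestamp > from_ticks]
        return sorted(newer, key=lambda e: e.timestamp)


class LoggingObjectStorage:
    """Records an event in the log before every write or delete."""

    def __init__(self, event_log, now_ticks=time.time_ns):
        self._event_log = event_log
        self._now_ticks = now_ticks
        self._objects = {}  # stand-in for the Cassandra column family

    def _log(self, object_id):
        ts = self._now_ticks()
        self._event_log.add_event(Event(uuid.uuid4().hex, ts, object_id))
        return ts

    def write(self, object_id, value):
        ts = self._log(object_id)                # the event goes first
        self._objects[object_id] = (ts, value)   # then the object itself

    def delete(self, object_id):
        self._log(object_id)
        self._objects.pop(object_id, None)

    def read(self, object_id):
        return self._objects.get(object_id)
```

Because the event is written first, a reader of the log may see an event whose object has not been written yet; the synchronization logic has to tolerate that.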

EventLog.AddEvent(Event event)

Create column:

    ColumnName = event.Timestamp + ':' + event.EventId
    ColumnValue = event

EventLog.GetEvents(long fromTicks)

Execute get_slice for one row, starting from the given column (exclusive).

We should split the whole event log into rows using PartitionInterval to limit the size of rows.

PartitionInterval is some constant period of time (e.g. one hour, or six minutes).

EventLog.AddEvent(Event event)

Create column:

    RowKey = event.Timestamp / PartitionInterval.Ticks
    ColumnName = event.Timestamp + ':' + event.EventId
    ColumnValue = event

EventLog.GetEvents(long fromTicks)

Execute get_slice for one or more rows, starting from the given column (exclusive).
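
A rough Python sketch of the partitioned layout, with a dict of dicts standing in for the Cassandra column family. The class name, the partition size of 100 ticks, and the use of a `(timestamp, event_id)` tuple as the column name are assumptions for the example. The tuple is used because a plain string concatenation sorts in time order only if timestamps have a fixed width. Treating `from_ticks` as an exclusive bound on the timestamp is one reading of the slide.

```python
from collections import defaultdict

PARTITION_INTERVAL_TICKS = 100  # assumed value, only for the example


class PartitionedEventLog:
    """Dict-of-dicts stand-in for a Cassandra column family."""

    def __init__(self, partition_ticks=PARTITION_INTERVAL_TICKS):
        self._partition_ticks = partition_ticks
        self._rows = defaultdict(dict)  # row key -> {column name: event}

    def add_event(self, event_id, timestamp, object_id):
        row_key = timestamp // self._partition_ticks
        column_name = (timestamp, event_id)  # sorts by time, then by id
        self._rows[row_key][column_name] = (event_id, timestamp, object_id)

    def get_events(self, from_ticks):
        if not self._rows:
            return []
        first_row = from_ticks // self._partition_ticks
        # A real service would stop at the row of the current time;
        # here we stop at the newest row that exists.
        last_row = max(self._rows)
        result = []
        for row_key in range(first_row, last_row + 1):
            row = self._rows.get(row_key, {})
            for column_name in sorted(row):
                if column_name[0] > from_ticks:  # exclusive lower bound
                    result.append(row[column_name])
        return result
```

Each row holds at most one partition interval of events, so a read that starts at `from_ticks` touches only the rows from that interval onward.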

Implementation of IndexService

IndexService

It has a local SQL-storage (one storage per service replica).

There is one SQL-table per object type.

There is one specific SQL-table for storing the time of last synchronization for each object type.

There is one background thread per object type, which reads the event log and updates SQL-storage.

To execute an incoming SQL-query, we can use data from SQL-storage and a small range of events.
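
To make the table layout concrete, here is a small sketch using SQLite from the Python standard library. The talk does not say which SQL engine is used; the table name `sync_times`, the column names, and the `student` example type are hypothetical.

```python
import sqlite3


def create_index_schema(conn, object_types):
    """Create one table per object type plus a table of last sync times.

    object_types maps a type name to {column name: SQL type}. Names are
    interpolated into SQL, so they must come from trusted configuration.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_times ("
        " object_type TEXT PRIMARY KEY,"
        " last_sync_ticks INTEGER NOT NULL)"
    )
    for type_name, columns in object_types.items():
        defs = ["object_id TEXT PRIMARY KEY", "timestamp INTEGER NOT NULL"]
        defs += [f"{name} {sql_type}" for name, sql_type in columns.items()]
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {type_name} ({', '.join(defs)})"
        )
        # Every type starts with "never synchronized".
        conn.execute(
            "INSERT OR IGNORE INTO sync_times VALUES (?, 0)", (type_name,)
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_index_schema(conn, {"student": {"school": "TEXT"}})
conn.execute("INSERT INTO student VALUES ('id1', 2008, 'USU')")
rows = conn.execute(
    "SELECT object_id FROM student WHERE school = 'USU'"
).fetchall()
```

Keeping the object's timestamp next to its indexed fields lets the synchronization step compare the SQL copy with the Cassandra copy.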

Periodic synchronization action

Set startSynchronizationTime = NowTicks.

Find all events which should be processed.

Process these events: update SQL-storage and keep unprocessed events (they should be processed on the next iteration).

Update the time of last synchronization to startSynchronizationTime in SQL-storage.

ProcessEvents(Event[] events)

This function brings the values of the related objects in SQL-storage up to date.

Remember that we update an object after creating its event.

So we cannot process some of the events right away, because the corresponding object has not been updated yet.

Event[] ProcessEvents(Event[] events)

This function brings the values of the related objects in SQL-storage up to date and returns the events which have not been processed.

How will this function be implemented?

For every event we should analyze the corresponding objects from both Cassandra and SQL-storage.

Example 1

    event = {Timestamp: 2008}
    cassObj = {Timestamp: 2008, School: 'USU'}
    sqlObj = {Timestamp: 2005, School: 'AESC USU'}

What should we do?

Write cassObj to SQL-storage and mark the event as processed.

Example 2

    event = {Timestamp: 2012}
    cassObj = {Timestamp: 2008, School: 'USU'}
    sqlObj = {Timestamp: 2005, School: 'AESC USU'}

What should we do?

The timestamp of the event is greater than the timestamp of cassObj. Probably, we need to wait for the object to be updated.

Write cassObj to SQL-storage and mark the event as unprocessed.

Example 3

    event = {Timestamp: 1997}
    cassObj is missing
    sqlObj is missing

What should we do?

Probably, this event corresponds to the creation of the object.

Mark the event as unprocessed.

Example 4

    event = {Timestamp: 2017}
    cassObj is missing
    sqlObj = {Timestamp: 2012, School: 'UFU'}

What should we do?

Two cases are possible:

1. The event corresponds to the deletion of the object.

2. The event corresponds to the creation of the object. sqlObj is not missing because there were two operations in a row: delete and create.

Delete sqlObj from SQL-storage and mark the event as unprocessed.

Event[] ProcessEvents(Event[] events)

Read the objects which occurred in these events from Cassandra and SQL-storage (some of them can be missing).

For each (event, cassObj, sqlObj) do

    If cassObj is not missing
        Save cassObj in SQL-storage.
        If event.Timestamp <= cassObj.Timestamp
            then mark event as processed;
            else mark event as unprocessed.
    else (i.e. cassObj is missing)
        Delete sqlObj from SQL-storage if it's not missing.
        Mark event as unprocessed.

Return the events which have been marked as unprocessed.
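
The pseudocode above translates almost line by line into Python. In this sketch both storages are plain dicts keyed by object id, and events and objects are dicts with the field names from the slides; those representations are assumptions for the example. As on the slide, an event whose object is missing in Cassandra always stays unprocessed; the slides do not say how such events are eventually retired.

```python
def process_events(events, cassandra, sql):
    """Bring sql up to date for the given events.

    cassandra and sql map object id -> {'Timestamp': ..., other fields}.
    Returns the events that must be retried on the next iteration.
    """
    unprocessed = []
    for event in events:
        object_id = event["ObjectId"]
        cass_obj = cassandra.get(object_id)
        if cass_obj is not None:
            # Always copy the current Cassandra state into SQL-storage.
            sql[object_id] = dict(cass_obj)
            # The write that this event announced may not have landed yet.
            if event["Timestamp"] > cass_obj["Timestamp"]:
                unprocessed.append(event)
        else:
            # Either a delete, or a create that has not been written yet.
            sql.pop(object_id, None)
            unprocessed.append(event)
    return unprocessed


# The four examples from the slides, one object id per example.
cassandra = {
    "o1": {"Timestamp": 2008, "School": "USU"},
    "o2": {"Timestamp": 2008, "School": "USU"},
}
sql = {
    "o1": {"Timestamp": 2005, "School": "AESC USU"},
    "o2": {"Timestamp": 2005, "School": "AESC USU"},
    "o4": {"Timestamp": 2012, "School": "UFU"},
}
events = [
    {"EventId": "e1", "Timestamp": 2008, "ObjectId": "o1"},
    {"EventId": "e2", "Timestamp": 2012, "ObjectId": "o2"},
    {"EventId": "e3", "Timestamp": 1997, "ObjectId": "o3"},
    {"EventId": "e4", "Timestamp": 2017, "ObjectId": "o4"},
]
left = process_events(events, cassandra, sql)
```

Only the event from Example 1 is marked as processed; the other three are returned for the next iteration.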

What events should we use as arguments of the ProcessEvents function?

Of course, all unprocessed events from the previous iteration.

Also all new events, i.e. IEventLog.GetEvents(fromTicks).

What is fromTicks?

fromTicks = lastSynchronizationTime?

No. Unfortunately, any operation with Cassandra can take a long time to execute.

This time is limited by

writeTimeout = attemptsCount · connectionTimeout.

We should step back by this amount, otherwise we can lose some events.

fromTicks = lastSynchronizationTime - writeTimeout
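
One iteration of the periodic synchronization action, including the step back by writeTimeout, might look like the sketch below. The `state` dict, its keys, and passing `get_events` and `process_events` as callables are assumptions for the example. Deduplicating by event id is also an assumption: it is needed here because the look-back window fetches again some events that are already in the unprocessed list.

```python
def synchronize_once(state, get_events, process_events,
                     now_ticks, write_timeout_ticks):
    """Run one synchronization iteration and update state in place.

    state holds 'last_sync_ticks' and 'unprocessed' (a list of events).
    Events are dicts with at least an 'event_id' key.
    """
    start_ticks = now_ticks()

    # Step back by the write timeout so slow writes are not missed.
    from_ticks = state["last_sync_ticks"] - write_timeout_ticks

    # Unprocessed events from the previous iteration plus new ones,
    # deduplicated by event id.
    candidates = {e["event_id"]: e for e in state["unprocessed"]}
    for e in get_events(from_ticks):
        candidates[e["event_id"]] = e

    state["unprocessed"] = process_events(list(candidates.values()))
    state["last_sync_ticks"] = start_ticks
    return state
```

For instance, with attemptsCount = 3 and connectionTimeout = 5 seconds, writeTimeout is 15 seconds, so every iteration reads the last 15 seconds before the previous synchronization again. This relies on processing being safe to repeat for an event that was already handled.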

Executing search request

Advantages.

Scalability.

Availability.

Fault tolerance.

Sharding.

Questions

Thank you for your attention. Any questions?
