Big Data and MicroStrategy: Building a Bridge for the Elephant

Post on 22-Nov-2014

DESCRIPTION

Presentation: “Big Data and MicroStrategy: Building a Bridge for the Elephant”. Intelligent engineering of an agile business requires the ability to connect the vast array of requirements, technologies and data that build up over time, while avoiding the pitfalls commonly encountered on the road to giving users comprehensive yet nimble business analytics with MicroStrategy. The Google generation, armed with iPads and Droid phones, brings big, bold ideas on how “Big Data” will solve the new wave of business problems; traditional users know that addressing them requires more than just embracing buzzwords like “sentiment”, “R” and “Hadoop”. Overall success requires building a bridge between the stable, proven, mature BI solutions in place today and the disruptive new world. Enabling deeper analytics, predictive modeling and social media analysis in combination with scalable self-service dashboards, reporting and analytics is no longer an idea but a MUST DO. This presentation describes these business challenges and how an organization leveraged the Kognitio Analytical Platform under MicroStrategy to build such a bridge.

TRANSCRIPT

Big Data and MicroStrategy: Building a Bridge for the Elephant

Jan 2013

Paul Groom, Chief Innovation Officer

Let’s start at…

The End.

Panacea

You…built the DWE

You…built the BICC

and yes you built… lots of cool reports and dashboards

Epilogue

A comfortable status quo

How are you really judged?

• Fast?
• Consistent?
• All users?

Rrrrrriiiiiiinnnnnngggggg!

Back to the real world

Disruption

Disruptor: New Data

Disruptor: Social Media & Sentiment

Data ?

Disruptor:

Disruptor: More Connected Users

Disruptor: Data Discovery Tools

Choices for engaging quickly with data

Business users’ heads distracted from core BI!

BI Wild West

Where it matters

Lots of variety of DW and EDW analytical workload

The Reality of the DW

EDW says no or not now!

…and CFO says no big upgrades

Pragmatism

…OK, so you enable plenty of caching, limit drill-anywhere and add Intelligent Cubes

And then came…

http://oris-rake.deviantart.com/

Boon or Distraction

Scalable, resilient, bit bucket

Experimenting

© 20th Century Fox

The Hadoop stack

[Diagram: HDFS, HBase, MapReduce, Oozie, ZooKeeper/Ambari, HCatalog, Pig, Hive]

Hadoop Performance Reality

• Hadoop is batch oriented
• HDFS access is fast but crude
• MapReduce is powerful but has overheads
– ~30 second base response time
– Too much latency in stack and processing model
– Trade-off in optimization and latency
• MapReduce complex
– Typically multiple Java routines

https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920

SQL to the Rescue

[Diagram: the Hadoop stack (HDFS, HBase, MapReduce, Oozie, ZooKeeper/Ambari, HCatalog, Pig, Hive)]

• So MapReduce is complicated
– use Hive (SQL) as the easy way out
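
To make the contrast concrete, here is a minimal sketch of one aggregation written twice: once as explicit map, shuffle and reduce steps, and once as a single SQL statement. It is only an illustration. The sample rows and function names are hypothetical, plain Python stands in for the Java routines a real MapReduce job would need, and the standard-library sqlite3 module stands in for Hive.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) rows.
ROWS = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]


def map_phase(rows):
    """Emit (key, value) pairs, as a mapper would."""
    for region, amount in rows:
        yield region, amount


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def total_by_region_mapreduce(rows):
    return reduce_phase(shuffle(map_phase(rows)))


def total_by_region_sql(rows):
    """The same aggregation as one SQL statement (sqlite3 stands in for Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    result = dict(conn.execute(query))
    conn.close()
    return result
```

Both functions return the same totals per region; the SQL version simply leaves the grouping and summing to the engine.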

Hive

• Simplifies access

“Hive is great, but Hadoop’s execution engine makes even the smallest queries take minutes!”

• Only basic SQL support
• Concurrency needs careful system admin
• It’s not a silver bullet for interactive BI usage

Hadoop just too slow for interactive BI!

…loss of train-of-thought

Conclusion

“while Hadoop shines as a processing platform, it is painfully slow as a query tool”

Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real time queries. As a result it should not be compared with systems like Oracle where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes. For Hive queries response times for even the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run into hours.

I remain skeptical on the practical performance of the Hive query approach and have yet to talk to any beta customers. A more practical approach is loading some of the Hadoop data into the in-memory cube with the new Hadoop connector.

Why can’t Hadoop be in-memory?

Why can’t I have giant icubes?

Lots of these

Not so many of these

Remember…

Hadoop inherently disk oriented

Typically low ratio of CPU to Disk

Larger cubes

Issues: Time to Populate, Proliferation

Analytics requires CPU; RAM keeps the data close

Alternative - In-memory Processing

Cores do the work!

Scale with the data

Goals: Minimise Disruption, Cut Latency

• Don’t change the existing BI and analytics
• Support more creative and dynamic BI
• Don’t introduce yet more slow disk
– Help the DW investment
• No complex ETL, just pull data as required
• Pull data simply and intelligently from Hadoop
• Simplify – fewer cubes, caches
• Improve sharing of data
• Increase concurrency and throughput
– It’s all about queries per hour!
• Minimal DBA requirement
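
As a rough illustration of why latency dominates the “queries per hour” goal, the arithmetic can be sketched as below. The ~30 second figure is the MapReduce base response time quoted earlier in the deck. The 1 second figure, the count of 10 concurrent streams and the function name are assumptions chosen for illustration, not measurements.

```python
def queries_per_hour(latency_seconds, concurrent_streams=1):
    """Upper bound on throughput if each stream runs queries back to back."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return concurrent_streams * 3600 / latency_seconds


# ~30 s is the base response time quoted for MapReduce in the deck;
# the 1 s latency and the 10 streams are illustrative assumptions.
batch = queries_per_hour(30, concurrent_streams=10)
in_memory = queries_per_hour(1, concurrent_streams=10)
print(batch, in_memory)  # 1200.0 36000.0
```

Under these assumptions, cutting latency from 30 seconds to 1 second raises the ceiling thirtyfold without adding any streams.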

Kognitio Hadoop Connectors

HDFS Connector
• Connector defines access to HDFS file system
• External table accesses row-based data in HDFS
• Dynamic access or “pin” data into memory
• Selected HDFS file(s) loaded into memory

Filter Agent Connector
• Connector uploads agent to Hadoop nodes
• Query passes selections and relevant predicates to agent
• Data filtering and projection takes place locally on each Hadoop node
• Only data of interest is loaded into memory via parallel load streams
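
The filter-agent idea can be sketched in a few lines: apply the predicate and the column selection where the data lives, then collect only the surviving rows over parallel streams. This is a toy model, not Kognitio’s connector API. The node data, the function names `filter_agent` and `load_into_memory`, and the use of threads to stand in for Hadoop nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the rows held by one Hadoop node.
NODES = [
    [{"id": 1, "country": "UK", "amount": 40, "note": "a"},
     {"id": 2, "country": "US", "amount": 15, "note": "b"}],
    [{"id": 3, "country": "UK", "amount": 5, "note": "c"},
     {"id": 4, "country": "DE", "amount": 90, "note": "d"}],
]


def filter_agent(node_rows, predicate, columns):
    """Runs 'on the node': keep matching rows, keep only requested columns."""
    return [{c: row[c] for c in columns} for row in node_rows if predicate(row)]


def load_into_memory(nodes, predicate, columns):
    """Pull filtered, projected rows from all nodes via parallel streams."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        streams = pool.map(
            lambda rows: filter_agent(rows, predicate, columns), nodes
        )
        # pool.map yields results in node order, so the output is deterministic.
        return [row for stream in streams for row in stream]


uk_sales = load_into_memory(
    NODES, lambda r: r["country"] == "UK", ["id", "amount"]
)
```

Here only two of the four rows, and two of the four columns, ever leave the “nodes”, which is the point of filtering and projecting locally.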

Centrally defined data models

Persist data in natural store

Fetch when needed, agile

Available to all tools

Analytical power

BI – Central Governance

Engineering for Success

Thomas Herbrich

connect

www.kognitio.com

twitter.com/kognitio

linkedin.com/companies/kognitio

tinyurl.com/kognitio

youtube.com/kognitio

NA: +1 855 KOGNITIO

EMEA: +44 1344 300 770
