EDH offloading by Sunil Sitaula
CONFIDENTIAL - RESTRICTED
EDH Off-load
October 15, 2014
Agenda
• What does it mean
• Why
• Approaches
• Things to consider
• Questions
What does it mean..
.. Off-loading data, applications, and users
.. from existing systems (enterprise data warehouses) to Cloudera Enterprise Data Hub (EDH)
Why.. a number of reasons..
.. Cost
.. Flexibility: structured and unstructured data
Approaches..
.. Specific
    .. Use case
    .. Application
.. Partial
.. Full
Specific..
.. This is the way to start
.. Pick a use case or a small-to-medium, non-critical application
.. Off-load it end-to-end
Why Specific..
.. Reveal ah-ha moments
.. Gain experience
.. Iron out support, operations, and admin issues
.. In some cases a complete switch may not be feasible; still go end-to-end, but feed the needed data back to the old system
Partial..
.. Now that in-house experience and expertise have been built, focus on extending the migration effort to other areas
.. Follow the same pattern: end-to-end
Full..
.. In some cases a full off-load may be feasible
.. But don't fool yourself
.. Existing systems might have been there for years
.. They may have hundreds of TB, hundreds of databases, thousands of tables, views, stored procedures, scripts, macros, workflows, and reports, and dozens of apps pointed at them
.. This may entail having many partial off-loads finished, staged, verified, and ready to go before a full migration
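One simple way to treat a partial off-load as "verified" is to compare per-table row counts between the old system and the EDH. The sketch below is only an illustration under that assumption: the function name `count_mismatches` is hypothetical, and the counts are assumed to have been collected separately from each system (for example with a `SELECT COUNT(*)` per table). Matching counts are a necessary check, not proof that the data is identical.

```python
from typing import Dict, List


def count_mismatches(source_counts: Dict[str, int],
                     target_counts: Dict[str, int]) -> List[str]:
    """Return one message per table whose row count differs or is missing.

    Both arguments map table name -> row count; how the counts are
    gathered is outside this sketch (assumption).
    """
    problems = []
    for table in sorted(source_counts):
        expected = source_counts[table]
        actual = target_counts.get(table)
        if actual is None:
            # The table was never landed on the target side.
            problems.append(f"{table}: missing on target")
        elif actual != expected:
            problems.append(f"{table}: source={expected} target={actual}")
    return problems


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    old_system = {"customers": 1200, "orders": 50000, "returns": 310}
    new_system = {"customers": 1200, "orders": 49990}
    for line in count_mismatches(old_system, new_system):
        print(line)
```

An empty result means every table in the old system has the same row count on the new side; anything else is a list of items to investigate before the next stage.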
Planning..
.. How to keep existing systems in sync
    .. Feedback/keep-alive loop
    .. Processed data may need to be pumped back and forth
    .. Keeping IDs in sync (deciding the system of record)
.. Impact on existing environment
    .. While migrating existing data
    .. While keeping old and new systems in sync
    .. Number of connections
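The "number of connections" concern can be made concrete with a little arithmetic. The sketch below assumes that each parallel import task holds roughly one connection to the source database, so the load on the old system is about (concurrent jobs) x (parallel tasks per job). The function name `mappers_per_job`, the budget, and the default cap are all hypothetical values chosen for illustration.

```python
def mappers_per_job(connection_budget: int,
                    concurrent_jobs: int,
                    max_mappers: int = 8) -> int:
    """Largest per-job parallelism that keeps total connections within budget.

    Assumption: one source-database connection per parallel task.
    """
    if connection_budget < 1 or concurrent_jobs < 1 or max_mappers < 1:
        raise ValueError("all arguments must be positive")
    if concurrent_jobs > connection_budget:
        # Even one task per job would exceed what the source can spare.
        raise ValueError("more concurrent jobs than connections available")
    return min(max_mappers, connection_budget // concurrent_jobs)


if __name__ == "__main__":
    # e.g. the DBA allows 20 connections and 4 tables are imported at once
    print(mappers_per_job(connection_budget=20, concurrent_jobs=4))
```

The budget itself is something to agree with whoever operates the existing system, since the same connections are shared with its production workload.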
Sqoop..
.. Will help significantly in migrating both data and schemas
.. Automate as much as possible
    .. Give a script a DB and a list of tables (or ones to avoid) and have it take care of the rest
.. But it will still involve manual touch points
    .. Data types: not all data types may be supported
    .. Mappings
    .. Connectors: go through the options properly
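The "give a script a DB and a list of tables" idea could look roughly like the sketch below, which only builds `sqoop import` command lines and does not run them. The helper names (`build_sqoop_import`, `plan_offload`), the JDBC URL, the user, the password-file path, and the table names are all hypothetical. The Sqoop options used (`--connect`, `--username`, `--password-file`, `--table`, `--hive-import`, `--num-mappers`, `--map-column-hive`) are standard Sqoop 1 import arguments; the per-column override is where the manual data-type touch point shows up.

```python
import shlex
from typing import Dict, Iterable, List, Optional


def build_sqoop_import(jdbc_url: str, username: str, password_file: str,
                       table: str, num_mappers: int = 4,
                       hive_type_overrides: Optional[Dict[str, str]] = None
                       ) -> List[str]:
    """Return the argv for importing one table into Hive with Sqoop."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--hive-import",
        "--num-mappers", str(num_mappers),
    ]
    if hive_type_overrides:
        # Manual touch point: columns whose default type mapping is unsuitable.
        mapping = ",".join(f"{col}={typ}"
                           for col, typ in sorted(hive_type_overrides.items()))
        cmd += ["--map-column-hive", mapping]
    return cmd


def plan_offload(jdbc_url: str, username: str, password_file: str,
                 tables: Iterable[str], skip: Iterable[str] = (),
                 overrides: Optional[Dict[str, Dict[str, str]]] = None
                 ) -> List[List[str]]:
    """One import command per table, leaving out the tables listed in `skip`."""
    overrides = overrides or {}
    skipped = set(skip)
    return [
        build_sqoop_import(jdbc_url, username, password_file, table,
                           hive_type_overrides=overrides.get(table))
        for table in tables
        if table not in skipped
    ]


if __name__ == "__main__":
    commands = plan_offload(
        "jdbc:mysql://dbhost/sales", "etl_user", "/user/etl_user/.db_password",
        tables=["orders", "audit_log", "customers"],
        skip=["audit_log"],
        overrides={"orders": {"notes": "STRING"}},
    )
    for argv in commands:
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before anything touches the source database. Note that Sqoop needs a `--split-by` column (or a single mapper) for tables without a primary key, which is another place where a fully automatic script tends to need manual input.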
Key takeaways..
.. Start with a specific use case
.. Identify dependencies and keep-alive processes
.. Avoid scope creep ("Oh no, we need that dataset too")
.. Engage developers, testers, and business owners early
.. It could be complex, but done properly it could result in significant savings, flexibility, and new capabilities
Questions