Seattle BI Meetup: BI & ETL @ Big Fish, April 2nd, 2014, Emre Motan
About the Speaker
Emre Motan
BI Engineer, Big Fish
Been in Seattle 1.5 years, previously in Chicago
Involved in BI community
  Chicago SQL BI PASS chapter
  UW BI Certificate Program
  BI Over Beers
  TDWI
Random: Basketball, co-rec sports, greyhounds
About the Seattle BI Meetup
Started in 2012, I took over after a period of inactivity
Meetups will be monthly
Primary goal is to educate
Topics will be wide-ranging but more technical
Networking is encouraged
Speakers will be people who use the technologies
About BI Meetup (cont.)
Looking to develop relationships with willing speakers, venues, and sponsors
Desire is to have meetings in different venues each month
As part of hosting, it would be nice to “sponsor” with food & drink
About Big Fish
World’s largest producer of casual games
Core business used to be PC-based “Hidden Object” games
Now pushing deep into Mobile space with titles such as Big Fish Casino and Fairway Solitaire
Today’s Story
BI/DW Implementation
ETL Framework
ETL Development
BIML
BIDS Helper
Mist BIML IDE w/ Hadron BIML Compiler
Summary
What do we want from our ETL?
Minimize manual coding errors
Minimize time spent on boilerplate
Support agile software development practices
Use software development best practices
  Pair / collaborate
  Source control / diff changes
  Auto-generate / automate
Reduce effort spent on operations and support
ETL Framework Overview
Cycles (subject areas, master package)
Jobs (restart-able units of work, package)
Steps (logic units, Data Flow / Execute SQL / …)
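The three-level hierarchy above can be pictured as nested records. This is a minimal sketch, not the framework's actual model: the class names mirror the slide, while the `kind` labels and the `step_names` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """A logic unit, e.g. one Data Flow or Execute SQL task."""
    name: str
    kind: str  # hypothetical label such as "DataFlow" or "ExecuteSQL"


@dataclass
class Job:
    """A restart-able unit of work; corresponds to one package."""
    name: str
    steps: List[Step] = field(default_factory=list)


@dataclass
class Cycle:
    """A subject area; corresponds to a master package that runs its jobs."""
    name: str
    jobs: List[Job] = field(default_factory=list)

    def step_names(self) -> List[str]:
        # Flatten the hierarchy into "Job.Step" labels, in execution order.
        return [f"{job.name}.{step.name}" for job in self.jobs for step in job.steps]


sales_cycle = Cycle("Sales", [
    Job("Job A", [Step("Step 1", "ExecuteSQL"), Step("Step 2", "DataFlow")]),
    Job("Job B", [Step("Step 3", "DataFlow")]),
])
print(sales_cycle.step_names())
```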
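[Diagram: one Cycle containing five Jobs of three Steps each]
Job A: Step 1, Step 2, Step 3
Job B: Step 4, Step 5, Step 6
Job C: Step 7, Step 8, Step 9
Job D: Step 10, Step 11, Step 12
Job E: Step 13, Step 14, Step 15
Logging pattern shown: Log Start, Do Work, then Log Success or Log Fail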
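The Log Start / Do Work / Log Success or Log Fail pattern can be sketched as a small wrapper. This is an illustration only: the in-memory `run_log` list stands in for whatever logging store the framework uses, and `run_logged` and `failing_step` are hypothetical names.

```python
from typing import Callable, List, Tuple

# In-memory stand-in for the framework's logging store (an assumption).
run_log: List[Tuple[str, str]] = []


def run_logged(unit_name: str, work: Callable[[], None]) -> bool:
    """Log Start, Do Work, then Log Success or Log Fail."""
    run_log.append((unit_name, "Start"))
    try:
        work()
    except Exception:
        run_log.append((unit_name, "Fail"))
        return False
    run_log.append((unit_name, "Success"))
    return True


def failing_step() -> None:
    raise RuntimeError("simulated failure")


run_logged("Step 1", lambda: None)
run_logged("Step 2", failing_step)
print(run_log)
```

Wrapping every unit the same way is what allows the logging to be added automatically rather than coded by hand in each package.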
1st Run
[Diagram: the same Cycle (Jobs A to E, Steps 1 to 15) with each unit marked by status. Legend: Success, Failure, Didn’t Run]
2nd Run
[Diagram: the same Cycle (Jobs A to E, Steps 1 to 15) with each unit marked by the action taken on rerun]
Legend:
Completed successfully last run; skip next time
Rerun on failure
Choose to retry because of a logical dependency
Choose to skip
Run because it did not run last time
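The rerun rules in the legend can be sketched as a small planning function. This is a sketch under assumptions: the status strings, `plan_rerun`, and the `force_retry` / `force_skip` overrides are hypothetical names, not the framework's actual metadata.

```python
from typing import Dict, Iterable


def plan_rerun(last_status: Dict[str, str],
               force_retry: Iterable[str] = (),
               force_skip: Iterable[str] = ()) -> Dict[str, str]:
    """Decide, per unit, whether the next run should "run" or "skip" it."""
    retry, skip = set(force_retry), set(force_skip)
    plan = {}
    for unit, status in last_status.items():
        if unit in skip:
            plan[unit] = "skip"   # operator chose to skip
        elif unit in retry:
            plan[unit] = "run"    # retried because of a logical dependency
        elif status == "success":
            plan[unit] = "skip"   # completed successfully last run
        else:
            plan[unit] = "run"    # failed, or did not run last time
    return plan


first_run = {
    "Step 1": "success",
    "Step 2": "failure",
    "Step 3": "did_not_run",
    "Step 4": "success",
    "Step 5": "failure",
}
plan = plan_rerun(first_run, force_retry={"Step 4"}, force_skip={"Step 5"})
print(plan)
```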
Example Job Flow
Source Systems: Ecommerce
ELT Server: Extract/Load
Data Warehouse
  Staging: Stg_sales, Stg_customers, Stg_products, Stg_payment_methods, …
  ODS: ods_sales, ods_customers, ods_products, ods_payment_methods, …
  Reporting Layer: fact_sales, dim_customers, dim_products, dim_payment_methods, …
ETL Framework Summary
Framework is injected at compile time
Uses SQL Server 2008 R2 via stored procedures
Logical units automatically logged
Event handlers automatically added
Metadata-based alterations for flow control (skip/restart)
Metadata-based balances and validation scripts to detect warning/error conditions
Variables and values stored for use in ETL
Metadata-based run-time alterations for flow control (e.g. skip job, skip step)
Custom tools to administer ETL infrastructure and metadata
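A metadata-based balance check might look roughly like the following. The talk does not describe how balances are defined, so everything here is an assumption made for illustration: the `BalanceRule` fields, the percentage thresholds, and the row-count comparison.

```python
from dataclasses import dataclass


@dataclass
class BalanceRule:
    """Hypothetical metadata row describing one balance check."""
    name: str
    warn_pct: float   # relative difference (%) that raises a warning
    error_pct: float  # relative difference (%) that raises an error


def check_balance(rule: BalanceRule, source_value: float, target_value: float) -> str:
    """Compare a source measure with its target and classify the difference."""
    if source_value == 0:
        return "ok" if target_value == 0 else "error"
    diff_pct = abs(source_value - target_value) / abs(source_value) * 100
    if diff_pct >= rule.error_pct:
        return "error"
    if diff_pct >= rule.warn_pct:
        return "warning"
    return "ok"


rule = BalanceRule("stg_sales vs ods_sales row count", warn_pct=1.0, error_pct=5.0)
print(check_balance(rule, 1000, 1000))
print(check_balance(rule, 1000, 980))
print(check_balance(rule, 1000, 900))
```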
Language and Compiler
BIML (BI Markup Language)
  Lightweight XML dialect
  Represents SQL Server BI Stack objects (SSIS, SSAS, SQL Server)
  Works like ASP.NET / PHP (combines declarative & imperative language)
Hadron
  Compiles BIML to SQL Server BI Stack artifacts
  Called via MSBuild, Mist, …
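To give a feel for the declarative side, the sketch below emits a small BIML-style document describing one package with Execute SQL tasks. Python stands in here for the imperative code that BimlScript normally embeds as .NET; the package, connection, and task names are hypothetical, and a real file would also need a Connections section before it could compile.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_biml(package_name: str, connection: str, statements: Dict[str, str]) -> str:
    """Return a BIML-style XML string with one Execute SQL task per statement."""
    biml = ET.Element("Biml", xmlns="http://schemas.varigence.com/biml.xsd")
    packages = ET.SubElement(biml, "Packages")
    package = ET.SubElement(packages, "Package",
                            Name=package_name, ConstraintMode="Linear")
    tasks = ET.SubElement(package, "Tasks")
    for task_name, sql in statements.items():
        task = ET.SubElement(tasks, "ExecuteSQL",
                             Name=task_name, ConnectionName=connection)
        ET.SubElement(task, "DirectInput").text = sql
    return ET.tostring(biml, encoding="unicode")


biml_xml = build_biml(
    "LoadStaging",   # hypothetical package name
    "StagingDb",     # hypothetical connection name
    {
        "Truncate stg_sales": "TRUNCATE TABLE stg_sales;",
        "Truncate stg_customers": "TRUNCATE TABLE stg_customers;",
    },
)
print(biml_xml)
```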
Tools
BIDS Helper
  Free, open-source extension to BIDS
  Code in BIML, then generate SSIS
  Subset of functionality
Mist IDE
  Graphical & Text Based Editors
  Transformers
  Extensions
Big Fish BI Engineering
We integrate a wide variety of data sources
We don’t develop in BIDS/SSIS
We code in BIML, compile in Mist or via Hadron directly
BI Engineering ETL Development Flow
Develop BIML locally, committed to SVN
Generate most of the code besides business logic
Run code validations before / during compile
Compile BIML during development or on deployment to ELT boxes
  Produce SSIS packages
  Handle pushing to target environment
Kick off Cycles using Job Scheduler or DtExec
Demo: Simple SSIS Package
Demo of BIMLScript1.biml
Show BIDS environment
Show BIML
Generate SSIS
Run SSIS
Demo: Programmatic BIML
Demo of BIMLScript2.biml
Introduce .NET addition to BIML script
Describe what we’re doing with getting tables from DB
Describe how we’ll loop over each table, and then each column of table, to generate insert commands
Generate SSIS
Run SSIS
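The loop described in this demo (each table, then each column, producing an insert command) can be sketched outside BIML as follows. The demo script itself is not reproduced in the slides, so this is only an approximation: the table metadata is hard-coded and hypothetical rather than read from a database, and Python stands in for the .NET code inside the script.

```python
from typing import Dict, List

# Hypothetical table metadata; the demo reads this from the database instead.
tables: Dict[str, List[str]] = {
    "stg_customers": ["customer_id", "customer_name"],
    "stg_products": ["product_id", "product_name", "price"],
}


def insert_command(table: str, columns: List[str]) -> str:
    """Build a parameterized INSERT statement for one table."""
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders});"


# Outer loop over tables; the inner loop over columns lives in insert_command.
commands = [insert_command(name, cols) for name, cols in tables.items()]
for command in commands:
    print(command)
```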
Demo: Mist Visual Designer
Show audience visual designer of one job
Select elements to see visual designer
We don’t use visual designer very often since most code is auto-generated now and we have established patterns
Demo: Mist Project
Show Mist environment with sample cycle
Show cycle file with one job
Show job file
Show metadata for sample table (source, ODS)
Show Extensions
Demo: Auto-generating ETL
One substantial accelerator of our work is auto-generating ETL for new extracts, loads, and processing
BIML representation of table
  Columns, business keys, primary keys, data types
  Annotations like ETL pattern required (full load, incremental new, incremental new/updated)
Only need to code transformation logic, all boilerplate is auto-generated
BimlScript to autogenerate boilerplate ETL code
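The idea of driving boilerplate from table metadata plus a pattern annotation can be sketched as below. This is not the BimlScript used at Big Fish: the `TableMeta` fields, the pattern strings, and the generated T-SQL shapes are all assumptions chosen to illustrate the three patterns named above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableMeta:
    """Hypothetical metadata for one source-to-target load."""
    source: str
    target: str
    columns: List[str]
    business_keys: List[str]
    pattern: str  # "full_load", "incremental_new", or "incremental_new_updated"


def generate_load_sql(t: TableMeta) -> str:
    """Emit boilerplate load SQL for the table's annotated ETL pattern."""
    cols = ", ".join(t.columns)
    src_cols = ", ".join(f"s.{c}" for c in t.columns)
    key_match = " AND ".join(f"t.{k} = s.{k}" for k in t.business_keys)
    if t.pattern == "full_load":
        return (f"TRUNCATE TABLE {t.target}; "
                f"INSERT INTO {t.target} ({cols}) SELECT {cols} FROM {t.source};")
    if t.pattern == "incremental_new":
        return (f"INSERT INTO {t.target} ({cols}) SELECT {src_cols} FROM {t.source} s "
                f"WHERE NOT EXISTS (SELECT 1 FROM {t.target} t WHERE {key_match});")
    if t.pattern == "incremental_new_updated":
        updates = ", ".join(f"t.{c} = s.{c}"
                            for c in t.columns if c not in t.business_keys)
        return (f"MERGE {t.target} AS t USING {t.source} AS s ON {key_match} "
                f"WHEN MATCHED THEN UPDATE SET {updates} "
                f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({src_cols});")
    raise ValueError(f"unknown ETL pattern: {t.pattern}")


customers = TableMeta("stg_customers", "ods_customers",
                      ["customer_id", "customer_name"], ["customer_id"],
                      "incremental_new_updated")
print(generate_load_sql(customers))
```

With this shape, adding a new extract means adding a metadata entry and writing only the transformation logic by hand.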
Why did Big Fish choose BIML?
Non-standard technology needs (extensibility)
Ease of developing and maintaining ETL to leave more time for high business value work
Plenty of people with SSIS experience in Seattle
Cost effective
Organization already supported SQL Server and SSIS
Happy developers