d maeda bi portfolio

Business Intelligence Portfolio David N. Maeda [email protected] 919-606-5772

Upload: dmaeda

Post on 01-Jul-2015


DESCRIPTION

Personal Portfolio

TRANSCRIPT

Page 1: D Maeda Bi Portfolio

Business Intelligence

Portfolio

David N. Maeda

[email protected]

919-606-5772

Page 2: D Maeda Bi Portfolio

In the Beginning …

• “Put all your eggs in one basket, and … watch the basket.”

Mark Twain

• “Data is only valuable if it can be accessed in a timely fashion.”

An IMS/DC Axiom

Page 3: D Maeda Bi Portfolio

Table of Contents

• An Introduction

• A Problem Sampler

– Diagnostician at Play

– A Little Dirty Data

– A SQL Query

• SSIS and ETL Options

– SSIS and Data Management

• BIDS, SSAS, and MDX

– New Tools, Growing Arsenal

• At Your Service …

Page 4: D Maeda Bi Portfolio

David Maeda: An Introduction

• Completing an intense 10-week course on Microsoft Business Intelligence technologies, i.e., SQL Server, T-SQL, SSIS, SSAS, SSRS, and Visual Studio interfaces.

• Broad background in IT including expertise in database and transaction management systems.

• Experience includes leadership and project management positions.

• An accomplished diagnostician and software engineer.

Page 5: D Maeda Bi Portfolio

Diagnostician At Play

• Earlier this year, I got a good deal on a nice fly reel intended for 9 and 10 weight lines. While using the reel for striped bass on the Roanoke River several weeks later, I noticed that the drag did not tighten down to the point where it was useful.

• An exchange of emails with the US distributor got me a new one-way clutch bearing, but it did not fix the issue.

• Examining the parts diagram for the reel, I decided to add a 7-cent wave lock washer to the drag assembly. Tested the reel on the Roanoke. Problem resolved.

• Notified the distributor. After an evaluation, the fix was adopted by the manufacturer several days later.

Page 6: D Maeda Bi Portfolio

A Little Dirty Data Problem

• In dealing with a national organization, membership information was found to have the following issues:

– 30% to 60% of the email addresses were bad

– 10% of the regular mail addresses were bad

– Inconsistent data formats in downloaded CSV files

– Multiple entries per member

• The Problem: How to work around the “questionable” data and maintain effective membership communications while meeting the following criteria:

– Minimize expenses

– Require, on average, less than 4 hours per week to manage

Page 7: D Maeda Bi Portfolio

A Little Dirty Data Problem

• The Solution:

o Design a database to allow downloads to update existing data without affecting “local” data.

o The Members table is what gets downloaded.

o The MemberExtension table is the repository for “local” data.

o Manage both tables via a web based user interface (UI).

o UI is implemented with PHP and JavaScript.

o Automate as much as possible.

Page 8: D Maeda Bi Portfolio

A Little Dirty Data Problem

• Implementation:

– A Nasty Surprise: CSV data as downloaded would not import cleanly into MySQL. This was due to MySQL load data infile processing requiring certain characters to be escaped.

• A short Java program was written to transform the downloaded CSV file into the necessary format prior to importing it into MySQL.

– Any downloaded data is considered “questionable”.

• MySQL load data infile processing overlays existing records.

• Restrict downloaded updates to only affect the Members table.

– The Members and MemberExtension tables are synchronized as part of the update process invoked from the UI.

• Every Members entry has a corresponding MemberExtension entry.

• A new MemberExtension will be created if necessary and initialized with date and email info if present.

• Existing MemberExtension entries are not touched.
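The synchronization rules above can be sketched in a few lines. This is only an illustration of the rule "create a missing MemberExtension entry, never touch an existing one", not the PHP code behind the UI. The class name, the in-memory maps standing in for the two tables, and the choice of a creation date as the "date" field are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the Members / MemberExtension tables.
class MemberSyncSketch {

    /** Hypothetical row of "local" data; field names are assumptions. */
    static final class Extension {
        final String createdDate;
        final String email;

        Extension(String createdDate, String email) {
            this.createdDate = createdDate;
            this.email = email;
        }
    }

    /**
     * Ensures every member id has an extension entry.
     * members maps member id to the downloaded email (may be null).
     * Existing extension entries are left untouched.
     * Returns the number of entries created.
     */
    static int synchronize(Map<Integer, String> members,
                           Map<Integer, Extension> extensions,
                           String today) {
        int created = 0;
        for (Map.Entry<Integer, String> member : members.entrySet()) {
            if (!extensions.containsKey(member.getKey())) {
                // New entry: initialize with the date and the email, if present.
                extensions.put(member.getKey(),
                               new Extension(today, member.getValue()));
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        Map<Integer, String> members = new LinkedHashMap<>();
        members.put(1, "new@example.org");
        members.put(2, "downloaded@example.org");

        Map<Integer, Extension> extensions = new LinkedHashMap<>();
        extensions.put(2, new Extension("2009-01-15", "local@example.org"));

        int created = synchronize(members, extensions, "2009-06-01");
        System.out.println(created + " created; member 2 keeps "
                + extensions.get(2).email);
    }
}
```

Because existing entries are skipped, a fresh download can overlay the Members data as often as needed without disturbing locally corrected values.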

Page 9: D Maeda Bi Portfolio

A Little Dirty Data Problem

o A Utilitarian UI

• Apache

• HTML Frames

• AJAX

• PHP

Page 10: D Maeda Bi Portfolio

A Little Dirty Data Problem

• In Summary:

– We were able to circumvent most of the dirty data issues by isolating the “questionable” data.

– The MySQL RDBMS supports ad hoc SQL queries should the need to alter tables, etc., arise.

– Expenses were minimized by:

• Using freely available components, i.e. Java, Apache 2.2, PHP 5, MySQL 5.2, and JavaScript.

• Using volunteer labor to write the ETL code.

– A download and update sequence takes less than 10 minutes.

– A typical request to update the email distribution takes less than 5 minutes.

– Managing the database and generating the necessary distribution lists via the provided UI typically takes less than 4 hours per week.

Page 11: D Maeda Bi Portfolio

A SQL Query

• In a recent phone interview, I was asked:

– How would you construct an SQL query to find the second highest sales total?

• My answer was:

– Use a pair of nested queries. The inner query would ascertain the top 2 totals. The outer query would return the lower of the two totals.

• In T-SQL this looks something like the following (other SQL dialects may differ):

select top 1 orderid, (unitprice * quantity) as totalsale
from [order details]
where (unitprice * quantity) in
(
    -- inner query: the two highest distinct totals
    select top 2 (unitprice * quantity) as ordertotal
    from [order details]
    group by (unitprice * quantity)
    order by ordertotal desc
)
-- outer query: the lower of those two totals
order by totalsale asc
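The same two-step reasoning can be checked outside the database. The following is a minimal Java sketch, assuming the totals are already available as whole numbers (for example, cents). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper mirroring the nested-query answer.
class SecondHighestSketch {

    /**
     * Inner step: keep the two highest distinct totals.
     * Outer step: return the lower of those two.
     * With only one distinct total, that total is returned,
     * which matches the behavior of the nested query.
     */
    static Optional<Long> secondHighest(List<Long> totals) {
        return totals.stream()
                .distinct()
                .sorted(Comparator.reverseOrder())
                .limit(2)
                .min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<Long> totals = Arrays.asList(500L, 900L, 900L, 120L);
        System.out.println(secondHighest(totals));
    }
}
```

The `distinct()` call plays the role of the `group by` in the inner query: without it, a tie for the highest total would be returned as the "second highest".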

Page 12: D Maeda Bi Portfolio

ETL Options and SSIS

o Not all CSV files are created equal. Neither are the ETL tools used to prepare and load them into a database. Compare:

o To the left is a more traditional approach (as used for the Dirty Data problem).

o To the right is an approach utilizing Microsoft’s SSIS facility.

o SSIS has Data Management applications beyond ETL.

package appCSV;

import java.io.*;
import java.util.StringTokenizer;

/**
 * @author Dave Maeda
 *
 * Class to convert csv field form
 *
 * Invoke as: java appCSV.Convert
 *
 * Where: filename is the name of
 *        ext is the file extension.
 *
 * Output: A file named <filename>.
 * Note: ext will default to "csv" if
 */
public class Convert
{
    private static void usage()
    {
        System.out.println("\n");
        System.out.println(" >> Usage:
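The listing above is cut off on the slide, so the actual conversion logic is not shown. As a rough illustration of the kind of transformation involved, here is a minimal sketch that rewrites one CSV line so that backslashes and embedded quotes are escaped. It is not the author's code. It assumes the matching MySQL load uses FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' with the default backslash escape character, and it does not handle fields containing line breaks. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Convert class shown on the slide.
class CsvEscapeSketch {

    /** Splits one CSV line, honoring quoted fields and doubled quotes. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Escapes the backslash first, then the enclosing quote character. */
    static String escapeField(String field) {
        return field.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Re-emits the line with every field quoted and escaped. */
    static String convertLine(String line) {
        StringBuilder out = new StringBuilder();
        for (String field : parseLine(line)) {
            if (out.length() > 0) {
                out.append(',');
            }
            out.append('"').append(escapeField(field)).append('"');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convertLine("1,\"Smith, \"\"Bud\"\"\",C:\\temp"));
    }
}
```

Escaping the backslash before the quote matters: reversing the order would double the backslashes that the quote escaping had just added.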

Page 13: D Maeda Bi Portfolio

Data Management 101: DID

• Three basic principles:

– Disclosure

• Viewing of data: Who’s viewing your data and are they authorized to do so?

– Integrity

• Accuracy and currency of data: Data is only meaningful if it is accurate and up to date.

– Durability

• Data loss prevention: More data is lost to accidents than malicious actions.

Page 14: D Maeda Bi Portfolio

BIDS, SSAS, and MDX

o Business Intelligence Development Studio (BIDS)

• Ships as part of MS SQL Server

o SQL Server Analysis Services (SSAS)

• OLAP store and engine

• Builds multi-dimensional cubes

o Multi-Dimensional eXpressions (MDX)

• Used to retrieve cube data

• Used in SSAS Calculations and KPIs

Page 15: D Maeda Bi Portfolio

SSRS

o Web Enabled

• Report Management

• Distribution

o Charts

• Conditional Fonts

• Calculated Members

• Multiple Charting Options

• Custom Colors

o Tables

• Multiple Formatting Options

• Data

• Calculated Members

• Conditional Fonts

Page 16: D Maeda Bi Portfolio

MOSS, PPS, Dashboards, and KPIs

o MOSS

• SharePoint Server

o PPS

• PerformancePoint Server

o Dashboard

• Scorecard

o KPIs

• Parameters

• Values

• Goals and Status

• Trends (not shown)

Page 17: D Maeda Bi Portfolio

Excel Services

o Excel Local Client

• Parameters

• Pivot Table

• Associated Chart

o Excel Services

• MOSS

• PPS Dashboard

• PPS Report

Parameters

Chart

Page 18: D Maeda Bi Portfolio

New Tools, Growing Arsenal

• Latest additions: BIDS, SSIS, SSAS, SSRS, and MDX

• Arsenal already includes:

– OS platforms: z/OS, Windows, Unix (AIX and Sun), and Linux (Red Hat and SUSE)

– Databases: IMS, DB2, Oracle, MySQL, and SQL Server

– Languages: Assembler (IBM and Intel), C/C++, Java, JavaScript, PHP, Smalltalk, SQL, and REXX.

– Core competencies: Leadership, process improvement, team facilitation, interpersonal communications, client relations, and project management.

Page 19: D Maeda Bi Portfolio

At Your Service …

• David Maeda

– Software Engineer

• Business Intelligence Analyst

• Diagnostician/Programmer

– Hard working and Persevering

• Personal Integrity and High Standards

– Team Leader and Team Player

• “Your prime directive as a leader is to position your team for success.”

Page 20: D Maeda Bi Portfolio

The End