Presenter: Charles Krajewski, Colby-Sawyer
College
6/18/2014
Introduction:
Scope of talk – meant for across the board users; IT and operational
Keeping the ‘tech talk’ to a minimum; both types IT and User
Definitions – there are many definitions for this topic: data mart/warehouse,
ODS, etc. Some conflict; it’s a matter of particular approach, but the end product,
whatever you call it, should meet the data reporting and analytical needs of the user
departments that will rely on it
This presentation’s definitions – ‘Data Repository’ -> the entire database
designed to span all the reporting departments; ‘Data Warehouse’ -> the portion of the
Repository that addresses a particular ‘business process’ (e.g. student academics, Alumni
relations, etc.), not necessarily a given department within the institution
Apologies to David Letterman – the top 5 reasons why users need a Data Repository
#5 – IT won’t have to do all those nasty application integrations –
a. yes you will, those integrations between PC and PF, between PC, PF, GP will
all still be there
b. application releases will still be part of your life
c. Well, with a data repository there will be yet one more database to
maintain and integrate; however, this one will be of your own design (if done according to
YOUR school’s needs); the access rights will be focused on the data here and not
necessarily into the application databases; and the repository will be (mostly) free of the
foibles of the integrations provided by the applications (i.e. ‘The Bridge’)
#4 – It will ‘empower’ the users to build their own reports/charts
a. A data repository, at base, is still a database with all the technical
challenges of an application; except the DR will absorb a lot of the technical detail (e.g.
filtering criteria, access rights, etc.) that an application database has.
b. Remember: an application database is built to collect, edit, and store
data; the DR is built for ease in reporting
c. Besides, is it the best use of your otherwise-skilled people’s time to
have them trained in the intricacies of database management? Does your CFO really need
to know about left outer joins?
d. From an IT point of view the reporting function will still exist, but it will
be easier.
#3 – FINALLY we’ll have a single information source to answer all our questions!
a. The issue here is not so much the answer but the question. A DR provides
the data needed TOWARD answering a question, but we will still be faced with formulating
the question.
b. A real benefit from the building of the DR will be in the meeting to agree
on the terms we use– the “Language of the Institution”
#2 – We need to be competitive with ‘Jones’ University’!
a. The DR will NOT fix your retention rates, your slumping annual fund giving,
or get that one big endowed gift.
b. The DR only provides the data that can be pulled together in a meaningful
way, analyzed, and presented to show the possibilities– the opportunities and pitfalls of past actions. It
cannot act on its own.
c. From the data you can see trends, but from there you will need to adjust
your organizational way of working to take advantage of the insights.
#1 – It’s so WAY COOL!!
a. Everybody LOVES that chart and graph that lays out exactly how we’re
doing.
b. Drooling over a piece of that pie chart; bellying up to that bar graph;
wrapping your arms around those ample pivot charts
c. The DR is the data provider; your charting/graphing choices are only the
benefit of having a centralized data store
d. Beware: what’s worse than ‘Garbage in, Garbage out’? Look out for ‘Garbage
in, Gospel out’; the flash and dazzle could blind you to inadequate and/or inaccurate data
e. BUT with a DR correctly built and vetted for accuracy and precision, a
well-designed and placed chart/graph/spreadsheet can be a thing of beauty
The President’s Initiative
The Concerns #1– senior management
a. The Plan will require a new way of looking at the data to determine ‘How
are we doing?’
b. Not sure what measures will be needed
The Concerns #2– operationally
a. The complexity of reporting was only getting worse
b. Interestingly, the access to data was creating a situation where technical
users were creating their own reports that should be produced elsewhere resulting in
inaccurate reporting
c. Report sharing – using reports for purposes that they were NOT designed
for
d. External reporting changes and new reporting requirements were pointing
out a need for a more efficient reporting method
Membership and Roles:
a. Structure of PCUG at Colby-Sawyer – a member of each operational
department with ‘one foot in IT and the other in the user area;’ meet bi-weekly to discuss
the use of PC– new options, upgrades, reporting requirements, etc.
b. PCUG formed the core of the committee
c. Added the IT DBA, the Director of IR, and other users of the reporting who are
not necessarily PC users
Statement of Understanding – the ‘Project Charter’:
a. Describes the scope of the project, project/task process, reporting
b. A written document in the ‘Language of the School’ – use the terms and
definitions that would be understood by your colleagues; stay away from technical jargon.
It doesn’t help if the charter is written in such a way that only a single
individual/department can understand it.
c. Part of the charter will describe how the DR will be built but ALSO how it
will be maintained! Without constant attention to what the DR holds and HOW it holds the
data it will soon fall to disuse and obsolescence.
Reporting:
a. This is a project that will influence the campus-wide reporting and
analysis into the future; it will start to influence how you even talk-the-talk of the
institution. As such, the charter needs to be a public document.
b. Not only the charter but the progress of the project should be public– it
does no good if you make the ‘grand announcement’ and then go into some dark
development corner and emerge with some product that people are supposed to trust and
use.
c. Reporting will keep the project in front of the users; no mysteries here
Institutional Support:
a. Time, Talent, Treasure:
b. Time: schedule regular meetings to keep the project committee wheels
turning; attendance must be made a priority. BTW: this is not going to be a quick and easy
project; most documents indicate that getting a DR off the ground is a 3-5 year commitment.
c. Talent: this is a time to garner the most experienced and skilled people you
have to ensure that the knowledge that builds the DR will be the best
d. Treasure: this will cost not only dollars for the technical infrastructure, but
also the time your committee members spend away from other duties
e. Senior Management support – keep the committee members’ feet to the
fire of attending and participating in the DR project; cut them the slack needed to contribute,
ask for reporting back from the meetings, and stay engaged in their work.
Identify the building blocks for the Data Repository:
a. This is a gathering/organizing phase and NOT a design phase
b. It’s helpful to track and name the different business processes that each
department executes. For example: in Admissions look at the candidate funnel process–
what distinct steps do you go through to get someone in the door? In Advancement– name
the areas that address Alumni relations, Major gift stewarding, etc.
c. Identify the areas not through what setflows are used but how you collect
data and how you report data
d. Each one of these identified processes is a candidate for becoming a
distinct subsection of the DR– these will be your datamarts
Collect the data ‘end points’
a. An ‘end point’ is simply a place where your collected data will show up in
the decision making and reporting process anywhere in the institution
b. Look at:
your internal (i.e. departmental) reporting – the reporting that keeps
your department running
interdepartmental reporting – those reports that you’re constantly
sending to other departments to keep your departments coordinated and allow the others to
do THEIR jobs
Senior Management – what reports are you sending up the ‘food
chain’ so they can make their decisions
External reporting – these are legion: IPEDS, VSE, NSLC, CASE, NEASC,
FastFacts, etc. Reports that your IR people are creating to report out the status of the
institution
c. Look for those manual reports, spreadsheets, post-it notes that are being
kept in ‘shadow systems’ – those extras that are used to make decisions
Selecting the software platform:
a. The database structure, development language product, and reporting tool
b. Often overlooked, and as such it can be ignored while ‘tools of convenience’ slip in
to create a DR reporting/development coordination nightmare.
c. This is where your DBA / IT members will be most helpful!
Select from the datamarts a sample mart that will be used to prototype your development
efforts.
Keep it:
Small
Well defined
Contained
Public
Remember this first one will be a model for the development process for what follows.
Once decided upon, determine which data ‘end points’ are utilized and decide on a sample
data set that will be used in the development process.
The ‘data set’ should include a typical reporting group of data (i.e. an academic year’s
worth of students; a fiscal year’s worth of giving, etc.); nothing too large or you’ll get
bogged down in seas of data when you want to verify the accuracy of your datamart
development procedures. Part of the data set should also include a sampling of your
‘problem children’ – any data that you know you will come across but that is very unusual in
content (e.g. double majors, Alumni with more than one class year, gifts through a DAF, etc.);
these will help stress test your developing system.
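One way to assemble such a sample can be sketched in Python. This is only an illustration: the record layout, the field names (academic_year, majors), and the edge-case tests are assumptions made for the example, not anything taken from PowerCAMPUS or from our own datamart.

```python
def build_sample(records, year, edge_cases):
    """Return one reporting year's records plus at least one example of
    each 'problem child', and a list of edge cases with no example at all."""
    sample = [r for r in records if r["academic_year"] == year]
    missing = []
    for label, test in edge_cases.items():
        if any(test(r) for r in sample):
            continue  # the year's data already exercises this case
        extra = next((r for r in records if test(r)), None)
        if extra is None:
            missing.append(label)  # no example exists anywhere; flag it
        else:
            sample.append(extra)   # borrow an example from another year
    return sample, missing


# Hypothetical records and edge-case definitions
records = [
    {"id": 1, "academic_year": 2014, "majors": ["Biology"]},
    {"id": 2, "academic_year": 2014, "majors": ["Nursing"]},
    {"id": 3, "academic_year": 2013, "majors": ["History", "English"]},
]
edge_cases = {
    "double major": lambda r: len(r["majors"]) > 1,
    "no declared major": lambda r: not r["majors"],
}

sample, missing = build_sample(records, 2014, edge_cases)
print([r["id"] for r in sample])
print(missing)
```

The list of missing cases is worth reviewing with the committee: an edge case with no example in the data can only be tested with a hand-built record.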
The Bill of Materials
Collect the datamart’s data elements:
a. When labeling/describing your data elements do not use
application-specific terms; use the ‘language of the school’
b. Application talk will come later
c. Don’t try to merge terms across end-points yet, do it when you’ve
identified the complete set
Collate the collected elements:
a. NOW combine elements. Look for
Elements with different names that describe the same data
Elements with the same name that describe different data
b. When labeling the elements for the DR work across departments so that
your naming will work no matter what department is using the term; this can be a
challenge– we all love our terms!
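The two things to look for when collating can be found mechanically once the elements are listed. Here is a minimal Python sketch, assuming each collected element was written down as (end point, name used there, what the data actually is). The sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical collected elements: (end point, local name, what the data is)
collected = [
    ("Registrar roster", "Class Year", "expected graduation year"),
    ("Alumni mailing", "Class Year", "preferred reunion year"),
    ("Admissions funnel", "Entry Term", "first term of enrollment"),
    ("Registrar roster", "Matric Term", "first term of enrollment"),
]


def collate(elements):
    """Return (synonyms, homonyms) found among the collected elements."""
    names_by_meaning = defaultdict(set)
    meanings_by_name = defaultdict(set)
    for _end_point, name, meaning in elements:
        names_by_meaning[meaning].add(name)
        meanings_by_name[name].add(meaning)
    # Different names that describe the same data
    synonyms = {m: sorted(n) for m, n in names_by_meaning.items() if len(n) > 1}
    # The same name describing different data
    homonyms = {n: sorted(m) for n, m in meanings_by_name.items() if len(m) > 1}
    return synonyms, homonyms


synonyms, homonyms = collate(collected)
print(synonyms)
print(homonyms)
```

The code only surfaces the candidates; deciding which name wins is still a conversation between departments.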
Construct the Data Dictionary:
a. Select a method for collecting and storing your data elements– Excel,
SharePoint, home-built database (we use an Access database for this purpose), etc.
b. Whatever it is make it THE place to go for your definitions– the arbiter
c. Make a member of the project committee the ‘keeper’ of the dictionary–
funnel the definitions, etc. through this person for recording
d. Make sure it’s available to each member; later it should be available to
anyone who will be using the DR.
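Whatever tool holds the dictionary, the shape of an entry is much the same. The Python sketch below is illustrative only: the field names, the keeper, and the sample entry are assumptions, and it does not reflect the layout of our Access database.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    name: str          # agreed, application-neutral name
    definition: str    # written in the 'language of the school'
    steward: str       # department responsible for the data
    source_names: list = field(default_factory=list)  # names used at end points


class DataDictionary:
    """A single arbiter for definitions; refuses silent redefinition."""

    def __init__(self, keeper):
        self.keeper = keeper
        self._entries = {}

    def record(self, entry):
        key = entry.name.lower()
        if key in self._entries:
            raise ValueError(
                f"'{entry.name}' is already defined; revise it through {self.keeper}")
        self._entries[key] = entry

    def lookup(self, name):
        return self._entries[name.lower()]


# Hypothetical usage
dictionary = DataDictionary(keeper="the dictionary keeper")
dictionary.record(DictionaryEntry(
    "Class Year", "expected graduation year", "Registrar",
    source_names=["Class Year", "Grad Yr"]))

try:
    dictionary.record(DictionaryEntry("class year", "preferred reunion year", "Alumni Relations"))
    rejected = False
except ValueError:
    rejected = True

print(dictionary.lookup("CLASS YEAR").steward, rejected)
```

Rejecting a second definition of the same name is one way to enforce the 'keeper' role: changes go through a person, not around them.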
Getting the DBA very involved
The translation of all the logical work into the needed technology
This is where the IT membership on the steering committee will be
most useful – translating and talking through the options
Security Issues
This is where the access rights will be granted; the overall
understanding of who should appropriately have access to which tables
could (and likely will) shape the actual structure of the tables within the database
Structuring the Schema
Collect the data elements based on relevancy – work through by
describing the differing entities (i.e. a Student, a Donor, etc.)
What will be structured as a table; what will be structured as a data
view (define a table vs. a view)
Define and build the Tables and Views
Decide on the Logical collections (joins) of tables
When building Views, look to the data end-points to help identify and
collect what the common views might be (example: Registrar’s reporting – 17
reports with only 3 different data extracts (views) once we looked at them)
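For those less familiar with the table vs. view distinction, here is a minimal sketch in Python, using the built-in sqlite3 module as a stand-in for a real database server. The table, view, and column names are hypothetical. A table physically stores rows; a view stores only a query, and several reports can draw on the same view.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.executescript("""
    -- Tables physically store the rows loaded into the datamart.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, class_year INTEGER);
    CREATE TABLE enrollment (student_id INTEGER, term TEXT, credits REAL);

    -- A view stores only a query; it is re-evaluated each time it is read.
    CREATE VIEW v_term_enrollment AS
    SELECT s.student_id, s.name, s.class_year, e.term, e.credits
    FROM student AS s
    JOIN enrollment AS e ON e.student_id = s.student_id;
""")
con.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(1, "Ada", 2015), (2, "Ben", 2016)])
con.executemany("INSERT INTO enrollment VALUES (?, ?, ?)",
                [(1, "2014FA", 16.0), (2, "2014FA", 9.0)])

# Two different 'reports' drawing on the same view
roster = con.execute(
    "SELECT name FROM v_term_enrollment WHERE term = '2014FA' ORDER BY name"
).fetchall()
full_time = con.execute(
    "SELECT COUNT(*) FROM v_term_enrollment WHERE term = '2014FA' AND credits >= 12"
).fetchone()[0]
print(roster, full_time)
```

The join lives in one place (the view), so neither report needs to know how student and enrollment are related.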
Example: Why, you might ask, would I counsel you to keep close track of where you do
your transformations? Remember 3-4 years ago when, I think it was Ellucian,
changed the way phone numbers were being stored in the database? I was in the
middle of shaking down our Students datamart at the time; this change required that I
add 3 fields to the datamart (to capture the number), add one query (an extract),
and modify one query to capture the data. All the queries and views that needed
the phone number needed no changes at all. The datamart took care of that.
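The insulating effect described in this example can be sketched in Python with sqlite3. This is NOT the actual change we made: the details of the vendor's change are not reproduced here, so the split of the phone number into three parts, and every table, view, and function name, are assumptions for illustration. The point is that only the extract step changes; the downstream query is untouched.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE mart_student (student_id INTEGER PRIMARY KEY, phone TEXT);
    CREATE VIEW v_contact AS SELECT student_id, phone FROM mart_student;
""")


def extract_old(row):
    # Hypothetical 'before': the source stored one phone field
    return row["id"], row["phone"]


def extract_new(row):
    # Hypothetical 'after': the source stores the number in parts;
    # the transformation is done here, in one known place
    return row["id"], f"{row['area']}-{row['exchange']}-{row['line']}"


def load(rows, extract):
    con.executemany("INSERT OR REPLACE INTO mart_student VALUES (?, ?)",
                    [extract(r) for r in rows])


def contact_report():
    # Downstream query: identical before and after the source change
    return con.execute(
        "SELECT student_id, phone FROM v_contact ORDER BY student_id").fetchall()


load([{"id": 1, "phone": "603-555-0100"}], extract_old)
before = contact_report()

load([{"id": 1, "area": "603", "exchange": "555", "line": "0100"}], extract_new)
after = contact_report()

print(before == after)
```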
Define the queries/scripts
This is where the technical decisions made by the committee come
into play
The table/view structure and the data dictionary will have a bit of a
‘shotgun’ wedding – the elements defined in the dictionary will now need to be
translated from the source application (e.g. PowerCAMPUS) to the datamart – the
data will be extracted -> transformed -> loaded – the ETL process, as it’s called in BI
parlance
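The extract -> transform -> load sequence can be sketched in a few lines of Python. Everything here is a stand-in: two in-memory sqlite3 databases play the source application and the datamart, and the table names, columns, and status codes are hypothetical, not PowerCAMPUS structures.

```python
import sqlite3

source = sqlite3.connect(":memory:")  # stand-in for the application database
mart = sqlite3.connect(":memory:")    # stand-in for the datamart

source.execute("CREATE TABLE people (id INTEGER, last TEXT, first TEXT, status_code TEXT)")
source.executemany("INSERT INTO people VALUES (?, ?, ?, ?)",
                   [(1, " lovelace", "ada ", "A"), (2, "Hopper", "Grace", "W")])
mart.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, full_name TEXT, status TEXT)")

STATUS = {"A": "Active", "W": "Withdrawn"}  # hypothetical code translations


def extract():
    """Pull the raw rows from the source application."""
    return source.execute("SELECT id, last, first, status_code FROM people").fetchall()


def transform(rows):
    """Clean names and translate codes into the 'language of the school'."""
    out = []
    for pid, last, first, code in rows:
        full_name = f"{first.strip().title()} {last.strip().title()}"
        out.append((pid, full_name, STATUS.get(code, "Unknown")))
    return out


def load(rows):
    """Replace the datamart table contents in one transaction."""
    with mart:  # commits on success, rolls back on error
        mart.execute("DELETE FROM student")
        mart.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)


load(transform(extract()))
loaded = mart.execute("SELECT * FROM student ORDER BY student_id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it easy to say exactly where each transformation happens.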
Build the refresh process
To ‘refresh’ the datamart (repository) is to update the data
repository, either in total or partially, to reflect the state of the
data as of a certain point in time.
Discuss real-time vs. real ENOUGH time – what is current enough to make
for meaningful reporting
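A total versus partial refresh, with the 'as of' point recorded, might look like the following Python sketch. The gift table, the refresh_log table, and the sample rows are assumptions for illustration; timestamps are ISO-formatted text so that they compare correctly as strings.

```python
import sqlite3

mart = sqlite3.connect(":memory:")
mart.executescript("""
    CREATE TABLE gift (gift_id INTEGER PRIMARY KEY, amount REAL, modified TEXT);
    CREATE TABLE refresh_log (mart TEXT PRIMARY KEY, as_of TEXT);
""")


def refresh(source_rows, as_of, since=None):
    """Full refresh when `since` is None; otherwise load only rows
    modified after `since`. Returns the number of rows loaded."""
    rows = [r for r in source_rows if since is None or r[2] > since]
    with mart:  # one transaction: the data and its 'as of' stamp move together
        if since is None:
            mart.execute("DELETE FROM gift")
        mart.executemany("INSERT OR REPLACE INTO gift VALUES (?, ?, ?)", rows)
        mart.execute("INSERT OR REPLACE INTO refresh_log VALUES ('gift', ?)", (as_of,))
    return len(rows)


# Hypothetical source rows: (gift_id, amount, last-modified timestamp)
source_rows = [
    (1, 100.0, "2014-06-01T09:00:00"),
    (2, 250.0, "2014-06-17T16:30:00"),
]
full_count = refresh(source_rows, as_of="2014-06-17T23:00:00")

source_rows.append((3, 50.0, "2014-06-18T08:15:00"))
partial_count = refresh(source_rows, as_of="2014-06-18T23:00:00",
                        since="2014-06-17T23:00:00")

as_of = mart.execute("SELECT as_of FROM refresh_log WHERE mart = 'gift'").fetchone()[0]
total = mart.execute("SELECT COUNT(*) FROM gift").fetchone()[0]
print(full_count, partial_count, as_of, total)
```

Recording the 'as of' stamp gives every report an honest answer to "how current is this?", which is the heart of the real-time vs. real-enough-time discussion.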
Considerations
Getting the datamart up and running
Initial testing
Vetting the test data set
Going live (well almost)
Remember buy-in
Probably one of the toughest parts of the whole process
WILL require running parallel reports to validate the mart
Watch out for the changes made -> that innocent change to a report structure
or how a data element is defined ‘with the new system’ could be a stumbling
block to acceptance
Major credibility issues at this point -> especially for the prototype
Approval and sign off -> celebration time, really! Party down!! It’s hard work,
it shows, and it’s usable!
A word of caution here – so impressed will your users be with what you’ve
accomplished they’ll holler for more! Beware of the “You did that, can you do
this…” syndrome. Take the suggestions, acknowledge them, look at them as
evidence that your users ARE interested and excited, log them into your
project… as items that will form the core of, let’s call it, Version 2.0. You can
get very hung up with unplanned-for iterations.
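The parallel reports called for above are easier to vet when the comparison itself is automated. A minimal Python sketch, assuming both the existing report and the datamart report can be reduced to the same keys (here, a headcount per term); the figures are made up for the example.

```python
def compare_reports(legacy, mart, tolerance=0.005):
    """Compare two reports keyed the same way; return the discrepancies."""
    problems = []
    for key in sorted(set(legacy) | set(mart)):
        if key not in mart:
            problems.append((key, "missing from datamart report"))
        elif key not in legacy:
            problems.append((key, "missing from legacy report"))
        elif abs(legacy[key] - mart[key]) > tolerance:
            problems.append((key, f"legacy={legacy[key]} datamart={mart[key]}"))
    return problems


# Hypothetical headcounts by term from the old report and from the datamart
legacy_report = {"2014FA": 1310, "2015SP": 1275}
mart_report = {"2014FA": 1310, "2015SP": 1268, "2015SU": 40}

discrepancies = compare_reports(legacy_report, mart_report)
for key, note in discrepancies:
    print(key, note)
```

A discrepancy is not automatically a datamart error; it may expose a definition that quietly changed, which is exactly the kind of thing to explain before asking for sign-off.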
Myth: http://en.wikipedia.org/wiki/Sisyphus
Doing some demo:
Description of our platform
MS SQL database structure (I am, admittedly, NOT a systems type)
PowerCAMPUS w/ all setflows minus Academic Plan, PFaids, Moodle (with Remote Learner), Symplicity Residence
Development – MS-Access 2010
Reporting – through Access, moving to Evisions Argos reporting (2 years now)
Short stint with Crystal and MSRS
Very little VistaView/VistaReports – mostly through the Billing setflow
Scheduled refresh tasks – MSIS, MARS
Traps:
Time, talent, treasure– not spending the time, not utilizing your best
talent, not investing in the infrastructure required
Buy in– non-involvement and even ‘sabotage’, not only by management but
also by the users. Remember Machiavelli!
Shadow Systems– there will be those systems that lie hidden which will
work to undermine the effort to make the DR relevant
Data “Ownership” – Different departments will view the data work
they do as creating ‘their’ data and will make the effort to keep access
to it under their control (this could be the largest hurdle you’ll
encounter)
Gate Keeping– goes along with the above ‘data ownership’ issue– user
departments will insist that they have the authority to limit who sees
what
Immediacy– When that last-minute, gotta-have-it-now report looms,
there will be the temptation to pull the data “just this once” using the
old ways of doing things; this undermines the understanding that the
DR will be the source of reporting. Senior managers need to get into
this one and insist that the DR be used even if the immediacy is slowed
a bit
But, there are promises as well! Click
Promises
Time, talent, treasure– Once established (and even during the build
process) time will be saved not only in building those reports and charts
but also in verifying the data; time saved to use in moving the
school forward and coming up with new ways of pulling and analyzing.
As the DR grows, the work of the committee will build the talent of your
staff across departmental divides (which disappear through the
cooperation of the committee); and that is a treasure for the school:
well-trained users accessing a uniformly designed and agreed-upon data
store.
Consistent Reporting– your reporting will benefit through the clear
understanding of what reports contain and how they were developed
and what each data element on the report represents
Higher data integrity – across campus; consistent and agreed-upon
definitions and methods for creation will lead to high trust in what the
user sees and uses
Informed reporting – users will no longer ask “What does this report do?”; this will
aid in using the proper report for the proper reason.
Buy in – the more involvement in the DR development process you have,
the more the users will come to trust it, and with that they will come to
accept, use, and move to further refine your DR
Data Ownership: Ownership becomes Stewardship– each department
will understand its part in the creation and maintenance (the care
and feeding) of the DR. Increased trust in the use of and ‘professional’
access to the data will allow possessive users to loosen their hold on the
gatekeeping function.
PLUS– It’s WAY COOL!!