I’ve Always Wanted To Data Model
Ian Varley, Salesforce.com
Data Week, 2013-10-02
Lightning Talk (10 minutes)
Who am I?
Ian Varley
Austin, TX
Salesforce.com
Big Data Team
@thefutureian
What’s Data Modeling?
The act of taking the intelligible structure of the world around us, and
making it concrete enough for computers to act on it.
(More specifically, data modeling usually has to do with storing it in a database.)
Traditionally, data modeling has meant Entity Attribute Relationship
modeling techniques.
There are variants that are more “OO” (like UML) but they share most of the same core assumptions.
Many a project was sunk due to shitty data modeling.
It’s a difficult occupation. You have to be part engineer, part psychologist, and part philosopher.
If you’re doing it, you’re not alone. Lots of smart folks think about this stuff.
(David Hay, Steve Hoberman, Joe Celko, many more.)
But.
The expressive power of our conceptual modeling techniques hasn’t
improved much since the 1970s.
We mostly look at the world in the same static way we did 40 years ago.
Partly, this is because our discipline is wedded to relational (SQL) DBs.
When the only tool you have is a hammer ...
A book that opened my eyes ...
(He said a lot of the stuff I’m about to say back in 1978!)
I don’t have a lot of answers. But I want to raise some questions.
And hopefully, start a conversation.
Here are 5 observations about the tools of traditional data modeling.
#1: nobody actually knows what an “entity” really is.
“Entity” is another word for Category, in linguistics terms.
And an important property of linguistic categories is that they are slippery.
See:
● Steven Pinker: The Stuff Of Thought
● Douglas Hofstadter: Surfaces & Essences
● George Lakoff: Women, Fire, and Dangerous Things
part: an abstract definition of a connected set of physical materials that serve some purpose, and that people are willing to buy
part: one instance of a part type, which arrives on the QA line at a specific time and either does or doesn't meet quality standards
And if you think you can “solve” the problem, I’ve got some World Trade
Center insurance policies to sell you.
That said, there are a couple tools we could adopt that would help:
● First-class Sub- / Super-Typing
● First-class Scoping and Aliasing
(Not that there aren’t ways to do this in ERD models, but they’re unobvious and not widely used.)
#2: entities, attributes, and relationships are really the
same thing, maaaan ...
http://the-hippie-portfolio.tumblr.com/
Say I’ve got a “parent” in my model.
Is it:
● A “parent” entity?
● A “person” entity with an “isParent” attribute?
● Two “person” entities in a “parent” relationship?
It’s all of them; the distinction is arbitrary.
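To make the three readings concrete, here is a minimal Python sketch. All class names, function names, and the sample people ("Ann", "Bob") are hypothetical, made up for illustration. It stores the relationship form and derives the other two from it, which is one way to see that the choice is a matter of viewpoint.

```python
from dataclasses import dataclass

# Option 1: "parent" is its own entity.
@dataclass(frozen=True)
class Parent:
    name: str

# Option 2: a "person" entity with an "is_parent" attribute.
@dataclass(frozen=True)
class PersonWithFlag:
    name: str
    is_parent: bool

# Option 3: two "person" entities joined by a "parent" relationship.
@dataclass(frozen=True)
class Person:
    name: str

@dataclass(frozen=True)
class ParentOf:
    parent: Person
    child: Person

# Hypothetical sample data: Ann is Bob's parent.
ann, bob = Person("Ann"), Person("Bob")
relationships = [ParentOf(parent=ann, child=bob)]

def as_entities(rels):
    """Derive the option 1 view from the option 3 data."""
    return {Parent(r.parent.name) for r in rels}

def as_flags(people, rels):
    """Derive the option 2 view from the option 3 data."""
    parents = {r.parent for r in rels}
    return [PersonWithFlag(p.name, p in parents) for p in people]
```

Going the other way (from a bare isParent flag back to the relationship) is not possible without more data, so the three forms are interchangeable as views but not as storage.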
The real structure is just a graph … but none of our modeling tools are that flexible, nor is it helpful to think that
abstractly about most software.
Normally, we make the choice based on our experience and gut feeling, and
pretend there’s a science to it.
But the whole way of thinking is a convenience based on “records”.
I have no idea what to do about this.
Tools that allow you to view any part of your model in any of those ways?
This isn’t realistic with today’s tools, so this is just idle speculation.
#3: prescriptive models encourage black & white thinking in a gray world
You have to make decisions (about entities, attributes, relationships, types) up front. But sometimes that’s not right.
This is a strength of (some) NoSQL databases: you can do data first, and
surface structure later.
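A rough sketch of what "data first, structure later" might look like, in plain Python with no particular database assumed. The records and the surface_structure helper are hypothetical; the idea is just to store loosely shaped documents and ask afterwards which fields and types actually showed up.

```python
from collections import defaultdict

# Hypothetical records, stored with no schema declared up front.
records = [
    {"name": "Ann", "dept": "Sales"},
    {"name": "Bob", "dept": "Eng", "manager": "Ann"},
    {"name": "Cy", "age": 41},
]

def surface_structure(docs):
    """Return {field: (type names seen, fraction of docs that have the field)}."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for doc in docs:
        for field, value in doc.items():
            types[field].add(type(value).__name__)
            counts[field] += 1
    return {f: (types[f], counts[f] / len(docs)) for f in types}

structure = surface_structure(records)
```

Here "name" appears in every record while "age" and "manager" each appear in only one, which is the kind of evidence you would want before committing to entities and attributes.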
Sometimes the deep structure is actually ambiguous.
This can apply broadly. (What if an employee isn’t really “in” a department, but has
flexible membership based on where she spends her time?)
You can represent that in a traditional data model, sure.
But you’re not encouraged to.
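For example, the flexible-membership case could be sketched like this. The Membership class, the share field, and the sample rows are assumptions made for illustration: membership becomes a weighted association rather than a single department column on the employee.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Membership:
    employee: str
    department: str
    share: float  # fraction of the employee's time (assumed to sum to 1.0)

# Hypothetical data: Ann splits her time, Bob does not.
memberships = [
    Membership("Ann", "Sales", 0.75),
    Membership("Ann", "Eng", 0.25),
    Membership("Bob", "Eng", 1.0),
]

def departments_of(employee, rows):
    """All departments an employee belongs to, with time shares."""
    return {m.department: m.share for m in rows if m.employee == employee}

def primary_department(employee, rows):
    """The black & white answer, derived only when someone asks for it."""
    shares = departments_of(employee, rows)
    return max(shares, key=shares.get)
```

The single "department" answer is still available, but as a derived view rather than a decision baked into the model.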
#4: static models make the time dimension unwieldy
Entity models are generally silent on the ways data changes.
Many modern databases can keep older versions of objects.
But should they? For which entities? How many versions? etc.
Worse, what about when the model changes at runtime, and you need to also retain knowledge of what the old
model was?
As in #3, there are ways to model this in entity models, but it’s not easy, so most people just don’t think about it.
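As a small illustration of the questions a static model leaves open, here is a hypothetical versioned record in Python. The class name, the max_versions retention setting, and the integer timestamps are all assumptions; it also assumes writes arrive in timestamp order and that max_versions is at least 1.

```python
from dataclasses import dataclass, field

@dataclass
class VersionedRecord:
    max_versions: int  # retention policy, chosen per entity (assumed >= 1)
    _history: list = field(default_factory=list)  # (timestamp, value), oldest first

    def put(self, timestamp, value):
        """Record a new version; assumes timestamps only increase."""
        self._history.append((timestamp, value))
        # Drop the oldest versions beyond the retention limit.
        del self._history[:-self.max_versions]

    def as_of(self, timestamp):
        """Latest retained value written at or before timestamp, or None."""
        result = None
        for ts, value in self._history:
            if ts <= timestamp:
                result = value
        return result

# Hypothetical usage: an employee's department over time, keeping 2 versions.
dept = VersionedRecord(max_versions=2)
dept.put(1, "Sales")
dept.put(5, "Eng")
dept.put(9, "Ops")
```

With two versions retained, asking for the value as of time 6 gives "Eng", while the original "Sales" version is gone. Whether that loss is acceptable is exactly the decision the entity model never asks you to make.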
#5: boxes & lines aren’t how we actually think
Our spatial processing of diagrams doesn’t map well to our temporal,
spatial, and causal comprehension of data structure.
What do people really do?
Skip making models when their models look too complicated.
F*** THAT NOISE.
Is there an alternative? Not yet.
What could move the needle?
● Prototype-based modeling
● Proper scoping
● Semantic zooming
The map is not the territory.
In conclusion … if you dig this stuff, let’s talk!
@thefutureian