Django Meetup: Django Multicolumn Joins
Jeremy Tillman
Software Engineer, Hearsay Social
@hssengineering
Django Multicolumn Joins | © 2012 Hearsay Social 2
About Me
• Joined Hearsay Social in May 2012 as a software engineering generalist
• Computer Engineer BA, Purdue University
• 3 years @ Microsoft working on versions of Windows Server
• 9 years of database experience: Access, SQL Server, MySQL
• Loves Sea Turtles!
Why do we want multicolumn joins?
Django First App: Poll example
from django.db import models

class Poll(models.Model):
    question = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')

class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
What if we stored Polls for any number of customers?
class Customer(models.Model):
    name = models.CharField(max_length=100)

    class Meta:
        ordering = ('name',)

class Poll(models.Model):
    customer = models.ForeignKey(Customer)
    question = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')

class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
CREATE TABLE customer (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
CREATE TABLE poll (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    question VARCHAR(200) NOT NULL,
    pub_date DATETIME NOT NULL,
    INDEX idx_customer (customer_id)
);
CREATE TABLE choice (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    poll_id INT NOT NULL,
    choice_text VARCHAR(200),
    votes INT NOT NULL DEFAULT 0,
    INDEX idx_poll (poll_id)
);
How is our data being stored?
CREATE TABLE choice (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    poll_id INT NOT NULL,
    choice_text VARCHAR(200),
    votes INT NOT NULL DEFAULT 0,
    INDEX idx_poll (poll_id)
);

id          poll_id  choice_text       votes
1           1        Ham               5
2           7        Aries             8
3           2        Elephant          9
…           …        …                 …
23,564,149  1        All of the above  2
23,564,150  74       Sea turtle        7

Rows are laid out in primary-key (id) order, so the choices for one poll (poll 1: ids 1 and 23,564,149) end up far apart.
Data locality part 1: Scope by poll
CREATE TABLE choice (
    id INT NOT NULL,
    poll_id INT NOT NULL,
    choice_text VARCHAR(200),
    votes INT NOT NULL DEFAULT 0,
    PRIMARY KEY (poll_id, id)
);

id          poll_id  choice_text       votes
1           1        Ham               5
1,562       1        Turkey            46
23,564,149  1        All of the above  2
…           …        …                 …
18,242,234  74       Jelly fish        0
23,564,150  74       Sea turtle        7
Data locality part 2: Scope by customer
CREATE TABLE choice (
    id INT NOT NULL,
    customer_id INT NOT NULL,
    poll_id INT NOT NULL,
    choice_text VARCHAR(200),
    votes INT NOT NULL DEFAULT 0,
    PRIMARY KEY (customer_id, poll_id, id)
);

id          poll_id  customer_id  choice_text       votes
1           1        1            Ham               5
1,562       1        1            Turkey            46
23,564,149  1        1            All of the above  2
18,242,234  74       1            Jelly fish        0
23,564,150  74       1            Sea turtle        7
…           …        …            …                 …
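To see what the (customer_id, poll_id, id) key buys, here is a small sketch using Python's built-in sqlite3 as a stand-in for MySQL. It assumes SQLite's WITHOUT ROWID tables as a rough analogue of InnoDB's clustered primary key; the rows are the ones in the table above. It shows that the same id may be reused under another customer, that the full composite key stays unique, and that one customer's rows come back as a single primary-key range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID stores rows in primary-key order (an assumed, rough analogue
# of InnoDB's clustered index), so one customer's rows sit next to each other.
conn.execute(
    """CREATE TABLE choice (
           id INTEGER NOT NULL,
           customer_id INTEGER NOT NULL,
           poll_id INTEGER NOT NULL,
           choice_text VARCHAR(200),
           votes INTEGER NOT NULL DEFAULT 0,
           PRIMARY KEY (customer_id, poll_id, id)
       ) WITHOUT ROWID"""
)

# (id, poll_id, customer_id, choice_text, votes), copied from the table above.
rows = [
    (1, 1, 1, "Ham", 5),
    (1562, 1, 1, "Turkey", 46),
    (23564149, 1, 1, "All of the above", 2),
    (18242234, 74, 1, "Jelly fish", 0),
    (23564150, 74, 1, "Sea turtle", 7),
]
conn.executemany(
    "INSERT INTO choice (id, poll_id, customer_id, choice_text, votes) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Reusing id 1 for a different customer is allowed...
conn.execute(
    "INSERT INTO choice (id, poll_id, customer_id, choice_text) "
    "VALUES (1, 1, 2, 'Ham')"
)

# ...but repeating a full (customer_id, poll_id, id) key is rejected.
try:
    conn.execute(
        "INSERT INTO choice (id, poll_id, customer_id, choice_text) "
        "VALUES (1, 1, 1, 'Dup')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# All of customer 1's choices, read back in primary-key order.
customer_1 = [
    text
    for (text,) in conn.execute(
        "SELECT choice_text FROM choice WHERE customer_id = 1 "
        "ORDER BY customer_id, poll_id, id"
    )
]
```

The trade-off is that id stops being globally unique on its own; only the full tuple identifies a row, which is exactly what the rest of the talk has to teach Django about.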
Representation in Django Models
class Customer(models.Model):
    name = models.CharField(max_length=100)

    class Meta:
        ordering = ('name',)

class Poll(models.Model):
    customer = models.ForeignKey(Customer)
    question = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')

class Choice(models.Model):
    customer = models.ForeignKey(Customer)
    poll = models.ForeignKey(Poll)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
Customer Load/Data Balance
customer_id  id
1            1
2            2
3            3
4            4
Customer Load/Data Balance: Split Customers
customer_id  id
3            3
3            5
4            4
4            6

customer_id  id
1            1
1            5
2            2
2            6
Add DB and Balance Load: id collision
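customer_id  id
3            3
3            5

customer_id  id
1            1
1            5

customer_id  id
2            2
2            6
4            4
4            6

ids 5 and 6 now each belong to two customers, so id alone no longer identifies a row; only (customer_id, id) is unique.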
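The collision can be checked mechanically. This plain-Python sketch copies the three tables above into a dict keyed by hypothetical database names (db_1, db_2, db_3 are labels invented here) and reports which ids are claimed by more than one customer, then confirms that the (customer_id, id) pairs are still unique across all databases.

```python
from collections import defaultdict

# (customer_id, id) rows on each database, copied from the three tables above.
# The database names are hypothetical labels.
shards = {
    "db_1": [(3, 3), (3, 5)],
    "db_2": [(1, 1), (1, 5)],
    "db_3": [(2, 2), (2, 6), (4, 4), (4, 6)],
}


def find_id_collisions(shards):
    """Return {id: sorted customer_ids} for every id used by more than one customer."""
    owners = defaultdict(set)
    for rows in shards.values():
        for customer_id, row_id in rows:
            owners[row_id].add(customer_id)
    return {row_id: sorted(c) for row_id, c in owners.items() if len(c) > 1}


def composite_keys_unique(shards):
    """True if every (customer_id, id) pair occurs exactly once across all shards."""
    keys = [key for rows in shards.values() for key in rows]
    return len(keys) == len(set(keys))


collisions = find_id_collisions(shards)
print(collisions)                     # {5: [1, 3], 6: [2, 4]}
print(composite_keys_unique(shards))  # True
```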
Queries: Find all choices for a poll?

Poll
customer_id  id  question
1            1   What’s your seat pref.?
1            2   Are you married?
2            1   Gender?
2            2   Did you have fun?

Choice
customer_id  poll_id  id  choice_text
1            1        1   Window
1            1        2   Aisle
1            2        1   Yes
1            2        2   No
2            1        1   Male
2            1        2   Female
2            2        1   Yes
Queries: Find all choices for a poll?
Attempt 1) Using related set
target_poll.choice_set.all()
or
Choice.objects.filter(poll=target_poll)
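SELECT * FROM choice WHERE poll_id = 1;

Problem: poll_id = 1 matches poll 1 of both customers, so this returns Window, Aisle, Male and Female.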
Queries: Find all choices for a poll?
Attempt 2) Adding a F expression
from django.db.models import F

target_poll.choice_set.filter(customer=F('poll__customer'))
or
Choice.objects.filter(poll=target_poll,
                      customer=F('poll__customer'))
SELECT c.* FROM choice c
INNER JOIN poll p ON c.poll_id = p.id
WHERE c.poll_id = 1
  AND c.customer_id = p.customer_id;
Queries: Find all choices for a poll?
Attempt 3) Filter explicitly
target_poll.choice_set.filter(customer=target_poll.customer)
or
Choice.objects.filter(poll=target_poll, customer=target_poll.customer)
SELECT * FROM choice
WHERE poll_id = 1
  AND customer_id = 2;
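The three attempts can be compared by running their SQL against the sample Poll and Choice rows. This sketch uses Python's sqlite3 in place of MySQL and assumes the target poll is customer 2's poll 1 ("Gender?"), as in the SQL shown. Run on this data, Attempts 1 and 2 both return all four poll-1 choices: the F() comparison only checks each choice against a poll that shares its id, not against the target poll's customer. Only Attempt 3, which states the customer explicitly, is scoped correctly.

```python
import sqlite3

# Sample rows from the Poll and Choice tables above.
POLLS = [(1, 1, "What's your seat pref.?"), (1, 2, "Are you married?"),
         (2, 1, "Gender?"), (2, 2, "Did you have fun?")]
CHOICES = [(1, 1, 1, "Window"), (1, 1, 2, "Aisle"), (1, 2, 1, "Yes"),
           (1, 2, 2, "No"), (2, 1, 1, "Male"), (2, 1, 2, "Female"),
           (2, 2, 1, "Yes")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poll (customer_id INTEGER, id INTEGER, question TEXT)")
conn.execute(
    "CREATE TABLE choice (customer_id INTEGER, poll_id INTEGER, id INTEGER, choice_text TEXT)"
)
conn.executemany("INSERT INTO poll VALUES (?, ?, ?)", POLLS)
conn.executemany("INSERT INTO choice VALUES (?, ?, ?, ?)", CHOICES)


def texts(sql, params=()):
    """Run a query and return its first column, sorted for easy comparison."""
    return sorted(row[0] for row in conn.execute(sql, params))


# Attempt 1: related set / filter on poll only.
attempt_1 = texts("SELECT choice_text FROM choice WHERE poll_id = ?", (1,))

# Attempt 2: the join produced by the F() expression.
attempt_2 = texts(
    "SELECT c.choice_text FROM choice c INNER JOIN poll p ON c.poll_id = p.id "
    "WHERE c.poll_id = ? AND c.customer_id = p.customer_id",
    (1,),
)

# Attempt 3: customer stated explicitly.
attempt_3 = texts(
    "SELECT choice_text FROM choice WHERE poll_id = ? AND customer_id = ?", (1, 2)
)
```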
Field Assignment
quantity_inn = Customer.objects.create(id=15, name='Quantity Inn')
quantity_poll = Poll.objects.create(id=1, customer=quantity_inn, question='What size bed do you prefer?')
choice1 = Choice(id=1, choice_text='King', poll=quantity_poll)
choice1.customer_id    # ?????? still None: nothing copies it from the poll
choice1.customer = quantity_poll.customer    # repetitive
What do we do?
Solution via Django 1.6
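class ForeignObject(othermodel, from_fields, to_fields[, **options])

where othermodel is the related model, from_fields names fields on the local model, and to_fields names the matching fields on othermodel, paired by position. Import it with:

from django.db.models import ForeignObject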
ForeignObject Usage
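from django.db import models
from django.db.models import ForeignObject

class ForeignModel(models.Model):
    id1 = models.IntegerField()
    id2 = models.IntegerField()

class ReferencingModel(models.Model):
    om_id1 = models.IntegerField()
    om_id2 = models.IntegerField()
    # om_id1 -> id1 and om_id2 -> id2; field names are passed as strings
    om = ForeignObject(ForeignModel,
                       from_fields=('om_id1', 'om_id2'),
                       to_fields=('id1', 'id2'))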
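The core idea is positional pairing: the i-th entry of from_fields is compared with the i-th entry of to_fields, and the equalities are ANDed into one join condition. The helper below is a hypothetical illustration of that pairing, not Django's internal code; the table aliases ("r", "f", "c", "p") are assumptions chosen for readability.

```python
def join_condition(local_alias, from_columns, remote_alias, to_columns):
    """Hypothetical helper: pair columns by position and AND the equalities."""
    if len(from_columns) != len(to_columns):
        raise ValueError("from_fields and to_fields must have the same length")
    return " AND ".join(
        f"{local_alias}.{src} = {remote_alias}.{dst}"
        for src, dst in zip(from_columns, to_columns)
    )


# ReferencingModel.om -> ForeignModel
condition = join_condition("r", ("om_id1", "om_id2"), "f", ("id1", "id2"))
print(condition)  # r.om_id1 = f.id1 AND r.om_id2 = f.id2

# The Choice -> Poll relation used in the rest of the talk
print(join_condition("c", ("customer_id", "poll_id"), "p", ("customer_id", "id")))
# c.customer_id = p.customer_id AND c.poll_id = p.id
```

The length check reflects the one structural rule the pairing implies: every local field needs exactly one partner on the other model.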
Conversion from ForeignKey to ForeignObject
Before:

class Choice(models.Model):
    customer = models.ForeignKey(Customer)
    poll = models.ForeignKey(Poll)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)

After:

class Choice(models.Model):
    customer = models.ForeignKey(Customer)
    poll_id = models.IntegerField()
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
    poll = models.ForeignObject(Poll,
                                from_fields=('customer', 'poll_id'),
                                to_fields=('customer', 'id'))
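The practical difference between the two Choice definitions shows up in the join. This sketch, again using Python's sqlite3 as a stand-in for MySQL and the sample Poll and Choice rows from the earlier slides, joins every choice to its poll once on poll id alone (what the single-column ForeignKey amounts to) and once on (customer_id, poll_id), which is the pairing declared by the ForeignObject above.

```python
import sqlite3

# Sample rows from the Poll and Choice tables shown earlier.
POLLS = [(1, 1, "What's your seat pref.?"), (1, 2, "Are you married?"),
         (2, 1, "Gender?"), (2, 2, "Did you have fun?")]
CHOICES = [(1, 1, 1, "Window"), (1, 1, 2, "Aisle"), (1, 2, 1, "Yes"),
           (1, 2, 2, "No"), (2, 1, 1, "Male"), (2, 1, 2, "Female"),
           (2, 2, 1, "Yes")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poll (customer_id INTEGER, id INTEGER, question TEXT)")
conn.execute(
    "CREATE TABLE choice (customer_id INTEGER, poll_id INTEGER, id INTEGER, choice_text TEXT)"
)
conn.executemany("INSERT INTO poll VALUES (?, ?, ?)", POLLS)
conn.executemany("INSERT INTO choice VALUES (?, ?, ?, ?)", CHOICES)


def choice_with_question(on_clause):
    """Join each choice to poll rows using the given ON clause."""
    return conn.execute(
        "SELECT c.choice_text, p.question FROM choice c "
        f"INNER JOIN poll p ON {on_clause}"
    ).fetchall()


# Poll id only: every choice matches one poll per customer (2 each here).
single_column = choice_with_question("c.poll_id = p.id")

# Both columns, as declared by from_fields/to_fields: one poll per choice.
two_column = choice_with_question(
    "c.customer_id = p.customer_id AND c.poll_id = p.id"
)
```

With the single-column join, "Male" is paired with both "Gender?" and "What's your seat pref.?"; with the two-column join it is paired only with "Gender?".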
Queries with ForeignObject
Attempt 1) Using related set
target_poll.choice_set.all()
SELECT * FROM choice
WHERE poll_id = 1
  AND customer_id = 2;
Queries with ForeignObject
Attempt 2) Manually stated
Choice.objects.filter(poll=target_poll)
SELECT * FROM choice
WHERE poll_id = 1
  AND customer_id = 2;
Queries with ForeignObject
Attempt 3) Manually stated with a tuple
Choice.objects.filter(poll=(2, 1))
SELECT * FROM choice
WHERE poll_id = 1
  AND customer_id = 2;
Field Assignment with ForeignObject
quantity_inn = Customer.objects.create(id=15, name='Quantity Inn')
quantity_poll = Poll.objects.create(id=1, customer=quantity_inn, question='What size bed do you prefer?')
choice1 = Choice(id=1, choice_text='King', poll=quantity_poll)
choice1.customer_id    # 15, copied from quantity_poll
choice1.customer = quantity_poll.customer    # not needed
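Why customer_id is already 15: assigning the related object copies the values of its to_fields into the local from_fields. The plain-Python descriptor below is a hypothetical sketch of that behaviour, not Django's implementation. The class name MultiColumnRelation is invented, and the Poll and Choice classes are simplified stand-ins that use the underlying *_id attribute names instead of model fields.

```python
class MultiColumnRelation:
    """Hypothetical descriptor: assigning a related object copies its
    to_attrs values into the owner's from_attrs, position by position."""

    def __init__(self, from_attrs, to_attrs):
        self.from_attrs = from_attrs
        self.to_attrs = to_attrs

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        for src, dst in zip(self.from_attrs, self.to_attrs):
            setattr(instance, src, getattr(value, dst))


class Poll:
    def __init__(self, customer_id, id, question):
        self.customer_id = customer_id
        self.id = id
        self.question = question


class Choice:
    # Mirrors from_fields=('customer', 'poll_id'), to_fields=('customer', 'id').
    poll = MultiColumnRelation(("customer_id", "poll_id"), ("customer_id", "id"))

    def __init__(self, id, choice_text, poll):
        self.id = id
        self.choice_text = choice_text
        self.poll = poll  # side effect: sets customer_id and poll_id


quantity_poll = Poll(customer_id=15, id=1, question="What size bed do you prefer?")
choice1 = Choice(id=1, choice_text="King", poll=quantity_poll)
print(choice1.customer_id, choice1.poll_id)  # 15 1
```

Because the copy happens on every assignment, pointing choice1 at a different poll also moves it to that poll's customer, which is what keeps the two columns consistent.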
“With great power comes great responsibility”
Tuple ordering matters
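Choice.objects.filter(poll=(1, 2))

SELECT * FROM choice
WHERE poll_id = 2
  AND customer_id = 1;

poll = models.ForeignObject(Poll, from_fields=('customer', 'poll_id'), to_fields=('customer', 'id'))

The tuple is read in from_fields order, (customer, poll_id), so (1, 2) means customer 1, poll 2, not poll 1 of customer 2.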
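A minimal sketch of how a tuple value could be bound to columns, assuming (as the SQL above shows) that elements are matched to from_fields by position. bind_tuple and where_clause are hypothetical helpers written for illustration; they are not part of Django.

```python
# Column order implied by from_fields=('customer', 'poll_id') on Choice.poll.
FROM_COLUMNS = ("customer_id", "poll_id")


def bind_tuple(from_columns, values):
    """Hypothetical helper: pair tuple elements with columns by position."""
    if len(values) != len(from_columns):
        raise ValueError("tuple length must match the number of from_fields")
    return dict(zip(from_columns, values))


def where_clause(bindings):
    """Render column bindings as a parameterized WHERE clause."""
    sql = " AND ".join(f"{column} = ?" for column in bindings)
    return sql, tuple(bindings.values())


intended = bind_tuple(FROM_COLUMNS, (2, 1))  # customer 2, poll 1
swapped = bind_tuple(FROM_COLUMNS, (1, 2))   # customer 1, poll 2
print(where_clause(intended))  # ('customer_id = ? AND poll_id = ?', (2, 1))
print(where_clause(swapped))   # ('customer_id = ? AND poll_id = ?', (1, 2))
```

Nothing in the values themselves says which one is the customer, so a swapped tuple produces a valid query that silently targets a different poll.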
IN Operator
Choice.objects.filter(poll__in=[(2, 1), (2, 2)])
SELECT * FROM choice
WHERE (poll_id = 1 AND customer_id = 2)
   OR (poll_id = 2 AND customer_id = 2);
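The OR-of-ANDs expansion above can be generated from a list of tuples. in_as_or is a hypothetical helper sketched here (it is not Django's code); it emits ? placeholders and is run against the sample Choice rows using Python's sqlite3 as a stand-in for MySQL. Tuples follow from_fields order, (customer, poll_id).

```python
import sqlite3


def in_as_or(columns, value_tuples):
    """Hypothetical helper: rewrite (columns) IN (value_tuples) as OR-ed
    groups of equalities, with ? placeholders and a flat parameter list."""
    if not value_tuples:
        return "1 = 0", []  # an empty IN list matches nothing
    group = "(" + " AND ".join(f"{column} = ?" for column in columns) + ")"
    clause = " OR ".join([group] * len(value_tuples))
    params = [value for values in value_tuples for value in values]
    return clause, params


# Sample Choice rows from the earlier slides.
CHOICES = [(1, 1, 1, "Window"), (1, 1, 2, "Aisle"), (1, 2, 1, "Yes"),
           (1, 2, 2, "No"), (2, 1, 1, "Male"), (2, 1, 2, "Female"),
           (2, 2, 1, "Yes")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE choice (customer_id INTEGER, poll_id INTEGER, id INTEGER, choice_text TEXT)"
)
conn.executemany("INSERT INTO choice VALUES (?, ?, ?, ?)", CHOICES)

clause, params = in_as_or(("customer_id", "poll_id"), [(2, 1), (2, 2)])
matches = sorted(
    text
    for (text,) in conn.execute(f"SELECT choice_text FROM choice WHERE {clause}", params)
)
print(clause)   # (customer_id = ? AND poll_id = ?) OR (customer_id = ? AND poll_id = ?)
print(matches)  # ['Female', 'Male', 'Yes']
```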
IN Operator w/queryset
Choice.objects.filter(poll__in=Poll.objects.filter(customer_id=2))

SELECT c.* FROM choice c
WHERE EXISTS (SELECT p.customer_id, p.id
              FROM poll p
              WHERE p.customer_id = 2
                AND p.customer_id = c.customer_id
                AND p.id = c.poll_id);
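Both correlation predicates in the EXISTS subquery matter. This sketch runs the slide's query with Python's sqlite3 (standing in for MySQL) on the sample rows, then runs a variant that drops p.customer_id = c.customer_id. Because customer 2 also owns polls with ids 1 and 2, the id-only variant lets every one of customer 1's choices through.

```python
import sqlite3

# Sample rows from the Poll and Choice tables shown earlier.
POLLS = [(1, 1, "What's your seat pref.?"), (1, 2, "Are you married?"),
         (2, 1, "Gender?"), (2, 2, "Did you have fun?")]
CHOICES = [(1, 1, 1, "Window"), (1, 1, 2, "Aisle"), (1, 2, 1, "Yes"),
           (1, 2, 2, "No"), (2, 1, 1, "Male"), (2, 1, 2, "Female"),
           (2, 2, 1, "Yes")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poll (customer_id INTEGER, id INTEGER, question TEXT)")
conn.execute(
    "CREATE TABLE choice (customer_id INTEGER, poll_id INTEGER, id INTEGER, choice_text TEXT)"
)
conn.executemany("INSERT INTO poll VALUES (?, ?, ?)", POLLS)
conn.executemany("INSERT INTO choice VALUES (?, ?, ?, ?)", CHOICES)


def texts(sql):
    """Run a query and return its first column, sorted."""
    return sorted(text for (text,) in conn.execute(sql))


# The correlated subquery from the slide: both key columns are correlated.
correlated = texts(
    "SELECT c.choice_text FROM choice c WHERE EXISTS ("
    " SELECT p.customer_id, p.id FROM poll p"
    " WHERE p.customer_id = 2"
    " AND p.customer_id = c.customer_id"
    " AND p.id = c.poll_id)"
)

# Correlating on poll id alone admits customer 1's choices as well.
id_only = texts(
    "SELECT c.choice_text FROM choice c WHERE EXISTS ("
    " SELECT 1 FROM poll p"
    " WHERE p.customer_id = 2 AND p.id = c.poll_id)"
)
```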
IN Operator with MySQL
Choice.objects.filter(poll__in=[(2, 1), (2, 2)])

SELECT * FROM choice
WHERE (poll_id, customer_id) IN ((1, 2), (2, 2));
IN Operator w/queryset & MySQL
Choice.objects.filter(poll__in=Poll.objects.filter(customer_id=2))

SELECT c.* FROM choice c
WHERE (c.customer_id, c.poll_id) IN (SELECT p.customer_id, p.id
                                     FROM poll p
                                     WHERE p.customer_id = 2);
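Row-value comparisons are not MySQL-only. This sketch uses Python's sqlite3 to run both row-value forms on the sample rows; it assumes SQLite 3.15 or newer, which added row values. One difference worth knowing: MySQL accepts a literal list of row constructors on the right of IN, as on the slide, while SQLite requires a subquery there, so the literal list is written as a VALUES subquery.

```python
import sqlite3

# Sample rows from the Poll and Choice tables shown earlier.
POLLS = [(1, 1, "What's your seat pref.?"), (1, 2, "Are you married?"),
         (2, 1, "Gender?"), (2, 2, "Did you have fun?")]
CHOICES = [(1, 1, 1, "Window"), (1, 1, 2, "Aisle"), (1, 2, 1, "Yes"),
           (1, 2, 2, "No"), (2, 1, 1, "Male"), (2, 1, 2, "Female"),
           (2, 2, 1, "Yes")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poll (customer_id INTEGER, id INTEGER, question TEXT)")
conn.execute(
    "CREATE TABLE choice (customer_id INTEGER, poll_id INTEGER, id INTEGER, choice_text TEXT)"
)
conn.executemany("INSERT INTO poll VALUES (?, ?, ?)", POLLS)
conn.executemany("INSERT INTO choice VALUES (?, ?, ?, ?)", CHOICES)


def texts(sql):
    """Run a query and return its first column, sorted."""
    return sorted(text for (text,) in conn.execute(sql))


# Literal row values, (poll_id, customer_id) as on the slide.
literal = texts(
    "SELECT choice_text FROM choice "
    "WHERE (poll_id, customer_id) IN (VALUES (1, 2), (2, 2))"
)

# Row values compared against a two-column subquery.
subquery = texts(
    "SELECT c.choice_text FROM choice c "
    "WHERE (c.customer_id, c.poll_id) IN ("
    " SELECT p.customer_id, p.id FROM poll p WHERE p.customer_id = 2)"
)
```

Both forms return the same three choices as the OR expansion and the EXISTS subquery; the row-value syntax simply states the composite comparison directly.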
ForeignKey vs ForeignObject
What's the difference?
ForeignKey is a ForeignObject
pseudo def: ForeignObject(OtherModel, from_fields=('self',), to_fields=(OtherModel._meta.pk.name,))
ForeignKey usage: Order By Example
Poll.objects.order_by('customer')

class Customer(models.Model):
    name = models.CharField(max_length=100)

    class Meta:
        ordering = ('name',)

class Poll(models.Model):
    customer = models.ForeignKey(Customer)
    question = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')
ForeignKey usage: Order By Example
Poll.objects.order_by('customer')

SELECT p.* FROM poll p
INNER JOIN customer c ON p.customer_id = c.id
ORDER BY c.name ASC;
ForeignKey usage: Order By Example
Poll.objects.order_by('customer_id')

SELECT p.* FROM poll p
INNER JOIN customer c ON p.customer_id = c.id
ORDER BY c.name ASC;
'customer_id' is treated as an alias for 'customer', so the join and Customer's default ordering by name still apply.
ForeignKey usage: Order By Example
Poll.objects.order_by('customer__id')

SELECT p.* FROM poll p
INNER JOIN customer c ON p.customer_id = c.id
ORDER BY p.customer_id ASC;
ForeignKey usage: Order By Example
Poll.objects.order_by('customer_id')

SELECT * FROM poll
ORDER BY customer_id ASC;
class Customer(models.Model):
    name = models.CharField(max_length=100)

    class Meta:
        ordering = ('name',)

class Poll(models.Model):
    customer_id = models.IntegerField()
    question = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')
    customer = models.ForeignObject(Customer, from_fields=('customer_id',), to_fields=('id',))
Still more fun stuff
• ForeignObject.get_extra_description_filter
• ForeignObject.get_extra_restriction
• More to come
Dig for more information:
• ForeignObject source: django/db/models/fields/related.py
• V1 version of the patch (based on Django 1.4): https://github.com/jtillman/django/tree/MultiColumnJoin
• Blog post to come: Hearsay Social Blog (http://engineering.hearsaysocial.com/)
Questions?