Full-Text Search in Django with PostgreSQL EuroPython 2017 - Rimini, 2017-07-12 Paolo Melchiorre - @pauloxnet |


Full-Text Search in Django with PostgreSQL

EuroPython 2017 - Rimini, 2017-07-12

Paolo Melchiorre - @pauloxnet


Paolo Melchiorre |

▪Computer Science Engineer

▪Backend Python Developer (>10yrs)

▪Django Developer (~5yrs)

▪Senior Software Engineer @ 20Tab

▪Happy Remote Worker

▪PostgreSQL user, not a DBA


Goal |

“To show how we have used Django

Full-Text Search and PostgreSQL

in a Real Project”


Motivation |

“To implement Full-Text Search using only

Django and PostgreSQL functionalities,

without resorting to external tools.”


Agenda |

▪Full-Text Search

▪Existing Solutions

▪PostgreSQL Full-Text Search

▪Django Full-Text Search Support

▪www.concertiaroma.com project

▪What’s next

▪Conclusions

▪Questions


Full-Text Search |

“… Full-Text Search* refers to techniques

for Searching a single computer-stored

Document or a Collection

in a Full-Text Database …”

-- Wikipedia

* FTS = Full-Text Search


Features of a FTS |

▪Stemming

▪Ranking

▪Stop-words

▪Multiple languages support

▪Accent support

▪Indexing

▪Phrase search
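As a rough illustration of the first items in this list (stop-words, stemming, word positions), the sketch below builds a tsvector-like mapping in plain Python. The stop-word set and the suffix-stripping rule are toy assumptions, not PostgreSQL's dictionaries or stemmer.

```python
import re

# Toy assumptions: a tiny stop-word set and a naive suffix stripper.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "with"}
SUFFIXES = ("ing", "es", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_vector(text):
    """Map each lexeme to the 1-based positions where it occurs."""
    vector = {}
    for position, word in enumerate(re.findall(r"\w+", text.lower()), start=1):
        if word in STOP_WORDS:
            continue  # stop-words use up a position but are not indexed
        vector.setdefault(naive_stem(word), []).append(position)
    return vector


print(to_vector("Searching the recipes and the searches"))
# {'search': [1, 6], 'recip': [3]}
```

"Searching" and "searches" collapse into one lexeme, so a query for either form finds both.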


Tested Solutions |


Elasticsearch |

Project: Snap Market (~500k mobile users)

Issues:

▪Management problems

▪Patching a Java plug-in

@@ -52,7 +52,8 @@ public class DecompoundTokenFilter … {
-            posIncAtt.setPositionIncrement(0);
+            if (!subwordsonly)
+                posIncAtt.setPositionIncrement(0);
             return true;
         }


Apache Solr |

Project: GoalScout (~25k videos)

Issues:

▪Synchronization problems

▪All writes to PostgreSQL and reads from Solr


Existing Solutions |

PROS

▪Full featured solutions

▪Resources (documentation, articles, …)

CONS

▪Synchronization

▪Mandatory use of a driver (haystack, bungiesearch…)

▪Ops Oriented: focus on system integrations


FTS in PostgreSQL |

▪FTS Support since version 8.3 (~2008)

▪TSVECTOR to represent text data

▪TSQUERY to represent search predicates

▪Special Indexes (GIN, GiST)

▪Phrase Search since version 9.6 (~2016)
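To give an intuition for why an inverted index helps and why phrase search needs word positions, here is a minimal plain-Python sketch. The documents and function names are hypothetical, and the structure is only an analogy for a GIN index over TSVECTOR data, not how PostgreSQL stores it.

```python
from collections import defaultdict

# Hypothetical documents, already normalised to lower-case words.
DOCUMENTS = {
    1: "cheese on toast recipes",
    2: "pizza recipes with cheese",
    3: "toast with jam",
}


def build_index(documents):
    """Map each word to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.split(), start=1):
            index[word].setdefault(doc_id, []).append(position)
    return index


def match_all(index, *words):
    """Return ids of documents containing every word (an AND query)."""
    postings = [set(index.get(word, {})) for word in words]
    return set.intersection(*postings) if postings else set()


def match_phrase(index, first, second):
    """Return ids of documents where `second` directly follows `first`."""
    hits = set()
    for doc_id in match_all(index, first, second):
        following = set(index[second][doc_id])
        if any(pos + 1 in following for pos in index[first][doc_id]):
            hits.add(doc_id)
    return hits


INDEX = build_index(DOCUMENTS)
print(sorted(match_all(INDEX, "cheese", "recipes")))    # [1, 2]
print(sorted(match_phrase(INDEX, "toast", "recipes")))  # [1]
```

The lookup touches only the postings of the queried words instead of scanning every document.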


What are Documents |

“… a Document is the Unit of searching

in a Full-Text Search system; for example,

a magazine Article or email Message …”

-- PostgreSQL documentation


Django Support |

▪Module: django.contrib.postgres

▪FTS Support since version 1.10 (2016)

▪BRIN and GIN indexes since version 1.11 (2017)

▪Dev Oriented: focus on programming


Making queries |

from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=100)
    tagline = models.TextField()

class Author(models.Model):
    name = models.CharField(max_length=200)
    email = models.EmailField()

class Entry(models.Model):
    # on_delete is required from Django 2.0 onwards
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    headline = models.CharField(max_length=255)
    body_text = models.TextField()
    pub_date = models.DateField()
    authors = models.ManyToManyField(Author)


Standard queries |

>>> Author.objects.filter(name__contains='Terry')
[<Author: Terry Gilliam>, <Author: Terry Jones>]
>>> Author.objects.filter(name__icontains='Erry')
[<Author: Terry Gilliam>, <Author: Terry Jones>,
 <Author: Jerry Lewis>]
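The difference between the two lookups is only case sensitivity. A plain-Python analogy over a hypothetical list of names (no database involved) might look like this:

```python
AUTHORS = ["Terry Gilliam", "Terry Jones", "Jerry Lewis"]


def contains(names, term):
    """Case-sensitive substring match, like the `contains` lookup."""
    return [name for name in names if term in name]


def icontains(names, term):
    """Case-insensitive substring match, like the `icontains` lookup."""
    return [name for name in names if term.lower() in name.lower()]


print(contains(AUTHORS, "Terry"))  # ['Terry Gilliam', 'Terry Jones']
print(icontains(AUTHORS, "Erry"))  # all three names
print(contains(AUTHORS, "Erry"))   # []
```

Neither lookup knows about word forms, accents or relevance, which is what the following slides address.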


Unaccented query |

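# UnaccentExtension is a migration operation: list it in a
# migration's operations to install the unaccent extension.
>>> from django.contrib.postgres.operations import UnaccentExtension
>>> UnaccentExtension()
>>> Author.objects.filter(name__unaccent__icontains='Hélèn')
[<Author: Helen Mirren>, <Author: Helena Bonham Carter>,
 <Author: Hélène Joy>]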
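The unaccent lookup strips accents from both the column value and the search term before comparing. The sketch below imitates that with Unicode decomposition; this is an assumption made for illustration, since PostgreSQL's unaccent extension uses its own rules dictionary.

```python
import unicodedata


def unaccent(text):
    """Drop combining marks after NFKD decomposition (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def unaccent_icontains(names, term):
    """Compare accent-stripped, lower-cased forms of both sides."""
    needle = unaccent(term).lower()
    return [name for name in names if needle in unaccent(name).lower()]


AUTHORS = ["Helen Mirren", "Helena Bonham Carter", "Hélène Joy", "Terry Jones"]
print(unaccent("Hélène"))                    # Helene
print(unaccent_icontains(AUTHORS, "Hélèn"))  # the first three names
```

Because both sides are normalised, the term 'Hélèn' becomes 'helen', which is a substring of all three unaccented names.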


Trigram similar |

# TrigramExtension is a migration operation: list it in a
# migration's operations to install the pg_trgm extension.
>>> from django.contrib.postgres.operations import TrigramExtension
>>> TrigramExtension()
>>> Author.objects.filter(name__unaccent__trigram_similar='Hélèn')
[<Author: Helen Mirren>, <Author: Helena Bonham Carter>,
 <Author: Hélène Joy>]
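Trigram similarity compares the sets of three-character sequences in two strings. The sketch below follows the padding rule described in the pg_trgm documentation (two spaces before each word, one after) and scores shared trigrams over all distinct trigrams; it is a simplified approximation, so scores may differ from PostgreSQL's in edge cases.

```python
import re


def trigrams(text):
    """Return the set of trigrams, padding each word as pg_trgm documents."""
    result = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))              # ['  c', ' ca', 'at ', 'cat']
print(similarity("helen", "helen mirren"))  # 0.5
print(similarity("helen", "terry jones"))   # 0.0
```

The trigram_similar lookup keeps rows whose similarity exceeds a threshold, which is 0.3 by default in pg_trgm.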


The search lookup |

>>> Entry.objects.filter(body_text__search='Cheese')

[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]


SearchVector |

>>> from django.contrib.postgres.search import SearchVector
>>> Entry.objects.annotate(
...     search=SearchVector('body_text', 'blog__tagline'),
... ).filter(search='Cheese')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]


SearchQuery |

>>> from django.contrib.postgres.search import SearchQuery
>>> SearchQuery('potato') & SearchQuery('ireland')
# potato AND ireland
>>> SearchQuery('potato') | SearchQuery('penguin')
# potato OR penguin
>>> ~SearchQuery('sausage')
# NOT sausage
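The `&`, `|` and `~` operators build a composite query out of simpler ones. A minimal sketch of that composition pattern, using a hypothetical `Term` class evaluated against a plain list of lexemes (it is not Django's SearchQuery and generates no SQL):

```python
class Term:
    """A query node that matches when its word is among the lexemes."""

    def __init__(self, word):
        self.test = lambda lexemes: word in lexemes

    @classmethod
    def _from_test(cls, test):
        # Build a composite node directly from a predicate.
        node = cls.__new__(cls)
        node.test = test
        return node

    def __and__(self, other):
        return Term._from_test(lambda lx: self.test(lx) and other.test(lx))

    def __or__(self, other):
        return Term._from_test(lambda lx: self.test(lx) or other.test(lx))

    def __invert__(self):
        return Term._from_test(lambda lx: not self.test(lx))

    def matches(self, lexemes):
        return self.test(set(lexemes))


doc = ["potato", "ireland", "famine"]
print((Term("potato") & Term("ireland")).matches(doc))   # True
print((Term("potato") | Term("penguin")).matches(doc))   # True
print((~Term("sausage")).matches(doc))                   # True
print((Term("potato") & ~Term("ireland")).matches(doc))  # False
```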


SearchRank |

>>> from django.contrib.postgres.search import (
...     SearchQuery, SearchRank, SearchVector
... )
>>> vector = SearchVector('body_text')
>>> query = SearchQuery('cheese')
>>> Entry.objects.annotate(
...     rank=SearchRank(vector, query)
... ).order_by('-rank')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]


Search configuration |

>>> from django.contrib.postgres.search import (
...     SearchQuery, SearchVector
... )
>>> Entry.objects.annotate(
...     search=SearchVector('body_text', config='french'),
... ).filter(search=SearchQuery('œuf', config='french'))
[<Entry: Pain perdu>]
>>> from django.db.models import F
>>> Entry.objects.annotate(
...     search=SearchVector('body_text', config=F('blog__lang')),
... ).filter(search=SearchQuery('œuf', config=F('blog__lang')))
[<Entry: Pain perdu>]


Weighting queries |

>>> from django.contrib.postgres.search import (
...     SearchQuery, SearchRank, SearchVector
... )
>>> vector = (
...     SearchVector('body_text', weight='A') +
...     SearchVector('blog__tagline', weight='B')
... )
>>> query = SearchQuery('cheese')
>>> Entry.objects.annotate(
...     rank=SearchRank(vector, query)
... ).filter(rank__gte=0.3).order_by('rank')
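The weight labels map to numbers when ranking: the defaults documented for PostgreSQL's ts_rank are D=0.1, C=0.2, B=0.4 and A=1.0. The sketch below uses those defaults in a deliberately simplified score (a weighted occurrence count, not the ts_rank formula) over hypothetical entries:

```python
# Default weights documented for PostgreSQL's ts_rank: D, C, B, A.
WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}


def toy_rank(weighted_fields, term):
    """Sum the weight of every occurrence of `term` across the fields.

    `weighted_fields` is a list of (text, weight_label) pairs. This is a
    simplified stand-in, not the ts_rank formula.
    """
    score = 0.0
    for text, label in weighted_fields:
        occurrences = text.lower().split().count(term.lower())
        score += WEIGHTS[label] * occurrences
    return score


# Hypothetical entries: (body_text, 'A') and (blog tagline, 'B').
entries = {
    "Cheese on Toast recipes": [("melt the cheese on toast", "A"), ("food blog", "B")],
    "Pizza recipes": [("bake the dough", "A"), ("all about cheese", "B")],
}
ranked = sorted(entries, key=lambda name: toy_rank(entries[name], "cheese"), reverse=True)
print(ranked)  # ['Cheese on Toast recipes', 'Pizza recipes']
```

A match in the body (weight A) outranks a match that appears only in the tagline (weight B).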


SearchVectorField |

>>> Entry.objects.update(
...     search_vector=SearchVector('body_text')
... )
>>> Entry.objects.filter(search_vector='cheese')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]


www.concertiaroma.com |

“… today's shows in the Capital” *

The numbers of the project:

~ 1k venues

> 12k bands

> 15k shows

~ 200 festivals

~ 30k users/month

* since ~2014


Version 2.0 |

Python 2.7 - Django 1.7 - PostgreSQL 9.1 - SQL LIKE


Version 3.0 |

Python 3.6 - Django 1.11 - PostgreSQL 9.6 - PG FTS


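Band Manager |

from django.contrib.postgres.search import (
    SearchQuery, SearchRank, SearchVector
)
from django.db import models

LANG = 'english'

class BandManager(models.Manager):

    def search(self, text):
        # Weighted document: nickname (A), genre names (B), description (D)
        vector = (
            SearchVector('nickname', weight='A', config=LANG) +
            SearchVector('genres__name', weight='B', config=LANG) +
            SearchVector('description', weight='D', config=LANG)
        )
        query = SearchQuery(text, config=LANG)
        rate = SearchRank(vector, query)
        # Annotate the vector first so that filter(search=...) can use it
        return self.get_queryset().annotate(
            search=vector, rate=rate
        ).filter(search=query).distinct(
            'id', 'rate').order_by('-rate', 'id')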


Band Test Setup |

class BandTest(TestCase):

    def setUp(self):
        metal, _ = Genre.objects.get_or_create(name='Metal')
        doom, _ = Genre.objects.get_or_create(name='Doom')
        doomraiser, _ = Contact.objects.get_or_create(
            nickname='Doom raiser', description='Lorem…')
        doomraiser.genres.add(doom)
        forgotten_tomb, _ = Contact.objects.get_or_create(
            nickname='Forgotten Tomb', description='Lorem…')
        forgotten_tomb.genres.add(doom)
        ....


Band Test Method |

from collections import OrderedDict

class BandTest(TestCase):

    def setUp(self):
        ...

    def test_band_search(self):
        band_queryset = Band.objects.search(
            'doom').values_list('nickname', 'rate')
        band_list = [
            ('Doom raiser', 0.675475),
            ('The Foreshadowin', 0.258369),
            ('Forgotten Tomb', 0.243171)]
        self.assertSequenceEqual(
            list(OrderedDict(band_queryset).items()),
            band_list)


What’s next |

▪Misspelling support

▪Multiple language configuration

▪Search suggestions

▪SearchVectorField with triggers

▪JSON/JSONB Full-Text Search

▪RUM indexing


Conclusions |

Conditions to implement this solution:

▪No extra dependencies

▪Not too complex searches

▪Easy management

▪No need to synchronize data

▪PostgreSQL already in your stack

▪Python-only environment


Resources |

▪postgresql.org/docs/9.6/static/textsearch.html

▪github.com/damoti/django-tsvector-field

▪en.wikipedia.org/wiki/Full-text_search

▪docs.djangoproject.com/en/1.11/ref/contrib/postgres

▪PostgreSQL & Django source codes

▪Stack Overflow

▪Google ;-)


Acknowledgements |

Marc Tamlyn

for all the Support for django.contrib.postgres


Thank you |

BY - SA (Attribution-ShareAlike)

creativecommons.org/licenses/by-sa

Slides

speakerdeck.com/pauloxnet


Questions? |

After the talk, Please!

*

* Speak Slowly

I'm not a native English speaker


Contacts |

www.paulox.net

twitter.com/pauloxnet

linkedin.com/in/paolomelchiorre

github.com/pauloxnet