fficiency-effectiveness trade offs in modern web …theses.di.unipi.it/slides/trani.pdf · lucchese...

16
Roberto Trani roberto.trani@{isti.cnr.it, di.unipi.it} December 5th, 2019 E FFICIENCY -E FFECTIVENESS T RADE - OFFS IN M ODERN W EB S EARCH EFFICIENCY-EFFECTIVENESS TRADE-OFFS IN MODERN WEB SEARCH http://hpc.lab.isti.cnr.it

Upload: others

Post on 15-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

Roberto Traniroberto.trani@{isti.cnr.it, di.unipi.it}

December 5th, 2019

EFFICIENCY-EFFECTIVENESS TRADE-OFFS IN MODERN WEB SEARCH

EFFICIENCY-EFFECTIVENESS TRADE-OFFS IN MODERN WEB SEARCH

http://hpc.lab.isti.cnr.it

Page 2: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

Search…

Page 3: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

Search…

1014 WEB PAGES 109 PRODUCTS

1011 NEWS 1012 POSTS

Page 4: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

Search…

1014 WEB PAGES

+100ms ⇒ -1% sales

109 PRODUCTS

1011 NEWS 1012 POSTS

+500ms ⇒ -20% traffic+100ms ⇒ -0,6% profit

Page 5: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

EFFICIENT SCORING

Page 6: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

EFFICIENT SCORING

‣ State-of-the-art Learning to Rank models use forests of thousands decision trees e.g., 5000 trees, 32 leaves per tree, 700 features 34ms to score 1000 documents

F4 50.1

F1 0.57

F2 5.0

F3 -3.8

7.1

1.8 -3.1

0.4 -1.4

x[F3] ≤ − 3.8?

s1,w1 s2,w2 sN,wN Score = siwi!2

M=�

DOCUMENT

QUERY DOCUMENT FEATURE EXTRACTION DOCUMENT FEATURES

T1 T2 TN…

Page 7: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

f1f0

increasing values

f|F|�1

num. leaves num. leaves num. leaves

num.leaves � num. trees

num. leaves

num. leaves

x[f0] �0

x[f0] �1 x[f0] �2 x[f0] �3

x[f1] �4

x[f1] �5

h

leafvalues

mask

leafindexes

EFFICIENT SCORING

Novel traversal of the entire forest: linear scan of thresholds and succinct storage of partial results of the computation

18x with SIMD 120x with Multithread 600x with GPU

up to 6 times faster than state-of-the-art competitors

Highly (auto) vectorizableLow branch misprediction ++Cache friendly

Page 8: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

BEST FINDING

Page 9: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

BEST FINDING

Several applications need to find the best element of a set

- Question-Answering

- Recommender systems

- Route suggestion

- Advertising

- Machine Translation

- Key term extraction

Page 10: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

BEST FINDING

▸ ML algorithms for ranking to assign scores to each item individually

▸ ML pairwise classifier to perform a round robin tournament

3 3 2 14 4

4

3

1

2

1

2Find the

Champion!

New Machine Learning solutions to find the Best

Efficient algorithms to find a Tournament Champion

Page 11: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

FILTERING OF ATTRIBUTE-SORTED RESULTS

Page 12: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

FILTERING OF ATTRIBUTE-SORTED RESULTS

Page 13: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

FILTERING OF ATTRIBUTE-SORTED RESULTS

up to 150 times faster than state-of-the-art competitor

Filter out some of the results to improve the quality of the list Preserving the sort by attribute

Dynamic ProgrammingCombinatorics ++Approximation algorithms

Page 14: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

IMPACT IN INDUSTRY AND TECHNOLOGY TRANSFER

Page 15: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

OUR TEAM

Page 16: FFICIENCY-EFFECTIVENESS TRADE OFFS IN MODERN WEB …theses.di.unipi.it/slides/trani.pdf · Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles

THANK YOU !ROBERTO TRANI [email protected]

RAFFAELE PEREGO (LAB HEAD) [email protected] http://hpc.isti.cnr.it

Beretta L., et al. "An Optimal Algorithm to Find Champions of Tournament Graphs”.  International Symposium on String Processing and Information Retrieval (SPIRE). Springer, 2019.

Lucchese C., et al. “Quickscorer: A fast algorithm to rank documents with additive ensembles of regression trees”. Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR). ACM, 2015. [Best Paper Award]

Nardini F.M., et al. "Fast Approximate Filtering of Search Results Sorted by Attribute”. Proceedings of the 42nd International ACM Conference on Research and Development in Information Retrieval (SIGIR). ACM, 2019.