IAHx Search Interface using Apache Solr/Lucene. ABCD & CDS/ISIS Workshop, Elluminate Session, 24 May 2012. Vinicius de Andrade, Systems Development, KMC/BIREME. BIREME / PAHO / WHO


  • IAHx Search Interface using Apache Solr/Lucene

    ABCD & CDS/ISIS Workshop, Elluminate Session

    24 May 2012

    Vinicius de Andrade, Systems Development, KMC/BIREME

    BIREME / PAHO / WHO

  • Data Level

    Index Level

    ISIS, Lucene

    Interface Level

    Services Interfaces

    Layers

  • Metadata

    Conversion of information sources to a common set of metadata (single schema)

    Identification of elements for organization into "clusters"

    Data Level

  • Indexes

    Index Level

    Boolean query

    Boolean query, ranking and clusters

  • Multiple interfaces for presenting results

    Interface Level

  • What is Lucene

    High performance, scalable, full-text search library

    Focus: Indexing + Searching Documents

    Document is just a list of name+value pairs

    No crawlers or document parsing

    Flexible Text Analysis (tokenizers + token filters)

    100% Java, no dependencies, no config files

  • What is Solr

    A full text search server based on Lucene

    XML/HTTP, JSON Interfaces

    Faceted Search (category counting)

    Flexible data schema to define types and fields

    Index Replication

    Extensible Open Architecture, Plugins

    Web Administration Interface

    Written in Java, deployable as a WAR

  • Lucene Architecture

  • admin, update, select

    Standard request handler, Custom request handler

    XML response writer, JSON response writer

    XML Update Handler, CSV Update Handler

    Lucene

    Basic App, Document

    title: Genome, author: Matt Ridley, type: book, ...

    Query Response (matching docs), Query (title:genome)

    http://solr/update, http://solr/select

    Servlet Container

    Solr

    HTML

    Webapp, Indexer

  • DocList

    Search(Query, Filter[], Sort, offset, n)

    language:en, year:2008

    genome

    year asc

    DocSet

    subject:chromosomes, subject:diseases

    type:article, type:book

    journal:Rev. A, journal:Rev. B, journal:Rev. C

    intersection

    Size() = 594, 382, 247, 689, 104, 92, 75

    Query Response

    Clusters / Groups
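The cluster counts on this slide can be read as the size of the intersection between the set of documents matching the query and the set of documents carrying each facet value. A minimal Python sketch of that idea follows; the document ids and the `facet_counts` helper are hypothetical and only illustrate the set arithmetic, not how Solr implements it internally.

```python
def facet_counts(result_ids, facet_sets):
    """Return {facet value: number of matching documents that carry it}."""
    return {value: len(result_ids & ids) for value, ids in facet_sets.items()}


# Hypothetical ids of the documents matching the query.
result_ids = {1, 2, 3, 4, 5}

# Hypothetical sets of document ids per facet value.
type_facets = {
    "type:article": {1, 2, 3, 9},
    "type:book": {4, 10},
}

counts = facet_counts(result_ids, type_facets)
print(counts)
```

Each count is computed independently of the others, so the same result set can be intersected with as many facet fields as the interface needs.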

  • Indexing Data

    HTTP POST to http://localhost:8080/solr/update

    05991, Genome, Matt Ridley, genome, disease, chromosomes, en
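As a rough illustration of what gets posted, the sketch below builds a Solr `<add>` message for the example document and prepares (without sending) the HTTP request. The field names `id`, `title`, `author`, `subject` and `language` are assumptions inferred from other slides, since the slide's XML markup did not survive; `build_add_xml` is a hypothetical helper.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Serialize a dict as a Solr <add><doc> message; list values repeat the field."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = item
    return ET.tostring(add, encoding="unicode")


# Field names are assumed; only the values come from the slide.
payload = build_add_xml({
    "id": "05991",
    "title": "Genome",
    "author": "Matt Ridley",
    "subject": ["genome", "disease", "chromosomes"],
    "language": "en",
})

# Prepared but not sent: pass it to urllib.request.urlopen() against a running server.
request = urllib.request.Request(
    "http://localhost:8080/solr/update",
    data=payload.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8"},
    method="POST",
)
print(payload)
```

The added document only becomes searchable after a commit is sent to the same update URL.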

  • Index Key Generation

  • Deleting Documents

    Delete by Id, most efficient

    05591, 32552

    Delete by Query

    subject:disease
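The two delete styles correspond to two small XML messages posted to the update URL. The sketch below builds both, assuming the digits on the slide are two separate ids (05591 and 32552); the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def delete_by_ids(ids):
    """Build a <delete> message listing one <id> element per document."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message that removes every document matching the query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


by_id = delete_by_ids(["05591", "32552"])   # assumed split of the slide's digits
by_query = delete_by_query("subject:disease")
print(by_id)
print(by_query)
```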

  • Commit: makes changes visible

    Optimize: same as commit, but also merges all index segments for faster searching

    _0.fnm, _0.fdt, _0.fdx, _0.frq, _0.tis, _0.tii, _0.prx, _0.nrm

    _0_1.del

    _1.fnm, _1.fdt, _1.fdx, [...]

    Lucene Index Segments
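Both operations are sent to the update URL as one-element XML messages. A minimal sketch, with a hypothetical `update_message` helper that only accepts the two commands named on this slide:

```python
def update_message(command):
    """Return the XML body for a commit or optimize request."""
    if command not in ("commit", "optimize"):
        raise ValueError(f"unsupported update command: {command}")
    return f"<{command}/>"


commit_body = update_message("commit")
optimize_body = update_message("optimize")
print(commit_body, optimize_body)
```

Because optimize rewrites the index segments, it is usually run far less often than commit.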

  • Searching

    http://localhost:8080/solr/select?q=genome&start=0&rows=2&fl=title,author

    Genome, Matt Ridley
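A search request is an ordinary URL with query-string parameters, so it can be assembled with the standard library. The sketch below is one way to do it; `select_url` is a hypothetical helper, and the optional `fq` (filter query) and `wt` (response format) arguments are standard Solr parameters included for convenience.

```python
from urllib.parse import urlencode


def select_url(base, q, start=None, rows=None, fl=None, fq=None, wt=None):
    """Build a Solr select URL; parameters left as None are omitted."""
    params = [("q", q)]
    if start is not None:
        params.append(("start", start))
    if rows is not None:
        params.append(("rows", rows))
    if fl:
        params.append(("fl", ",".join(fl)))
    for filter_query in fq or []:
        params.append(("fq", filter_query))
    if wt:
        params.append(("wt", wt))
    # Keep "," and ":" readable instead of percent-encoding them.
    return base + "?" + urlencode(params, safe=",:")


url = select_url("http://localhost:8080/solr/select", "genome",
                 start=0, rows=2, fl=["title", "author"])
print(url)
```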

  • Update and Query Index

    :8080/index/update

    /index/select

    XML

    QUERY

    http://localhost:8080/index/select?q=saude&fq=type:article&wt=json

  • IAHx - Architecture

    Client Interface Controller Index

  • Update scripts

    index.sh: index.sh [xml file] [index]

    commit.sh: commit.sh [index]

    optimize.sh: optimize.sh [index]

    deletedocs.sh: deletedocs.sh [index] [query]

  • Solr Update XML format

    lil-7320; LILACS; BR1.1; regional; article

    Ribeiro, M. V; Gallina, R. A; Sato, T

    Hidranencefalia: estudo clinicopatologico de 6 casos.

    Hydranencephaly: clinicopathological study of 6 cases

    184-92; Arq Neuropsiquiatr;40(2)1982.; Arq Neuropsiquiatr; 0004-282X; 40; 2; pt; 1982; BR; 1982000000.0671982

    Foram estudados 6 casos de hidranencefalia do ponto de vista de sua semiologia clinica, de seus exames complementares e das verificacoes anatomopatologicas. Os autores concluem que a transiluminacao e de grande utilidade no diagnostico precoce destes casos. O seguimento dos pacientes e as verificacoes anatomopatologicas demonstram que a hidranencefalia teve como origem lesoes encefaloclasticas (inflamatorias, mecanicas e vasculares) que levaram, antes ou apos o nascimento, a destruicao total do cerebro com preservacao das estruturas sub-tentoriais

    ^d6984SCAD

    relevancy, cluster, order

  • Solr XML Config schema.xml (1/2)

    .....

  • Solr XML Config schema.xml (2/2)

    .....

    ....

  • Solr XML Config solrconfig.xml

    .....

    true; type; type_of_study; mh_cluster; ta_cluster; year_cluster; 201

  • Solr XML result

    010; on; iahx

    BVS-3700; iAHx integrated search; presentation

  • Solr JSON result

    {"responseHeader": {"status": 0, "QTime": 1, "params": {
      "wt": "json", "rows": ["1", "1"], "start": "0", "indent": "on", "q": "iahx", "version": "2.2"}},
     "response": {"numFound": 2, "start": 0, "docs": [
      {"id": "BVS-3700", "au": "Antonio, Vinicius de Andrade", "ti": " iAHx integrated search ", "type": "presentation"}]}}
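A client that asks for `wt=json` can read the result with any JSON parser. The sketch below parses a copy of the response from this slide (with its missing quotation marks restored) and pulls out the hit count and the document fields; the variable names are illustrative only.

```python
import json

# Copy of the slide's response, with the dropped quotation marks restored.
raw = """
{"responseHeader": {"status": 0, "QTime": 1, "params": {
   "wt": "json", "rows": ["1", "1"], "start": "0",
   "indent": "on", "q": "iahx", "version": "2.2"}},
 "response": {"numFound": 2, "start": 0, "docs": [
   {"id": "BVS-3700", "au": "Antonio, Vinicius de Andrade",
    "ti": " iAHx integrated search ", "type": "presentation"}]}}
"""

data = json.loads(raw)
response = data["response"]
total_hits = response["numFound"]   # total matches, not just the rows returned
titles = [doc["ti"].strip() for doc in response["docs"]]

print(total_hits, titles)
```

Note that `numFound` reports all matching documents, while `docs` holds only the page selected by `start` and `rows`.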

  • IAHx - Search Interface

  • Project Source Code

    RedDes (tickets, documentation): http://reddes.bvsalud.org/projects/iahx/

    GitHub (source code):

    http://github.com/bireme/iahx-opac/

    http://github.com/bireme/iahx-server/

    http://github.com/bireme/iahx-controller/

  • IAHx - Installation

    http://reddes.bvsalud.org/projects/iahx/wiki/Install

    * Available only in Portuguese at this time

  • Running iAHx

    Solr server installed and running: iahx-server is a custom installation of Tomcat 6 with Solr deployed, plus shell scripts for executing basic Solr REST commands

    Tomcat 6: iahx-controller is a WAR module that dispatches requests to Solr and receives the responses

    Web server + PHP: iahx-opac is the interface that renders the Solr JSON result using Smarty templates

  • Prepare data for Solr

    ISIS to Solr: conversion via PFT

    OAI-PMH XML to Solr: conversion via XSL

  • ISIS to Solr

    if p(v2) then,

    ' ',| |v2||/,(| |v4||/),(| |v16^*||/),(| |v18^*||/),

    ' '/,fi,
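The PFT above emits one document only when field 2 is present, writes field 2 once, and repeats the output for every occurrence of fields 4, 16 and 18, taking the first subfield of 16 and 18. The Python sketch below mirrors that logic for illustration only: the Solr field names in `FIELD_MAP` are hypothetical (the slide's markup was lost), the sample record is made up, and `first_subfield` is an approximation of the `^*` selector.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping: ISIS tag -> (Solr field name, take first subfield only).
FIELD_MAP = {2: ("id", False), 4: ("db", False), 16: ("au", True), 18: ("ti", True)}


def first_subfield(value):
    """Approximate ^*: text of the first subfield, without its ^x delimiter."""
    if value.startswith("^") and len(value) >= 2:
        value = value[2:]
    return value.split("^", 1)[0]


def record_to_solr_doc(record):
    """Convert {tag: [occurrences]} to a <doc> element; empty string if tag 2 is absent."""
    if not record.get(2):
        return ""
    lines = ["<doc>"]
    for tag, (name, only_first) in FIELD_MAP.items():
        for occurrence in record.get(tag, []):
            text = first_subfield(occurrence) if only_first else occurrence
            lines.append(f"<field name={quoteattr(name)}>{escape(text)}</field>")
    lines.append("</doc>")
    return "\n".join(lines)


# Made-up record; the "^1X" subfield is only there to exercise first_subfield().
sample = {2: ["lil-7320"], 16: ["Ribeiro, M. V^1X", "Sato, T"]}
doc_xml = record_to_solr_doc(sample)
print(doc_xml)
```

Escaping the field text matters here, because bibliographic data often contains `&` and `<` characters that would otherwise break the update XML.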

  • OAI-PMH XML to Solr

  • Questions

  • Vinicius de Andrade, BIREME/PAHO/WHO

    Thank you