macaw: an extensible conversational information seeking platform · 2019-12-20 · platform hamed...

Macaw: An Extensible Conversational Information SeekingPlatform

Hamed Zamani and Nick CraswellMicrosoft AI & Research

United States{hazamani,nickcr}@microsoft.com

ABSTRACTConversational information seeking (CIS) has been recognized asa major emerging research area in information retrieval. Such re-search will require data and tools, to allow the implementationand study of conversational systems. This paper introduces Macaw,an open-source framework with a modular architecture for CISresearch. Macaw supports multi-turn, multi-modal, and mixed-initiative interactions, and enables research for tasks such as doc-ument retrieval, question answering, recommendation, and struc-tured data exploration. It has a modular design to encourage thestudy of new CIS algorithms, which can be evaluated in batch mode.It can also integrate with a user interface, which allows user studiesand data collection in an interactive mode, where the back end canbe fully algorithmic or a wizard of oz setup. Macaw is distributedunder the MIT License.1

KEYWORDSConversational Search; Conversational Question Answering; Con-versational Recommendation; Conversational Natural LanguageInterface; Open-Source Library

1 INTRODUCTIONThe rapid growth in speech and small screen interfaces, particu-larly on mobile devices, has significantly influenced the way usersinteract with intelligent systems to satisfy their information needs.The growing interest in personal digital assistants, such as Ama-zon Alexa, Apple Siri, Google Assistant, and Microsoft Cortana,demonstrates the willingness of users to employ conversationalinteractions [10]. As a result, conversational information seeking(CIS) has been recognized as a major emerging research area inthe Third Strategic Workshop on Information Retrieval (SWIRL2018) [4].2

Research progress in CIS relies on the availability of resources tothe community. There have been recent efforts on providing data

1Macaw is available at https://github.com/microsoft/macaw.2https://sites.google.com/view/swirl3/

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than ACMmust be honored. Abstracting with credit is permitted. To copy otherwise, or republish,to post on servers or to redistribute to lists, requires prior specific permission and/or afee. Request permissions from [email protected], Microsoft, AI & Research© 2019 Association for Computing Machinery.ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00https://doi.org/10.1145/nnnnnnn.nnnnnnn

(a) Multi-modal interactions. (b) Multi-turn interactions.

Figure 1: Example screenshots of the Macaw interface onmobile devices using Telegram bots. Macaw supports multi-modal and multi-turn interactions.

for various CIS tasks, such as the TREC 2019 Conversational Assis-tance Track (CAsT),3 MISC [14], Qulac [1], CoQA [11], QuAC [3],SCS [15], and CCPE-M [9]. In addition, Dalton et al. [5] have imple-mented a demonstration for conversational movie recommendationbased on Google’s DialogFlow. Despite all of these resources, thecommunity still feels the lack of a suitable platform for developingCIS systems. We believe that providing such platform will speedup the progress in conversational information seeking research.Therefore, we developed a general framework for supporting CISresearch. The framework is called Macaw. This paper describes thehigh-level architecture of Macaw, the supported functionality, andour future vision. Researchers working on various CIS tasks shouldbe able to take advantage of Macaw in their projects.

Macaw is designed based on a modular architecture to sup-port different information seeking tasks, including conversational

3http://www.treccast.ai/

arX

iv:1

912.

0890

4v1

[cs

.IR

] 1

8 D

ec 2

019

https://github.com/microsoft/macaw

https://sites.google.com/view/swirl3/

https://doi.org/10.1145/nnnnnnn.nnnnnnn

http://www.treccast.ai/

search, conversational question answering, conversational recom-mendation, and conversational natural language interface to struc-tured and semi-structured data. Each interaction in Macaw (fromboth user and system) is a Message object, thus a conversation is alist of Messages. Macaw consists of multiple actions, each action isa module that can satisfy the information needs of users for somerequests. For example, search and question answering can be twoactions in Macaw. Even multiple search algorithms can be also seenas multiple actions. Each action can produce multiple outputs (e.g.,multiple retrieved documents). For every user interaction, Macawruns all actions in parallel. The actions’ outputs produced within apredefined time interval (i.e., an interaction timeout constant) arethen post-processed. Macaw can choose one or combine multipleof these outputs and prepare an output Message object as the user’sresponse.

The modular design of Macaw makes it relatively easy to con-figure a different user interface or add a new one. The currentimplementation of Macaw supports a command line interface aswell as mobile, desktop, and web apps. In more detail, Macaw’sinterface can be a Telegram bot, which supports a wide range ofdevices and operating systems (see Figure 1). This allows Macawto support multi-modal interactions, such as text, speech, image,click, etc. A number of APIs for automatic speech recognition andgeneration have been employed to support speech interactions.Note that the Macaw’s architecture and implementation allowsmixed-initiative interactions.

The research community can benefit from Macaw for the follow-ing purposes:• Developing algorithms, tools, and techniques for CIS.• Studying user interactions with CIS systems.• Performing CIS studies based on an intermediary person andwizard of oz.

• Preparing quick demonstration for a developed CIS model.

2 MACAW ARCHITECTUREMacaw has a modular design, with the goal of making it easyto configure and add new modules such as a different user inter-face or different retrieval module. The overall setup also follows aModel-View-Controller (MVC) like architecture. The design deci-sions have been made to smooth the Macaw’s adoptions and ex-tensions. Macaw is implemented in Python, thus machine learningmodels implemented using PyTorch,4 Scikit-learn,5 or TensorFlow6

can be easily integrated into Macaw. The high-level overview ofMacaw is depicted in Figure 2. The user interacts with the inter-face and the interface produces a Message object from the currentinteraction of user. The interaction can be in multi-modal form,such as text, speech, image, and click. Macaw stores all interactionsin an “Interaction Database”. For every interaction, Macaw looksfor most recent user-system interactions (including the system’sresponses) to create a list of Messages, called the conversation list.It is then dispatched to multiple information seeking (and related)actions. The actions run in parallel, and each should respond withina pre-defined time interval. The output selection component selects

4https://pytorch.org/5https://scikit-learn.org/6http://tensorflow.org/

from (or potentially combines) the outputs generated by differentactions and creates a Message object as the system’s response. Thismessage is logged into the interaction database and is sent to theinterface to be presented to the user. Again, the response messagecan be multi-modal and include text, speech, link, list of options,etc.

Macaw also supportsWizard of Oz studies or intermediary-basedinformation seeking studies. The architecture of Macaw for suchsetup is presented in Figure 3. As shown in the figure, the seekerinteracts with a real conversational interface that supports multi-modal and mixed-initiative interactions in multiple devices. Theintermediary (or the wizard) receives the seeker’s message andperforms different information seeking actions with Macaw. Allseeker-intermediary and intermediary-system interactions will belogged for further analysis. This setup can simulate an ideal CISsystem and thus is useful for collecting high-quality data from realusers for CIS research.

3 RETRIEVAL AND QUESTION ANSWERINGIN MACAW

The overview of retrieval and question answering actions in Macawis shown in Figure 4. These actions consist of the following compo-nents:• Co-Reference Resolution: To support multi-turn interactions,it is sometimes necessary to use co-reference resolution tech-niques for effective retrieval. In Macaw, we identify all the co-references from the last request of user to the conversation his-tory. The same co-reference resolution outputs can be used fordifferent query generation components. This can be a generic oraction-specific component.

• Query Generation: This component generates a query basedon the past user-system interactions. The query generation com-ponent may take advantage of co-reference resolution for queryexpansion or re-writing.

• Retrieval Model: This is the core ranking component that re-trieves documents or passages from a large collection. Macawcan retrieve documents from an arbitrary document collectionusing the Indri python interface [6, 12].7 We also provide thesupport for web search using the Bing Web Search API.8 Macawalso allows multi-stage document re-ranking.

• Result Generation: The retrieved documents can be too long tobe presented using some interfaces. Result generation is basicallya post-processing step ran on the retrieved result list. In case ofquestion answering, it can employ answer selection or generationtechniques, such as machine reading comprehension models.For example, Macaw features the DrQA model [2] for questionanswering.These components are implemented in a generic form, so re-

searchers can easily replace them with their own favorite algo-rithms.

7Indri [12] is an open-source search engine originally implemented to support languagemodels in information retrieval as part of the Lemur Project (http://lemurproject.org/). It features a wide range of retrieval models. For more information visit http://lemurproject.org/indri.php.8https://azure.microsoft.com/en-us/services/cognitive-services/bing-web-search-api/

https://pytorch.org/

https://scikit-learn.org/

http://tensorflow.org/

http://lemurproject.org/

http://lemurproject.org/

http://lemurproject.org/indri.php

http://lemurproject.org/indri.php

https://azure.microsoft.com/en-us/services/cognitive-services/bing-web-search-api/

https://azure.microsoft.com/en-us/services/cognitive-services/bing-web-search-api/

Interface

Conversation Retrieval

Request Dispatcher

Action 1Output

Selection

Output Presentation

List of Request

Messages

Action 2 Action N

Interaction DB

UI Specific

Request Message

Response Message

Model Specific

Figure 2: The high-level architecture of Macaw for developing conversation information seeking systems.

Interface Interface

Request Dispatcher

Action 1Output

Selection Action 2 Action N

Response Message

Seeker(User)

Intermediary(Wizard)

Figure 3: The high-level architecture of Macaw for user studies. In this architecture, user interacts with a human intermediarywho is an expert of the system and can interact with the system to address the user’s information need.

4 USER INTERFACESWe have implemented the following interfaces for Macaw:

• File IO: This interface is designed for experimental purposes,such as evaluating the performance of a conversational searchtechnique on a dataset with multiple queries. This is not aninteractive interface.

• Standard IO: This interactive command line interface is de-signed for development purposes to interact with the system,see the logs, and debug or improve the system.

• Telegram: This interactive interface is designed for interactionwith real users (see Figure 1). Telegram9 is a popular instantmessaging service whose client-side code is open-source. We

9https://telegram.org/

https://telegram.org/

Co-reference Resolution

Query Generation

Retrieval Model

Document Collection

ResultGeneration

Figure 4: The overview of retrieval and question answeringin Macaw.

have implemented a Telegram bot that can be used with differentdevices (personal computers, tablets, and mobile phones) anddifferent operating systems (Android, iOS, Linux, Mac OS, andWindows). This interface allows multi-modal interactions (text,speech, click, image). It can be also used for speech-only inter-actions. For speech recognition and generation, Macaw relieson online APIs, e.g., the services provided by Google Cloud andMicrosoft Azure. In addition, there exist multiple popular groupsand channels in Telegram, which allows further integration of so-cial networks with conversational systems. For example, see theNaseri and Zamani’s study on news popularity in Telegram [8].

Similar to the other modules, one can easily extend Macaw usingother appropriate user interfaces.

5 LIMITATIONS AND FUTUREWORKThe current implementation of Macaw lacks the following actions.We intend to incrementally improve Macaw by supporting moreactions and even more advanced techniques for the developedactions.

• Clarification and Preference Elicitation: Asking clarifyingquestions has been recently recognized as a necessary componentin a conversational system [1, 9]. The authors are not aware ofa published solution for generating clarifying questions usingpublic resources. Therefore, Macaw does not currently supportclarification.

• Explanation: Despite its importance, result list explanation isalso a relatively less explored topic. We intend to extend Macawwith result list explanation as soon as we find a stable and maturesolution.

• Recommendation: In our first release, we focus on conversa-tional search and question answering tasks. We intend to providesupport for conversational recommendation, e.g., [7, 13, 18], andjoint search and recommendation, e.g., [16, 17], in the future.

• Natural Language Interface: Macaw can potentially supportaccess to structured data, such as knowledge graph. We wouldlike to ease conversational natural language interface to struc-tured and semi-structured data in our future releases.

6 CONTRIBUTIONMacaw is distributed under the MIT License. We welcome contribu-tions and suggestions. Most contributions require you to agree to aContributor License Agreement (CLA) declaring that you have theright to, and actually do, grant us the rights to use your contribution.For details, visit https://cla.opensource.microsoft.com. This projecthas adopted the Microsoft Open Source Code of Conduct.

When you submit a pull request, a CLA bot will automaticallydetermine whether you need to provide a CLA and decorate thePR appropriately (e.g., status check, comment). Simply follow theinstructions provided by the bot. You will only need to do this onceacross all repos using our CLA.

7 CONCLUSIONSThis paper described Macaw, an open-source platform for conver-sational information seeking research. Macaw supports multi-turn,multi-modal, and mixed-initiative interactions. It was designedbased on a modular architecture that allows further improvementsand extensions. Researchers can benefit fromMacaw for developingalgorithms and techniques for conversational information seekingresearch, for user studies with different interfaces, for data collec-tion from real users, and for preparing a demonstration of a CISmodel.

8 ACKNOWLEDGEMENTSThe authors wish to thank Ahmed Hassan Awadallah, KrisztianBalog, and Arjen P. de Vries for their invaluable feedback.

REFERENCES[1] Mohammad Aliannejadi, Hamed Zamani, Fabio Crestani, and W. Bruce Croft.

2019. Asking Clarifying Questions in Open-Domain Information-Seeking Con-versations. In Proceedings of the 42nd International ACM SIGIR Conference onResearch and Development in Information Retrieval (SIGIR’19). ACM, New York,NY, USA, 475–484.

[2] Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. ReadingWikipedia to Answer Open-Domain Questions. In Proceedings of the 55th AnnualMeeting of the Association for Computational Linguistics (ACL ’17).

[3] Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, PercyLiang, and Luke Zettlemoyer. 2018. QuAC: Question Answering in Context. InProceedings of the 2018 Conference on Empirical Methods in Natural LanguageProcessing (EMNLP ’18). Association for Computational Linguistics, Brussels,Belgium, 2174–2184.

[4] J. Shane Culpepper, Fernando Diaz, and Mark D. Smucker. 2018. Research Fron-tiers in Information Retrieval: Report from the Third Strategic Workshop onInformation Retrieval in Lorne (SWIRL 2018). SIGIR Forum 52, 1 (Aug. 2018),34–90.

[5] Jeffrey Dalton, Victor Ajayi, and Richard Main. 2018. Vote Goat: ConversationalMovie Recommendation. In The 41st International ACM SIGIR Conference onResearch and Development in Information Retrieval (SIGIR ’18). ACM, New York,NY, USA, 1285–1288.

[6] Christophe Van Gysel, Evangelos Kanoulas, and Maarten de Rijke. 2017. Pyndri:A Python Interface to the Indri Search Engine. In Proceedings of the 39th EuropeanConference on Information Retrieval (ECIR ’17). 744–748.

[7] Raymond Li, Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, LaurentCharlin, and Chris Pal. 2018. Towards Deep Conversational Recommendations.In Advances in Neural Information Processing Systems (NeurIPS ’18). Curran Asso-ciates, Inc., 9725–9735.

[8] Mohammad Naseri and Hamed Zamani. 2019. Analyzing and Predicting NewsPopularity in an InstantMessaging Service. In Proceedings of the 42nd InternationalACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR ’19). ACM, New York, NY, USA, 1053–1056.

[9] Filip Radlinski, Krisztian Balog, Bill Byrne, and Karthik Krishnamoorthi. 2019.Coached Conversational Preference Elicitation: A Case Study in UnderstandingMovie Preferences. In Proceedings of the Annual SIGdial Meeting on Discourse andDialogue.

https://cla.opensource.microsoft.com

[10] Filip Radlinski and Nick Craswell. 2017. A Theoretical Framework for Con-versational Search. In Proceedings of the 2017 Conference on Conference HumanInformation Interaction and Retrieval (CHIIR ’17). ACM, New York, NY, USA,117–126.

[11] Siva Reddy, Danqi Chen, and Christopher D. Manning. 2019. CoQA: A Con-versational Question Answering Challenge. Transactions of the Association forComputational Linguistics 7 (2019), 249–266.

[12] Trevor Strohman, Donald Metzler, Howard Turtle, and W. Bruce Croft. 2005.Indri: A language model-based search engine for complex queries. In Proceedingsof the 2005 International Conference on Intelligent Analysis.

[13] Yueming Sun and Yi Zhang. 2018. Conversational Recommender System. InThe 41st International ACM SIGIR Conference on Research and Development inInformation Retrieval (SIGIR ’18). ACM, New York, NY, USA, 235–244.

[14] Paul Thomas, Daniel McDuff, Mary Czerwinski, and Nick Craswell. 2017. MISC:A data set of information-seeking conversations. In Proceedings of the 1st Interna-tional Workshop on Conversational Approaches to Information Retrieval.

[15] Johanne R. Trippas and Paul Thomas. 2019. Data Sets for Spoken ConversationalSearch. In Proceedings of the CHIIR 2019 Workshop on Barriers to Interactive IRResources Re-use. 14–18.

[16] Hamed Zamani and W. Bruce Croft. 2018. Joint Modeling and Optimization ofSearch and Recommendation. In Proceedings of the First International Conferenceon Design of Experimental Search and Information Retrieval Systems (DESIRES’18).

[17] Hamed Zamani and W. Bruce Croft. 2020. Learning a Joint Search and Recom-mendation Model from User-Item Interactions. In Proceedings of the 13th ACMInternational Conference on Web Search and Data Mining (WSDM ’20).

[18] Yongfeng Zhang, Xu Chen, Qingyao Ai, Liu Yang, and W. Bruce Croft. 2018.Towards Conversational Search and Recommendation: SystemAsk, User Respond.In Proceedings of the 27th ACM International Conference on Information andKnowledge Management (CIKM ’18). ACM, New York, NY, USA, 177–186.

macaw: an extensible conversational information seeking platform · 2019-12-20 · platform hamed...

Documents