Applied Data Analysis Lab – a profile
Dr. Łukasz Bolikowski
ICM, University of Warsaw
December 2014
ADA Lab ⊆ ICM ⊆ UW
University of Warsaw (UW) is one of the top Polish higher education establishments.
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) is a supercomputing and research data centre within the University of Warsaw.
Applied Data Analysis Lab (ADA Lab) is a research group within the ICM.
ADA Lab’s Scope of Interest
Legal Text Mining
Business Data Mining
Training & Outreach
Scholarly PDF Mining
Map of Science
Persistent IDs
Data Anonymization
Scalable Text and Data Mining
Informatics for Open Science
Legal Text Mining
Building a judgment analysis system for Poland.
Integrating data from common courts, the Supreme Administrative Court, the Supreme Court, and the Constitutional Tribunal.
Planning a larger, European project with similar goals (Horizon 2020; currently building a consortium and defining scope).
Business Data Mining
Leveraging high demand for data science skills.
For-profit projects with business partners.
Usually can’t discuss details due to NDAs.
Our favourite toolset:
R for data understanding and modelling
Apache Spark for analysing larger data sets
D3 for information visualization
CRISP-DM (Cross-Industry Standard Process for Data Mining) for managing our projects
Training and Outreach
“Web-Scale Data Mining and Processing” (Course at Polish Academy of Sciences)
“Introduction to Text Mining” (Course at Warsaw School of Data Analysis organised by ICM)
Internal trainings on Hadoop, Spark
Presentations at Big Data conferences (Target audience: business partners)
Workshops and internships for talented youth (In collaboration with Polish Children’s Fund)
Scholarly PDF Mining
Extracting metadata, bibliographic references, and full text from scholarly PDFs. Research direction: semantic annotation of paragraphs, sentences, phrases.
CERMINE is open-source software (AGPL license), with users worldwide: OpenAIRE.eu, Paperity.org, Public Knowledge Project.
Interfaces for humans and for machines (RESTful API).
Try CERMINE at: http://cermine.ceon.pl/
Map of Science
A comprehensive map of academia. Mining available documents and data sets in order to reconstruct the graph of relations between: people, documents, institutions, topics, funding sources.
Final result: a publicly available data set.
Why? Better understanding of science. Cool features in digital libraries and research information systems.
Elements of the map currently developed in OpenAIRE and OCEAN projects.
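The graph of relations described above can be pictured as a typed graph whose nodes are people, documents, institutions, topics, and funding sources. The following is only a minimal sketch under stated assumptions: the class `RelationGraph`, the relation names, and all node identifiers are hypothetical placeholders for illustration, not the data model used in the OpenAIRE or OCEAN projects.

```python
from collections import defaultdict


class RelationGraph:
    """Tiny in-memory typed graph (hypothetical illustration only)."""

    def __init__(self):
        self.nodes = {}                # node id -> node type
        self.edges = defaultdict(set)  # node id -> {(relation, neighbour id)}

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_relation(self, source, relation, target):
        # Store the relation in both directions so it can be
        # traversed from either endpoint.
        self.edges[source].add((relation, target))
        self.edges[target].add((relation, source))

    def neighbours(self, node_id, node_type=None):
        """Return sorted neighbour ids, optionally filtered by node type."""
        found = {
            other
            for _, other in self.edges.get(node_id, set())
            if node_type is None or self.nodes[other] == node_type
        }
        return sorted(found)


# Placeholder data: one node per entity type named on the slide.
graph = RelationGraph()
graph.add_node("person:a", "person")
graph.add_node("doc:x", "document")
graph.add_node("inst:i", "institution")
graph.add_node("topic:t", "topic")
graph.add_node("fund:f", "funding source")

graph.add_relation("person:a", "authored", "doc:x")
graph.add_relation("person:a", "affiliated_with", "inst:i")
graph.add_relation("doc:x", "about", "topic:t")
graph.add_relation("doc:x", "funded_by", "fund:f")

print(graph.neighbours("doc:x"))            # ['fund:f', 'person:a', 'topic:t']
print(graph.neighbours("doc:x", "person"))  # ['person:a']
```

A real map at this scale would more likely live in a distributed store than in memory; the sketch only shows the shape of the data.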
Persistent IDs
To achieve long-term preservation of research artifacts, we need an identifier minting and management scheme that can outlive the organization managing the scheme.
We are developing a distributed scheme based on public-key cryptography and P2P networking (a lot in common with Bitcoin).
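One common way to make an identifier independent of any central registry is to derive it from a public key, so that anyone can check offline whether a presented key matches the identifier. The sketch below illustrates only that general idea and is not ADA Lab's actual scheme: the function names are hypothetical, and random bytes stand in for a real public key. A full scheme would also need signed records and a P2P distribution layer, both omitted here.

```python
import base64
import hashlib
import secrets


def mint_identifier(public_key: bytes) -> str:
    """Derive an identifier from the SHA-256 digest of a public key."""
    digest = hashlib.sha256(public_key).digest()
    # Base32 without padding gives a case-insensitive, URL-friendly string.
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def key_matches_identifier(public_key: bytes, identifier: str) -> bool:
    """Check that the identifier was minted from the presented public key."""
    return mint_identifier(public_key) == identifier


# Stand-in for real public key bytes (assumption: any byte string works here).
example_key = secrets.token_bytes(32)
pid = mint_identifier(example_key)

print(pid)
print(key_matches_identifier(example_key, pid))   # True
print(key_matches_identifier(b"other key", pid))  # False
```

Because the identifier is a pure function of the key, no organization has to stay online to vouch for it; control of the identifier follows control of the private key.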
Data Anonymization
Privacy-preserving research data publication is a cross-cutting issue that applies to various types of data analysed at ICM: legal judgments, medical records, social network activity.
Thank you for your attention. Let’s stay in touch!
adalab.icm.edu.pl/blog
twitter.com/adalab_icm
linkedin.com/in/bolikowski
twitter.com/bolikowski
License
© 2014 ICM, University of Warsaw. Some rights reserved. This presentation is available under a CC BY 3.0 license. Materials from the following sources were used:
https://www.flickr.com/photos/86530412@N02/8213432552 (p. 4, CC BY 2.0)
https://www.flickr.com/photos/124247024@N07/13903385550 (p. 5, CC BY-SA 2.0)
https://www.flickr.com/photos/genista/228006200 (p. 6, CC BY-SA 2.0)
https://www.flickr.com/photos/bohman/210977249 (p. 9, CC BY 2.0)
https://www.flickr.com/photos/hyku/368912557 (p. 10, CC BY 2.0)