HPC over Cloud East-West Neo Medicinal u-Lifecare Research Center Workshop January 2014 Presented By: Muhammad Bilal Amin Cloud Computing Team, Ubiquitous Computing Lab Kyung Hee University, Global Campus, Korea



  • Slide 1
  • East-West Neo Medicinal u-Lifecare Research Center Workshop January 2014 Presented By: Muhammad Bilal Amin Cloud Computing Team, Ubiquitous Computing Lab Kyung Hee University, Global Campus, Korea
  • Slide 2
  • Agenda
      • High Performance Computing over Cloud: Motivation for HPC over Cloud (HPCoC); Related Work; HPCoC Architecture; HPCoC Contributions
      • SPHeRe: Motivation for SPHeRe; Implementation Details; Evaluation; SPHeRe's Contributions & Achievements
      • Conclusion
  • Slide 3
  • Motivation for HPC over Cloud
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Related Work and Limitations
  • Slide 9
  • HPCoC Architecture (Stack View)
  • Slide 10
  • UCLab Cloud Infrastructure
  • Slide 11
  • UCLab Cloud Infrastructure [diagram; only the recoverable labels are listed]
      • Physical Machine 1: 8 GB RAM, 2 TB hard drive; native OS Windows 7 x64; hypervisor VMware ESXi; guest virtual machines VM1 and VM2 (Windows 7), each with 4 GB RAM and 1 TB disk
      • Physical Machine 2: Core i7 CPU, 16 GB RAM, 2 TB hard drive; Xen hypervisor; guest virtual machines VM1 to VM4 (Linux), each with 4 GB RAM and 250 GB disk
      • Software on the virtual machines: Java Runtime, Hadoop
      • Virtual nodes: 4 virtual nodes and 16 virtual nodes, 20 virtual nodes in total
  • Slide 12
  • HPCoC Contributions & Uniqueness
      • A unified Java-based high-performance platform for Grande applications (data- and computation-intensive).
      • Cloud-enabling Java-based HPC messaging and distribution middleware, e.g. MPJ-Core.
      • MPI-like messaging with fault tolerance incorporated from Hadoop.
      • Implementing parallel computation-intensive and data-intensive processing on unshared data in MapReduce through in-map/in-reduce parallelism.
      • Green HPC: virtualized resources are a big step for HPC toward green, energy-efficient computing.
      • Releasing the solution under an open-source license for the academic community.
  • Slide 13
  • A Performance Initiative towards Large-scale Bio-medical Ontology Matching by Implementing Thread Level Parallelism (TLP) over Multicore Platforms
  • Slide 14
  • Motivation for SPHeRe
      • Effective ontology matching is a computationally intensive operation (in processing power and memory): matching algorithms with quadratic complexity must be executed over the candidate ontologies. (Gross et al., On Matching Large Life Science Ontologies in Parallel, Lecture Notes in Computer Science (LNCS), 2010)
      • Delay in matching results makes ontology matching ill-equipped for semi-real-time, semantic-web-based systems. (Stoilos et al., A string metric for ontology alignment, ISWC05, Heidelberg, Germany, 2005)
      • The core techniques for achieving better performance rely either on optimizing the matching algorithms or on fragmenting the ontologies before matching. Utilization of parallel and distributed platforms has largely been missing. (P. Shvaiko and J. Euzenat, Ontology matching: State of the art and future challenges, IEEE Transactions on Knowledge and Data Engineering, January 2013)
      • Commodity hardware is capable of parallelism, i.e., multi-core processors over a distributed platform (cloud). (Amin et al., High Performance Java Sockets (HPJS) for scientific Health Clouds, 13th IEEE HealthCom, Beijing, 2012)
      • Cloud is affordable (utility-based pricing) and available (ubiquitous). (Armbrust et al., A view of Cloud Computing, Communications of the ACM, April 2010)
      • Research opportunity: ontology matching over parallel and distributed commodity hardware.
  • Slide 15
  • Implementation Challenges: 1. End-to-End Parallelism
      • Resolution: a methodology that exploits parallelism from loading through delivery.
  • Slide 16
  • Implementation Challenges: 2. Memory Strain
      • Related information that is not required at a given moment floods memory.
      • Parsing and loading for inference vs. parsing and loading for matching.
      • Java heap blow-up (a 2 GB heap is not enough).
      • Unable to iterate over the properties of FMA and NCI.
      • Cloud instances have limited memory per instance.
      • Resolution: load only what is needed (smaller memory footprint during execution).
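The "load only what is needed" resolution can be illustrated with a minimal Java sketch. It assumes a label-based matcher that reads only each concept's IRI and label; the classes `FullConcept` and `LabelSubset` and the `urn:example:` identifiers are hypothetical and are not taken from SPHeRe. The subset is serialized once so that a later run could restore it in a single step instead of re-parsing the full ontology.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lightweight subset: only the fields a label-based matcher reads.
class LabelSubset implements Serializable {
    private static final long serialVersionUID = 1L;
    final ArrayList<String> iris = new ArrayList<>();
    final ArrayList<String> labels = new ArrayList<>();

    // Keep IRI and label; drop definitions and axioms.
    static LabelSubset from(List<FullConcept> concepts) {
        LabelSubset subset = new LabelSubset();
        for (FullConcept c : concepts) {
            subset.iris.add(c.iri);
            subset.labels.add(c.label);
        }
        return subset;
    }

    // Serialize the subset so it can be cached and reloaded in one step.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Restore a previously serialized subset.
    static LabelSubset fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (LabelSubset) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<FullConcept> parsed = new ArrayList<>();
        parsed.add(new FullConcept("urn:example:heart", "Heart", "long definition text", new ArrayList<>()));
        LabelSubset cached = fromBytes(from(parsed).toBytes());
        System.out.println(cached.labels);
    }
}

// Hypothetical full concept as a general-purpose parser might hold it in memory.
class FullConcept {
    final String iri;
    final String label;
    final String definition;
    final List<String> axioms;

    FullConcept(String iri, String label, String definition, List<String> axioms) {
        this.iri = iri;
        this.label = label;
        this.definition = definition;
        this.axioms = axioms;
    }
}
```

In a real system the byte array would be written to a cache file; the in-memory round trip here only keeps the sketch self-contained.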
  • Slide 17
  • Implementation Challenges: 3. Accuracy Preservation
      • Resolution: decoupling of matching algorithms from distribution.
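One way to picture this decoupling is a matcher interface that knows nothing about threads, combined with a runner that decides how the work is scheduled. The sketch below is an assumption for illustration only: `ConceptMatcher`, `DecoupledMatching`, and the case-insensitive label matcher are hypothetical names, and Java parallel streams stand in for whatever distribution mechanism the real system uses. Because the matcher is unchanged, the sequential and parallel runs produce the same set of mappings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DecoupledMatching {
    // Runs any matcher over all pairs; only `parallel` changes how work is scheduled.
    static Set<String> run(List<String> source, List<String> target,
                           ConceptMatcher matcher, boolean parallel) {
        Stream<String> sources = parallel ? source.parallelStream() : source.stream();
        return sources
                .flatMap(s -> target.stream()
                        .filter(t -> matcher.matches(s, t))
                        .map(t -> s + " = " + t))
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("Heart", "lung", "Liver");
        List<String> target = Arrays.asList("heart", "Lung", "Kidney");
        ConceptMatcher byLabel = (a, b) -> a.equalsIgnoreCase(b);
        // Same matcher, two execution strategies: the mapping sets are compared for equality.
        System.out.println(run(source, target, byLabel, false)
                .equals(run(source, target, byLabel, true)));
    }
}

// Hypothetical matcher contract: the algorithm is unaware of threads or partitions.
interface ConceptMatcher {
    boolean matches(String sourceLabel, String targetLabel);
}
```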
  • Slide 18
  • Implementation Challenges: 4. Thread Safety
      • Ontology data is shared among multiple threads (synchronized access leads to sequential execution).
      • The available OWL frameworks are not thread-safe.
      • Results must be guaranteed.
      • Resolution: a thread-safe ontology model shared among multithreaded executions.
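A common way to share data among threads without synchronized access is to make the shared model immutable. The following Java sketch assumes that approach; the slides do not say how SPHeRe's thread-safe model is built, and `ImmutableOntology` and `SharedReadDemo` are hypothetical names. Every worker reads the same instance concurrently, and no locks are needed because nothing can modify it after construction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedReadDemo {
    // Counts labels with the given prefix; all workers read the same model instance.
    static int countWithPrefix(ImmutableOntology onto, String prefix, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            int chunk = (onto.size() + threads - 1) / threads;
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(onto.size(), t * chunk);
                final int to = Math.min(onto.size(), from + chunk);
                parts.add(pool.submit(() -> {
                    int n = 0;
                    for (int i = from; i < to; i++) {
                        if (onto.label(i).startsWith(prefix)) n++;
                    }
                    return n;
                }));
            }
            int total = 0;
            for (Future<Integer> part : parts) total += part.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ImmutableOntology onto = new ImmutableOntology(Arrays.asList("Heart", "Head", "Lung", "Hand"));
        System.out.println(countWithPrefix(onto, "H", 3));
    }
}

// Hypothetical read-only model: a defensive copy in final fields, no setters, no locks.
final class ImmutableOntology {
    private final List<String> labels;

    ImmutableOntology(Collection<String> labels) {
        this.labels = Collections.unmodifiableList(new ArrayList<>(labels));
    }

    int size() { return labels.size(); }

    String label(int index) { return labels.get(index); }
}
```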
  • Slide 19
  • Implementation Challenges: 5. Scalability with Optimal Resource Utilization
      • Exploit the available computational resources for concurrency equally (effective load balancing).
      • Implement the right parallelism technique (partitioning).
      • Better reduction rate.
      • Resolution: effective distribution of matching requests over the available computational resources.
  • Slide 20
  • SPHeRe Architecture
  • Slide 21
  • Matcher Distribution: the matching request received by the system is subdivided from the macro (matching request) level to the micro (matching task) level.
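The macro-to-micro subdivision can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not SPHeRe's implementation: the partitioning rule (equal slices of the source ontology, each compared against the whole target), the case-insensitive label comparison, and the names `MatcherDistribution` and `MatchingTask` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MatcherDistribution {
    // Macro to micro: split one matching request into at most `workers` matching tasks.
    static List<MatchingTask> subdivide(List<String> source, List<String> target, int workers) {
        List<MatchingTask> tasks = new ArrayList<>();
        int chunk = Math.max(1, (source.size() + workers - 1) / workers);
        for (int from = 0; from < source.size(); from += chunk) {
            int to = Math.min(source.size(), from + chunk);
            tasks.add(new MatchingTask(source.subList(from, to), target));
        }
        return tasks;
    }

    // Runs all matching tasks on a fixed pool and concatenates their results in task order.
    static List<String> match(List<String> source, List<String> target, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<String> all = new ArrayList<>();
            for (Future<List<String>> result : pool.invokeAll(subdivide(source, target, workers))) {
                all.addAll(result.get());
            }
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("heart", "liver", "lung", "skin", "bone");
        List<String> target = Arrays.asList("Heart", "LUNG", "Eye");
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(match(source, target, workers));
    }
}

// Hypothetical micro-level unit of work: one slice of the source against the whole target.
class MatchingTask implements Callable<List<String>> {
    private final List<String> sourceSlice;
    private final List<String> target;

    MatchingTask(List<String> sourceSlice, List<String> target) {
        this.sourceSlice = sourceSlice;
        this.target = target;
    }

    @Override
    public List<String> call() {
        List<String> found = new ArrayList<>();
        for (String s : sourceSlice) {
            for (String t : target) {
                if (s.equalsIgnoreCase(t)) found.add(s + " = " + t);
            }
        }
        return found;
    }
}
```

Sizing the pool from `availableProcessors()` is one simple way to give every core a comparable share of the request; slices of equal size balance the load only when the per-concept matching cost is roughly uniform.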
  • Slide 22
  • Matcher Distribution
  • Slide 23
  • Inter-node Communication
  • Slide 24
  • Mappings Aggregation: responsible for accumulating the matched results, creating a corresponding bridge ontology (mapping), and delivering it.
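The aggregation step can be pictured as a concurrent collector that matching threads write into, followed by a single rendering pass. The sketch below is hypothetical: the slides do not state how the bridge ontology is serialized, so the output format (one `owl:equivalentClass` statement per mapping, written as an N-Triples line) is an assumption, as are the class name `MappingAggregator` and the `urn:example:` identifiers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

class MappingAggregator {
    // Assumed mapping relation; the real bridge ontology may use other constructs.
    private static final String EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass";

    // Lock-free queue so many matching threads can report results concurrently.
    private final Queue<String[]> mappings = new ConcurrentLinkedQueue<>();

    void add(String sourceIri, String targetIri) {
        mappings.add(new String[] {sourceIri, targetIri});
    }

    // Renders the accumulated mappings as sorted N-Triples lines.
    String toBridgeOntology() {
        List<String> lines = new ArrayList<>();
        for (String[] m : mappings) {
            lines.add("<" + m[0] + "> <" + EQUIVALENT_CLASS + "> <" + m[1] + "> .");
        }
        Collections.sort(lines);
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        MappingAggregator aggregator = new MappingAggregator();
        // Simulate several matching threads reporting results at once.
        IntStream.range(0, 4).parallel()
                .forEach(i -> aggregator.add("urn:example:source#c" + i, "urn:example:target#c" + i));
        System.out.println(aggregator.toBridgeOntology());
    }
}
```

Sorting the lines before output makes the delivered file deterministic regardless of the order in which threads finished.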
  • Slide 25
  • Large Scale Biomedical Ontology Matching tool over High Performance Computing
  • Slide 26
  • Scenario I: Multicore desktop
  • Slide 27
  • Scenario II: 4 VM Cloud
  • Slide 28
  • Ontology Loading Time: 3x faster loading time
  • Slide 29
  • Total Memory Footprint: 8x more memory-efficient
  • Slide 30
  • Scalability (Reduction Score): outperforms by 40%
  • Slide 31
  • Performance Evaluation: ~4x to 8x better performance
  • Slide 32
  • Performance Evaluation (FMAxNCI)
  • Slide 33
  • Performance Evaluation (FMAxSNOMED)
  • Slide 34
  • Performance Evaluation (NCIxSNOMED)
  • Slide 35
  • Uniqueness / Contributions
      • Exploitation of parallel commodity hardware for matching: implements data-parallelism-based distribution of candidate ontology subsets over multicore platforms and provides the collection of mappings among the ontologies as a bridge ontology file.
      • End-to-end performance initiative (from loading through delivery): creates subsets of ontologies according to the needs of the matching algorithms and caches them in serialized formats, providing single-step ontology loading for matching algorithms in parallel.
      • Smaller memory footprint: each subset is lightweight due to matcher-based, redundancy-free creation, which keeps memory footprints small and contributes to overall system performance.
      • Better scalability: utilizes computational resources efficiently through its matching task distribution.
  • Slide 36
  • Achievements
      • OAEI 2013, evaluation at ISWC 2013 (A-rated conference): SPHeRe was presented and evaluated on the large-scale biomedical track. It was remarked on as the first ontology matching system that utilizes distributed cloud resources. Our first release of this year ranked among the top 15 systems of 2013 (globally).
      • Microsoft Research Asia Award 2013-2014: research funding awarded by Microsoft Research Asia for SPHeRe over the Microsoft Azure platform.
      • Microsoft Azure4Research Award 2014-2015: SPHeRe for large-scale biomedical ontology matching over the Microsoft Azure platform.
  • Slide 37
  • Publications: Conferences
      • Wajahat Ali Khan, Muhammad Bilal Amin, Asad Masood Khattak, Maqbool Hussain, and Sungyoung Lee, System for Parallel Heterogeneity Resolution (SPHeRe) results for OAEI 2013, 12th Int. Semantic Web Conference (ISWC), 21-25 October 2013, Sydney, Australia.
      • Ammar Ahmad Awan, Muhammad Bilal Amin, Shujaat Hussain, Aamir Shafi, and Sungyoung Lee, An MPI-IO Compliant Java based Parallel I/O library, 13th IEEE CCGrid, Delft, Netherlands, May 2013.
      • Ammar Ahmad Awan, Muhammad Shoaib Ayub, Aamir Shafi, and Sungyoung Lee, Towards Efficient Support for Parallel I/O in Java HPC, 13th PDCAT, Beijing, 2012.
      • Muhammad Bilal Amin, Wajahat Ali Khan, Shujaat Hussain, and Sungyoung Lee, High Performance Java Sockets (HPJS) for healthcare cloud systems, 13th HealthCom 2012, Beijing, Oct 2012.
      • Muhammad Bilal Amin, Wajahat Ali Khan, Ammar Ahmad Awan, and Sungyoung Lee, Intercloud Message Exchange Middleware, 7th ICUIMC 2012, Kuala Lumpur, Malaysia, Feb 2012.
  • Slide 38
  • Publications: Journals
      • Muhammad Bilal Amin, Wajahat Ali Khan, and Sungyoung Lee, SPHeRe: A performance initiative towards ontology matching by implementing parallelism over cloud platforms, Journal of Supercomputing (SCI, IF 0.9), 2013.
      • Wajahat Ali Khan, Maqbool Hussain, Muhammad Afzal, Muhammad Bilal Amin, Muhammad Aamir Saleem, and Sungyoung Lee, Personalized-Detailed Clinical Model for Data Interoperability among Clinical Standards, Telemedicine and e-Health (SCI, IF 1.416), 2013.
      • Muhammad Bilal Amin, Wajahat Ali Khan, and Sungyoung Lee, Enabling Data Parallelism for Large-scale Bio-medical Ontology Matching over Multicore Platforms, Journal of Applied Intelligence (SCI, IF 1.8) (under review), 2014.
  • Slide 39
  • Conclusion
      • HPC over cloud is a very cost-effective solution that offers the capabilities of expensive clusters or grids.
      • Fully exploiting it requires effort to implement platforms and applications for computation- and data-intensive problems.
      • Applications like SPHeRe can be built to resolve compute- and data-intensive problems over multicore platforms where performance is needed.
      • Commodity hardware requires fewer man-hours for maintenance and consumes far less energy, which makes it an excellent candidate for green computing.
  • Slide 40
  • Slide 41