![Page 1: Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Nandez](https://reader030.vdocument.in/reader030/viewer/2022020301/5a655f4c7f8b9af13a8b4723/html5/thumbnails/1.jpg)

Analyzing Andromeda Galaxy data using Spark
Jose Nandez
SHARCNET – University of Western Ontario

What is SHARCNET?
• Shared Hierarchical Academic Research NETwork.
• A consortium of 18 Ontario academic institutions, led by the University of Western Ontario.
• Partner of Compute Canada, which oversees funding and distribution of equipment.
• Sysadmins and HPC specialists, 20 in total, distributed across 6 institutions.

What does SHARCNET do?
• Provides service and support in High Performance Computing to all SHARCNET researchers.
• Researchers belong to partner universities across Ontario.
• Starting to provide service for large data needs:
– Storage and processing of large data sets
– Data processing using Spark, Hadoop, etc.
– Data mining and machine learning

What is the Andromeda Galaxy?
• Known as M31, or Messier 31
• Spiral galaxy
• 2.5 million light-years away
• Closest large galaxy to ours
• Bigger galaxy than ours

Why Andromeda galaxy?
• Cool wallpaper
• T-shirts
• Mugs …
• Science?

Andromeda Galaxy in Science
• It has ~1 trillion stars
• 2.5 times longer than our galaxy
• Thought to have merged with another galaxy
• It contains about 26 known black holes
• It can be used as a galaxy laboratory for extragalactic astronomy
• Our galaxy will collide with it (in about 4 billion years)

Why Andromeda?

Particularly…
• The extension of Andromeda has been recognized.
• The area shows the extent of the galaxy, which reaches farther than previously thought.
• M. Rafiei Ravandi et al. 2016.

Extended Andromeda
• The observations were taken with Spitzer-IRAC, the Infrared Array Camera on the Spitzer infrared space telescope.
• The catalog has 426,529 new sources.
• Extends observations of the disc and halo.

Classification of these objects
• Are all these sources (426,529) part of Andromeda?
• Are they all known from previous catalogs?
• What types of objects (such as black holes, galaxies, etc.) are those new sources?
• What can we learn from these new objects?

Which catalogs?
• Astronomical databases:
– SIMBAD (39,022)
– NED (126,862)
– MAST (118,854,914)
• Only sources around M31 are used; they are observed in different wavelengths (IR, optical, UV).
• These sources are then compared with the observed objects.

How hard could it be?
We defined 2″, or 2 arcsec, as a good match. 1 arcsec = 1/3600°; it is an angular measurement, not a linear measurement (such as miles/km).
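
As a minimal sketch of the 2 arcsec criterion, the angular separation between two sky positions can be computed with the haversine formula. The slides do not say which formula was used, so the haversine choice, the function names and the sample coordinates below are assumptions made for illustration only.

```python
import math

ARCSEC_PER_DEG = 3600.0
MATCH_RADIUS_ARCSEC = 2.0  # the "good match" threshold stated on the slide


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcsec between two positions given in degrees.

    Uses the haversine formula (an assumption; the talk does not name one).
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    sep_rad = 2.0 * math.asin(min(1.0, math.sqrt(a)))
    return math.degrees(sep_rad) * ARCSEC_PER_DEG


def is_good_match(ra1, dec1, ra2, dec2, radius_arcsec=MATCH_RADIUS_ARCSEC):
    """True when the two positions lie within the matching radius."""
    return angular_separation_arcsec(ra1, dec1, ra2, dec2) <= radius_arcsec


# Illustrative positions near M31 (hypothetical values, in degrees).
ra, dec = 10.6847, 41.2687
one_arcsec = 1.0 / ARCSEC_PER_DEG
print(is_good_match(ra, dec, ra, dec + one_arcsec))      # 1 arcsec apart
print(is_good_match(ra, dec, ra, dec + 3 * one_arcsec))  # 3 arcsec apart
```

Because the measure is angular, an offset of 2 arcsec in right ascension at this declination corresponds to a smaller separation on the sky (scaled by the cosine of the declination), which is why the full formula is used instead of a plain coordinate difference.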

key-value = (RA, DEC), Observations →
join(), groupByKey(), filter(), map(), sortByKey()
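
The key-value idea can be sketched as follows. This is not the talk's code: it is plain Python standing in for the Spark operations named above so that it runs without a cluster. Since an exact join on floating-point (RA, DEC) would miss near neighbours, the sketch keys each source by a coarse grid cell; that binning scheme, the cell size, the function names and the sample sources are all assumptions.

```python
import math
from collections import defaultdict

MATCH_RADIUS_DEG = 2.0 / 3600.0  # 2 arcsec threshold from the slides
CELL_DEG = 10.0 / 3600.0         # hypothetical grid-cell size (assumption)


def separation_deg(ra1, dec1, ra2, dec2):
    """Haversine angular separation in degrees (inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def cell_of(ra, dec):
    """Key: integer grid cell containing the position.

    Adequate only away from the poles and the RA = 0/360 wrap (assumption).
    """
    return (math.floor(ra / CELL_DEG), math.floor(dec / CELL_DEG))


def cross_match(observed, catalog):
    """observed, catalog: lists of (name, ra_deg, dec_deg) tuples."""
    # map() + groupByKey(): key catalog sources by cell. Each source is also
    # emitted to the 8 neighbouring cells so pairs straddling an edge survive.
    catalog_by_cell = defaultdict(list)
    for name, ra, dec in catalog:
        cx, cy = cell_of(ra, dec)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                catalog_by_cell[(cx + dx, cy + dy)].append((name, ra, dec))

    matches = []
    for name, ra, dec in observed:
        # join(): pair the observed source with catalog sources on the same key.
        for cat_name, cat_ra, cat_dec in catalog_by_cell.get(cell_of(ra, dec), []):
            # filter(): keep only pairs within the matching radius.
            if separation_deg(ra, dec, cat_ra, cat_dec) <= MATCH_RADIUS_DEG:
                matches.append((name, cat_name))
    return sorted(matches)  # sortByKey()


# Hypothetical sources for illustration.
observed = [("obs1", 10.6847, 41.2687), ("obs2", 10.70, 41.30)]
catalog = [("catA", 10.6847, 41.2687 + 1.0 / 3600.0), ("catB", 10.80, 41.40)]
print(cross_match(observed, catalog))
```

In Spark the same steps would be expressed with pair-RDD transformations, with the cell key letting the join be distributed across partitions instead of comparing every observed source against every catalog entry.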

NED+SIMBAD+IRAC

Counts?
• 613 Stars
• 70 Globular clusters
• 63 X-ray sources
• 62 Galaxies
• 52 Star clusters
• Total known sources: 1,391

And the rest?
• They are not part of SIMBAD, NED or MAST
• What about other catalogs?
• Can we classify them?
• Can we use machine learning?

Conclusions
• MAST has a higher resolution than the IRAC-catalog, SIMBAD and NED.
• Only 1,391 known sources came out of a match between NED + SIMBAD + IRAC-catalog.
• The rest could be classified with ML, using the features of the known objects.
• We need more data for a better classification.

Thank You!
Collaborator: Prof. Pauline Barmby, Department of Physics, University of Western Ontario
Photos: Mainly from NASA, ESO, EarthSky, MacOS.