Maximizing the HALO Detector’s Uptime

Stéphane Venne – SNOLAB User’s Meeting 2016

Helium And Lead Observatory

• Operational since May 2012

• 79 tonnes of lead and 128 helium-3 (³He) neutron counters

• Low-cost, low-maintenance, long-lifetime dedicated supernova detector

• Galactic supernovae occur approximately 3 times per century

• It is crucial that HALO be operational at all times so as not to miss the next supernova

HALO – Maximizing Uptime - 2016

Current live time plot

Causes of downtime

Current live time fraction: ~95%

Cause | Solution
Active detector work and maintenance | Testing on Laurentian spare equipment; should decrease now that most systems are in place
Power outages | UPS, automatic shutdown and startup of the detector
Hardware malfunction | Redundant hardware and an automatic sentry

Goal: maximize uptime with automated systems and procedures, which reduce the response time.
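The live time fraction is the share of wall-clock time covered by data-taking runs; at ~95%, roughly 18 days per year (0.05 × 365 ≈ 18) carry no data. A minimal sketch of how such a fraction could be computed, assuming runs are available as a list of (start, stop) datetime pairs (a hypothetical format, not HALO's actual run database):

```python
from datetime import datetime, timedelta


def live_fraction(runs, start, end):
    """Fraction of the window [start, end) covered by data-taking runs.

    runs: list of (run_start, run_stop) datetimes (hypothetical format).
    Overlapping runs, e.g. during a DAQ handover, are merged so that
    no time is counted twice.
    """
    # Clip each run to the window and keep only runs that intersect it
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in runs if e > start and s < end
    )
    live = timedelta(0)
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this run: close the previous merged interval
            if cur_end is not None:
                live += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching run: extend the current interval
            cur_end = max(cur_end, e)
    if cur_end is not None:
        live += cur_end - cur_start
    return live / (end - start)


# Example: a 10-day window with a 12-hour gap between two runs
t0 = datetime(2016, 8, 1)
runs = [(t0, t0 + timedelta(days=5)),
        (t0 + timedelta(days=5, hours=12), t0 + timedelta(days=10))]
print(live_fraction(runs, t0, t0 + timedelta(days=10)))
```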

Sentry – DAQ monitoring

Overview
• For redundancy: 2 DAQ computers, both running ORCA
• For consistency, configuration files and run scripts are stored on Dropbox, so the run number stays consistent between the two computers
• Primary and secondary DAQs

DAQ failure
• The primary is running; the secondary is on standby
• If the primary does not respond to pings, the secondary takes over and raises an alarm
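The failover rule could look roughly like the sketch below. The class name `Standby`, the three-missed-pings threshold and the injected `ping_primary` callable are assumptions for illustration, not the HALO sentry's actual code; injecting the ping keeps the logic testable without a network.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Standby:
    """Hypothetical watchdog running on the secondary DAQ computer."""
    ping_primary: Callable[[], bool]  # True if the primary answered a ping
    max_missed: int = 3               # assumed consecutive misses before takeover
    missed: int = 0
    active: bool = False              # True once the secondary has taken over
    alarms: List[str] = field(default_factory=list)

    def check(self) -> None:
        """Run one monitoring cycle; meant to be called periodically."""
        if self.active:
            return  # already in charge; nothing to monitor
        if self.ping_primary():
            self.missed = 0  # any reply resets the counter
            return
        self.missed += 1
        if self.missed >= self.max_missed:
            self.active = True  # this is where the secondary would start its run
            self.alarms.append(
                f"primary DAQ missed {self.missed} pings; secondary took over")


# Example: the primary answers, then falls silent
replies = iter([True, False, False, True, False, False, False])
watchdog = Standby(ping_primary=lambda: next(replies))
for _ in range(7):
    watchdog.check()
print(watchdog.active, watchdog.alarms)
```

On Linux, `ping_primary` could wrap `subprocess.run(["ping", "-c", "1", host]).returncode == 0`. Requiring several consecutive misses avoids a takeover on a single lost packet.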

Toggle scheduler
• Automatically switches the primary and secondary roles at a set interval to test the system
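One possible toggle scheduler, sketched under assumptions: roles are derived from a fixed epoch and a swap interval, so each computer can work out which one is primary from its clock alone. The machine names, epoch and one-week interval are hypothetical.

```python
from datetime import datetime, timedelta


def roles_at(t, epoch, interval, machines=("daq1", "daq2")):
    """Return (primary, secondary) at time t.

    Roles swap every `interval` counted from `epoch`; the machine names,
    epoch and interval are hypothetical.
    """
    n = (t - epoch) // interval  # completed intervals (timedelta // timedelta -> int)
    if n % 2 == 0:
        return machines[0], machines[1]
    return machines[1], machines[0]


epoch = datetime(2016, 8, 1)
week = timedelta(days=7)
print(roles_at(epoch + timedelta(days=3), epoch, week))   # first week
print(roles_at(epoch + timedelta(days=10), epoch, week))  # second week, roles swapped
```

Deriving the roles from the clock rather than from messages means a swap still happens on schedule even if one machine was briefly offline; it does assume the two clocks are synchronized (e.g. via NTP).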

Sentry – SBC monitoring

• ORCA will only start a run if all data-taking hardware is present
• In HALO, this hardware is the 2 SBCs (single-board computers)
• If an SBC becomes unreachable, the run in progress is stopped
• Redundant hardware: half the detector can still be operated

Full-detector operation is preferable. At the end of each run, the sentry checks whether the removed SBCs are reachable again.

If they are, the next run starts with both SBCs taking data.

The sentry also raises an alarm.
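The end-of-run check could be sketched as follows. `reachable` stands in for whatever connectivity test the sentry actually uses, and the SBC names are hypothetical. Every SBC, including one dropped earlier, is re-tested at the end of each run, so a recovered SBC rejoins automatically.

```python
from typing import Callable, List, Sequence, Tuple


def plan_next_run(all_sbcs: Sequence[str],
                  reachable: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Called at the end of every run.

    Re-tests every SBC, including ones dropped earlier, and returns
    (SBCs to include in the next run, alarm messages).
    """
    usable = [name for name in all_sbcs if reachable(name)]
    missing = [name for name in all_sbcs if name not in usable]
    alarms = []
    if missing:
        alarms.append(
            f"unreachable SBC(s): {', '.join(missing)}; "
            f"next run uses {len(usable)} of {len(all_sbcs)}")
    return usable, alarms


# Example: sbc2 is down at the end of one run and back at the end of the next
up = {"sbc1": True, "sbc2": False}
print(plan_next_run(["sbc1", "sbc2"], lambda n: up[n]))
up["sbc2"] = True
print(plan_next_run(["sbc1", "sbc2"], lambda n: up[n]))
```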

Automatic power management

• Laboratory is subject to scheduled (and unscheduled) power outages

• An Uninterruptible Power Supply (UPS) is installed
• It can keep the detector operational for ~3 h on battery
• If an outage lasts longer than 3 h, systems must be shut down gracefully
• Systems must be booted when power is restored

[Diagram: UPS monitored with Network UPS Tools (NUT)]
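A rough sketch of the shutdown decision, assuming the UPS status is read from NUT (for example with `upsc <ups>@<host> ups.status`, which prints flags such as `OL` for on line, `OB` for on battery and `LB` for low battery). The 2.5 h threshold is a hypothetical safety margin below the ~3 h battery autonomy, and the function is an illustration rather than HALO's installed logic.

```python
from datetime import datetime, timedelta

# Hypothetical margin: shut down well before the ~3 h battery autonomy runs out
SHUTDOWN_AFTER = timedelta(hours=2, minutes=30)


def power_action(status, on_battery_since, now, shutdown_after=SHUTDOWN_AFTER):
    """Decide what to do from a NUT ups.status string such as "OL" or "OB LB".

    Returns (action, on_battery_since) where action is "run", "wait" or
    "shutdown"; the caller keeps on_battery_since between polls.
    """
    flags = status.split()
    if "OB" not in flags:
        return "run", None  # on line power: clear the outage timer
    since = on_battery_since or now  # first poll on battery starts the timer
    if "LB" in flags or now - since >= shutdown_after:
        return "shutdown", since  # graceful shutdown before the battery is empty
    return "wait", since


now = datetime(2016, 8, 1, 12, 0)
print(power_action("OL", None, now))
print(power_action("OB", now - timedelta(hours=1), now))
print(power_action("OB", now - timedelta(hours=3), now))
```

In a standard NUT installation the `upsmon` daemon already initiates a shutdown when the UPS reports `OB LB`; a timer like this one only adds an earlier, time-based cutoff.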

Power outage

[Figure: power-outage timeline with points numbered 1 to 4 in red]

The numbers in red indicate the possible times at which power could be restored. The system must be ready for every scenario.

Power restoration

• Can start the full detector or half of it, depending on the available hardware
• Can restart the SBCs if necessary
• Sends an email reporting on the process activity
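The restoration steps can be sketched as a single routine. `reachable`, `restart`, the retry count and the recipient address are all assumptions; the email is only built here with the standard library's `EmailMessage`, and could then be sent with `smtplib.SMTP(host).send_message(msg)`.

```python
from email.message import EmailMessage
from typing import Callable, Sequence


def restore_after_outage(sbcs: Sequence[str],
                         reachable: Callable[[str], bool],
                         restart: Callable[[str], None],
                         max_restarts: int = 1):
    """Hypothetical routine run once power is back and the DAQ has booted.

    Tries to bring every SBC up, picks the run mode, and builds a report
    email. Returns (mode, message).
    """
    log, usable = [], []
    for name in sbcs:
        tries = 0
        while not reachable(name) and tries < max_restarts:
            restart(name)  # e.g. a remote power cycle (assumed to exist)
            tries += 1
            log.append(f"restarted {name}")
        if reachable(name):
            usable.append(name)
        else:
            log.append(f"{name} still unreachable")
    # With HALO's two SBCs, one usable SBC means half the detector
    if not usable:
        mode = "no run"
    elif len(usable) == len(sbcs):
        mode = "full detector"
    else:
        mode = "half detector"
    log.append(f"starting: {mode} ({', '.join(usable) or 'no SBC'})")

    msg = EmailMessage()
    msg["Subject"] = f"HALO power restoration: {mode}"
    msg["To"] = "halo-alerts@example.org"  # placeholder address
    msg.set_content("\n".join(log))
    return mode, msg


# Example: sbc2 comes back after one restart
state = {"sbc1": True, "sbc2": False}
mode, msg = restore_after_outage(["sbc1", "sbc2"],
                                 lambda n: state[n],
                                 lambda n: state.update({n: True}))
print(mode)
print(msg.get_content())
```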

Next steps…

• The sentry has been running since July
  ◦ A minor bug needs to be fixed
• The automatic power management system has recently been installed
  ◦ It needs to be tested a few more times
  ◦ It needs to be tested in a real power outage
• Add statistics recording: how much time we gain with the automated system
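The proposed statistics recording could start from a per-incident log. This sketch assumes hypothetical incident records holding a cause, a flag for automated recovery, and down/up timestamps, and reports the count and mean downtime per (cause, automated) pair.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def downtime_stats(incidents):
    """Summarize downtime incidents.

    incidents: iterable of dicts with hypothetical keys "cause",
    "automated" (bool), "down" and "up" (datetimes).
    Returns {(cause, automated): (count, mean downtime)}.
    """
    totals = defaultdict(lambda: [0, timedelta(0)])
    for inc in incidents:
        entry = totals[(inc["cause"], inc["automated"])]
        entry[0] += 1
        entry[1] += inc["up"] - inc["down"]
    return {key: (n, total / n) for key, (n, total) in totals.items()}


t = datetime(2016, 9, 1)
log = [
    {"cause": "power", "automated": True, "down": t, "up": t + timedelta(hours=1)},
    {"cause": "power", "automated": True, "down": t, "up": t + timedelta(hours=3)},
    {"cause": "power", "automated": False, "down": t, "up": t + timedelta(hours=10)},
]
for key, (count, mean) in downtime_stats(log).items():
    print(key, count, mean)
```

The time gained could then be estimated as the difference between the manual and automated mean downtimes, multiplied by the number of automated recoveries; this relies on having enough manual recoveries on record to give a meaningful baseline.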

Any questions?

Thank you!
