ESO
9
OLAS Operator’s Guide
Doc: VLT-MAN-ESO-19400-1557
Issue 2
Date: 18/6/02
Page: 29
How the supervisor (watch-dog) works
Each running OLAS task (except vcsolac) is monitored by its own watchdog process. If the
monitored task is killed or dies for any reason (except when it is stopped by a cleanup script), the
watchdog restarts it with the same options.
When the watchdog restarts a task, it also sends an e-mail to the address specified by the variable
$OLAS_MGR to notify the operator that a problem occurred.
When a normal shutdown of a process is performed through the standard cleanup procedure, the
relevant watchdog task is killed before the monitored task.
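The restart-and-notify behaviour described above can be sketched as a small shell loop. This is an illustration only, not the actual watchdog-DHS or watchdog-FrameIngest script: the function names watch_task and notify_operator, the restart limit, and the use of mailx are assumptions made for this example. Only the variable $OLAS_MGR comes from the text.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; not the script shipped with OLAS.

# notify_operator MESSAGE
# Mails MESSAGE to the address in $OLAS_MGR. Falls back to stderr when
# OLAS_MGR is unset or mailx is not installed (assumption for this sketch).
notify_operator() {
    if [ -n "${OLAS_MGR:-}" ] && command -v mailx >/dev/null 2>&1; then
        printf '%s\n' "$1" | mailx -s "OLAS watchdog" "$OLAS_MGR"
    else
        printf '%s\n' "$1" >&2
    fi
}

# watch_task MAX_RESTARTS COMMAND [ARGS...]
# Starts COMMAND and, each time it exits, notifies the operator and
# restarts it with the same arguments. MAX_RESTARTS only exists so that
# the sketch terminates; a real watchdog would presumably loop until it
# is killed. Prints the number of restarts performed.
watch_task() {
    max_restarts=$1
    shift
    restarts=0
    while :; do
        "$@" &                       # start the monitored task
        task_pid=$!
        status=0
        wait "$task_pid" || status=$?
        [ "$restarts" -ge "$max_restarts" ] && break
        restarts=$((restarts + 1))
        notify_operator "task '$1' (pid $task_pid) exited with status $status; restarting"
    done
    echo "$restarts"
}
```

In this sketch no special case is needed for a normal shutdown: because the cleanup procedure kills the watchdog before the monitored task, the loop is no longer running when the task stops.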
Example: on the DHS workstation, the output of the command “ps -ef | grep archeso”
should look like the following:
archeso 13033     1  0 19:49:30 pts/0    0:00 dhs -dhsdata /data/raw ...
archeso 13050     1  1 19:49:36 pts/0    0:00 frameIngest -dhsdata /data/raw ...
archeso 13035     1  0 19:49:35 pts/0    0:00 /bin/sh ./watchdog-DHS
archeso 13054     1  0 19:49:41 pts/0    0:00 /bin/sh ./watchdog-FrameIngest
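To pick the watchdog processes out of such a listing, a short filter along the following lines may help. It is a sketch under assumptions: the function name list_watchdogs is hypothetical, and it relies on the watchdog scripts being named watchdog-<Task> as in the example above.

```shell
# list_watchdogs: read "ps -ef"-style lines on stdin and print
# "PID TASK" for every process whose command contains "watchdog-".
# Hypothetical helper, not part of OLAS.
list_watchdogs() {
    awk '{
        # In "ps -ef" output the command starts at field 8 (or later).
        for (i = 8; i <= NF; i++) {
            if ($i ~ /watchdog-/) {
                name = $i
                sub(/^.*watchdog-/, "", name)   # keep only the task name
                print $2, name
                break
            }
        }
    }'
}
```

A possible use on the DHS workstation is "ps -ef | grep archeso | list_watchdogs", which for the listing above would print one line per watchdog with its process id and task name.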
The command show-olas should return something like the following output:
using DHS_DATA = /data/raw for data files
using BAD_DIR = /data/bad for bad files
using DHS_LOG = /data/msg for log files
using DHS_HOST = wu1dhs
using DHS_CONFIG = archeso@wu1dhs:/data/msg
FrameIngest:
FrameIngest-wu1dhs-watchdog (pid 13054)
FrameIngest-wu1dhs (pid 13050)
DHS:
DHS-wu1dhs-watchdog (pid 13035)
DHS-wu1dhs (pid 13033)
This output lists all the running tasks and watchdogs with their corresponding process IDs.
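A listing in this format can also be checked automatically for tasks whose watchdog is missing. The following is a minimal sketch, assuming that every entry has the form "<name> (pid <number>)" and that a watchdog entry is the task name followed by "-watchdog", as in the sample output. The function name check_watchdogs is hypothetical. A task that runs without a watchdog by design, such as vcsolac, would be reported if it appears in the listing; exempting it is left out here.

```shell
# check_watchdogs: read show-olas-style output on stdin and print every
# task entry that has no matching "<task>-watchdog" entry.
# Returns 1 if at least one task lacks a watchdog, 0 otherwise.
# Hypothetical helper, not part of OLAS.
check_watchdogs() {
    awk '
        $2 == "(pid" {                     # entry line: NAME (pid N)
            name = $1
            if (name ~ /-watchdog$/) {
                sub(/-watchdog$/, "", name)
                watched[name] = 1          # a watchdog exists for this task
            } else {
                tasks[name] = 1            # the task itself is running
            }
        }
        END {
            missing = 0
            for (t in tasks) {
                if (!(t in watched)) {
                    print t ": no watchdog"
                    missing = 1
                }
            }
            exit missing
        }
    '
}
```

A possible use is "show-olas | check_watchdogs", which prints nothing when every listed task has its watchdog.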