ESO 9 OLAS Operator’s Guide Doc: VLT-MAN-ESO-19400-1557 Issue 2 Date: 18/6/02 Page: 29

How the supervisor (watch-dog) works

For each running OLAS task (except vcsolac), a watchdog process looks after it. If the monitored task is killed or dies for any reason (except when this is done by a cleanup script), the watchdog restarts it with the same options. When the watchdog restarts a task, it also sends an e-mail to the address specified by the variable $OLAS_MGR, to notify the operator that a problem occurred. When a process is shut down normally through the standard cleanup procedure, the relevant watchdog task is killed before the monitored task.

Example: on the DHS workstation, the output of the command “ps -ef | grep archeso” should look like the following:

    archeso 13033     1  0 19:49:30 pts/0    0:00 dhs -dhsdata /data/raw ...
    archeso 13050     1  1 19:49:36 pts/0    0:00 frameIngest -dhsdata /data/raw ...
    archeso 13035     1  0 19:49:35 pts/0    0:00 /bin/sh ./watchdog-DHS
    archeso 13054     1  0 19:49:41 pts/0    0:00 /bin/sh ./watchdog-FrameIngest

The command show-olas should return something like the following output:

    using DHS_DATA = /data/raw for data files
    using BAD_DIR = /data/bad for bad files
    using DHS_LOG = /data/msg for log files
    using DHS_HOST = wu1dhs
    using DHS_CONFIG = archeso@wu1dhs:/data/msg
    FrameIngest:
        FrameIngest-wu1dhs-watchdog (pid 13054)
        FrameIngest-wu1dhs (pid 13050)
    DHS:
        DHS-wu1dhs-watchdog (pid 13035)
        DHS-wu1dhs (pid 13033)

This output lists all the running tasks and watchdogs with the corresponding process ids.
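
The restart behaviour described above can be illustrated with a minimal sketch. This is not the OLAS implementation (the real watchdogs are /bin/sh scripts such as ./watchdog-DHS, whose contents are not shown in this guide); it only mirrors the described logic in Python. The function name supervise, the notify callback (standing in for the e-mail sent to $OLAS_MGR), the max_restarts cap (added so the example terminates) and the stop_requested hook (standing in for the cleanup procedure) are all hypothetical.

```python
import subprocess
import sys
import time


def supervise(argv, notify, max_restarts=3, poll_interval=0.1,
              stop_requested=lambda: False):
    """Run argv and restart it with the same options whenever it exits.

    Hypothetical sketch of a watchdog loop; returns the number of restarts.
    """
    restarts = 0
    child = subprocess.Popen(argv)
    while True:
        # Normal shutdown: supervision stops first, then the task is stopped,
        # so the task is not restarted (simplified stand-in for the cleanup).
        if stop_requested():
            child.terminate()
            child.wait()
            return restarts
        # poll() returns None while the monitored task is still running.
        if child.poll() is None:
            time.sleep(poll_interval)
            continue
        # The task has exited. Assumption: give up after max_restarts so the
        # sketch cannot loop forever.
        if restarts >= max_restarts:
            return restarts
        restarts += 1
        # Stand-in for the e-mail notification to the operator.
        notify(f"task {argv[0]} exited with status {child.returncode}; "
               f"restart number {restarts}")
        # Restart the task using exactly the same options.
        child = subprocess.Popen(argv)


if __name__ == "__main__":
    # Demo: a task that exits immediately is restarted twice, then abandoned.
    log = []
    count = supervise([sys.executable, "-c", "pass"], log.append,
                      max_restarts=2, poll_interval=0.01)
    print(count, "restarts")
    for line in log:
        print(line)
```

In this sketch the notification is an injected callback, so an operator-facing version could send mail while a test simply collects the messages in a list.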