Download Blue Gene/L: Application Development

Dispatcher program
The dispatcher program first pulls a task submission message off the work queue. It then
waits on a socket for a launcher connection and reads the launcher ID from the socket. It
writes the task into the socket and records the association between task and launcher in a
table, which holds the last task dispatched to each launcher program. A new connection from a
launcher indicates that the last task dispatched to it has completed, so the task completion
message can be published back to the client. Figure 12-3 shows the entire cycle of a job
submitted in HTC mode.
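The dispatcher cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual dispatcher: the names Dispatcher, serve, and read_line, the newline-terminated framing, and the in-memory list standing in for message publication are all hypothetical.

```python
import socket
from collections import deque


def read_line(conn):
    """Read bytes up to a newline; the line-based framing is an assumption."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = conn.recv(1)
        if not chunk:
            break
        data += chunk
    return data.decode().strip()


class Dispatcher:
    """Toy model of the dispatcher cycle; all names here are hypothetical."""

    def __init__(self, tasks):
        self.work_queue = deque(tasks)  # pending task submission messages
        self.last_task = {}             # launcher ID -> last task dispatched to it
        self.completed = []             # stands in for publishing completion messages

    def serve(self, conn):
        """Handle one launcher connection on an already-accepted socket."""
        task = self.work_queue.popleft()      # pull a task off the work queue
        launcher_id = read_line(conn)         # read the launcher ID from the socket
        previous = self.last_task.get(launcher_id)
        if previous is not None:
            # A reconnect means the task last sent to this launcher has finished.
            self.completed.append((launcher_id, previous))
        conn.sendall(task.encode() + b"\n")   # write the task into the socket
        self.last_task[launcher_id] = task    # record the task/launcher association
        return launcher_id
```

In this sketch, a launcher's first connection produces no completion entry because the table holds nothing for it yet. Its second connection produces a completion entry for the task it received the first time.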
Figure 12-3 HTC job cycle
The intention of this design is to optimize the launcher program. The dispatcher program
spends little time between connect and dispatch, so variation in latency is mainly due to the
time spent waiting for connections to the dispatcher program. After rebooting, the launcher
program connects to the dispatcher program and passes the completion information back to it.
To assist task status resolution, the Compute Node Kernel stores the exit status of the last
running process in a buffer. After the launcher program restarts, the contents of this buffer
can be sent to the dispatcher and stored in the task completion message.
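One way to picture how the exit status reaches the task completion message is the sketch below. The greeting format ("launcher ID, then an optional exit status"), the function names, and the dictionary layout of the completion message are assumptions made for illustration; the source does not specify the wire format.

```python
def parse_hello(line):
    """Split a hypothetical 'launcher_id exit_status' greeting; status may be absent."""
    parts = line.split()
    launcher_id = parts[0]
    exit_status = int(parts[1]) if len(parts) > 1 else None
    return launcher_id, exit_status


def completion_message(last_task, launcher_id, exit_status):
    """Build a completion record for the task last dispatched to this launcher."""
    task = last_task.get(launcher_id)
    if task is None:
        return None  # first connection: nothing has run on this launcher yet
    return {"task": task, "launcher": launcher_id, "exit_status": exit_status}
```

Here last_task plays the role of the dispatcher's table, so the exit status reported after a restart is attached to whichever task that launcher received last.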
Launcher program
The launcher program is intentionally kept simple. Arguments to the launcher program
describe a socket connection to the dispatcher. When the launcher program starts, it
connects to this socket, writes its identity into the socket, and waits for a task message. Upon
receipt of the task message, the launcher parses the message and calls the execve system
call to execute the task. When the task exits (for any reason), the Compute Node Kernel
restarts the launcher program. Because the launcher program is not a container for the
application, it restarts regardless of what happens to the application.
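The launcher steps above can be sketched as follows, using Python's os.execve as a stand-in for the execve system call. This is a minimal illustration rather than the real launcher: the function names, the newline-terminated messages, and the shell-style task format are assumptions, and the execve parameter exists only so the sketch can be exercised without replacing the current process.

```python
import os
import shlex
import socket


def parse_task(message):
    """Turn a task message into an argv list (the message format is an assumption)."""
    return shlex.split(message)


def launcher(conn, launcher_id, execve=os.execve):
    """Announce identity, wait for one task, then replace this process with it."""
    conn.sendall(launcher_id.encode() + b"\n")  # write identity into the socket
    buf = b""
    while not buf.endswith(b"\n"):              # block until a full task line arrives
        chunk = conn.recv(1)
        if not chunk:
            raise ConnectionError("dispatcher closed the connection")
        buf += chunk
    argv = parse_task(buf.decode())
    # os.execve does not return on success: the task takes over this process,
    # which is why the launcher is not a container for the application.
    execve(argv[0], argv, dict(os.environ))
```

A real launcher would build conn from its own arguments, for example with socket.create_connection((host, port)), where host and port are hypothetical argument names. Because the task replaces the launcher process, restarting the launcher is left entirely to the kernel.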
Chapter 12. High Throughput Computing on Blue Gene/L
