Parallel Debugging Techniques
Le Yan
Louisiana Optical Network Initiative
8/3/2009
Scaling to Petascale Virtual Summer School
Outline
• Overview of parallel debugging
– Challenges
– Tools
– Strategies
• Get familiar with TotalView/DDT through hands‐on exercises
Bugs in Parallel Programming
• Parallel programs are prone to the usual bugs found in sequential programs
– Improper pointer usage
– Stepping over array bounds
– Infinite loops
– …
• Plus…
Common Types of Bugs in Parallel Programming
• Erroneous use of language features
– Mismatched parameters, missing mandatory calls, etc.
• Defective space decomposition
• Incorrect/improper synchronization
• Hidden serialization
• …
http://www.hpcbugbase.org/index.php/Main_Page
Debugging Essentials
• Reproducibility
– Find the scenario where the error is reproducible
• Reduction
– Reduce the problem to its essence
• Deduction
– Form hypotheses on what the problem might be
• Experimentation
– Filter out invalid hypotheses
Terence Parr, Learn the Essentials of Debugging
http://www.ibm.com/developerworks/web/library/wa‐debug.html?ca=dgr‐lnxw03Dbug
Challenges in Parallel Debugging
• Reproducibility
– Many problems cannot be easily reproduced
• Reduction
– The smallest scale might still be too large and complex to handle
• Deduction
– Need to consider concurrent and interdependent program instances
• Experimentation
– Cyclic debugging might be very expensive
A Nasty Little Bug
…
integer*4 :: i,ista,iend
integer*4 :: chunksize=1024*1024
…
call MPI_Comm_Rank(MPI_COMM_WORLD, myrank, error)
…
ista=myrank*chunksize+1
iend=(myrank+1)*chunksize
do i = ista,iend
…
enddo
…
• What is the potential problem?
A Nasty Little Bug
• Integer overflow if myrank ≥ 2047: iend=(myrank+1)*chunksize then reaches 2**31, one more than the largest integer*4 value (2**31‐1)
• A bug that shows up only when running with 2048 or more cores
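The arithmetic behind this bug can be checked without MPI. The following is a minimal sketch in C rather than the slide's Fortran; the helper names are hypothetical and not part of the original exercise. It evaluates iend in 64‐bit arithmetic and reports the first rank for which the value no longer fits in a signed 32‐bit integer such as integer*4. For chunksize=1024*1024 that rank is 2047.

```c
#include <stdint.h>

/* Last loop index owned by `rank`, computed in 64-bit arithmetic so the
   true value is available even when it would not fit in 32 bits. */
int64_t chunk_end(int64_t rank, int64_t chunksize)
{
    return (rank + 1) * chunksize;
}

/* Returns 1 if `value` can be stored in a signed 32-bit integer. */
int fits_in_int32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* Smallest rank whose chunk end no longer fits in a signed 32-bit integer.
   Hypothetical helper for illustration only; assumes chunksize > 0. */
int64_t first_overflowing_rank(int64_t chunksize)
{
    int64_t rank = 0;
    while (fits_in_int32(chunk_end(rank, chunksize)))
        rank++;
    return rank;
}
```

One possible fix, under the same assumptions, is to declare the loop bounds with a 64‐bit type (integer*8 in Fortran, int64_t in C) so that the index range of the highest rank is representable.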
printf/write Debugging
• Extremely easy to use, therefore dangerously attractive, but…
– Need to edit, recompile and rerun when additional information is desired
– May change program behavior
– Only capable of displaying a subset of the program’s state
– Output size grows rapidly with increasing core count and becomes harder to comprehend
• Not scalable, not recommended
Compilers Can Help
• Most compilers can (at runtime)
– Check array bounds
– Trap floating‐point operation errors
– Provide traceback information
• Relatively scalable, but…
– Overhead added
– Limited capability
– Non‐interactive
Parallel Debuggers
• Capable of what serial debuggers can do
– Control program execution
– Set action points
– View/change values of variables
• More importantly
– Control program execution at various levels
• Group/process/thread
– View MPI message queues
An Ideal Parallel Debugger
• Should allow easy process/thread control and navigation
• Should support multiple high performance computing platforms
• Should not limit the number of processes being debugged and should allow it to vary at runtime
How Parallel Debuggers Work
• Frontend
– GUI
– Debugger engine
• Debugger agents
– Control application processes
– Send data back to the debugger engine to analyze
[Figure: GUI and debugger engine on the interactive node; agents and user processes on the compute nodes]
At Very Large Scale
• The debugger itself becomes a large parallel application
• Bottlenecks
– Debugger framework startup cost
– Communication between frontend and agents
– Access to shared resources, e.g. file system
Validation Is Crucial
• Have a solid validation procedure to check correctness
• Test smaller components before putting them together
General Parallel Debugging Strategy
• Incremental debugging
– Downscale if possible
• Participating processes, problem size and/or number of iterations
• Example: run with one single thread to detect scope errors in OpenMP programs
– Add more instances to reveal other issues
• Example: run MPI programs on more than one node to detect problems introduced by network delays
Strategy at Large Scale
• Again, downscale if possible
• Reduce the number of processes to which the debugger is attached
– Reduces overhead
– Reduces the required number of license seats as well
• Focus on one or a small number of processes/threads
– Analyze call path and message queues to find problematic processes
– Control the execution of as few processes/threads as possible while keeping others running
• Provides the context where the error occurs
Trends in Debugging Technology
• Lightweight trace analysis tools
– Help to identify processes/threads that have similar behavior and reduce the search space
– Complementary to full feature debuggers
– Example: Stack Trace Analysis Tool (STAT)
• Replay/Reverse execution
– ReplayEngine now available from TotalView
– Checkpointing supported in DDT 2.4
• Post‐mortem statistical analysis
– Detect anomalies by analyzing profile dissimilarity of multiple runs
Hands‐on Exercise
• Debug MPI and OpenMP programs that solve a simple problem to get familiar with
– Basic functionalities of parallel debuggers
• TotalView: Pople, Bluefire and Athena
• DDT: Ranger
– Some common types of bugs in parallel programming
• Programs and instructions can be found at http://www.cct.lsu.edu/~lyan1/summerschool09
Problem
0 1 2 3 4 5 6 7 8 … 4 5
• A 1‐D periodic array with N elements
• Initial value
– C: cell(x)=x%10
– Fortran: cell(x)=mod(x‐1,10)
• In each iteration, every element is updated from the values of its two adjacent elements:
– cell(x)^(i+1) = [cell(x‐1)^(i) + cell(x+1)^(i)] % 10, where the superscript is the iteration number
• Execute Niter iterations
• The final outputs are the global maximum and average
http://www.hpcbugbase.org/index.php/Main_Page
Sequential Program
• Use an integer array to hold current values
• Use another integer array to hold the calculated values
• Swap the pointers at the end of each iteration
• The result is used to check the correctness of the parallel programs
– Chances are that we will not have such a luxury for large jobs
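The sequential scheme can be sketched as follows. This is an illustrative C version written for these notes, not the program distributed with the exercise; the function names and the error‐handling convention are assumptions. It uses the C initial condition cell(x)=x%10, two integer arrays, and a pointer swap at the end of each iteration.

```c
#include <stdlib.h>

/* One update step on a periodic array of n cells:
   next[x] = (cur[x-1] + cur[x+1]) % 10, with wrap-around at both ends. */
void update_cells(const int *cur, int *next, int n)
{
    for (int x = 0; x < n; x++) {
        int left  = cur[(x - 1 + n) % n];
        int right = cur[(x + 1) % n];
        next[x] = (left + right) % 10;
    }
}

/* Runs niter iterations from cell(x) = x % 10 and reports the global
   maximum and average. Returns 0 on success, -1 on invalid arguments
   or allocation failure. */
int run_sequential(int n, int niter, int *max_out, double *avg_out)
{
    if (n <= 0 || niter < 0)
        return -1;
    int *cur = malloc((size_t)n * sizeof *cur);
    int *next = malloc((size_t)n * sizeof *next);
    if (cur == NULL || next == NULL) {
        free(cur);
        free(next);
        return -1;
    }
    for (int x = 0; x < n; x++)
        cur[x] = x % 10;

    for (int it = 0; it < niter; it++) {
        update_cells(cur, next, n);
        int *tmp = cur;   /* swap the pointers instead of copying data */
        cur = next;
        next = tmp;
    }

    int max = cur[0];
    long long sum = 0;
    for (int x = 0; x < n; x++) {
        if (cur[x] > max)
            max = cur[x];
        sum += cur[x];
    }
    *max_out = max;
    *avg_out = (double)sum / n;
    free(cur);
    free(next);
    return 0;
}
```

Running a version like this on a small N gives reference values for the maximum and average against which the MPI and OpenMP variants can be validated.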
MPI Program
Global array: 0 1 2 3 4 5 6 7 8 … 4 5
Process 1: 5 0 1 2 … 5 6
Process 2: 5 6 7 8 … 2 3
……
Process n: 7 8 9 0 … 5 0
• Divide the array among n processes
• Each process works on its local array
• Exchange boundary data with neighbor processes at the end of each iteration
• Ring topology
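Two pieces of index arithmetic sit behind this decomposition: which global cells a process owns, and which ranks are its neighbors on the ring. The sketch below is plain C with no MPI calls, so it can be tested in isolation. The helper names are hypothetical, ranks are counted from 0 as MPI does, and giving the first N % nprocs ranks one extra cell is an assumption, since the slides do not say how a remainder is handled.

```c
/* Rank of the left neighbour in a ring of nprocs processes. */
int ring_left(int rank, int nprocs)
{
    return (rank - 1 + nprocs) % nprocs;
}

/* Rank of the right neighbour in a ring of nprocs processes. */
int ring_right(int rank, int nprocs)
{
    return (rank + 1) % nprocs;
}

/* Block decomposition of n cells over nprocs processes. The first
   n % nprocs ranks receive one extra cell. On return, *first is the
   global index of the first owned cell and *count the number of owned
   cells. */
void block_range(int n, int nprocs, int rank, int *first, int *count)
{
    int base = n / nprocs;
    int extra = n % nprocs;
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}
```

In an MPI program the two neighbor ranks would typically serve as the destination and source of the boundary exchange, for example in a call such as MPI_Sendrecv. Mistakes in this arithmetic are a common source of the "defective space decomposition" bugs mentioned earlier, so it is worth testing separately at small process counts.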
OpenMP Program
0 1 2 3 4 5 6 7 8 … 4 5
Thread 0 Thread 1 … Thread n
• Each thread works on its own part of the global array
• All threads have access to the entire array, so no data exchange is necessary
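A minimal sketch of one shared‐memory update step, assuming a C compiler with OpenMP support (for example the -fopenmp option of GCC); the function name is hypothetical. The two arrays are shared by all threads, while the loop index and the temporaries declared inside the loop body are private to each thread. The default(none) clause makes the compiler reject any variable whose scope has not been stated explicitly, which helps catch the scope errors mentioned under the general debugging strategy.

```c
/* One update step of the periodic array, divided among OpenMP threads.
   cur and next are shared; x, left and right are private per thread. */
void update_cells_omp(const int *cur, int *next, int n)
{
    #pragma omp parallel for default(none) shared(cur, next, n)
    for (int x = 0; x < n; x++) {
        int left  = cur[(x - 1 + n) % n];
        int right = cur[(x + 1) % n];
        next[x] = (left + right) % 10;
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially, which gives a convenient single‐threaded baseline for comparison.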
Three Ways to Start TotalView/DDT
• Start with core dumps
• Start by attaching to one or more running processes
• Start the executable within TotalView/DDT
TotalView – Root Window
[Figure: root window screenshot with callouts for host name, TotalView ID, MPI rank and status]
Status Code    Description
Blank          Exited
B              At breakpoint
E              Error
H              Held
K              In kernel
M              Mixed
R              Running
T              Stopped
W              At watchpoint
TotalView – Process Window
• Stack trace pane
– Call stack of routines
• Stack frame pane
– Local variables, registers and function parameters
• Source pane
– Source code
• Action points, processes, threads pane
– Manage action points, processes and threads
DDT ‐ Main Window
[Figure: main window screenshot with callouts for the project window, process group window, variable window, source code window, parallel stack view and output window, and evaluation window]
Controlling Execution
• The process window (TotalView) or main window (DDT) always focuses on one process/thread
• Switch between processes/threads
– TotalView: p+/p‐, t+/t‐, double click in root window, process/thread tab
– DDT: click on process rank in process window
• Need to set the appropriate scope when
– Giving control commands
– Setting action points
Control Commands
TotalView    DDT              Description
Go           Play/Continue    Start/resume execution
Halt         Pause            Stop execution
Kill                          Terminate the job
Restart                       Restart a running program
Next         Step over        Run to the next source line without stepping into another function
Step         Step into        Run to the next source line, stepping into a called function
Out          Step out         Run to the completion of the current function
Run to       Run to line      Run to the indicated location
Process/Thread Groups ‐ TotalView
• Scope of commands and action points
– Group(control)
• All processes and threads
– Group(workers)
• All threads that are executing user code
– Rank X
• Current process and its threads
– Process(workers)
• User threads in the current process
– Thread X.Y
• Current thread
– User defined group
• Group ‐> Custom Groups, or
• Create in call graph
Process/Thread Groups ‐ DDT
• Create custom groups
– Ctrl+click on all desired processes
– Right click on the process window then “create group”
Action Points
• Breakpoints stop the execution of the processes and threads that reach them
– Unconditional
– Conditional: stop only if the condition is satisfied
– Evaluation: stop and execute a code fragment when reached
• Useful when testing small patches
• Process barrier points synchronize a set of processes or threads
– TotalView only
• Watchpoints monitor a location in memory and stop execution when its value changes
Setting Action Points ‐ TotalView
• Breakpoints
– Right click on a source line ‐> Set breakpoint
– Click on the line number
• Watchpoints
– Right click on a variable ‐> Create watchpoint
• Barrier points
– Right click on a source line ‐> Set barrier
• Edit action point property
– Right click on an action point in the Action Points tab ‐> Properties
Setting Action Points ‐ DDT
• Breakpoints
– Double click on a source code line
– Right click in the Breakpoints tab ‐> Add breakpoint
• Watchpoints
– Right click on a variable ‐> Add to Watches
– Right click in the Watches tab ‐> Add Watch
Viewing/Editing Data
• View values and types of variables
– At one process/thread
– Across all processes/threads
• Edit variable value and type
• Array Data
– Slicing
– Filtering
– Visualization
– Statistics
Viewing/Editing Data ‐ TotalView
• Viewing data in
– Stack frame
– Expression list
– Variable window (dive on a variable by double clicking on its name)
• Editing data by clicking on the value in
– Stack frame
– Variable window
Viewing/Editing Data ‐ DDT
• Viewing data in
– Variable window (in the main window)
– Evaluation window
• Editing data
– Right click on the variable name in the evaluation window
– Then choose “Edit value” or “Edit type”
Viewing Dynamic Arrays in C/C++
• TotalView
– Edit “type” in the variable window
– Tell TotalView how to interpret the memory from a starting location
– Example: to view an array of 100 integers, change int * to int[100]*
• DDT
– Drag a pointer variable into the evaluation window
– Right click on the variable ‐> “View as vector”
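The need for the type edit comes from how C represents heap arrays: the debugger sees only a pointer to one int and has no record of how many elements follow. The sketch below, with hypothetical function names, allocates such an array and then reads it through a pointer‐to‐array type. The C spelling int (*)[100] is offered here as the closest analogue of the TotalView type int[100]*; it is an analogy for illustration, not TotalView syntax.

```c
#include <stdlib.h>

/* Allocates n integers on the heap and fills them with cell(x) = x % 10.
   Only an `int *` is returned, so the element count is not part of the
   type. Returns NULL on failure; the caller frees the memory. */
int *make_cells(int n)
{
    if (n <= 0)
        return NULL;
    int *cells = malloc((size_t)n * sizeof *cells);
    if (cells == NULL)
        return NULL;
    for (int x = 0; x < n; x++)
        cells[x] = x % 10;
    return cells;
}

/* Reads element `index` through a pointer-to-array-of-100 view of the
   same memory. The caller must guarantee that p points to at least 100
   ints and that 0 <= index < 100. */
int read_as_array100(int *p, int index)
{
    int (*view)[100] = (int (*)[100])p;
    return (*view)[index];
}
```

The cast changes only how the memory is interpreted, not the memory itself, which is also what editing the type in the variable window does.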
MPI Message Queues
• Detect
– Deadlocks
– Load balancing issues
• TotalView
– Tools ‐> Message Queue Graph
– More options available
• DDT
– View ‐> Message Queues
TotalView ‐ Call Graph
• Tools ‐> Call graph
• Quick view of program state
– Nodes: functions
– Edges: calls
• Look for outliers
DDT ‐ Parallel Stack View
• Allows users to see the position of each process/thread in the source code in the same window
• Hover over any function to see a list of processes that are currently at that location
References
• TotalView user manual
– http://www.totalviewtech.com/support/documentation/totalview/index.html
• DDT user manual
– http://www.allinea.com/downloads/userguide.pdf
• LLNL TotalView tutorial
– https://computing.llnl.gov/tutorials/totalview
• NCSA Cyberinfrastructure Tutor
– “Debugging Serial and Parallel Codes” course
• HPCBugBase
– http://hpcbugbase.org/index.php/Main_Page