Parallel Debugging Techniques
Le Yan
Louisiana Optical Network Initiative
8/3/2009 Scaling to Petascale Virtual Summer School

Outline
• Overview of parallel debugging
  – Challenges
  – Tools
  – Strategies
• Get familiar with TotalView/DDT through hands-on exercises

Bugs in Parallel Programming
• Parallel programs are prone to the usual bugs found in sequential programs
  – Improper pointer usage
  – Stepping over array bounds
  – Infinite loops
  – …
• Plus…

Common Types of Bugs in Parallel Programming
• Erroneous use of language features
  – Mismatched parameters, missing mandatory calls, etc.
• Defective space decomposition
• Incorrect/improper synchronization
• Hidden serialization
• …
http://www.hpcbugbase.org/index.php/Main_Page

Debugging Essentials
• Reproducibility
  – Find the scenario where the error is reproducible
• Reduction
  – Reduce the problem to its essence
• Deduction
  – Form hypotheses on what the problem might be
• Experimentation
  – Filter out invalid hypotheses
Terrence Parr, Learn The Essentials of Debugging
http://www.ibm.com/developerworks/web/library/wa-debug.html?ca=dgr-lnxw03Dbug

Challenges in Parallel Debugging
• Reproducibility
  – Many problems cannot be easily reproduced
• Reduction
  – Smallest scale might still be too large and complex to handle
• Deduction
  – Need to consider concurrent and interdependent
    program instances
• Experimentation
  – Cyclic debugging might be very expensive

A Nasty Little Bug
    …
    integer*4 :: i,ista,iend
    integer*4 :: chunksize=1024*1024
    …
    call MPI_Comm_Rank(MPI_COMM_WORLD, &
         myrank,error)
    …
    ista=myrank*chunksize+1
    iend=(myrank+1)*chunksize
    do i = ista,iend
       …
    enddo
    …
• What is the potential problem?
• Integer overflow if myrank ≥ 2047: iend=(myrank+1)*chunksize then reaches 2^31, beyond the integer*4 maximum of 2^31 − 1
• A bug that shows up only when running with 2048 or more cores

printf/write Debugging
• Extremely easy to use, therefore dangerously attractive, but…
  – Need to edit, recompile and rerun when additional information is desired
  – May change program behavior
  – Only capable of displaying a subset of the program's state
  – Output size grows rapidly with increasing core count and becomes harder to comprehend
• Not scalable, not recommended

Compilers Can Help
• Most compilers can (at runtime)
  – Check array bounds
  – Trap floating-point operation errors
  – Provide traceback information
• Relatively scalable, but…
  – Overhead added
  – Limited capability
  – Non-interactive

Parallel Debuggers
• Capable of what serial debuggers can do
  – Control program execution
  – Set action points
  – View/change values of variables
• More importantly
  – Control program execution
    at various levels
    • Group/process/thread
  – View MPI message queues

An Ideal Parallel Debugger
• Should allow easy process/thread control and navigation
• Should support multiple high performance computing platforms
• Should not limit the number of processes being debugged and should allow it to vary at runtime

How Parallel Debuggers Work
• Frontend
  – GUI
  – Debugger engine
• Debugger agents
  – Control application processes
  – Send data back to the debugger engine to analyze
[Figure: user processes and agents on the compute nodes; debugger engine and GUI on the interactive node]

At Very Large Scale
• The debugger itself becomes a large parallel application
• Bottlenecks
  – Debugger framework startup cost
  – Communication between frontend and agents
  – Access to shared resources, e.g.
    file system

Validation Is Crucial
• Have a solid validation procedure to check the correctness
• Test smaller components before putting them together

General Parallel Debugging Strategy
• Incremental debugging
  – Downscale if possible
    • Participating processes, problem size and/or number of iterations
    • Example: run with one single thread to detect scope errors in OpenMP programs
  – Add more instances to reveal other issues
    • Example: run MPI programs on more than one node to detect problems introduced by network delays

Strategy at Large Scale
• Again, downscale if possible
• Reduce the number of processes to which the debugger is attached
  – Reduces overhead
  – Reduces the required number of license seats as well
• Focus on one or a small number of processes/threads
  – Analyze call path and message queues to find problematic processes
  – Control the execution of as few processes/threads as possible while keeping others running
    • Provides the context where the error occurs

Trends in Debugging Technology
• Lightweight trace analysis tools
  – Help to identify processes/threads that have similar behavior and reduce the search space
  – Complementary to full-feature debuggers
  – Example: Stack Trace Analysis Tool (STAT)
• Replay/Reverse execution
  – ReplayEngine now available from TotalView
  – Checkpointing supported in DDT 2.4
• Post-mortem statistical analysis
  – Detect anomalies by analyzing profile
    dissimilarity of multiple runs

Hands-on Exercise
• Debug MPI and OpenMP programs that solve a simple problem to get familiar with
  – Basic functionalities of parallel debuggers
    • TotalView: Pople, Bluefire and Athena
    • DDT: Ranger
  – Some common types of bugs in parallel programming
• Programs and instructions can be found at http://www.cct.lsu.edu/~lyan1/summerschool09

Problem
[Figure: a 1-D array with cell values 0 1 2 3 4 5 6 7 8 … 4 5]
• A 1-D periodic array with N elements
• Initial value
  – C: cell(x)=x%10
  – Fortran: cell(x)=mod(x-1,10)
• In each iteration, every element is updated from the values of its two adjacent elements:
  – cell(x)_{i+1} = [cell(x-1)_i + cell(x+1)_i] % 10, where the subscript is the iteration number
• Execute Niter iterations
• The final outputs are the global maximum and average
http://www.hpcbugbase.org/index.php/Main_Page

Sequential Program
• Use an integer array to hold current values
• Use another integer array to hold the calculated values
• Swap the pointers at the end of each iteration
• The result is used to check the correctness of the parallel programs
  – Chances are that we will not have such a luxury for large jobs

MPI Program
[Figure: the global array 0 1 2 3 4 5 6 7 8 … 4 5 split into local arrays on Process 1, Process 2, …, Process n]
• Divide the array among n processes
• Each process works on its local array
• Exchange boundary data with neighbor processes at the end of each iteration
• Ring topology

OpenMP Program
[Figure: the global array 0 1 2 3 4 5 6 7 8 … 4 5 partitioned among Thread 0, Thread 1, …, Thread n]
• Each thread works on its own part of the global array
• All threads have access to
  the entire array, so no data exchange is necessary

Three Ways to Start TotalView/DDT
• Start with core dumps
• Start by attaching to one or more running processes
• Start the executable within TotalView/DDT

TotalView - Root Window
[Figure: root window with callouts for host name, TotalView ID, status and MPI rank]
  Status code   Description
  (blank)       Exited
  B             At breakpoint
  E             Error
  H             Held
  K             In kernel
  M             Mixed
  R             Running
  T             Stopped
  W             At watchpoint

TotalView - Process Window
• Stack trace pane
  – Call stack of routines
• Stack frame pane
  – Local variables, registers and function parameters
• Source pane
  – Source code
• Action points, processes, threads pane
  – Manage action points, processes and threads

DDT - Main Window
[Figure: main window with callouts for the project window, process group window, source code window, variable window, evaluation window, and parallel stack view and output window]

Controlling Execution
• The process window (TotalView) or main window (DDT) always focuses on one process/thread
• Switch between processes/threads
  – TotalView: p+/p-, t+/t-, double click in root window, process/thread tab
  – DDT: click on process rank in process window
• Need to set the appropriate scope when
  – Giving control commands
  – Setting action points

Control Commands
  TotalView   DDT             Description
  Go          Play/Continue   Start/resume execution
  Halt        Pause           Stop execution
  Kill                        Terminate the job
  Restart                     Restart a running program
  Next        Step over       Run to the next source line without stepping into another function
  Step        Step into       Run to next source line
  Out         Step out        Run to the completion of current function
  Run to      Run to line     Run to the indicated location

Process/Thread Groups - TotalView
• Scope of commands and action points
  – Group(control)
    • All processes and threads
  – Group(workers)
    • All threads that are executing user code
  – Rank X
    • Current process and its threads
  – Process(workers)
    • User threads in the current process
  – Thread X.Y
    • Current thread
  – User defined group
    • Group -> Custom Groups, or
    • Create in call graph

Process/Thread Groups - DDT
• Create custom groups
  – Ctrl+click on all desired processes
  – Right click on the process window then "create group"

Action Points
• Breakpoints stop the execution of the processes and threads that reach them
  – Unconditional
  – Conditional: stop only if the condition is satisfied
  – Evaluation: stop and execute a code fragment when reached
    • Useful when testing small patches
• Process barrier points synchronize a set of processes or threads
  – TotalView only
• Watchpoints monitor a location in memory and stop execution when its value changes

Setting Action Points - TotalView
• Breakpoints
  – Right click on a source line -> Set breakpoint
  – Click on the line number
• Watchpoints
  – Right click on a variable -> Create watchpoint
• Barrier points
  – Right click on a source line -> Set barrier
• Edit action point property
  – Right click on an action point in the Action Points tab -> Properties

Setting Action Points - DDT
• Breakpoints
  – Double click on a source code
    line
  – Right click in the Breakpoints tab -> Add breakpoint
• Watchpoints
  – Right click on a variable -> Add to Watches
  – Right click in the Watches tab -> Add Watch

Viewing/Editing Data
• View values and types of variables
  – At one process/thread
  – Across all processes/threads
• Edit variable value and type
• Array data
  – Slicing
  – Filtering
  – Visualization
  – Statistics

Viewing/Editing Data - TotalView
• Viewing data in
  – Stack frame
  – Expression list
  – Variable window (dive on a variable by double clicking on its name)
• Editing data by clicking on the value in
  – Stack frame
  – Variable window

Viewing/Editing Data - DDT
• Viewing data in
  – Variable window (in the main window)
  – Evaluation window
• Editing data
  – Right click on the variable name in the evaluation window
  – Then choose "Edit value" or "Edit type"

Viewing Dynamic Arrays in C/C++
• TotalView
  – Edit "type" in the variable window
  – Tell TotalView how to interpret the memory from a starting location
  – Example
    • To view an array of 100 integers
      – int * -> int[100]*
• DDT
  – Drag a pointer variable into the evaluation window
  – Right click on the variable -> "View as vector"

MPI Message Queues
• Detect
  – Deadlocks
  – Load balancing issues
• TotalView
  – Tools -> Message Queue Graph
  – More options available
• DDT
  – View -> Message Queues
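
A message queue view helps with deadlocks because every blocked process is waiting on some other process, and a deadlock shows up as a cycle of such waits. The sketch below only illustrates that idea; it is not how TotalView or DDT build their views. It assumes each blocked rank waits on exactly one peer (for example, a ring exchange in which every rank posts a blocking receive first). The array `waits_on`, the constant `NOT_BLOCKED` and the function `find_deadlock_cycle` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NOT_BLOCKED (-1)

/* Hypothetical helper, not part of any debugger API.
 * waits_on[r] is the rank that rank r is blocked on, or NOT_BLOCKED if
 * rank r is still making progress.
 * Returns a rank that lies on a cycle of waits (a deadlock), or
 * NOT_BLOCKED if no such cycle exists. */
int find_deadlock_cycle(const int *waits_on, int nranks)
{
    assert(waits_on != NULL && nranks > 0);

    for (int start = 0; start < nranks; ++start) {
        int cur = start;

        /* Follow at most nranks wait edges. A chain that has not ended
         * at a running rank by then must have entered a cycle. */
        for (int steps = 0; steps < nranks && cur != NOT_BLOCKED; ++steps) {
            assert(cur >= 0 && cur < nranks);
            cur = waits_on[cur];
        }

        if (cur != NOT_BLOCKED)
            return cur; /* cur was reached after nranks steps, so it is on the cycle */
    }
    return NOT_BLOCKED;
}
```

For a four-rank ring in which rank 0 waits on 1, 1 on 2, 2 on 3 and 3 on 0, the function reports a rank on the cycle; if rank 3 is running instead, the chain ends there and the function returns `NOT_BLOCKED`. The quadratic walk keeps the sketch short; a real tool would work from the actual pending send and receive queues.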

TotalView - Call Graph
• Tools -> Call graph
• Quick view of program state
  – Nodes: functions
  – Edges: calls
• Look for outliers

DDT - Parallel Stack View
• Allows users to see the position of each process/thread in the source code in the same window
• Hover over any function to see a list of processes that are currently at that location

References
• TotalView user manual
  – http://www.totalviewtech.com/support/documentation/totalview/index.html
• DDT user manual
  – http://www.allinea.com/downloads/userguide.pdf
• LLNL TotalView tutorial
  – https://computing.llnl.gov/tutorials/totalview
• NCSA Cyberinfrastructure Tutor
  – "Debugging Serial and Parallel Codes" course
• HPCBugBase
  – http://hpcbugbase.org/index.php/Main_Page