US 7,523,352 B2
System and method for examining remote systems and gathering debug data in real time
Jonathan D. Bradbury, Poughkeepsie, N.Y. (US); Scott M. Carlson, Tucson, Ariz. (US); Trevor E. Carlson, Poughkeepsie, N.Y. (US); Donald P. Crabtree, Port Ewen, N.Y. (US); David A. Elko, Austin, Tex. (US); Michel Henri Théodore Hack, Cortlandt Manor, N.Y. (US); William M. Sakal, Tivoli, N.Y. (US); Denise M. Sevigny, Wappingers Falls, N.Y. (US); Ronald M. Smith, Sr., Wappingers Falls, N.Y. (US); and Li Zhang, Yorktown Heights, N.Y. (US)
Assigned to International Business Machines Corporation, Armonk, N.Y. (US)
Filed on Sep. 09, 2005, as Appl. No. 11/223,887.
Prior Publication US 2007/0061628 A1, Mar. 15, 2007
Int. Cl. G06F 11/00 (2006.01)
U.S. Cl. 714—39
20 Claims
 
1. An apparatus for dynamic debugging of a multi-node network, said network comprising an infrastructure including a plurality of devices, each device adapted for communicating timing messages between nodes according to a timing protocol governing communication of timing information among nodes in the network for synchronizing system clocks at said nodes, said apparatus comprising:
a plurality of probe links interconnecting each node of said multi-node network with a probe device, said probe device monitoring data included in each timing message received at each node as communicated according to said timing protocol, and extracting timing state information from said timing message;
means for processing said extracted timing state information from each message at said probe device to determine existence of a trigger condition at a node, said processing of said extracted timing state information including:
calculating clock offset values depicting clock offset between the probe device and the clock synchronization data at the respective node being probed; and,
determining whether a clock state transition has occurred from a synchronized state to an unsynchronized state, coupled with a determination whether a calculated clock offset value is within a specified tolerance;
said process means determining existence of said trigger condition based on a calculated clock offset value, whether said offset value is within said tolerance, and a corresponding state transition determination at said node; and, in response to detecting a trigger condition,
generating a control message for receipt by all nodes in said network via said probe links for halting operation at said node and recording data useful for debugging purposes, whereby debug information is collected at each node at the time of a first error detection and collected dynamically at execution time without manual intervention.
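
The processing recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`Probe`, `TimingMessage`, `ControlMessage`, `observe`, the `"halt_and_record"` action string) is hypothetical, and offsets are assumed to be in seconds. The claim does not spell out how the offset, the tolerance check, and the state transition combine into a trigger. The sketch assumes one possible reading: the trigger fires when a node moves from synchronized to unsynchronized while the offset calculated by the probe is still within tolerance.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class ClockState(Enum):
    SYNCHRONIZED = "synchronized"
    UNSYNCHRONIZED = "unsynchronized"


@dataclass
class TimingMessage:
    """Hypothetical view of the timing state extracted from one message."""
    node_id: str
    node_clock: float   # clock value carried in the message, in seconds
    state: ClockState   # synchronization state reported by the node


@dataclass
class ControlMessage:
    """Hypothetical control message sent over a probe link."""
    target_node: str
    action: str
    reason: str


class Probe:
    """Tracks per-node clock state and decides when to trigger."""

    def __init__(self, node_ids: Iterable[str], tolerance: float) -> None:
        self.node_ids: List[str] = list(node_ids)
        self.tolerance = tolerance
        self.last_state: Dict[str, ClockState] = {}

    def observe(self, msg: TimingMessage, probe_clock: float) -> List[ControlMessage]:
        # Clock offset between the probed node and the probe device.
        offset = msg.node_clock - probe_clock
        within_tolerance = abs(offset) <= self.tolerance

        # Detect a synchronized -> unsynchronized transition at this node.
        previous = self.last_state.get(msg.node_id)
        self.last_state[msg.node_id] = msg.state
        lost_sync = (
            previous is ClockState.SYNCHRONIZED
            and msg.state is ClockState.UNSYNCHRONIZED
        )

        # ASSUMPTION: trigger rule chosen for illustration only.
        if not (lost_sync and within_tolerance):
            return []

        reason = f"{msg.node_id} lost sync at offset {offset:+.6f} s"
        # On a trigger, address a control message to every node.
        return [
            ControlMessage(node, "halt_and_record", reason)
            for node in self.node_ids
        ]
```

With nodes A, B and C and a tolerance of 0.001 s, a synchronized message from A followed by an unsynchronized one at a small offset yields one control message per node. The same transition at an offset outside the tolerance yields none under the assumed rule.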