US 9,813,307 B2
Methods and systems of monitoring failures in a distributed network system
Alexander Leonard Walsh, Waverly (CA); and Daniel Joseph Spraggins, Garden Ridge, TX (US)
Assigned to Rackspace US, Inc., San Antonio, TX (US)
Filed by Rackspace US, Inc., San Antonio, TX (US)
Filed on Mar. 15, 2013, as Appl. No. 13/841,446.
Application 13/841,446 is a continuation in part of application No. 13/752,147, filed on Jan. 28, 2013, granted, now 9,135,145.
Application 13/752,147 is a continuation in part of application No. 13/752,255, filed on Jan. 28, 2013, granted, now 9,521,004.
Application 13/752,255 is a continuation in part of application No. 13/752,234, filed on Jan. 28, 2013, granted, now 9,658,941.
Prior Publication US 2014/0215057 A1, Jul. 31, 2014
Int. Cl. G06F 9/44 (2006.01); H04L 12/24 (2006.01); H04L 29/08 (2006.01)
CPC H04L 41/5009 (2013.01) [H04L 67/025 (2013.01)]
20 Claims
1. A method for failure monitoring in a distributed network system providing cloud services, the method comprising:
receiving an Application Programming Interface (API) request issued by a requestor, the API request specifying a task for processing by the distributed network system;
assigning a unique identifier to the API request;
updating the API request to reflect the unique identifier;
recording, in association with the unique identifier, a request time corresponding to a time at which the API request was transmitted by the requestor;
during the processing of the API request by a plurality of communicatively coupled service-providing computing devices:
at each one of the coupled computing devices, tracking events associated with the API request, and reporting tracked events to an aggregation process;
at the aggregation process, tracking the progress of the request through the coupled computing devices by aggregating responses using the unique identifier of the associated API request, and wherein the tracked events each indicate progress or failure in the completion of the API request by the system;
determining, based on the tracked events, one or more state changes in the distributed network system, the state changes corresponding to both the transit of the request through the coupled computing devices and side effects that may occur in response to the API request;
associating the one or more state changes with the unique identifier assigned to the API request;
determining a final disposition of the API request based on the determined state changes associated with the API request and the recorded request time, and recording the final disposition of the API request in a final disposition field associated with the unique identifier, wherein the value of the final disposition field is separate from the failure or success of the system in processing the API request; and
writing the tracked events and state changes to a log.
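
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Event`, `Aggregator`, `accept`, `report`, and `finalize`, the `"progress"`/`"failure"` event kinds, the disposition labels, and the timeout rule are all hypothetical choices made for illustration. The sketch assumes each event maps to one state change, and that a final disposition is derived from those state changes together with the recorded request time.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    """A tracked event reported by one service node (hypothetical shape)."""
    request_id: str
    node: str
    kind: str    # assumed labels: "progress" or "failure"
    detail: str


class Aggregator:
    """Collects events per request, keyed by the request's unique identifier."""

    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s              # assumed deadline, not from the claim
        self.request_time = {}                  # request_id -> time the requestor sent it
        self.events = defaultdict(list)         # request_id -> tracked events
        self.state_changes = defaultdict(list)  # request_id -> derived state changes
        self.final_disposition = {}             # separate field, one value per request
        self.log = []                           # flat log of (request_id, state change)

    def accept(self, api_request, sent_at):
        """Assign a unique identifier, stamp it on the request, record request time."""
        request_id = str(uuid.uuid4())
        api_request["request_id"] = request_id
        self.request_time[request_id] = sent_at
        return request_id

    def report(self, event):
        """Called by each node; aggregates by identifier and writes to the log."""
        self.events[event.request_id].append(event)
        change = f"{event.node}:{event.kind}:{event.detail}"
        self.state_changes[event.request_id].append(change)
        self.log.append((event.request_id, change))

    def finalize(self, request_id, now):
        """Derive a final disposition from state changes and the request time."""
        kinds = [e.kind for e in self.events[request_id]]
        elapsed = now - self.request_time[request_id]
        if "failure" in kinds:
            disposition = "error"
        elif elapsed > self.timeout_s:
            disposition = "timed_out"
        else:
            disposition = "completed"
        self.final_disposition[request_id] = disposition
        return disposition
```

A caller would invoke `accept` when the API request arrives, have each node call `report` as it handles the request, and call `finalize` once processing ends. Keeping `final_disposition` in its own mapping mirrors the claim's requirement that the disposition be recorded in a field associated with the unique identifier.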