| US 7,516,194 B1 | ||
| Method for downloading high-volumes of content from the internet without adversely effecting the source of the content or being detected | ||
| Nick Lamkins, Portland, Oreg. (US); Rick O'Brien, Eugene, Oreg. (US); and James Dirksen, Portland, Oreg. (US) | ||
| Assigned to Microsoft Corporation, Redmond, Wash. (US) | ||
| Filed on May 21, 2003, as Appl. No. 10/443,110. | ||
| Claims priority of provisional application 60/382779, filed on May 21, 2002. | ||
| Int. Cl. G06F 15/16 (2006.01) | ||
| U.S. Cl. 709—218 [709/217; 709/219] | 22 Claims |

| 1. A system for downloading a plurality of documents from a plurality of content servers, said content servers being linked
to a plurality of routers that each have a different network address, said system comprising:
a plurality of pullers;
a director for:
creating a list of URLs of the plurality of documents to be downloaded from the plurality of content servers, each of the
plurality of said documents being identified by a different URL; and
assigning a portion of the list of URLs to each of the pullers such that each portion assigned to a particular puller includes
all documents to be retrieved from a single content server wherein no two pullers initiate requests to adjacent URLs, wherein
adjacent URLs identify documents located on the same content server;
wherein each of the plurality of pullers is responsive to the director for:
receiving the assigned portion of the list of URLs;
queuing requests to retrieve documents identified by the received portion of the list of URLs wherein the requests having
different URLs are queued by the puller;
determining if the URL of a first queued request is adjacent to the URL of a document being currently downloaded;
if the URL of the first queued request is adjacent to the URL of a document being currently downloaded, waiting until the
currently downloading document has been received before initiating the first queued request to avoid overlapping requests
to the content server;
if the URL of the queued request is not adjacent to the URL of a document being currently downloaded, initiating the first
queued request; and
a proxy gateway responsive to each of the pullers for receiving the initiated requests to retrieve documents, and for retrieving
documents corresponding to the list of URLs from the content servers via the routers.
|
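
The director and puller steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that "same content server" can be approximated by comparing URL hostnames, and that the director balances load with a simple largest-host-first heuristic, which the claim does not specify. The names `host_of`, `adjacent`, `assign_portions`, and `next_request` are hypothetical, and the proxy gateway and actual network retrieval are left out.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


def host_of(url):
    """Return the host part of a URL (assumed stand-in for 'content server')."""
    return urlsplit(url).netloc.lower()


def adjacent(url_a, url_b):
    """Treat two URLs as adjacent when they name the same host (assumption)."""
    return host_of(url_a) == host_of(url_b)


def assign_portions(urls, num_pullers):
    """Director step: split the URL list so each host goes to exactly one puller.

    Keeping a host's URLs together means no two pullers ever request
    adjacent URLs. The balancing rule (largest host first, to the puller
    with the fewest URLs so far) is an illustrative choice only.
    """
    by_host = defaultdict(list)
    for url in urls:
        by_host[host_of(url)].append(url)

    portions = [[] for _ in range(num_pullers)]
    for host in sorted(by_host, key=lambda h: (-len(by_host[h]), h)):
        lightest = min(range(num_pullers), key=lambda i: len(portions[i]))
        portions[lightest].extend(by_host[host])
    return portions


def next_request(queue, in_flight):
    """Puller step: decide whether the first queued request may start now.

    Returns the URL to initiate, or None when the queue is empty or the
    first queued URL is adjacent to a document still being downloaded
    (the puller then waits, avoiding overlapping requests to that server).
    """
    if not queue:
        return None
    first = queue[0]
    if any(adjacent(first, active) for active in in_flight):
        return None
    return queue.popleft()


if __name__ == "__main__":
    urls = [
        "http://a.example/1",
        "http://a.example/2",
        "http://b.example/1",
        "http://c.example/1",
    ]
    for number, portion in enumerate(assign_portions(urls, 2)):
        print("puller", number, portion)
```

In this sketch a puller would call `next_request` whenever a download finishes or a new portion arrives, adding the returned URL to its `in_flight` set and removing it once the document has been received.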