US 7,590,603 B2
Method and system for classifying and identifying messages as question or not a question within a discussion thread
Benyu Zhang, Beijing (China); Zheng Chen, Beijing (China); Hua-Jun Zeng, Beijing (China); and Wei-Ying Ma, Beijing (China)
Assigned to Microsoft Corporation, Redmond, Wash. (US)
Filed on Oct. 01, 2004, as Appl. No. 10/957,329.
Prior Publication US 2006/0112036 A1, May 25, 2006
Int. Cl. G06F 15/18 (2006.01)
U.S. Cl. 706—12  [709/206; 704/1; 704/10] 34 Claims
 
1. A method in a computer system with a processor for classifying a message of a discussion thread as a question or not a question, the method comprising:
providing training data including discussion threads having messages;
generating by the processor feature vectors of messages of the provided training data, the feature vectors being based on indicator words of the messages identified in accordance with the following equation:

[Equation: shown as an image in the original and not reproduced in this text]
 where t is an indicator word, A is the number of question messages in the training data that contain t, B is the number of non-question messages that contain t, C is the number of question messages that do not contain t, D is the number of non-question messages that do not contain t, and N is the total number of messages in the training data;
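The equation itself does not survive in this text, but the quantities A, B, C, D, and N defined here are the cells of a 2x2 contingency table between "message contains t" and "message is a question". A minimal sketch follows, assuming (this is not confirmed by the text) that the equation is the standard chi-square statistic, N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)). The function names and the toy messages are hypothetical.

```python
def contingency_counts(messages, word):
    """Count A, B, C, D for one candidate word.

    messages: list of (set_of_words, is_question) pairs.
    Returns (A, B, C, D) as defined in the claim.
    """
    a = b = c = d = 0
    for words, is_question in messages:
        has_word = word in words
        if is_question and has_word:
            a += 1  # question messages that contain the word
        elif has_word:
            b += 1  # non-question messages that contain the word
        elif is_question:
            c += 1  # question messages that do not contain the word
        else:
            d += 1  # non-question messages that do not contain the word
    return a, b, c, d


def chi_square(a, b, c, d):
    """Assumed scoring formula: standard chi-square over the 2x2 table."""
    n = a + b + c + d  # N, the total number of messages
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the word or a class never occurs
    return n * (a * d - c * b) ** 2 / denom


# Toy training data (hypothetical): two questions, two non-questions.
messages = [
    ({"how", "install", "?"}, True),
    ({"why", "crash", "?"}, True),
    ({"thanks", "worked"}, False),
    ({"fixed", "now"}, False),
]
counts = contingency_counts(messages, "?")
score = chi_square(*counts)
```

Under this assumption, a word that appears in every question and in no non-question gets the highest possible score for the data set, and a word spread evenly across both classes scores zero.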
providing classifications of messages of the provided training data as questions or not questions;
training by the processor a classifier using the generated feature vectors and provided classifications of messages of the provided training data; and
after the classifier is trained,
receiving an unclassified message of a discussion thread; and
classifying by the processor the received message of the discussion thread as a question or not a question using the trained classifier.
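The steps of the claim (score candidate words, build feature vectors, train, then classify a new message) can be sketched end to end as below. This is an illustration under stated assumptions, not the patented implementation: the claim does not name a classifier, so a simple perceptron stands in for it; the chi-square score is an assumed reading of the omitted equation; and the tokenizer, the binary features, the value of `k`, and the sample messages are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, whitespace split, "?" kept as a token.
    return text.lower().replace("?", " ? ").split()


def chi_square(a, b, c, d):
    # Assumed scoring formula (standard chi-square over the 2x2 table).
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return (a + b + c + d) * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_indicator_words(texts, labels, k):
    # Count, per word, how many question / non-question messages contain it.
    q_df, nq_df = Counter(), Counter()
    for text, is_question in zip(texts, labels):
        (q_df if is_question else nq_df).update(set(tokenize(text)))
    n_q = sum(labels)
    n_nq = len(labels) - n_q
    scores = {
        w: chi_square(q_df[w], nq_df[w], n_q - q_df[w], n_nq - nq_df[w])
        for w in set(q_df) | set(nq_df)
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def feature_vector(text, indicator_words):
    # Binary feature: 1 if the indicator word occurs in the message.
    tokens = set(tokenize(text))
    return [1 if w in tokens else 0 for w in indicator_words]


def train_perceptron(vectors, labels, epochs=20):
    # Stand-in classifier; the claim leaves the classifier type open.
    weights, bias = [0.0] * len(vectors[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            predicted = bias + sum(w * xi for w, xi in zip(weights, x)) > 0
            error = int(y) - int(predicted)
            if error:
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
    return weights, bias


def classify(text, indicator_words, model):
    weights, bias = model
    x = feature_vector(text, indicator_words)
    return bias + sum(w * xi for w, xi in zip(weights, x)) > 0


# Hypothetical training data with provided classifications.
texts = [
    "how do i install the driver ?",
    "why does the app crash ?",
    "what is the fix ?",
    "thanks that worked",
    "i installed the driver yesterday",
    "the app is fine now",
]
labels = [True, True, True, False, False, False]

indicator_words = select_indicator_words(texts, labels, k=3)
model = train_perceptron([feature_vector(t, indicator_words) for t in texts], labels)
is_question = classify("where is the manual ?", indicator_words, model)
```

Any supervised classifier that accepts fixed-length vectors could replace the perceptron here; the part that follows the claim's structure is that the feature vectors are built only from the selected indicator words.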