| US 7,590,603 B2 | ||
| Method and system for classifying and identifying messages as question or not a question within a discussion thread | ||
| Benyu Zhang, Beijing (China); Zheng Chen, Beijing (China); Hua-Jun Zeng, Beijing (China); and Wei-Ying Ma, Beijing (China) | ||
| Assigned to Microsoft Corporation, Redmond, Wash. (US) | ||
| Filed on Oct. 01, 2004, as Appl. No. 10/957,329. | ||
| Prior Publication US 2006/0112036 A1, May 25, 2006 | ||
| Int. Cl. G06F 15/18 (2006.01) | ||
| U.S. Cl. 706—12 [709/206; 704/1; 704/10] | 34 Claims |

| 1. A method in a computer system with a processor for classifying a message of a discussion thread as a question or not a
question, the method comprising:
providing training data including discussion threads having messages;
generating by the processor feature vectors of messages of the provided training data, the feature vectors being based on
indicator words of the messages identified in accordance with the following equation:
$$\chi^2(t) = \frac{N \times (AD - CB)^2}{(A + C) \times (B + D) \times (A + B) \times (C + D)}$$
where t is an indicator word, A is the number of question messages in the training data that contain t, B is the number of
non-question messages that contain t, C is the number of question messages that do not contain t, D is the number of non-question
messages that do not contain t, and N is the total number of messages in the training data;
providing classifications of messages of the provided training data as questions or not questions;
training by the processor a classifier using the generated feature vectors and provided classifications of messages of the
provided training data; and
after the classifier is trained,
receiving an unclassified message of a discussion thread; and
classifying by the processor the received message of the discussion thread as a question or not a question using the trained
classifier.
|
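
The steps of claim 1 can be illustrated with a minimal sketch. It assumes the claimed equation is the chi-square statistic over the counts A, B, C, D, and N, which is what those definitions suggest. Everything else is a hypothetical choice made for illustration, since the claim does not specify it: the function names (`chi_square`, `indicator_words`, `feature_vector`, `train_perceptron`, `classify`), whitespace tokenization, binary feature vectors, the `top_k` cutoff, the toy training messages, and the perceptron used as a stand-in classifier.

```python
from collections import Counter

def chi_square(A, B, C, D):
    """Chi-square score of a word from its 2x2 contingency counts (N = A+B+C+D)."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

def indicator_words(messages, labels, top_k):
    """Score every word and keep the top_k highest-scoring as indicator words."""
    n_q = sum(labels)               # question messages in the training data
    n_nq = len(labels) - n_q        # non-question messages
    in_q, in_nq = Counter(), Counter()
    for text, is_q in zip(messages, labels):
        for w in set(text.lower().split()):   # assumed tokenization
            (in_q if is_q else in_nq)[w] += 1
    scores = {}
    for w in set(in_q) | set(in_nq):
        A, B = in_q[w], in_nq[w]
        scores[w] = chi_square(A, B, n_q - A, n_nq - B)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

def feature_vector(text, words):
    """Binary vector: 1 if the indicator word occurs in the message, else 0."""
    tokens = set(text.lower().split())
    return [1 if w in tokens else 0 for w in words]

def train_perceptron(vectors, labels, epochs=20):
    """Stand-in classifier (an assumption; the claim names no classifier type)."""
    weights, bias = [0.0] * len(vectors[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            err = y - (1 if score > 0 else 0)
            if err:
                weights = [w + err * v for w, v in zip(weights, x)]
                bias += err
    return weights, bias

def classify(text, words, model):
    """Return True if the trained model labels the message a question."""
    weights, bias = model
    x = feature_vector(text, words)
    return sum(w * v for w, v in zip(weights, x)) + bias > 0

# Toy training data (invented for illustration); label 1 = question.
messages = [
    "how do i install this",
    "how can i fix this error",
    "why does this fail",
    "thanks it works now",
    "it works after the update",
    "the fix is attached",
]
labels = [1, 1, 1, 0, 0, 0]

words = indicator_words(messages, labels, top_k=3)
model = train_perceptron([feature_vector(m, words) for m in messages], labels)
print(words)
print(classify("how do i update this", words, model))
print(classify("thanks the update works", words, model))
```

In this toy data the word "this" appears in every question and in no non-question, so it receives the highest score. The word "fix" appears once in each class and scores zero, which is why it is not selected as an indicator word.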