US 11,720,559 B2
Bridging textual and tabular data for cross domain text-to-query language semantic parsing with a pre-trained transformer language encoder and anchor text
Xi Lin, Palo Alto, CA (US); and Caiming Xiong, Menlo Park, CA (US)
Assigned to Salesforce.com, Inc., San Francisco, CA (US)
Filed by salesforce.com, inc., San Francisco, CA (US)
Filed on Oct. 6, 2020, as Appl. No. 17/64,466.
Claims priority of provisional application 63/033,770, filed on Jun. 2, 2020.
Prior Publication US 2021/0374133 A1, Dec. 2, 2021
Int. Cl. G06F 16/30 (2019.01); G06F 16/2452 (2019.01); G06F 16/21 (2019.01); G06F 16/22 (2019.01); G06N 3/088 (2023.01); G06F 16/242 (2019.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)
CPC G06F 16/24522 (2019.01) [G06F 16/212 (2019.01); G06F 16/2282 (2019.01); G06F 16/243 (2019.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/088 (2013.01)]
20 Claims
 
1. A method comprising:
receiving a natural language question and a database schema;
generating a serialized question-schema representation from the natural language question and the database schema, wherein the serialized question-schema representation includes at least one word from the natural language question, at least one table name of a table in the database schema and at least one field name of a field associated with the table;
generating, using a fuzzy string match, a set of multiple values from a picklist associated with the field, wherein the picklist links at least one word in the natural language question to the field, and wherein the set of multiple values includes words that match the at least one word in the natural language question and at least one value in the picklist;
separating values in the set of multiple values with a value token;
appending the set of multiple values from the picklist to the serialized question-schema representation;
generating, using an encoder and at least one bi-directional long-short term memory (LSTM), question encodings and schema encodings from the serialized question-schema representation; and
generating, using a decoder, an executable query from the question encodings and the schema encodings.
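
The serialization, fuzzy picklist matching, and value-token steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the token strings ([CLS], [SEP], [T], [C], [V]), the function names, the 0.85 similarity threshold, the use of Python's difflib as the fuzzy matcher, and the choice to place matched values directly after their field are all assumptions made for illustration. The encoder, bi-directional LSTM, and decoder steps are not shown.

```python
from difflib import SequenceMatcher

# Hypothetical special tokens; the claim names only a "value token".
CLS, SEP, TABLE, FIELD, VALUE = "[CLS]", "[SEP]", "[T]", "[C]", "[V]"


def fuzzy_match_values(question, picklist, threshold=0.85):
    """Return picklist values that approximately match a word span in the question."""
    words = question.lower().replace("?", " ").split()
    matches = []
    for value in picklist:
        target = value.lower()
        n = len(target.split())
        # Compare the value against every n-word window of the question.
        spans = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        if any(SequenceMatcher(None, s, target).ratio() >= threshold for s in spans):
            matches.append(value)
    return matches


def serialize(question, schema, picklists):
    """Build a serialized question-schema string with matched picklist values.

    schema:    {table_name: [field_name, ...]}
    picklists: {(table_name, field_name): [value, ...]}
    """
    parts = [CLS, question, SEP]
    for table, fields in schema.items():
        parts += [TABLE, table]
        for field in fields:
            parts += [FIELD, field]
            values = picklists.get((table, field), [])
            # Each matched value is preceded by the value token.
            for value in fuzzy_match_values(question, values):
                parts += [VALUE, value]
    parts.append(SEP)
    return " ".join(parts)


if __name__ == "__main__":
    schema = {"Properties": ["property_id", "property_type_code"]}
    picklists = {
        ("Properties", "property_type_code"): ["Apartment", "Field", "House", "Shop", "Other"]
    }
    print(serialize("Show properties of type apartment or house", schema, picklists))
```

In this sketch the words "apartment" and "house" in the question link it to the property_type_code field through that field's picklist, so only those two values are carried into the serialized sequence that an encoder would consume.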