US 11,837,227 B2
System for user initiated generic conversation with an artificially intelligent machine
Sneh Vaswani, Mumbai (IN); Prashant Iyengar, Mumbai (IN); and Chintan Raikar, Mumbai (IN)
Assigned to RN CHIDAKASHI TECHNOLOGIES PVT LTD, Mumbai (IN)
Appl. No. 17/264,840
Filed by RN Chidakashi Technologies Pvt. Ltd., Mumbai (IN)
PCT Filed Jan. 22, 2021, PCT No. PCT/IN2021/050067
§ 371(c)(1), (2) Date Jan. 31, 2021,
PCT Pub. No. WO2021/149079, PCT Pub. Date Jul. 29, 2021.
Claims priority of application No. 202021002986 (IN), filed on Jan. 23, 2020.
Prior Publication US 2023/0121824 A1, Apr. 20, 2023
Int. Cl. G10L 15/22 (2006.01); G10L 15/32 (2013.01)
CPC G10L 15/22 (2013.01) [G10L 15/32 (2013.01); G10L 2015/223 (2013.01)] 16 Claims
OG exemplary drawing
 
1. A system for user initiated conversation using an artificially intelligent machine, the system comprising:
an artificially intelligent (AI) machine;
a plurality of conversational nodes that are interconnected at edges to form a conversational network that encapsulates a conversational flow and logic to transport data associated with the conversational flow between the plurality of conversational nodes, wherein each of the plurality of conversational nodes is associated with salient information;
a conversational server that is communicatively connected to the artificially intelligent machine, wherein the conversational server comprises a first memory that stores a first set of instructions and a processor that is configured to execute the first set of instructions to initiate the plurality of conversational nodes to perform steps of
receiving an input query, at an input node, from a user to start the conversational flow using a first input modality and a second input modality, wherein the user provides the input query through (i) the first input modality comprising a microphone, a keypad, and a user interface, and (ii) the second input modality comprising a camera and a sensor, wherein the input query comprises speech, text, audio, visual data, or motion data;
transforming the input query at an input recognizing node (IR) associated with the artificially intelligent machine, into a first format that is suitable for processing the input query, wherein the input recognizing node (IR) converts the speech into text for the input query received through the first input modality, and performs a facial recognition on the input query received through the second input modality, to transform the input query into the first format;
determining, at a query processing node (F), at least one parameter associated with the input query by processing the first format of the input query using an artificially intelligent model, wherein the query processing node (F) transfers the conversational flow to a conversational node from the plurality of conversational nodes and the conversational node transfers the conversational flow to sub-query processing nodes to extract salient information associated with the input query to determine the at least one parameter, wherein the at least one parameter comprises an input category and subcategory, an input primary and secondary entity, an input sentiment, an intent, a predefined information associated with the input query, and an information associated with the user;
executing a decision node (D) that maps the determined at least one parameter associated with the input query to a function node, wherein the decision node (D) selects at least one edge that is connected to an output node (O) based on the input query, the at least one parameter, and the salient information associated with the input query; and
generating, by the function node, output data, comprising a delay, an audio, a speech, a visual, and a motion execution block, to be fed to the output node (O); and
executing, using the output node (O) associated with the artificially intelligent machine, commands or directives in the output data according to a sequence of an output data structure.
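The claimed flow — input node, input recognizing node (IR), query processing node (F), decision node (D), function node, and output node (O) connected at edges — can be illustrated as a small pipeline. This is a minimal, hypothetical sketch only: every class, function, and value below (e.g. `InputRecognizingNode`, `run_conversation`, the stand-in recognition logic) is an illustrative assumption, not an implementation disclosed in the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Query:
    raw: str                      # speech/text/audio/visual/motion payload
    modality: str = "first"       # "first" (mic/keypad/UI) or "second" (camera/sensor)
    params: dict = field(default_factory=dict)

class InputRecognizingNode:
    """IR node: transform the input query into a first format suitable for processing."""
    def process(self, q: Query) -> Query:
        if q.modality == "first":
            q.raw = q.raw.lower()          # stand-in for speech-to-text conversion
        else:
            q.params["face"] = "user-1"    # stand-in for facial recognition
        return q

class QueryProcessingNode:
    """F node: extract salient parameters (category, intent, sentiment, ...)."""
    def process(self, q: Query) -> Query:
        q.params.update({
            "category": "greeting" if "hello" in q.raw else "other",
            "intent": "start_conversation",
            "sentiment": "neutral",
        })
        return q

class DecisionNode:
    """D node: map the determined parameters to a function node by selecting an edge."""
    def __init__(self, edges: Dict[str, Callable]):
        self.edges = edges                 # parameter value -> function node

    def route(self, q: Query) -> Callable:
        return self.edges.get(q.params["category"], self.edges["other"])

def greet_function(q: Query) -> List[Tuple[str, object]]:
    # Function node: build output data blocks (delay/speech/motion) in order.
    return [("delay", 0.1), ("speech", "Hello! How can I help?"), ("motion", "wave")]

def fallback_function(q: Query) -> List[Tuple[str, object]]:
    return [("speech", "Could you rephrase that?")]

class OutputNode:
    """O node: execute commands according to the sequence of the output data structure."""
    def execute(self, blocks: List[Tuple[str, object]]) -> List[str]:
        return [f"{kind}:{payload}" for kind, payload in blocks]

def run_conversation(raw: str, modality: str = "first") -> List[str]:
    q = InputRecognizingNode().process(Query(raw, modality))
    q = QueryProcessingNode().process(q)
    fn = DecisionNode({"greeting": greet_function,
                       "other": fallback_function}).route(q)
    return OutputNode().execute(fn(q))
```

The edge table held by `DecisionNode` plays the role of the claimed mapping from determined parameters to a function node, and the ordered block list returned by each function node models the output data structure whose sequence the output node follows.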