US 11,704,745 B2
Multimodal dialog state tracking and action prediction for assistant systems
Shivani Poddar, Mountain View, CA (US); Seungwhan Moon, Seattle, WA (US); Paul Anthony Crook, Newcastle, WA (US); and Rajen Subba, San Carlos, CA (US)
Assigned to Meta Platforms, Inc., Menlo Park, CA (US)
Filed by Meta Platforms, Inc., Menlo Park, CA (US)
Filed on Aug. 28, 2020, as Appl. No. 17/6,339.
Claims priority of provisional application 62/923,342, filed on Oct. 18, 2019.
Prior Publication US 2021/0117681 A1, Apr. 22, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/30 (2020.01); G06F 3/01 (2006.01); G06F 9/451 (2018.01); G06F 9/48 (2006.01); G06F 9/54 (2006.01); G06F 16/332 (2019.01); G06F 16/9032 (2019.01); G06F 16/9536 (2019.01); G06F 40/205 (2020.01); G06F 40/253 (2020.01); G06Q 50/00 (2012.01); H04N 7/14 (2006.01); G06N 20/00 (2019.01); G06F 40/35 (2020.01); G06F 40/56 (2020.01); G06F 40/242 (2020.01); G06V 20/20 (2022.01); G06V 10/82 (2022.01); G06V 40/16 (2022.01); G06V 20/30 (2022.01); G06V 10/20 (2022.01); G06V 10/764 (2022.01); G06V 20/00 (2022.01); G06V 40/20 (2022.01); H04L 51/222 (2022.01); H04L 51/224 (2022.01); H04L 51/52 (2022.01); H04L 51/212 (2022.01); H04L 67/75 (2022.01); G06N 3/047 (2023.01); G06N 3/045 (2023.01); G06F 18/2321 (2023.01); G06N 3/08 (2023.01); G06Q 10/109 (2023.01); G10L 15/06 (2013.01); G10L 15/08 (2006.01); G10L 15/16 (2006.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01); G10L 15/32 (2013.01); H04L 51/18 (2022.01); H04L 67/306 (2022.01); G06V 20/40 (2022.01); G06F 3/16 (2006.01)
CPC G06Q 50/01 (2013.01) [G06F 3/011 (2013.01); G06F 3/013 (2013.01); G06F 9/453 (2018.02); G06F 9/485 (2013.01); G06F 9/4881 (2013.01); G06F 9/547 (2013.01); G06F 16/3329 (2019.01); G06F 16/90332 (2019.01); G06F 16/9536 (2019.01); G06F 18/2321 (2023.01); G06F 40/205 (2020.01); G06F 40/242 (2020.01); G06F 40/253 (2020.01); G06F 40/30 (2020.01); G06F 40/35 (2020.01); G06F 40/56 (2020.01); G06N 3/045 (2023.01); G06N 3/047 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06Q 10/109 (2013.01); G06V 10/255 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 20/00 (2022.01); G06V 20/20 (2022.01); G06V 20/30 (2022.01); G06V 40/16 (2022.01); G06V 40/25 (2022.01); G10L 15/063 (2013.01); G10L 15/08 (2013.01); G10L 15/16 (2013.01); G10L 15/1815 (2013.01); G10L 15/1822 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 15/32 (2013.01); H04L 51/18 (2013.01); H04L 51/212 (2022.05); H04L 51/222 (2022.05); H04L 51/224 (2022.05); H04L 51/52 (2022.05); H04L 67/306 (2013.01); H04L 67/75 (2022.05); H04N 7/147 (2013.01); G06F 3/017 (2013.01); G06F 3/167 (2013.01); G06V 20/41 (2022.01); G06V 40/174 (2022.01); G06V 2201/10 (2022.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01); G10L 2015/227 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
receiving, from a client system associated with a user, a user request comprising a reference to a target object;
accessing visual data from the client system, wherein the visual data comprises images portraying the target object and one or more additional objects, and wherein attribute information of the target object is recorded in a multimodal dialog state;
resolving the reference to the target object based on the attribute information recorded in the multimodal dialog state;
determining relational information between the target object and one or more of the additional objects portrayed in the visual data; and
sending, to the client system, instructions for presenting a response to the user request, wherein the response comprises the attribute information and the determined relational information.
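The claim above recites abstract method steps, not an implementation. The following sketch is purely illustrative of how those steps could fit together; every class, function, and data structure below (e.g. `MultimodalDialogState`, `resolve_reference`, the string-match resolution and x-coordinate relation heuristics) is a hypothetical stand-in, not drawn from the patent's specification.

```python
# Hypothetical sketch of the claimed method steps; names and heuristics are
# illustrative assumptions, not the patented implementation.
from dataclasses import dataclass, field


@dataclass
class MultimodalDialogState:
    # Attribute information recorded per object, keyed by object id
    # (claim step: "attribute information ... recorded in a multimodal
    # dialog state").
    attributes: dict = field(default_factory=dict)


def resolve_reference(reference: str, state: MultimodalDialogState) -> str:
    # Claim step: resolve the reference to the target object based on
    # attribute information in the dialog state. Here: naive word match
    # against a recorded description.
    for obj_id, attrs in state.attributes.items():
        if all(w in attrs.get("description", "") for w in reference.split()):
            return obj_id
    raise KeyError(f"unresolved reference: {reference!r}")


def relational_info(target: str, others: list, positions: dict) -> list:
    # Claim step: determine relational information between the target
    # object and additional objects portrayed in the visual data.
    # Toy spatial relation: compare x-coordinates of detected objects.
    rels = []
    tx, _ = positions[target]
    for other in others:
        ox, _ = positions[other]
        rels.append((target, "left of" if tx < ox else "right of", other))
    return rels


def handle_request(reference: str, state: MultimodalDialogState,
                   positions: dict) -> dict:
    # Assemble the response: attribute information plus the determined
    # relational information (claim's final "sending ... a response" step).
    target = resolve_reference(reference, state)
    others = [o for o in positions if o != target]
    return {
        "attributes": state.attributes[target],
        "relations": relational_info(target, others, positions),
    }
```

For example, with a dialog state recording `{"mug1": {"description": "red mug"}}` and detected object positions, a request referencing "red mug" would resolve to `mug1` and pair its attributes with spatial relations to the other detected objects.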