US 11,741,965 B1
Configurable natural language output
Ramsey Abou-Zaki Opp, Santa Barbara, CA (US); Anantdeep Gill, San Jose, CA (US); Angela Liu, Santa Barbara, CA (US); Anisha Jain, Santa Barbara, CA (US); Justin Maxwell Bollag, Santa Barbara, CA (US); Nathan Yeazel, Santa Barbara, CA (US); Sara Renee Bilich, Santa Barbara, CA (US); and Spencer B Baker, Santa Barbara, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 26, 2020, as Appl. No. 16/913,139.
Int. Cl. G10L 15/26 (2006.01); G10L 15/18 (2013.01); G10L 13/08 (2013.01); G10L 15/00 (2013.01); G10L 13/00 (2006.01); G10L 15/08 (2006.01)
CPC G10L 15/26 (2013.01) [G10L 13/00 (2013.01); G10L 13/086 (2013.01); G10L 15/005 (2013.01); G10L 15/18 (2013.01); G10L 2015/088 (2013.01)]
20 Claims
 
1. A computer-implemented method comprising:
receiving, from a first device, first audio data representing a first utterance;
determining first output data responsive to the first utterance, the first output data being a first natural language output including a first plurality of words;
determining the first device corresponds to a first location;
identifying, using a model configured to determine a language generation profile, a first language generation profile associated with the first location, wherein the model was trained using location data and a plurality of language generation profiles;
using a natural language generation (NLG) component, processing the first output data to determine second output data representing a second natural language output, wherein processing the first output data comprises:
determining the first language generation profile represents a first word to be inserted in the first natural language output,
determining the first language generation profile represents a position indicating that the first word is to be inserted after the first plurality of words, and
determining the second output data to include the first plurality of words followed by the first word;
processing, using text-to-speech (TTS) processing, the second output data to determine first output audio data representing first synthesized speech; and
sending the first output audio data to the first device.
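
The NLG step recited in claim 1 (identify a language generation profile for the device's location, then append the profile's word after the first plurality of words) can be illustrated with a minimal sketch. This is not the patented implementation: the names `LanguageGenerationProfile`, `PROFILES_BY_LOCATION`, `identify_profile`, `apply_profile`, and `generate_output` are hypothetical, the location keys and the inserted word are invented for illustration, and a plain dictionary lookup stands in for the trained model that the claim describes. Speech recognition and TTS are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LanguageGenerationProfile:
    """Hypothetical profile: one word plus where to insert it."""
    word: str
    position: str  # "after" = append after the existing words


# Assumption: a static table stands in for the trained model that maps
# location data to a language generation profile.
PROFILES_BY_LOCATION: Dict[str, LanguageGenerationProfile] = {
    "region-a": LanguageGenerationProfile(word="friend", position="after"),
}


def identify_profile(location: str) -> Optional[LanguageGenerationProfile]:
    """Return the profile associated with a location, if one exists."""
    return PROFILES_BY_LOCATION.get(location)


def apply_profile(
    words: List[str], profile: Optional[LanguageGenerationProfile]
) -> List[str]:
    """Produce the second output: the first words followed by the profile word."""
    if profile is None:
        return list(words)
    if profile.position == "after":
        return [*words, profile.word]
    raise ValueError(f"unsupported position: {profile.position!r}")


def generate_output(first_output: str, location: str) -> str:
    """Turn the first natural language output into the second one."""
    words = first_output.split()
    return " ".join(apply_profile(words, identify_profile(location)))
```

For example, `generate_output("it is sunny today", "region-a")` yields the original words followed by the inserted word, while a location with no associated profile leaves the output unchanged.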