US 11,808,473 B2
Action optimization device, method and program
Nobuhiko Matsuura, Musashino (JP); Midori Kodama, Musashino (JP); Takahiro Hata, Musashino (JP); Motonori Nakamura, Musashino (JP); and Ippei Shake, Musashino (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 17/263,255
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
PCT Filed Jul. 16, 2019, PCT No. PCT/JP2019/027911 § 371(c)(1), (2) Date Jan. 26, 2021, PCT Pub. No. WO2020/022123, PCT Pub. Date Jan. 30, 2020.
Claims priority of application No. 2018-141754 (JP), filed on Jul. 27, 2018.
Prior Publication US 2021/0140670 A1, May 13, 2021
Int. Cl. F24F 11/63 (2018.01); A47L 9/28 (2006.01); G05B 13/02 (2006.01); G05B 13/04 (2006.01); G06N 3/044 (2023.01); A47L 11/40 (2006.01); G06N 3/006 (2023.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)

CPC F24F 11/63 (2018.01) [A47L 9/281 (2013.01); A47L 9/2826 (2013.01); G05B 13/0265 (2013.01); G05B 13/048 (2013.01)]

7 Claims

1. An action optimization device for optimizing an action for controlling air conditioning in a target space, comprising a processor and a memory connected to the processor, the action optimization device comprising:

an environmental data acquisition unit configured to acquire environmental data related to a state of the environment in the target space including a people flow, a temperature, and a humidity and to store the plurality of acquired environmental data items in an environmental data storage unit, wherein each of the people flow data, the temperature data, and the humidity data included in the plurality of environmental data items includes a plurality of data items acquired at different times;

an environmental data interpolation unit configured to perform time/space interpolation on the plurality of acquired environmental data items according to a preset algorithm;

an environment reproduction model training unit configured to train an environment reproduction model, based on the time/space-interpolated environmental data, such that, when a state of an environment and an action for controlling the environment are input, a correct answer value of an environmental state after the action is output;

an environment expansion unit configured to perform data augmentation on the environmental data based on a random number within a range that does not destroy a relationship between the state of the environment, the action, and the environmental state;

wherein the environment reproduction model training unit is configured to train the environment reproduction model by using the environmental data subjected to the data augmentation;

an exploration model training unit configured to train an exploration model such that an action to be taken next is output when an environmental state output from the environment reproduction model is input, and store the trained exploration model in the memory;

an action exploration unit configured to explore for a first action to be taken for a first environmental state by using the trained exploration model;

an environment reproduction unit configured to predict a second environmental state corresponding to the first environmental state and the first action by using the trained environment reproduction model; and

an output unit configured to output a result of the exploration.
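
The time/space interpolation recited for the environmental data interpolation unit can be illustrated with a minimal sketch. The claim only says "a preset algorithm", so the two algorithms used here (linear interpolation in time, inverse-distance weighting in space) are assumptions chosen for illustration, and the function names interpolate_time and interpolate_space are hypothetical.

```python
from bisect import bisect_left


def interpolate_time(samples, t):
    """Linearly interpolate (time, value) samples, sorted by time, at time t.

    Assumption: linear interpolation stands in for the unspecified
    "preset algorithm". Values outside the sampled range are clamped.
    """
    times = [s[0] for s in samples]
    if t <= times[0]:
        return samples[0][1]
    if t >= times[-1]:
        return samples[-1][1]
    i = bisect_left(times, t)          # first sample with time >= t
    t0, v0 = samples[i - 1]
    t1, v1 = samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def interpolate_space(readings, point, power=2.0):
    """Inverse-distance-weighted estimate at point from ((x, y), value) readings.

    Assumption: sensors are located on a 2-D floor plan.
    """
    num = den = 0.0
    for (x, y), value in readings:
        d2 = (x - point[0]) ** 2 + (y - point[1]) ** 2
        if d2 == 0.0:
            return value               # the point coincides with a sensor
        w = 1.0 / d2 ** (power / 2.0)
        num += w * value
        den += w
    return num / den
```

For instance, two temperature readings of 20.0 at time 0 and 22.0 at time 10 give an interpolated value at time 5 that lies halfway between them, and a point equidistant from two sensors receives the mean of their readings.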
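
The data augmentation recited for the environment expansion unit can be sketched as follows. This is one possible reading, not the patented method: it assumes each training record is a (state, action, next_state) triple with states given as [people_flow, temperature, humidity], and it assumes that adding the same bounded random offset to the state and the next state is enough to keep the relationship between state, action, and resulting state intact. The function name augment and the bounds in max_shift are hypothetical.

```python
import random


def augment(transitions, max_shift=(5.0, 0.5, 2.0), copies=3, seed=0):
    """Return the original transitions plus randomly shifted copies.

    Each transition is (state, action, next_state); both states are
    [people_flow, temperature, humidity]. The same bounded random offset is
    added to the state and to the next state, so the change attributed to
    the action is preserved. The bounds in max_shift are illustrative.
    """
    rng = random.Random(seed)
    augmented = list(transitions)
    for state, action, next_state in transitions:
        for _ in range(copies):
            shift = [rng.uniform(-m, m) for m in max_shift]
            augmented.append((
                [s + d for s, d in zip(state, shift)],
                action,                                   # action is unchanged
                [n + d for n, d in zip(next_state, shift)],
            ))
    return augmented
```

Keeping the offsets small relative to the measured values is the design choice that matters here: a large shift could move a record into a regime where the same action would have a different effect.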
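
The supervised training recited for the environment reproduction model training unit (state and action in, resulting state out) can be illustrated with a deliberately small stand-in. The classification codes above point toward neural networks, but the claim does not fix a model class, so this sketch assumes a one-dimensional state (temperature only) and a linear model fitted by least squares. The names fit_environment_model and predict are hypothetical.

```python
def fit_environment_model(transitions):
    """Least-squares fit of next_temp = a * temp + b * action.

    transitions is a list of (temp, action, next_temp) records. The linear
    form is an assumption made for illustration only.
    """
    s_tt = s_ta = s_aa = s_ty = s_ay = 0.0
    for temp, action, next_temp in transitions:
        s_tt += temp * temp
        s_ta += temp * action
        s_aa += action * action
        s_ty += temp * next_temp
        s_ay += action * next_temp
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_tt * s_aa - s_ta * s_ta
    if det == 0.0:
        raise ValueError("transitions do not determine a unique fit")
    a = (s_ty * s_aa - s_ay * s_ta) / det
    b = (s_ay * s_tt - s_ty * s_ta) / det
    return a, b


def predict(model, temp, action):
    """Predict the temperature after applying action at temp."""
    a, b = model
    return a * temp + b * action
```

In practice the augmented records would be appended to the measured ones before fitting, which is what the wherein clause of the claim describes.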
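
The interplay of the exploration model training unit, the action exploration unit, and the environment reproduction unit can be sketched with tabular Q-learning. The claim does not name a learning algorithm, so Q-learning is an assumption, as are the discretized temperature range, the three actions, the comfort target, and the reward. The function environment_model is a hand-written stand-in for a trained environment reproduction model, and all names are hypothetical.

```python
import random

ACTIONS = (-1, 0, 1)      # hypothetical: cool, hold, or heat by one degree
TEMPS = range(20, 31)     # hypothetical discretized temperatures (deg C)
TARGET = 24               # hypothetical comfort target


def environment_model(temp, action):
    """Stand-in for a trained environment reproduction model."""
    return min(max(temp + action, TEMPS[0]), TEMPS[-1])


def reward(temp):
    """Assumed reward: closer to the target is better."""
    return -abs(temp - TARGET)


def train_exploration_model(updates=20000, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q-values from states produced by the environment model."""
    rng = random.Random(seed)
    q = {(t, a): 0.0 for t in TEMPS for a in ACTIONS}
    for _ in range(updates):
        temp = rng.choice(TEMPS)           # random state-action sampling
        action = rng.choice(ACTIONS)
        nxt = environment_model(temp, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        target = reward(nxt) + gamma * best_next
        q[(temp, action)] += alpha * (target - q[(temp, action)])
    return q


def explore_action(q, temp):
    """Pick the action with the highest learned value for this state."""
    return max(ACTIONS, key=lambda a: q[(temp, a)])


def rollout(q, temp, steps):
    """Alternate action exploration and environment reproduction."""
    trajectory = []
    for _ in range(steps):
        action = explore_action(q, temp)
        nxt = environment_model(temp, action)   # predicted second state
        trajectory.append((temp, action, nxt))
        temp = nxt
    return trajectory
```

Under these assumptions, rollout(train_exploration_model(), 27, 4) yields a sequence of (state, action, predicted state) triples that an output unit could report as the result of the exploration.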