SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
Definition statement
This subclass/group covers:
  • processing of speech or voice signals in general (G10L 25/00);
  • production of synthetic speech signals (G10L 13/00);
  • recognition of speech (G10L 15/00);
  • lyrics recognition from a singing voice (G10L 15/00);
  • speaker identification, authentication or verification (G10L 17/00);
  • singer recognition from a singing voice (G10L 17/00);
  • analysis of speech signals for bandwidth compression or extension, bit-rate or redundancy reduction (G10L 19/00);
  • coding/decoding of audio signals for compression and expansion using analysis-synthesis, source filter models or psycho-acoustic analysis (G10L 19/00);
  • modification of speech signals, speech enhancement, source separation (G10L 21/00);
  • noise filtering or echo cancellation in an audio signal (G10L 21/00);
  • speech or voice analysis techniques specially adapted to analyse or modify audio signals not necessarily including speech or voice are also covered in subgroups (G10L 21/00, G10L 25/00).
References relevant to classification in this subclass
This subclass/group does not cover:
Devices for the storage of speech signals
Spatial sound recording
Spatial sound reproduction
Encoding of compressed speech signals for transmission or storage
Coding or synthesis of audio signals in musical instruments
Karaoke or singing voice processing
Sound production
Sound input or sound output arrangements for computers
Amplifiers
Gain or frequency control
Broadcasting
Secret communication
Handling natural language data
General pattern recognition
Speech or voice prosthesis
Mere application of speech or voice analysis techniques

Examples of places where the subject matter of this group is covered when specially adapted, used for a particular purpose, or incorporated in a larger system:

Information retrieval of audio data
Broadcasting arrangements of audio
Name dialling controlled by voice recognition
Automatic arrangements for answering calls

Places in relation to which this subclass is residual:

Acoustics not otherwise provided for
Informative references
Attention is drawn to the following places, which may be of interest for search:
Measurement of sound waves in general
Sound input/output for computers
Image data processing
Teaching or communicating with the blind, deaf or mute
Electronic musical instruments
Information storage, e.g. sound storage
Electronic circuits for sound generation
Electronic filters
Coding, decoding or code conversion, error protection in general
Telephonic communication
Switching systems
Microphone arrangements, hearing aids, public address systems
Spatial sound reproduction
Glossary of terms
In this subclass/group, the following terms (or expressions) are used with the meaning indicated:

Speech
definite vocal sounds that form words to express thoughts and ideas
Voice
sounds generated by the vocal cords or synthetic versions thereof
Audio
of or relating to humanly audible sound
Speech synthesis; Text to speech systems
Definition statement
This subclass/group covers:
  • synthesis of speech from text, concatenation of smaller speech units, grapheme-to-phoneme conversion (see the sketch after this list);
  • modification of the voice for speech synthesis: gender, age, pitch, prosody, stress;
  • hardware or software implementation details of a speech synthesis system.
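
A minimal sketch of dictionary-based grapheme-to-phoneme conversion followed by concatenation of per-phoneme units is given below. The lexicon and the per-phoneme "units" are made-up placeholders standing in for recorded speech segments; it is an illustration only, not a description of any classified system.

    # Illustrative only: lexicon entries and unit "waveforms" are placeholders.
    LEXICON = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
    UNIT_DB = {ph: "<unit:%s>" % ph for ph in ["HH", "AH", "L", "OW", "W", "ER", "D"]}

    def text_to_phonemes(text):
        """Grapheme-to-phoneme conversion by simple lexicon lookup (no letter-to-sound rules)."""
        phonemes = []
        for word in text.lower().split():
            phonemes.extend(LEXICON.get(word, []))
        return phonemes

    def synthesise(text):
        """Concatenative synthesis: join the stored unit for each phoneme in sequence."""
        return "".join(UNIT_DB[ph] for ph in text_to_phonemes(text))

    print(text_to_phonemes("hello world"))   # ['HH', 'AH', 'L', 'OW', 'W', 'ER', 'L', 'D']
    print(synthesise("hello world"))         # concatenated unit placeholders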
References relevant to classification in this group
This subclass/group does not cover:

Examples of places where the subject matter of this group is covered when specially adapted, used for a particular purpose, or incorporated in a larger system:

Navigation systems for vehicles, guidance using speech synthesis
Speech synthesis in games
Electric switches with speech feedback
Speech synthesis in mobile phones
Electronic musical instruments
Sound-producing devices other than musical instruments
Informative references
Attention is drawn to the following places, which may be of interest for search:
Natural language translation
Excitation coding of a speech signal
Synonyms and Keywords

In patent documents the following abbreviations are often used:

HMM
Hidden Markov Model
TTS
Text To Speech
Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L 13/08)
Definition statement
This subclass/group covers:

Concepts used for speech synthesis can be linked to an emotion to be conveyed (US2010329505), a communication goal driving a dialogue (US2010241420), image-to-speech conversion (US2010231752), or native-sounding speech (US2004030554).

References relevant to classification in this group
This subclass/group does not cover:
Language translation
Speech recognition (G10L 17/00 takes precedence)
Definition statement
This subclass/group covers:
  • recognition of text or phonemes from a spoken audio signal;
  • spoken dialogue interfaces, human-machine spoken interfaces;
  • topic detection in a dialogue, semantic analysis, keyword detection, spoken command and control;
  • context-dependent speech recognition (location, environment, age, gender, etc.);
  • parameter extraction, acoustic models, word models, grammars, language models for speech recognition;
  • recognition of speech in a noisy environment;
  • recognition of speech using visual cues;
  • feedback of the recognition results, disambiguation of speech recognition results;
  • dedicated hardware or software implementations, parallel and distributed processing of speech recognition engines.
References relevant to classification in this group
This subclass/group does not cover:

Examples of places where the subject matter of this group is covered when specially adapted, used for a particular purpose, or incorporated in a larger system:

Voice control for systems within a vehicle
Speech input for vehicle navigation systems
Sound input arrangements for computers
Teaching how to speak
Name dialling controlled by voice recognition
Speech interaction details in automatic or semi-automatic exchange systems for interactive information services
Spoken command and control of surgical instruments
Speech input in video games
Informative references
Attention is drawn to the following places, which may be of interest for search:
Complex mathematical functions
Handling natural language data
Information retrieval of audio data
Educational appliances
Face recognition, lip reading without acoustical input
Pattern recognition
Signal processing for recording
Natural language processing
Synonyms and Keywords

In patent documents the following abbreviations are often used:

ANN
Artificial neural network
ASR
Automatic speech recognition
CSR
Continuous speech recognition
GMM
Gaussian mixture model
HMM
Hidden Markov model
IVR
Interactive voice response
MLP
Multi layer perceptron
VLSR
Very large speech recognition
Speaker identification or verification
Definition statement
This subclass/group covers:
  • recognition, identification of a speaker;
  • verification, authentication of a speaker;
  • feature extraction, dialogue, prompts, passwords for identification;
  • identification in noisy conditions;
  • multimodal identification including voice;
  • impostor detection.
Informative references
Attention is drawn to the following places, which may be of interest for search:
Complex mathematical functions
Information retrieval of audio data
Secret or secure communication including means for verifying the identity or authority of a user
Security arrangements, restricting access by authenticating users, using biometric data
Pattern recognition
Individual entry or exit registers, access control with identity check using personal physical data
Glossary of terms
In this subclass/group, the following terms (or expressions) are used with the meaning indicated:

Speaker verification, or authentication
refers to verifying that the identity claimed by the user is genuine; otherwise the user is an impostor. Speaker recognition, or identification, aims at determining who the user is among a closed (finite) set of users; otherwise the user is unknown. A minimal score-based sketch of the distinction follows this glossary.
A goat, sheep
often refers to a person whose voice is easy to counterfeit.
A wolf, predator
often refers to a person who can easily counterfeit someone else's voice or is often identified as someone else.
An impostor
is someone actively trying to counterfeit someone else's identity.
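
As noted in the glossary entry above, verification compares the score of a claimed identity against a threshold, whereas identification picks the best match within a closed set. Below is a minimal sketch of this distinction, using made-up per-speaker scores; a real system would derive such scores from voice features and enrolled speaker models.

    # Made-up scores standing in for e.g. log-likelihoods against enrolled models.
    scores = {"alice": -12.0, "bob": -25.5, "carol": -31.2}   # higher = better match

    def verify(claimed_id, scores, threshold=-15.0):
        """Verification: accept or reject a claimed identity against a threshold."""
        return scores.get(claimed_id, float("-inf")) >= threshold   # False -> impostor

    def identify(scores):
        """Identification: pick the best-matching speaker from a closed set."""
        return max(scores, key=scores.get)

    print(verify("alice", scores))   # True  -> claimed identity accepted
    print(verify("bob", scores))     # False -> rejected, treated as an impostor
    print(identify(scores))          # 'alice' -> best match among the enrolled set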
Synonyms and Keywords

In patent documents the following abbreviations are often used:

ANN
Artificial neural network
ASR
Automatic speech recognition
GMM
Gaussian mixture model
HMM
Hidden Markov model
IVR
Interactive voice response
MLP
Multi layer perceptron
Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis (in musical instruments G10H)
Definition statement
This subclass/group covers:

Techniques for the reduction of data from audio sources, i.e. compression of audio. These techniques are applied to reduce the quantity of information to be stored or transmitted, but are independent of the end application, medium or transmission channel, i.e. they only exploit the properties of the source signal itself or of the final receiver exposed to this signal (the listener).

Two main types of source can be distinguished:

"speech only" encompass signals produced by human speakers, and historically was to be understood as mono-channel, single speaker "telephone quality" speech having a narrow bandwidth limited to max. 4kHz. Encoding of speech only sources primarily aim at reducing the bit-rate while still providing fair intelligibility of the spoken content, but not always fidelity to the original.

"Audio signal" is broader and comprises speech as well as background information, e.g. music source having multiple channels. Encoding of audio deals primarily with transparent, i.e. "high fidelity" reproduction of the original signal.

The compression techniques can also be distinguished as being:

Lossy or lossless, i.e. whether a perfect reconstruction of the source is possible or only a perceptually acceptable approximation can be achieved.

The techniques classified in this subclass are based either on modelling the production of the signal (voice) or on modelling its perception (general audio).
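
For the production-modelling branch, a minimal sketch of linear-prediction (LPC) analysis, the basis of source-filter speech coders, is shown below. It assumes plain numpy, a toy synthetic frame and an illustrative predictor order; it is not any particular standardised codec.

    import numpy as np

    def lpc_coefficients(frame, order=10):
        """Estimate all-pole (vocal tract) coefficients via autocorrelation + Levinson-Durbin."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = r[0]
        for i in range(1, order + 1):
            acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
            k = -acc / err                       # reflection coefficient
            a[1:i + 1] += k * a[i - 1::-1][:i]   # update predictor coefficients
            err *= (1.0 - k * k)                 # remaining prediction error energy
        return a, err

    # Toy usage: one 20 ms "voiced" frame at 8 kHz (sine plus a little noise)
    fs = 8000
    t = np.arange(int(0.02 * fs)) / fs
    frame = (np.sin(2 * np.pi * 150 * t) + 0.05 * np.random.randn(len(t))) * np.hamming(len(t))
    a, err = lpc_coefficients(frame)
    residual = np.convolve(frame, a)[:len(frame)]        # inverse filtering -> excitation
    print("prediction gain (dB):",
          10 * np.log10(np.sum(frame ** 2) / np.sum(residual ** 2)))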

References relevant to classification in this group
This subclass/group does not cover:
Coding of signals within electronic musical instruments
Informative references
Attention is drawn to the following places, which may be of interest for search:
Complex mathematical functions
Quality monitoring in automatic or semi-automatic exchanges
Quality control of voice transmission between switching centres
Signal processing not specific to the method of recording or reproducing
Editing; Indexing; Addressing; Timing or synchronizing; Monitoring
Compression
Detecting, preventing errors in received information
Transmission of audio and video in television systems
Simultaneous speech and data transmission
Stereophonic arrangements
Stereophonic systems
Wireless communication networks
Glossary of terms
In this subclass/group, the following terms (or expressions) are used with the meaning indicated:

audio signal
is meant to include speech, music, silence or background signal, or any combinations thereof, unless explicitly specified
Synonyms and Keywords

In patent documents the following abbreviations are often used:

CELP
Code Excited Linear Prediction
CTX
Continuous transmission
DTX
Discontinuous transmission
HVXC
Harmonic Vector eXcitation Coding
LPC
linear prediction coding
MBE
Multiband Excitation
MELP
Mixed Excitation Linear Prediction
MOS
mean opinion score
MPEG
Moving Picture Experts Group
MPEG1 audio
Standard ISO/IEC 11172-3
MPEG2 audio
Standard ISO/IEC 13818-3
MPEG4 audio
Standard ISO/IEC 14496-3
MP3
MPEG 1 Layer III
PCM
pulse code modulation
PWI
Prototype Waveform Interpolation
SBR
Spectral Band Replication

In patent documents, the words "perceptual" and "psychoacoustic" are often used as synonyms.

Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
Definition statement
This subclass/group covers:

Coding of a signal with rate adaptation, e.g. adapted to voiced speech, unvoiced speech, transitions and noise/silence portions.

Coding of a signal with a core encoder providing a minimum level of quality, and extension layers improving the quality at the cost of a higher bit-rate. It includes parameter-based bandwidth extension (e.g. SBR) or channel extension.

This group contrasts with G10L 21/038, in which the bandwidth extension is artificial, i.e. based only on the narrowband encoded signal.
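
A minimal sketch of the core-plus-enhancement-layer idea described above, using simple uniform quantisers as stand-ins for real core and extension codecs (the step sizes are illustrative assumptions):

    import numpy as np

    def quantise(x, step):
        return np.round(x / step) * step

    def layered_encode(signal, core_step=0.1, enh_step=0.01):
        core = quantise(signal, core_step)                # low bit-rate base layer
        enhancement = quantise(signal - core, enh_step)   # refinement of the residual
        return core, enhancement

    # A decoder with only the core layer still reconstructs a coarse signal;
    # a decoder with both layers reconstructs at higher quality.
    signal = np.sin(np.linspace(0, 2 * np.pi, 100))
    core, enh = layered_encode(signal)
    print("core-only max error:      ", np.max(np.abs(signal - core)))
    print("core + enhancement error: ", np.max(np.abs(signal - (core + enh))))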

Informative references
Attention is drawn to the following places, which may be of interest for search:
Artificial bandwidth extension, i.e. based only on the narrowband encoded signal
Spatial sound recording
Spatial sound reproduction
Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L 19/00 takes precedence)
Definition statement
This subclass/group covers:

The subgroup deals with speech or voice modification applications, but also receives applications for speech or voice analysis techniques specially adapted to analyse or modify audio signals that do not necessarily include speech or voice but are not music signals (G10H).

  • bandwidth extension of an audio signal
  • improvement of the intelligibility of a coded speech signal
  • removal of noise from an audio signal
  • removal of echo from an audio signal
  • separation of audio sources
  • pitch, speed modification of an audio signal
  • voice morphing
  • visualisation of audio signals (e.g. sonagrams)
  • lips or face movement synchronisation with speech (e.g. phoneme-viseme alignment);
  • face animation synchronisation with the emotion contained in the voice or speech signal.
References relevant to classification in this group
This subclass/group does not cover:
Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, e.g. for compression or expansion, source-filter models or psychoacoustic analysis

Places in relation to which this group is residual:

Electronic musical instruments
Loudspeakers, microphones, gramophone pick-ups or like acoustic electromechanical transducers; deaf-aid sets; public address systems
Stereophonic systems
Informative references
Attention is drawn to the following places, which may be of interest for search:
Complex mathematical functions
Signal processing not specific to the method of recording or reproducing
Gain control in amplifiers
Animation based on audio data, talking heads
Signal processing not specific to the method of recording or reproducing, for reducing noise
Editing; Indexing; Addressing; Timing or synchronizing; Monitoring
Direction finders
Reducing noise or bandwidth in transmission systems not characterised by the medium used for transmission
Reducing echo effect or singing in line transmission systems
Hearing aids
Public address systems
Glossary of terms
In this subclass/group, the following terms (or expressions) are used with the meaning indicated:

Viseme
a visual representation of the mouth, lips, tongue and teeth corresponding to a phoneme
Synonyms and Keywords

In patent documents the following abbreviations are often used:

BSS
blind source separation
LDA
linear discriminant analysis
NB
narrowband
PCA
principal component analysis
SBR
Spectral Band Replication
WB
wideband
for synchronising with other signals, e.g. video signals
Definition statement
This subclass/group covers:

Visemes are selected to match the corresponding speech segment, or the speech segments are adapted or chosen to match the viseme. This symbol also encompasses the coarticulation effects as used in facial character animation or talking heads.
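
A minimal sketch of viseme selection for lip synchronisation, using a hypothetical reduced viseme set and made-up phoneme timings (real systems use standard phoneme inventories and timings obtained from recognition or synthesis):

    PHONEME_TO_VISEME = {          # illustrative mapping, not a standard inventory
        "p": "bilabial", "b": "bilabial", "m": "bilabial",
        "f": "labiodental", "v": "labiodental",
        "a": "open", "o": "rounded", "u": "rounded",
        "s": "spread", "t": "spread",
    }

    def phonemes_to_viseme_track(timed_phonemes):
        """timed_phonemes: list of (phoneme, start_s, end_s); returns mouth-pose intervals."""
        track = []
        for phoneme, start, end in timed_phonemes:
            viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
            if track and track[-1][0] == viseme and track[-1][2] == start:
                track[-1] = (viseme, track[-1][1], end)   # merge adjacent identical poses
            else:
                track.append((viseme, start, end))
        return track

    # "mama" with made-up timings -> alternating bilabial/open mouth poses
    print(phonemes_to_viseme_track([("m", 0.0, 0.1), ("a", 0.1, 0.25),
                                    ("m", 0.25, 0.35), ("a", 0.35, 0.5)]))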

Informative references
Attention is drawn to the following places, which may be of interest for search:
Facial character animation per se
using band spreading techniques
Definition statement
This subclass/group covers:

Bandwidth extension taking place at the receiving side, e.g. generation of artificial low or high frequency components, or regeneration of spectral holes, based only on the narrowband encoded signal. This contrasts with G10L 19/24, in which parameters are computed during the encoding step to enable bandwidth extension at the decoding step.
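
A minimal sketch of such receiver-side ("blind") bandwidth extension, using spectral folding of the narrowband spectrum into the missing high band; the folding itself, the gain and the frame length are illustrative assumptions, not any particular standardised method.

    import numpy as np

    def blind_bwe(narrowband, highband_gain=0.3):
        """Double the sample rate and fill the new high band by folding the low band upwards."""
        n = len(narrowband)                              # assumes an even frame length
        spec_nb = np.fft.rfft(narrowband)                # bins covering 0 .. fs/2
        spec_wb = np.zeros(n + 1, dtype=complex)         # bins covering 0 .. fs (new Nyquist)
        spec_wb[:len(spec_nb)] = spec_nb                 # keep the original narrow band
        spec_wb[len(spec_nb):] = highband_gain * spec_nb[-2::-1]   # attenuated mirror image
        return np.fft.irfft(spec_wb, n=2 * n)            # wideband signal at twice the rate

    nb = np.random.randn(160)                            # one 20 ms frame at 8 kHz
    wb = blind_bwe(nb)
    print(len(nb), "->", len(wb), "samples (sample rate doubled)")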

Informative references
Attention is drawn to the following places, which may be of interest for search:
Parameter-based bandwidth extension (e.g. SBR)
Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00
Definition statement
This subclass/group covers:
  • processing of speech or voice signals in general, in particular detection of a speech signal, endpoint detection in noise, pitch extraction, measurement of voicing, emotional state, voice pathology or other speech or voice related parameters (a minimal endpoint-detection sketch follows this list);
  • speech or voice analysis techniques specially adapted to analyse audio signals not necessarily including speech or voice, such as audio scene segmentation, jingle detection, separation from music or noise, detection of particular sounds.
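
As mentioned in the first item above, a minimal sketch of energy-based endpoint (speech activity) detection is given below; the frame size, energy threshold and test signal are illustrative assumptions, not any particular standardised detector.

    import numpy as np

    def detect_endpoints(signal, fs=8000, frame_ms=20, threshold_db=-30.0):
        """Return (start_s, end_s) pairs of segments whose frame energy exceeds the threshold."""
        frame_len = int(fs * frame_ms / 1000)
        n_frames = len(signal) // frame_len
        frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
        energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
        active = energy_db > threshold_db
        segments, start = [], None
        for i, is_active in enumerate(active):
            if is_active and start is None:
                start = i
            elif not is_active and start is not None:
                segments.append((start * frame_ms / 1000, i * frame_ms / 1000))
                start = None
        if start is not None:
            segments.append((start * frame_ms / 1000, n_frames * frame_ms / 1000))
        return segments

    # Toy usage: 0.5 s of silence, a 0.5 s tone burst, then 0.5 s of silence
    fs = 8000
    sig = np.concatenate([np.zeros(fs // 2),
                          0.5 * np.sin(2 * np.pi * 440 * np.arange(fs // 2) / fs),
                          np.zeros(fs // 2)])
    print(detect_endpoints(sig, fs))   # roughly [(0.5, 1.0)]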
References relevant to classification in this group
This subclass/group does not cover:
Karaoke or singing voice processing, parameter extraction for musical signal categorisation, electronic musical instruments
Gain or frequency control
DTX communication
Multiplex systems
Informative references
Attention is drawn to the following places, which may be of interest for search:
Switching of direction of transmission by voice in loud-speaking telephone systems
Comfort noise
Glossary of terms
In this subclass/group, the following terms (or expressions) are used with the meaning indicated:
audio signal
is of or relating to humanly audible sound, e.g. any combination of background noise or silence, voice or speech, and music
Last Modified: 10/11/2013