CPC Definition - Subclass G10L
This place covers:
- Processing of speech or voice signals in general (G10L 25/00).
- Production of synthetic speech signals, text to speech systems (G10L 13/00).
- Recognition of speech (G10L 15/00).
- Lyrics recognition from a singing voice (G10L 15/00).
- Speaker identification, authentication or verification (G10L 17/00).
- Singer recognition from a singing voice (i.e. speaker recognition on a singing voice) (G10L 17/00).
- Analysis of speech signals for bandwidth compression or extension, bit-rate or redundancy reduction (G10L 19/00).
- Coding/decoding of audio signals for compression and expansion using analysis-synthesis, source filter models or psycho-acoustic analysis (G10L 19/00).
- Modification of speech signals, speech enhancement, source separation (G10L 21/00).
- Processing of the speech or voice signal to produce another audible or non-audible signal, e.g., visual or tactile, in order to modify its quality or its intelligibility (G10L 21/00).
- Noise reduction or echo cancellation in an audio signal (G10L 21/00).
- Speech or voice analysis techniques specially adapted to analyse or modify audio signals, where the audio signals do not necessarily include speech or voice, are also covered in subgroups (G10L 21/00, G10L 25/00).
Classification should generally be directed to appropriate subclasses, e.g. G06F or H03M, for mathematical models for audio analysis in general.
Classification should generally be directed to appropriate subclasses, e.g. G10K, G10H, H04R or H04S, when audio production or general audio analysis or processing is of relevance.
Telegraphic communication is covered in subclass H04L.
Telephonic communication is covered in subclass H04M.
This place does not cover:
Examples of places where the subject matter of this place is covered when specially adapted, used for a particular purpose, or incorporated in a larger system:
Information retrieval of audio data | |
Broadcasting arrangements of audio | |
Name dialling controlled by voice recognition | |
Automatic arrangements for answering calls |
Examples of places in relation to which this place is residual:
Acoustics not otherwise provided for |
Attention is drawn to the following places, which may be of interest for search:
Larynx or trachea prostheses implantable into the body | |
Input/output arrangements for on-board computers | |
Measurement of sound waves in general | |
Direction-finders for determining the direction from which infrasonic, sonic or ultrasonic waves, not having a directional significance, are being received | |
Systems using the reflection or reradiation of acoustic waves | |
Sound input/output for computers | |
Compilation or interpretation of high level programme languages | |
Information retrieval; Database structures therefor | |
Complex mathematical functions | |
General pattern recognition | |
Digital data processing methods or equipment specially adapted for handling, processing or translating natural language data | |
Image data processing | |
Arrangements for image or video recognition or understanding | |
Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition | |
Recognition of biometric, human-related or animal-related patterns in image or video data | |
Individual entry or exit registers | |
Arrangements for influencing the relationship between signals at input and output, e.g. differentiating, delaying | |
Teaching speaking | |
Teaching or communicating with the blind, deaf or mute | |
Electronic musical instruments | |
Sound producing devices other than musical instruments or loudspeakers | |
Methods or devices for protecting against, or for damping, noise or other acoustic waves | |
Signal processing for recording | |
Error detection or correction in digital recording or reproducing; Testing involved in digital recording or reproducing | |
Editing; Indexing; Addressing; Timing or synchronising; Monitoring | |
Electronic circuits for sound generation | |
Amplifiers | |
Amplifiers using amplifying element consisting of two mechanically- or acoustically-coupled transducers, e.g. telephone-microphone amplifier | |
Gain control in amplifiers or frequency changers | |
Electronic filters | |
Coding, decoding or code conversion, error protection in general | |
Transmission | |
Means associated with receiver for limiting or suppressing noise or interference | |
Details of transmission systems, not characterised by the medium used for transmission, for reducing bandwidth of signals | |
Transmission systems employing ultrasonic, sonic or infrasonic waves | |
Transmission systems not characterised by the medium used for transmission characterised by the use of pulse modulation | |
Broadcast distribution systems | |
Time-division multiplex systems in which the transmission channel allotted to a first user may be taken away and re-allotted to a second user if the first user becomes inactive | |
Secret communication | |
Transmission of digital information, e.g. telegraphic communication | |
Telephonic communication | |
Arrangements of transmitters, receivers or complete sets to prevent eavesdropping, to attenuate local noise or to prevent undesired transmission; Special mouthpieces or receivers therefor | |
Arrangements for preventing acoustic feedback in telephonic communication | |
Devices for calling a subscriber whereby a plurality of signals may be stored simultaneously | |
Substation equipment, e.g. for use by subscribers including speech amplifiers | |
Automatic arrangements for answering calls | |
Interactive information services, e.g. directory enquiries | |
Simultaneous speech and telegraphic or other data transmission over the same conductors | |
Systems for transmission of a pulse code modulated video signal with one or more other pulse code modulated signals, e.g. an audio signal, a synchronising signal | |
Switching systems | |
Loudspeakers, microphones, gramophone pick-up or like acoustic electromechanical transducers; Deaf-aid sets; Public address systems | |
Stereophonic arrangements | |
Public address systems | |
Stereophonic systems, e.g. spatial sound capture, matrixing of audio signals in the decoded state |
In this place, the following terms or expressions are used with the meaning indicated:
speech | definite vocal sounds that form words to express thoughts and ideas. |
voice | sounds generated by vocal cords or synthetic versions thereof. |
audio signal | of or relating to humanly audible sound, meant to include speech, voice, music, silence or background noise, or any combinations thereof. |
In patent documents, the following abbreviations are often used:
AAC | Advanced Audio Coding |
ACELP | Algebraic Code Excited Linear Prediction |
ADPCM | Adaptive Differential Pulse Code Modulation |
AMR, AMR-NB | Adaptive Multi-Rate |
AMR-WB | Adaptive Multi-Rate Wideband |
ANN | Artificial Neural Network |
AR | Autoregressive |
ASR | Automatic Speech Recognition |
BLP | Backward Linear Prediction |
BP | Back Propagation |
BSAC | Bit Sliced Arithmetic Coding (audio coding from MPEG-4 Part 3) |
CELP | Code Excited Linear Prediction |
DCT | Discrete Cosine Transform |
DFT | Discrete Fourier Transform |
DPCM | Differential Pulse Code Modulation |
DRM | Digital Rights Management |
DTX | Discontinuous Transmission |
EVRC, EVRC-B | Enhanced Variable Rate CODEC |
FFT | Fast Fourier Transform |
FIR | Finite Duration Impulse Response |
FLP | Forward Linear Prediction |
GMM | Gaussian mixture model |
HMM | Hidden Markov model |
HVXC | Harmonic Vector eXcitation Coding |
IDCT | Inverse Discrete Cosine Transform |
IVR | Interactive Voice Response |
LMS | Least Mean Square |
LPC | Linear Predictive Coding |
LSF | Line Spectral Frequencies |
LSP | Line Spectral Pairs |
LTP | Long Term Prediction |
MBE | Multi-Band Excitation |
MDCT | Modified Discrete Cosine Transform |
MELP | Mixed Excitation Linear Prediction |
MLP | Multi-Layer Perceptron |
MP3 | MPEG1 or MPEG2 audio layer III |
MPEG | Moving Picture Experts Group |
MPEG 1 audio | Standard ISO/IEC 11172-3 |
MPEG 2 audio | Standard ISO/IEC 13818-3 |
MPEG 4 audio | Standard ISO/IEC 14496-3 |
MPEG 21 | Standard ISO/IEC 21000 |
MSE | Mean Square Error |
NB – WB | Narrowband – Wideband |
PARCOR | Partial Correlation |
PWI | Prototype Waveform Interpolation |
RELP | Residual Excited Linear Prediction |
SBR | Spectral Band Replication |
TDNN | Time Delay Neural Network |
TTS | Text-to-Speech |
USAC | Unified Speech and Audio Coding |
VoIP | Voice over Internet Protocol |
VLSR | Very Large Speech Recognition |
VQ | Vector Quantization |
VSELP | Vector Sum Excited Linear Prediction |
V/UV | Voiced/Unvoiced |
VXML or VoiceXML | W3C's standard XML format for specifying interactive voice dialogues |
This place covers:
- synthesis of speech from text, concatenation of smaller speech units, grapheme-to-phoneme conversion;
- modification of the voice for speech synthesis: gender, age, pitch, prosody, stress;
- hardware or software implementation details of a speech synthesis system.
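The front-end steps above (grapheme-to-phoneme conversion feeding concatenation of smaller units) can be sketched as follows. This is a toy illustration only: the lexicon, the letter-to-sound rules and the phone labels are invented for the sketch and are not any standard phone set.

```python
# Toy sketch of dictionary-based grapheme-to-phoneme conversion with a
# letter-to-sound fallback, as used at the front end of a concatenative
# text-to-speech system. Lexicon and rules are illustrative assumptions.

LEXICON = {                       # exception dictionary: word -> phonemes
    "speech": ["S", "P", "IY", "CH"],
    "the": ["DH", "AH"],
}

LETTER_RULES = {                  # naive single-letter fallback rules
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "k": "K", "l": "L", "m": "M",
    "n": "N", "o": "AA", "p": "P", "r": "R", "s": "S", "t": "T",
    "u": "AH", "v": "V", "w": "W", "y": "Y", "z": "Z",
}

def grapheme_to_phoneme(word):
    """Look the word up in the lexicon; fall back to letter-to-sound rules."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return [LETTER_RULES[ch] for ch in word if ch in LETTER_RULES]

def synthesize(text):
    """Concatenate per-word phoneme sequences into one unit sequence,
    standing in for the selection of smaller speech units to concatenate."""
    units = []
    for word in text.split():
        units.extend(grapheme_to_phoneme(word))
    return units
```

A real system would then select recorded units matching these labels and smooth the joins; the sketch stops at the symbolic level.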
Examples of places where the subject matter of this place is covered when specially adapted, used for a particular purpose, or incorporated in a larger system:
Attention is drawn to the following places, which may be of interest for search:
Excitation coding of a speech signal | |
Processing or translation of natural language |
In patent documents, the following abbreviations are often used:
HMM | Hidden Markov Model |
TTS | Text To Speech |
This place covers:
Concepts used for speech synthesis can be linked to an emotion to be conveyed (US2010329505), to a communication goal driving a dialogue (US2010241420), to image-to-speech conversion (US2010231752), or to native-sounding speech (US2004030554).
This place does not cover:
Processing or translation of natural language |
This place covers:
- recognition of text or phonemes from a spoken audio signal;
- spoken dialogue interfaces, human-machine spoken interfaces;
- topic detection in a dialogue, semantic analysis, keyword detection, spoken command and control;
- context-dependent speech recognition (location, environment, age, gender, etc.);
- parameter extraction, acoustic models, word models, grammars, language models for speech recognition;
- recognition of speech in a noisy environment;
- recognition of speech using visual cues;
- feedback of the recognition results, disambiguation of speech recognition results;
- dedicated hardware or software implementations, parallel and distributed processing of speech recognition engines.
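Decoding with acoustic models, word models and language models, as listed above, is commonly formulated as a search for the most likely hidden state sequence. A minimal Viterbi decoder over a toy hidden Markov model can sketch the idea; every probability below is an invented illustration value, not data from any real model.

```python
# Minimal Viterbi decoding over a toy HMM, illustrating how a recogniser
# finds the most likely state (e.g. phoneme) sequence for a sequence of
# observed acoustic symbols. All probabilities are made up for the sketch.

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, its probability) for the observation sequence."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][observations[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state.
    prob, last = max((V[-1][s], s) for s in states)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], prob
```

Real recognisers work in the log domain and prune the search space (beam search), but the recursion is the same.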
Examples of places where the subject matter of this place is covered when specially adapted, used for a particular purpose, or incorporated in a larger system:
Spoken command and control of surgical instruments | |
Speech input in video games | |
Voice control for systems within a vehicle | |
Speech input for vehicle navigation systems | |
Sound input arrangements for computers | |
Teaching how to speak | |
Name dialling controlled by voice recognition | |
Speech interaction details in automatic or semi-automatic exchange systems for interactive information services |
Attention is drawn to the following places, which may be of interest for search:
Information retrieval of audio data | |
Complex mathematical functions | |
Pattern recognition | |
Handling natural language data | |
Image or video recognition or understanding | |
Face recognition, lip reading without acoustical input | |
Educational appliances | |
Signal processing for recording |
In patent documents, the following abbreviations are often used:
ANN | Artificial neural network |
ASR | Automatic speech recognition |
CSR | Continuous speech recognition |
GMM | Gaussian mixture model |
HMM | Hidden Markov model |
IVR | Interactive voice response |
MLP | Multi layer perceptron |
VLSR | Very large speech recognition |
This place covers:
- Recognition, identification of a speaker
- Verification, authentication of a speaker
- Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA], principal components; Feature selection or extraction
- Dialog, prompts, passwords for identification
- Training, model building or enrolment
- Decision making techniques, pattern matching strategies
- Multimodal identification including voice
- Hidden Markov Models
- Artificial neural networks, connectionist approaches
- Pattern transformations and operations aimed at increasing system robustness, e.g. against channel noise, different working conditions
- Identification in noisy conditions
- Interactive procedures, man-machine interface, e.g. user prompted to utter a password or predefined text
- Recognition of special voice characteristics, e.g. for use in a lie detector; recognition of animal voices
- Impostor detection
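The distinction between closed-set identification and open-set verification (with impostor rejection) can be sketched with a deliberately simplified model: each enrolled speaker is represented by the mean of their feature vectors and scored by cosine similarity. The feature vectors and the threshold are illustrative assumptions; real systems use GMMs, i-vectors or neural embeddings rather than this simple average.

```python
import math

# Toy speaker identification/verification sketch. Enrolment builds a mean
# feature vector per speaker; scoring is cosine similarity against it.

def mean_vector(frames):
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(models, test_frames):
    """Closed-set identification: return the best matching enrolled speaker."""
    probe = mean_vector(test_frames)
    return max(models, key=lambda name: cosine(models[name], probe))

def verify(models, claimed, test_frames, threshold=0.9):
    """Verification: accept the claimed identity only above a threshold,
    otherwise treat the user as a potential impostor."""
    return cosine(models[claimed], mean_vector(test_frames)) >= threshold
```

The threshold trades off false rejection of genuine users against false acceptance of impostors; its value here is arbitrary.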
Attention is drawn to the following places, which may be of interest for search:
Information retrieval of audio data | |
Complex mathematical functions | |
Pattern recognition | |
User authentication in security arrangements for restricting access by using biometric data, e.g. voice prints | |
Machine learning | |
Arrangements for image or video recognition or understanding | |
Recognition of biometric, human-related or animal-related patterns in image or video data | |
Individual entry or exit registers, access control with identity check using personal physical data | |
Secret secure communication including means for verifying the identity or authority of a user | |
Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers |
In this place, the following terms or expressions are used with the meaning indicated:
speaker verification or authentication | refers to verifying that the user's claimed identity is genuine; otherwise the user is an impostor. Speaker recognition, or identification, aims at determining who the user is among a closed (finite) set of users; otherwise the user is unknown. |
impostor | someone actively trying to counterfeit someone else's identity. |
In patent documents, the following abbreviations are often used:
ANN | Artificial neural network |
ASR | Automatic speech recognition |
GMM | Gaussian mixture model |
HMM | Hidden Markov model |
IVR | Interactive voice response |
MLP | Multilayer perceptron |
In patent documents, the following words/expressions are often used as synonyms:
- "goat" and "sheep"
- "predator" and "wolf"
In patent documents, the following words/expressions are often used with the meaning indicated:
A goat or sheep | often refers to a person whose voice is easy to counterfeit. |
A wolf or predator | often refers to a person who can easily counterfeit someone else's voice or is often identified as someone else. |
This place covers:
Techniques for the reduction of data from audio sources, i.e. compression of audio. These techniques are applied to reduce the quantity of information to be stored or transmitted, but are independent of the end application, medium or transmission channel, i.e. they exploit only the properties of the source signal itself or of the final receiver exposed to this signal (the listener).
Two main types of sources can be distinguished:
"Speech only" encompasses signals produced by human speakers; historically it was understood as mono-channel, single-speaker "telephone quality" speech having a narrow bandwidth limited to at most 4 kHz. Encoding of speech-only sources primarily aims at reducing the bit-rate while still providing fair intelligibility of the spoken content, but not always fidelity to the original.
"Audio signal" is broader and comprises speech as well as background information, e.g. a music source having multiple channels. Encoding of audio deals primarily with transparent, i.e. "high fidelity", reproduction of the original signal.
The compression techniques can also be distinguished as being:
lossy or lossless, i.e. whether a perfect reconstruction of the source is possible or only a perceptually acceptable approximation can be achieved.
The techniques classified in this subclass are based either on modelling the production of the signal (voice) or the perception of it (general audio).
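The production-modelling side can be sketched with linear predictive coding (LPC): the vocal tract is modelled as an all-pole filter whose coefficients are estimated per frame from the signal's autocorrelation, here via the Levinson-Durbin recursion. The test signal and predictor order below are illustrative choices, not taken from any standard codec.

```python
# Sketch of source-filter modelling: estimate linear prediction (LPC)
# coefficients from a frame's autocorrelation with the Levinson-Durbin
# recursion. In a vocoder these coefficients describe the vocal tract
# filter; only they (plus an excitation model) need to be transmitted.

def autocorrelation(frame, order):
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(order + 1)]

def levinson_durbin(r):
    """Solve the normal equations for predictor coefficients a[1..p],
    where x[n] is predicted as sum_j a[j] * x[n-j]."""
    p = len(r) - 1
    a = [0.0] * (p + 1)
    error = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                # reflection (PARCOR) coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], error
```

For a decaying exponential x[n] = 0.9^n, the order-1 predictor recovered this way is close to 0.9, as expected for a one-pole source model.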
This place does not cover:
Coding of signals within electronic musical instruments |
Attention is drawn to the following places, which may be of interest for search:
Complex mathematical functions | |
Signal processing not specific to the method of recording or reproducing | |
Editing; Indexing; Addressing; Timing or synchronizing; Monitoring; | |
Compression | |
Detecting, preventing errors in received information | |
Quality monitoring in automatic or semi-automatic exchanges | |
Quality control of voice transmission between switching centres | |
Simultaneous speech and data transmission | |
Transmission of audio and video in television systems | |
Stereophonic arrangements | |
Stereophonic systems | |
Wireless communication networks |
In this place, the following terms or expressions are used with the meaning indicated:
audio signal | is meant to include speech, music, silence or background signal, or any combinations thereof, unless explicitly specified |
In patent documents, the following abbreviations are often used:
CELP | Code Excited Linear Prediction |
CTX | Continuous transmission |
DTX | Discontinuous transmission |
HVXC | Harmonic Vector eXcitation Coding |
LPC | linear prediction coding |
MBE | Multiband Excitation |
MELP | Mixed Excitation Linear Prediction |
MOS | mean opinion score |
MPEG | Moving Picture Experts Group |
MPEG1 audio | Standard ISO/IEC 11172-3 |
MPEG2 audio | Standard ISO/IEC 13818-3 |
MPEG4 audio | Standard ISO/IEC 14496-3 |
MP3 | MPEG 1 Layer III |
PCM | pulse code modulation |
PWI | Prototype Waveform Interpolation |
SBR | Spectral Band Replication |
In patent documents, the following words/expressions are often used as synonyms:
- " perceptual" and "psychoacoustic"
This place covers:
Coding of a signal with rate adaptation, e.g. adapted to voiced speech, unvoiced speech, transitions and noise/silence portions.
Coding of a signal with a core encoder providing a minimum level of quality, and extension layers to improve the quality but requiring a higher bitrate. It includes parameter based bandwidth extension (i.e. SBR) or channel extension.
This group is in opposition to G10L 21/038, in which the bandwidth extension is artificial, i.e. based only on the narrowband encoded signal.
Attention is drawn to the following places, which may be of interest for search:
Artificial bandwidth extension, i.e. based only on the narrowband encoded signal | |
Spatial sound recording | |
Spatial sound reproduction |
This place covers:
This subgroup deals with speech or voice modification applications; it also receives applications for speech or voice analysis techniques specially adapted to analyse or modify audio signals that do not necessarily include speech or voice but are not music signals (G10H). In particular:
- Bandwidth extension of an audio signal.
- Improvement of the intelligibility of a coded speech signal.
- Removal of noise from an audio signal.
- Removal of echo from an audio signal.
- Separation of audio sources.
- Pitch, speed modification of an audio signal.
- Voice morphing.
- Visualisation of audio signals (e.g. sonograms).
- Lips or face movement synchronisation with speech (e.g. phonemes - visemes alignment).
- Face animation synchronisation with the emotion contained in the voice or speech signal.
This place does not cover:
Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source-filter models or psychoacoustic analysis |
Examples of places in relation to which this place is residual:
Attention is drawn to the following places, which may be of interest for search:
Direction finder | |
Complex mathematical functions | |
3D Animation | |
Animation based on audio data, talking heads | |
Signal processing not specific to the method of recording or reproducing | |
Signal processing not specific to the method of recording or reproducing, for reducing noise | |
Editing; Indexing; Addressing; Timing or synchronizing; Monitoring | |
Gain control in amplifiers where the control is dependent upon ambient noise level or sound level | |
Reducing echo effect or singing in line transmissions systems | |
Transmission systems not characterised by the medium used for transmission using pulse code modulation, e.g. for reducing noise or bandwidth | |
Reducing noise or bandwidth in transmission systems not characterised by the medium used for transmission | |
Echo suppression in hand-free telephones | |
Hearing aids | |
Public address systems |
In this place, the following terms or expressions are used with the meaning indicated:
viseme | a visual representation of the mouth, lips, tongue and teeth corresponding to a phoneme. |
In patent documents, the following abbreviations are often used:
BSS | Blind source separation |
LDA | Linear discriminant analysis |
NB | Narrowband |
PCA | Principal component analysis |
SBR | Spectral Band Replication |
WB | Wideband |
This place covers:
Visemes are selected to match the corresponding speech segment, or the speech segments are adapted or chosen to match the viseme. This symbol also encompasses the coarticulation effects as used in facial character animation or talking heads.
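The selection of visemes to match speech segments can be sketched as a many-to-one mapping from phonemes to mouth shapes. The grouping below is a simplified, hypothetical inventory (labels invented for the sketch), not a standard viseme set.

```python
# Illustrative phoneme-to-viseme mapping for lip synchronisation: several
# phonemes that look alike on the lips share one viseme.

PHONEME_TO_VISEME = {
    "P": "bilabial", "B": "bilabial", "M": "bilabial",
    "F": "labiodental", "V": "labiodental",
    "IY": "spread", "EH": "spread",
    "UW": "rounded", "OW": "rounded",
    "S": "fricative", "Z": "fricative",
}

def visemes_for(phonemes):
    """Map a phoneme sequence to visemes, merging adjacent duplicates so
    the animated mouth does not retrigger the same shape."""
    out = []
    for ph in phonemes:
        v = PHONEME_TO_VISEME.get(ph, "neutral")
        if not out or out[-1] != v:
            out.append(v)
    return out
```

Coarticulation handling would additionally blend neighbouring shapes over time; the sketch only merges identical neighbours.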
Attention is drawn to the following places, which may be of interest for search:
Facial character animation per se |
This place covers:
Bandwidth extension taking place at the receiving side, e.g. generation of artificial low- or high-frequency components or regeneration of spectral holes, based only on the narrowband encoded signal. This is in opposition to G10L 19/24, wherein parameters are computed during the encoding step to enable bandwidth extension at the decoding step.
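One classic receiver-side technique of this kind is spectral folding: modulating the narrowband signal by (-1)^n mirrors its spectrum into the upper half of the band, and the mirrored copy is mixed in at a small gain. The sketch below omits the band-shaping filter and envelope estimation a real system would apply; the gain value is an arbitrary illustration.

```python
import math

# Sketch of artificial bandwidth extension by spectral folding, using
# only the received narrowband signal (no side information).

def spectral_fold_extend(lowband, gain=0.3):
    # (-1)**n modulation mirrors the spectrum about the quarter band.
    return [x + gain * x * (-1) ** n for n, x in enumerate(lowband)]

def dft_magnitude(x, k):
    """Magnitude of DFT bin k (naive O(N) per bin, for illustration)."""
    n = len(x)
    re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
    im = -sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
    return math.hypot(re, im)
```

For a 64-sample tone at bin 4, the extended output acquires a mirrored component at bin 28 (= 32 - 4) that the input did not contain.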
Attention is drawn to the following places, which may be of interest for search:
Parameter based bandwidth extension (e.g. SBR) |
This place covers:
- Processing of speech or voice signals in general, in particular detection of a speech signal, end points detection in noise, extraction of pitch, measure of the voicing, emotional state, voice pathology or other speech or voice related parameters.
- Extracted parameters, e.g. techniques for evaluating correlation coefficients, zero crossing, prediction coefficients or formant information.
- Analysis techniques, e.g. neural networks, fuzzy logic, chaos theory, genetic algorithms or coding techniques.
- Analysis window (window function).
- Specially adapted for particular use, e.g. for comparison and discrimination, evaluating synthetic and decoded voice signals, for transmitting result of analysis.
- Speech or voice analysis techniques specially adapted to analyse audio signals, where the analysed audio signals do not necessarily include speech or voice, such as audio scene segmentation, jingle detection, separation from music or noise or detection of particular sounds.
- Modelling vocal tract parameters.
- Detection of presence or absence of speech signals.
- Pitch determination of speech signals.
- Discriminating between voiced and unvoiced parts of speech signals.
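Two of the classic frame-level parameters behind speech detection and voiced/unvoiced discrimination are short-time energy and zero-crossing rate (ZCR): voiced speech tends to show high energy and low ZCR, unvoiced (fricative) speech the reverse. The thresholds in this sketch are arbitrary illustration values, not calibrated figures.

```python
import math

# Sketch of frame-level speech analysis: short-time energy and
# zero-crossing rate, combined into a crude silence/voiced/unvoiced
# classifier. Thresholds are illustrative assumptions only.

def short_time_energy(frame):
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_floor=0.01, zcr_split=0.25):
    if short_time_energy(frame) < energy_floor:
        return "silence"           # absence of speech signal
    if zero_crossing_rate(frame) < zcr_split:
        return "voiced"            # periodic, low-ZCR, high-energy
    return "unvoiced"              # noise-like, high-ZCR
```

Practical detectors combine these with pitch evidence and smoothing over frames; the sketch shows only the per-frame decision.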
This place does not cover:
Muting semiconductor-based amplifiers when some special characteristics of a signal are sensed by a speech detector, e.g. sensing when no signal is present |
Attention is drawn to the following places, which may be of interest for search:
Comfort noise | |
Karaoke or singing voice processing, parameter extraction for musical signal categorisation, electronic musical instruments | |
Gain or frequency control | |
DTX communication | |
Switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems | |
Multiplex systems |
In this place, the following terms or expressions are used with the meaning indicated:
audio signal | of or relating to humanly audible sound, e.g. it comprises any combination of background noise or silence, voice or speech, music |