Speaker-independent isolated word speech recognition with a Cascade Network of Temporal Sequence Sensitive Neurons

 
Temporal Sequence Sensitive Neurons (TSSN) are spiking neurons that can be trained to recognize spatio-temporal input.

A network of TSSNs was created for the purpose of isolated word recognition. Pronunciations of the same word can differ in timbre, speech rate, etc.

A "positive-negative" couple of spiking neurons is associated with each of the words "one", "two", "three" and "four". The spiking neuron couple is cascade correlated. It consists of a positive and a negative neuron. The positive neuron is trained to generate a spike only when the destination pattern sequence is received. The negative neuron should fire only if the received pattern sequence is not the destination. Both neurons have the same inputs, but the positive neuron takes an additional input from the negative one. More complex cascade correlation among all neurons is also possible (and useful in some cases).
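The wiring of one couple can be illustrated with a minimal sketch. It does not reproduce the TSSN model: `ToyNeuron` is a hypothetical leaky integrator, the weights, threshold and decay are made-up values, and treating the link from the negative to the positive neuron as inhibitory (`coupling=-1.0`) is an assumption. Only the connection scheme (shared inputs, plus one extra input into the positive neuron) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class ToyNeuron:
    """Leaky integrator with one weight per input synapse (not the real TSSN model)."""
    weights: list            # hypothetical weights, one per pattern class
    threshold: float = 1.0   # positive firing threshold (assumed value)
    decay: float = 0.9       # per-step leak of the membrane potential (assumed)
    potential: float = 0.0

    def step(self, active_input, extra=0.0):
        """Integrate one pattern; return True if the neuron spikes."""
        self.potential = self.decay * self.potential + self.weights[active_input] + extra
        if self.potential >= self.threshold:
            self.potential = 0.0  # reset after a spike
            return True
        return False


def run_couple(positive, negative, sequence, coupling=-1.0):
    """Feed the same pattern sequence to both neurons of a couple.

    The positive neuron additionally receives `coupling` whenever the
    negative neuron spikes. Treating this link as inhibitory is an assumption.
    """
    pos_fired = neg_fired = False
    for pattern in sequence:
        neg_spike = negative.step(pattern)
        pos_spike = positive.step(pattern, extra=coupling if neg_spike else 0.0)
        neg_fired = neg_fired or neg_spike
        pos_fired = pos_fired or pos_spike
    return pos_fired, neg_fired


def make_couple():
    # Toy weights for 4 pattern classes (the network described here uses 32).
    positive = ToyNeuron(weights=[0.6, 0.6, -0.5, -0.5])
    negative = ToyNeuron(weights=[-0.5, -0.5, 0.6, 0.6])
    return positive, negative
```

With these toy weights, `run_couple(*make_couple(), [0, 1])` makes only the positive neuron fire, while `run_couple(*make_couple(), [2, 3])` makes only the negative neuron fire.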

Every word is given in several variants that differ in pronunciation, speech rate, speaker, etc.

All words are represented as sequences of patterns. There are 32 possible patterns. Each of them activates a different spiking neuron input (synapse) and represents a specific class of sounds. The process of converting a spoken word to a sequence of patterns is as follows:

  • The speech signal is split into frames.

  • The short-term mel-frequency spectrum is then calculated for every frame.

  • A competitive neural network, trained in advance, classifies each sound frame by its mel-frequency spectrum. The class determines which spiking neuron input should be activated.

This yields one pattern per frame, and the patterns of all frames form the sequence that represents the word.
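The three conversion steps can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the original implementation: the frame length, hop size, sample rate, number of mel filters and the mel-scale formula are all assumed values, the competitive network is approximated by a winner-take-all search over prototype vectors, and the prototypes here are random placeholders rather than trained weights.

```python
import numpy as np


def split_into_frames(signal, frame_len=256, hop=128):
    """Step 1: cut the signal into overlapping frames (sizes are assumptions)."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, frame_len, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    n_bins = frame_len // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_spectrum(frames, sample_rate, n_filters=20):
    """Step 2: short-term log mel-frequency spectrum of every frame."""
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    bank = mel_filterbank(n_filters, frames.shape[1], sample_rate)
    return np.log(power @ bank.T + 1e-10)


def classify_frames(features, prototypes):
    """Step 3: winner-take-all, each frame gets the index of its nearest prototype."""
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


# Usage with stand-in data: noise instead of speech, random instead of trained prototypes.
rng = np.random.default_rng(0)
sample_rate = 8000
signal = rng.standard_normal(sample_rate)
prototypes = rng.standard_normal((32, 20))  # 32 pattern classes, 20 mel bands
patterns = classify_frames(mel_spectrum(split_into_frames(signal), sample_rate), prototypes)
```

Each entry of `patterns` is an integer class index from 0 to 31, i.e. the index of the spiking neuron input to activate for that frame.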

Every picture shown on the right side of the page plots membrane activity over time for both neurons of a "positive-negative" couple. The upper part of the picture shows the activity of the positive neuron. Below it is the activity of the negative neuron. There are three horizontal lines for every neuron. The central line represents zero activity. The dashed lines correspond to the positive and negative thresholds. If the membrane potential crosses the positive threshold, the neuron generates an output spike. If the positive neuron fires, the word is positively recognized.
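The recognition rule can be expressed as a short sketch. The function names, the threshold value of 1.0 and the membrane potential traces below are hypothetical and chosen only for illustration; they are not measurements from the network.

```python
def crosses_threshold(trace, threshold):
    """True if the membrane potential reaches the positive threshold at any time step."""
    return any(v >= threshold for v in trace)


def recognized_words(positive_traces, threshold=1.0):
    """Return the words whose positive neuron fired.

    `positive_traces` maps each word to the membrane potential trace of the
    positive neuron in the couple associated with that word.
    """
    return [word for word, trace in positive_traces.items()
            if crosses_threshold(trace, threshold)]


# Made-up traces for the four couples while one utterance is presented.
example_traces = {
    "one": [0.1, -0.3, 0.2, -0.4],
    "two": [0.2, 0.7, 1.1, 0.0],
    "three": [-0.2, -0.5, -0.1, 0.3],
    "four": [0.0, 0.4, 0.3, -0.6],
}
```

Here only the couple for "two" has a positive neuron whose potential reaches the threshold, so `recognized_words(example_traces)` contains just that word.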

 

 


Recognition of the word "one" by the spiking neuron couple whose destination is the word "two".

The sequence is not recognized by the positive neuron. The negative neuron fires, showing that the received sequence is not the destination.


Recognition of the word "two" by the spiking neuron couple whose destination is the word "two".

The positive neuron fires, showing that the received sequence is the destination. The negative neuron does not fire.


Recognition of the word "three" by the spiking neuron couple whose destination is the word "two".

The sequence is not recognized by the positive neuron. The negative neuron fires, showing that the received sequence is not the destination.


Recognition of the word "four" by the spiking neuron couple whose destination is the word "two".

The sequence is not recognized by the positive neuron. The negative neuron fires, showing that the received sequence is not the destination.


Recognition of another variant of the word "three" by the spiking neuron couple whose destination is the word "two".

The sequence is not recognized by the positive neuron. The negative neuron fires, showing that the received sequence is not the destination. The membrane potential curve is similar to, but not the same as, the one in the previous test with the word "three" (see above).