NXC: "Speech"-Recognition
hi folks,
a fun little program that gives an introduction to speech recognition on the NXT.
Actually, it's of course not real speech recognition; it's more like "Rhythm Detection".
Construction: Sound Sensor at port S2.
http://www.mindstormsforum.de/viewtopic.php?f=70&t=6386
Attachment: Oscillographs.jpg, different oscillographs from spoken words (208.15 KiB)
Last edited by HaWe on 09 Jun 2013, 15:50, edited 43 times in total.
Re: NXC: "Speech"-"Recognition"
As the pattern of my sound recordings is an oscillation of different noise levels (noise vibration), I got the idea to use a Fast Fourier Transform (FFT) to characterize my SoundRec[400] array.
Unfortunately I have no experience with FFTs at all, and the underlying maths is quite nebulous to me.
IIUC, an FFT approximates a vibration by a sum of sine waves of different frequencies
(f1, f2, f3, ..., each frequency (resp. wavelength) twice as large as the previous one),
each term multiplied by a specific coefficient:
FFT(t) = c1*sin(f1*t) + c2*sin(f2*t) + c3*sin(f3*t) + ... + cn*sin(fn*t)
As my record length is 400 samples, I suppose the frequencies (resp. wavelengths) could be
f1=1
f2=2
f3=4
f4=8
f5=16
f6=32
f7=64
f8=128
f9=256
That (at least up to f16) should fit, so I would have to handle n=9 (up to 16?) terms, i.e. 9 (up to 16?) frequencies and 9 (up to 16?) coefficients, for 400 noise level samples.
Can anybody tell me how to implement an FFT algorithm for these conditions?
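For anyone who wants to try this on the PC first, here is a minimal sketch in Python (not NXC) of a plain discrete Fourier transform over a 400-sample array. Two hedges: a DFT uses evenly spaced frequencies (k = 1, 2, 3, ... whole cycles per record) rather than doubling ones, and each frequency gets a sine and a cosine part, which are combined into one magnitude here. Since 400 is not a power of two, the sketch uses the slow direct sum instead of a radix-2 FFT. The names `dft_magnitudes`, `n_terms` and the synthetic `sound_rec` data are made up for illustration; they are not taken from the original program.

```python
import math

def dft_magnitudes(samples, n_terms=16):
    """Return the amplitudes of DFT bins 1..n_terms.

    Bin k corresponds to k full cycles over the whole record.
    """
    n = len(samples)
    mean = sum(samples) / n  # subtract the constant loudness offset (bin 0)
    mags = []
    for k in range(1, n_terms + 1):
        re = 0.0
        im = 0.0
        for t, x in enumerate(samples):
            angle = 2.0 * math.pi * k * t / n
            re += (x - mean) * math.cos(angle)
            im -= (x - mean) * math.sin(angle)
        # scale so that a pure sine of amplitude A yields A
        mags.append(2.0 * math.hypot(re, im) / n)
    return mags

# Synthetic stand-in for SoundRec[400]: 5 loudness cycles per record.
sound_rec = [50 + 20 * math.sin(2 * math.pi * 5 * t / 400) for t in range(400)]
mags = dft_magnitudes(sound_rec)
strongest = mags.index(max(mags)) + 1  # bins are numbered from 1
```

The resulting list of 16 magnitudes could serve as a compact "fingerprint" of a recording; whether that separates spoken words better than the raw loudness pattern would have to be tested.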
Re: NXC: "Speech"-"Recognition"
Hi doc
I wrote one in leJOS some time ago, and it works pretty well under the given circumstances (small processor, very limited amount of space, coarse sampling frequency etc.). There are lots of FFT algorithms out there, but you'll probably need to do some translation.
Google for FFT numerical recipes:
http://www.google.com/search?q=FFT+numerical+recipes
There is some explanation here:
http://en.wikipedia.org/wiki/Fast_Fourier_transform
Best of luck!
Povl
Re: NXC: "Speech"-"Recognition"
For the talk we gave about leJOS at JavaOne a year or so ago, Roger created a demo that used "speech recognition". It wasn't as sophisticated as what Doc has planned but it worked pretty well and had a number of people fooled until we told them how it worked. A video of our test (and backup if we had problems on the day) is here:
http://www.youtube.com/watch?v=sjPzcmWSfQs
Some clips from the actual talk are here:
http://www.youtube.com/watch?v=fJD6vGHKLTQ
The voice demo starts about 4:30 into the clip.
Andy
Re: NXC: "Speech"-"Recognition"
well, what was your algorithm like?
Mine is based on the sum of squared deviations (least squares) between loudness patterns, and it works quite well, as you may have observed. Note that the Lego Sound Sensor doesn't detect frequencies but just loudness oscillations (dBA); nevertheless the recognition works (in a well-defined sub-population of rhythmically concise spoken words)!
But something like an FT or FFT seems to be even more promising. Any ideas for an FT or FFT with 10 (max 20) terms (coefficients, frequencies)...?
I'm not a programmer and not a mathematician, and I have already googled a lot but didn't find anything suitable.
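To illustrate the least-squares idea described above, here is a minimal sketch in Python (not NXC, and not the actual program from this thread). It assumes each known word is stored as one reference loudness pattern of the same length as the recording; the function names, the word labels and the tiny 6-sample patterns are hypothetical.

```python
def sum_squared_deviation(a, b):
    """Sum of squared differences between two equally long loudness patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recognize(recording, templates):
    """Return the label of the stored pattern closest to the recording."""
    return min(
        templates,
        key=lambda word: sum_squared_deviation(recording, templates[word]),
    )

# Hypothetical reference patterns (real ones would have 400 samples each).
templates = {
    "one-beat": [0, 80, 0, 0, 0, 0],
    "two-beat": [0, 80, 0, 80, 0, 0],
}
heard = [5, 70, 10, 75, 5, 0]
best = recognize(heard, templates)
```

The smaller the sum, the better the match, so the word with the minimum sum wins. This only compares loudness rhythm, which fits what the sound sensor actually delivers.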
Re: NXC: "Speech"-"Recognition"
Hi Doc,
Sorry, I'm not sure how the algorithm worked. It was Roger's demo, so I'll drop him some mail to find out....
Andy
Re: NXC: "Speech"-"Recognition"
new version with oscillograph (revised version) :)
Re: NXC: "Speech"-"Recognition"
Are the graphs with or without a German accent?
This is pretty cool stuff.
- Xander
| My Blog: I'd Rather Be Building Robots (http://botbench.com)
| RobotC 3rd Party Driver Suite: (http://rdpartyrobotcdr.sourceforge.net)
| Some people, when confronted with a problem, think, "I know, I'll use threads,"
| and then two they hav erpoblesms. (@nedbat)
Re: NXC: "Speech"-"Recognition"
accent?
what is "accent" ?
;)
Re: NXC: "Speech"-"Recognition"
Hi,
what do you think: what would be the best way to transfer all those sound arrays as a file to the PC,
e.g. 10 samples of each of 6 spoken words = 60 arrays[400],
in order to process the data on an external computer (with Excel or an ANSI C++ program)?
I think a text file with all numbers separated by ";" would be OK.
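On the PC side, a ";"-separated text file is easy to read back. Here is a minimal sketch in Python, assuming a layout that is not specified in this thread: one recording per line, the first field being the spoken word and the remaining fields the loudness samples. The function name `read_sound_arrays` and the short example lines are made up for illustration.

```python
import csv
import io

def read_sound_arrays(stream):
    """Parse lines of the form 'word;v1;v2;...' into (word, [v1, v2, ...])."""
    records = []
    for row in csv.reader(stream, delimiter=";"):
        if not row:
            continue  # skip blank lines
        label = row[0].strip()
        # ignore empty fields, e.g. from a trailing ';'
        values = [int(v) for v in row[1:] if v.strip()]
        records.append((label, values))
    return records

# Stand-in for a file copied from the NXT (real lines would carry 400 values).
example = io.StringIO("yes;12;40;35;8\nno;10;55;60;58\n")
records = read_sound_arrays(example)
```

With a real file, `open("sounds.txt", newline="")` would take the place of the `io.StringIO` object (the file name is again just an example). Excel can open the same file directly if ";" is chosen as the delimiter.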