NXC: "Speech"-Recognition

HaWe
Posts: 2500
Joined: 04 Nov 2014, 19:00

Post by HaWe »

hi folks,
here is a fun little program that gives an introduction to speech recognition on the NXT.
Actually - it's of course not speech recognition, it's more "Rhythm Detection".

Construction: Sound Sensor at port S2.

http://www.mindstormsforum.de/viewtopic.php?f=70&t=6386
Attachment: Oscillographs.jpg (different oscillographs from spoken words)
Last edited by HaWe on 09 Jun 2013, 15:50, edited 43 times in total.

Re: NXC: "Speech"-"Recognition"

Post by HaWe »

As the pattern of my sound recordings is an oscillation of different noise levels (noise vibration), I got the idea to use a Fast Fourier Transform (FFT) to characterize my SoundRec[400] array.
Unfortunately I have no experience with FFTs at all, and the underlying maths is quite nebulous to me.

IIUC, an FFT approximates a vibration by a sum of sine waves of different frequencies
(f1, f2, f3,..., each frequency (resp. wavelength) twice as long as the previous one),
each term multiplied by a specific coefficient.
FFT(t) = c1*sin(f1(t)) + c2*sin(f2(t)) + c3*sin(f3(t)) +...+ cn*sin(fn(t))
As my RecordLenght consists of 400 samples, I suppose the frequencies (resp. wavelengths) could be
f1=1
f2=2
f3=4
f4=8
f5=16
f6=32
f7=64
f8=128
f9=256
That (at least up to f16) should fit, so I have to handle n=9(-16?) terms with 9(-16?) frequencies and 9(-16?) coefficients for 400 noise level samples.
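
For reference, here is a minimal sketch of a plain discrete Fourier transform (DFT), written in Python for experiments on a PC rather than in NXC. It is not code from this thread. Two points differ from the list above: in the standard DFT the frequencies are the integer multiples k = 1, 2, 3, ... of the base frequency (k full cycles per recording) rather than doublings, and each frequency gets both a cosine and a sine coefficient. Also, 400 is not a power of two, so a textbook radix-2 FFT would need the data padded to 512 samples; the direct sum below avoids that at the cost of speed. The names dft_coefficients, amplitudes and sound_rec are illustrative assumptions.

```python
import math

def dft_coefficients(samples, n_terms):
    """Return (cosine, sine) coefficient pairs for k = 1..n_terms cycles per recording."""
    n = len(samples)
    mean = sum(samples) / n  # remove the constant loudness offset first
    coeffs = []
    for k in range(1, n_terms + 1):
        a = b = 0.0
        for t, s in enumerate(samples):
            angle = 2.0 * math.pi * k * t / n
            a += (s - mean) * math.cos(angle)
            b += (s - mean) * math.sin(angle)
        coeffs.append((2.0 * a / n, 2.0 * b / n))
    return coeffs

def amplitudes(coeffs):
    """Combine each cosine/sine pair into one amplitude per frequency."""
    return [math.hypot(a, b) for a, b in coeffs]

# Hypothetical test signal: 5 loudness cycles across a 400-sample recording.
sound_rec = [50 + 10 * math.sin(2 * math.pi * 5 * t / 400) for t in range(400)]
amps = amplitudes(dft_coefficients(sound_rec, 16))
```

With this test signal, only the entry for k = 5 (amps[4]) is noticeably different from zero, so the list of amplitudes could serve as a short "fingerprint" of a recording.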

Can anybody tell me how to implement an FFT algorithm for these conditions?
kvols
Posts: 29
Joined: 14 Oct 2010, 22:09

Re: NXC: "Speech"-"Recognition"

Post by kvols »

Hi doc

I wrote one in leJOS some time ago, and it works pretty well under the given circumstances (small processor, very limited amount of space, coarse sampling frequency etc.). There are lots of FFT algorithms out there, but you'll probably need to do some translation.

Google for FFT numerical recipes:
http://www.google.com/search?q=FFT+numerical+recipes

There is some explanation here:
http://en.wikipedia.org/wiki/Fast_Fourier_transform

Best of luck!

;-) Povl
gloomyandy
Posts: 323
Joined: 29 Sep 2010, 05:03

Re: NXC: "Speech"-"Recognition"

Post by gloomyandy »

For the talk we gave about leJOS at JavaOne a year or so ago, Roger created a demo that used "speech recognition". It wasn't as sophisticated as what Doc has planned but it worked pretty well and had a number of people fooled until we told them how it worked. A video of our test (and backup if we had problems on the day) is here:
http://www.youtube.com/watch?v=sjPzcmWSfQs
Some clips from the actual talk are here:
http://www.youtube.com/watch?v=fJD6vGHKLTQ
The voice demo starts about 4:30 into the clip.

Andy

Re: NXC: "Speech"-"Recognition"

Post by HaWe »

well, what was your algorithm like?
Mine is based on the sum of the least square deviations of loudness patterns, and it works quite well, as you may have observed. Note that the Lego Sound Sensor doesn't detect frequencies but just loudness oscillations (dBA). Nevertheless the recognition works (in a well-defined sub-population of rhythmically concise spoken words)!
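
The idea of comparing loudness patterns by their summed squared deviations can be sketched roughly as follows. This is a Python illustration for the PC, not the NXC program itself; the function names, the template dictionary and the short 4-sample patterns are assumptions made up for the example (the real recordings have 400 samples).

```python
def squared_deviation(a, b):
    """Sum of squared differences between two equally long loudness patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recognize(sample, templates):
    """Return the word whose stored pattern deviates least from the sample."""
    return min(templates, key=lambda word: squared_deviation(sample, templates[word]))

# Hypothetical reference patterns, one per trained word.
templates = {
    "left": [0, 9, 0, 9],
    "stop": [9, 9, 0, 0],
}
best = recognize([1, 8, 1, 7], templates)
```

Here the sample deviates by 7 from "left" and by 115 from "stop", so "left" is chosen.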

But something like an FT or FFT seems to be even more promising. Any ideas for an FT or FFT with 10 (max 20) terms (coefficients, frequencies)...?
I'm not a programmer and not a mathematician, and I already googled a lot but didn't find something suitable.

Re: NXC: "Speech"-"Recognition"

Post by gloomyandy »

Hi Doc,
Sorry, I'm not sure how the algorithm worked. It was Roger's demo, so I'll drop him some mail to find out....

Andy

Re: NXC: "Speech"-"Recognition"

Post by HaWe »

new version with oscillograph (revised version) :)
mightor
Site Admin
Posts: 1079
Joined: 25 Sep 2010, 15:02
Location: Rotterdam, Netherlands

Re: NXC: "Speech"-"Recognition"

Post by mightor »

HaWe wrote: new version with oscillograph (revised version) :)

Are the graphs with or without a German accent? :)

This is pretty cool stuff.

- Xander
| My Blog: I'd Rather Be Building Robots (http://botbench.com)
| RobotC 3rd Party Driver Suite: (http://rdpartyrobotcdr.sourceforge.net)
| Some people, when confronted with a problem, think, "I know, I'll use threads,"
| and then two they hav erpoblesms. (@nedbat)

Re: NXC: "Speech"-"Recognition"

Post by HaWe »

accent?
what is "accent" ?
;)

Re: NXC: "Speech"-"Recognition"

Post by HaWe »

Hi,
what do you think: what would be the best way to transfer all those sound arrays as a file to the PC,
e.g. 10 samples of each of 6 spoken words = 60 arrays[400],
in order to process the data on an external computer (with Excel or an ANSI C++ program)?
I think a text file with all numbers separated by ";" would be OK.
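
On the PC side, reading such a file could look roughly like the Python sketch below. The layout is an assumption (one recording per line, values separated by ";"), and the function names are made up for the example; how the NXT writes and uploads the file is not covered.

```python
def parse_recordings(text):
    """Turn ';'-separated text (one recording per line) into lists of ints."""
    recordings = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(";")]
        values = [int(f) for f in fields if f]  # skip empty fields, e.g. after a trailing ';'
        if values:
            recordings.append(values)
    return recordings

def format_recordings(recordings):
    """Inverse operation: write each recording as one ';'-separated line."""
    return "\n".join(";".join(str(v) for v in rec) for rec in recordings) + "\n"

# Hypothetical file content with two short recordings.
text = "12;40;33;\n5;7;9\n"
recordings = parse_recordings(text)
```

A file in this layout should also open in a spreadsheet program if ";" is selected as the column separator on import.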