I want to write basic speech recognition software that converts speech to text. Which language is best suited to writing such software? Is Java suitable for the job?
Edit: Thank you all for the responses. I want to build a tool for a college project. I don't want to write it from scratch; I just want to demo the power of speech recognition. The tool should simply type whatever the user says into a text editor such as Notepad. It need not be very accurate. I want to experiment and learn the various algorithms behind speech recognition, as I find this field very interesting.
Thank you, Deepak
-
Java may be suited for an interface to it but speech recognition requires seriously raw grunt. I'd be choosing a compiled close-to-the-metal language like C for the actual recognition engine.
This is not something to be undertaken lightly, by the way. There's an awful lot of theory you'll need to learn even before you begin. Myself, I would license one of the existing engines if possible, and concentrate on building a decent product around it.
That's if your intent is to build a product. If you just want to experiment, by all means write your own. It'll be fun (up to a point :-).
MarkusQ : @Pax feel free to comment on my contrarian response to your answer.
Neil Coffey : Agree with the idea that licensing an existing engine is probably more productive. But you know that nowadays, Java is also effectively a natively "compiled" language, right? In any case, people have written engines in Python before now...
-
I think Java can be a good option; it all depends on how you will receive the input. There are some nice sound libraries for Java.
The language is not going to be the problem, because the real work is recognizing the patterns. If Java is the language you are most familiar with, I would use it.
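As a rough illustration of the input side, here is a minimal sketch that uses the standard `javax.sound.sampled` API to record from the default microphone and convert the raw bytes into samples. The class name `MicCaptureSketch`, the 16 kHz / 16-bit mono format, and the fixed recording length are my own assumptions for the example, not anything the question requires.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

// Hypothetical helper class for illustration only.
final class MicCaptureSketch {
    // Assumed format: 16 kHz, 16-bit, mono, signed, little-endian.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    /** Records roughly {@code seconds} of audio from the default microphone. */
    static byte[] record(int seconds) throws LineUnavailableException {
        TargetDataLine line = AudioSystem.getTargetDataLine(FORMAT);
        line.open(FORMAT);
        line.start();
        try {
            int total = (int) FORMAT.getSampleRate() * FORMAT.getFrameSize() * seconds;
            byte[] buffer = new byte[total];
            int filled = 0;
            while (filled < total) {
                // read() blocks until the requested data is available or the line stops.
                int n = line.read(buffer, filled, total - filled);
                if (n <= 0) {
                    break;
                }
                filled += n;
            }
            return buffer;
        } finally {
            line.stop();
            line.close();
        }
    }

    /** Converts 16-bit little-endian PCM bytes to samples in the range [-1, 1). */
    static double[] toSamples(byte[] pcm) {
        double[] samples = new double[pcm.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int lo = pcm[2 * i] & 0xFF;   // low byte, treated as unsigned
            int hi = pcm[2 * i + 1];      // high byte, keeps the sign
            samples[i] = ((hi << 8) | lo) / 32768.0;
        }
        return samples;
    }
}
```

Calling `MicCaptureSketch.toSamples(MicCaptureSketch.record(3))` would give about three seconds of samples to feed into whatever recognition code you experiment with. It needs a working microphone, so `record` will throw on a machine without one.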
-
I agree with almost everything Pax said, so I'm going to be contrarian and argue for the opposite. The conventional wisdom is that speech recognition "requires seriously raw grunt", and that may be because it is true.
But it also may be that everyone believes that because that's how it's always been done. Arguing from the fact that the human brain doesn't do huge amounts of brute force data churning to recognize speech, I would suggest that there exist clever feature extraction algorithms to do the job much more efficiently.
If that is the case, and if you seek to find such an algorithm, a higher-level language may be better suited to the task. Anything you lose in efficiency you'll more than make up in algorithmic expressiveness.
That said, he's probably right.
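To give a feel for what "feature extraction" means in code, here is a deliberately simple sketch that reduces each frame of audio to two numbers: its mean energy and its zero-crossing rate. This is a toy illustration, not one of the clever algorithms speculated about above and not what production engines use; the class name `FrameFeaturesSketch` and the non-overlapping framing are assumptions made for the example.

```java
// Hypothetical helper class for illustration only.
final class FrameFeaturesSketch {
    /**
     * Splits the samples into non-overlapping frames and returns, per frame,
     * {mean energy, zero-crossing rate}. Trailing samples that do not fill
     * a whole frame are ignored.
     */
    static double[][] extract(double[] samples, int frameSize) {
        if (frameSize < 2) {
            throw new IllegalArgumentException("frameSize must be at least 2");
        }
        int frames = samples.length / frameSize;
        double[][] features = new double[frames][2];
        for (int f = 0; f < frames; f++) {
            int start = f * frameSize;
            double energy = 0.0;
            int crossings = 0;
            for (int i = start; i < start + frameSize; i++) {
                energy += samples[i] * samples[i];
                // Count a crossing when the sign differs from the previous sample in this frame.
                if (i > start && (samples[i] >= 0) != (samples[i - 1] >= 0)) {
                    crossings++;
                }
            }
            features[f][0] = energy / frameSize;
            features[f][1] = (double) crossings / (frameSize - 1);
        }
        return features;
    }
}
```

Even this crude pair of numbers separates silence from loud, noisy frames, and the whole thing is a dozen lines in a high-level language, which is the kind of expressiveness being argued for.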
paxdiablo : I don't disagree with anything you state, MarkusQ. But the brain achieves its grunt with massive parallelism - we could try to spawn 100 billion threads in the JRE but I'm not sure how well it would go (yes, I know it's reductio ad absurdum :-). I do agree with your contention we should try.
MarkusQ : @Pax It's an interesting question. We certainly don't use 10^11 threads worth of neurons to do speech recognition; out of 10^11 neurons we probably use on the order of 10^9 for speech recognition at a transition rate of maybe 10^2/sec each; my quadcore desktop easily hits 10x that...
Jiri : I disagree. In the automatic speech recognition field, the best approaches are not too clever and depend on huge amounts of training data and computation. It is not because people like it that way, but because the results are better. But we are on the way, with Moore's law in our pocket :-)
MarkusQ : @Jiri -- You aren't disagreeing, you're making the same point. Even though it's fairly simple to show that it _could_ be done cheaply if you're clever (just compute the MIPS of brain hardware), we do it the expensive way because the results are better. Ergo, we aren't being clever _enough_.
-
I agree with Pax that this is potentially quite a big project, and that the most practical solution is probably to just licence an existing engine.
If the scope of what you want to do is just distinguish between a few previously known possible utterances, it's a significantly smaller project, but still considerable.
But... if you decide you really really really do want to start developing your own, I can't see a reason not to use Java. The idea that "C is faster" is largely a myth (or based on out-of-date information).
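For the small-vocabulary case mentioned above (telling apart a few previously known utterances), one classic approach is template matching with dynamic time warping (DTW): record one template per word, then pick the template closest to the new input. The sketch below is a minimal version under my own assumptions: each utterance is already reduced to a one-dimensional feature sequence (for example per-frame energy), and the names `DtwSketch`, `distance`, and `nearest` are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
final class DtwSketch {
    /** Classic O(n*m) DTW distance between two 1-D feature sequences. */
    static double distance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = Math.abs(a[i - 1] - b[j - 1]);
                // Extend the cheapest of: diagonal match, step in a only, step in b only.
                double best = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = d + best;
            }
        }
        return cost[n][m];
    }

    /** Returns the index of the template with the smallest DTW distance, or -1 if none. */
    static int nearest(double[] input, double[][] templates) {
        int bestIndex = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int k = 0; k < templates.length; k++) {
            double d = distance(input, templates[k]);
            if (d < bestDistance) {
                bestDistance = d;
                bestIndex = k;
            }
        }
        return bestIndex;
    }
}
```

DTW tolerates the same word being spoken faster or slower, which is why it suits this narrow problem. It does not scale to free dictation, which is where the "still considerable" part comes in.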
-
My students are using Sphinx. It is written in Java (a port from C++, I believe). It might not be suitable for what you want (I think you would need to create your own dictionary), but it is worth checking out.
-
Java is Turing complete, so in principle it can handle any programming job. Whether you want to do something in Java is entirely up to you.
-
We had moderate success with the Sphinx framework, written in Java, but the real hard work lies in understanding the algorithms and math involved in the area, and then in fine-tuning the engine to your particular needs.