
Voice Tagging Images

I find Microsoft Recite very elegant, and it presents an interesting fork in the many applications of “voice recognition.”

At its most basic level, voice recognition software analyzes speech and matches known words to their distinct sets of syllables. Voice tagging is essentially a less precise implementation of voice recognition. Recite doesn’t match sounds to words; it matches sets of sounds to other, larger sets of sounds that function like large tag clouds. The result is more available data for algorithms and thus more effective results.
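
To make the idea of matching sets of sounds against larger sets of sounds concrete, here is a minimal sketch in Python. It is not how Recite is implemented; it assumes each recording has already been reduced to a set of integer "sound tokens" (a hypothetical representation), and the names `overlap_score` and `rank_notes` are made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def overlap_score(query: Set[int], note: Set[int]) -> float:
    """Fraction of the query's sound tokens that also occur in the note."""
    if not query:
        return 0.0
    return len(query & note) / len(query)


def rank_notes(query: Set[int], notes: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Rank stored notes by how much of the query they contain, best first."""
    scored = [(name, overlap_score(query, tokens)) for name, tokens in notes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: each integer stands for one quantized sound fragment.
notes = {
    "beach_photo": {3, 7, 12, 18, 21, 40},
    "office_whiteboard": {5, 7, 9, 33, 34, 52},
}
query = {7, 12, 21}  # a short spoken search phrase

ranking = rank_notes(query, notes)
```

No word is ever recognized here: the short query is simply compared against each larger stored set, and the note sharing the most sounds ranks first.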

The reason I enjoy Recite's simple approach is that there are far more numerous and effective ways to communicate and store information than words. Pictures and other visualizations are among the most effective communicators, and contain a large amount of intrinsic data (say, a thousand words or so). Tagging images with voice (and written words) is advantageous because the tags are based on the image, so they are easy to generate and recall; additionally, the associated voice data is already summarized in the image itself.

For these reasons, I feel that voice tagging is an effective method of manipulating images and visualizations. However, it is only a valid solution when the cost and effort of creating voice tags are relatively low, or zero when tagging is part of a naturally occurring process, as it is with recorded notes in Recite.
