Wednesday, June 10, 2009


I get bored easily, especially when I have to do something that occupies my hands but not my mind -- washing dishes, folding laundry, driving, and mowing the lawn are all common cases. Sometimes I just want to reflect, or even just vegetate, but usually I want some kind of input to keep my mind occupied.

Therefore, it was a great boon to me when I discovered books on tape. Not only did they give me something to think about while performing manual tasks, they also gave me a chance to catch up on all the classics that I had shirked reading in college. Libraries usually have a good collection of classics, especially literature, so I got to read a lot of Charles Dickens, Jane Austen, and other famous novelists.

After 20 years, however, I'm starting to run out of material. There are plenty more books on tape, but they tend to be recent novels, mostly mysteries, that I don't care much for. If I want to read the classics, there are plenty of places where I can download them for free (Gutenberg, etc.). Wouldn't it be nice if I could convert these texts into audio?

Fortunately, there is a way, and it doesn't require buying any software. The Festival project at the University of Edinburgh is a complete text-to-speech (TTS) system. You can get it to read your computer's output, but you can also get it to read text files, including ebooks.

Ubuntu, the most popular Linux distribution, has finally provided an up-to-date package for Festival. In the past, I've had to compile it, which is difficult but not impossible. If you want to run it in Windows, you'll have to install a compatibility layer such as Cygwin or MinGW and compile it. There doesn't seem to be any danger that Ubuntu or other Linux distributions will fall behind in the future, because Festival hasn't released any new versions (including version 2.0, which is supposed to be almost done) for years. In fact, I'm a little concerned that the project is dead. Still, since the latest version is very good, I'll be happy as long as it remains available.

Festival commands seem designed to be as difficult as possible to figure out, but there are some good guides to using Festival online here, here, and at the main Festival page linked above. As with almost everything Linux, Festival is highly modularized -- you can use different backends, different interfaces, different voices, and different dictionaries. The most important change you'll want to make is to use the amazing CMU Arctic voices (instructions here), which are a quantum leap above the included voices. Do the CMU Arctic voices still sound robotic? I don't think you're going to mistake them for a human voice, but they are surprisingly pleasant. What amazes me is how much natural intonation the sentences have. You might expect the individual words to sound natural, since the program is based on a dictionary of words with specific pronunciation rules, but sentences would logically be harder to parse and translate into normal, flowing speech.

Festival comes with a utility called text2wave that converts a text file into an audio rendering. Of course, this will result in a huge file if you convert a whole book at once, and it may even run out of memory. On this site, you can find a little Perl script that will convert the text in little pieces and encode each piece to mp3 format, which is much smaller. Then all you have to do is copy the files to your mobile device and listen -- using a mobile broadcaster, if necessary.
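
If you'd rather not hunt down the Perl script, the same idea can be sketched in a few lines of Python. This is only my rough approximation of the approach, not the script mentioned above: it assumes that Festival's text2wave and the lame mp3 encoder are both installed and on your PATH, and the function names, the 2000-character chunk size, and the part0001-style file names are all my own inventions.

```python
import subprocess
from pathlib import Path


def split_into_chunks(text, max_chars=2000):
    """Group paragraphs into chunks of about max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut
    mid-sentence, so a chunk can exceed the limit in that case.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def convert_book(txt_path, out_dir, max_chars=2000):
    """Render each chunk to WAV with text2wave, then encode it to MP3 with lame."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    text = Path(txt_path).read_text(encoding="utf-8", errors="replace")
    for i, chunk in enumerate(split_into_chunks(text, max_chars), start=1):
        stem = out_dir / f"part{i:04d}"  # hypothetical naming scheme
        txt, wav, mp3 = (stem.with_suffix(s) for s in (".txt", ".wav", ".mp3"))
        txt.write_text(chunk, encoding="utf-8")
        # Assumes both commands are installed; check=True stops on the first failure.
        subprocess.run(["text2wave", str(txt), "-o", str(wav)], check=True)
        subprocess.run(["lame", "--quiet", str(wav), str(mp3)], check=True)
        # Remove the intermediate files so only the small mp3 remains.
        txt.unlink()
        wav.unlink()
```

Calling something like convert_book("great_expectations.txt", "audio") (a made-up file name) should leave a folder of numbered mp3 files, one per chunk. Splitting on blank lines keeps paragraphs intact, so the pauses between files fall in natural places.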
