The Businessmakers Radio Show

Featuring entrepreneurial resources & hundreds of interviews with make it happen entrepreneurs

Producing Speech for Those Who Can’t

Creating the most advanced synthesized voice using human speech.

Matthew Aylett

Summary:

Speech synthesis expert Dr. Matthew Aylett founded CereProc (Cerebral Processing), believing that speech technology could benefit from recent advances in speech synthesis. He was right. CereProc specializes in text-to-speech technologies and, since receiving a grant from the Scottish Government, has experienced double- and triple-digit year-on-year growth. Aylett believes that in today’s environment, producing speech synthesis is not merely a science, but that it is dynamic and is an art. This may sound like a technology story, but it is also a story of compassion.

Full Interview text

Russ: This is the BusinessMakers Show heard here and online at theBusinessMakers.com. And it's guest time on the show and I happen to be out on the road at the University of Edinburgh in Scotland and my guest is Dr. Matthew Aylett, cofounder and Chief Technology Officer of Cereproc. Matthew, welcome to The BusinessMakers Show.

Matthew: Hi Russ. Great to be here.

Russ: Why don't you start by telling our audience about Cereproc?

Matthew: So Cereproc is a company that makes speech synthesis. Speech synthesis is a technology which takes text that you might type into a computer and then turns it into speech, which hopefully sounds as close to a human's speech as possible.

Russ: Okay.

Matthew: So this has many different types of applications in many different types of technical areas and it has changed a lot over the years. Some time ago, people may be quite familiar with Stephen Hawking's voice -

Russ: Right.

Matthew: - which is a synthesized voice.

Russ: Right.

Matthew: His voice is actually from the 1980s and since then our synthesis technology has really improved a lot. It sounds much more natural and flowing. To a large extent, it's actually interesting - Stephen Hawking uses that voice because it's now his voice.

Russ: Right.

Matthew: And there's an element where the voice that we produce is actually really key to what we are as people. So producing speech synthesis is not just a science. To a certain extent it's also a bit of an art.

Russ: Okay. You've certainly been in the news quite a bit lately because of your new client, Mr. Roger Ebert.

Matthew: That's right. One of the things that we've done quite a lot of - because we were interested in trying to retain the character in the voice and to have the speech produce the character when it comes out of the computer, we've done a lot of what we might call voice cloning. So this is taking audio from someone and then turning it into a synthetic voice so you can type in anything you want. We did an example of this with George Bush, for example, using his presidential addresses.

Russ: Right.

Matthew: And Roger came across the George Bush site and also tested our synthesis out on our online demo on our website and he got in touch with us because Roger had a thyroid cancer operation several years ago and lost the power of speech and for Roger, who's a big communicator and broadcaster, this is a thing that he really wants to try and overcome.

Russ: Absolutely.

Matthew: But also he's in a special case of having a lot of broadcast material that he produced over the years. So we were given the commentaries that he made for DVDs and we were able to take that audio and produce a voice so that when he types something in, it sounds like Roger.

Russ: That is so cool. I know he recently was on Oprah and you might want to talk about what that did to your company. I understand you got a few more website hits, but for those in our audience who don't know who Roger Ebert is, he's the famous movie critic from up in Chicago and, as Matthew just explained, this actual surgery that he had a while back completely eliminated his capability to speak, correct?

Matthew: That's right, yeah.

Russ: So this is really, really interesting. When he first contacted the company, I mean, you're here in Scotland; did you guys even know who he was?

Matthew: No, I actually never heard of him. And it was only one of my engineers who said, "Well, I think Roger Ebert's quite well known because he mentioned us on his blog and our hits on our website suddenly went through the roof."

Russ: [Laughter]

Matthew: It was like, "Oh, right, okay. We better do a good job then, here, you know."

Russ: So this is real interesting technology. I mean you have all of this audio from him but it's probably not just the ability to go out and pick words, so you can put them together. I mean, the audio is probably captured in very different environments and him in a very different frame of mind and so forth. So how do you make that really sound like natural speech from him?

Matthew: That's a very good question because what you find is that every time someone produces a sentence, they produce it differently. Even the same sentence. The same person says the same sentence twice; they never say it in quite the same way.

Russ: That's correct.

Matthew: However the advantage we have is that there's only about 45 or so sounds in English. Each one of those sounds varies subtly and can be quite different in different contexts. So what we have to do is we have to take a lot of audio, cut it up into lots of these tiny little sounds and then when we produce a new sentence we have to cleverly select the right sounds that will go together to produce a sentence which sounds as natural as possible. Because there's only 45 sounds, we can build up words that he's never used -

Russ: Ah ha. Wow. Okay.

Matthew: - and produce them. And in fact in many cases it sounds absolutely verbatim as if he had produced them.

Russ: Wow.

Matthew: The difficulty with his data was of course it was recorded at different times in different studios with different microphones and when we put these little sounds together, you can have a sound which is, you know, from one place - maybe recorded two years earlier being put right next to a sound recorded years after that.

Russ: Right.

Matthew: So getting the sounds to sort of be as close as possible to each other is pretty tricky and that's one of the big challenges we had to face.
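[Illustrative sketch, not part of the interview: the selection step Matthew describes, choosing one recorded snippet per sound so that neighbouring snippets match as closely as possible, can be approximated with a small dynamic-programming search. Everything named here is an assumption made for illustration: the Unit fields, the session labels, the pitch values and the form of join_cost are hypothetical and are not CereProc's actual method or data.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str    # which of the roughly 45 English sounds this snippet holds
    session: str  # hypothetical label for the recording it was cut from
    pitch: float  # hypothetical average pitch of the snippet, in Hz


def join_cost(a: Unit, b: Unit) -> float:
    """Assumed penalty for playing b straight after a."""
    # Snippets from different recordings (studio, microphone, year) get an
    # extra fixed penalty on top of their pitch difference.
    session_penalty = 0.0 if a.session == b.session else 10.0
    return abs(a.pitch - b.pitch) + session_penalty


def select_units(phones, inventory):
    """Pick one unit per phone so that the summed join cost is minimal."""
    if not phones:
        return [], 0.0
    candidates = [[u for u in inventory if u.phone == p] for p in phones]
    if any(not c for c in candidates):
        raise ValueError("inventory lacks a required sound")

    # best[i][j] = (cheapest cost of a path ending at candidates[i][j],
    #               index of the predecessor in candidates[i - 1])
    best = [[(0.0, -1) for _ in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for unit in candidates[i]:
            row.append(min(
                (best[i - 1][k][0] + join_cost(prev, unit), k)
                for k, prev in enumerate(candidates[i - 1])
            ))
        best.append(row)

    # Trace back from the cheapest final unit to recover the whole sequence.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total


# Made-up inventory: the same three sounds cut from two recording sessions.
inventory = [
    Unit("k", "2005", 120.0), Unit("k", "2008", 100.0),
    Unit("ae", "2005", 118.0), Unit("ae", "2008", 140.0),
    Unit("t", "2005", 119.0), Unit("t", "2008", 101.0),
]
chosen, cost = select_units(["k", "ae", "t"], inventory)
```

[With this made-up inventory the search keeps all three snippets from the same session, because mixing sessions adds the mismatch penalty Matthew alludes to. A production system would also score how well each snippet fits its intended context, which this sketch leaves out.]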

Russ: But with Roger Ebert, who decides if it sounds correct or not?

Matthew: Well that's very interesting and especially - one of the big exciting things for us with this project was working with someone who really, you know, didn't have their voice and needed to replace it.

Russ: Right.

Matthew: Because they're one of the biggest critics of the technology.

Russ: Sure.

Matthew: And also for him, he's a famous communicator.

Russ: Right.

Matthew: So the way he says something is really important to him.

Russ: Sure.

Matthew: So part of the technology also allows him to change the way the synthesis comes out. So he can take a sentence. He can synthesize it. If he doesn't like the way it says it, he can change it.

Russ: Ah!

Matthew: He can synthesize it again, tweak a couple of words and get it to sound different; change the emphasis here, change the intonation here and so on.

Russ: So you even actually provide him with a control panel to change the sound?

Matthew: That's right. We give him an editor which allows him to try and control this, which is obviously maybe not so good if he just wants to say, "Can I have a cup of coffee," but on the other hand if he was trying to produce his blog, for example, audio version -

Russ: Right.

Matthew: - it allows him to really take a lot of trouble over it and make it sound the way he wants it to sound.

Russ: But in the case of Roger Ebert, he will be able to communicate and sound like himself but he has to do this always by carrying some sort of keyboard, a laptop around and actually keying in the sentence and words that he wants to say, correct?

Matthew: That's right. So at the moment it's still quite a slow process. I'm hoping that in the future, maybe, we will be able to increase that speed. But of course for Roger it's also very important because it's not just producing speech immediately. He also produces a blog and for example, the ability to produce a synthesized version of his blog and for him to actually tweak the synthesis and make it sound exactly the way he wants to is also very important.

Russ: Wow. And in this process, have people from Cereproc actually met Roger Ebert or has it just all been over the Internet sort of thing?

Matthew: It's all been completely remote.

Russ: And I would suspect that there might be other people like Roger Ebert that are really tuned into this and perhaps also might be tuned into Cereproc now as well.

Matthew: It's very interesting because as I was saying, since this can be used in lots of different sorts of areas and technologies - so for example, reading out websites, eyes free for example on mobiles is becoming a very important area for us.

Russ: Right, sure.

Matthew: We have had a lot of contact with people who are in the same position, who have - who are losing their voice or about to lose their voice - and we're hoping in the future to develop technology which allows us to produce synthetic voices with much less audio, because Roger's in a special case. Not everyone happens to have, you know, hours of broadcast material.

Russ: Correct. Well it's really interesting. I'm talking to Dr. Matthew Aylett, cofounder and CTO of Cereproc and you're listening to The BusinessMakers Show, heard here and online at theBusinessMakers.com.

[Aflac Commercial]

Russ: This is the BusinessMakers Show heard here and online at theBusinessMakers.com and continuing on with Dr. Matthew Aylett, CTO and cofounder of Cereproc. I suppose before, you know, Roger Ebert showed up as a client of Cereproc there were all sorts of applications that you guys focused on and thought about and continued to develop?

Matthew: So speech synthesis is interesting because it's a key technology. There are lots of things you just can't do without producing speech. If you want to communicate with someone and you can't give them text, you have to tell them.

Russ: Right.

Matthew: That's the only way to do it. So if you want a computer to communicate with someone and they can't read it, you have to use speech synthesis. So it's like a component technology. It's potentially available and required in lots of different applications. So we have customers that vary, for example, Scottish exams uses our technology to read exam papers out to dyslexic children -

Russ: Wow.

Matthew: - in Scotland, using a Scottish accent because that's more acceptable -

Russ: Right.

Matthew: - in Scotland.

Russ: Right.

Matthew: We're talking to car manufacturers that were interested in putting voices in cars which had more personality.

Russ: Right.

Matthew: Not so annoying, really -

Russ: Right.

Matthew: - and so sort of neutral sounding. We've worked with University of Southern California - I've been working on full-size avatars, or some, you know, digital pictures of people -

Russ: Right.

Matthew: - that can speak with -

Russ: Wow.

Matthew: - heads that move about and they're sort of almost trying to recreate the sort of a virtual person.

Russ: Right.

Matthew: The number of different applications are really immense and this has affected the way we set the company up and the way we've approached things as well, I think -

Russ: Right.

Matthew: - within Cereproc.

Russ: Golly, it just seems to me, though, Matthew, that using speech synthesis that sometimes you say a word in a nice sort of way and sometimes you say a word in an angry sort of way. So when you're building the database of these words to be able to produce an output, do you actually have recordings of both in a nice tone of voice and in an angry tone of voice for the same word?

Matthew: Yeah, so we don't need to use the same word but we do need to have a lot of the sounds available.

Russ: Ah!

Matthew: And what we do do is we don't do different emotions. We don't do angry and sad.

Russ: Right.

Matthew: But what we do do is, for example, stressed voice quality. When people speak with a stressed voice -

Russ: Okay.

Matthew: - they're normally a bit tense.

Russ: Right.

Matthew: Or, for example, you can speak with a breathy voice.

Russ: Okay.

Matthew: Because with a breathy voice, people think, "Hello, what's going on here?" Right?

Russ: Right.

Matthew: So these two sort of voice qualities have a big impact on people when they listen to them in synthesis.

Russ: So you would actually use the same recorded word and you would put noise around it that sounded breathy?

Matthew: We would actually record different bits of audio which have those voice qualities.

Russ: Okay.

Matthew: And then when we build the sentence out, we select loads of these little tiny sounds and put them together.

Russ: Right.

Matthew: If we wanted it to sound, say, angry, we might change it so that it's high-pitched, it's faster, it's louder and then select the stressed voice quality and put it in the back and the result of this is people listen to it and go, "Well, they sound a bit snippy -

Russ: Right.

Matthew: - you know, quite a bit cross."

Russ: Right.

Matthew: Right?

Russ: Right.

Matthew: And this variation is really important for us. It's one - it's really, I would guess, one of our unique points of the technology that we're producing.
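[Illustrative sketch, not part of the interview: the "angry" recipe Matthew gives (higher pitch, faster, louder, with snippets drawn from the stressed recordings) can be written as a small settings transform. The Rendering type, its field names and the scaling factors are hypothetical values chosen for illustration, not CereProc's API or tuning.]

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rendering:
    pitch_scale: float = 1.0        # 1.0 means the voice's normal pitch
    rate_scale: float = 1.0         # 1.0 means the normal speaking rate
    volume_scale: float = 1.0       # 1.0 means the normal loudness
    voice_quality: str = "neutral"  # which set of recorded snippets to draw from


def angry(base: Rendering) -> Rendering:
    """Return a copy of base that is higher, faster, louder and stressed."""
    return replace(
        base,
        pitch_scale=base.pitch_scale * 1.15,    # assumed factor
        rate_scale=base.rate_scale * 1.10,      # assumed factor
        volume_scale=base.volume_scale * 1.20,  # assumed factor
        voice_quality="stressed",
    )


neutral = Rendering()
cross = angry(neutral)
```

[The point of the sketch is that no separate "angry" recordings are needed: the effect comes from combining a recorded voice quality with changes to pitch, rate and loudness.]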

Russ: Wow. Wow. It kind of confirms that the human brain and voice and the way we do all that is a pretty sophisticated instrument, correct?

Matthew: Well yes, it's also maybe a change in the technology. We have synthesis, which has been very much concentrating on just getting information to people -

Russ: Right, right.

Matthew: - which is fine. I mean there's nothing wrong with trying to do that.

Russ: Right.

Matthew: But if you want to take it a step further and you want to interact with computers in a deeper way, then you have to produce speech which gives it more of a sense of character, more of a sense of this - there being something to interact with.

Russ: Right. That even brings me back again to Roger Ebert. When you're working on - as I understand that project is not complete yet, correct?

Matthew: No, that's correct. We've still got another three months or so to work on his voice and really get it up to scratch.

Russ: Obviously the market opportunities and the need for this sort of capability is extremely broad. How in the world do you, and the Cereproc management team, decide which direction and what does the future look like for Cereproc?

Matthew: Well when we set out, we decided we wanted to produce a component company. So we regarded speech technology as a key component, broad component, that we would sell in any sort of field which is maybe not as fashionable as going for a very high vertical, looking for a customer in a particular need.

Russ: Right.

Matthew: But key to us, because we saw controlling this technology and really developing it as really important for our overall objectives. So we've been an organically grown company. We have a lot of different customers and we - we're a profitable company. We're now beginning to sort of think, "Well, you know, some of those verticals look pretty attractive, actually," and given the potential for speech synthesis to be a real game-changer with various different elements, for example in social networking, within the web -

Russ: Right.

Matthew: - eyes free, mobile -

Russ: Right.

Matthew: - we are considering looking at a vertical and maybe in a year or so maybe getting more traditional investment funding and really sort of going for that vertical with our technology.

Russ: That's fantastic. I'm talking with Dr. Matthew Aylett, CTO of Cereproc and we'll be back with more with Matthew after this. You're listening to The BusinessMakers Show, heard here and online at theBusinessMakers.com.

[Aflac Commercial]

Russ: This is the BusinessMakers Show heard here and online at theBusinessMakers.com and my guest is Dr. Matthew Aylett, CTO of Cereproc. Is this where you envisioned your life's efforts ending up in some innovative company like this? Or did you end up here by a very circuitous path?

Matthew: It's actually been fairly circuitous. When I was a kid, I wanted to be an engineer. My dad was an engineer and I wanted to be an engineer. When I became an adolescent, I thought, "Well actually engineering isn't really that cool. You know, I want to be more romantic, sort of thing, more of a Renaissance man,"

Russ: Right.

Matthew: - things like this.

Russ: Right.

Matthew: Maybe do a university course where there were girls on it and things like this.

Russ: Absolutely.

Matthew: And then it was actually a lot later on in life when I realized that what I really got excited about was getting things to work and knowing how to get things to work. So I guess the first stage in ending up where I am is realizing that actually I did want to be an engineer and actually being an engineer was actually pretty cool -

Russ: Great.

Matthew: - when it comes down to it.

Russ: Great.

Matthew: I mean I did loads of things like I worked in publishing. I taught English as a foreign language in Mexico at one point. I did computer support. I then went back to university and got really interested in speech technology and it's very easy to be passionate about speech technology because it's so immediate and because speech is so concrete. Whenever we say anything, people have to understand it and whenever we want to produce words, we have to produce speech. It's almost part of our soul.

Russ: Absolutely. So is that what led you to the University of Edinburgh?

Matthew: That's right. It was then after finishing my Ph.D. here that I joined a start-up company and that process was very interesting for me. The start-up company itself, in the end, folded after about five years and was bought by another company.

Russ: And what was that start-up company?

Matthew: That was Rhetorical Systems.

Russ: Okay.

Matthew: There's nothing like being involved with something like that sort of thing. Well, you know, maybe I'd do a better job of it if I had a go, right?

Russ: Right.

Matthew: I mean, you know, maybe not -

Russ: Right.

Matthew: - but let's give it a go.

Russ: Right.

Matthew: So to a certain extent that kind of was the impetus for me thinking, "Well, okay, I know about speech synthesis. Let's try and set up a company and do it our way and see how we get on."

Russ: That's sort of the way that Cereproc got started?

Matthew: That's right, yeah.

Russ: That's cool and so how many team members are there in Cereproc today?

Matthew: We're quite a small company. We have about a half a dozen engineers and about half a dozen commercial people. Our commercial office is in London.

Russ: Okay.

Matthew: We keep the engineers up in Edinburgh -

Russ: Okay.

Matthew: - which is a beautiful city and -

Russ: Absolutely.

Matthew: - they love being up here.

Russ: Absolutely.

Matthew: I've got a couple of real key engineers that I've known for a long time that were absolutely fundamental for the success but also our CEO, Paul Welham, has got a lot of experience in the commercial environment and certainly with organic companies, having that mix of experience is really important because you know, you've really got to produce something very, very quickly and get it to customers very, very quickly so you've got money again to actually produce more.

Russ: Cool.

Matthew: And we were able to do that.

Russ: Fantastic. So I would assume that everybody within the company right now is excited about all of this publicity that you're getting on the Roger Ebert project.

Matthew: We are very excited about it. Well, you know, the British tend to be a bit deadpan.

Russ: Right.

Matthew: So, I think we were in the office and one of my engineers, Graham, said, "Oh, I think we're gonna be on Oprah," and we were going, "Oh, that's nice." You know.

[Laughter]

Matthew: And we're going, "Well I hope it all works because, you know, that's quite a big program, really," and so we kind of take it in our stride, but it's been a lot of fun, so far. Everyone's very excited. It's been very enjoyable, actually.

Russ: That's fantastic. Matthew, before I let you go, let's imagine that we have a young, aspiring inventor, innovator, engineer in the audience that's listening to your story right now and thinking, "My goodness, that's the coolest thing that I can possibly imagine." What sort of advice might you give to a young aspiring entrepreneur?

Matthew: I think it all varies in the way, or how you want to make your life. It's all different. For me, the key thing has been to really care about the technology that I'm producing and to, in a way, almost care about it almost more than the business part of it. You need to be part of a team and you need people to say, "Stop having fun with this stuff and do this because we've got customers to do it for," but if you don't have the passion and interest in the technology to start with, then in the end, it's always going to be very hard to produce something that people are really impressed by.

Russ: Thank you for sharing that with us.

Matthew: Thank you very much.

Russ: You bet. That's Dr. Matthew Aylett, CTO and cofounder of Cereproc and you're listening to The BusinessMakers Show, heard here and online at theBusinessMakers.com.
