
The tech behind Casio's new speech synthesis engine


vbdx66


Hi everyone,

 

Some geeky stuff here.

 

My good friend Paul J. Drongowski, a regular contributor to the PSR Tutorial forum dedicated to Yamaha keyboards, has published an interesting article on his blog that digs deeper into the tech behind Casio's new speech synthesis engine found on the CT-S1000V.

 

You can read the article here:

 

http://sandsoftwaresound.net/casio-speech-synthesis-technology/

 

Enjoy 😊

 

Vinciane


I had a hunch all those new patents were there for a reason. It's an interesting concept: what Yamaha's physical modeling was trying to do for acoustic musical instruments, but which never caught on too well. This is in a more easily accessible package, and definitely a different approach from anything else I've come across. I'm hesitant, though, waiting to see if something even more advanced might come out of this, more of a pro board designed like the PX-560 with weighted keys and a large color graphic screen: AiX and voice modeling combined with hex layers and huge multi-track recording ability, a real professional workstation utilizing this new concept of sound production.


Ok! Finally! We have a thread about the new Casio VOCOPOCALYPTIS keyboard! Translated: Vocals of the Apocalypse Keyboard. Lol! Well, I knew the synth tech was open source and computer based. I referred to the old 1990s software for Mac OS called Vocal Writer, which would take MIDI and text and convert both into a vocal singing score using the built-in text-to-speech synthesizer voices that come with Mac OS to this day. IF there had ever been a Windows version (there never was), you could have used ANY voice that was installed in Windows, including Microsoft Sam, lol! This reminded me of Vocaloid by Yamaha, but not as clear and intelligible nor as natural sounding as Yamaha's Vocaloid. Heck, it is not even as clear or as realistic as the voices you hear on your TomTom or Google or Alexa! FL Studio has a text-to-speech plugin for making your MIDI sessions in FL Studio sing too. But call it what you want, it is grrrrrreat! A text-to-speech system now turned into a text-to-MIDI/keys/samples artificial singing system in your keyboard. It is what a lot of us musicians who do not have a backing group wanted! Mind you, this is NOT a vocoder or a live mic input vocoding system, so it is not to be confused with the already common hardware and software vocoding systems out there. It is a speech synth in a music keyboard synth. Now, I use Linux as my only OS and know all about open source software being used in hardware: ROLAND, KORG, YAMAHA, et al. The fact that CASIO is using it and selling the hardware at a reasonable price point is the major thing here! We love you Casio!! Always have, always will!!


Yeah, open source: what Google stole from the Android/Linux developers, and has worked hard to make millions (billions?) from, trying to confiscate their open source code and make all of us pay them to use it. Like Bill Gates did when he paid peanuts for taking General Dynamics' GUI (I think that's who he bought it from) and calling it his. Am I angry? Yes I am, because I have to pay this middleman first, before I can even think about donating directly to one of the open-source developers instead, the people who made Android (Linux) possible. Oh well. By the way, pianokeyjoe, my favorite DAW (and I have most software DAWs, including Rosegarden and quite a few other Linux programs, which have come quite far) is still an old open-source program called "Jazzware". It looks curiously like the earliest versions of Cakewalk and was always free, and even after all these years, most days if I want a quick and dirty (or not so quick and dirty) DAW, there it is. It is not supported any more, but there is a repository for the Linux and Windows platforms. If you want a copy, I can upload it. It is not a command line program: it has a crude GUI, but almost every function is there.


5 hours ago, Jokeyman123 said:

Yeah, open source: what Google stole from the Android/Linux developers, and has worked hard to make millions (billions?) from, trying to confiscate their open source code and make all of us pay them to use it. Like Bill Gates did when he paid peanuts for taking General Dynamics' GUI (I think that's who he bought it from) and calling it his.

This is the essence of the form of society that we have been cultivating for some thousand years, and it is called success.
Did Edison invent the lightbulb? Much like Bell and the telephone. Were the Wrights really the first powered aviators, as we are taught by the Smithsonian? Probably.
Who cares, elbows are what count in this society.
And why? Because it is always rewarded. Heads up, the future is bright ;-)
Edited by Brockesound
typo

@Jokeyman123! YEEEAAAHH!! Jazz++! That was the one DAW I used on the SUSE LINUX 7.2 live demo DVD! I also used it in Windows 2000, but as I said in another thread about alternate OSs, Win2K was not very compatible with a lot of music software/hardware combos, so I had to wait till XP, which is the only version of Windows I will use for legacy stuff. Windows NT4 seems the only stable alternative from what I tried in VirtualPC 2007. I had lots of fun in NT4 with Sound Forge! Now, Google is not the only culprit. KORG, ROLAND, YAMAHA, and others are using open source software to make and sell keyboards for thousands of dollars, and they do not release or share the open source portions of the OS and apps with the open source community. I know because I tried to find such bits, and they have registration and password code locks on the software when you go to try and UPDATE your device. Sadly, it is the way of the world. Casio is using this open tech and I applaud them. Hopefully, they will not lock it down to where we the people cannot help in improving it, adding features, etc. But Casio, I have a feeling, is open to ideas from the users and customers, which is VERY unique in today's market! At least from my own limited experience.


On a side note... get it? Note? lol! About the new CT-S1000V and CT-S500V: I am sure I am not alone in this, but we the users want 5-pin MIDI DIN sockets on these 2 new models, to be able to control them from 88-note keyboards and keyboard controllers, or grand pianos with built-in MIDI (Disklavier, etc.). Or can Casio release a MIDI DIN version with added features such as all the classic Casiotone sounds, rhythms and drum sounds, and of course semi-weighted or fully weighted piano keys, in 73-, 76- and 88-note configurations? Trust and know, we the users have spoken throughout the years on this, and I know Casio is listening. Hence why I ask now.


22 hours ago, Jokeyman123 said:

I had a hunch all those new patents were there for a reason. It's an interesting concept: what Yamaha's physical modeling was trying to do for acoustic musical instruments, but which never caught on too well. This is in a more easily accessible package, and definitely a different approach from anything else I've come across. I'm hesitant, though, waiting to see if something even more advanced might come out of this, more of a pro board designed like the PX-560 with weighted keys and a large color graphic screen: AiX and voice modeling combined with hex layers and huge multi-track recording ability, a real professional workstation utilizing this new concept of sound production.

Yipper, it was a long time ago, hihihi. I almost forgot.

Yep, a long time ago. Now where did I put my pipe and tobacco?


On 1/23/2022 at 2:34 AM, pianokeyjoe said:

Ok! Finally! We have a thread about the new Casio VOCOPOCALYPTIS keyboard! Translated: Vocals of the Apocalypse Keyboard. Lol! Well, I knew the synth tech was open source and computer based. I referred to the old 1990s software for Mac OS called Vocal Writer, which would take MIDI and text and convert both into a vocal singing score using the built-in text-to-speech synthesizer voices that come with Mac OS to this day. IF there had ever been a Windows version (there never was), you could have used ANY voice that was installed in Windows, including Microsoft Sam, lol! This reminded me of Vocaloid by Yamaha, but not as clear and intelligible nor as natural sounding as Yamaha's Vocaloid. Heck, it is not even as clear or as realistic as the voices you hear on your TomTom or Google or Alexa!

 

For robot voices controlled by keys there is the payware softsynth Chipspeech, which contains plenty of remodelled versions of historical speech synthesis chips (listen to the YouTube examples).

 

https://en.wikipedia.org/wiki/Chipspeech

 

Alexa is not a fair comparison to any PC-based text-to-speech, because its cloud AI likely accesses many GB (or TB?) of online text examples dynamically to pronounce things right. I find classic speech synths made from a few KB of memory much more exciting.

 

Did you know that Casio already released a few speech synthesizer products in the 1980s? These included the talking clock calculator TA-1000 (female robot voice) and several talking alarm clocks, although AFAIK none of them became famous or were particularly successful. I experimented with the TA-1000 speech chip; judging by its behaviour, it is likely a crude kind of wavetable synthesis (data reduction by concatenating waveform samples), similar to that in early Sharp talking clocks and calculators but with a different voice.
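
To make the "data reduction by concatenating waveform samples" idea concrete, here is a minimal Python sketch. It only illustrates the general principle and is not the TA-1000's actual algorithm, which is not documented in this thread. The cycle table, the names `make_cycle`, `CYCLES` and `render`, and all numbers are made up for the example. A few short waveform cycles are stored once, and a much smaller "script" says which cycle to repeat and how often.

```python
import math


def make_cycle(harmonics, length=32):
    """Build one waveform period as a sum of sine harmonics.

    harmonics[0] is the amplitude of the fundamental, harmonics[1] of the
    second harmonic, and so on. (Hypothetical stand-in for ROM data.)
    """
    return [
        sum(amp * math.sin(2 * math.pi * (h + 1) * i / length)
            for h, amp in enumerate(harmonics))
        for i in range(length)
    ]


# Hypothetical table of stored single cycles (what would sit in the chip ROM).
CYCLES = {
    "a": make_cycle([1.0, 0.5, 0.3]),
    "o": make_cycle([1.0, 0.2]),
    "silence": [0.0] * 32,
}


def render(script):
    """Concatenate stored cycles according to (cycle_name, repeat_count) pairs."""
    out = []
    for name, repeats in script:
        out.extend(CYCLES[name] * repeats)
    return out


script = [("a", 10), ("o", 6), ("silence", 4)]
samples = render(script)

stored = sum(len(cycle) for cycle in CYCLES.values())
print(f"{stored} stored samples expand to {len(samples)} output samples")
```

The grainy character of such voices is plausible under this model, since every repeated cycle is identical and the transitions between cycles are abrupt.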

 

AFAIK a predecessor of Vocaloid did use synthesized speech before memory got cheap enough to use actual singers' phoneme samples.

Edited by CYBERYOGI =CO=Windler

@CYBERYOGI =CO=Windler WHAAAAAA!?? I actually did NOT know of Casio's early talking devices! I know, I know, we have Google now, how could I not know? Well, I did not. I knew only of TI and DEC and the C64, and of renditions of talking computers etc. in movies. I did know about the only talking toy of my youth, the SPEAK & SPELL!! I did see and buy speech synth chips made by GI from RadioShack when I was a teen living in Puerto Rico, but I did not know electronics yet and could only dream of what that chip did and said. I figured someday in the future I would learn electronic engineering and would be able to use the chips... nope. I lost them or broke them or fried them as a child trying to experiment, lol! But Casio? I never even heard of the music calculator or the VL-1 until I was a late teen and had moved to the mainland USA. Now, as for Alexa, I have actual text-to-speech VOICES from NeoSpeech, Acapela, IVONA, Loquendo, and Cepstral that sound very realistic, like Alexa, Google and TomTom GPS units. Hence my comparisons. The Casio voice engine is surely something similar, or derived from one of the open source engines like Festival, MBROLA, eSpeak, etc., but it sounds much more realistic than those, so I am baffled until I can buy one of these keyboards and really dissect it myself. About Vocaloid: YES, the first time I heard of that product, it was 100% purely SYNTHESIZED voices that sounded real! That is why I wanted Vocaloid back then! But I have to say that Casio may have a mixed tech of both sampling and synthetic voice tech, using the various open source software pointed out in this thread. Sadly, like some users here, I too wanted a more user-driven vocal input method for this keyboard, where my inputted voice could be changed to sound like someone ELSE entirely, or something else. Roland does this with ELASTIC AUDIO and VARIPHRASE technology. See the ROLAND VP-550/770 and V-SYNTH products, as well as their earlier product, the VP-9000 rack mount.
Essentially, these products use SAMPLES of vocals and resynthesize them to sound like someone else, and they can take LIVE vocal input from a mic-level or line-level input and change the incoming vocal signal into a realistic or robotic but highly intelligible singing voice! Casio SEEMS to have done something close to this, but with TEXT and MIDI to speech and, of course, with samples instead of a live audio input signal.


I have a huge collection of Speak & Spell samples, but in a proprietary sample format only playable in my Alesis Fusion. These cannot be reverse-engineered, nor would I try: the programmer who created this set, Steve Howell of Hollow Sun, passed away several years ago, the copyrights now belong to his wife, and I have huge respect for his work.

 

But it is pretty much fun to have a huge collection of Speak & Spell words mapped to each key, as well as some phrases which I can now mutate even further with formant and other filters internal to the Fusion. I also fondly remember the TS-12's vocaloid-like intro pronouncing "Ensoniq". I was never able to determine how this was done. It was not a sample, although the TS-12 could load and play samples; it was using the Ensoniq's internal sound engine and was somehow able to create this vocal pronunciation.


Speak&Spell hardware has been fully deciphered and is emulated in MAME (so you can record the WAV output). Various old synths and keyboards (including even a somewhat off-sounding Casio CTK-551) also exist in MAME now, which AFAIK even supports MIDI in (I haven't tried it).

 

If you want the real thing, there is even a midi kit for Speak&Spell:
https://hackaday.com/2012/02/09/midi-controlled-speak-and-spell

 

Sharp speech synthesis

 

AFAIK the Sharp speech engine has not been emulated yet.

 

I collect and have repaired many Sharp talking clocks, (rebranded) watches and calculators. The CT-660 exists in a German (CT-660G), an English (CT-660) and a Japanese language version. Strangely, on eBay the German version is much more common than the English one, while I never saw the Japanese model at all (see YouTube example; the front is labelled "ELSI QUARTZ" instead of "TALKING TIME"), so it may have been a prototype. I love the way the alarm starts with a 5-note jingle, then announces the time and then plays a longer squarewave melody ending with a trill (a bit like a ringtone). After 5 minutes of waiting, it repeats the alarm and says "please hurry" or something like that (German version: "Bitte beeilen!"). The German version rolls the "R" in a funny way and so e.g. pronounces 11 as "Errlf" instead of "Elf".

 

The Sharp CT-660 was the world's first digital talking alarm clock and initially very expensive (200 US$ or so). It has a volume knob, and some models have a little silicone rubber plug on the left side of the case. When poked out (be careful, the material rips apart easily) it reveals 2 pins wired in parallel with the speech button (keyboard matrix), so the clock could be installed in contraptions to automatically announce the time or act as a stop clock, or be connected to a bigger button for impaired people. (The yellow one on top is IMO too small for an alarm clock, and only switches to snooze instead of a proper "alarm off", so the alarm keeps repeating until you use the slide switch in the lid at the case bottom.)

 

The Sharp CT-661/665 has simpler functions (no stopwatch etc.), a different melody (a very shortened "Sah ein Knab' ein Röslein stehn", less nicely made), and its operation is optimized for the blind (i.e. different sounds guide through the clock set modes instead of all those slide switches). The 1980s "Vox Clock 2" contains the same hardware without the LCD and has a lovely male robot voice, which despite its graininess is easy to understand. It only has a little speech glitch that pronounces 12 as something like "thrown" or "throne" instead of "twelve" (as if the wavetable algorithm failed by saying "two" and "one" at the same time). The 1990s "Vox Clock 2" has an additional LCD but uses a sample-based chip with an English female voice and only 4 beeps instead of a melody.

 

The CT-660 hardware is quite complex, containing a clock CPU (32 kHz) on the front PCB, a separate speech CPU (4.1 MHz), a DAC (or sound?) chip and an audio amplifier. Most of the PCB is occupied by a fairly big discrete step-up converter that raises the 3 V battery voltage (2x AA cells) to about 5.5 V(?) for louder speech output.

 

I later bought a broken specimen that, instead of the external DAC chip, has a hybrid resistor ladder DAC with a thinner sounding voice. (Only the LCD worked; there was no speech and some buttons failed because traces were corroded by battery leakage and someone had drowned it in oil. Yuck!)

The Sharp EL-640 talking calculator even has 2 speech CPUs (or an additional ROM?) because it speaks more words (for clock and calculator), and it runs on 4 AA cells. The simpler EL-620 (no clock) is slimmer and so depends on 2 unusually thick button cells.

 

Sharp voice synthesizer devices use a grainy but nice sounding kind of wavetable speech hardware. Like the Speak&Spell, when crashed by a power glitch (battery wiggling) they make plenty of freakish noises. E.g. the chip can play back speech at half speed, or interpret the same data either as speech or as musical notes (squarewave with a linear decay envelope that varies with note length). Even the EL-620 does this, although it has no melody.
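
For anyone who wants to hear what "squarewave with a linear decay envelope varying with note length" amounts to, a rough Python sketch follows. It is an assumption-laden illustration, not a model of the Sharp chip: the function name `square_note`, the 8000 Hz sample rate and the exact envelope shape (a straight ramp from full level to zero over the whole note) are all choices made for the example.

```python
def square_note(freq_hz, duration_s, sample_rate=8000):
    """Render one square-wave note whose level ramps linearly down to zero.

    The ramp always spans the whole note, so a short note decays faster
    than a long one (the envelope "varies with note length").
    """
    n = round(duration_s * sample_rate)
    period = sample_rate / freq_hz  # samples per cycle, may be fractional
    out = []
    for i in range(n):
        level = 1.0 if (i % period) < period / 2 else -1.0
        envelope = 1.0 - i / n  # 1.0 at the start, almost 0.0 at the end
        out.append(level * envelope)
    return out


short_note = square_note(440, 0.1)
long_note = square_note(440, 0.25)
```

Writing the resulting lists to a WAV file (for instance with Python's standard `wave` module) would make the notes audible; that step is left out to keep the sketch short.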

Talking watches with Sharp speech chips (e.g. by Trafalgar, Omni VoiceMaster, Micronta VoxWatch, MeisterAnker) use similar technology with smaller COB chips. Loose solder joints at SMD parts and broken PCB traces are typical issues. Instead of lithium cells they unfortunately use strange thick alkaline button cells, which are prone to leak when forgotten inside.

 

I wrote more about the Sharp speech hardware here:
https://forums.bannister.org/ubbthreads.php?ubb=showflat&Number=120188#Post120188

 

https://forums.bannister.org/ubbthreads.php?ubb=showflat&Number=120195#Post120195

 

Despite the many talking clock brands, only a few speech engines exist. In the 1980s, besides Sharp only Seiko made their own speech synth chips. Seiko watches (WristTalk A964, A965, A966) use grainy lo-fi male samples and have a big COB containing 3 silicon dies, with user-selectable English and one other language. The Seiko world time clocks (World Time Voice Alarm DA716K) speak English and seem to use a grainy female wavetable voice. The pyramid alarm clocks (PyramidTalk) use a very different chip for each language and have a female voice. The English and Japanese versions have a calendar with date and day display (the LCD layouts differ). The German version has none and its user interface differs (hold the up/down buttons to set the time). 1990s Seiko watches have a higher resolution female sample voice.

 

Most other 1990s talking watches and alarm clocks use Holtek COB chips, which are sample based and can be recognized by the rooster (cock-a-doodle-doo) alarm (often also a cuckoo and some others) and a typically female sample voice.


Casio speech synth hardware

 

Casio also made only a few talking alarm clocks and calculators using its own SMD speech chips (grainy wavetable like Sharp, but with a female voice). I own the Casio SQ-200 (a cube-shaped LCD clock) and of course a TA-1000.

 

I experimented a lot with the Casio TA-1000 speech CPU "Hitachi HD61912 C02, 3M13" (60 pin SMD), which communicates with a main CPU that is likely a "NEC D1864G" variant (64 pin SMD) like in the Casio ML-81, ML-90 and such. I desoldered various pins and examined the behaviour. I also found test pins in Casio calculators and the VL-Tone, which display strange counting numbers on the LCD and output data (ROM contents?) on the keyboard matrix pins.

 

What I mean by saying that Alexa is not a fair comparison with offline speech synths is that these speech assistant cloud apps render everything online in datacenters fed with huge quantities of big data. So it would not surprise me if their neural network continuously compares spoken and written versions of the same texts (e.g. official TV news) from the internet to optimize pronunciation.

Edited by CYBERYOGI =CO=Windler

My first exposure to speech synthesis was a product called S.A.M. "Software Automated Mouth", which I ran on an Apple ][+. (Boy, am I dating myself now!). It consisted of a DAC board that went into one of the slots and, of course, a program for translating text into speech.

 

I don't have to provide an audio example here of what it sounded like because you already know. It sounded exactly like Stephen Hawking's voice and I mean EXACTLY. It was very likely based on the same voice-synthesis technology. 

 

Many years later I bought an Amiga 500 (A500). One of the things that came built in to it (and into all Amiga computers since the original A1000 AFAIK) was a text-to-speech synthesizer. It was again EXACTLY the same sound as S.A.M. IIRC, I had later read that it was licensed from the same developers. 

 

Of course, S.A.M. was not very good at singing. The new Casio vocal synthesis seems to be pretty good at it, in a charming robotic kind of way.


Stephen Hawking's speech software was much more complex (and drove a hardware speech chip). It was Hawking himself who programmed the parameter set to customize his remade voice. He was fairly possessive about this voice and ultimately did not want to replace it with more natural sounding newer speech synths because he considered it part of his cyborg personality. (Nowadays Lyrebird-style voice cloning technology would be used for such things.)

 

https://www.scienceabc.com/innovation/stephen-hawking-cheek-communication-help-computer-speech-generating-device.html

https://www.wired.com/2015/01/intel-gave-stephen-hawking-voice/

 

This one may be fake:

https://lingojam.com/StephenHawkingVoiceGenerator

 

SAM cannot even do pitch bends for pronunciation and sounds very husky. (I know the C64 version, which needs no additional hardware and also runs on emulators. AFAIK "Tales of Arabian Nights" uses it; listen to the YouTube examples.) I grew up with the Amiga (owning an A500 modded into a huge steampunk-like wooden desktop case). The Amiga speech is pure software (Narrator library) and can control pitch and tempo per syllable for pronunciation. AFAIK the source code of both has been found. Amiga speech can sing, but tended to make pauses in unwanted spots because the 68000 CPU was too slow to render a continuous wave output.

 

The commercial speech synth whose timbre IMO came closest to Hawking's voice is my Hexaglot EG-6000 (a talking organizer with optional translation cartridges). I also own a Hexaglot Square One with a similar voice (only through the headphone jack; the LCD is messed up by a broken foil cable). These also have a general pitch and tempo control.

Edited by CYBERYOGI =CO=Windler

I shouldn't have said "exactly" and certainly shouldn't have emphasized the word. Listening to some online videos just now, I can hear that S.A.M. on the Apple ][ sounds glitchier than I recalled. The Amiga's voice synthesis was considerably cleaner, maybe due at least in part to better DACs. The same technology was used in that famous first press demo of the original Macintosh computer by Steve Jobs, in which it introduced its "creator."

 

But there is definitely a strong family resemblance between all of those voices, including Hawking's specific voice, since they all came out of Klatt's speech-synthesis work based (fundamentally) on formant production. I am certainly not the first one to make that observation, even if as it turns out I did exaggerate a tad. :)
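
For readers wondering what "formant production" means in practice, here is a small Python sketch of the basic source-filter idea: a buzzy pulse train (the "glottis") is passed through a cascade of two-pole resonators, one per formant. The resonator difference equation is the standard one used in Klatt-style synthesizers, but everything else is an assumption for illustration: the function names, the 16 kHz sample rate, and the formant frequencies and bandwidths, which are only rough values for an "ah"-like vowel. A real Klatt synthesizer has many more parameters and sources.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole digital resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2].

    The coefficients are chosen so the gain at 0 Hz is exactly 1.
    """
    t = 1.0 / sample_rate
    c = -math.exp(-2 * math.pi * bandwidth_hz * t)
    b = 2 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(pitch_hz, duration_s, sample_rate):
    """Very crude voicing source: one unit pulse per pitch period."""
    period = round(sample_rate / pitch_hz)
    n = round(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def vowel(pitch_hz, formants, duration_s=0.3, sample_rate=16000):
    """Run the pulse train through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(pitch_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Assumed, illustrative formant values for an "ah"-like vowel.
AH = [(730, 130), (1090, 70), (2440, 160)]
samples = vowel(110, AH)
```

Changing `pitch_hz` while keeping the formants fixed changes the note but not the vowel, which is the basic reason a formant synthesizer can be made to "sing" at all.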


I always thought Hawking's voice was one of the Mac OS voices. At least I hear that same voice on Mac OS when you enable the text-to-speech talker for the notepad and other text software. And it comes in several voices too. It is fun to learn new things about the things we love! That's for sure! Of course, I could also be thinking of the Simpsons and Family Guy episodes with the Stephen Hawking character in them. Indeed they DID use the Mac OS voices for that.

 

Update! I just saw some videos of Hawking and his speaking machine! It's an INTEL PC with Windows 7! SwiftSpeak? is the app used. The more I seek this stuff out, thanks in part to you lovely bunch of users, the more I think I may be buying a new Casio soon!

Edited by pianokeyjoe
New info!

12 hours ago, pjd said:

After doing some reading this weekend, I posted some additional information on Casio's current technology...

The resynthesis approach resembles CELP but is likely more complex (the neural nets remind me of the Hartmann Neuron synth).

 

Did you know that Casio already made 2 singing keyboards decades ago? Namely, there was the Casio SK-60 (a toy sampling keyboard with lo-fi doo-wop samples etc.) and the CT-840, which can sing musical notes (Do, Re, Mi...) in a female lo-fi sampled voice.


1 hour ago, CYBERYOGI =CO=Windler said:

Did you know that Casio already made 2 singing keyboards decades ago? Namely, there was the Casio SK-60 (a toy sampling keyboard with lo-fi doo-wop samples etc.) and the CT-840, which can sing musical notes (Do, Re, Mi...) in a female lo-fi sampled voice.

The latter was probably used by Phil Glass for the production of « Einstein on the Beach » 🙃


14 hours ago, pjd said:

After doing some reading this weekend, I posted some additional information on Casio's current technology

Hi @pjd, this looks fascinating but way beyond my present capabilities. I’ll go back to your article when I have enough time and my brain is a little fresher.

Another, less challenging option to pass time during commercials, is washing the dishes. Unless  you’ve ordered a pizza or Chinese food of course 😋


The CT-840 is one of those later, not so realistic, more lo-fi PCM tone bank SA/CA series keyboards. I tended to pass on buying this class of Casio keyboard because the sounds to me were not so good for the price (unless the price was flea market prices), but mostly because I could not circuit bend them like the older 80s Casios, both synthetic and PCM based. I am now regretting that, lol! This thing looks funky, but actually seems to be one of the higher-end versions of the CA-100/SA-21 style tone bank keyboards. I heard some demos and I just heard the Casio SK-1 "piano" sound and other familiar OLD Casio favorites! Time to go Casio hunting...


Hi --

Yep, Casio has made some interesting keyboards over the years. I do miss hands-on circuit bending. 🙂

 

WRT re-synthesis, I'm still diving for details on the CT-S1000V's "vocalization model unit". It sure does seem similar to CELP. The deep neural net stuff is their way of encoding the phoneme-to-synthesis-parameter relationships, instead of using a large database to store the mappings. That tech feeds re-synthesis.
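
Since CELP keeps coming up, here is a toy Python sketch of its core loop for anyone unfamiliar with it: an excitation signal is picked from a codebook and pushed through an all-pole (LPC) synthesis filter, and the encoder chooses the entry whose filtered result best matches the target ("analysis by synthesis"). This is a heavily simplified illustration and says nothing about what Casio actually does. The codebook, the filter coefficients and the function names are invented for the example, and real CELP coders add gains, an adaptive (pitch) codebook and perceptual weighting.

```python
def lpc_synthesize(excitation, coeffs):
    """All-pole filter: y[n] = x[n] + sum_k coeffs[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                y += a * out[n - 1 - k]
        out.append(y)
    return out


def best_codebook_entry(target, codebook, coeffs):
    """Try every excitation and keep the one with the smallest squared error."""
    best_index, best_error = None, float("inf")
    for index, excitation in enumerate(codebook):
        candidate = lpc_synthesize(excitation, coeffs)
        error = sum((t - c) ** 2 for t, c in zip(target, candidate))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


# Hypothetical 3-entry excitation codebook and a stable 2nd-order filter.
CODEBOOK = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, -1, 0],
    [1, 0, 0, 0, -1, 0, 0, 0],
]
COEFFS = [0.9, -0.5]

# Pretend this frame of "speech" was produced by entry 2; the search finds it.
target = lpc_synthesize(CODEBOOK[2], COEFFS)
index, error = best_codebook_entry(target, CODEBOOK, COEFFS)
```

In this picture, only the codebook index and the filter coefficients need to be stored or transmitted per frame, and the filter coefficients are the kind of "synthesis parameters" that some other stage would have to supply.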

 

The Nagoya Institute of Technology Sinsy folks work neural nets into the synthesis stage, too: PeriodNet. I don't know if Casio skipped the PeriodNet technology in favor of something simpler or something already in hand.

 

(Hi) Vinciane. True, it doesn't take much to get into the technical weeds. 🙂 I added a few more words to my post, but it may not help much. I'm hoping to find interesting projects and applications... Reading and writing cuts down on playing time, unfortunately.

 

All the best to everyone -- pj
 

