==================
DOUBLETALK PC/LT USER'S MANUAL
==================

Copyright (C) 1991-1995 RC Systems, Inc. All rights reserved.
(Modified by Joe Weber 4/20/96)

RC Systems, Inc.
1609 England Ave.
Everett, WA 98203
Ph: (206) 355-3800
Fax: (206) 355-1098

( You can modify the speech characteristics of the Nomad Robot by
sending commands to the speech processor. Commands are preceded by a
Control-A (written ^A in this manual). This tells the processor not to
speak the following characters, but to interpret them as commands. For
example, sending "^A2V Hello. ^A9V Hello." will make the robot say
hello twice, the first time softly, the second loudly. Most commands
are of the form #A - a number followed by a letter. ^A is ASCII 01;
the BASIC examples later in this manual send it as CHR$(1). )

Command Descriptions

The following is a description of each of the software commands
supported by DoubleTalk.

nO (Voice)

DoubleTalk's text-to-speech synthesizer has five standard voices that
you can choose from, as well as a number of individual voice controls
(described below) that can be used to independently vary the voice
characteristics to your liking. The Voice command enables you to change
voices at any time, such as for identifying text attributes (italics,
bold, underline, etc.). The individual voices are selected with the
commands 0O through 4O, as shown in Table 1.

It is important to keep in mind that some programs, such as screen
readers for the blind, often send voice control commands (such as
pitch) to DoubleTalk. This can change the way a voice sounds, or even
completely negate any voice change you might make. If this happens with
your screen reader, try adjusting its pitch command to restore the
desired voice. Check your screen reader documentation for details.

     n     Voice Name
    ===========================
     0     Perfect Paul (default)
     1     Vader
     2     Big Bob
     3     Precise Pete
     4     Scary Larry

    Table 1. Voice Selections

nA (Articulation)

This command adjusts the synthesizer's articulation level, from 0A
through 9A.
Excessively low articulation values tend to make the speech sound
slurred; very high values, on the other hand, can make the speech sound
choppy. The default articulation is 5A.

nE (Intonation)

Intonation is the variation of pitch within a sentence or phrase. When
intonation is enabled, the synthesizer attempts to mimic the pitch
patterns of human speech. For example, when a sentence ends with a
period, the pitch drops at the end of the sentence. If a sentence ends
with a question mark and the sentence does not begin with "wh" (who,
what, where, etc.), the pitch rises - otherwise it falls, like a
period.

The optional parameter n determines the degree of intonation. 0E
provides no intonation (monotone), whereas 9E is very animated
sounding. 5E is the default setting. If the parameter is omitted, the
current value will be used. This is useful for re-enabling intonation
after a Monotone command. This command works in the Text, Character,
and Phoneme operating modes.

M (Monotone)

This command causes the synthesizer to speak in a monotone voice.
Intonation should be disabled whenever manual intonation is applied
using the Pitch command. Note that this command is equivalent to the 0E
command.

nF (Formant Frequency)

This command adjusts the synthesizer's overall frequency response
(vocal tract formant frequencies), over the range 0F through 9F. By
varying the frequency, speech quality can be fine-tuned or voice type
changed. The default frequency is 5F. This command has no effect on the
DoubleTalk LT.

nS (Speed)

The synthesizer's overall rate (speed) of speech can be adjusted with
this command, from 0S (slowest) through 9S (fastest). The default speed
is 5S.

nP (Pitch)

This command varies the synthesizer's pitch over a wide range, which
can be used to change the average pitch during speech, produce manual
intonation, or create sound effects. Pitch values can range from 0P
through 99P; the default is 50P.
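
As a rough illustration of how the Monotone, Pitch, and Intonation
commands above can be combined, the following Python sketch builds a
string that turns automatic intonation off and then sets the pitch by
hand before each word. The helper names (command, manual_intonation)
are hypothetical and not part of DoubleTalk, and the sketch assumes the
command character ^A is ASCII 01, as the CHR$(1) in this manual's BASIC
examples suggests. It only builds the string; writing it to the
synthesizer is left out.

```python
CTRL_A = "\x01"  # assumed command character (^A, ASCII 01)


def command(letter, n=None):
    """Build one command: ^A, an optional number, then the command letter."""
    return CTRL_A + ("" if n is None else str(n)) + letter


def manual_intonation(words_and_pitches):
    """Return text in which every word is preceded by its own Pitch command."""
    parts = [command("M")]  # Monotone: automatic intonation off
    for word, pitch in words_and_pitches:
        if not 0 <= pitch <= 99:  # Pitch values range from 0P through 99P
            raise ValueError(f"pitch {pitch} is outside 0-99")
        parts.append(command("P", pitch) + word)
    # Restore the default pitch (50P) and re-enable intonation with a bare E.
    parts.append(command("P", 50) + command("E"))
    return " ".join(parts)


text = manual_intonation([("Are", 45), ("you", 50), ("sure?", 65)])
print(repr(text))
```

The pitch is changed only between words here, so the same string
should be usable in Text mode; finer control within a word is the job
of Phoneme mode.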

nV (Volume)

This command controls the synthesizer's volume level, from 0V through
9V. 0V yields the lowest possible volume; maximum volume is attained at
9V. The default volume is 5V. The Volume command can be used to set a
new listening level or create emphasis in speech. PCM mode and the DTMF
generator are also affected by this command.

nX (Tone)

The synthesizer supports three tone settings, bass (0X), normal (1X),
and treble (2X), which work much like the bass and treble controls on a
stereo. The best setting to use depends on the speaker being used and
personal preference. Normal (1X) is the default setting.

nR (Reverb)

This command is used to add reverberation to the voice. 0R (the
default) introduces no reverb; increasing values of n correspondingly
increase the reverb delay and effect. 9R is the maximum setting.

nB (Punctuation Level)

Depending on the application, it may be desirable to limit the reading
of certain punctuation. For example, if the synthesizer is used to
proofread documents, the application may call for only unusual
punctuation to be read. On the other hand, an application which orally
echoes keyboard entries on a computer for a blind user may require that
all punctuation be spoken.

     n     Punctuation Level
    =================================
     0     All
     1     Most (all but CR, LF, Space)
     2     Some ($%&#@=+*^<>|\)
     3     None

    Table 2. Base Punctuation Levels

DoubleTalk supports four basic levels of punctuation, as shown in
Table 2. Besides determining which punctuation characters will be
spoken and which will not, the Punctuation Level command also
determines how number strings will be read. The values of n listed in
the table cause number strings to be read a digit at a time (e.g.,
0123 = "zero one two three"). Adding 4 to these values (n = 4-7) causes
number strings to be read as numbers (0123 = "one hundred twenty
three").
n = 6 and 7 also cause currency strings to be read as they are normally
spoken - for example, $11.95 is read as "eleven dollars and ninety five
cents." Finally, if 8 is added to n (n = 12-15), leading zeros will not
be suppressed (0123 = "zero one hundred twenty three"). The default
punctuation level is 6B (Some punctuation, Numbers mode, leading zero
suppression on).

nY (Timeout Delay)

The Text and Phoneme modes of the synthesizer defer translating the
contents of the input buffer until a CR or Null is received. This
ensures that text is spoken smoothly from word to word, and that the
proper intonation is given to the beginnings and endings of sentences.
If text is sent to the synthesizer without a CR or Null, it will remain
untranslated in the input buffer indefinitely. If it is expected that
this condition may occur, use the Timeout command.

DoubleTalk contains a programmable timer which can force the TTS
synthesizer to translate the buffer contents after a predetermined time
interval. The timer is enabled only if the Timeout parameter n is
non-zero, the synthesizer is not active (not talking), and the input
buffer contains no CR or Null characters. Any characters sent to
DoubleTalk before timeout will automatically restart the timer.

The Timeout parameter n specifies the number of 200 millisecond
(0.2 sec) periods in the delay time, which can range from 200
milliseconds to 3 seconds (Table 3). The default value is zero, which
disables the timer.

     n     Delay
    ==================================
     0     Indefinite (wait for CR/Null)
     1     200 milliseconds
     2     400 milliseconds
     .     .
     .     .
    15     3000 milliseconds (3 sec.)

    Table 3. Timeout Delays

L (Load Exceptions)

This command purges DoubleTalk's exception dictionary and stores
subsequent output from the host in the exception dictionary RAM.
Since the memory used by the exception dictionary is the same physical
RAM used by the input buffer, the space available for the input buffer
is decreased proportionally by the size of the dictionary. The
dictionary can be purged from DoubleTalk with the Reinitialize command,
or by loading a "null" dictionary file into DoubleTalk. Both methods
reallocate the memory space occupied by the dictionary to the input
buffer.

Exception files must be compiled into the internal format used by
DoubleTalk before they can be used. A compiler program is included on
the Developer's Tools disk for performing this task. The topic of
writing exception dictionaries is somewhat complex for the average
(sane) user, and is therefore left to the Developer's Tools.

U (Enable Exceptions)

The exception dictionary is enabled with this command. If the
synthesizer is not in the Text or Character modes, or if the exception
dictionary is empty, the command will have no effect. The exception
dictionary can be disabled by issuing one of the mode commands D, nT,
or nC.

@ (Reinitialize)

This command clears the input buffer (see Clear command) and restores
all of the speech parameters to their default settings. The exception
dictionary memory is also cleared and reallocated to the input buffer.

Z (Zap Commands)

This command prevents DoubleTalk from honoring subsequent commands,
enabling it to read commands as they are issued (this can be useful for
debugging some types of programs). Any pending commands in the input
buffer will still be honored. The only way to restore command
recognition after the Zap command has been issued is to send Control-^
(1Eh) or perform a hardware reset.

n* (DTMF Generator)

The DTMF (touch-tone) generator supports the 16 standard tone pairs
commonly used in telephone systems. Each tone pair generated by
DoubleTalk is 100 ms in duration, more than satisfying the telephone
signaling requirements.
There is also a 17th, "silent" tone, which can be used for generating
the inter-digit delay in phone number strings.

The mapping of the command parameter n to the buttons on a telephone is
shown in Table 4. The DTMF generator's output level can be adjusted
with the TTS synthesizer's Volume (nV) command. DTMF commands are
buffered, and can be intermixed with text for the TTS synthesizer
without restriction.

     n     Button
    ================
     0     0
     .     .
     .     .
     9     9
    10     *
    11     #
    12     A
    13     B
    14     C
    15     D
    16     pause

    Table 4. DTMF Generator

J/nJ (Tone Generators)

DoubleTalk's tone generators are activated with these commands. The
tone generators are treated separately in the Developer's Tools.

nT (Text Mode/Delay)

This command places DoubleTalk in the Text operating mode. The optional
delay parameter n can be used to create a variable pause between words.
The shortest, and default delay of 0, is used for normal speech. For
users not accustomed to synthetic speech, the synthesizer's
intelligibility may be improved by using a longer delay. The longest
delay that can be specified is 15. If the delay parameter is omitted,
the current value will be used. This feature is useful for returning
from another operating mode or disabling the exception dictionary (see
Enable Exceptions command).

nC (Character Mode/Delay)

This command puts DoubleTalk in the Character operating mode. The
optional delay parameter n specifies how long the synthesizer will
pause between characters. Values between 0 (the default) and 15 provide
pauses from shortest to longest, respectively. Values between 16 and 31
provide the same range of pauses, but control characters will not be
spoken. If the delay parameter is omitted, the current value will be
used.

D (Phoneme Mode)

This command disables the text-to-speech translator, allowing the
synthesizer's phonemes to be accessed directly. Table 5 lists the
phonemes that can be produced by DoubleTalk.
When concatenating two or more phonemes, each phoneme must be delimited
by a space. For example, the word "computer" would be represented
phonetically as

    K AX M P YY UW DX ER.

Phoneme attribute tokens

Table 6 lists the voice attribute tokens that can be used in the
Phoneme mode, in addition to the standard DoubleTalk commands. These
tokens do not require the command character or any parameters.

As indicated in the table, the / and \ tokens temporarily increase and
decrease the pitch by m steps. Besides being temporary, the difference
between the pitch tokens and the +mP and -mP command equivalents is
that the effective pitch range is extended beyond the normal 0-99 range
by approximately ±20 steps, and if the pitch should fall out of range,
it will simply bottom or top out, instead of wrap around. All other
phoneme attribute token commands remain in effect until explicitly
changed.

    Phoneme   Example               Phoneme   Example
    Symbol    Word                  Symbol    Word
    ======================================================
    A         das (Spanish)         M         mug
    AA        father                N         new
    AE        bat                   NX        rung
    AH        cut                   NY        niño (Spanish)
    AO        lawn                  O         no (Spanish)
    AW        cow                   OW        tone
    AX        about                 OY        boy
    AY        kite                  P         past
    B         bird                  PX        spot
    CH        cheese                R         ring
    D         dare                  RR        tres (Spanish)
    DH        either                S         some
    DX        computer              SH        dish
    E         ser (Spanish)         T         tip
    EH        set                   TH        thick
    EI        mesa (Spanish)        TX        mistake
    ER        were                  U         uno (Spanish)
    EW        acteur (French)       UH        pull
    EY        bake                  UW        tool
    F         fact                  V         give
    G         give                  W         went
    H         hire                  WH        when
    I         libro (Spanish)       Y         mayo (Spanish)
    IH        sit                   YY        you
    IX        relative              Z         zero
    IY        meet                  ZH        leisure
    JH        jet                   space     short pause *
    K         cute                  ,         medium pause
    KX        ski                   .         long pause
    L         long

    Table 5. Synthesizer Phonemes

    * Normally used between words; duration determined by nT command

    Symbol    Function
    ==================================
    nn        Set pitch to 'nn' (0-99)
    /         Increase pitch m steps *
    \         Decrease pitch m steps *
    +         Increase speed 1 step
    -         Decrease speed 1 step
    >         Increase volume 1 step
    <         Decrease volume 1 step

    Table 6. Phoneme Attribute Tokens

    * Step size determined by nE command; m = 2n

Applications of Phoneme mode

Phoneme mode is useful for creating customized speech, when the normal
text-to-speech (Text) mode is inappropriate for producing the desired
voice effect. For example, Phoneme mode should be used when it is
important that the correct stress or emphasis be placed on specific
words in a phrase. This is because Phoneme mode allows voice attributes
to be modified on phoneme boundaries within each word, whereas Text
mode allows changes only at word boundaries. This is illustrated in the
following program examples.

    100 A$ = CHR$(1)
    105 LPRINT A$;"D";A$;"M"
    110 LPRINT "70H AW -/D>/EH R +<\\YY UW S P\IY K T UW \M IY DH AE T -\W EY .+/"

Note in line 105 that intonation is disabled, since the pitch
variations due to the internal intonation algorithms would otherwise
interfere with the pitch tokens. Compare this with the same phrase
produced in Text mode with intonation enabled:

    100 A$ = CHR$(1)
    105 LPRINT A$;"T";A$;"E"
    110 LPRINT "How dare you speak to me that way!"

Phoneme mode is also useful in applications which provide their own
text-to-phoneme translation, such as the front end of a text-to-speech
system.

nQ (Sleep Mode)

This command places the DoubleTalk LT in a nearly powered-down state,
in order to help conserve battery power (the command has no effect on
the DoubleTalk PC). If you tend to forget to turn off your DoubleTalk
at the end of the day or during lunch breaks, for example, the Sleep
mode timer can be used to turn it off automatically. (DoubleTalk
doesn't actually turn completely off - it enters a low-power state
which consumes about one-tenth the power it would otherwise.) An
audible reminder tone can even be programmed to sound every ten
minutes, to remind you that you have left DoubleTalk on.

The sleep timer is reset any time DoubleTalk is accessed from your
computer (such as when reading).
In this way, DoubleTalk will not shut itself off during normal use, as
long as the programmed timer interval is longer than the maximum time
DoubleTalk is inactive. The sleep timer is also disabled when
DoubleTalk is running from the AC adapter, i.e., the timer runs only
when operating from DoubleTalk's internal battery.

Once DoubleTalk has entered Sleep mode, it can be woken only by turning
the power off and back on. The serial port control signal DTR is forced
to its "not ready" state when DoubleTalk is asleep, preventing
application programs from sending DoubleTalk any more data. Just before
going to sleep, DoubleTalk emits the ASCII character "S" from the
serial port, which the host computer can use to detect the sleep state.

The command parameter n determines when Sleep mode will be entered. You
can place DoubleTalk in Sleep mode immediately, program the sleep timer
to any of 15 ten-minute intervals (10 to 150 minutes), or disable Sleep
mode altogether. Table 7 summarizes the Sleep mode command.

     n     Delay
    ==============================
     0     Sleep timer disabled
     1     10 min
     2     20 min
     .     .
     .     .
    15     150 min
    16     0 (immediate)
    17     10 min w/reminder
    18     20 min w/reminder
     .     .
     .     .
    31     150 min w/reminder

    Table 7. Sleep Mode Timer

Note that the delay interval is simply n x 10 minutes for 0 < n < 16.
Adding 16 to n (16 < n < 32) yields the same interval range, but also
enables the reminder tone, which sounds at the end of each ten minute
interval. Programming n = 0 disables the Sleep mode; setting n = 16
forces DoubleTalk to go to sleep as soon as it has stopped speaking
(even when running from the AC adapter). Delay 22 (60 minutes with the
reminder tone) is the default setting.

#/n# (PCM Mode)

These commands activate DoubleTalk's PCM modes. This is an advanced
topic discussed in the Developer's Tools.

nG (Protocol Options)

This command controls various protocol options in DoubleTalk. Refer to
the Developer's Tools for more information.
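
Returning briefly to the Sleep mode command: the mapping in Table 7
between the parameter n and the timer behaviour can be sketched in
Python as follows. The function names (sleep_parameter, describe) are
hypothetical and not part of DoubleTalk; the arithmetic simply restates
the "n x 10 minutes" and "add 16 for the reminder tone" rules quoted
with Table 7.

```python
def sleep_parameter(minutes, reminder=False):
    """Return n for the nQ command (hypothetical helper).

    minutes=None disables the sleep timer, minutes=0 means sleep
    immediately, and 10..150 in steps of 10 selects a timer interval.
    """
    if minutes is None:
        return 0                      # sleep timer disabled
    if minutes == 0:
        return 16                     # immediate; Table 7 has no reminder variant
    if minutes % 10 != 0 or not 10 <= minutes <= 150:
        raise ValueError("interval must be 10-150 minutes in steps of 10")
    n = minutes // 10                 # interval is n x 10 minutes
    return n + 16 if reminder else n  # adding 16 enables the reminder tone


def describe(n):
    """Decode n into (minutes, reminder); minutes is None when disabled."""
    if not 0 <= n <= 31:
        raise ValueError("n must be 0-31")
    if n == 0:
        return (None, False)
    if n == 16:
        return (0, False)
    if n < 16:
        return (n * 10, False)
    return ((n - 16) * 10, True)


print(sleep_parameter(60, reminder=True))  # 22
print(describe(22))                        # (60, True)
```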

nI (Index Marker)

Index markers are non-speaking "bookmarks" that a program can use to
keep track of exactly where DoubleTalk is speaking within a passage of
text. Since this is a command only programmers would use, it is covered
in the Developer's Tools.

? (Interrogate)

This command enables a program to read DoubleTalk's current settings.
Yep, you guessed it - you're gonna need the Developer's Tools to learn
more about this one, too.

Control-X (Clear)

The Clear command stops the synthesizer and clears the input buffer of
all text and commands. None of the synthesizer settings are affected,
but any untranslated commands will be ignored.

Note that the format of this command is unique in that the command
character (Control-A) is not used with it. The Control-X (18h)
character is written directly to DoubleTalk's I/O port, which enables
DoubleTalk to react immediately, even if its input buffer is full. To
be most effective, the states of DoubleTalk's handshaking signals
should be ignored when writing the Control-X character.

Command Summary

Table 8 is a summary of the commands supported by DoubleTalk's
text-to-speech synthesizer.

    Command   Function                       Range    Default
    ============================================================
    nA        Articulation                   0-9      5
    nB        Punctuation level              0-15     6
    nC        Character mode/delay           0-31     0
    D         Phoneme mode                   -        -
    nE        Intonation                     0-9      5
    nF        Formant frequency (PC only)    0-9      5
    nG        Protocol options               0-31     2
    nI        Index marker                   0-99     -
    J         Tone generators                -        -
    nJ        Precision tone generators      0-99     -
    L         Load exception dictionary      -        -
    M         Monotone                       -        -
    nO        Voice                          0-4      0
    nP        Pitch                          0-99     50
    nQ        Sleep mode (LT only)           0-31     6
    nR        Reverb                         0-9      0
    nS        Speed                          0-9      5
    nT        Text mode/delay                0-15     0
    U         Enable exception dictionary    -        -
    nV        Volume                         0-9      5
    nX        Tone                           0-2      1
    nY        Timeout delay                  0-15     0
    Z         Zap commands                   -        -
    @         Reinitialize                   -        -
    ?         Interrogate                    -        -
    n*        DTMF generator                 0-16     -
    #         Non-buffered PCM mode          -        -
    n#        Buffered PCM mode              0-99     -

    Table 8. Command Summary
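
A program that generates commands may want to check parameters against
the ranges in Table 8 before sending them. The Python sketch below is
one possible way to do that, under stated assumptions: the names
(RANGES, OPTIONAL, build) are hypothetical, the command character is
taken to be ASCII 01, and the J and # commands are left out because
they are documented only in the Developer's Tools. The sketch builds
strings and does not talk to the hardware.

```python
CTRL_A = "\x01"  # assumed command character (^A, ASCII 01)
CLEAR = "\x18"   # Control-X (18h), sent on its own without ^A

# (low, high) ranges copied from Table 8; None marks a command with no parameter.
RANGES = {
    "A": (0, 9), "B": (0, 15), "C": (0, 31), "D": None, "E": (0, 9),
    "F": (0, 9), "G": (0, 31), "I": (0, 99), "L": None, "M": None,
    "O": (0, 4), "P": (0, 99), "Q": (0, 31), "R": (0, 9), "S": (0, 9),
    "T": (0, 15), "U": None, "V": (0, 9), "X": (0, 2), "Y": (0, 15),
    "Z": None, "@": None, "?": None, "*": (0, 16),
}

# Commands whose parameter may be omitted according to their descriptions.
OPTIONAL = {"E", "T", "C"}


def build(letter, n=None):
    """Return the command string for `letter`, checking n against Table 8."""
    if letter not in RANGES:
        raise ValueError(f"unknown command {letter!r}")
    limits = RANGES[letter]
    if limits is None:
        if n is not None:
            raise ValueError(f"{letter} takes no parameter")
        return CTRL_A + letter
    if n is None:
        if letter in OPTIONAL:
            return CTRL_A + letter
        raise ValueError(f"{letter} requires a parameter")
    low, high = limits
    if not low <= n <= high:
        raise ValueError(f"{n}{letter} is outside {low}-{high}")
    return f"{CTRL_A}{n}{letter}"


# Loud "Hello." followed by a CR so the buffer is translated.
print(repr(build("V", 9) + "Hello.\r"))
```

Keeping the ranges in one table means a change to a limit is made in a
single place, and the same table could drive a help screen or a
settings dialog.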