WP8 - Text-to-Speech (TTS): Letting Your App Read Content Aloud


Speech features

After the posts on Voice commands and Speech recognition, this one covers Text-to-Speech, which is a little simpler than either of those.

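The idea is to have the speech system read specified text from inside your app. The Windows.Phone.Speech.Synthesis API creates synthesized speech,

also called text-to-speech (TTS), which an app can use to prompt the user for input, read out message content, read the current search results, and so on.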

     

The steps and the key classes are described below:

 

(1) Declare the required capabilities

      To build an app that supports Text-to-Speech, add ID_CAP_SPEECH_RECOGNITION to the app manifest (WMAppManifest.xml).

 

 

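(2) A basic TTS sample

      The quickest way to get TTS working is to call SpeechSynthesizer.SpeakTextAsync() and pass it a plain-text string.

The text is pronounced in the language selected under Settings > speech > Speech language. For example, with Chinese selected, the engine reads the English words first and then

says the "15" in Chinese (十五).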

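// Requires: using Windows.Phone.Speech.Synthesis;
private async void ButtonSimpleTTS_Click(object sender, RoutedEventArgs e)
{
  SpeechSynthesizer synth = new SpeechSynthesizer();

  // Spoken with the voice of the current speech language (Settings > speech).
  await synth.SpeakTextAsync("You have a meeting with Peter in 15 minutes.");
}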

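       SpeakTextAsync() is normally called with the await modifier so that the text is read asynchronously. SpeakTextAsync() calls into the system

Speech System, and the asynchronous pattern lets the app keep handling other work while the speech plays.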

 

 

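(3) Choosing the voice to speak with

       WP8 includes voices for many countries. Each voice (which generates the synthesized speech) is tied to one language, and the available voices vary with Settings > system > language+region.

How do you specify the voice's language in code?

‧ After you create a Windows.Phone.Speech.Synthesis.SpeechSynthesizer, you can specify the language of the voice to load.

‧ The SpeechSynthesizer object can load any voice installed on the phone and use it to generate speech.

‧ If no language is specified, the API loads the language set in Settings > speech as the default.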

 

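How to find the voice you need in code

‧ Use the collection of Windows.Phone.Speech.Synthesis.VoiceInformation objects and their Language property, with a LINQ query for the required language, to get the matching voices;

    -> Note that these are only the voices already installed on the device; if none matches, prompt the user to install one;

‧ Call SpeechSynthesizer.SetVoice(VoiceInformation) to specify which voice to load;

    -> When choosing the VoiceInformation, note that the LINQ query may return

        (1) only a female voice or only a male voice;

        (2) both a female and a male voice;

        This happens because each installed language has either two voices (male and female) or only one, so you pick the female or the male voice

        by index.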

 

The following example, adapted from <Text-to-speech (TTS) for Windows Phone>, illustrates this:

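// Requires: using System.Collections.Generic; using System.Linq;
//           using System.Windows; using Windows.Phone.Speech.Synthesis;
// Declare the SpeechSynthesizer object at the class level.
SpeechSynthesizer synth;

// Handle the button click event.
private async void SpeakFrench_Click_1(object sender, RoutedEventArgs e)
{
  // Initialize the SpeechSynthesizer object.
  synth = new SpeechSynthesizer();

  // Query for a voice that speaks French.
  IEnumerable<VoiceInformation> frenchVoices = from voice in InstalledVoices.All
                     where voice.Language == "fr-FR"
                     select voice;

  // ElementAt(0) throws when no French voice is installed,
  // so take the first match (or null) and handle the missing case.
  VoiceInformation frenchVoice = frenchVoices.FirstOrDefault();
  if (frenchVoice == null)
  {
    MessageBox.Show("Please install the French (fr-FR) speech language in Settings > speech.");
    return;
  }

  // Set the voice as identified by the query.
  synth.SetVoice(frenchVoice);

  // Count in French.
  await synth.SpeakTextAsync("un, deux, trois, quatre");
}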

        The code creates a SpeechSynthesizer, runs a LINQ query over InstalledVoices.All (the voices currently installed on the device), and passes the voice it finds to the synthesizer.
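Picking the female or male voice by a fixed index breaks when a language ships with only one voice. Below is a minimal sketch, not the official API, of a lookup that prefers a gender and falls back to any voice of that language. VoiceCandidate and CandidateGender are hypothetical stand-ins for VoiceInformation and VoiceGender, because the real classes only exist on the phone. What carries over to the device is the shape of the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for VoiceGender / VoiceInformation so the selection
// logic can run outside Windows Phone; only the queried properties exist here.
public enum CandidateGender { Male, Female }

public sealed class VoiceCandidate
{
    public string Language { get; private set; }
    public CandidateGender Gender { get; private set; }
    public string DisplayName { get; private set; }

    public VoiceCandidate(string language, CandidateGender gender, string displayName)
    {
        Language = language;
        Gender = gender;
        DisplayName = displayName;
    }
}

public static class VoicePicker
{
    // Picks an installed voice for `language`, preferring the `preferred` gender.
    // Falls back to any voice in that language (some languages ship only one),
    // and returns null when none is installed so the caller can prompt the user.
    public static VoiceCandidate Pick(IEnumerable<VoiceCandidate> installed,
                                      string language, CandidateGender preferred)
    {
        List<VoiceCandidate> matches = installed
            .Where(v => string.Equals(v.Language, language, StringComparison.OrdinalIgnoreCase))
            .ToList();

        VoiceCandidate sameGender = matches.FirstOrDefault(v => v.Gender == preferred);
        return sameGender ?? matches.FirstOrDefault();
    }
}
```

On a device, the same query would run over InstalledVoices.All and compare voice.Gender with VoiceGender.Female or VoiceGender.Male. A non-null result goes to synth.SetVoice(...), and a null result is the cue to ask the user to install the language.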

You can also use Speech Synthesis Markup Language (SSML) to specify a voice for the language you need;

see <Speech Synthesis Markup Language Reference>.

       

 

That covers the basic idea of implementing TTS. The main classes and methods are described next:

 

Windows.Phone.Speech.Synthesis

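    This namespace contains the classes for starting and configuring the speech synthesis engine, creating voice prompts, responding to events, and modifying voice characteristics.

SpeechSynthesizer manages the connection to, and the functionality of, the speech synthesis engine, and can read content aloud with a voice for a specified language;

PromptBuilder appends content for the speech synthesis engine from text, SSML markup, or prerecorded audio files (note that PromptBuilder belongs to the desktop System.Speech.Synthesis API and is not part of the Windows Phone 8 namespace);

There are other related classes as well. The ones used on WP8 are described below: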

 

(a) SpeechSynthesizer

      The class that does the text-to-speech (TTS) work. Its key events and methods:

Type | Name | Description
Event | BookmarkReached | An event that fires when a <mark> element is reached in a Speech Synthesis Markup Language (SSML) file.
Event | SpeechStarted | An event that fires when the synthesized voice begins output.
Method | CancelAll | Cancels all asynchronous text-to-speech calls that are in the active queue.
Method | Close | Performs application-defined tasks associated with freeing, releasing, or resetting allocated resources.
Method | SetVoice | Sets the synthesized voice.
Method | GetVoice | Gets the active synthesized voice.
Method | SpeakSsmlAsync(String) | Asynchronously speaks a string of text with Speech Synthesis Markup Language (SSML) markup with a text-to-speech voice.
Method | SpeakSsmlFromUriAsync(Uri) | Asynchronously speaks the content of a standalone Speech Synthesis Markup Language (SSML) document with a text-to-speech voice.
Method | SpeakTextAsync(String) | Asynchronously speaks the content of a plain-text string.

     The synthesis API offers the three Speak methods above for starting speech output: one for plain text, one for text with SSML markup, and one for a complete SSML document;

 

 

(b) VoiceInformation

      Describes a text-to-speech voice. Its key properties:

Property | Access-Type | Description
Description | Read-only | Gets the description of a text-to-speech (TTS) voice.
DisplayName | Read-only | Gets the display name of the text-to-speech (TTS) voice.
Gender | Read-only | Gets the gender of the text-to-speech (TTS) voice.
Id | Read-only | Gets the identifier of the text-to-speech (TTS) voice.
Language | Read-only | Gets the language of the text-to-speech (TTS) voice.

     The earlier example uses the Language property to identify the language to search for;

 

 

(c) InstalledVoices

      Provides access to the synthesis voices installed on the device (Settings > speech).

Property | Access-Type | Description
All | Read-only | Gets the full set of synthesized voices that are available to use as part of the Speech feature.
Default | Read-only | Gets the default synthesized voice.

 

 

 

Speech Synthesis Markup Language (SSML)

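    SSML is an XML-based standard markup language designed for speech synthesis applications, and it is recommended by the W3C's Voice Browser Working Group.

It lets developers control many characteristics of synthesized speech, such as the voice, language, and pronunciation. Microsoft's SSML implementation is based on the

World Wide Web Consortium's Speech Synthesis Markup Language (SSML) Version 1.0.

    The SpeechSynthesizer class provides two methods for speaking SSML: SpeakSsmlAsync(String) and SpeakSsmlFromUriAsync(Uri).

The former takes an SSML string (a short SSML definition of what to say), which makes it easy to switch the speaking language on the fly in code;

the latter reads a complete SSML document, in which the content and languages can be fully defined;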

 

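The structure of SSML is described below, based on <Using SSML for advanced text-to-speech on Windows Phone 8>:

(1) An SSML document or string is always wrapped in a <speak /> element

      <speak /> is the root element of the document, and it can also contain text directly, without any other elements. For example: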

<speak version="1.0" 
       xmlns="http://www.w3.org/2001/10/synthesis" 
       xml:lang="string"> </speak>

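      It has three attributes. The most important one is xml:lang, which, as the name suggests, defines the language used to pronounce the content of <speak />.

      The simplest example using SpeakSsmlAsync(String):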

private async void SpeakBySsmlString()
{
    synth = new SpeechSynthesizer();
    // A minimal <speak /> with en-US as the pronunciation language
    string ssmlText = "<speak version=\"1.0\" ";
    ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
    ssmlText += " Testing Windows Phone 8 TTS";
    ssmlText += "</speak>";
    await synth.SpeakSsmlAsync(ssmlText);
}
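Building SSML by string concatenation works for fixed text, but the markup breaks as soon as the spoken text contains characters such as & or <, because nothing escapes them. Below is a minimal sketch that builds the same <speak /> root with LINQ to XML (System.Xml.Linq), which escapes text automatically. SsmlSpeak is a hypothetical helper name.

```csharp
using System.Xml.Linq;

public static class SsmlSpeak
{
    // SSML 1.0 namespace; child elements must use it too, otherwise they are
    // serialized with xmlns="" and are no longer SSML elements.
    public static readonly XNamespace Ns = "http://www.w3.org/2001/10/synthesis";

    // Wraps content in <speak version="1.0" xml:lang="...">. Strings become
    // text nodes, so characters such as & and < are escaped automatically,
    // which hand-built string concatenation does not do.
    public static string Build(string lang, params object[] content)
    {
        XElement speak = new XElement(Ns + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", lang),
            content);
        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

The result can be passed straight to SpeakSsmlAsync, e.g. await synth.SpeakSsmlAsync(SsmlSpeak.Build("en-US", "Testing Windows Phone 8 TTS")). Nested elements such as <break /> need to be created as new XElement(SsmlSpeak.Ns + "break", ...) so that they stay in the SSML namespace.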

 

 

(2) Inserting sound files

       Besides putting plain text inside <speak />, you can place an <audio /> element within the text to be spoken. For example,

to play your own recording for "book" in the sentence "this is a book.", you could write:

"this is a <audio src="ms-appx:///Assets/book.wav">book</audio>".

However, <audio /> does not accept every audio format. The file must be:

       ‧ PCM, a-law, or u-law encoded;

       ‧ 8-bit or 16-bit depth;

       ‧ mono (stereo is not supported);

private async void SpeakByStringInAudio()
{
    string ssmlText = "<speak version=\"1.0\" ";
    ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
    ssmlText += "Here comes the dog, ";
    // Play the sound file; the text "Dog" is spoken instead if it cannot be played
    ssmlText += "<audio src=\"ms-appx:///Assets/cats.wav\">Dog </audio>";
    ssmlText += "</speak>";
    await synth.SpeakSsmlAsync(ssmlText);
}

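The src attribute specifies the sound file to load. If the file fails to load during speech, its format is unsupported, or it cannot be played for any other reason, the system reads the element's text with the default voice instead.

Also, some src locations work in the short form Assets/cats.wav, but to be safe it is better to write a full URI with a scheme (ms-appx:///...).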
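A file that breaks the format rules is silently replaced by the fallback text, so it can help to check WAV files before shipping them. Below is a sketch, using a hypothetical WavCheck helper, that reads the RIFF "fmt " chunk and tests the three requirements listed above. It assumes a standard RIFF/WAVE container and does not check the sample rate, which the list does not constrain.

```csharp
using System.IO;
using System.Text;

public static class WavCheck
{
    // Format tags stored in the "fmt " chunk of a RIFF/WAVE file.
    const ushort PcmTag = 1, ALawTag = 6, MuLawTag = 7;

    // True if the bytes are a WAV file that is PCM, a-law or u-law encoded,
    // mono, and 8- or 16-bit. Sample rate is not checked.
    public static bool IsSupported(byte[] wav)
    {
        if (wav == null || wav.Length < 12 ||
            Tag(wav, 0) != "RIFF" || Tag(wav, 8) != "WAVE")
            return false;

        using (var reader = new BinaryReader(new MemoryStream(wav)))
        {
            reader.BaseStream.Position = 12;
            // Walk the chunk list until the "fmt " chunk is found.
            while (reader.BaseStream.Position + 8 <= wav.Length)
            {
                string id = Tag(wav, (int)reader.BaseStream.Position);
                reader.BaseStream.Position += 4;
                uint size = reader.ReadUInt32();
                if (id == "fmt ")
                {
                    if (size < 16 || reader.BaseStream.Position + 16 > wav.Length)
                        return false;
                    ushort tag = reader.ReadUInt16();
                    ushort channels = reader.ReadUInt16();
                    reader.ReadUInt32();   // sample rate
                    reader.ReadUInt32();   // byte rate
                    reader.ReadUInt16();   // block align
                    ushort bits = reader.ReadUInt16();
                    bool encodingOk = tag == PcmTag || tag == ALawTag || tag == MuLawTag;
                    return encodingOk && channels == 1 && (bits == 8 || bits == 16);
                }
                // Skip the chunk body; odd-sized chunks carry one pad byte.
                reader.BaseStream.Position += size + (size % 2);
            }
        }
        return false;
    }

    // Reads a 4-character chunk ID; UTF-8 decodes these ASCII IDs unchanged.
    static string Tag(byte[] data, int offset)
    {
        return Encoding.UTF8.GetString(data, offset, 4);
    }
}
```

Feeding it the bytes of each file under Assets (for example, in a quick debug-only check) would flag stereo or 24-bit recordings before users only ever hear the fallback text.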

 

 

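(3) Inserting pauses

       The <break /> element inserts a pause, or a pause of a given length, into the speech. It accepts two attributes:

       ‧ strength: optional; one of none, x-weak, weak, medium, strong, or x-strong;

       ‧ time: optional; the length of the pause, in seconds or milliseconds (e.g. 2s or 500ms);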

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// One pause defined by time, one defined by strength
ssmlText += "There is a pause <break time=\"500ms\" /> here, ";
ssmlText += "and another one <break strength=\"x-strong\" /> here";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);
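An invalid time or strength value is easy to mistype in a hand-built string. A small validating builder can catch these mistakes before the text is spoken. Below is a sketch with a hypothetical SsmlBreak class. The accepted strength values are exactly the six listed above. The time pattern assumes a plain number followed by s or ms.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class SsmlBreak
{
    public static readonly XNamespace Ns = "http://www.w3.org/2001/10/synthesis";

    // The six strength values SSML allows on <break />.
    static readonly string[] Strengths = { "none", "x-weak", "weak", "medium", "strong", "x-strong" };

    // <break time="..."/>: a number followed by s or ms, e.g. "2s" or "500ms".
    public static XElement Time(string time)
    {
        if (time == null || !Regex.IsMatch(time, @"^\d+(\.\d+)?(ms|s)$"))
            throw new ArgumentException("time must be in seconds or milliseconds, e.g. 2s or 500ms");
        return new XElement(Ns + "break", new XAttribute("time", time));
    }

    // <break strength="..."/>: rejects anything outside the six allowed values.
    public static XElement Strength(string strength)
    {
        if (Array.IndexOf(Strengths, strength) < 0)
            throw new ArgumentException("unknown break strength: " + strength);
        return new XElement(Ns + "break", new XAttribute("strength", strength));
    }
}
```

The returned elements go inside a <speak /> element built with the same namespace. Calling .ToString() on one gives the markup, e.g. <break time="500ms" xmlns="http://www.w3.org/2001/10/synthesis" />.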

 

 

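(4) Defining or changing how a word is pronounced

      SSML offers two ways to tell the speech synthesizer how to pronounce a particular word:

      ‧ For a one-time pronunciation of a word, use <phoneme />

          => This also means every occurrence of the word must be wrapped in <phoneme /> again;

          => It has two attributes: ph and alphabet;

          Example: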

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// <phoneme />: ph gives the pronunciation; alphabet stays x-microsoft-ups here
ssmlText += "<phoneme alphabet=\"x-microsoft-ups\" ph=\"O L AA\">hello</phoneme>";
ssmlText += ", I mean hello";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);

 

      ‧ To define the pronunciation of several words in one place, use <lexicon />

          => <lexicon /> needs a separate lexicon file. That file is also XML-based and maps words to pronunciations. Example:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"  
        xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"  
        alphabet="x-microsoft-ups" xml:lang="en-US">  
    <lexeme>    
        <grapheme>wife</grapheme>    
        <phoneme> W AI F AI</phoneme>  
    </lexeme>
</lexicon>

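               Each word gets one <lexeme />, which contains a <grapheme /> (the word that gets the special pronunciation) and a <phoneme /> (how that word is pronounced).

            => To use the lexicon file with SpeechSynthesizer.SpeakSsmlAsync(), add a <lexicon /> element inside <speak />

                with a uri attribute and a type attribute, as follows: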

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// The uri attribute points at the lexicon file
ssmlText += "<lexicon uri=\"ms-appx:///Assets/lexicon1.xml\"";
// The type attribute is the lexicon's MIME type
ssmlText += " type=\"application/pls+xml\"/>";
ssmlText += "She is not my wife";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);
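Writing the lexicon file by hand gets tedious once it holds many words. Below is a sketch that generates the same structure as the example file with LINQ to XML. It assumes the word list is kept in a Dictionary<string, string> that maps each word to its x-microsoft-ups pronunciation. PlsLexicon is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class PlsLexicon
{
    public static readonly XNamespace Ns = "http://www.w3.org/2005/01/pronunciation-lexicon";

    // Builds a PLS 1.0 lexicon: one <lexeme> per entry, holding the word
    // (<grapheme>) and its x-microsoft-ups pronunciation (<phoneme>).
    public static XDocument Build(string lang, IDictionary<string, string> pronunciations)
    {
        var lexicon = new XElement(Ns + "lexicon",
            new XAttribute("version", "1.0"),
            new XAttribute("alphabet", "x-microsoft-ups"),
            new XAttribute(XNamespace.Xml + "lang", lang));

        foreach (KeyValuePair<string, string> entry in pronunciations)
        {
            lexicon.Add(new XElement(Ns + "lexeme",
                new XElement(Ns + "grapheme", entry.Key),
                new XElement(Ns + "phoneme", entry.Value)));
        }
        return new XDocument(new XDeclaration("1.0", "UTF-8", null), lexicon);
    }
}
```

XDocument.Save writes the XML declaration together with the content. The saved file would then be added to the project (for example under Assets) so that the ms-appx:///Assets/lexicon1.xml URI above can find it.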

 

Note: if an SSML document contains both <phoneme /> and <lexicon />, the speech synthesizer gives <phoneme /> the higher priority.

For more, see <lexicon Element SSML> and <Speech Synthesis Markup Language Reference>.

 

 

(5) Changing voices

       There are several ways to change the voice used for reading. One is SpeechSynthesizer.SetVoice(), shown earlier, after finding the language

you need in InstalledVoices. SSML provides the <voice /> element for the same purpose. Each of its attributes is optional,

but at least one must be present. The speech synthesizer treats the attribute values as preferences,

so if a voice cannot be matched on one attribute value, it falls back on the other attributes.

Attribute | Description
name | Optional. Specifies the name of the installed voice that will speak the contained text.
gender | Optional. Specifies the preferred gender of the voice that will speak the contained text. The allowed values are: male, female, and neutral.
age | Optional. Specifies the preferred age in years of the voice that will speak the contained text. The allowed values are: 10 (child), 15 (teen), 30 (adult), and 65 (senior).
xml:lang | Optional. Specifies the language that the voice must support. The value may contain either a lower-case, two-letter language code (such as en for English), or may optionally include an upper-case country/region or other variation in addition to the language code (such as zh-CN).
variant | Optional. An integer that specifies a preferred voice when more than one voice matches the values specified in any of the xml:lang, gender, or age parameters.

Example:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// <voice /> with name, gender, age and xml:lang attributes
ssmlText += "<voice name=\"Microsoft Susan Mobile\" gender=\"female\" age=\"30\"";
ssmlText += " xml:lang=\"en-US\">";
ssmlText += "This is another test </voice>";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);

You can also switch the voice for part of the content with <p xml:lang="" /> and <s xml:lang="" />, for example:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-GB\">";
// Switch the voice per sentence with <p /> and <s />
ssmlText += "<p>";
ssmlText += "<s>First sentence of a paragraph</s>";
ssmlText += "<s xml:lang=\"en-US\">And this is the second sentence</s>";
ssmlText += "</p>";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);
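The rule that every <voice /> attribute is optional but at least one is required maps naturally onto optional parameters. Below is a sketch with a hypothetical SsmlVoice builder. It omits any attribute left null, checks gender against the three allowed values, and rejects a <voice /> with no attributes.

```csharp
using System;
using System.Xml.Linq;

public static class SsmlVoice
{
    public static readonly XNamespace Ns = "http://www.w3.org/2001/10/synthesis";

    // Builds <voice> from the attributes that are supplied (null = omitted).
    // Each attribute is optional on its own, but at least one is required.
    public static XElement Build(object content, string name = null, string gender = null,
                                 int? age = null, string lang = null)
    {
        if (gender != null && gender != "male" && gender != "female" && gender != "neutral")
            throw new ArgumentException("gender must be male, female or neutral");

        var voice = new XElement(Ns + "voice");
        if (name != null) voice.SetAttributeValue("name", name);
        if (gender != null) voice.SetAttributeValue("gender", gender);
        if (age.HasValue) voice.SetAttributeValue("age", age.Value);
        if (lang != null) voice.SetAttributeValue(XNamespace.Xml + "lang", lang);
        if (!voice.HasAttributes)
            throw new ArgumentException("<voice> needs at least one attribute");

        voice.Add(content);
        return voice;
    }
}
```

For example, SsmlVoice.Build("This is another test", gender: "female", age: 30) yields the voice part of the sample above without the name attribute, which leaves the choice of the specific voice to the synthesizer.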

 

 

(6) Changing the prosody of speech

      <break /> can pause the reading or change its pace, and <prosody /> offers more attributes for finer control. For example:

<prosody pitch="value" contour="value" 
         range="value" rate="value" 
         duration="value" volume="value"> </prosody>
Attribute | Description
pitch | Optional. Indicates the baseline pitch for the contained text. This value may be expressed in one of three ways:
  • An absolute value, expressed as a number followed by "Hz" (Hertz). For example, 600Hz.
  • A relative value, expressed as a number preceded by "+" or "-" and followed by "Hz" or "st", that specifies an amount to change the pitch. For example +80Hz or -2st. The “st” indicates the change unit is semitone, which is half of a tone (a half step) on the standard diatonic scale.
  • An enumeration value, from among the following: x-low, low, medium, high, x-high, or default.

contour | Optional. Represents changes in pitch for speech content as an array of targets at specified time positions in the speech output. Each target is defined by sets of parameter pairs, for example:
  <prosody contour="(0%,+20Hz) (10%,-2st) (40%,+10Hz)">
  The first value in each set of parameters specifies the location of the pitch change as a percentage of the duration of the contained text (a number followed by "%"). The second value specifies the amount to raise or lower the pitch, using a relative value or an enumeration value for pitch (see pitch above).

range | Optional. A value that represents the range of pitch for the contained speech content. This value may be expressed using the same absolute values, relative values, or enumeration values used to describe pitch (see pitch above).

rate | Optional. Indicates the speaking rate of the contained text. This value may be expressed in one of two ways:
  • A relative value, expressed as a number that acts as a multiplier of the default. For example, a value of 1 results in no change in the rate. A value of .5 results in a halving of the rate. A value of 3 results in a tripling of the rate.
  • An enumeration value, from among the following: x-slow, slow, medium, fast, x-fast, or default.

duration | Optional. A value in seconds or milliseconds for the period of time that should elapse while the speech synthesis (TTS) engine reads the contents of the element. For example 2s or 1800ms.

volume | Optional. Indicates the volume level of the speaking voice. This value may be expressed in one of three ways:
  • An absolute value, expressed as a number in the range of 0.0 to 100.0, from quietest to loudest. For example, 75. The default is 100.0.
  • A relative value, expressed as a number preceded by "+" or "-" that specifies an amount to change the volume. For example +10 or -5.5.
  • An enumeration value, from among the following: silent, x-soft, soft, medium, loud, x-loud, or default.

Code:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
ssmlText += "Testing the ";
// <prosody /> raises the pitch and lowers the volume for one word
ssmlText += "<prosody pitch=\"+100Hz\" volume=\"70.0\" >Prosody</prosody>";
ssmlText += " element. ";
ssmlText += "Normal,<prosody rate=\"2\"> Very Fast,</prosody>";
ssmlText += "<prosody rate=\"0.4\"> now slow,</prosody>";
ssmlText += " and normal again";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);
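When prosody values are computed rather than typed as literals, formatting a double with the device's current culture can produce "0,4" instead of "0.4" on phones set to a decimal-comma region. Below is a sketch with a hypothetical SsmlProsody helper. It formats rate and volume with the invariant culture and checks volume against the documented 0.0 to 100.0 range.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class SsmlProsody
{
    public static readonly XNamespace Ns = "http://www.w3.org/2001/10/synthesis";

    // Wraps text in <prosody rate="..." volume="...">.
    // rate: multiplier of the default speed (1 = unchanged, 0.5 = half speed).
    // volume: absolute level from 0.0 (quietest) to 100.0 (loudest).
    public static XElement RateAndVolume(string text, double rate, double volume)
    {
        // Written as !(...) so that NaN is rejected as well.
        if (!(rate > 0))
            throw new ArgumentOutOfRangeException("rate", "rate must be greater than 0");
        if (!(volume >= 0.0 && volume <= 100.0))
            throw new ArgumentOutOfRangeException("volume", "volume must be between 0.0 and 100.0");

        // The invariant culture keeps the decimal separator a "." regardless
        // of the regional format set on the device.
        return new XElement(Ns + "prosody",
            new XAttribute("rate", rate.ToString(CultureInfo.InvariantCulture)),
            new XAttribute("volume", volume.ToString("0.0", CultureInfo.InvariantCulture)),
            text);
    }
}
```

RateAndVolume("now slow", 0.4, 70) produces the equivalent of <prosody rate="0.4" volume="70.0">now slow</prosody> on any regional setting.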

 

 

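(7) Monitoring speech progress

      If the app needs to react at specific points while speaking, add a <mark /> element to the SSML at each checkpoint. When the speech synthesizer

reaches a <mark /> while reading, it raises the BookmarkReached event, and the event arguments tell you which <mark /> was reached. For example: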

// Requires: using System.Diagnostics; using Windows.Foundation;
//           using Windows.Phone.Speech.Synthesis;
public MainPage()
{
    InitializeComponent();
    synth = new SpeechSynthesizer();
    // Add the event handler for the speech progress events
    synth.BookmarkReached += new TypedEventHandler<SpeechSynthesizer,
            SpeechBookmarkReachedEventArgs>(synth_BookmarkReached);
}

private async void Button7_Click(object sender, RoutedEventArgs e)
{
    string ssmlText = "<speak version=\"1.0\" ";
    ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
    // Each <mark /> raises BookmarkReached when the synthesizer reaches it
    ssmlText += "<mark name=\"START\"/>";
    ssmlText += "This is the first half of the speech.";
    ssmlText += "<mark name=\"HALF\"/>";
    ssmlText += "and this the second half. Ending now";
    ssmlText += "<mark name=\"END\"/>";
    ssmlText += "</speak>";
    await synth.SpeakSsmlAsync(ssmlText);
}

static void synth_BookmarkReached(SpeechSynthesizer sender, SpeechBookmarkReachedEventArgs e)
{
    // e.Bookmark holds the name attribute of the <mark /> that was reached
    Debugger.Log(1, "Info", e.Bookmark + " mark reached\n");
}
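Placing marks by hand works for a fixed string. For arbitrary text, the marks can be generated instead. Below is a sketch using a hypothetical SsmlMarks helper and a deliberately simple sentence splitter (split after ., ! or ? followed by whitespace). It puts a named <mark /> before each sentence and an END mark at the end.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class SsmlMarks
{
    public static readonly XNamespace Ns = "http://www.w3.org/2001/10/synthesis";

    // Builds a <speak> document with <mark name="S0"/>, <mark name="S1"/>, ...
    // before each sentence and <mark name="END"/> after the last one.
    public static XElement WithSentenceMarks(string lang, string text)
    {
        var speak = new XElement(Ns + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", lang));

        // Naive split: a sentence ends at . ! or ? followed by whitespace.
        string[] sentences = Regex.Split(text.Trim(), @"(?<=[.!?])\s+");
        for (int i = 0; i < sentences.Length; i++)
        {
            speak.Add(new XElement(Ns + "mark", new XAttribute("name", "S" + i)));
            speak.Add(sentences[i] + " ");
        }
        speak.Add(new XElement(Ns + "mark", new XAttribute("name", "END")));
        return speak;
    }
}
```

In the BookmarkReached handler, e.Bookmark would then be S0, S1, ... and END, which is enough to highlight the sentence currently being read. If the handler updates the UI, wrapping the update in Dispatcher.BeginInvoke guards against the event arriving on a non-UI thread.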

 

(8) Specifying content type and aliasing parts of a speech

       Use <say-as /> to mark text as a specific content type (for example a date or a number). Its format is:

<say-as interpret-as="string" format="digit string" detail="string"> </say-as>

Attribute | Description
interpret-as | Required. Indicates the content type of text contained in the element. The SSML 1.0 say-as attribute values specification defines six content types.
format | Optional. Provides additional information about the precise formatting of the contained text for content types that may have ambiguous formats. SSML defines formats for content types that use them.
detail | Optional. Indicates the level of detail to be spoken. For example, this attribute might request that the speech synthesis engine pronounce punctuation marks. There are no standard values defined for the detail attribute. Support for this attribute depends on the individual speech synthesis engine.

Common interpret-as values:

interpret-as | format | Interpretation

date | dmy, mdy, ymd, ym, my, md, dm, d, m, y | The contained text is a date in the specified format. In the format designations, d=day, m=month, and y=year. The format for date indicates which date components are represented and their sequence. The following is an example of a say-as element that contains a date:
    Today is <say-as interpret-as="date" format="mdy">10-19-2003</say-as>
    The speech synthesizer should pronounce “Today is October nineteenth two thousand three”.

cardinal | - | The contained text should be spoken as a cardinal number. The following is an example of a say-as element that contains a cardinal number:
    There are <say-as interpret-as="cardinal">3</say-as> alternatives.
    The speech synthesizer should pronounce “There are three alternatives”.

ordinal | - | The contained text should be interpreted as an ordinal number. The following is an example of a say-as element that contains an ordinal number:
    Select the <say-as interpret-as="ordinal">3rd</say-as> option.
    The speech synthesizer should pronounce “Select the third option”.

characters | - | Indicates that each letter in the contained text should be pronounced individually (spelled out). The following is an example of a say-as element that contains a word that should be spoken as individual letters:
    <say-as interpret-as="characters">test</say-as>.
    The speech synthesizer should pronounce each letter: “T E S T”.

time | hms12, hms24 | The contained text is a time. Time may be expressed using either a 12-hour clock (hms12) or a 24-hour clock (hms24). The format attribute indicates which clock to use. The following is an example of a say-as element that contains a time:
    The train departs at <say-as interpret-as="time" format="hms12">4:00am</say-as>.
    The speech synthesizer should speak “The train departs at four A M”.
    Use a colon to separate numbers representing hours, minutes, and seconds. The following time strings are all valid examples: 12:35, 1:14:32, 08:15, and 02:50:45.

telephone | digit string | The contained text is a telephone number. The format attribute may contain digits that represent a country code, for example “1” for the United States or “39” for Italy. The speech synthesis engine may use this information to guide its pronunciation of a phone number. The country code may also be included in the phone number, and if so, takes precedence over the country code in the format attribute if there is a mismatch. The following is an example of a say-as element that contains a telephone number:
    The number is <say-as interpret-as="telephone" format="1">(888) 555-1212</say-as>.
    The speech synthesizer should speak “The number is area code eight eight eight five five five one two one two”.

Example:

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// Read as an ordinal number
ssmlText += "<p>This is an ordinal number: <say-as interpret-as=\"ordinal\">121</say-as></p>";
// Read as a cardinal number
ssmlText += "<p>This is a cardinal number: <say-as interpret-as=\"cardinal\">121</say-as></p>";
// Spell out each character
ssmlText += "<p>And these are just individual numbers: <say-as interpret-as=\"characters\">121</say-as></p>";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);
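The date row in the table above only works when the text and the format attribute agree on the order of the components. Below is a small sketch with a hypothetical SsmlSayAs helper. Its Date method formats a DateTime itself, so the text and the "mdy" label cannot drift apart.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class SsmlSayAs
{
    public static readonly XNamespace Ns = "http://www.w3.org/2001/10/synthesis";

    // <say-as interpret-as="..." [format="..."]>text</say-as>; format is optional.
    public static XElement Build(string text, string interpretAs, string format = null)
    {
        if (string.IsNullOrEmpty(interpretAs))
            throw new ArgumentException("interpret-as is required");
        var sayAs = new XElement(Ns + "say-as", new XAttribute("interpret-as", interpretAs), text);
        if (format != null)
            sayAs.SetAttributeValue("format", format);
        return sayAs;
    }

    // Writes the date as month-day-year and labels it "mdy", so the text and
    // the format attribute always describe the same component order.
    public static XElement Date(DateTime date)
    {
        return Build(date.ToString("MM-dd-yyyy", CultureInfo.InvariantCulture), "date", "mdy");
    }
}
```

SsmlSayAs.Date(new DateTime(2003, 10, 19)) reproduces the table's example element, and Build("121", "ordinal") matches the first line of the sample above.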

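You can also use <sub /> to have a word read as its full form, for example when the text contains an abbreviation but the complete term should be spoken.

<sub /> defines an alias. It is of limited use, because it works like a one-time alias (it applies only to the text it wraps); its purpose is

probably to let an SSML document carry both the written and the spoken form of the related text more clearly. Example: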

ssmlText = "<speak version=\"1.0\" ";
ssmlText += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
// Define an alias: "WP8" is read as "Windows Phone 8"
ssmlText += "This code runs on <sub alias=\"Windows Phone 8\">WP8</sub>";
ssmlText += "</speak>";
await synth.SpeakSsmlAsync(ssmlText);
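Each <sub /> covers only the text it wraps, so every occurrence of an abbreviation needs its own element. Below is a sketch with a hypothetical SsmlAliases helper that wraps all whole-word occurrences from a dictionary of abbreviations. It relies on \b word boundaries, so it assumes each abbreviation starts and ends with a letter or digit.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class SsmlAliases
{
    public static readonly XNamespace Ns = "http://www.w3.org/2001/10/synthesis";

    // Splits text into nodes, wrapping every whole-word occurrence of each
    // abbreviation in its own <sub alias="...">.
    public static List<XNode> Apply(string text, IDictionary<string, string> aliases)
    {
        var nodes = new List<XNode>();
        if (aliases.Count == 0)
        {
            nodes.Add(new XText(text));
            return nodes;
        }

        // Longest abbreviations first so a shorter one cannot match a prefix of a longer one.
        string pattern = @"\b(" + string.Join("|",
            aliases.Keys.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k))) + @")\b";

        int last = 0;
        foreach (Match m in Regex.Matches(text, pattern))
        {
            if (m.Index > last)
                nodes.Add(new XText(text.Substring(last, m.Index - last)));
            nodes.Add(new XElement(Ns + "sub", new XAttribute("alias", aliases[m.Value]), m.Value));
            last = m.Index + m.Length;
        }
        if (last < text.Length)
            nodes.Add(new XText(text.Substring(last)));
        return nodes;
    }
}
```

The returned nodes can be added directly inside a <speak /> XElement created in the same namespace, e.g. new XElement(SsmlAliases.Ns + "speak", ..., SsmlAliases.Apply(text, aliases)).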

 

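(9) Playing an SSML document

     An SSML document is itself an XML file that gathers the elements and formats introduced above into one file.

Use SpeakSsmlFromUriAsync() to load an SSML document packaged with the app by URI and read it aloud. A complete SSML document looks like this: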

<speak version="1.0" 
       xmlns="http://www.w3.org/2001/10/synthesis" 
       xml:lang="en-US">  
    <voice gender="male" xml:lang="en-US">    
        <prosody rate="0.8">      
            <p>Thanks for reading the article, and thanks for trying the examples</p>
            <p>Now be creative, and create amazing applications for this fantastic platform</p>      
            <voice gender="male" xml:lang="es">Adios</voice>    
        </prosody>  
    </voice>
</speak>

Played with this code:

await synth.SpeakSsmlFromUriAsync(new Uri("ms-appx:///Assets/SSML1.xml"));
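A typo in a packaged SSML file only shows up when SpeakSsmlFromUriAsync fails at run time. The hypothetical SsmlDocumentCheck helper below is one way to catch such typos earlier. It checks that a document is well-formed XML and has a <speak> root in the SSML namespace with version 1.0 and an xml:lang. It does not validate against the full SSML schema.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class SsmlDocumentCheck
{
    static readonly XNamespace Ns = "http://www.w3.org/2001/10/synthesis";

    // Returns the problems found (an empty list means none were detected).
    // Only well-formedness and the <speak> root are checked, not the full schema.
    public static List<string> Check(string xml)
    {
        var problems = new List<string>();
        XDocument doc;
        try
        {
            doc = XDocument.Parse(xml);
        }
        catch (XmlException ex)
        {
            problems.Add("not well-formed: " + ex.Message);
            return problems;
        }

        XElement root = doc.Root;
        if (root.Name != Ns + "speak")
            problems.Add("root must be <speak> in the SSML namespace");
        if ((string)root.Attribute("version") != "1.0")
            problems.Add("version attribute must be 1.0");
        if (root.Attribute(XNamespace.Xml + "lang") == null)
            problems.Add("xml:lang attribute is missing");
        return problems;
    }
}
```

Running it on the contents of Assets/SSML1.xml during development would report, for example, a missing namespace declaration before the phone ever tries to speak the file.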

 

[Sample code]

======

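That covers how to implement Text-to-Speech on WP8, along with the SSML elements commonly used to make TTS output richer.

There are many more TTS uses and examples not covered here; see the items under <References> for more. Thanks.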

 

References

Text-to-speech (TTS) for Windows Phone

What's new in Windows Phone SDK 8.0 (important)

Speech for Windows Phone 8 (important)

Speech recognition for Windows Phone 8

Play with text-to-speech (speech synthesis) installed voices

Text to speech on Windows Phone 8

Continuous Location tracking on Windows Phone 8 Part 2: Background & Adam Benoit

Basic text-to-speech (TTS)

Speech recognition and text-to-speech

Handling errors in speech apps for Windows Phone (troubleshooting)

Using SSML for advanced text-to-speech on Windows Phone 8 (important)

Text to Speech in Windows Phone 7

Using text to speech on Windows Phone 8

Text To Speech in Windows Phone

Put your voice to work using the free Text-to-Speech app for Windows Phone 8

 
