Experiments with HoloLens, Bot Framework and LUIS: adding text to speech


In a previous post I described how to create a Mixed Reality 2D app that talks to a LUIS-enabled Bot through the Direct Line channel available in the Bot Framework.

I decided to add more interactivity to the app by also enabling text to speech for the messages received from the Bot. This required adding a new MediaElement, which plays the audio produced by the speech synthesizer, to the main XAML page:

<Page
    x:Class="HoloLensBotDemo.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">

    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="10"/>
            <ColumnDefinition Width="Auto"/>
            <ColumnDefinition Width="10"/>
            <ColumnDefinition Width="*"/>
            <ColumnDefinition Width="10"/>
        </Grid.ColumnDefinitions>
        <Grid.RowDefinitions>
            <RowDefinition Height="50"/>
            <RowDefinition Height="50"/>
            <RowDefinition Height="50"/>
            <RowDefinition Height="Auto"/>
        </Grid.RowDefinitions>
        <TextBlock Text="Command received: " Grid.Column="1" VerticalAlignment="Center" />
        <TextBox x:Name="TextCommand" Grid.Column="3" VerticalAlignment="Center"/>

        <Button Content="Start Recognition" Click="StartRecognitionButton_Click" Grid.Row="1" Grid.Column="1" VerticalAlignment="Center" />

        <TextBlock Text="Status: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="2" />
        <TextBlock x:Name="TextStatus" Grid.Column="3" VerticalAlignment="Center" Grid.Row="2"/>

        <TextBlock Text="Bot response: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="3" />
        <TextBlock x:Name="TextOutputBot" Foreground="Red" Grid.Column="3" 
                   VerticalAlignment="Center" Width="Auto" Height="Auto" Grid.Row="3"
                   TextWrapping="Wrap" />
        <MediaElement x:Name="media" />
    </Grid>
</Page>

Then I initialized a new SpeechSynthesizer at the creation of the page:

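using Windows.Media.SpeechRecognition; // SpeechRecognizer, SpeechRecognizerState
using Windows.Media.SpeechSynthesis;   // SpeechSynthesizer
using Windows.UI.Xaml.Controls;        // Page

public sealed partial class MainPage : Page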
{
    private SpeechSynthesizer synthesizer;
    private SpeechRecognizer recognizer;

    public MainPage()
    {
        this.InitializeComponent();

        InitializeSpeech();
    }

    private async void InitializeSpeech()
    {
        synthesizer = new SpeechSynthesizer();
        recognizer = new SpeechRecognizer();

        media.MediaEnded += Media_MediaEnded;
        recognizer.StateChanged += Recognizer_StateChanged;

        // Compile the dictation grammar by default.
        await recognizer.CompileConstraintsAsync();
    }

    private void Recognizer_StateChanged(SpeechRecognizer sender, SpeechRecognizerStateChangedEventArgs args)
    {
        if (args.State == SpeechRecognizerState.Idle)
        {
            SetTextStatus(string.Empty);
        }

        if (args.State == SpeechRecognizerState.Capturing)
        {
            SetTextStatus("Listening....");
        }
    } 
…….

I also added a new Speech() method that plays the synthesized audio through the media element:

private async void Speech(string text)
{
    if (media.CurrentState == MediaElementState.Playing)
    {
        media.Stop();
    }
    else
    {
        try
        {
            // Create a stream from the text. This will be played using a media element.
            SpeechSynthesisStream synthesisStream = await synthesizer.SynthesizeTextToStreamAsync(text);

            // Set the source and start playing the synthesized audio stream.
            media.AutoPlay = true;
            media.SetSource(synthesisStream, synthesisStream.ContentType);
            media.Play();
        }
        catch (System.IO.FileNotFoundException)
        {
            var messageDialog = new Windows.UI.Popups.MessageDialog("Media player components unavailable");
            await messageDialog.ShowAsync();
        }
        catch (Exception)
        {
            media.AutoPlay = false;
            var messageDialog = new Windows.UI.Popups.MessageDialog("Unable to synthesize text");
            await messageDialog.ShowAsync();
        }
    }
}
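
Note that the Speech() method above only stops playback when audio is already playing, so a response that arrives in the middle of a previous one is not spoken. If interrupting and then speaking the newest message is preferred, a variant could look like the following minimal sketch. The name SpeakLatestAsync is hypothetical and not part of the original project; the sketch assumes the same media and synthesizer members shown earlier, and leaves out the error handling of Speech().

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml.Media;

public sealed partial class MainPage
{
    // Hypothetical variant of Speech(): interrupts any current playback,
    // then synthesizes and plays the new text.
    private async Task SpeakLatestAsync(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return; // Nothing to say.
        }

        if (media.CurrentState == MediaElementState.Playing)
        {
            media.Stop();
        }

        // The MediaElement reads from the stream while playing,
        // so the stream is not disposed here.
        SpeechSynthesisStream stream = await synthesizer.SynthesizeTextToStreamAsync(text);

        // With AutoPlay enabled, playback starts once the source is set.
        media.AutoPlay = true;
        media.SetSource(stream, stream.ContentType);
    }
}
```

Returning a Task instead of void also lets the caller await the call and observe exceptions, which an async void method does not allow.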

When a new response is received from the Bot, the new Speech() method is called:

var result = await directLine.Conversations.GetActivitiesAsync(convId);
if (result.Activities.Count > 0)
{
    var botResponse = result
        .Activities
        .LastOrDefault(a => a.From != null && a.From.Name != null && a.From.Name.Equals("Davide Personal Bot"));
    if (botResponse != null && !string.IsNullOrEmpty(botResponse.Text))
    {
        var response = botResponse.Text;

        TextOutputBot.Text = "Bot response: " + response;
        TextStatus.Text = string.Empty;

        Speech(response);
    }
}
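
The call above retrieves the activities of the conversation and picks the last one sent by the Bot. As a possible refinement, the Direct Line client also accepts a watermark, so that each call returns only the activities that arrived after the previous call. The sketch below assumes the Microsoft.Bot.Connector.DirectLine client used above; the watermark field, the GetNewBotTextAsync name and the botName parameter are hypothetical.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Bot.Connector.DirectLine;

public sealed partial class MainPage
{
    // Hypothetical field: the last watermark returned by Direct Line.
    private string watermark;

    // Returns the text of the newest Bot activity since the previous call,
    // or null when the Bot has not replied yet.
    private async Task<string> GetNewBotTextAsync(
        DirectLineClient directLine, string convId, string botName)
    {
        ActivitySet result =
            await directLine.Conversations.GetActivitiesAsync(convId, watermark);
        if (result?.Activities == null)
        {
            return null;
        }

        // Remember where this batch ended so the next call skips it.
        watermark = result.Watermark;

        Activity botResponse = result.Activities
            .LastOrDefault(a => a.From != null && a.From.Name == botName);

        return botResponse?.Text;
    }
}
```

With this approach an empty result simply means that no new reply is available yet, so the caller would need to poll again after a short delay.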

Once the spoken response has finished playing, the MediaEnded event handler starts the recognition of a new phrase again, simulating a conversation between the user and the Bot:

private void Media_MediaEnded(object sender, Windows.UI.Xaml.RoutedEventArgs e)
{
    StartRecognitionButton_Click(null, null);
}
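
Because the handler above restarts recognition automatically, it may be worth checking that the recognizer is idle before doing so. The following is a minimal sketch under that assumption; CanStartRecognition is a hypothetical helper, not part of the original project.

```csharp
using Windows.Media.SpeechRecognition;

public sealed partial class MainPage
{
    // Hypothetical helper: true when the recognizer exists and is not
    // already busy with another recognition session.
    private bool CanStartRecognition()
    {
        return recognizer != null
            && recognizer.State == SpeechRecognizerState.Idle;
    }
}
```

Media_MediaEnded could then call StartRecognitionButton_Click(null, null) only when CanStartRecognition() returns true.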

As usual, the source code is available for download on GitHub.

Davide Zordan


Senior Software Engineer
