Slides and code from my HoloLens & Cognitive Services session at DDD Wales

I’ve just uploaded my slides and samples related to my session Analysing visual content using HoloLens, Computer Vision APIs, Unity and the Windows Mixed Reality Toolkit at DDD Wales.

The source code is available on GitHub.

Experiments with HoloLens, Mixed Reality Toolkit and two-handed manipulations

I’ve always been a big fan of manipulations, as in the past I worked on some multi-touch XAML Behaviors implementing rotate, translate and scale on 2D objects.

As I progress with my learning about HoloLens and Windows Mixed Reality, I’ve had on my to-do list the task of exploring how to recreate this scenario in the 3D Mixed Reality context. Finally, over the weekend, I started some research while preparing a demo for some speaking engagements I’ll have over the next few weeks.

As usual, the Mixed Reality Toolkit is a fantastic help for speeding up development, so I started my investigation by analysing the GitHub repo. I found that the dev branch now contains a new readme illustrating all the steps required to use a new Unity script, TwoHandManipulatable.cs, which enables rotate, translate and scale on 3D objects in Unity using two-handed tap gestures with HoloLens and the Motion Controllers with the Immersive headsets.

I decided to give this a try using a model imported from the Remix 3D repository.

Importing the model from Remix 3D

I fired up Paint 3D and selected More models to explore the ones available online in Remix 3D: this is a great container of assets you can use in Mixed Reality apps. I chose to explore a model of the Mars Rover, so I selected and opened it:

Then I exported it as a 3D FBX object so that I could import it into Unity, and saved it on my local machine as MarsRover.fbx.

Creating the Unity project

I started a new Unity Project, and copied the folder Assets\MixedRealityToolkit from the dev branch of the MRT GitHub repository.

After applying the Mixed Reality Project and Scene settings from the corresponding menu, I was ready to import the Mars Rover model and associate the manipulation script.

I selected Assets->Import New Asset, searched for the previously saved model MarsRover.fbx and adjusted the scale to X:60, Y:60, Z:60 to have it correctly visualised in my scene. Then, I added a new Capsule Collider to enable interaction with the object:

Adding the TwoHandManipulatable script

After selecting the imported model, I clicked the Add Component button in the Inspector tab, searched for the Two Hand Manipulatable script in the toolkit and added it to the asset, together with a BoundingBoxBasic for showing the boundaries when applying manipulations.

I then set the manipulation mode to Move Rotate Scale.

The scene was complete, so I only had to test it on the device: I selected File->Build Settings, generated the package and deployed it to HoloLens, and I got the Mars Rover ready to be explored with two-handed manipulations. Amazing!

The source code is available on my GitHub playground here.

I will speak at the DDD Wales 2018 conference

I’m very excited to announce that I will be speaking about HoloLens, Windows Mixed Reality and Cognitive Services at the DDD (Developer! Developer! Developer!) Wales 2018 conference in Swansea on 24th March.

I’m looking forward to meeting all the community! See you there? 🙂

Analysing visual content using HoloLens, Computer Vision APIs, Unity and the Windows Mixed Reality Toolkit

These days, I’m exploring the combination of HoloLens/Windows Mixed Reality and the capabilities offered by Cognitive Services to analyse and extract information from images captured via the device camera and processed using the Computer Vision APIs and the intelligent cloud.
In this article, we’ll explore the steps I followed for creating a Unity application running on HoloLens and communicating with the Microsoft AI platform.

Registering for Computer Vision APIs

The first step was to navigate to the Azure portal and create a new Computer Vision API resource:

I noted down the Keys and Endpoint and started investigating how to approach the code for capturing images on HoloLens and sending them to the intelligent cloud for processing.

Before creating the Unity experience, I decided to start with a simple UWP app for analysing images.

Writing the UWP test app and the shared library

There are already some samples available for the Cognitive Services APIs, so I decided to reuse some of the code described in this article, supplemented by some camera capture UI in UWP.

I created a new Universal Windows app and library (CognitiveServicesVisionLibrary) to provide, respectively, a test UI and some reusable code that could be referenced later by the HoloLens experience.

The Computer Vision APIs can be accessed via the package Microsoft.ProjectOxford.Vision, available on NuGet, so I added a reference to it in both projects:

The test UI contains an image and two buttons: one for selecting a file using a FileOpenPicker and another for capturing a new image using the CameraCaptureUI. I decided to wrap these two actions in an InteractionsHelper class:
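As a rough illustration of what such a helper could look like, here is a minimal sketch built on the standard UWP FileOpenPicker and CameraCaptureUI APIs. The method names PickImageAsync and CapturePhotoAsync are my own assumptions for this example and may differ from the ones in the actual project.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.Capture;
using Windows.Storage;
using Windows.Storage.Pickers;

// Hypothetical sketch: the member names are assumptions, not the project's real API.
public static class InteractionsHelper
{
    // Lets the user choose an existing image; returns null if the picker is cancelled.
    public static async Task<StorageFile> PickImageAsync()
    {
        var picker = new FileOpenPicker
        {
            ViewMode = PickerViewMode.Thumbnail,
            SuggestedStartLocation = PickerLocationId.PicturesLibrary
        };
        picker.FileTypeFilter.Add(".jpg");
        picker.FileTypeFilter.Add(".jpeg");
        picker.FileTypeFilter.Add(".png");

        return await picker.PickSingleFileAsync();
    }

    // Opens the system camera UI; returns null if the user cancels the capture.
    public static async Task<StorageFile> CapturePhotoAsync()
    {
        var captureUI = new CameraCaptureUI();
        captureUI.PhotoSettings.Format = CameraCaptureUIPhotoFormat.Jpeg;

        return await captureUI.CaptureFileAsync(CameraCaptureUIMode.Photo);
    }
}
```

Both methods return a StorageFile, so each button handler can pass the result to the same analysis code regardless of where the image came from.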

I then worked on the shared library creating a helper class for processing the image using the Vision APIs available in Microsoft.ProjectOxford.Vision and parsing the result.

Tip: after creating the VisionServiceClient, I received an unauthorised error when specifying only the key: the error disappeared by also specifying the endpoint URL available in the Azure portal.
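To make the tip concrete, the following is a minimal sketch of a helper that creates the VisionServiceClient with both the key and the endpoint, and then reads the caption and tags from the result. The class name VisionHelper, the method DescribeAsync and the placeholder values are assumptions for illustration only; use the key and endpoint shown for your own resource in the Azure portal.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
using Windows.Storage;

// Hypothetical sketch: class and method names are assumptions.
public static class VisionHelper
{
    // Placeholders: copy both values from your Computer Vision resource in the Azure portal.
    private const string SubscriptionKey = "<your-subscription-key>";
    private const string Endpoint = "<your-endpoint-url>";

    public static async Task<string> DescribeAsync(StorageFile file)
    {
        // Passing the endpoint as well as the key avoided the unauthorised error for me.
        var client = new VisionServiceClient(SubscriptionKey, Endpoint);
        var features = new[] { VisualFeature.Description, VisualFeature.Tags };

        using (Stream stream = await file.OpenStreamForReadAsync())
        {
            AnalysisResult result = await client.AnalyzeImageAsync(stream, features);

            // Take the first caption as the general description, if there is one.
            string caption = result.Description?.Captions?.FirstOrDefault()?.Text
                             ?? "No description available";
            string tags = result.Tags == null
                ? string.Empty
                : string.Join(", ", result.Tags.Select(t => t.Name));

            return $"{caption} (tags: {tags})";
        }
    }
}
```

Returning a single string keeps the sketch short; a real helper might return the caption and tags separately so that the UI can display them in different places.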

I then launched the test UI: the image was successfully analysed and the Computer Vision APIs returned the results, in this case identifying a building and several other tags like outdoor, city, park: great!

I also added a Speech Synthesizer playing the general description returned by the Cognitive Services call:
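For reference, speech output in a UWP app can be sketched with the SpeechSynthesizer class from Windows.Media.SpeechSynthesis and a MediaElement for playback. The helper below is a minimal example under that assumption; the class name SpeechHelper and the idea of passing in a MediaElement declared in the page's XAML are mine, not necessarily how the test app does it.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml.Controls;

// Hypothetical sketch: the helper name and signature are assumptions.
public static class SpeechHelper
{
    // Synthesises the text to an audio stream and plays it through the given MediaElement.
    public static async Task SpeakAsync(MediaElement player, string text)
    {
        var synthesizer = new SpeechSynthesizer();
        SpeechSynthesisStream stream = await synthesizer.SynthesizeTextToStreamAsync(text);

        player.SetSource(stream, stream.ContentType);
        player.Play();
    }
}
```

The method has to be called from the UI thread, because it touches the MediaElement.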

I then moved to HoloLens and started creating the interface using Unity, the Mixed Reality Toolkit and UWP.

Creating the Unity HoloLens experience

First of all, I created a new Unity project using Unity 2017.2.1p4 and then added a new folder named Scenes and saved the active scene as CognitiveServicesVision Scene.

I downloaded the corresponding version of the Mixed Reality Toolkit from the releases section of the GitHub project and imported the toolkit package HoloToolkit-Unity-2017.2.1.1.unitypackage using the menu Assets->Import Package->Custom package.

Then, I applied the Mixed Reality Project settings using the corresponding item in the toolkit menu:

And selected the Scene Settings adding the Camera, Input Manager and Default Cursor prefabs:

And finally set the UWP capabilities as I needed access to the camera for retrieving the image, the microphone for speech recognition and internet client for communicating with Cognitive Services:

I was then ready to add the logic to retrieve the image from the camera, save it to the HoloLens device and then call the Computer Vision APIs.

Creating the Unity Script

The CameraCaptureUI UWP API is not available in HoloLens, so I had to research a way to capture an image from Unity, save it to the device and then convert it to a StorageFile ready to be used by the CognitiveServicesVisionLibrary implemented as part of the previous project.

First of all, I enabled the Experimental (.NET 4.6 Equivalent) Scripting Runtime version in the Unity player for using features like async/await. Then, I enabled the PicturesLibrary capability in the Publishing Settings to save the captured image to the device.

Then, I created a Scripts folder and added a new PhotoManager.cs script taking as a starting point the implementation available in this GitHub project.

The script can be attached to a TextMesh component visualising the status:

Initialising the PhotoCapture API available in Unity

Saving the photo to the pictures library folder and then passing it to the library created in the previous section:
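The two steps above can be sketched with Unity's PhotoCapture API as follows. This is a simplified example, not the script from the project: it writes the photo to Application.persistentDataPath instead of the pictures library, the status messages are invented, and the call into the shared library is left as a comment because its API is not shown here.

```csharp
using System;
using System.IO;
using System.Linq;
using UnityEngine;
using UnityEngine.XR.WSA.WebCam;

// Simplified sketch of a PhotoManager; details are assumptions, not the project's code.
public class PhotoManager : MonoBehaviour
{
    private PhotoCapture capture;
    private TextMesh statusText;
    private string imagePath;
    private bool isReady;

    private void Start()
    {
        statusText = GetComponent<TextMesh>();
        // true = include holograms in the captured image.
        PhotoCapture.CreateAsync(true, OnCaptureCreated);
    }

    private void OnCaptureCreated(PhotoCapture captureObject)
    {
        capture = captureObject;

        // Pick the highest resolution supported by the device camera.
        Resolution resolution = PhotoCapture.SupportedResolutions
            .OrderByDescending(r => r.width * r.height)
            .First();

        var parameters = new CameraParameters
        {
            hologramOpacity = 1.0f,
            cameraResolutionWidth = resolution.width,
            cameraResolutionHeight = resolution.height,
            pixelFormat = CapturePixelFormat.BGRA32
        };

        capture.StartPhotoModeAsync(parameters, OnPhotoModeStarted);
    }

    private void OnPhotoModeStarted(PhotoCapture.PhotoCaptureResult result)
    {
        isReady = result.success;
        statusText.text = isReady ? "Say Describe" : "Camera not available";
    }

    // Public so that it can be wired to a speech keyword in the editor.
    public void TakePhoto()
    {
        if (!isReady) return;

        string fileName = string.Format("Photo_{0}.jpg", Time.time);
        imagePath = Path.Combine(Application.persistentDataPath, fileName);
        capture.TakePhotoAsync(imagePath, PhotoCaptureFileOutputFormat.JPG, OnCapturedToDisk);
    }

    private void OnCapturedToDisk(PhotoCapture.PhotoCaptureResult result)
    {
        if (!result.success)
        {
            statusText.text = "Capture failed";
            return;
        }
        statusText.text = "Analysing...";
#if WINDOWS_UWP
        AnalyseAsync(imagePath);
#endif
    }

#if WINDOWS_UWP
    // Only compiled in the UWP build, where the Windows.Storage APIs are available.
    private async void AnalyseAsync(string path)
    {
        Windows.Storage.StorageFile file =
            await Windows.Storage.StorageFile.GetFileFromPathAsync(path);
        // Pass 'file' to the shared vision library here and show the result.
        statusText.text = "Captured " + file.Name;
    }
#endif

    private void OnDestroy()
    {
        if (capture == null) return;
        // Leave photo mode and release the camera.
        capture.StopPhotoModeAsync(r => { capture.Dispose(); capture = null; });
    }
}
```

The #if WINDOWS_UWP guard keeps the script compiling inside the Unity editor, where the Windows.Storage namespace is not available.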

The code references the CognitiveServicesVisionLibrary UWP library created previously: to use it from Unity, I created a new Plugins folder in my project and ensured that the Build output of the Visual Studio library project was copied to this folder:

And then set the import settings in Unity for the custom library:

And for the NuGet library too:

Nearly there! Let’s see now how I enabled Speech recognition and Tagalong/Billboard using the Mixed Reality Toolkit.

Enabling Speech

I decided to implement a very minimal UI for this project, using the speech capabilities available in HoloLens for all the interactions.

In this way, a user can simply say the word Describe to trigger the image acquisition and the processing using the Computer Vision API, and then naturally listen to the results.

In the Unity project, I selected the InputManager object:

And added a new Speech Input Handler Component to it:

Then, I mapped the keyword Describe to the TakePhoto() method available in the PhotoManager.cs script already attached to the TextMesh that I had previously named Status Text Object.

The last step was to enable Text to Speech for receiving the output: I simply added a Text to Speech component to my TextMesh:

And enabled the speech in the script using StartSpeaking():
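As a minimal sketch of this step, the component below updates the status text and reads it aloud through the toolkit's TextToSpeech component attached to the same object. The class name StatusSpeaker and the method ShowAndSpeak are hypothetical, and I'm assuming the TextToSpeech class lives in the HoloToolkit.Unity namespace in this version of the toolkit.

```csharp
using HoloToolkit.Unity;
using UnityEngine;

// Hypothetical sketch: class and method names are assumptions.
public class StatusSpeaker : MonoBehaviour
{
    private TextMesh statusText;
    private TextToSpeech textToSpeech;

    private void Awake()
    {
        // Both components are expected on the same GameObject as this script.
        statusText = GetComponent<TextMesh>();
        textToSpeech = GetComponent<TextToSpeech>();
    }

    // Shows the message and, if a TextToSpeech component is present, speaks it.
    public void ShowAndSpeak(string message)
    {
        statusText.text = message;

        if (textToSpeech != null)
        {
            textToSpeech.StartSpeaking(message);
        }
    }
}
```

In practice the same two lines could live directly in the photo script, right after the description comes back from the Computer Vision call.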

I also added two other components available in the Mixed Reality Toolkit, Tagalong and Billboard, to have the status text follow me rather than stay anchored to a specific location:

I was then able to generate the final package using Unity specifying the starting scene:

And then I deployed the solution to the HoloLens device and started extracting and categorising visual data using HoloLens, Camera, Speech and the Cognitive Services Computer Vision APIs.


The combination of Mixed Reality and Cognitive Services opens a new world of experiences combining the capabilities of HoloLens and all the power of the intelligent cloud. In this article, I’ve analysed the Computer Vision APIs, but a similar approach could be applied to augment Windows Mixed Reality apps and enrich them with the AI platform.

The source code for this article is available on GitHub: