Saturday, June 21, 2014

Recording the Self-presenting Presentation

If you watched the screencast video in my last post, "Free Voice Talent ..." you may have noticed that I used a TTS voice for parts of the presentation. What is not evident is the fact that this recorded presentation was entirely automatic, including the human and TTS narration. It was a self-presenting presentation.

Here's how it is done using only ScreenFlow, Keynote, AppleScript and Apple's built-in text-to-speech (TTS) capabilities.

We start by creating a Keynote presentation and making the presenter notes visible with View > Show Presenter Notes, which looks like this:


Next, we create a slide in the Keynote deck using the presenter notes to script the narration for that slide. Repeat until you have laid out all visual and narrative aspects of your message.

At this point, we need to develop an AppleScript application that will launch Keynote and cause the presenter notes to be read with a TTS voice. So, go to /Applications/Utilities and open AppleScript Editor. Copy and paste in the following script:

property defaultSlideDuration : 2 -- seconds to show a slide that has no presenter notes
property pauseBeforeSpeaking : 1.5 -- seconds to wait before narrating a slide
property stoppingStatement : "[[slnc 1000]] Stopping presentation."

tell application "Keynote"
    try
        -- error number -128 is "user canceled", which ends the script quietly
        if not (exists document 1) then error number -128

        if playing is true then stop the front document

        tell document 1
            set the slideCount to the count of slides
            start from first slide
        end tell
        repeat with i from 1 to the slideCount
            -- if the slideshow was stopped by hand, say so and quit
            if playing is false then
                my speakSlideNotes(stoppingStatement)
                error number -128
            end if
            tell document 1
                set thisSlidesPresenterNotes to presenter notes of slide i
            end tell
            if thisSlidesPresenterNotes is "" then
                delay defaultSlideDuration
            else
                if pauseBeforeSpeaking is not 0 then
                    delay pauseBeforeSpeaking
                end if
                my speakSlideNotes(thisSlidesPresenterNotes)
            end if
            if i is slideCount then
                exit repeat
            else
                if playing is false then
                    my speakSlideNotes(stoppingStatement)
                    error number -128
                end if
                show next
            end if
        end repeat

        tell document 1 to stop
    on error errorMessage number errorNumber
        if errorNumber is not -128 then
            display alert (("ERROR " & errorNumber) as string) message errorMessage
        end if
    end try
end tell

on speakSlideNotes(thisSlidesPresenterNotes)
    -- notes that begin with [voice:Name] are read with that voice
    if thisSlidesPresenterNotes begins with "[voice:" then
        set x to the offset of "]" in thisSlidesPresenterNotes
        set textToSay to text from (x + 1) to -1 of thisSlidesPresenterNotes
        set thisVoiceName to text 8 thru (x - 1) of thisSlidesPresenterNotes
        say textToSay using thisVoiceName
    else
        say thisSlidesPresenterNotes -- with waiting until completion
    end if
end speakSlideNotes

Save it as a script so that you can use it at any time in the future.

At this point, you could set ScreenFlow to record and then Run the script with your Keynote presentation open and in the forefront. The default system voice will read each slide as it is presented. The presentation will automatically exit after the last slide has been narrated so you can stop the ScreenFlow recording, trim the ends in ScreenFlow and be done.

However, as we discussed in the last post, you may not be satisfied with such a plain vanilla rendition. Thus, we get into a few of the more sophisticated techniques that apply to this kind of work. You should already know about auditioning and about using phonetic misspelling, punctuation and embedded speech commands to create a more human-sounding rendition of text. If not, review the "Free Voice Talent ..." post for the details.

To these we will add two new techniques: 1) Adjusting for video and animation. 2) Selecting voices on a per slide basis.

Adjusting for video and animation. If your presentation includes a video clip, as the screencast referenced here does, you'll need to build in a delay that is equal to the duration of the video. This can be done with the embedded speech command [[slnc nnn]] where nnn is the duration of the embedded video in milliseconds. The following screenshot is of a slide containing a video with a duration of 9 minutes 16.033 seconds. That converts to 556,033 milliseconds, or roughly 556,000. YMMV (Your math may vary).
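If you would rather not do that arithmetic by hand, here is a minimal sketch in Python. The function name slnc_for_duration is my own invention for illustration; it simply turns minutes and seconds into the silence command described above.

```python
def slnc_for_duration(minutes: int, seconds: float) -> str:
    """Return an embedded silence command that lasts as long as the clip."""
    # 1 minute = 60 seconds, 1 second = 1000 milliseconds
    milliseconds = round((minutes * 60 + seconds) * 1000)
    return f"[[slnc {milliseconds}]]"


# The 9 minute 16.033 second video from the slide above
print(slnc_for_duration(9, 16.033))
```

Paste the printed command into the presenter notes of the slide that holds the video, then audition and adjust as needed.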


This also applies to any transitions or animations that you may have included. Simply run the script without recording to audition the slide, adjusting the placement and duration of the silence command as you go. This command, as you learned from the last post, is also very useful in creating appropriate pauses between paragraphs.

Selecting voices on a per slide basis. Since there is no embedded speech command that selects which voice is operative, we do this via the speakSlideNotes handler in the script. This special command must be the first text encountered in the presenter notes (no leading spaces, please). Inserting [voice:Daniel] as the first element in the presenter notes for a slide will cause that voice to be used, with no need to change the system default. Note the single (not double) square brackets. This is especially valuable as you may alternate male and female voices to add narrative interest to your screencast. The following screenshot illustrates the use of this special command.
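To make the parsing rule concrete, here is a rough Python sketch of what the speakSlideNotes handler does with the notes. The name split_voice_prefix is hypothetical, and note one difference: AppleScript string comparisons ignore case by default, so this sketch is slightly stricter than the script itself.

```python
def split_voice_prefix(notes: str):
    """Return (voice_name, text_to_say); voice_name is None for the default voice."""
    if notes.startswith("[voice:"):
        close = notes.find("]")
        if close != -1:
            # "[voice:" is 7 characters long, so the name starts at index 7
            return notes[7:close], notes[close + 1:]
    return None, notes


print(split_voice_prefix("[voice:Daniel]Welcome to the presentation."))
print(split_voice_prefix(" [voice:Daniel]A leading space defeats the command."))
```

The second call shows why leading spaces matter: the notes no longer begin with the command, so the whole string, command included, would be read aloud in the default voice.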


So that's all there is to it. Create a slide deck in Keynote, add presenter notes, then audition and adjust the text of the presenter notes with the provided AppleScript until the desired presentation with narration is achieved. Then, launch ScreenFlow to record the self-presenting presentation, execute the script, stop recording, trim and export.

Because voice quality and variety are so easily assured with a single take, this could be called Rapid Screencast Development (RSD).

Wednesday, June 4, 2014

Free Voice Talent for Your Next ScreenFlow Screencast

Screencasting often entails narration and many screencasters take a do-it-yourself (DIY) approach to this critical part of the screencasting process. The price (free) is usually right and ScreenFlow does a fine job of recording from an external or built-in microphone. However, there are a few downsides to consider.

  • The quality of the microphone and the recording environment may not be as good as they ought to be.
  • Not everyone has a great “radio voice” and not everyone has a smooth manner of speaking. If that weren’t true, professional voice talent wouldn’t be so expensive.
  • The first two items often result in having to do many “takes” and that requires time that may not be available or justifiable.
  • Even where the screencaster/narrator has a fine voice, a good recording environment and a great speaking style, there is often a need for a different voice if only to provide variety or counterpoint – important to audience engagement.

These are some of the reasons why screencasters may want to employ additional voice talent instead of or to supplement their own. The show stopper for many is the cost of professional voice talent. The website provides a good overview of the field and the costs involved.

But wait! There is a free option that only requires that you have and be able to use ScreenFlow on a MacOS X computer. Interested?

As you may already know, MacOS X has a very good text-to-speech engine built-in and ready to use with voices ranging from very human sounding (Alex) to the bizarre (Zarvox). Additional voices organized by language/accent groups can be downloaded from Apple. What’s more, you can purchase and download even more high quality voices from companies such as Cepstral and AssistiveWare. In most MacOS X apps, one can select a body of text and then choose Edit > Speech > Start Speaking to hear those words spoken with the selected system voice. Selecting a different system voice is a simple matter of using the drop-down menu in the Text to Speech tab of the Dictation & Speech preference pane.

Less well known is the fact that this process can be extended to include creating a recording of such a reading and that this can be automated as a service that is easily accessed in any application that handles text. With narration scripted, it becomes a simple matter to add audio narration to a ScreenFlow screencast in one or more interesting voices.

Before we get into the details of how to do that, let’s take a look at an example that was created using the procedures about to be described. We start with some publicly accessible content so that the reader can follow along. I used Hubblecast 68: The Hubble Time Machine because it was short (07m 27s), in the public domain, and included a script (English) and several subtitle tracks in English and various other languages. I also discovered a great source of free soundtracks in the Zero Project, used in this and other Hubblecast episodes.

Of course the video already had world class narration in English so, to make things interesting, I set out to create a Russian narrated version of this video. I used the subtitle track for Russian, a simple text file, to generate a number of audio clips identifiable by the start time of each clip. The original video alternated between male and female narrators so I chose to mirror that in the Russian version by using two Russian voices, Yuri (male) and Katia (female). These had to be downloaded from Apple as they are not included in the basic set of voices.

So here is our demo, the Russian narrated version of Hubblecast 68: The Hubble Time Machine.

There are a number of flaws in this demo that we’ll address in the how-to section. As a proof of concept, though, it does clearly demonstrate the feasibility of easily adding other voices and other languages to a video.

I added both the English and the Russian subtitles so that you could follow the audio with one or the other. These may be accessible to you if you are looking at the video using the very latest versions of Safari on iOS or MacOS X. The Safari subtitle selector will look like this:


No other web browsers yet support internal subtitles so if you don't have or use Safari for some reason, download this version of the video and play it in iTunes, QuickTime X Player, VLC or some other capable video viewer that supports subtitles. Note also that I am using the HTML video tag with MPEG-4 video only, so some versions of Firefox may refuse to play the video because there is no *.webm or *.ogg fallback version. The solution is to switch to any other major web browser.

So, how is this magic achieved? It’s actually pretty easy. Let’s go through the following three easy steps.

Step 1: A more thorough tutorial would have us build the four requisite Automator workflows by hand (you can get that here and here) but automation is not one of our main learning objectives so we’ll take a shortcut and download the four Automator workflows needed for this tutorial here. Once you unpack this archive, you will find four Automator workflows as follows:

1) Read the Selection Aloud (female).workflow
2) Read the Selection Aloud (male).workflow
3) Render Selection as Audio (female).workflow
4) Render Selection as Audio (male).workflow

These are easily installed by double-clicking each of them and then clicking the Install button in the resulting dialog. Note that MacOS X 10.7 and newer may object saying that the workflow is from an unidentified developer. To circumvent this, Control-click on the workflow file and click Open from the resulting contextual menu. This is a one-time operation so the second and all subsequent uses will require only double-clicking the file. Opening the *.workflow file will yield the following dialog:


Here is a more detailed version of these instructions.

Please note that this security obstacle is very important to protecting your computer and the data handled by it. You should never download and install apps, scripts or Automator workflows from untrusted sources. While apps can be signed by the developer, scripts and Automator workflows cannot. These workflows are from and, so, I considered them trustworthy.

After installing, you'll see a completion dialog with an Open with Automator button. Click on that to see how each workflow was assembled. The two named “Read the Selection Aloud …” are very simple whereas the two named “Render Selection as Audio …” are more complex as follows:


All four workflows come preconfigured with the following voices: Daniel (Male) and Serena (Female). Note the Play button to the right of the selected voice where you may audition it. The language is English (United Kingdom). Feel free to change these as you like or need. For more choices, download additional voices.

Here's how to download new voices from Apple:
• Open System Preferences choosing the Dictation & Speech preference pane and then selecting the Text to Speech tab.
• Click on the System Voice drop down menu selecting Customize ...
• In the resulting dialog, select new voices using the checkboxes to the left of the name of each voice. Note that voices are organized and labeled by language, location (accent) and gender.
• Click on the OK button and wait patiently for the voice to download. This can take a while.
Once the download is complete, you will be able to select new voices in the Automator workflow as described above.

For the Russian demo, I duplicated, reconfigured and renamed the workflows so that I would have separate services for rendering English and Russian text to speech. It is not at all necessary for you to follow this more complicated procedure. Simply opening the Automator workflows and changing the voice from Daniel to Yuri and from Serena to Katia would have been sufficient. Nonetheless, here's how to do it:

• The Russian voices are not installed by default so they have to be downloaded as described above.
• Go to ~/Library/Services where all of your Services are kept. Note that the tilde (~) indicates your Home folder which carries the short username that you chose when your computer was first set up.
• Select the file, "Render Selection as Audio (female).workflow" and duplicate it (Cmd-D or File > Duplicate).
• Rename the duplicated file to reflect its new, specialized function. I used, "Render Selection as Audio (female-Russian).workflow."
• Double-click this file to open it in Automator.
• Change the voice to Katia using the drop down menu in the Text to Audio segment of the workflow and close the file.
• Repeat the process to create "Render Selection as Audio (male-Russian).workflow" using the voice, Yuri.
You will now have both the originally downloaded workflows and these specialized additional workflows available to you in any text handling application that supports Services. To illustrate, here's a screenshot from Pages.


Step 2: Using the provided script, the subtitle track or other means, create a time stamped working script then use that script to create a set of audio files as follows.

Here is a snippet of the PDF script that I found on the Hubblecast resource page:

4. As we can’t travel to other galaxies or star systems and view them for
ourselves, we rely on telescopes like Hubble.
One of the main scientific justifications for building Hubble was to
measure the size and age of the Universe. This task has produced
some of the telescope’s most iconic images, taken as Hubble peered
into the faraway Universe to see what galaxies looked like in the past.
[Dr. J - STUDIO 2]
5. So how is it possible that Hubble can look into the past?
Well, that’s because, just like a spacecraft, light also travels at a finite
speed. At 300,000 kilometres per second, this speed is very high, but it
is still finite. That means that, in principle, everything we see is a thing of
the past.
Now normally, in our everyday lives, it doesn’t matter, because the
distances are just too small. But when we look at the Moon, we see it as
it was about 1 second ago. The Sun we see as it was about 8 minutes
ago. For the nearest star, it’s about 4 years, and the edge of our galaxy
we see as it was about 100,000 years ago.

The equivalent female (narrator) parts of the subtitle track look like this.


00:01:24,000 --> 00:01:30,000
As we can't travel to other galaxies or star systems and view them for ourselves

00:01:30,000 --> 00:01:33,000
we rely on telescopes like Hubble.

00:01:34,000 --> 00:01:38,000
One of the main scientific justifications for building Hubble

00:01:38,000 --> 00:01:43,000
was to measure the size and age of the Universe.

00:01:43,000 --> 00:01:48,000
This task has produced some of the telescope's most iconic images,

00:01:48,000 --> 00:01:55,000
taken as Hubble peered into the faraway Universe to see what galaxies looked like in the past.


00:01:24,000 --> 00:01:30,000 NF
И поскольку мы не можем слетать к другим галактикам или звездным системам и посмотреть на них своими глазами

00:01:30,000 --> 00:01:33,000
мы полагаемся на телескопы вроде Хаббла.

00:01:34,000 --> 00:01:38,000
Одним из главных научных обоснований строительства Хаббла

00:01:38,000 --> 00:01:43,000
была необходимости измерить размер и возраст Вселенной.

00:01:43,000 --> 00:01:48,000
Эта задача привела к тому, что были созданы самые лучшие изображения,

00:01:48,000 --> 00:01:55,000
когда Хаббл вглядывался вглубь Вселенной, чтобы увидеть, как же выглядели галактики в прошлом.

So, that part of our working script for this particular female (narrator) excerpt winds up looking like this:

И поскольку мы не можем слетать к другим галактикам или звездным системам и посмотреть на них своими глазами
мы полагаемся на телескопы вроде Хаббла.
Одним из главных научных обоснований строительства Хаббла
была необходимости измерить размер и возраст Вселенной.
Эта задача привела к тому, что были созданы самые лучшие изображения,
когда Хаббл вглядывался вглубь Вселенной, чтобы увидеть, как же выглядели галактики в прошлом.
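Stripping the time stamps by hand gets tedious on a long subtitle track. As a rough sketch only (the names parse_cues and working_script are mine, and I assume a plain *.srt layout like the excerpts above), a few lines of Python can pull out the start time and text of each cue:

```python
import re

# Matches the start and end times of an SRT cue; anything after them is ignored.
CUE_TIMES = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def parse_cues(srt_text):
    """Return a list of cues, each with a start time in seconds and its text lines."""
    cues = []
    current = None
    for raw_line in srt_text.splitlines():
        line = raw_line.strip()
        match = CUE_TIMES.match(line)
        if match:
            hours, minutes, seconds, millis = (int(g) for g in match.groups()[:4])
            start = hours * 3600 + minutes * 60 + seconds + millis / 1000
            current = {"start": start, "lines": []}
            cues.append(current)
        elif not line:
            current = None  # a blank line ends the cue
        elif current is not None:
            current["lines"].append(line)
    return cues


def working_script(cues):
    """Join the cue texts, one cue per line, ready to select and render."""
    return "\n".join(" ".join(cue["lines"]) for cue in cues)


SAMPLE = """\
00:01:30,000 --> 00:01:33,000
we rely on telescopes like Hubble.

00:01:34,000 --> 00:01:38,000
One of the main scientific justifications for building Hubble
"""

cues = parse_cues(SAMPLE)
print(cues[0]["start"])
print(working_script(cues))
```

The start time of the first cue in each chunk is what you will want for naming the audio file later on.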

Next, we select this text in our text editor (almost any text handling app will do) and go to the Services menu (usually in the app menu) selecting one of the following Automator workflows:
• Render Selection as Audio (female) if you have changed the voice of that Automator workflow to Katia.
• Render Selection as Audio (female-Russian) if you have created a dedicated Automator workflow as I did.

This creates an audio file in QuickTime X Player like this with a title using the first few characters of the selected text.


It's auto-saved to disk (~/Desktop/Audio Rendering Service/) but we will re-save it to a different location, changing the name to 01-23F.m4a to indicate a start time of 1 minute, 23 seconds and the gender of the speaker (as a cross-check because we know they alternate). This filename will be very helpful in placing the audio clip when we get to ScreenFlow.
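The naming scheme is simple enough to generate mechanically. Here is a small Python sketch, with clip_name being a made-up helper rather than anything the workflows provide, that builds the filename from a start time in seconds and a one-letter gender code:

```python
def clip_name(start_seconds: float, gender: str) -> str:
    """Build a name such as 01-23F.m4a from a start time and 'F' or 'M'."""
    # Fractions of a second are dropped; the name only needs whole seconds
    minutes, seconds = divmod(int(start_seconds), 60)
    return f"{minutes:02d}-{seconds:02d}{gender}.m4a"


print(clip_name(83, "F"))
print(clip_name(53, "M"))
```

Zero-padding both numbers is what keeps the files in chronological order when they are sorted by name.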

However, before you start cranking out audio files like sausages, you may want to audition your script with the voices that you are using. This is what the male and female "Read the Selection Aloud ..." workflows are for. It is quite possible that you will not be fully satisfied with the way that the Text To Speech (TTS) engine treats the selected text. You may want to tweak it a bit. Fortunately, there are ways to influence these readings and, consequently, the recordings too.

The simplest tactics involve misspelling words in a more phonetic fashion, using punctuation and issuing commands that the TTS engine will obey, such as the silence command. Here is an example where we insert 400 millisecond periods of silence into a sentence:

Don't forget to bring your hat, [[slnc 400]] sunglasses, [[slnc 400]] sandals, [[slnc 400]] and towel.

... compare that to:

Don't forget to bring your hat, sunglasses, sandals, and towel.

... or:

Don't forget to bring your hat; sunglasses; sandals; and towel.
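If you find yourself adding the same pause after every comma, the edit can be scripted. The following Python sketch is only an illustration (add_pauses is my own name for it); it inserts a silence command after each comma that is followed by a space:

```python
import re


def add_pauses(sentence: str, milliseconds: int = 400) -> str:
    """Insert an embedded silence command after every comma followed by whitespace."""
    # The lookahead keeps the original space, so it ends up after the command
    return re.sub(r",(?=\s)", f", [[slnc {milliseconds}]]", sentence)


print(add_pauses("Don't forget to bring your hat, sunglasses, sandals, and towel."))
```

Treat the result as a starting point and audition it, since a uniform pause after every comma will not suit every sentence.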

The Mac Developer Library PDF document is extremely detailed and technical but you can extract from it such usable gems as those in the examples above. Just skip over the stuff that seems too complicated. Table 3-1 is where you want to focus your attention.

Once you have a complete script that reads well, decide how to organize it in segments or chunks that correspond with the gender of the speaker and the pauses that occur naturally as the video plays. An *.srt subtitle file will provide you with all the timing information you need to make sure that the audio corresponds with the video. Finally, go ahead and produce a set of audio files as described above.

Step 3: Using ScreenFlow to assemble the audio narration track and synchronize it to the video.

At this point, you should have a video file and a bunch of audio files whose names tell you where on the timeline they should start. Determine the resolution (width and height in pixels) of your video using QuickTime X Player by opening the video and doing Window > Show Movie Inspector or Cmd-I. Make sure that the video is being displayed at its actual size (View > Actual Size or Cmd-1). The first figure is the width while the second is the height. My example movie (Hubblecast 68) was 1920x1080 which is the resolution of 1080p video.

Open ScreenFlow 4.5, dismiss the configure/start recording dialog that usually opens by default on launch and do File > New Empty Document or Shift-Cmd-N to get the following dialog.


Choose one of the presets or enter custom dimensions to equal the resolution of your video so that it will use all of ScreenFlow's canvas. Next, bring the video file and all of the audio files into ScreenFlow's Media Library using either drag and drop from the Finder or using the Add Media function in the Media Library. Either way, the Media Library should look something like the image below. Note how the names of the audio files cause them to be arranged sequentially. This will help assembly go more smoothly.


Next, drag the video file onto the canvas and center it. The video should use all of the canvas. Use the scrubber to move the playhead randomly to confirm that fact.

Since we're going to replace the English narration with Russian, we'll need to select the video track in the ScreenFlow timeline and then go to Audio Properties where we can Mute Audio. Play the movie for a few seconds to confirm that it is, indeed, silent. The next and most important task, then, is to begin adding audio files to the timeline such that the narration jibes with the visuals in the video.

Here is where our audio file naming scheme pays off. Because our first audio file is named 00-00F (female narrator starts at 0 minutes, 0 seconds), we know that it should begin to play as soon as the video starts. That's easy so we drag that file to the timeline placing it below the video and as far to the left as possible. If we drag the scrubber/playhead to the beginning of that file, the timecode readout should be zero.


Here's a closeup of the timecode readout when the scrubber/playhead is in the leftmost position.


This timecode readout will help us place each audio file in the proper location so that the narration matches the visuals in our screencast. For example, if our next audio file is named 00-53M, we know that it should start at the 53 second mark. Simply move the scrubber until the timecode readout looks like this:


Then drag the file named 00-53M.m4a to a position to the right of the scrubber/playhead, double-click the gap between them and then press the Delete key. That will place this audio sample precisely on the timeline where it ought to be. Repeat this process for the remaining audio files and optionally place a music track beneath all this with ducking turned on for the narration track.
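With a long list of clips it can help to print a placement checklist before you start dragging. This Python sketch assumes every file follows the minutes-seconds-gender pattern used in this post; clip_start and the sample filenames are for illustration only.

```python
import re

# Two digits of minutes, two of seconds, then F or M, e.g. 00-53M.m4a
CLIP_NAME = re.compile(r"^(\d{2})-(\d{2})([FM])\.m4a$")


def clip_start(filename: str):
    """Return (start_in_seconds, gender) decoded from a clip filename."""
    match = CLIP_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected clip name: {filename}")
    minutes, seconds, gender = match.groups()
    return int(minutes) * 60 + int(seconds), gender


# Sorting by name is enough because the numbers are zero-padded
for name in sorted(["01-23F.m4a", "00-00F.m4a", "00-53M.m4a"]):
    start, gender = clip_start(name)
    print(f"{name}: place at {start} seconds ({gender})")
```

Tick off each line as you set the playhead and drop the clip, and a misnamed file will announce itself with an error instead of landing in the wrong spot.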

Then, export your screencast as you normally would.

Finally, let's recapitulate these three steps in a screencast as follows:

Download the 720p version of this video here with Control-click.