Posts Tagged ‘YouTube’

Captioning YouTube Videos

May 27th, 2010 jeb 3 comments

Back in March 2010, I rather gleefully blogged about YouTube’s latest feature called “automatic captioning.” Since that time, I have become bemused and amused by the state of this “service.” It seems Google – the owners and operators of YouTube – have been using our videos as fodder for their new Google Voice speech-to-text (S-t-T) translation machine. Google claims, “It (Google Voice transcripts) will improve over time as our transcription engine gets smarter.” It is not clear how the Google transcription engine will get “smarter,” but I’m figuring the more the system is used, the more it will learn, and the smarter it will become…make sense?

Whoever perfects S-t-T stands to make billions in the first year, so it stands to reason Google would be interested in tapping into that treasure chest. But perfecting S-t-T has always been an elusive goal, and anyone worth their salt in the captioning or transcription business knows that human beings still make the best captionists.

That said, at the Accessibility Unconference a few weeks ago, the issue of S-t-T came up and there was lots of interest in YouTube’s “automatic captioning” service. I should note here that YouTube currently calls this a “machine transcription” service and offers it with some caveats. They also seem, in some ways, to be more interested in the language translation tool that was delivered on YouTube at the same time. Perhaps there is more money to be made in the translation of Chinese to English than in S-t-T.

At the Unconference, there was one gentleman who represented a transcription service company in Massachusetts that used a system based upon a combination of automated S-t-T and human power. He claimed that his system was much faster than regular human-only transcription because machines take the first cut at the transcription and humans complete the final edits. He also claimed it was flawless. Lastly, he noted that the fee for this service ranged on a scale based upon the quality of the audio. Apparently, the poorer the quality of the speech, the more human intervention is necessary, and the higher the price tag.

So all this got me thinking about the experimental YouTube video I created and posted back in early March. The “automatic captioning,” eh, machine transcription, of my video was indeed a bit hilarious. Sharing it with friends, we all howled at the bizarre transcripts that were produced by the system. It was a bit like playing that children’s game, “Telephone,” where you whisper something into someone’s ear and they whisper it to the next person and so on down the line until the last person says it out loud. The final product never comes out correctly and is usually quite funny. And indeed, the YouTube “machine transcription” was much the same.

For my test video, I purposely read a printed text – as opposed to spontaneous speech – so I would have an exact copy of the content against which to compare the transcript. The results were marginal at best and, honestly, the transcript made no logical sense. It was also amazing what YouTube’s machine transcription failed to recognize. It had a particularly difficult time with the words “accessibility” and “web design.” Go figure.

I recently learned that you could download the YouTube machine transcript, edit it, and then re-post it to the original YouTube video. So, today I finally got around to trying this and, though successful, the process was not without pain.

First, the machine transcript is saved in some unique YouTubian format (.SBV). The content is readable using a simple text editor and looks like this:

    0:00:02.179,0:00:07.740
    okay so am I- of doing it tested video here
    it and I'm going to read this to see if the

    0:00:07.740,0:00:09.959
    captioning system works well
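For anyone who would rather poke at the file with a script than a text editor, here is a minimal Python sketch that reads captions laid out like the sample above. It assumes only what the sample shows: a timing line of the form start,end followed by one or more lines of text. The names `Cue`, `to_seconds`, and `parse_captions` are my own inventions for illustration, not part of any YouTube tool, and the real format may have features this ignores.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # caption text, joined into a single line


# Timing line as seen in the sample: H:MM:SS.mmm,H:MM:SS.mmm
TIMING = re.compile(
    r"^\s*(\d+):(\d{2}):(\d{2})\.(\d{1,3})\s*,"
    r"\s*(\d+):(\d{2}):(\d{2})\.(\d{1,3})\s*$"
)


def to_seconds(hours: str, minutes: str, seconds: str, fraction: str) -> float:
    """Convert the pieces of one timestamp into seconds."""
    # Pad the fraction so that ".17" is read as 170 ms rather than 17 ms.
    millis = int(fraction.ljust(3, "0"))
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + millis / 1000


def parse_captions(content: str) -> List[Cue]:
    """Split caption text into cues; each timing line starts a new cue."""
    pending = []  # items are [start, end, list_of_text_lines]
    for raw in content.splitlines():
        match = TIMING.match(raw)
        if match:
            parts = match.groups()
            pending.append([to_seconds(*parts[:4]), to_seconds(*parts[4:]), []])
        elif raw.strip() and pending:
            # Any non-blank line after a timing line belongs to that cue.
            pending[-1][2].append(raw.strip())
    return [Cue(start, end, " ".join(lines)) for start, end, lines in pending]
```

Feeding it the text of a downloaded caption file gives back a list of cues, each with a start time, an end time, and the text, which makes it easy to count lines or spot the longest captions before you start editing.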

Fortunately, my MovCaptioner software could import the file and provide an easy way to edit the content. But after editing the text, I could not export the transcript without first merging it with a video. I had to grab the original video from YouTube (which I downloaded in .MP4 format) and then load that into MovCaptioner. Once the editing was finished (see note below about time), I was able to save and export the file in another format (.SUB for Subtitle format) and then upload that transcript file to YouTube.

The final edited .SUB file looks like this:

    00:00:02.17,00:00:07.72
    Okay so I am doing a test
    video here and I'm going to
    read this to see if the

    00:00:07.74,00:00:09.94
    captioning system works well
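Comparing the two samples, the main visible difference in the timing lines is a two-digit hour and a two-digit fraction of a second in the .SUB version. A rough Python sketch of that timestamp conversion follows. It is a guess based only on the two samples here: `to_sub_timestamp` and `convert_timing_lines` are hypothetical names, dropping the last digit (rather than rounding) is my assumption, and a real .SUB file may need more than this (headers, for example).

```python
import re

# Timing line as seen in the .SBV sample: H:MM:SS.mmm,H:MM:SS.mmm
TIMING_LINE = re.compile(
    r"^\s*(\d+:\d{2}:\d{2}\.\d{3})\s*,\s*(\d+:\d{2}:\d{2}\.\d{3})\s*$"
)


def to_sub_timestamp(sbv_timestamp: str) -> str:
    """Turn H:MM:SS.mmm into HH:MM:SS.cc (assumption: truncate, don't round)."""
    clock, millis = sbv_timestamp.split(".")
    hours, minutes, seconds = clock.split(":")
    return f"{int(hours):02d}:{minutes}:{seconds}.{millis[:2]}"


def convert_timing_lines(sbv_text: str) -> str:
    """Rewrite only the timing lines; caption text passes through untouched."""
    converted = []
    for line in sbv_text.splitlines():
        match = TIMING_LINE.match(line)
        if match:
            start, end = match.groups()
            converted.append(f"{to_sub_timestamp(start)},{to_sub_timestamp(end)}")
        else:
            converted.append(line)
    return "\n".join(converted)
```

This only touches the time codes. The wording still has to be fixed by hand, which is where nearly all of the time goes.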

As predicted, the most strenuous part of the process is the actual editing of the transcript. Even though the machine transcript had gotten about 50% of the content correct, it still took close to 45 minutes for me to edit the three minutes of video. It is clear that I talk pretty fast, as there were 75 lines of text that had to be edited. I can’t imagine doing this for anything longer.

So, I’ve learned a few things here:

First, YouTube’s “automatic captioning/machine transcription” is far from perfect and should not be used, at this point, for anything other than amusement. I am not sure if Google has a timeline for when this will get better, but until it reaches an accuracy of 85% or higher, I would not rely on it as a usable transcription.
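One way to put a number on “accuracy” is to compare the machine transcript against the text that was actually read, word by word. The Python sketch below is my own illustration, not how Google or anyone else measures it: it counts the fewest word insertions, deletions, and substitutions needed to turn the machine version into the original (a word-level edit distance) and reports one minus that count divided by the number of words in the original. The function names are made up for this example.

```python
import re
from typing import List


def split_words(text: str) -> List[str]:
    """Lowercase the text and keep only words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9'’]+", text.lower())


def word_edit_distance(reference: List[str], transcript: List[str]) -> int:
    """Fewest word insertions, deletions, and substitutions between the two."""
    previous = list(range(len(transcript) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(transcript, start=1):
            current.append(min(
                previous[j] + 1,                           # word missing from transcript
                current[j - 1] + 1,                        # extra word in transcript
                previous[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference_text: str, transcript_text: str) -> float:
    """Return a score from 0.0 to 1.0; 1.0 means the words match exactly."""
    reference = split_words(reference_text)
    transcript = split_words(transcript_text)
    if not reference:
        raise ValueError("reference text contains no words")
    errors = word_edit_distance(reference, transcript)
    return max(0.0, 1 - errors / len(reference))
```

For example, `word_accuracy("the captioning system works well", "the caption system works well")` comes out to 0.8, since one of the five words is wrong. Running it on a whole script versus the machine transcript would show how far a video is from the 85% mark.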

Second, while machine transcription followed by human editing is clearly more accurate than machine transcription alone, the time savings may not be all that one might imagine. I’m guessing that a professional transcriptionist using state-of-the-art equipment would have been able to transcribe the three minutes of video a lot faster than I was able to edit the machine-generated version.

Last, we are still a long way from fully accurate S-t-T and if you are going to use videos on your websites, and want them to be accessible, you are probably still going to have to pay someone to create a transcript/caption file for you.

Note: jeremykemp has posted a YouTube video comparing human vs. machine translation on several video clips. You can see the errors produced by the machine transcription.

Captioning and YouTube

March 10th, 2010 jeb 2 comments

UPDATE – March 10, 2010: Yes, it is true. Google has announced that the “automatic captioning service,” first detailed in November, is now available to all accounts (channels). It appears that, for now, you have to “request” the service (although it appears they had automatically captioned my latest video, which was posted several months ago), and they will eventually get to all of them. Pretty cool. More on the announcement. Directions on how to caption

I recently heard the news about the new “automatic captioning” that Google is providing to certain YouTube accounts. According to the “Official Google Blog:”

…we’ve combined Google’s automatic speech recognition (ASR) technology with the YouTube caption system to offer automatic captions, or auto-caps for short. Auto-caps use the same voice recognition algorithms in Google Voice to automatically generate captions for video. The captions will not always be perfect (check out the video below for an amusing example), but even when they’re off, they can still be helpful—and the technology will continue to improve with time.

Apparently, Google is rolling this out with a select group of partners and on specific channels. My understanding is that Google will simply start captioning videos in these groups using this new automatic system.

Anyone who knows anything about captioning knows that automatic systems are fraught with problems. It seems the best captioners are still human beings. And, well, I’m guessing Google is not interested in hiring half the population of the planet and training them to become transcriptionists. ’Cause that’s what it would probably take to get enough human power to deal with the zillions of YouTube videos out there.

But if you can’t wait for Google to automatically caption the home videos of your kids opening their Christmas presents, you can use another, lesser-known, and equally free service called CaptionTube. It is not clear from my reading if CaptionTube is a service that Google Labs developed themselves or whether it was acquired through some kind of company merger, but in any case, the price is right. I’m still playing with it so I don’t have an official opinion yet. If you are a master user, send me a comment or an e-mail.

I have also, for a year or so, been playing around with an application called MovCaptioner that runs on Mac OS X. SynchriMedia, the maker of MovCaptioner, has been promising a Windows version, but I’m thinking CaptionTube might be the right product at the right price. MovCaptioner costs $39.95 for one license, which provides free updates. Multiuser licenses are also available at a discount.

Both MovCaptioner and CaptionTube work essentially the same way. You load your video (in the case of CaptionTube, you can work off an existing YouTube video that has already been published). As you play back your video in the application, you can stop (marking the time code automatically) and type in what the people on the video are saying. It is not really easy to do, so I have developed a new affinity for the people who do this work professionally. People do not talk in nice tight sound bites, so you will quickly find it is hard to “stop the tape” at the appropriate spot and add the caption. You also have to have pretty good listening skills. You will often end up repeating the clip to get the wording correct. Again, it’s not easy.

After you have created the text for your captions, you click some buttons, upload the caption file, and check back in a little while to see your YouTube video with captions. In the case of MovCaptioner, you have a number of options for saving and publishing your video. MovCaptioner has the advantage of saving a file that can be used with, or converted for use with, any media player, not just the Flash media player that YouTube uses.

Both captioning systems appear to use a “closed caption” method, meaning the caption transcript is kept separate from the video file (not embedded like subtitles in old movies). It can be turned off and on by the user, and the transcript itself can be saved and used separately – with or without the time codes. This is a nice option.
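Because the caption file is just text, getting the “without time codes” version can be as simple as dropping the timing lines. Here is a small Python sketch of that idea. It assumes timing lines that look like H:MM:SS.mmm,H:MM:SS.mmm (two clock times separated by a comma); other caption formats mark time differently, and `strip_time_codes` is a name I made up for illustration.

```python
import re

# Assumed timing-line shape: two clock times separated by a comma.
TIMING_LINE = re.compile(
    r"^\s*\d+:\d{2}:\d{2}\.\d+\s*,\s*\d+:\d{2}:\d{2}\.\d+\s*$"
)


def strip_time_codes(caption_text: str) -> str:
    """Return the caption text as one plain transcript, without time codes."""
    kept = []
    for line in caption_text.splitlines():
        cleaned = line.strip()
        # Skip blank separator lines and anything that looks like a timing line.
        if cleaned and not TIMING_LINE.match(cleaned):
            kept.append(cleaned)
    return " ".join(kept)
```

The result is a single block of text that can be posted alongside the video as a plain transcript.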

I’ve made this all sound very simple; it’s not. But, it is not all that difficult either. Like anything, it is an acquired skill.

I am hoping this new automatic service from Google takes off and becomes universally available soon. At the very least, Google could first provide this as a service for folks who need to get their videos captioned now (e.g., educational institutions, governments, etc.). Maybe even open it up with invites like they did with Gmail and Google Wave. I’d be happy to be a beta tester.

Anyway, a quick and inexpensive way of captioning short videos is coming closer to fruition. Exciting times. Stay tuned!