<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jebsblog &#187; speech-to-text</title>
	<atom:link href="http://jebswebs.net/blog/tag/speech-to-text/feed/" rel="self" type="application/rss+xml" />
	<link>http://jebswebs.net/blog</link>
	<description>comments about accessible and universal web design</description>
	<lastBuildDate>Mon, 30 Jan 2012 21:37:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Captioning YouTube Videos</title>
		<link>http://jebswebs.net/blog/2010/05/captioning-youtube-videos/</link>
		<comments>http://jebswebs.net/blog/2010/05/captioning-youtube-videos/#comments</comments>
		<pubDate>Thu, 27 May 2010 18:39:45 +0000</pubDate>
		<dc:creator>jeb</dc:creator>
				<category><![CDATA[Accessibility]]></category>
		<category><![CDATA[General Information]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[captioning]]></category>
		<category><![CDATA[speech-to-text]]></category>
		<category><![CDATA[transcription]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://jebswebs.net/blog/?p=509</guid>
		<description><![CDATA[Back in March 2010, I rather gleefully blogged about YouTube&#8217;s latest feature called &#8220;automatic captioning.&#8221; Since that time, I have become bemused and amused by the state of this &#8220;service.&#8221; It seems Google &#8211; the owners and operators of YouTube &#8230; <a href="http://jebswebs.net/blog/2010/05/captioning-youtube-videos/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://jebswebs.net/blog/wp-content/uploads/2009/12/youtube_logo.jpg"><img class="alignright size-full wp-image-276" title="youtube_logo" src="http://jebswebs.net/blog/wp-content/uploads/2009/12/youtube_logo.jpg" alt="You Tube logo" width="264" height="198" /></a>Back in March 2010, <a href="http://jebswebs.net/blog/2010/03/captioning-and-youtube/">I rather gleefully blogged about YouTube&#8217;s latest feature called &#8220;automatic captioning.&#8221;</a> Since that time, I have become bemused and amused by the state of this &#8220;service.&#8221; It seems Google &#8211; the owners and operators of YouTube &#8211; have been using our videos as fodder for their new <a href="http://www.google.com/voice">Google Voice</a> speech-to-text (S-t-T) transcription machine. Google claims, &#8220;It (Google Voice transcripts) will improve over time as our transcription engine gets smarter.&#8221; It is not clear how the Google transcription engine will get &#8220;smarter,&#8221; but I&#8217;m figuring the more the system is used, the more it will learn, and the smarter it will become&#8230; make sense?</p>
<p>Whoever perfects S-t-T stands to make billions in the first year, so it stands to reason Google would be interested in tapping into that treasure chest. But perfecting S-t-T has always been an elusive goal, and anyone worth their salt in the captioning or transcription business knows that human beings still make the best captionists.</p>
<p>That said, at the <a href="http://jebswebs.net/blog/2010/05/the-unconference/">Accessibility Unconference</a> a few weeks ago, the issue of S-t-T came up and there was lots of interest in YouTube&#8217;s &#8220;automatic captioning&#8221; service. I should note here that YouTube currently calls this a &#8220;machine transcription&#8221; service and offers it with some caveats. They also seem, in some ways, to be more interested in the language translation tool that was delivered on YouTube at the same time. Perhaps there is more money to be made in the translation of Chinese to English than in S-t-T.</p>
<p>At the Unconference, there was one gentleman who represented a transcription service company in Massachusetts that used a system based upon a combination of automated S-t-T and human power. He claimed that his system was much faster than regular human-only transcription because machines take the first cut at the transcription and humans complete the final edits. He also claimed it was flawless. Lastly, he noted that the fee for this service ranged on a scale based upon the quality of the audio. Apparently, the poorer the quality of the speech, the more human intervention is necessary, and the higher the price tag.</p>
<p>So all this got me thinking about <a href="http://jebswebs.net/blog/2010/03/captioning-and-youtube/">the experimental YouTube video I created and posted back in early March</a>. The &#8220;automatic captioning,&#8221; er, machine transcription, of my video was indeed a bit hilarious. When I shared it with friends, we all howled at the bizarre transcripts that were produced by the system. It was a bit like playing that <a href="http://en.wikipedia.org/wiki/Chinese_whispers">children&#8217;s game, &#8220;Telephone,&#8221;</a> where you whisper something into someone&#8217;s ear and they whisper it to the next person and so on down the line until the last person says it out loud. The final product never comes out correctly and is usually quite funny. And indeed, the YouTube &#8220;machine transcription&#8221; was much the same.</p>
<p>For my test video, I purposely read a printed text &#8211; as opposed to spontaneous speech &#8211; so I would have an exact copy of the content against which to compare the transcript. The results were marginal at best and, honestly, the transcript made no logical sense. It was also amazing what YouTube&#8217;s machine transcription failed to recognize. It had a particularly difficult time with the words &#8220;accessibility&#8221; and &#8220;web design.&#8221; Go figure.</p>
<p>I recently learned that you could download the YouTube machine transcript, edit it, and then re-post it to the original YouTube video. So, today I finally got around to trying this, and though successful, the process was not without pain.</p>
<p>First, the machine transcript is saved in some unique  YouTubian format (.SBV). The content is readable using a simple text editor and  looks like this:</p>
<pre>0:00:02.179,0:00:07.740
okay so am I- of doing it tested video here
it and I'm going to read this to see if the

0:00:07.740,0:00:09.959
captioning system works well</pre>
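<p>For anyone who would rather script the cleanup than open an editor, here is a minimal sketch of reading a file like the sample above. It assumes each cue is one timing line (start,end written as H:MM:SS.mmm) followed by one or more lines of text; that layout is inferred from the sample, not from any official description of the format, and the function names are made up for illustration.</p>

```python
import re

# Assumed SBV timing line: start,end as H:MM:SS.mmm (inferred from the sample above).
TIMING = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$")


def to_seconds(hours, minutes, seconds, millis):
    """Convert timestamp fields (strings) to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_sbv(text):
    """Return a list of (start, end, caption) tuples from SBV-style text."""
    cues = []
    for raw in text.splitlines():
        line = raw.strip()  # tolerate stray indentation around each line
        match = TIMING.match(line)
        if match:
            fields = match.groups()
            # A timing line opens a new cue; its text lines are collected below.
            cues.append([to_seconds(*fields[:4]), to_seconds(*fields[4:]), []])
        elif line and cues:
            # Any other non-blank line belongs to the most recent cue.
            cues[-1][2].append(line)
    return [(start, end, " ".join(words)) for start, end, words in cues]


if __name__ == "__main__":
    sample = """0:00:02.179,0:00:07.740
okay so am I- of doing it tested video here
it and I'm going to read this to see if the

0:00:07.740,0:00:09.959
captioning system works well
"""
    for start, end, caption in parse_sbv(sample):
        print(start, end, caption)
```

<p>Joining each cue&#8217;s lines into a single string makes the text easier to proofread against the original script; the line breaks can be reintroduced when the file is written back out.</p>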
<p>Fortunately, my <a href="http://www.synchrimedia.com/">MovCaptioner software</a> could import the file and provide an easy way to edit the content. But after editing the text, I could not export the transcript without first merging it with a video. I had to grab the original video from YouTube (which I downloaded in .MP4 format) and then load that into MovCaptioner. Once the editing was finished (see note below about time), I was able to save and export the file in another format (.SUB for Subtitle format) and then upload that transcript file to YouTube.</p>
<p>The final edited .SUB file looks like this:</p>
<pre>00:00:02.17,00:00:07.72
Okay so I am doing a test
video here and I'm going to
read this to see if the

00:00:07.74,00:00:09.94
captioning system works well</pre>
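<p>Comparing the two samples, the timing lines differ only in shape: the .SBV file writes H:MM:SS.mmm, while the exported .SUB file writes HH:MM:SS.cc (two-digit hours, hundredths of a second). The sketch below rewrites one timing line from the first shape to the second. It is only an illustration based on those two samples; the function names are hypothetical, and truncating (rather than rounding) the milliseconds is my assumption, not a documented rule of either format.</p>

```python
import re

# Assumed SBV timestamp shape (H:MM:SS.mmm), inferred from the samples in this post.
SBV_STAMP = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{3})")


def sbv_stamp_to_sub(stamp):
    """Rewrite one H:MM:SS.mmm stamp as HH:MM:SS.cc, truncating to hundredths."""
    match = SBV_STAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SBV timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    # Zero-pad the hours to two digits and keep only the first two millisecond digits.
    return f"{int(hours):02d}:{minutes}:{seconds}.{millis[:2]}"


def sbv_timing_to_sub(line):
    """Convert a 'start,end' SBV timing line to the .SUB-style timing line."""
    start, end = line.split(",")
    return f"{sbv_stamp_to_sub(start)},{sbv_stamp_to_sub(end)}"


if __name__ == "__main__":
    print(sbv_timing_to_sub("0:00:02.179,0:00:07.740"))
```

<p>This only reshapes the timestamps. The slightly different end times in my edited file came from adjusting the cues by hand in MovCaptioner, which no script will do for you.</p>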
<p>As predicted, the most strenuous part of the process is the actual editing of the transcript. Even though the machine transcript had gotten about 50% of the content correct, it still took close to 45 minutes for me to edit the three minutes of video. It is clear that I talk pretty fast, as there were 75 lines of text that had to be edited. I can&#8217;t imagine doing this for anything longer.</p>
<p>So, I&#8217;ve learned a few things here:</p>
<p>First, YouTube&#8217;s &#8220;automatic captioning/machine transcription&#8221; is far from perfect and should not be used, at this point, for anything other than amusement. I am not sure if Google has a timeline for when this will get better, but until it reaches an accuracy of 85% or higher, I would not rely on it as a usable transcription.</p>
<p>Second, while machine transcription followed by human editing is clearly more accurate than machine transcription alone, the time savings may not be all that one might imagine. I&#8217;m guessing that a professional transcriptionist using state-of-the-art equipment would have been able to transcribe the three minutes of video a lot faster than I was able to edit the machine version.</p>
<p>Last, we are still a long way from fully accurate S-t-T, and if you are going to use videos on your websites and want them to be accessible, you are probably still going to have to pay someone to create a transcript/caption file for you.</p>
<p>Note: <a href="http://www.youtube.com/watch?v=6jiFrnFvUJs">jeremykemp has posted a YouTube video</a> comparing human vs. machine transcription on several video clips. You can see the errors produced by the machine transcription.</p>
<p>Second Note: Human-transcribed captioning is still your best bet. The company we have used for transcription/captioning is <a href="http://www.automaticsync.com/captionsync/">AST Sync</a> out of California. They are very easy to work with, provide great customer service, are fast, and are very reasonably priced (about $185 per hour of video; less if you already have a transcript). If you do it yourself, count on a minimum of 3-to-1 in staff time to do a complete transcription with edits and time marks. In other words, a one-hour video will take three hours of your &#8211; or someone&#8217;s &#8211; time to get a quality final product. You can do the math.</p>
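<p>If you would rather let the computer do the math, here is a rough sketch using only the figures from this post: the vendor rate of about $185 per hour of video, the 3-to-1 minimum for doing it yourself, and my own experience of 45 minutes of editing for three minutes of video (15-to-1). The function names and the break-even idea are mine, added for illustration; plug in your own numbers.</p>

```python
def diy_hours(video_minutes, effort_ratio=3.0):
    """Staff hours to caption a video yourself at the given effort-to-runtime ratio."""
    return video_minutes * effort_ratio / 60


def vendor_cost(video_minutes, rate_per_video_hour=185.0):
    """Approximate vendor cost in dollars at the quoted per-hour-of-video rate."""
    return video_minutes / 60 * rate_per_video_hour


def break_even_hourly_wage(effort_ratio=3.0, rate_per_video_hour=185.0):
    """Hourly staff cost at which doing it yourself costs the same as the vendor."""
    return rate_per_video_hour / effort_ratio


if __name__ == "__main__":
    print("One-hour video, 3-to-1 effort:", diy_hours(60), "staff hours")
    print("One-hour video, vendor:", vendor_cost(60), "dollars")
    print("My three-minute test at 15-to-1:", diy_hours(3, effort_ratio=15), "staff hours")
    print("Break-even wage at 3-to-1:", round(break_even_hourly_wage(), 2), "dollars per hour")
```

<p>The break-even figure is simply the vendor rate divided by the effort ratio, so the slower your editing goes, the sooner paying a captioning service becomes the cheaper choice.</p>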
]]></content:encoded>
			<wfw:commentRss>http://jebswebs.net/blog/2010/05/captioning-youtube-videos/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

