Creating Subtitles/Captions for a Video Project
I’m working on a quick 20-minute video for a non-profit group (and I do mean quick: less than two weeks from initial contact to the premiere). Everything has come together pretty nicely and I’ve finished the editing to my satisfaction in Cinelerra. Because some of the interviewees in the video have impaired speech, we’ve decided that it would be good to add subtitles in English to help viewers follow the audio. Since we don’t want to stigmatize the people with speech difficulties, I’m subtitling the whole video. This is fine since it isn’t too long, and it has the added benefit of making the video accessible to the hard of hearing.
Rather than use Cinelerra’s title effect to add the captions, I decided to teach myself how to do it the “proper” way with a separate subtitle file.
Transcribing the Subtitles
Surprisingly, it isn’t completely obvious how and when to place the subtitles in the video. How long should they appear on screen? Should they be synchronized precisely with the video? How do you punctuate people’s verbal stream-of-consciousness ramblings? Do you include the speaker’s pauses and “um”s and so on? What do you do about words you simply cannot decipher? There’s lots of technical info out there on closed captioning, but not much in the way of tips for the beginner who is creating do-it-yourself video. The best synopsis I could find was a quote from this thread about live streaming closed captioning:
The real keys are timing and readability. Captioning is not just typing what the person is saying. You also decide when the caption appears and disappears. Tying a new caption to a shot change or any sort of movement that indicates that a person is about to speak is a good idea. The idea is that the caption itself should work with the rest of the visuals and not be too distracting. Don’t let any captions be too short or too long either. Right around two seconds a piece is a good rule of thumb. As far as readability, be wary of where you break a sentence if a caption is more than one line long. Again, the idea here is that the captions should flow.
Based on these thoughts, plus some common sense, I’ve devised the following informal guidelines for my project:
1) Try to transcribe precisely all of the actual words. Indicate long pauses by creating a new caption, or with ellipses or whatever punctuation seems appropriate. Ignore interjected particles such as “uh” or “um” unless they seem to be essential to the meaning of the utterance.
2) Synchronize the caption to start and end at the same time as the audio, unless the audio is less than two seconds long, in which case let the caption linger for two seconds.
3) Try to break captions on cuts in the video where possible.
4) Watch out for dramatic conclusions to sentences — keep the parts synchronized separately, so you don’t give away the ending in advance.
5) When a speaker is talking at length, use two-line captions as required and try to break on sentence clauses or on pauses in the speech. Shorter is usually better.
Creating the Subtitle File
The technical process of creating subtitles is fairly easy, if tedious. To transcribe most of the titles I used the subtitleeditor application (tips here and here and here), which can be easily installed in Ubuntu. Unfortunately, the application sometimes seemed to arbitrarily ignore what I had typed, forcing me to re-enter all or most of many subtitles. Just for this step, I switched to the “Gaupol Subtitle Editor”, which worked much the same way and didn’t seem to have the same bug. Next time, I’d investigate typing the transcript in a spreadsheet and then importing the data into the subtitle editor program. I found that transcribing and synchronizing had to be done as two separate steps anyhow.
For the synchronizing, I found the easiest method was to go back to the first Subtitle Editor app, use it to generate a waveform from the audio track of my video, then use the mouse and keyboard to set the correct start and end points for each subtitle. I set the “play/pause” shortcut to Super (Windows key) plus space, “short” (skip backward) to Super-left arrow, and “short” (skip forward) to Super-right arrow (yes, there are two keyboard shortcuts with the same name, but you can tell which is which by clicking on them in the shortcuts preferences). The default is for left-mouse-click to set the start point for the selected subtitle line, and right-mouse-click to set the end point. Middle-click restarts the playback at the point in the wave timeline where you clicked. Since I already had all the words transcribed, I just had to run through the audio track, listening and watching the waveform graph, using the shortcuts to move around, and click to select the right in and out points for each line. I found I had to split some of the subtitles, but that was also easy using the Subtitle Editor tool. Overall, synchronization went pretty smoothly.
There are many different subtitle formats, but the default format worked fine for me.
Combining the Subtitles with the Video
According to the Cinelerra manual, there are three obvious choices for combining subtitles with your video:
1) Distribute it with your video. People will have to load the appropriate subtitle file in their video player to actually see the subtitles.
2) Use it with dvdauthor, to add the subtitles in a DVD. Read dvdauthor’s documentation for more information.
3) Incrust the subtitles into the video using mencoder.
I used the second method, which seems to be the most flexible option, but also the most complicated. In my case, I wanted to force the subtitles to appear rather than allowing the user to select whether to play them. Dvdauthor provides a tool called spumux, which allows you to add a subtitle track to the video. Then, you use dvdauthor (as you normally would) to build the dvd filesystem. From there, the process is the same as for burning a non-subtitled video (I’ll write up my own process at some point, but there’s already a very good explanation from Crazed Mule: look here for exporting from Cinelerra and look here for creating a DVD). You’ll need to create an xml file for spumux and another for dvdauthor.
Here is my subtitle xml file for the first step:
<subpictures>
  <stream>
    <textsub filename="/home/kevin/Video/video_ykacl/project_xml/en_subtitles.sub" characterset="UTF-8"
             font="Arial.ttf"
             top-margin="20" bottom-margin="30" subtitle-fps="29.97"
             movie-fps="29.97" movie-width="720" movie-height="480"
             force="yes" />
  </stream>
</subpictures>
Note the force="yes" option, which tells the DVD player to always display these subtitles. Also note that for spumux to work, you MUST have a copy of, or a symbolic link to, the TrueType font file (in this case Arial.ttf, but use whatever you like) in the .spumux folder in your home directory. I created a symbolic link to the Arial.ttf that comes with the msttcorefonts package, so that package needs to be installed for the link to work.
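A minimal sketch of setting up the font link. It assumes msttcorefonts installed the font under /usr/share/fonts/truetype/msttcorefonts, which may differ on your system. The FONT_SRC variable is just a placeholder name for this example.

```shell
# Assumed location of Arial.ttf from the msttcorefonts package on Ubuntu.
# If your system differs, find the file with: fc-list | grep -i arial
FONT_SRC="${FONT_SRC:-/usr/share/fonts/truetype/msttcorefonts/Arial.ttf}"

# spumux looks for its fonts in ~/.spumux, so make sure the folder exists
mkdir -p "$HOME/.spumux"

# -s creates a symbolic link, -f replaces an older link with the same name
ln -sf "$FONT_SRC" "$HOME/.spumux/Arial.ttf"
```

If you would rather not depend on the package staying installed, copying the file with cp instead of linking it works just as well.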
Once you’ve created the XML, you run spumux to combine your subtitle file with your video. I put the spumux command in a script in which I defined $VIDEO_PROJECT and $VIDEO_TMP; you can substitute whatever paths are appropriate.
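As a rough sketch of the spumux step: spumux reads the rendered video on standard input and writes the subtitled video to standard output. The add_subtitles function name, the default directories and the file names in the example call are placeholders for illustration, not the values from the actual script.

```shell
# Placeholder defaults; point these at your own project and scratch folders
VIDEO_PROJECT="${VIDEO_PROJECT:-$HOME/Video/my_project}"
VIDEO_TMP="${VIDEO_TMP:-$HOME/Video/my_project_tmp}"

add_subtitles() {
    # $1 = spumux XML file, $2 = rendered MPEG without subtitles,
    # $3 = new MPEG that will contain the subtitle stream
    # -s 0 puts the subtitles in subtitle stream 0
    spumux -s 0 "$1" < "$2" > "$3"
}

# Example call (file names are hypothetical):
# add_subtitles "$VIDEO_PROJECT/subtitle.xml" \
#     "$VIDEO_TMP/full_render.mpg" "$VIDEO_TMP/full_render_subtitle.mpg"
```

Writing to a new file rather than over the input means the un-subtitled render is still around if the subtitles need another pass.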
Now you need to create another xml file for dvdauthor to use when building the dvd menu and filesystem (in my case the video is supposed to play automatically, so there is no actual menu):
<dvdauthor>
  <vmgm />
  <titleset>
    <titles>
      <subpicture lang="en" />
      <pgc>
        <pre> subtitle=64; </pre>
        <vob file="/home/kevin/Video/video_tmp/video_ykacl_tmp/full_render_subtitle.mpg" />
      </pgc>
    </titles>
  </titleset>
</dvdauthor>
The most important option in this file is subtitle=64, which tells the DVD player to automatically display subtitle track 0 (see the links below for details on this). Now you just run dvdauthor on this file to build the DVD filesystem.
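A sketch of the dvdauthor call, assuming the XML file does not set its own destination, so the output folder is given on the command line. The build_dvd name and the paths in the example call are placeholders.

```shell
build_dvd() {
    # $1 = dvdauthor XML file, $2 = folder to hold the DVD filesystem
    # -o sets the output folder, -x names the XML file to process
    dvdauthor -o "$2" -x "$1"
}

# Example call (paths are hypothetical):
# build_dvd "$VIDEO_PROJECT/dvdauthor.xml" "$VIDEO_TMP/dvd"
```

When it finishes, the output folder should contain the VIDEO_TS directory that a DVD player expects.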
At this point you should have a working DVD filesystem ready to be burned onto an actual DVD. Before burning, you can test your output by playing the DVD folder from your hard drive in a media player.
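One way to run that test, assuming mplayer is installed. It is only one option, and any player that can open a DVD folder should do. The preview_dvd name and the example path are placeholders.

```shell
preview_dvd() {
    # $1 = folder that contains the VIDEO_TS directory
    # dvd://1 plays title 1; -dvd-device lets mplayer read a folder
    # on disk instead of a physical drive
    mplayer dvd://1 -dvd-device "$1"
}

# Example call (path is hypothetical):
# preview_dvd "$VIDEO_TMP/dvd"
```

The subtitles should come up on their own during playback if the forced display is working.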
More Information About Subtitles
An explanation of how to add subtitles in dvdauthor is here.
Useful details on spumux are here.
Good examples are here.