Manipulate subtitles in Go (.srt, .ssa/.ass, .stl, .ttml, .vtt (webvtt), teletext, etc.)

Overview

This is a Golang library to manipulate subtitles.

It allows you to manipulate srt, stl, ttml, ssa/ass, webvtt and teletext files for now.

Available operations are parsing, writing, syncing, fragmenting, unfragmenting, merging and optimizing.

Installation

To install the library and command line program, use the following:

go get -u github.com/asticode/go-astisub/...

Using the library in your code

WARNING: the code below doesn't handle errors for readability purposes. However, you SHOULD!

// Open subtitles
s1, _ := astisub.OpenFile("/path/to/example.ttml")
s2, _ := astisub.ReadFromSRT(bytes.NewReader([]byte("00:01:00.000 --> 00:02:00.000\nCredits")))

// Add a duration to every subtitle (syncing)
s1.Add(-2*time.Second)

// Fragment the subtitles
s1.Fragment(2*time.Second)

// Merge subtitles
s1.Merge(s2)

// Optimize subtitles
s1.Optimize()

// Unfragment the subtitles
s1.Unfragment()

// Write subtitles
s1.Write("/path/to/example.srt")
var buf = &bytes.Buffer{}
s2.WriteToTTML(buf)

Using the CLI

If astisub has been installed properly you can:

  • convert any type of subtitle to any other type of subtitle:

      astisub convert -i example.srt -o example.ttml
    
  • fragment any type of subtitle:

      astisub fragment -i example.srt -f 2s -o example.out.srt
    
  • merge any type of subtitle into any other type of subtitle:

      astisub merge -i example.srt -i example.ttml -o example.out.srt
    
  • optimize any type of subtitle:

      astisub optimize -i example.srt -o example.out.srt
    
  • unfragment any type of subtitle:

      astisub unfragment -i example.srt -o example.out.srt
    
  • sync any type of subtitle:

      astisub sync -i example.srt -s "-2s" -o example.out.srt
    

Features and roadmap

  • parsing
  • writing
  • syncing
  • fragmenting/unfragmenting
  • merging
  • ordering
  • optimizing
  • .srt
  • .ttml
  • .vtt
  • .stl
  • .ssa/.ass
  • .teletext
  • .smi

Comments
  • Support reading of non utf-8 ttml

    Not sure if you would be interested in this change? I had to deal with some pesky TTML files that were created on Windows and were encoded in the utf-16le charset (ironically not valid TTML). This was semi-annoying to resolve. The problem is explained here: https://groups.google.com/forum/#!topic/golang-nuts/tXcECEKC2rs

    The problem lies with encoding/xml's design: in order to use the charset reader, the xml library needs to examine the first line of text from the xml file (where the encoding is specified). Unfortunately, that first line already contains invalid UTF-8, and the library barfs before it even figures out which encoding to pass to our charset reader.

    This change should allow passing in utf-16 or utf-8 encoded files into the reader.

    opened by saintberry 11
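
A minimal sketch of the idea described in the comment above, not the code of that PR: transcode UTF-16 input to UTF-8 before encoding/xml ever sees it. The helper name `decodeUTF16` is hypothetical, the sketch assumes the file starts with a byte order mark, and it uses only the standard library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/xml"
	"fmt"
	"io"
	"unicode/utf16"
)

// decodeUTF16 (hypothetical helper) converts UTF-16 input that starts with a
// byte order mark to UTF-8. Input without a UTF-16 BOM is returned unchanged.
func decodeUTF16(b []byte) []byte {
	if len(b) < 2 {
		return b
	}
	var order binary.ByteOrder
	switch {
	case b[0] == 0xFF && b[1] == 0xFE:
		order = binary.LittleEndian
	case b[0] == 0xFE && b[1] == 0xFF:
		order = binary.BigEndian
	default:
		return b
	}
	b = b[2:] // drop the BOM
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:]))
	}
	return []byte(string(utf16.Decode(units)))
}

func main() {
	// Build a small UTF-16LE document with a BOM for demonstration.
	src := []byte{0xFF, 0xFE}
	for _, u := range utf16.Encode([]rune(`<?xml version="1.0" encoding="utf-16"?><tt>ok</tt>`)) {
		src = append(src, byte(u), byte(u>>8))
	}

	d := xml.NewDecoder(bytes.NewReader(decodeUTF16(src)))
	// The bytes are already UTF-8, so the declared charset can be passed through.
	d.CharsetReader = func(_ string, r io.Reader) (io.Reader, error) { return r, nil }

	var tt struct {
		Text string `xml:",chardata"`
	}
	err := d.Decode(&tt)
	fmt.Println(tt.Text, err)
}
```

Transcoding up front sidesteps the chicken-and-egg problem: the decoder can read the XML declaration because the bytes are valid UTF-8 by the time it looks at them.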
  • stl: have vertical position no lower than 1

    According to EBU 3264 (https://tech.ebu.ch/docs/tech/tech3264.pdf):

    For teletext subtitles, VP contains a value in the range 1-23 decimal (01h-17h)
    corresponding to the teletext row number of the first subtitle row.
    
    For in-vision subtitles, VP contains a value in the range 0..NN decimal,
    where NN is the maximum number of rows indicated in the MNR field in the GSI block
    (Note: NN cannot be greater than 99 decimal (63h)).
    This VP represents the number of row locations from the top of the screen to the first subtitle row.
    

    My motivation: some STL subs I encountered had an erroneous zero vertical position.

    opened by dlecorfec 9
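
As an illustration of the rule quoted above for teletext subtitles, here is a small sketch that clamps a vertical position into the 1-23 range. The function name `clampTeletextVP` is hypothetical and this is not the change made in the PR, which only concerns the lower bound.

```go
package main

import "fmt"

// Bounds for teletext subtitles, taken from the EBU text quoted above.
const (
	minTeletextVP = 1
	maxTeletextVP = 23
)

// clampTeletextVP (hypothetical helper) forces a vertical position into the
// range allowed for teletext subtitles.
func clampTeletextVP(vp int) int {
	if vp < minTeletextVP {
		return minTeletextVP
	}
	if vp > maxTeletextVP {
		return maxTeletextVP
	}
	return vp
}

func main() {
	// An erroneous zero position, a valid one, and one past the last row.
	for _, vp := range []int{0, 22, 30} {
		fmt.Println(vp, "->", clampTeletextVP(vp))
	}
}
```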
  • TTML ticks higher than 15 minutes display invalid values

    Tick values higher than a certain amount will result in incorrect times.

    For instance, take this test TTML:

    <?xml version="1.0" encoding="UTF-8"?>
    <tt xmlns="http://www.w3.org/ns/ttml" xmlns:tt="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:tickRate="10000000" ttp:version="2" xml:lang="ja">
     <head>
      <styling>
       <initial tts:backgroundColor="transparent" tts:color="white" tts:fontSize="6.000vh"/>
       <style xml:id="style0" tts:textAlign="center"/>
       <style xml:id="style1" tts:textAlign="start"/>
       <style xml:id="style2" tts:ruby="container" tts:rubyPosition="auto"/>
       <style xml:id="style3" tts:ruby="base"/>
       <style xml:id="style4" tts:ruby="text"/>
       <style xml:id="style5" tts:ruby="text"/>
      </styling>
      <layout>
       <region xml:id="region0" tts:displayAlign="after"/>
      </layout>
      </head>
     <body xml:space="preserve">
      <div>
       <p xml:id="subtitle1" begin="18637368750t" end="18676157500t" region="region0" style="style0"><span style="style1">テソプ<span style="style2"><span style="style3">の所だ</span><span style="style4">カン食ン</span></span>食おう<br/><span style="style2"><span style="style3">江陵</span><span style="style5">カンヌン</span></span>で刺身でも食おう</span></p>
      </div>
     </body>
    </tt>
    

    The resulting ASS is:

    [Script Info]
    
    [V4 Styles]
    Format: Name
    Style: italic
    Style: span
    
    [Events]
    Format: Start, End, Text
    Dialogue: 00:0-8:0-44..48,00:0-8:0-43..02,TEST
    

    Probably something to do with ticks not being int64 but normal integers, not sure though.

    bug 
    opened by rrooij 8
  • add colors to webvtt output

    I looked at a TTML to WebVTT converted file from https://transcribefiles.net/other/pages/caption-subtitle-converter.htm, they used the font attribute, so here it is ... Not sure if it's OK to rely on TTMLColor but at least it has the proper value (for now).

    opened by dlecorfec 8
  • ttml to vtt region settings migration

    1. WEBVTT: Nil pointer exception when the region doesn't have parent style.
    2. TTML to VTT region settings migrated.
    3. WebVTT read cue settings from parent style if no setting present inline
    4. TTML to VTT cue position settings migrated.
    opened by discovery-avishekgulshan 7
  • I couldn't manage to find "Item" components when reading SRT

    I wanted to get the ID number of a specific item of an SRT file, but it returns nil.

    I tried this code:

    s1, _ := astisub.OpenFile("example-in.srt")
    fmt.Println(s1.Items[0].Region)

    bug 
    opened by MerNat 7
  • changed stl default chars and rows value to comply with standard

    Hi,

    I need the width and height values in the stl file metadata (maximum number of characters per line and maximum number of rows) to be set to standardized values.

    Currently they are :

    maximumNumberOfDisplayableCharactersInAnyTextRow: 40, maximumNumberOfDisplayableRows: 23,

    which is way too much especially for Rows

    I changed it to

    maximumNumberOfDisplayableCharactersInAnyTextRow: 37, maximumNumberOfDisplayableRows: 2,

    Ideally, it would be nice to be able to edit these values through flags when converting subtitles.

    For information on the standardized values, see https://tech.ebu.ch/docs/tech/tech3360.pdf

    page 17, under "1.4 conversion strategies", second paragraph, which explains:

    Specifically the Teletext format has a line length limit of 40 characters, which includes the control characters to select colours and to activate background display for normal pages (‘Start box’ codes). Teletext pages marked as ‘sub-title’ or newsflash pages within a Teletext service should be displayed as ‘boxed text’ by a receiver. The control code overhead required for boxed text has resulted in a ‘common practise’ of using a maximum of 37 (or 36) characters in a subtitle row for text.

    and 2 rows seems to be the most commonly used maximum number of lines

    thanks a lot

    Best regards

    Fabien

    opened by flenoir 7
  • TTML: Fixed improper spacing between captions

    1. Fixed improper spacing between captions while converting TTML -> VTT
    2. Converted TTML style attributes to pointers
    3. Fixed Webvtt out not using parent style attributes in the "region" section
    opened by discovery-avishekgulshan 5
  • Cheetah CAP Files?

    Hello there- I apologize if this is the wrong venue for a simple question.

    Do you plan to support CAP files in this library in the near future?

    Thanks.

    question 
    opened by davidkotas 5
  • webvtt,srt: Tolerate edge whitespace (and print line numbers on error)

    Hi! I'm by no means an expert of WebVTT format but this fixed a parsing problem for me. Feel free to reject if it's a bad change, or improve it or whatever you'd like.

    PS. It'd be really helpful to output the line number when an error is encountered. Would you be open to a PR for that?

    opened by mholt 5
  • Speaker not included when writing VTT

    This is a fantastic module, thank you. I've noticed that when I read VTT files it sets the VoiceName field. However, when I write the file back out, it doesn't include the speaker.

    I'm not sure if this affects other formats, as I've only tested with VTT.

    question 
    opened by agorman 4
  • converting an ass subtitle that contains more than one language to vtt fails

    cat /tmp/Armageddon.1998.ass

    Title: CNXP
    Original Script: lzqc
    PlayResX: 384
    PlayResY: 288
    Timer: 100.0000
    
    [V4+ Styles]
    Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
    Style: chs,simhei,20,&H00ffffff,&H0000ffff,&H00000000,&H80000000,1,0,0,0,90,90,0,0.00,1,2,2,2,20,20,17,1
    
    [V4 Styles]
    Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
    Style: eng,Arial Narrow,12,&H00ffeedd,&H00ffc286,&H00000000,&H80000000,-1,0,1,1,0,2,20,20,4,0,1
    
    [Events]
    Format: Marked, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    Dialogue: 0,0:00:53.11,0:00:55.23,*eng,,0000,0000,0000,,This is the Earth at a time...
    Dialogue: 0,0:00:55.36,0:01:00.19,*eng,,0000,0000,0000,,when the dinosaurs roamed a lush and fertile planet.
    Dialogue: 0,0:01:07.58,0:01:11.03,*eng,,0000,0000,0000,,A piece of rock just six miles wide...
    Dialogue: 0,0:00:53.11,0:00:55.23,*chs,,0000,0000,0000,,这是地球
    Dialogue: 0,0:00:55.36,0:01:00.19,*chs,,0000,0000,0000,,那是恐龙称霸的时代 万物滋长 欣欣向荣
    Dialogue: 0,0:01:07.58,0:01:11.03,*chs,,0000,0000,0000,,一块只有六里宽的石头
    

    To reproduce the bug:

    go install github.com/asticode/go-astisub/[email protected]
    astisub convert -i /tmp/Armageddon.1998.ass -o /tmp/out.vtt
    

    It prints: 2022/12/13 21:32:07 astisub: style *eng not found while opening /tmp/Armageddon.1998.ass

    opened by tonytony2020 6
  • not support .ass file?

    s1, err := astisub.OpenFile("Call.Me.by.Your.Name.2017.BluRay.ass")
    if err != nil {
    	return nil, err
    }
    

    // s1.Items is nil. Why? If the subtitle file is .srt, it works.

    opened by rhettli 4
  • Use Items.Index in VTT write method

    The VTT write method currently uses the array index rather than the item's Index; I think using the item's Index is better :) I also added a FixIndex method to reindex Items. This is useful after calling fragment and unfragment.

    opened by mysamimi 1
  • add a debug option to astisub to output information on a subtitle

    Hello!

    This PR adds a "-d" option to the astisub command, which sets a bool in the option struct passed to Open(). When reading an STL file, it outputs to stdout:

    • some values of the GSI block
    • subtitle lines with their number, start time, end time, vertical position and number of lines (with exclamation marks if something is odd in the timestamps)
    • the total number of errors encountered

    It is designed for the interactive human user, no concern has been given to automatic parsing of this output.

    STL:GSIBlock
    STL:  DisplayStandardCode:0x31
    STL:  TotalNumberOfTTIBlocks:959
    STL:  TotalNumberOfSubtitleGroups:1
    STL:  TotalNumberOfSubtitles:959
    STL:
    STL: #0000   00000.080 - 00002.000   vp=22	lines=1 [*Thème musical de l'émission]
    STL: #0001   00002.200 - 00004.800   vp=22	lines=1 [...]
    STL: #0002   00005.000 - 00008.760   vp=20	lines=2 [-Mais oui ! Mais oui, les petits potes sont là,]
    [...]
    STL: #0378 ! 00990.560 - 00988.640 ! vp=16	lines=1 [avant celui pour le public belge.]
    [...]
    STL: #0958   02614.560 - 02618.400   vp=20	lines=2 [Sous-titrage ST' 501]
    STL: 1 error(s)
    

    Maybe it would be a better idea to do the lines output at the end of subtitles.go:Open, so it would be common to all kinds of subtitles and not clutter stl.go.

    An unprotected global struct keeps the debug state: debug is not really meant to be enabled when using astisub as a lib.

    Not sure if it's worth being incorporated in astisub, but I keep going back to that debug output when problems are signaled in our subs ;)

    opened by dlecorfec 3
  • ssa code

    When I read this line:

    Dialogue: 0,0:00:36.38,0:00:38.84,DX,NTP,0,0,0,!Effect,Even solo players need to\Ntake this more seriously{\fscx300}-{\r}
    
    

    the result is nil. I read the code at https://github.com/asticode/go-astisub/blob/master/ssa.go#L1068

    and found that this func does not handle the text before {\fscx300}. Is this a feature or a bug?

    bug 
    opened by eager7 3
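
To illustrate the behaviour the reporter expected, the sketch below removes SSA/ASS override blocks such as {\fscx300} and {\r} while keeping all the text around them. It is a standalone illustration with hypothetical names (`overrideBlock`, `plainText`), not the library's parsing code, and it discards the styling information instead of interpreting it.

```go
package main

import (
	"fmt"
	"regexp"
)

// overrideBlock matches one SSA/ASS override block: an opening brace, any
// characters other than a closing brace, then the closing brace.
var overrideBlock = regexp.MustCompile(`\{[^}]*\}`)

// plainText (hypothetical helper) strips override blocks and keeps the text
// before, between and after them.
func plainText(s string) string {
	return overrideBlock.ReplaceAllString(s, "")
}

func main() {
	// Text field of the Dialogue line from the report above.
	fmt.Println(plainText(`Even solo players need to\Ntake this more seriously{\fscx300}-{\r}`))
}
```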
Owner
Quentin Renard
Freelance | Senior backend developer (GO)