[whatwg] SRT research
philipj at opera.com
Wed Aug 25 00:28:12 PDT 2010
Here's the script used: http://pastebin.com/KhdsydzJ
Input was counted as valid UTF-8 if text.decode('utf-8') didn't raise an
exception, and as ASCII if text.decode('ascii') didn't. I haven't tried to
analyze which other encodings were used.
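
For readers who don't want to open the pastebin, here is a minimal sketch of
that kind of check. It is not the script linked above. It assumes Python 3
and raw bytes as input, and the function name classify_encoding is made up
for illustration.

```python
def classify_encoding(data: bytes) -> str:
    """Label raw bytes as 'ascii', 'utf-8', or 'other'.

    Hypothetical helper: a file counts as a given encoding only if
    decoding it with that codec raises no exception.
    """
    # Try the stricter codec first: every ASCII file is also valid UTF-8.
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # Anything else (e.g. a legacy 8-bit encoding) is left unidentified.
    return "other"
```

A lone byte such as 0x80 or a windows-1252 accented letter fails both
decodes and ends up in the "other" bucket, which is all this test can say
about it.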
On Tue, 24 Aug 2010 21:47:14 +0200, Kevin Marks <kevinmarks at gmail.com> wrote:
> When you say 'invalid utf8', what were you seeing? win1252 encoding of
> accents? Or illegal Unicode characters like 0x80?
> On Tue, Aug 24, 2010 at 4:20 AM, Philip Jägenstedt
> <philipj at opera.com> wrote:
>> As mentioned deep in another thread, I've gotten hold of a big batch of
>> files and have collected some statistics, which may help inform
>> decisions on the WebSRT format. Many thanks to OpenSubtitles for
>> providing the data.
>> Philip Jägenstedt