I want to learn more about how video codecs work so someday I can grow up to be a Jedi-master Video Engineer. To aid me on this epic quest, I'm doing the one-week "mini-batch" at the Recurse Center. Here are my notes from Day 4.

Late Night Brainstorming

So, on my walk home last night, I had a whole flood of new ideas for this project come into my head. Here's a quick overview of some of the most interesting ones:

  • Make an interactive GDB-style REPL for exploring the elements in a Matroska container - I could almost definitely do this in a few hours
  • Compile to WebAssembly and create a client-side Matroska parser that can:
    • present a tree explorer using [+] and [-] buttons, a bit like Finder
    • create a visualization of frame size along a timeline, possibly passing the same data to a video element and seeking to the frame on hover
    • decode vpx frames, analyze the frame data, and visualize it in a canvas
  • Some mashup of these things

Setting Intentions for the Day

Okay, so I'm committing to digging into decoding a frame. I feel like if I'm willing to properly ask for help, so many more interesting things will be unlocked for me. Plus...I've told literally everyone that's what I want to do! After I finish that (and maybe only that today), then I'll try compiling to WebAssembly and doing fun things in the browser.

A few duckduckgo's later, I find this super helpful looking header file in the bitstream guide. Hmm, after reading that, it still seems like I want to be looking at simple_decoder.c to grok libvpx, and perhaps just use the header file as a bit of documentation about the data structures.

My goal is to simply rewrite simple_decoder.c so that I can just feed it a keyframe (I'll figure out how to get just one keyframe later on).

Rewriting simple_decoder.c in C

Right now, I'm totally buggered by all the video_reader_* stuff coming out of video_reader.c. The rewrite needs to entirely replace that code.

0 - Can I recompile it?

Since this is stored in git, I can just edit that file. Let's make sure I really do know how to recompile this file. I'll add a gratuitous print statement at the beginning and just try to run that. I add a die("Hello!") as the first line in main(), and then recompile it like this:

$ cd build
$ make
make[1]: Nothing to be done for `all'.
    [DEP] examples/simple_decoder.c.d
    [CC] examples/simple_decoder.c.o
    [LD] examples/simple_decoder
make[1]: Nothing to be done for `all'.
make[1]: Nothing to be done for `all'.

Great! Make was smart enough to only recompile simple_decoder for me. Fantastic! Let's see it work:

$ ./examples/simple_decoder
Hello!
Usage: (null) <infile> <outfile>

That's odd -- well, my code clearly ran, but the program didn't run the way I expected at all. I thought that die() would actually stop execution, but we still saw the usage info code being run, so that just proves I don't know C.
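
A likely explanation, based on my reading of libvpx's tools_common.c: die() does stop the program, but it does so by printing the message and then calling usage_exit(), which each example program defines itself to print the usage line and exit. The (null) would then be the program name, which main() hasn't stored yet when die() is the very first line. Here's a minimal sketch of that pattern. All the *_sketch names are made up for illustration, and this is not the libvpx source.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the program name that main() would normally
 * copy from argv[0]. It is still NULL if die_sketch() runs first. */
static const char *exec_name_sketch = NULL;

/* Build the usage line. A NULL name is rendered as "(null)" explicitly,
 * because passing NULL to %s is undefined behavior in standard C. */
int format_usage_sketch(char *buf, size_t n, const char *exec_name) {
  return snprintf(buf, n, "Usage: %s <infile> <outfile>\n",
                  exec_name ? exec_name : "(null)");
}

/* Print the usage line and end the program. */
void usage_exit_sketch(void) {
  char line[256];
  format_usage_sketch(line, sizeof line, exec_name_sketch);
  fputs(line, stderr);
  exit(EXIT_FAILURE);
}

/* Print a printf-style message, then hand off to usage_exit_sketch(),
 * which never returns. */
void die_sketch(const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  vfprintf(stderr, fmt, ap);
  va_end(ap);
  fputc('\n', stderr);
  usage_exit_sketch();
}
```

If that reading is right, seeing "Hello!" followed by the usage line is exactly what a working die() looks like.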

1 - Figure out what video_reader is doing

The main loop of the program looks like this:

  while (vpx_video_reader_read_frame(reader)) {
    vpx_codec_iter_t iter = NULL;
    vpx_image_t *img = NULL;
    size_t frame_size = 0;
    const unsigned char *frame =
        vpx_video_reader_get_frame(reader, &frame_size);
    if (vpx_codec_decode(&codec, frame, (unsigned int)frame_size, NULL, 0))
      die_codec(&codec, "Failed to decode frame.");
    while ((img = vpx_codec_get_frame(&codec, &iter)) != NULL) {
      vpx_img_write(img, outfile);
      ++frame_cnt;
    }
  }

There are two calls to video_reader_* things in this part:

  1. while (vpx_video_reader_read_frame(reader)) { - just from squinting at this code, it looks to me like the reader contains all the state, and vpx_video_reader_read_frame must be returning false or NULL whenever reader has reached the end, possibly returning something truthy before then.
  2. const unsigned char *frame = vpx_video_reader_get_frame(reader, &frame_size); - a few observations here:
  • frame_size is initialized as 0, but apparently is nonzero later on. So, it must be passed by reference (as a pointer, &frame_size) so that vpx_video_reader_get_frame can mutate it...presumably because multiple return values are hard in C
  • frame holds all the coded data
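
To make those two guesses concrete, here's a toy version of the same pattern: one function advances the reader and returns truthy until the input runs out, and another hands back a pointer to the current frame while reporting its size through an out-parameter. ToyReader and both toy_reader_* functions are hypothetical, and the one-length-byte-per-frame layout is invented for the sketch. It is not how libvpx or IVF actually store frames.

```c
#include <stddef.h>

/* Hypothetical in-memory reader. Frames sit back to back in `data`, each
 * prefixed by a single length byte (an invented layout, not IVF). */
typedef struct {
  const unsigned char *data;  /* whole input buffer */
  size_t len;                 /* total bytes in data */
  size_t pos;                 /* offset of the next unread length byte */
  const unsigned char *frame; /* payload of the current frame */
  size_t frame_size;          /* payload size of the current frame */
} ToyReader;

/* Advance to the next frame. Returns 1 on success, 0 once the input is
 * exhausted or truncated, so it can drive a while loop. */
int toy_reader_read_frame(ToyReader *r) {
  if (r->pos >= r->len) return 0;
  size_t size = r->data[r->pos];
  if (size > r->len - r->pos - 1) return 0; /* payload would overrun */
  r->frame = r->data + r->pos + 1;
  r->frame_size = size;
  r->pos += 1 + size;
  return 1;
}

/* Return the current frame and write its size through the out-parameter,
 * which is how C fakes a second return value. */
const unsigned char *toy_reader_get_frame(const ToyReader *r, size_t *size) {
  if (size) *size = r->frame_size;
  return r->frame;
}
```

A caller would loop with while (toy_reader_read_frame(&r)) and call toy_reader_get_frame(&r, &frame_size) inside the loop, which is the same shape as the main loop above.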

So, theoretically, if I could hardcode the values of frame_size and frame, I could remove all the video_reader_* things. Great! Let's try printing those things out in the shell on the first iteration, then crash the program. So the main loop becomes:

  while (vpx_video_reader_read_frame(reader)) {
    vpx_codec_iter_t iter = NULL;
    vpx_image_t *img = NULL;
    size_t frame_size = 0;
    const unsigned char *frame =
        vpx_video_reader_get_frame(reader, &frame_size);

    printf("frame_size=%d", frame_size);  // THIS IS PRINT STATEMENT!

    if (vpx_codec_decode(&codec, frame, (unsigned int)frame_size, NULL, 0))
      die_codec(&codec, "Failed to decode frame.");
    while ((img = vpx_codec_get_frame(&codec, &iter)) != NULL) {
      vpx_img_write(img, outfile);
      ++frame_cnt;
    }
  }

Now the compiler complains that my format string is wrong...and has helpfully suggested replacing %d with %zu (whatever that means!). Great! Now I can see frame_size being printed. Let's try that again for the frame variable too...apparently the right format string is %s (I actually know that one from golang and python!). Also, I can't imagine that a string is the right way to visualize encoded binary data...and it's not. I see nasty unicode things in the terminal (but it did compile).
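
For the record, %zu is the conversion for size_t: the z is a length modifier meaning "size_t-sized" and the u means unsigned decimal. %d expects a plain int, which is why the compiler objected. And %s expects a NUL-terminated string, so on binary data it stops at the first zero byte or runs off the end of the buffer. A minimal sketch (format_frame_size is a made-up helper name):

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t with %zu. Returns the number of characters that the
 * full message needs, as snprintf does. */
int format_frame_size(char *buf, size_t n, size_t frame_size) {
  return snprintf(buf, n, "frame_size=%zu", frame_size);
}
```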

Let's actually stop and learn about C-style format strings. A quick skim through man printf (which I suspect was the shell's printf(1) page rather than the C library's printf(3)) shows that there's a %b format which is like %s but escapes differently (including octal) -- but that doesn't produce the output I want. There's also %X for hexadecimal, but it only prints a single value, not the entirety of frame. No idea how to do this, time to duckduckgo how to print out binary data from C in printf. So, it looks like we can iterate over frame and print each byte as %02x:

printf("\nframe_size=%zu | frame=\n", frame_size);
for (size_t i = 0; i < frame_size; i++) {  // size_t to match frame_size
    printf("%02x", frame[i]);
}

Amazing! So I wind up with things like this as output:

frame_size=1650 | frame= 114d001d10e4147b8c4fd3c78f2713ff7daac8....

Now, I need to figure out how I would actually create a value I would copy and paste into a C program as valid syntax...Maybe I'll ask for help from RC after lunch.
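
One way to get paste-able C out of a byte buffer is to print each byte as a 0x.. literal inside an array initializer, similar in spirit to what xxd -i produces. A minimal sketch, where write_c_array is a hypothetical helper and len is assumed to be greater than zero:

```c
#include <stddef.h>
#include <stdio.h>

/* Write `len` bytes as a C array definition called `name`, twelve bytes per
 * line. Assumes len > 0 (a zero-length array is not valid ISO C).
 * Returns 0 on success, -1 if any write fails. */
int write_c_array(FILE *out, const char *name, const unsigned char *data,
                  size_t len) {
  if (fprintf(out, "static const unsigned char %s[%zu] = {", name, len) < 0)
    return -1;
  for (size_t i = 0; i < len; i++) {
    /* Start a new indented line every twelve bytes. */
    const char *sep = (i % 12 == 0) ? "\n  " : " ";
    if (fprintf(out, "%s0x%02x,", sep, data[i]) < 0) return -1;
  }
  if (fprintf(out, "\n};\n") < 0) return -1;
  return 0;
}
```

Calling write_c_array(stdout, "frame", frame, frame_size) in place of the hex loop would print a definition that can be pasted into another source file. The trailing comma after the last byte is legal in a C initializer.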

Man, I'm dragging! Okay, well let's just try writing a frame to a file as binary. Phew -- how do I write binary to a file in C? Well I know that this example program already does that, writing each frame as "image" into the outfile argument. So, tracing through the code I find this call to fwrite:

    const unsigned char *buf = img->planes[plane];
    const int stride = img->stride[plane];
    const int w = vpx_img_plane_width(img, plane) *
                  ((img->fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? 2 : 1);
    const int h = vpx_img_plane_height(img, plane);
    int y;
    for (y = 0; y < h; ++y) {
      fwrite(buf, 1, w, file);
      buf += stride;
    }

Omg, that's far too many parameters for a write function. What do all these mean? Ahhh, I'm in C, so man fwrite (excerpted below) is my friend:

NAME

 fread, fwrite -- binary stream input/output

SYNOPSIS

 #include <stdio.h>
 size_t
 fwrite(const void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);

DESCRIPTION

 The function fwrite() writes nitems objects, each size bytes long, to the stream pointed to by stream, obtaining them
 from the location given by ptr.

RETURN VALUES

 The function fread() does not distinguish between end-of-file and error; callers must use feof(3) and ferror(3) to
 determine which occurred.  The function fwrite() returns a value less than nitems only if a write error has occurred.

Shockingly helpful, that. So going back to my fwrite example:

fwrite(buf, 1, w, file);
  • buf - this is a pointer that I'm going to copy from
  • 1 - is the size in bytes of each "item" I'm going to take from buf
  • w - is the number of "items" to read from buf
  • file - is the destination where all those items are being written to
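
Putting those four arguments together with the stride trick from the libvpx snippet: each row is w bytes of real data, but rows sit stride bytes apart in memory, so the loop writes w bytes and then jumps ahead by stride to skip any padding. Here's a small sketch that also checks fwrite's return value. write_rows is a made-up name, and this is my paraphrase of the pattern, not libvpx code.

```c
#include <stddef.h>
#include <stdio.h>

/* Write h rows of w bytes each from a buffer whose rows start `stride`
 * bytes apart (stride >= w; the extra bytes are padding and are skipped).
 * Returns the total number of bytes written, which is w * h on success. */
size_t write_rows(FILE *file, const unsigned char *buf, size_t stride,
                  size_t w, size_t h) {
  size_t total = 0;
  for (size_t y = 0; y < h; ++y) {
    size_t n = fwrite(buf, 1, w, file); /* item size 1, w items */
    total += n;
    if (n < w) break; /* a short count means a write error occurred */
    buf += stride;    /* jump to the start of the next row */
  }
  return total;
}
```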

Not so scary now. Let's go back and try to write my frame out to a file:

fwrite(
  frame,      // (replaces buf) this is the raw frame data I want to store in a file
  1,          // (stays the same)
  frame_size, // (replaces w) this is the number of bytes in frame, all of which I want to write
  outfile     // (replaces file) the output file the example program already has open
);

Okay, so I've now rewritten my main loop to be a conditional (so that it only runs once), and it looks like this:

  if (vpx_video_reader_read_frame(reader)) {
    vpx_codec_iter_t iter = NULL;
    vpx_image_t *img = NULL;
    size_t frame_size = 0;
    const unsigned char *frame =
        vpx_video_reader_get_frame(reader, &frame_size);

    printf("\nframe_size=%zu\n", frame_size);
    fwrite(frame, 1, frame_size, outfile);
  }

Compile and run against my IVF file and it appears to have worked! Here's a screenshot of the outfile inside Hex Fiend.

[Screenshot: Hex Fiend showing the raw VP8 frame]

You can see that the byte count printed by my hacked-up C program and the one shown in Hex Fiend match: 34974 bytes.

Pair Programming Interlude

My fellow Recurser, Andy, offered to pair with me on figuring out all this and we were able to get a simplified C version of simple_decoder working:


// Assumes the same #includes and usage_exit() definition as the original
// examples/simple_decoder.c are still in place above main().
int main(int argc, char **argv) {
  int frame_cnt = 0;
  FILE *infile = NULL;
  vpx_codec_ctx_t codec;
  const VpxInterface *decoder = NULL;

  if (argc != 2) die("Invalid number of arguments.");

  // argv[1] is the only argument now; argv[2] would be NULL here.
  if (!(infile = fopen(argv[1], "rb")))
    die("Failed to open %s for reading", argv[1]);

  // 0x30385056 is the "VP80" fourcc, hardcoded in place of info->codec_fourcc.
  decoder = get_vpx_decoder_by_fourcc(0x30385056);
  if (!decoder) die("Unknown input codec.");

  if (vpx_codec_dec_init(&codec, decoder->codec_interface(), NULL, 0))
    die_codec(&codec, "Failed to initialize decoder.");

  vpx_codec_iter_t iter = NULL;
  vpx_image_t *img = NULL;
  size_t frame_size = 34974;  // hardcoded size of the raw frame file
  // Not const: fread() has to write into this buffer.
  unsigned char *frame = malloc(frame_size);
  if (!frame) die("Failed to allocate frame buffer.");

  if (fread(frame, 1, frame_size, infile) != frame_size) die("Nope");

  if (vpx_codec_decode(&codec, frame, (unsigned int)frame_size, NULL, 0))
    die_codec(&codec, "Failed to decode frame.");

  // Count decoded images instead of writing them out.
  while ((img = vpx_codec_get_frame(&codec, &iter)) != NULL) {
    ++frame_cnt;
  }

  printf("Processed %d frames.\n", frame_cnt);
  if (vpx_codec_destroy(&codec)) die_codec(&codec, "Failed to destroy codec");

  free(frame);
  fclose(infile);
  return EXIT_SUCCESS;
}

This program basically just reads in the raw frame bytes from the file written in the previous step (infile). The decoder's configuration (the codec fourcc) and the frame size are both hardcoded.
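
Both hardcoded values could be computed instead. The magic number 0x30385056 is just the four characters "VP80" packed into a little-endian 32-bit integer, and the frame size is simply the size of the input file, since the file holds exactly one raw frame. A sketch of both, where make_fourcc and read_all are hypothetical helpers of mine rather than libvpx functions:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Pack four characters into a little-endian fourcc code. */
uint32_t make_fourcc(char a, char b, char c, char d) {
  return (uint32_t)(unsigned char)a |
         ((uint32_t)(unsigned char)b << 8) |
         ((uint32_t)(unsigned char)c << 16) |
         ((uint32_t)(unsigned char)d << 24);
}

/* Read everything from the current position of f to end of file into a
 * malloc'd buffer and report the byte count through size_out.
 * Returns NULL on allocation or read failure. The caller frees the buffer. */
unsigned char *read_all(FILE *f, size_t *size_out) {
  size_t cap = 4096, len = 0;
  unsigned char *buf = malloc(cap);
  if (!buf) return NULL;
  for (;;) {
    len += fread(buf + len, 1, cap - len, f);
    if (len < cap) break; /* short read: end of file or an error */
    cap *= 2;             /* buffer is full, so grow it and keep reading */
    unsigned char *bigger = realloc(buf, cap);
    if (!bigger) {
      free(buf);
      return NULL;
    }
    buf = bigger;
  }
  if (ferror(f)) {
    free(buf);
    return NULL;
  }
  *size_out = len;
  return buf;
}
```

With these, frame = read_all(infile, &frame_size) would replace the malloc and fread pair, and make_fourcc('V', 'P', '8', '0') would replace the magic number.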

It still doesn't work

Unfortunately, libvpx_native_sys doesn't provide many of the structs and other things I need to get even this simplified simple_decoder to work.... I give up for today.