Using the FFmpeg API (3): Encoding

September 21, 2017, Koji Ueno

This time we encode video. There are more settings to fill in than for decoding, so it takes a bit more work.

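First, we need frames to encode. We get them by slightly modifying the decoding program from two posts ago.

Turn the main function of that decoding program into a function with a different name,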

void decode_all()
{
  const char* input_path = "hoge.mov";
  AVFormatContext* format_context = nullptr;

  ... (decoding code)

}

and modify on_frame_decoded so that it accumulates the frames.

std::deque<AVFrame*> frames;

static void on_frame_decoded(AVFrame* frame) {
  AVFrame* new_ref = av_frame_alloc();
  av_frame_ref(new_ref, frame);
  frames.push_back(new_ref);
}

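Every decoded frame is now kept in frames. Once they have all been collected, we encode them. To keep the sample simple, the flow is "decode all frames" and then "encode all frames", but this holds every decoded frame in memory at once and runs out of memory quickly, so a real program should encode while it decodes.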
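
The reason on_frame_decoded takes a new reference with av_frame_ref, instead of storing the pointer it receives, is that the decode loop reuses a single AVFrame and avcodec_receive_frame unreferences it before writing the next picture. The following is a minimal sketch of that ownership model, using std::shared_ptr as a stand-in for FFmpeg's reference-counted buffers. Frame, add_ref, unref and decode_into_queue are hypothetical names for illustration and are not FFmpeg API.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <vector>

// Hypothetical stand-in for AVFrame: the pixel data lives in a shared,
// reference-counted buffer, roughly as AVFrame's buffers do.
struct Frame {
  std::shared_ptr<std::vector<uint8_t>> data;
  int64_t pts = 0;
};

// Analogue of av_frame_ref(): the copy shares the buffer, no pixels are copied.
Frame add_ref(const Frame& src) { return src; }

// Analogue of av_frame_unref(): drop this frame's reference to the buffer.
void unref(Frame& f) { f.data.reset(); }

// Simulates a decode loop that reuses one Frame and queues a reference each time.
std::deque<Frame> decode_into_queue(int count) {
  std::deque<Frame> queue;
  Frame reused;
  for (int i = 0; i < count; ++i) {
    unref(reused);  // the decoder releases the previous picture first
    reused.data = std::make_shared<std::vector<uint8_t>>(16, static_cast<uint8_t>(i));
    reused.pts = i;
    queue.push_back(add_ref(reused));  // the queue keeps its own reference
  }
  return queue;
}
```

In this sketch each queued entry keeps its buffer alive after the reused frame has moved on, which is the behavior the frames deque relies on.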

frames now holds every frame, but that alone does not tell us the unit of each frame's timestamp. We also remember the time_base of video_stream: after locating the video stream, copying video_stream->time_base is enough.

AVRational time_base;

void decode_all()
{
  ...
  time_base = video_stream->time_base;
  ...
}
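
To make the role of time_base concrete: a timestamp is a count of time_base units, so multiplying it by the fraction num/den gives seconds. Below is a small sketch under that assumption. Rational is a hypothetical mirror of AVRational and pts_to_seconds is an illustrative helper, not an FFmpeg function (libavutil offers av_q2d for the rational-to-double part).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of AVRational: the fraction num/den.
struct Rational {
  int num;
  int den;
};

// Converts a timestamp counted in time_base units to seconds.
double pts_to_seconds(int64_t pts, Rational time_base) {
  return static_cast<double>(pts) * time_base.num / time_base.den;
}
```

For example, with a time_base of 1/30 a pts of 45 corresponds to 1.5 seconds, while the same pts with a time_base of 1/90000 is only 0.0005 seconds. Without the time_base, the stored pts values cannot be interpreted.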

We now have all the data we need about the frames to encode. Let's write the encoding code.

First, open the file to write to.

const char* output_path = "output.mp4";
AVIOContext* io_context = nullptr;
if (avio_open(&io_context, output_path, AVIO_FLAG_WRITE) < 0) {
  printf("avio_open failed\n");
}
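
The snippets in this article print a message and carry on after a failure, which keeps them short but means later calls run on invalid state (for example a null io_context). A real program would stop instead. One possible shape for that is sketched below, assuming the usual FFmpeg convention that a negative return value signals an error. check is a hypothetical helper, not part of FFmpeg.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Throws if an FFmpeg-style return code signals failure (a negative value).
// `what` names the call for the error message. Returns the code otherwise.
int check(int ret, const char* what) {
  if (ret < 0) {
    throw std::runtime_error(std::string(what) + " failed with code " +
                             std::to_string(ret));
  }
  return ret;
}
```

It would be used as check(avio_open(&io_context, output_path, AVIO_FLAG_WRITE), "avio_open"). It should not wrap avcodec_receive_packet or avcodec_receive_frame, because those return negative codes for "needs more input" and "end of stream", which are normal conditions in the loops shown later.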

Allocate the muxer.

AVFormatContext* format_context = nullptr;
if (avformat_alloc_output_context2(
    &format_context, nullptr, "mp4", nullptr) < 0) {
  printf("avformat_alloc_output_context2 failed\n");
}

We want mp4 output, so "mp4" is passed as the third argument, format_name.

Set the io_context of the output file opened earlier on format_context.

format_context->pb = io_context;

Next, create the encoder. First, look up the codec.

AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_H264);
if (codec == nullptr) {
  printf("encoder not found ...\n");
}

We want to encode with H264 this time, so we looked up the H264 codec.

Allocate a codec context for this codec.

AVCodecContext* codec_context = avcodec_alloc_context3(codec);
if (codec_context == nullptr) {
  printf("avcodec_alloc_context3 failed\n");
}

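When decoding, the necessary parameters are read from the file, so the program did not have to set any. When encoding, several parameters have to be set by the program. First, set the picture format and related properties on codec_context.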

// set picture properties
AVFrame* first_frame = frames[0];
codec_context->pix_fmt = (AVPixelFormat)first_frame->format;
codec_context->width = first_frame->width;
codec_context->height = first_frame->height;
codec_context->field_order = AV_FIELD_PROGRESSIVE;
codec_context->color_range = first_frame->color_range;
codec_context->color_primaries = first_frame->color_primaries;
codec_context->color_trc = first_frame->color_trc;
codec_context->colorspace = first_frame->colorspace;
codec_context->chroma_sample_location = first_frame->chroma_location;
codec_context->sample_aspect_ratio = first_frame->sample_aspect_ratio;

These should be the same for every frame, so the values are taken from the first frame.

Also set the time_base obtained when decoding.

// set timebase
codec_context->time_base = time_base;

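Some container formats need the codec's global headers stored once in the stream's extradata instead of in every keyframe, so also add the following boilerplate, which sets the flag when the format asks for it.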

// generate global header when the format requires it
if (format_context->oformat->flags & AVFMT_GLOBALHEADER) {
  codec_context->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
}

Encoding parameters are specified through an AVDictionary.

// make codec options
AVDictionary* codec_options = nullptr;
av_dict_set(&codec_options, "preset", "medium", 0);
av_dict_set(&codec_options, "crf", "22", 0);
av_dict_set(&codec_options, "profile", "high", 0);
av_dict_set(&codec_options, "level", "4.0", 0);

This matches specifying "-preset medium -crf 22 -profile:v high -level 4.0" on the ffmpeg command line.

With the necessary parameters set, open the codec.

if (avcodec_open2(codec_context, codec_context->codec, &codec_options) != 0) {
  printf("avcodec_open2 failed\n");
}

Encoding is now possible, but there is not yet a stream to put the encoded H264 stream into. Add a new stream to format_context.

AVStream* stream = avformat_new_stream(format_context, codec);
if (stream == NULL) {
  printf("avformat_new_stream failed");
}

Copy the necessary parameters from codec_context.

stream->sample_aspect_ratio = codec_context->sample_aspect_ratio;
stream->time_base = codec_context->time_base;
if (avcodec_parameters_from_context(stream->codecpar, codec_context) < 0) {
  printf("avcodec_parameters_from_context failed");
}

time_base may have been changed by avcodec_open2, so the value is copied from codec_context.

Now that the stream parameters are set, finish the setup by calling avformat_write_header.

if (avformat_write_header(format_context, nullptr) < 0) {
  printf("avformat_write_header failed\n");
}

Everything is ready, so we start encoding.

while(frames.size() > 0) {
  AVFrame* frame = frames.front();
  frames.pop_front();
  int64_t pts = av_frame_get_best_effort_timestamp(frame);
  frame->pts = av_rescale_q(pts, time_base, codec_context->time_base);
  frame->key_frame = 0;
  frame->pict_type = AV_PICTURE_TYPE_NONE;
  if (avcodec_send_frame(codec_context, frame) != 0) {
    printf("avcodec_send_frame failed");
  }
  av_frame_free(&frame);
  AVPacket packet = AVPacket();
  while (avcodec_receive_packet(codec_context, &packet) == 0) {
    packet.stream_index = 0;
    av_packet_rescale_ts(&packet, codec_context->time_base, stream->time_base);
    if (av_interleaved_write_frame(format_context, &packet) != 0) {
      printf("av_interleaved_write_frame failed\n");
    }
  }
}

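We would like to use the input frame's pts as is, but time_base may have changed, so av_rescale_q applies the difference between the two time_base values. key_frame and pict_type are reset because, left as they are, the encoder treats them as hints; resetting them lets the encoder decide for itself. Other values set by the decoder might be picked up in the same way, so it may be better to create a fresh AVFrame and set only the values that are needed.

The timestamps of the packet received from avcodec_receive_packet are adjusted with av_packet_rescale_ts for a similar reason: avformat_write_header may have changed stream->time_base, so the difference from codec_context->time_base has to be applied. Setting stream_index on a packet received from the encoder is also the caller's job. There is only one stream here, so it is set to 0.

Unlike when decoding, av_packet_unref is not called. av_interleaved_write_frame takes ownership of the packet, so the caller does not need to release it.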
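
As an illustration of what the av_rescale_q call computes: it converts a count of one time_base into the equivalent count of another, rounding to the nearest integer with halfway cases rounded away from zero. The sketch below reproduces that arithmetic in plain C++ under simplifying assumptions (positive time bases and values small enough that the intermediate product does not overflow, which the real function guards against). Rational and rescale_q are hypothetical names, not FFmpeg's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of AVRational: the fraction num/den.
struct Rational {
  int num;
  int den;
};

// Returns a * from / to, rounded to nearest with ties away from zero.
// Assumes from and to are positive and a * b fits in int64_t.
int64_t rescale_q(int64_t a, Rational from, Rational to) {
  const int64_t b = static_cast<int64_t>(from.num) * to.den;
  const int64_t c = static_cast<int64_t>(from.den) * to.num;
  const int64_t half = c / 2;
  return a >= 0 ? (a * b + half) / c : -((-a * b + half) / c);
}
```

For instance, one tick of a 1/30 time_base becomes 512 ticks of a 1/15360 time_base, and pts 3003 in 1/30000 becomes 9009 in 1/90000. The time_base values here are illustrative and are not taken from the sample files in this article.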

Once every frame has been passed to the encoder, flush the encoder. Passing nullptr to avcodec_send_frame flushes it.

// flush encoder
if (avcodec_send_frame(codec_context, nullptr) != 0) {
  printf("avcodec_send_frame failed\n");
}
AVPacket packet = AVPacket();
while (avcodec_receive_packet(codec_context, &packet) == 0) {
  packet.stream_index = 0;
  av_packet_rescale_ts(&packet, codec_context->time_base, stream->time_base);
  if (av_interleaved_write_frame(format_context, &packet) != 0) {
    printf("av_interleaved_write_frame failed\n");
  }
}
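
Why the flush matters can be seen with a toy model. An encoder typically holds back some frames (for lookahead or frame reordering) before it emits packets, so without the final drain the last few frames never reach the file. The sketch below imitates the send/receive pattern with a hypothetical ToyEncoder that delays output by two frames. None of these names are FFmpeg API, and the delay of two is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical toy encoder that holds back `delay` frames before emitting
// output, imitating the buffering delay of a real encoder.
class ToyEncoder {
 public:
  explicit ToyEncoder(std::size_t delay) : delay_(delay) {}

  // Analogue of avcodec_send_frame(): pass nullptr to start draining.
  void send_frame(const int64_t* pts) {
    if (pts == nullptr) {
      draining_ = true;
    } else {
      pending_.push_back(*pts);
    }
  }

  // Analogue of avcodec_receive_packet(): returns true and fills *out when a
  // packet is available, false when more input is needed or nothing is left.
  bool receive_packet(int64_t* out) {
    const bool ready =
        draining_ ? !pending_.empty() : pending_.size() > delay_;
    if (!ready) return false;
    *out = pending_.front();
    pending_.pop_front();
    return true;
  }

 private:
  std::size_t delay_;
  bool draining_ = false;
  std::deque<int64_t> pending_;
};

// Feeds `count` frames and returns how many packets came out,
// with or without the final flush.
std::size_t encode_all(int count, bool flush) {
  ToyEncoder enc(2);
  std::size_t written = 0;
  int64_t pkt = 0;
  for (int64_t pts = 0; pts < count; ++pts) {
    enc.send_frame(&pts);
    while (enc.receive_packet(&pkt)) ++written;
  }
  if (flush) {
    enc.send_frame(nullptr);
    while (enc.receive_packet(&pkt)) ++written;
  }
  return written;
}
```

With five input frames, this model yields three packets without the flush and all five with it.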

We called avformat_write_header before encoding; once encoding is finished, call av_write_trailer.

if (av_write_trailer(format_context) != 0) {
  printf("av_write_trailer failed\n");
}

That is almost everything. Free the contexts and close the file.

avcodec_free_context(&codec_context);
avformat_free_context(format_context);
avio_closep(&io_context);

Why use the stream's time_base?

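When decoding, we took time_base from video_stream (an AVStream). When encoding, however, we set it on codec_context (an AVCodecContext), and only afterwards propagated it to stream. video_stream is part of format_context, so it is a parameter of the container (the format that holds the streams, such as mp4, mkv or MPEG-2 TS). codec_context is the encoder or decoder. Both AVStream and AVCodecContext have a time_base. Setting a time_base obtained from the container on the encoder may seem odd: why not take it from the decoder? And having taken it from the container, shouldn't it be set on the container?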

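This comes from a difference in how decoding and encoding behave. Frame timestamps are normally defined by the container, so the timestamps of decoded frames are in units of the container stream's (AVStream) time_base, not codec_context's. When decoding, codec_context's time_base is not used. When encoding, on the other hand, the encoder needs a time_base, so setting it on codec_context is mandatory. That is why the time_base obtained from the stream when decoding is set on codec_context when encoding.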

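Also, while decoding involved only the container's time_base, encoding involves two of them, the encoder's and the container's, and they may hold different values. This is why timestamps had to be converted when passing frames to the encoder and packets to the muxer.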
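
The packet-side conversion touches more than one field. As a rough model of what av_packet_rescale_ts does, the sketch below converts pts, dts and duration from the encoder's time_base to the muxer's, leaving an unset timestamp and a zero duration untouched. PacketTs, kNoPts, rescale and rescale_ts are hypothetical stand-ins (kNoPts plays the role of AV_NOPTS_VALUE), and overflow handling is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical mirror of AVRational: the fraction num/den.
struct Rational {
  int num;
  int den;
};

// Hypothetical stand-ins for AV_NOPTS_VALUE and AVPacket's timing fields.
constexpr int64_t kNoPts = std::numeric_limits<int64_t>::min();
struct PacketTs {
  int64_t pts = kNoPts;
  int64_t dts = kNoPts;
  int64_t duration = 0;
};

// a * from / to, rounded to nearest with ties away from zero.
// Assumes positive time bases and no overflow of the intermediate product.
int64_t rescale(int64_t a, Rational from, Rational to) {
  const int64_t b = static_cast<int64_t>(from.num) * to.den;
  const int64_t c = static_cast<int64_t>(from.den) * to.num;
  const int64_t half = c / 2;
  return a >= 0 ? (a * b + half) / c : -((-a * b + half) / c);
}

// Converts every valid timing field from one time_base to another.
void rescale_ts(PacketTs& p, Rational from, Rational to) {
  if (p.pts != kNoPts) p.pts = rescale(p.pts, from, to);
  if (p.dts != kNoPts) p.dts = rescale(p.dts, from, to);
  if (p.duration > 0) p.duration = rescale(p.duration, from, to);
}
```

Going from an encoder time_base of 1/30 to a muxer time_base of 1/15360 (illustrative values), a packet with pts 2, dts -1 and duration 1 becomes pts 1024, dts -512 and duration 512. A dts below zero is included because encoders that reorder frames can produce one.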

Finally, here is the full code used for encoding.

#define __STDC_CONSTANT_MACROS
#define __STDC_LIMIT_MACROS
#include <stdio.h>
#include <deque>
extern "C" {
#include <libavutil/imgutils.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}
#pragma comment(lib, "avutil.lib")
#pragma comment(lib, "avcodec.lib")
#pragma comment(lib, "avformat.lib")

AVRational time_base;
std::deque<AVFrame*> frames;

static void on_frame_decoded(AVFrame* frame) {
  AVFrame* new_ref = av_frame_alloc();
  av_frame_ref(new_ref, frame);
  frames.push_back(new_ref);
}

void decode_all()
{
  const char* input_path = "hoge.mov";
  AVFormatContext* format_context = nullptr;
  if (avformat_open_input(&format_context, input_path, nullptr, nullptr) != 0) {
    printf("avformat_open_input failed\n");
  }

  if (avformat_find_stream_info(format_context, nullptr) < 0) {
    printf("avformat_find_stream_info failed\n");
  }

  AVStream* video_stream = nullptr;
  for (int i = 0; i < (int)format_context->nb_streams; ++i) {
    if (format_context->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
      video_stream = format_context->streams[i];
      break;
    }
  }
  if (video_stream == nullptr) {
    printf("No video stream ...\n");
  }

  time_base = video_stream->time_base;

  AVCodec* codec = avcodec_find_decoder(video_stream->codecpar->codec_id);
  if (codec == nullptr) {
    printf("No supported decoder ...\n");
  }

  AVCodecContext* codec_context = avcodec_alloc_context3(codec);
  if (codec_context == nullptr) {
    printf("avcodec_alloc_context3 failed\n");
  }

  if (avcodec_parameters_to_context(codec_context, video_stream->codecpar) < 0) {
    printf("avcodec_parameters_to_context failed\n");
  }

  if (avcodec_open2(codec_context, codec, nullptr) != 0) {
    printf("avcodec_open2 failed\n");
  }

  AVFrame* frame = av_frame_alloc();
  AVPacket packet = AVPacket();

  while (av_read_frame(format_context, &packet) == 0) {
    if (packet.stream_index == video_stream->index) {
      if (avcodec_send_packet(codec_context, &packet) != 0) {
        printf("avcodec_send_packet failed\n");
      }
      while (avcodec_receive_frame(codec_context, frame) == 0) {
        on_frame_decoded(frame);
      }
    }
    av_packet_unref(&packet);
  }

  // flush decoder
  if (avcodec_send_packet(codec_context, nullptr) != 0) {
    printf("avcodec_send_packet failed");
  }
  while (avcodec_receive_frame(codec_context, frame) == 0) {
    on_frame_decoded(frame);
  }

  av_frame_free(&frame);
  avcodec_free_context(&codec_context);
  avformat_close_input(&format_context);
}

int main(int argc, char* argv[])
{
  av_register_all();

  decode_all();

  const char* output_path = "output.mp4";
  AVIOContext* io_context = nullptr;
  if (avio_open(&io_context, output_path, AVIO_FLAG_WRITE) < 0) {
    printf("avio_open failed\n");
  }

  AVFormatContext* format_context = nullptr;
  if (avformat_alloc_output_context2(&format_context, nullptr, "mp4", nullptr) < 0) {
    printf("avformat_alloc_output_context2 failed\n");
  }

  format_context->pb = io_context;

  AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_H264);
  if (codec == nullptr) {
    printf("encoder not found ...\n");
  }

  AVCodecContext* codec_context = avcodec_alloc_context3(codec);
  if (codec_context == nullptr) {
    printf("avcodec_alloc_context3 failed\n");
  }

  // set picture properties
  AVFrame* first_frame = frames[0];
  codec_context->pix_fmt = (AVPixelFormat)first_frame->format;
  codec_context->width = first_frame->width;
  codec_context->height = first_frame->height;
  codec_context->field_order = AV_FIELD_PROGRESSIVE;
  codec_context->color_range = first_frame->color_range;
  codec_context->color_primaries = first_frame->color_primaries;
  codec_context->color_trc = first_frame->color_trc;
  codec_context->colorspace = first_frame->colorspace;
  codec_context->chroma_sample_location = first_frame->chroma_location;
  codec_context->sample_aspect_ratio = first_frame->sample_aspect_ratio;

  // set timebase
  codec_context->time_base = time_base;

  // generate global header when the format requires it
  if (format_context->oformat->flags & AVFMT_GLOBALHEADER) {
    codec_context->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
  }

  // make codec options
  AVDictionary* codec_options = nullptr;
  av_dict_set(&codec_options, "preset", "medium", 0);
  av_dict_set(&codec_options, "crf", "22", 0);
  av_dict_set(&codec_options, "profile", "high", 0);
  av_dict_set(&codec_options, "level", "4.0", 0);

  if (avcodec_open2(codec_context, codec_context->codec, &codec_options) != 0) {
    printf("avcodec_open2 failed\n");
  }

  AVStream* stream = avformat_new_stream(format_context, codec);
  if (stream == NULL) {
    printf("avformat_new_stream failed");
  }

  stream->sample_aspect_ratio = codec_context->sample_aspect_ratio;
  stream->time_base = codec_context->time_base;

  if (avcodec_parameters_from_context(stream->codecpar, codec_context) < 0) {
    printf("avcodec_parameters_from_context failed");
  }

  if (avformat_write_header(format_context, nullptr) < 0) {
    printf("avformat_write_header failed\n");
  }

  while(frames.size() > 0) {
    AVFrame* frame = frames.front();
    frames.pop_front();
    int64_t pts = av_frame_get_best_effort_timestamp(frame);
    frame->pts = av_rescale_q(pts, time_base, codec_context->time_base);
    frame->key_frame = 0;
    frame->pict_type = AV_PICTURE_TYPE_NONE;
    if (avcodec_send_frame(codec_context, frame) != 0) {
      printf("avcodec_send_frame failed");
    }
    av_frame_free(&frame);
    AVPacket packet = AVPacket();
    while (avcodec_receive_packet(codec_context, &packet) == 0) {
      packet.stream_index = 0;
      av_packet_rescale_ts(&packet, codec_context->time_base, stream->time_base);
      if (av_interleaved_write_frame(format_context, &packet) != 0) {
        printf("av_interleaved_write_frame failed\n");
      }
    }
  }

  // flush encoder
  if (avcodec_send_frame(codec_context, nullptr) != 0) {
    printf("avcodec_send_frame failed\n");
  }
  AVPacket packet = AVPacket();
  while (avcodec_receive_packet(codec_context, &packet) == 0) {
    packet.stream_index = 0;
    av_packet_rescale_ts(&packet, codec_context->time_base, stream->time_base);
    if (av_interleaved_write_frame(format_context, &packet) != 0) {
      printf("av_interleaved_write_frame failed\n");
    }
  }

  if (av_write_trailer(format_context) != 0) {
    printf("av_write_trailer failed\n");
  }

  avcodec_free_context(&codec_context);
  avformat_free_context(format_context);
  avio_closep(&io_context);

  return 0;
}

1 comment

Thank you for the very helpful information.
By the way, in the sample shown, before the AVPacket is passed to av_packet_rescale_ts(), only

packet.stream_index = 0;

is set. In my own tests, unless I also set

avpacket.duration = 1;

the frame rate of the output file came out wrong. I hope this is of some use.

ChaoticActivity, November 13, 2018
