videox

package
v1.0.3
Warning

This package is not in the latest version of its module.

Published: Jun 14, 2025 License: MIT Imports: 25 Imported by: 0

README

Annex-B performance hit

On a Raspberry Pi 5, our Annex-B encoder (the code that inserts emulation prevention bytes) runs at 716 MB/s. For comparison, memcpy on the same platform runs at 4690 MB/s, and Annex-B decoding runs at 916 MB/s.

You can use misc_test.cpp to measure the speed yourself (instructions at top of that file).

I don't have enough numbers right now to figure out the total system impact, but my gut doesn't like it. It seems plausible that one should be able to improve the speed of the encoder, but I don't know how. The alternative that I'm considering is to delay encoding to Annex-B for as long as possible - perhaps even doing it in the browser immediately before display.
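One possible way to speed up the encoder, offered as an untested idea rather than what videox currently does: if `00 00` pairs are rare in typical payloads (plausible for entropy-coded slice data, but worth checking), then instead of inspecting every byte, the encoder can find the pairs with `bytes.Index`, which Go typically implements with vectorized routines, and copy the runs in between with a single `append`. Both function names are hypothetical; `escapeReference` is a plain byte-at-a-time version kept for comparison. Whether the fast path actually beats 716 MB/s on a Pi 5 would have to be measured.

```go
package main

import "bytes"

// escapeReference inserts an emulation prevention byte (0x03) after every
// pair of zero bytes that is followed by a byte in the range 0x00..0x03,
// examining one byte at a time. Trailing zero bytes are not special-cased.
func escapeReference(raw []byte) []byte {
	out := make([]byte, 0, len(raw)+len(raw)/2)
	zeros := 0
	for _, b := range raw {
		if zeros >= 2 && b <= 0x03 {
			out = append(out, 0x03)
			zeros = 0
		}
		out = append(out, b)
		if b == 0 {
			zeros++
		} else {
			zeros = 0
		}
	}
	return out
}

// escapeFast produces the same output as escapeReference, but jumps between
// "00 00" pairs with bytes.Index and bulk-copies everything in between.
func escapeFast(raw []byte) []byte {
	out := make([]byte, 0, len(raw)+len(raw)/2)
	zeroPair := []byte{0x00, 0x00}
	for len(raw) > 0 {
		i := bytes.Index(raw, zeroPair)
		if i < 0 || i+2 >= len(raw) {
			// No "00 00" pair with a byte after it: nothing left to escape.
			return append(out, raw...)
		}
		out = append(out, raw[:i+2]...) // copy up to and including the two zeros
		raw = raw[i+2:]
		if raw[0] <= 0x03 {
			out = append(out, 0x03)
		}
	}
	return out
}
```

Searching restarts at the byte after each pair, so a zero that follows an inserted 0x03 starts a new run, exactly as in the byte-at-a-time loop.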

If we're recording to disk, then it would be useful to avoid this penalty completely, but that precludes us from using regular video formats like mp4. On the other hand, we might want to avoid regular formats anyway.

Documentation


Constants

const DebugVideoDecodeTimes = false

If true, report the decode FPS

const EnableEmulationPreventBytesEscaping = true

Topic: $ANNEXB-CONFUSION

Here's the story: when we receive packets from Hikvision cameras via github.com/bluenviron/gortsplib, the packets are supposedly NALUFormatRBSP, i.e. raw byte sequence payloads with no start codes and no emulation prevention bytes. The codecs seem to want packets in Annex-B encoding, so we dutifully encode the raw packets into Annex-B, adding emulation prevention bytes.

HOWEVER, when we activate this code path, we get sporadic errors from ffmpeg telling us that we have bad frames. If we comment out the code that injects the emulation prevention bytes, the errors go away. To be clear, we must inject the start codes; that part is unambiguous. It's the emulation prevention bytes that cause the errors.

This confusion is the reason for this constant. At some point we'll hopefully learn more and make better sense of it. Right now the culprit could be any one of these:

1. Hikvision cameras
2. gortsplib
3. The way I'm using the h264 codec in ffmpeg
4. My Annex-B encoder
5. My understanding

------------------------ UPDATE WITH ANSWER ------------------------

I have come to the conclusion that my Hikvision cameras send data with emulation prevention bytes already added to the byte stream, but without start codes. This is consistent with the standards: emulation prevention bytes are part of the H.264/H.265 NAL unit syntax itself, and the RTP payload formats (RFC 6184 for H.264, RFC 7798 for H.265) carry NAL units, not RBSPs. So this has led me to store two pieces of state with each NALU:

1. Does it have a start code?
2. How is the payload encoded?

I initially thought that the presence of a start code should be synonymous with the presence of emulation prevention bytes, but I've learned that this is not the case.
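The two-piece-of-state conclusion can be sketched as follows. This is a minimal illustration with hypothetical names (`naluState`, `toAnnexB`), not the videox `NALU` type: each NALU records its start code length and whether its payload is already escaped, and conversion to Annex-B adds a start code only when one is missing and escapes only payloads that are still raw. The `escape` parameter stands in for an emulation prevention encoder, for example a closure around `EncodeAnnexB(raw, 0, AnnexBEncodeFlagAddEmulationPreventionBytes)`.

```go
package main

// naluState is a hypothetical NALU that tracks the two pieces of state
// separately instead of inferring one from the other.
type naluState struct {
	StartCodeLen int    // 0, 3 (00 00 01) or 4 (00 00 00 01)
	Escaped      bool   // true if Payload already contains emulation prevention bytes
	Payload      []byte // start code (if any) followed by the NALU bytes
}

// toAnnexB returns the NALU with a start code and exactly one layer of
// emulation prevention bytes. A missing start code is replaced by 00 00 01.
func (n naluState) toAnnexB(escape func([]byte) []byte) []byte {
	if n.StartCodeLen > 0 && n.Escaped {
		return n.Payload // already in the required form
	}
	body := n.Payload[n.StartCodeLen:]
	if !n.Escaped {
		body = escape(body) // raw RBSP: escape it exactly once
	}
	out := make([]byte, 0, 3+len(body))
	out = append(out, 0x00, 0x00, 0x01)
	return append(out, body...)
}
```

For a Hikvision-style NALU (no start code, already escaped), only the start code is prepended, so no second layer of emulation prevention bytes is added. A second layer is exactly what would corrupt the frame, because a decoder strips only one.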

Variables

var ErrNoFrame = errors.New("No frame available") // No frame available yet; Try sending more data
var ErrNoSPS = errors.New("No SPS NALU found")
var ErrResourceTemporarilyUnavailable = errors.New("Resource temporarily unavailable") // common response from avcodec_receive_frame if a frame is not available

Functions

func AnnexBWorstSize

func AnnexBWorstSize(startCodeLen, rawLen int) int

Return the worst-case size of an Annex-B encoded packet, given the length of the start code and the size of the raw packet.
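How such a bound can be derived, as a sketch (the real AnnexBWorstSize may use a different formula, and `worstCaseAnnexBSize` is a hypothetical name): after an emulation prevention byte is inserted, at least two more raw zero bytes must pass before the next one is needed, so at most one 0x03 is added per two raw bytes.

```go
package main

// worstCaseAnnexBSize is an upper bound on the encoded size of an RBSP of
// rawLen bytes plus a start code of startCodeLen bytes. Escaping adds at
// most one 0x03 for every two raw bytes, hence rawLen/2.
func worstCaseAnnexBSize(startCodeLen, rawLen int) int {
	return startCodeLen + rawLen + rawLen/2
}
```

For example, a 10-byte run of zeros escapes to 14 bytes (00 00 03 repeated, then 00 00), within the bound of 15 payload bytes.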

func CodecToFsv

func CodecToFsv(codec Codec) string

func DecodeAnnexB

func DecodeAnnexB(encoded []byte) []byte

Decode an Annex-B encoded packet into a Raw Byte Sequence Payload (RBSP). We assume that you're handling the 3 or 4 byte NALU prefix outside of this function.
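A minimal sketch of the decoding step, assuming the start code has already been stripped as described above. The function name is hypothetical, and this plain byte-at-a-time loop is not necessarily how DecodeAnnexB is written.

```go
package main

// removeEmulationPrevention strips each 0x03 that directly follows two zero
// bytes, turning an escaped NALU payload back into an RBSP.
func removeEmulationPrevention(enc []byte) []byte {
	out := make([]byte, 0, len(enc))
	zeros := 0
	for _, b := range enc {
		if zeros >= 2 && b == 0x03 {
			zeros = 0 // the dropped byte ends the run of zeros
			continue
		}
		out = append(out, b)
		if b == 0 {
			zeros++
		} else {
			zeros = 0
		}
	}
	return out
}
```

Only the first 0x03 after a pair is dropped, so an escaped literal 0x03 (encoded as 00 00 03 03) decodes back to 00 00 03.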

func DecodeAnnexBSize

func DecodeAnnexBSize(encoded []byte) int

Return the number of bytes needed to decode an Annex-B encoded packet. This function is for analysis of camera streams. In ordinary usage, we just call DecodeAnnexB().

func DecodeClosestImageInPacketList

func DecodeClosestImageInPacketList(codec Codec, packets []*VideoPacket, targetTime time.Time, cache *FrameCache, videoCacheKey string) (*cimg.Image, time.Time, error)

Decode the list of packets, and return the decoded image whose presentation time is closest to targetTime. If targetTime is zero, then we return the first image coming out of the decoder. If cache is not nil, then we will insert/query the provided cache. videoCacheKey is the key for this video. We use {videoCacheKey-PTS} as the complete cache key.

func DecodeFirstImageInPacketList

func DecodeFirstImageInPacketList(codec Codec, packets []*VideoPacket) (*cimg.Image, time.Time, error)

Decode the list of packets, and return the first image that successfully decodes

func DecodeSinglePacketToImage

func DecodeSinglePacketToImage(codec Codec, packet *VideoPacket) (*cimg.Image, error)

Creates a decoder and attempts to decode a single IDR packet. This was built for extracting a thumbnail during a long recording. Obviously this is a bit expensive, because you're creating a decoder for just a single frame.

func EncodeAnnexB

func EncodeAnnexB(raw []byte, startCodeLen int, flags AnnexBEncodeFlags) []byte

Encode an RBSP (Raw Byte Sequence Payload) into Annex-B format, optionally adding a 3 or 4 byte start code (00.00.01 or 00.00.00.01) to the beginning of the encoded byte stream. If the relevant flag is set, we also add the "emulation prevention byte" (0x03) where necessary. If startCodeLen is zero, then we do not add a start code.

func EncodeAnnexBInto

func EncodeAnnexBInto(raw []byte, startCodeLen int, flags AnnexBEncodeFlags, dst []byte) (encodedSize int, bufferSizeOK bool)

Encode an RBSP (Raw Byte Sequence Payload) into dst in Annex-B format, optionally adding a start code of startCodeLen bytes (00.00.01 or 00.00.00.01) to the beginning of the encoded byte stream. If the relevant flag is set, this encoding adds the "emulation prevention byte" (0x03) where necessary.
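The point of an "Into" variant is to let the caller reuse one buffer and avoid an allocation per NALU. The sketch below is hypothetical: its name, the bool `escape` parameter in place of AnnexBEncodeFlags, and the assumption that the returned bool means "dst was large enough" are all illustrations, not the videox API.

```go
package main

// encodeAnnexBInto writes an optional start code of startCodeLen bytes
// (3 = 00 00 01, 4 = 00 00 00 01, 0 = none) followed by raw, escaped with
// emulation prevention bytes when escape is true. It returns the number of
// bytes written, and false if dst ran out of space.
func encodeAnnexBInto(raw []byte, startCodeLen int, escape bool, dst []byte) (int, bool) {
	n := 0
	put := func(b byte) bool {
		if n >= len(dst) {
			return false
		}
		dst[n] = b
		n++
		return true
	}
	// Start code: startCodeLen-1 zero bytes, then 0x01.
	for i := 0; i < startCodeLen-1; i++ {
		if !put(0x00) {
			return n, false
		}
	}
	if startCodeLen > 0 && !put(0x01) {
		return n, false
	}
	zeros := 0
	for _, b := range raw {
		if escape && zeros >= 2 && b <= 0x03 {
			if !put(0x03) {
				return n, false
			}
			zeros = 0
		}
		if !put(b) {
			return n, false
		}
		if b == 0 {
			zeros++
		} else {
			zeros = 0
		}
	}
	return n, true
}
```

Sizing dst with AnnexBWorstSize(startCodeLen, len(raw)) should make the "too small" branch unreachable, assuming that bound covers the start code as well.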

func ExtractFrame

func ExtractFrame(srcFilename string, atSecond float64, outputWidth int) ([]byte, error)

Extract a single frame from a video file and return the JPEG bytes. If outputWidth is zero, then we use the same width as the input video.
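One way to do this with the ffmpeg CLI, sketched as an assumption about the approach rather than the actual videox code: seek with `-ss` before `-i` (input seeking, which is fast), decode a single frame, optionally scale it, and emit MJPEG to stdout. The helper name is hypothetical; the options are standard ffmpeg flags.

```go
package main

import "strconv"

// extractFrameArgs builds an ffmpeg argument list that seeks to atSecond,
// decodes one frame, and writes it to stdout as a JPEG. If outputWidth is
// zero, no scaling filter is added.
func extractFrameArgs(srcFilename string, atSecond float64, outputWidth int) []string {
	args := []string{
		"-ss", strconv.FormatFloat(atSecond, 'f', -1, 64), // seek before opening the input
		"-i", srcFilename,
		"-frames:v", "1", // stop after one video frame
	}
	if outputWidth != 0 {
		// A height of -1 preserves the aspect ratio.
		args = append(args, "-vf", "scale="+strconv.Itoa(outputWidth)+":-1")
	}
	return append(args, "-f", "image2pipe", "-c:v", "mjpeg", "pipe:1")
}
```

To run it, `exec.Command("ffmpeg", args...).Output()` captures stdout only. CombinedOutput (as used by RunAppCombinedOutput) would mix ffmpeg's stderr log into the JPEG bytes.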

func ExtractVideoDuration

func ExtractVideoDuration(srcFilename string) (time.Duration, error)

Extract the duration of a video file
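A sketch of one way to get the duration with ffprobe (an assumption; videox may do this differently): ask ffprobe to print only the container's duration field, in seconds, then parse that number. Helper names are hypothetical.

```go
package main

import (
	"strconv"
	"strings"
	"time"
)

// ffprobeDurationArgs asks ffprobe to print only the format duration,
// for example "12.345000", with no key name or section wrapper.
func ffprobeDurationArgs(srcFilename string) []string {
	return []string{"-v", "error", "-show_entries", "format=duration",
		"-of", "default=noprint_wrappers=1:nokey=1", srcFilename}
}

// parseSeconds converts ffprobe's textual output into a time.Duration.
// Non-numeric output (such as "N/A") is returned as an error.
func parseSeconds(out []byte) (time.Duration, error) {
	sec, err := strconv.ParseFloat(strings.TrimSpace(string(out)), 64)
	if err != nil {
		return 0, err
	}
	return time.Duration(sec * float64(time.Second)), nil
}
```

Usage would be `out, err := exec.Command("ffprobe", ffprobeDurationArgs(name)...).Output()` followed by `parseSeconds(out)`.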

func FirstLikelyAnnexBEncodedIndex

func FirstLikelyAnnexBEncodedIndex(encoded []byte) int

func NALUStartCode

func NALUStartCode(length int) []byte

func NumPlanes

func NumPlanes(pixelFormat AVPixelFormat) int

func ParseBinFilename

func ParseBinFilename(filename string) (packetNumber int, naluNumber int, timeNS int64)

This is just used for debugging and testing

func ParseH264SPS

func ParseH264SPS(nalu []byte) (width, height int, err error)

Parse a raw SPS NALU (not Annex-B!!!). On a Raspberry Pi 5, this takes 305 ns for a 50-byte SPS packet, which is typical of my Hikvision cameras. On an AMD Ryzen 9 5900X, it takes 94 ns.

func ParseH265SPS

func ParseH265SPS(nalu []byte) (width, height int, err error)

Parse a raw SPS NALU (not annex-b!!!)

func ReadNaluTypeH264

func ReadNaluTypeH264(firstByte byte) h264.NALUType

func ReadNaluTypeH265

func ReadNaluTypeH265(firstByte byte) h265.NALUType
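Both functions come down to bit masking on the NALU header. Here is a self-contained sketch that returns plain ints instead of the h264.NALUType and h265.NALUType types used above (function names hypothetical). H.264 has a one-byte header whose low 5 bits are nal_unit_type. H.265 has a two-byte header whose first byte holds forbidden_zero_bit, the 6-bit nal_unit_type, and the top bit of nuh_layer_id.

```go
package main

// naluTypeH264 returns nal_unit_type: the low 5 bits of the header byte.
func naluTypeH264(firstByte byte) int { return int(firstByte & 0x1F) }

// naluTypeH265 returns nal_unit_type: bits 1..6 of the first header byte.
// Bit 7 is forbidden_zero_bit and bit 0 belongs to nuh_layer_id.
func naluTypeH265(firstByte byte) int { return int(firstByte>>1) & 0x3F }
```

For example, the common H.264 header bytes 0x67, 0x68 and 0x65 give types 7 (SPS), 8 (PPS) and 5 (IDR slice), while the H.265 bytes 0x40, 0x42 and 0x44 give 32 (VPS), 33 (SPS) and 34 (PPS).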

func RunAppCombinedOutput

func RunAppCombinedOutput(app_name string, args []string) ([]byte, error)

app_name is an executable, such as "ffmpeg" or "ffprobe" args must not include the executable name as the first parameter Returns the string output from exec.Cmd's "CombinedOutput" method.

func TranscodeMediumQualitySeekable

func TranscodeMediumQualitySeekable(srcFilename, dstFilename string) error

Transcode the high quality video stream into a slightly lower quality stream, with keyframes every 8 frames, and with noise reduction. This is for use on our training platform, where people need to be able to seek randomly inside a video.

func TranscodeSeekable

func TranscodeSeekable(srcFilename, dstFilename string) error

Transcode a video to make it easy for a low powered mobile browser to seek to random video positions

Types

type AVPixelFormat

type AVPixelFormat int

Export some of the ffmpeg C pixel formats to Go

const (
	AVPixelFormatYUV420P AVPixelFormat = C.AV_PIX_FMT_YUV420P
	AVPixelFormatRGB24   AVPixelFormat = C.AV_PIX_FMT_RGB24
)

type AbstractNALUType

type AbstractNALUType int
const (
	AbstractNALUTypeOther         AbstractNALUType = iota // Any other NALU type
	AbstractNALUTypeEssentialMeta                         // SPS, PPS, VPS. Required before we can decode a frame.
	AbstractNALUTypeIDR                                   // Keyframe (Instantaneous Decoder Refresh)
	AbstractNALUTypeNonIDR                                // Visual frame, but not a keyframe
)

func H264ToAbstractType

func H264ToAbstractType(firstByte byte) AbstractNALUType

func H265ToAbstractType

func H265ToAbstractType(firstByte byte) AbstractNALUType

func (AbstractNALUType) IsVisual

func (t AbstractNALUType) IsVisual() bool
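A sketch of how the H.264 mapping onto these abstract categories might look. It is an assumption, not the videox implementation: in particular, treating only type 1 as NonIDR (and data partitions 2 to 4 as Other) is a guess, and H.265 is left out because its random-access types (IDR, CRA, BLA) need more careful handling. All names are hypothetical mirrors of the constants above.

```go
package main

type abstractType int

const (
	typeOther         abstractType = iota // any other NALU type
	typeEssentialMeta                     // SPS, PPS
	typeIDR                               // keyframe
	typeNonIDR                            // visual frame, not a keyframe
)

// h264Abstract classifies an H.264 NALU by the type in its header byte.
func h264Abstract(firstByte byte) abstractType {
	switch firstByte & 0x1F {
	case 7, 8: // SPS, PPS
		return typeEssentialMeta
	case 5: // coded slice of an IDR picture
		return typeIDR
	case 1: // coded slice of a non-IDR picture
		return typeNonIDR
	default:
		return typeOther
	}
}

// isVisual reports whether the NALU carries picture data.
func (t abstractType) isVisual() bool { return t == typeIDR || t == typeNonIDR }
```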

type AnnexBEncodeFlags

type AnnexBEncodeFlags int

Flags that control how EncodeAnnexB works

const (
	AnnexBEncodeFlagNone                        AnnexBEncodeFlags = 0 // This is nonsensical - it is simply a memcpy
	AnnexBEncodeFlagAddEmulationPreventionBytes AnnexBEncodeFlags = 1 // Add emulation prevention bytes (0x03) where necessary
)

type Codec

type Codec int
const (
	CodecUnknown Codec = iota
	CodecH264
	CodecH265
)

func ParseCodec

func ParseCodec(codec string) (Codec, error)

func ParseFsvCodec

func ParseFsvCodec(codec string) (Codec, error)

func (Codec) FourByteName added in v1.0.2

func (c Codec) FourByteName() uint32

func (Codec) InternalName

func (c Codec) InternalName() string

func (Codec) String

func (c Codec) String() string

func (Codec) ToFFmpeg

func (c Codec) ToFFmpeg() string

Return the string that FFmpeg uses to identify this codec

type Frame

type Frame struct {
	Image *accel.YUVImage // Image (might be a deep reference into ffmpeg memory)
	PTS   int64           // Presentation time in native time units. Use VideoDecoder.FrameTimeToDuration() to convert to a time.Duration
}

A decoded frame

func (*Frame) DeepClone

func (f *Frame) DeepClone() *Frame

Return a deep clone of the frame (new image memory)

type FrameCache

type FrameCache struct {
	MaxMemory  int // Maximum bytes of RAM to use
	MemoryUsed int // Current bytes of RAM used
	// contains filtered or unexported fields
}

FrameCache is used to speed up the fetching of individual frames while a user is seeking around in a video. We cache YUV images.

func NewFrameCache

func NewFrameCache(maxMemory int) *FrameCache

NewFrameCache creates a new FrameCache with the given maximum memory usage

func (*FrameCache) AddFrame

func (f *FrameCache) AddFrame(key string, frame *accel.YUVImage)

Add a frame to the cache

func (*FrameCache) GetFrame

func (f *FrameCache) GetFrame(key string) *accel.YUVImage

Return the frame or nil

func (*FrameCache) MakeKey

func (f *FrameCache) MakeKey(videoKey string, framePTSUnixMS int64) string
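The eviction policy of FrameCache is not documented here. The following sketch shows one plausible design, a least-recently-used cache bounded by bytes, using hypothetical names, with []byte standing in for *accel.YUVImage. Keys in the MakeKey style ({videoCacheKey-PTS}) would work unchanged.

```go
package main

import "container/list"

type cacheEntry struct {
	key   string
	value []byte
}

// memCache evicts least-recently-used entries once memoryUsed exceeds
// maxMemory. An entry larger than maxMemory ends up evicting itself too.
type memCache struct {
	maxMemory, memoryUsed int
	order                 *list.List               // front = most recently used
	items                 map[string]*list.Element // key -> element in order
}

func newMemCache(maxMemory int) *memCache {
	return &memCache{maxMemory: maxMemory, order: list.New(), items: map[string]*list.Element{}}
}

func (c *memCache) add(key string, value []byte) {
	if el, ok := c.items[key]; ok { // replace an existing entry
		c.memoryUsed -= len(el.Value.(*cacheEntry).value)
		c.order.Remove(el)
	}
	c.items[key] = c.order.PushFront(&cacheEntry{key: key, value: value})
	c.memoryUsed += len(value)
	for c.memoryUsed > c.maxMemory && c.order.Len() > 0 {
		oldest := c.order.Back()
		e := oldest.Value.(*cacheEntry)
		c.order.Remove(oldest)
		delete(c.items, e.key)
		c.memoryUsed -= len(e.value)
	}
}

// get returns the cached value or nil, and marks the entry as recently used.
func (c *memCache) get(key string) []byte {
	el, ok := c.items[key]
	if !ok {
		return nil
	}
	c.order.MoveToFront(el)
	return el.Value.(*cacheEntry).value
}
```

LRU suits the seeking use case described above: frames near the user's current position keep getting touched, while frames from regions the user has left age out first.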

type MPGTSEncoder

type MPGTSEncoder struct {
	// contains filtered or unexported fields
}

MPGTSEncoder encodes H264 NALUs into MPEG-TS.

func NewMPEGTSEncoder

func NewMPEGTSEncoder(log logs.Log, output io.Writer, sps []byte, pps []byte) (*MPGTSEncoder, error)

NewMPEGTSEncoder allocates an MPGTSEncoder.

func (*MPGTSEncoder) Close

func (e *MPGTSEncoder) Close() error

Close closes all the MPGTSEncoder resources.

func (*MPGTSEncoder) Encode

func (e *MPGTSEncoder) Encode(nalus []NALU, pts time.Duration) error

Encode encodes H264 NALUs into MPEG-TS.

type NALU

type NALU struct {
	PayloadIsAnnexB  bool // True if the payload is escaped with "emulation prevention bytes", for example with 00 00 03 01 replacing 00 00 01
	PayloadNoEscapes bool // True if PayloadIsAnnexB BUT we know that we have no "emulation prevention bytes", so we can avoid decoding them.
	Payload          []byte
}

Codec NALU

func WrapRawNALU

func WrapRawNALU(raw []byte) NALU

Wrap a raw buffer in a NALU object. Do not clone memory, or add prefix bytes.

func (*NALU) AbstractType

func (n *NALU) AbstractType(codec Codec) AbstractNALUType

func (*NALU) AsAnnexB

func (n *NALU) AsAnnexB() NALU

Return payload data, but make sure it's in AnnexB format, and has a start code of 00.00.01 or 00.00.00.01

func (*NALU) AsRBSP

func (n *NALU) AsRBSP() NALU

Return payload data, but make sure it's in RBSP format, with no start code

func (*NALU) DeepClone

func (n *NALU) DeepClone() NALU

func (*NALU) IsAnnexBWithStartCode

func (n *NALU) IsAnnexBWithStartCode() bool

Returns true if the NALU has a start code, and the payload is encoded with emulation prevention bytes

func (*NALU) IsRBSPWithNoStartCode

func (n *NALU) IsRBSPWithNoStartCode() bool

Returns true if the NALU has no start code, and the payload is not encoded with emulation prevention bytes

func (*NALU) PayloadOnly

func (n *NALU) PayloadOnly() []byte

Returns only the payload, without any start code

func (*NALU) StartCodeLen

func (n *NALU) StartCodeLen() int

Returns the length of the start code. Possible return values:

0: No start code
3: 00 00 01
4: 00 00 00 01

I can't recall precisely now, but I think this function covers all possible legal NALU beginnings. However, I am now uncomfortable with this. We should maybe store a flag in the NALU indicating whether it has a start code or not.
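A minimal sketch of the detection (hypothetical name, operating on a byte slice). On whether it covers every legal beginning: a NALU without a start code begins with its header, and for specified NALU types that header cannot begin with two zero bytes (H.264 nal_unit_type 0 is unspecified, and in H.265 a second header byte of 0x00 would mean nuh_temporal_id_plus1 = 0, which is forbidden). So checking for 00 00 01 and 00 00 00 01 should be unambiguous for conformant streams.

```go
package main

// startCodeLen reports the length of an Annex-B start code at the front of
// buf: 4 for 00 00 00 01, 3 for 00 00 01, and 0 otherwise.
func startCodeLen(buf []byte) int {
	switch {
	case len(buf) >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 1:
		return 4
	case len(buf) >= 3 && buf[0] == 0 && buf[1] == 0 && buf[2] == 1:
		return 3
	default:
		return 0
	}
}
```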

func (*NALU) Type

func (n *NALU) Type(codec Codec) byte

Return the NALU type (from the first byte of the header)

func (*NALU) Type264

func (n *NALU) Type264() h264.NALUType

Return the NALU type

func (*NALU) Type265

func (n *NALU) Type265() h265.NALUType

Return the NALU type

type PacketBuffer

type PacketBuffer struct {
	Packets []*VideoPacket
}

PacketBuffer is a list of packets, with some helper functions

func ExtractFsvPackets

func ExtractFsvPackets(fsvCodec string, input []fsv.NALU) (*PacketBuffer, error)

Convert FSV packets to our VideoPacket format

func LoadBinDir

func LoadBinDir(dir string) (*PacketBuffer, error)

Opposite of PacketBuffer.DumpBin. NOTE: We don't attempt to inject SPS and PPS into the PacketBuffer, but it would be trivial for H264: just look at the first byte of the payload (0x67 and 0x68 for SPS and PPS).

func (*PacketBuffer) Codec

func (r *PacketBuffer) Codec() Codec

func (*PacketBuffer) DecodeHeader

func (r *PacketBuffer) DecodeHeader() (width, height int, err error)

Decode SPS and PPS to extract header information

func (*PacketBuffer) DumpBin

func (r *PacketBuffer) DumpBin(dir string) error

Dump each NALU to a .raw file

func (*PacketBuffer) ExtractThumbnail

func (r *PacketBuffer) ExtractThumbnail() (*cimg.Image, error)

Decode the center-most keyframe. This is O(1), assuming no errors or funny business such as there being no keyframes.

func (*PacketBuffer) FindClosestPacketWallPTS

func (r *PacketBuffer) FindClosestPacketWallPTS(wallPTS time.Time, keyframeOnly bool) int

Find the packet with the WallPTS closest to the given time

func (*PacketBuffer) FindFirstIDR

func (r *PacketBuffer) FindFirstIDR() int

Returns the index of the first keyframe in the buffer, or -1 if none found

func (*PacketBuffer) FindFirstPacketOfType

func (r *PacketBuffer) FindFirstPacketOfType(ofType AbstractNALUType) int

func (*PacketBuffer) FirstNALUOfType264

func (r *PacketBuffer) FirstNALUOfType264(ofType h264.NALUType) *NALU

Returns the first NALU of the given type, or nil if none found

func (*PacketBuffer) FirstNALUOfType265

func (r *PacketBuffer) FirstNALUOfType265(ofType h265.NALUType) *NALU

Returns the first NALU of the given type, or nil if none found

func (*PacketBuffer) HasIDR

func (r *PacketBuffer) HasIDR() bool

Returns true if we have at least one keyframe in the buffer

func (*PacketBuffer) ResetPTS

func (r *PacketBuffer) ResetPTS()

Adjust all PTS values so that the first frame starts at time 0

func (*PacketBuffer) SaveToMP4

func (r *PacketBuffer) SaveToMP4(filename string) error

func (*PacketBuffer) SaveToMPEGTS

func (r *PacketBuffer) SaveToMPEGTS(log logs.Log, output io.Writer) error

Write the saved buffer out as an MPEG-TS stream

type VideoDecoder

type VideoDecoder struct {
	// contains filtered or unexported fields
}

VideoDecoder is a wrapper around ffmpeg, for decoding videos

func NewVideoFileDecoder

func NewVideoFileDecoder(filename string) (*VideoDecoder, error)

Create a new decoder that will decode a file

func NewVideoStreamDecoder

func NewVideoStreamDecoder(codec Codec) (*VideoDecoder, error)

Create a new decoder that you will feed with packets

func (*VideoDecoder) Close

func (d *VideoDecoder) Close()

func (*VideoDecoder) Decode

func (d *VideoDecoder) Decode(packet *VideoPacket) (*Frame, error)

Decode the packet and return a copy of the YUV image. This is used when decoding a stream (not a file).

func (*VideoDecoder) DecodeDeepRef

func (d *VideoDecoder) DecodeDeepRef(packet *VideoPacket) (*Frame, error)

WARNING: The image returned is only valid while the decoder is still alive, and it will be clobbered by the subsequent DecodeDeepRef/Decode(). The pixels in the returned image are not a garbage-collected Go slice. They point directly into the libavcodec decode buffer. That's why the function name has the "DeepRef" suffix.

func (*VideoDecoder) FrameTimeToDuration

func (d *VideoDecoder) FrameTimeToDuration(pts int64) time.Duration

Convert a native frame time to a time.Duration
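Assuming the native time units are an ffmpeg-style rational time base num/den (a 90 kHz clock, i.e. 1/90000, is common for video), the conversion is pts * num / den seconds. This sketch (hypothetical name) uses math/big so that the intermediate pts * num * 10^9 cannot overflow int64, which would otherwise happen after roughly 28 hours of 90 kHz ticks. ffmpeg's own av_rescale_q solves the same problem in C.

```go
package main

import (
	"math/big"
	"time"
)

// ptsToDuration converts a PTS in units of num/den seconds into a
// time.Duration, truncating toward zero.
func ptsToDuration(pts, num, den int64) time.Duration {
	v := new(big.Int).Mul(big.NewInt(pts), big.NewInt(num))
	v.Mul(v, big.NewInt(int64(time.Second))) // to nanoseconds
	v.Quo(v, big.NewInt(den))
	return time.Duration(v.Int64())
}
```

For example, with a 1/30000 time base (common for 29.97 fps), a PTS of 3003 is 100.1 ms.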

func (*VideoDecoder) Height

func (d *VideoDecoder) Height() int

func (*VideoDecoder) NextFrame

func (d *VideoDecoder) NextFrame() (*Frame, error)

NextFrame reads the next frame from a file and returns a copy of the YUV image.

func (*VideoDecoder) NextFrameDeepRef

func (d *VideoDecoder) NextFrameDeepRef() (*Frame, error)

NextFrameDeepRef will read the next frame from a file and return a deep reference into the libavcodec decoded image buffer. The next call to NextFrame/NextFrameDeepRef will invalidate that image.

func (*VideoDecoder) ReceiveFrameDeepRef

func (d *VideoDecoder) ReceiveFrameDeepRef() (*Frame, error)

WARNING: The image returned is only valid while the decoder is still alive, and it will be clobbered by the subsequent DecodeDeepRef/Decode(). The pixels in the returned image are not a garbage-collected Go slice. They point directly into the libavcodec decode buffer. That's why the function name has the "DeepRef" suffix.

func (*VideoDecoder) Width

func (d *VideoDecoder) Width() int

type VideoEncoder

type VideoEncoder struct {
	InputPixelFormat AVPixelFormat
	// contains filtered or unexported fields
}

func NewVideoEncoder

func NewVideoEncoder(codec, format, filename string, width, height int, pixelFormatIn, pixelFormatOut AVPixelFormat, encoderType VideoEncoderType, fps int) (*VideoEncoder, error)

NewVideoEncoder creates a new video encoder. You must Close() a video encoder when you are done using it, otherwise you will leak ffmpeg objects.

func (*VideoEncoder) Close

func (v *VideoEncoder) Close()

func (*VideoEncoder) WriteImage

func (v *VideoEncoder) WriteImage(pts time.Duration, data [][]uint8, stride []int) error

Write an RGB (single plane) or YUV (3 planes) image to the encoder

func (*VideoEncoder) WriteNALU

func (v *VideoEncoder) WriteNALU(dts, pts time.Duration, nalu NALU) error

func (*VideoEncoder) WritePacket

func (v *VideoEncoder) WritePacket(dts, pts time.Duration, packet *VideoPacket) error

func (*VideoEncoder) WriteTrailer

func (v *VideoEncoder) WriteTrailer() error

type VideoEncoderType

type VideoEncoderType int
const (
	VideoEncoderTypePackets     VideoEncoderType = C.EncoderTypePackets     // Sending pre-encoded packets/NALUs to the encoder
	VideoEncoderTypeImageFrames VideoEncoderType = C.EncoderTypeImageFrames // Sending image frames to the encoder
)

type VideoPacket

type VideoPacket struct {
	NALUs       []NALU        // NALUs in the packet.
	ValidRecvID int64         // Arbitrary monotonically increasing ID of useful decoded packets. Used to detect dropped packets, or other issues like that.
	PTS         time.Duration // Raw packet PTS received from RTSP reader. Subtracted from a reference time to compute WallPTS.
	WallPTS     time.Time     // Reference wall time combined with the received PTS. We consider this the ground truth/reality of when the packet was recorded on the camera.
	Codec       Codec         // h264 or h265
	IsBacklog   bool          // a bit of a hack to inject this state here. maybe an integer counter would suffice? (eg nBacklogPackets)
}

VideoPacket is one or more NALUs that were received together. There is generally structure to this. For example, a keyframe packet from a camera will likely contain a SPS, PPS, and IDR NALU. For H265, it may also contain a VPS.

func ClonePacket

func ClonePacket(nalusIn [][]byte, codec Codec, pts time.Duration, recvTime time.Time, wallPTS time.Time, isPayloadAnnexBEncoded bool) *VideoPacket

Clone a packet of NALUs and return the cloned packet. NOTE: gortsplib re-uses buffers, which is why we copy the payloads. NOTE2: I think that after upgrading gortsplib in Jan 2024, it no longer re-uses buffers, so I should revisit the requirement for our deep clone here.

func (*VideoPacket) Clone

func (p *VideoPacket) Clone() *VideoPacket

Deep clone of packet buffer

func (*VideoPacket) EncodeToAnnexBPacket

func (p *VideoPacket) EncodeToAnnexBPacket() []byte

Encode all NALUs in the packet into AnnexB format (i.e. with 00,00,01 prefix bytes, and emulation prevention bytes)

func (*VideoPacket) FirstNALUOfType264

func (p *VideoPacket) FirstNALUOfType264(t h264.NALUType) *NALU

Returns the first NALU of the given type, or nil if none exists

func (*VideoPacket) HasAbstractType

func (p *VideoPacket) HasAbstractType(t AbstractNALUType) bool

Return true if this packet has a NALU of type t inside

func (*VideoPacket) HasIDR

func (p *VideoPacket) HasIDR() bool

Returns true if this packet has a keyframe

func (*VideoPacket) IsIFrame

func (p *VideoPacket) IsIFrame() bool

Return true if this packet has one NALU which is an intermediate frame

func (*VideoPacket) PayloadBytes

func (p *VideoPacket) PayloadBytes() int

Returns the number of bytes of NALU data. If the NALUs have annex-b prefixes, then these are included in the size.

func (*VideoPacket) Summary

func (p *VideoPacket) Summary() string
