HACKER Q&A
📣 rschachte

How can I learn about video encoding, H.264, ffmpeg, etc.?


I'm looking to learn more about video encoding, compression, decoding, transcoding, ffmpeg, codecs, etc.

This is outside of my typical work, but I'm starting a new job in this realm shortly and I'm very interested in learning more about it for fun. Anyone have good books or resources for this? Thanks.


  👤 pertique Accepted Answer ✓
A good high-level breakdown of H.264: https://sidbala.com/h-264-is-magic/

Associated HN post, although there have been a few: https://news.ycombinator.com/item?id=30710574

More technical look at video codecs in general with sources: https://github.com/leandromoreira/digital_video_introduction...


👤 joeld42
It's pretty fun to try writing a toy codec. It's not too hard if you don't have to worry about all the compatibility and generality stuff. Try to make something that outperforms the standard but only on a specific video. :)

Also, I learned a lot by messing with the ffmpeg source; it's pretty readable.
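
To make the toy-codec idea concrete, here is a minimal sketch in Python. It is only an illustration, not anything from ffmpeg or a real standard: frames are assumed to be flat lists of 8-bit grayscale pixels, and the names `rle_encode`, `encode`, and `decode` are made up for this example. Each frame is stored as its difference from the previous frame, and the difference is run-length encoded, so a mostly static video collapses to a handful of runs.

```python
from typing import List, Tuple

Frame = List[int]      # assumed layout: flattened 8-bit grayscale pixels, 0..255
Run = Tuple[int, int]  # (value, repeat count)


def rle_encode(values: List[int]) -> List[Run]:
    """Collapse consecutive repeats into (value, count) pairs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand (value, count) pairs back into a flat list."""
    out: List[int] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def encode(frames: List[Frame]) -> List[List[Run]]:
    """Store each frame as the RLE of its difference from the previous frame."""
    encoded: List[List[Run]] = []
    prev: Frame = [0] * len(frames[0]) if frames else []
    for frame in frames:
        # Differences wrap modulo 256 so they still fit in one byte.
        delta = [(cur - old) % 256 for cur, old in zip(frame, prev)]
        encoded.append(rle_encode(delta))
        prev = frame
    return encoded


def decode(encoded: List[List[Run]]) -> List[Frame]:
    """Rebuild frames by adding each decoded difference to the previous frame."""
    frames: List[Frame] = []
    prev: Frame = []
    for runs in encoded:
        delta = rle_decode(runs)
        if not prev:
            prev = [0] * len(delta)  # the first frame is a difference from black
        frame = [(old + d) % 256 for old, d in zip(prev, delta)]
        frames.append(frame)
        prev = frame
    return frames


# Three 4x4 frames in which a single pixel changes per frame.
f0 = [10] * 16
f1 = f0.copy()
f1[5] = 200
f2 = f1.copy()
f2[6] = 200
video = [f0, f1, f2]

packed = encode(video)
assert decode(packed) == video          # lossless round trip
print([len(runs) for runs in packed])   # number of runs stored per frame
```

This is lossless and does nothing about motion, so it falls apart as soon as the camera pans. Real codecs add motion-compensated prediction, a transform with quantization, and entropy coding on top of the same "predict, then store the residual" idea.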


👤 chrisp_how
All video can be compressed by treating it as a verbatim, monosyllable sentence, or essay. It is about why a frame is a single-unit and not combined with other frames, though syllabic sounds between words can be shortened—I’d advise against it.

Sound is a “channel-pitch” stream, which relates a single-, or more different streams through connected pitches. You don’t need any programming to practice! Use a pen and ecofriendly paper: “record” your voice by “strict, composition sheet-music”: A C A Aflat etc. Recompose, rehash, and other operations can turn the sound into all known formats and related sounds!

For video, try to manipulate a single distilled image, and use a point-by-point display for convenience (again, on paper). Try to split the image in 2, find which way is about equal, and develop a step-by-step algorithm: this is the most advanced ratio-encoding method. Try to convect an image with another image, which is to blend them into a single square of space without interfering with the images: one is a “C”, and the other an “o”, for example. This is the most useful single-display format, which is how avimpeg-2, hma, ncoa (which is an old vhs format) all were combining different frames. Good luck!



👤 boneitis
If it brings you any more motivational drive (as someone who has no background in signal processing), it absolutely blows my mind that technology like Shazam works, and I've always wanted to figure it out.

One of my backburner tasks that will probably remain forever is to get a grip on DSP, obtain a General/Amateur ham radio license, mess around with it, yadda yadda yadda.

I've always meant to get around to digesting this article:

https://news.ycombinator.com/item?id=9870408 ("How Shazam works")


👤 davidhyde
I would begin with the basics of JPEG compression, followed by audio compression, and then, finally, video compression. They are all related in more ways than you think. The 3Blue1Brown YouTube channel has a couple of gems on digital signal processing, like how fast Fourier transforms work and everything in between. This is a good modern book on the subject: “Digital Signal Processing in Modern Communication Systems 2nd Edition” released in 2021 by Andreas Schwarzinger.
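
As a taste of the JPEG basics, here is a rough Python sketch of the step at the heart of it: a 2-D DCT on an 8x8 block followed by quantization. It is a simplification under stated assumptions. The function names are made up for this example, the single uniform `step` stands in for the per-frequency quantization table that JPEG actually uses, and the level shift, zigzag scan, and entropy coding are left out.

```python
import math
from typing import List

N = 8  # JPEG transforms the image in 8x8 blocks
Block = List[List[float]]

# COS[k][n] = cos((2n + 1) * k * pi / (2N)): the 1-D DCT-II basis functions.
COS = [[math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
       for k in range(N)]
# Scale factors that make the transform orthonormal.
SCALE = [math.sqrt(1 / N)] + [math.sqrt(2 / N)] * (N - 1)


def dct2(block: Block) -> Block:
    """Forward 2-D DCT-II of an 8x8 block (direct, unoptimized form)."""
    return [[SCALE[u] * SCALE[v] * sum(block[x][y] * COS[u][x] * COS[v][y]
                                       for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct2(coeffs: Block) -> Block:
    """Inverse 2-D DCT, recovering pixel values from coefficients."""
    return [[sum(SCALE[u] * SCALE[v] * coeffs[u][v] * COS[u][x] * COS[v][y]
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


def quantize(coeffs: Block, step: float) -> List[List[int]]:
    """Round each coefficient to a multiple of `step` (the lossy part)."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels: List[List[int]], step: float) -> Block:
    """Scale the quantized levels back up to approximate coefficients."""
    return [[level * step for level in row] for row in levels]


# A smooth horizontal gradient, typical of natural image content.
gradient = [[100.0 + 2 * y for y in range(N)] for _ in range(N)]

levels = quantize(dct2(gradient), step=16)  # step=16 is an arbitrary choice
nonzero = sum(1 for row in levels for level in row if level != 0)
restored = idct2(dequantize(levels, step=16))

print("nonzero coefficients out of 64:", nonzero)
print("largest pixel error:",
      max(abs(a - b) for ra, rb in zip(gradient, restored) for a, b in zip(ra, rb)))
```

The point to notice is that a smooth block needs only a few nonzero numbers after quantization, which is where the compression comes from. Audio codecs use a closely related transform (the MDCT), and video codecs apply block transforms like this to the residual left after predicting a frame from its neighbors.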

👤 treman
These resources helped me get started, though they are somewhat dated.

https://github.com/leandromoreira/ffmpeg-libav-tutorial

http://dranger.com/ffmpeg/


👤 pipeline_peak
Learn image encoding, put it in a for loop set to 60 * n seconds AHAHA! pours glass of wine