
Fast CVVDP implementation in C


fcvvdp

A fast C implementation of the CVVDP metric (arXiv) from the University of Cambridge. More information about how CVVDP works according to this implementation is provided here.

Benchmarks

Benchmarked using poop on Linux with an Intel Core i7-13700K. Note that fcvvdp runs on a single CPU thread here while cvvdp uses multiple threads: fcvvdp does not yet support multithreading.

poop "cvvdp -r fm360p.y4m -t fm360p_x264.y4m --display standard_fhd" "./fcvvdp -m fhd fm360p.y4m fm360p_x264.y4m"

Benchmark 1 (3 runs): cvvdp -r fm360p.y4m -t fm360p_x264.y4m --display standard_fhd
  measurement         mean ± σ           min … max          outliers   delta
  wall_time           19.6s  ± 568ms     19.2s  … 20.2s     0 ( 0%)    0%
  peak_rss            1.00GB ± 28.1MB    979MB  … 1.03GB    0 ( 0%)    0%
  cpu_cycles          747G   ± 8.54G     741G   … 757G      0 ( 0%)    0%
  instructions        362G   ± 1.20G     361G   … 363G      0 ( 0%)    0%
  cache_references    2.77G  ± 46.9M     2.71G  … 2.81G     0 ( 0%)    0%
  cache_misses        899M   ± 11.7M     890M   … 912M      0 ( 0%)    0%
  branch_misses       107M   ± 1.80M     105M   … 109M      0 ( 0%)    0%

Benchmark 2 (3 runs): ./fcvvdp -m fhd fm360p.y4m fm360p_x264.y4m
  measurement         mean ± σ           min … max          outliers   delta
  wall_time           16.1s  ± 56.2ms    16.0s  … 16.1s     0 ( 0%)    ⚡- 17.9% ± 4.7%
  peak_rss            86.7MB ± 109KB     86.6MB … 86.8MB    0 ( 0%)    ⚡- 91.4% ± 4.5%
  cpu_cycles          82.8G  ± 80.9M     82.8G  … 82.9G     0 ( 0%)    ⚡- 88.9% ± 1.8%
  instructions        255G   ± 30.0M     255G   … 255G      0 ( 0%)    ⚡- 29.6% ± 0.5%
  cache_references    1.49G  ± 6.43M     1.49G  … 1.50G     0 ( 0%)    ⚡- 46.1% ± 2.7%
  cache_misses        369M   ± 2.84M     365M   … 371M      0 ( 0%)    ⚡- 59.0% ± 2.2%
  branch_misses       8.50M  ± 62.3K     8.45M  … 8.57M     0 ( 0%)    ⚡- 92.1% ± 2.7%

fcvvdp uses about 91% less peak memory and about 89% fewer CPU cycles, and is almost 18% faster in wall-clock time. In terms of user time, fcvvdp is ~15x more efficient.

Usage

Compilation requires:

zlib-rs

libunwind

... continue reading