Niklas Haas - RSS feed

GSoC 2018 Project + Results

2018-08-21T00:00:00Z

by Niklas Haas on August 21, 2018

This summer I participated in GSoC, working on libplacebo and its integration into VLC.

Project Goal

The idea was to implement a new video output module (vout_placebo) based on both libplacebo and the Vulkan graphics API. The ultimate aim was rough feature compatibility with mpv’s vo_gpu renderer, upon which libplacebo is based, but the end was essentially open as far as libplacebo features were concerned.

Current State

All of the major essentials are implemented (including direct rendering), and the video output works without any major issues on setups I’ve tested. The current known limitations include:

Support for subtitles is implemented in libplacebo but still needs to be hooked up to VLC’s module. (easy)
Not all libplacebo settings are hooked up to GUI options in VLC, specifically the advanced upscaling options are still missing. (easy)
Frame interpolation / temporal mixing, an attractive vo_gpu feature, is still missing in libplacebo. (hard)
Missing performance optimizations in some code paths (e.g. plane merging for more efficient debanding / chroma upscaling)

The features I cared about most (debanding, HDR tone mapping, upscaling, dithering) are all implemented and working.

Using it

libplacebo

libplacebo, which has been developed as an independent library, can be obtained and built by following the build instructions, summarized as follows:

$ git clone https://github.com/haasn/libplacebo && cd libplacebo
$ meson build
$ ninja -Cbuild

This will build the libplacebo shared library. If you want to install it system-wide, you can use ninja -Cbuild install (however, it’s recommended to use proper system packages instead). Refer to the meson documentation for more information about how to customize e.g. the target install directory.

Make sure you have a working Vulkan loader library and driver on your system. There is currently no OpenGL support in libplacebo, nor any immediate plans of adding it.

VLC module

The VLC module I have been working on is available as a WIP branch on GitHub. It will be merged into VLC upstream once the last bits (subtitle support, missing GUI options) are added and the code has undergone a final evaluation / inspection / cleanup pass.

You can build it the same way you build VLC. To make sure vulkan and libplacebo are supported and enabled, use ./configure --enable-libplacebo --enable-vulkan when building.

To use it, simply choose the “Vulkan” video output option in the VLC settings. All of the video quality settings for customizing the use of libplacebo’s features are found in the advanced options dialog in VLC.

How to benchmark mpv's raw throughput

2017-10-05T00:00:00Z

by Niklas Haas on October 5, 2017

Tagged as: video, mpv, tips.

mpv exports pass timers which allow you to benchmark the performance in theory, but in practice these are very unreliable in multiple scenarios:

Some drivers group the timer queries into the wrong command buffer, causing the first measured pass to include the time spent waiting for the vsync.
Some drivers overlap the timer durations for jobs running in parallel, thus leading to an over-counting of the time spent on each pass.
Some drivers just flat out refuse to report timer results at all.
Some backends (in particular, the Vulkan code) submit asynchronous commands to different queues, not all of which even support timers, so passes get measured as 0μs despite taking time in reality.

Instead, a more comparable way to benchmark the raw throughput of mpv is to uncap the framerate and see how fast you can push frames. The most basic way to accomplish this is with a profile like this:

[bench]
audio=no
untimed=yes
video-sync=display-desync
vulkan-swap-mode=immediate
opengl-swapinterval=0
d3d11-sync-interval=0
osd-msg1="FPS: ${estimated-display-fps}"

Disclaimer / caveats

This relies on you being able to uncap the rendering. Some systems don’t support this configuration correctly. On some systems you need to use --vulkan-swap-mode=mailbox instead. On other systems, you have no way of disabling OpenGL vsync at all; or you need to force it off in the driver. Obviously, if the measured FPS is exactly equal to your display FPS (e.g. 60 Hz), the results are invalid.
This requires your CPU to be able to decode the file as fast as you’re trying to render it. So if you’re using this with really light settings, you’d end up rendering at like 3000 fps and maxing out on the decoding speed. Obviously, such scenarios are unrealistic. This test only really makes sense when GPU rendering is the bottleneck; i.e. when you’re using heavy scalers.
The display-sync logic still applies. This means that, for example, if the video is 24 fps and your display identifies itself as 60 fps, mpv will draw one fresh frame followed by two redraws of the same frame (which are just cheap blits), specifics depending on the exact pattern needed to synchronize the two framerates. As a result, the estimated FPS will be far higher than the rate at which your GPU is actually doing work. For example, it may report 300 fps when in reality your GPU is only processing ~100 frames per second. In essence, what’s happening is that it’s measuring the number of vsyncs it can output per second, not the number of video frames it can render. To solve this, you can either use --display-fps to trick the display sync code into simulating a lower or higher display FPS,¹ or you can use --speed to make the video faster or slower. For example, to display a 24 Hz video on a 60 Hz display you can use --speed=2.5 to lock the video framerate to the display framerate.
Actually drawing the OSD can cause the performance to decrease. Although in this case, the difference shouldn’t be that big, it makes a big difference when using stats.lua, especially at high screen resolutions. So I recommend sticking to the osd-msg1, or perhaps switching to term-status-msg instead if needed.
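
To make the display-sync caveat more concrete, here is a minimal Python sketch of the frame-repeat pattern and of the matching --speed factor. It uses a naive model (each vsync simply shows whichever source frame is current at that instant), which is an assumption for illustration and not mpv’s actual display-sync code; the function names are made up.

```python
from fractions import Fraction


def repeat_cadence(video_fps, display_hz, n_frames):
    """How many vsyncs each of the first n_frames source frames stays on screen.

    Naive model with integer rates: vsync number k shows source frame
    floor(k * video_fps / display_hz).
    """
    counts = [0] * n_frames
    vsync = 0
    while True:
        frame = (vsync * video_fps) // display_hz
        if frame >= n_frames:
            break
        counts[frame] += 1
        vsync += 1
    return counts


def speed_to_lock(video_fps, display_hz):
    """Value for --speed that makes one video frame last exactly one vsync."""
    return Fraction(display_hz, video_fps)


# 24 fps on a 60 Hz display: each frame is drawn once and then redrawn
# once or twice, so only 4 out of every 10 vsyncs involve real rendering.
print(repeat_cadence(24, 60, 4))     # [3, 2, 3, 2]
print(float(speed_to_lock(24, 60)))  # 2.5
```

Under this model, the share of vsyncs that carry a fresh frame is simply video_fps / display_hz, which is why the reported FPS overstates the real rendering work unless the two rates are locked together.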

¹ This is actually useful if you want to see if you could, for example, upgrade from a 60 Hz monitor to a 144 Hz monitor without framedrops. If you can render with --display-fps=144 --profile=bench at 144 FPS or more, then you’re good to go (for this type of content).

Jailing specific processes inside a VPN

2017-05-09T00:00:00Z

by Niklas Haas on May 9, 2017

Tagged as: networking, linux, tips.

I’ve always wondered how difficult it would be to do something like this, so I decided to give it a try. Turns out the answer is, since the addition of UID matching to ip rule, not very difficult.

iproute2 configuration

The basic approach is to give the VPN interface a separate routing table, and redirect suspect processes to that routing table instead. Since working with numeric IDs directly is sort of a pain, you can give them friendly names:

$ cat /etc/iproute2/rt_tables
#
# reserved values
#
255	local
254	main
253	default
0	unspec
#
# local
#
1	vpn
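
As a rough illustration of what that file encodes, the following Python sketch parses rt_tables-style text into a name-to-ID mapping. The real lookup is done by iproute2 itself; this parser is only an assumption-laden approximation (decimal IDs, whitespace-separated columns) and parse_rt_tables is a hypothetical helper name.

```python
def parse_rt_tables(text):
    """Map routing table names to numeric IDs from rt_tables-style text."""
    tables = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        parts = line.split()
        if len(parts) < 2:
            continue
        number, name = parts[0], parts[1]
        tables[name] = int(number)
    return tables


SAMPLE = """\
#
# reserved values
#
255\tlocal
254\tmain
253\tdefault
0\tunspec
#
# local
#
1\tvpn
"""

print(parse_rt_tables(SAMPLE)["vpn"])  # 1
```

With the entry in place, "table vpn" and "table 1" refer to the same routing table in ip commands.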

Confining your process to a specific user

Since ip rule can only match based on UID, rather than PID (which is more stable anyway), the first step is making sure your process is running under some suitable user. For example, suppose you’re trying to isolate transmission-daemon, then the appropriate user would be transmission, which (at least on my system) transmission-daemon gets run under. If your program lacks such a convenient user, then you could always add your own and use something like sudo to switch to it, e.g.:

$ cat /etc/sudoers.d/rtorrent
joe ALL = (rtorrent) NOPASSWD: /usr/bin/rtorrent

Then user joe could use sudo -u rtorrent /usr/bin/rtorrent to run rtorrent as a separate user rtorrent.

OpenVPN configuration

The second part of the configuration is making sure to set up the correct routing table as part of OpenVPN’s initialization. For the purposes of this example, I want to ignore the VPN provider’s pushed routes (since they try overriding my system-wide routing to go through their VPN, whereas I only want it for certain processes), which the addition of route-noexec solves.

$ cat /etc/openvpn/example/openvpn.conf
...
script-security 2
route-noexec
route-up /etc/openvpn/example/route.sh
route-pre-down /etc/openvpn/example/route-down.sh

$ cat /etc/openvpn/example/route.sh
#!/bin/sh
# Route everything in the vpn table through the VPN gateway
sudo ip route add default via "$route_vpn_gateway" table vpn

# Confine transmission and rtorrent to this table (as an example)
for user in rtorrent transmission; do
    uid=$(id -u "$user")
    sudo ip rule add uidrange "$uid-$uid" table vpn
done

$ cat /etc/openvpn/example/route-down.sh
#!/bin/sh
sudo ip route flush table vpn

# Delete all ip rules that mention this table
while sudo ip rule del table vpn; do :; done

The magic happens due to the ip rule invocation. Basically, it creates a rule that looks like this:

$ ip rule list
0:	from all lookup local 
32765:	from all uidrange 141-141 lookup vpn 
32766:	from all lookup main 
32767:	from all lookup default

This means that any packet originating from UID 141-141 (i.e. transmission) will get routed according to the table vpn, which looks like this: (as an example)

$ ip route list table vpn
default via 10.128.0.1 dev tun0
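
For reference, the per-user part of route.sh could also be expressed in Python roughly as follows. This is only a sketch that builds the argument lists and does not run anything; uid_rule_args and rules_for_users are made-up names, and pwd.getpwnam plays the role of id -u.

```python
import pwd


def uid_rule_args(uid, table="vpn"):
    """argv for the `ip rule add` call that route.sh issues for one UID."""
    return ["ip", "rule", "add", "uidrange", f"{uid}-{uid}", "table", table]


def rules_for_users(usernames, table="vpn"):
    """Resolve each user name to its UID and build the matching rule argv."""
    return [uid_rule_args(pwd.getpwnam(name).pw_uid, table) for name in usernames]


print(" ".join(uid_rule_args(141)))  # ip rule add uidrange 141-141 table vpn
```

The range syntax is what makes this a single-user match: both ends of uidrange are the same UID.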

`ip` and root privileges

For these scripts to work, openvpn needs to be able to execute ip commands (with root privilege). You could either accomplish this by preventing openvpn from ever dropping privileges (bad), or, as I prefer, using sudo to re-gain access to ip for the openvpn user:

$ cat /etc/sudoers.d/openvpn
openvpn ALL = (root) NOPASSWD: /bin/ip

Note that dropping privileges for OpenVPN is done by adding something like the following to your openvpn.conf:

persist-key
persist-tun
user openvpn
group openvpn

Linux configuration

It’s possible that due to the way source route verification works under Linux, you will not receive any replies directed your way (and e.g. ping as the confined user will fail). The solution to this is setting rp_filter to 2, e.g.

$ cat /etc/sysctl.d/20-disable-rp_filter.conf
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.all.rp_filter = 2

followed by sysctl --system (a bare sysctl -p only reloads /etc/sysctl.conf, not the files under /etc/sysctl.d).

If it still doesn’t work, you may need to flush the routing cache, i.e. ip route flush cache.
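
To check which values are actually in effect, something like the following Python sketch can help. It reads the per-interface rp_filter files under /proc/sys/net/ipv4/conf; the helper names are made up, and the max() rule reflects the kernel’s documented behaviour of using the larger of the "all" and per-interface values.

```python
from pathlib import Path


def rp_filter_modes(conf_root="/proc/sys/net/ipv4/conf"):
    """Read the rp_filter value of every entry (all, default, eth0, ...) under conf_root."""
    modes = {}
    for path in Path(conf_root).glob("*/rp_filter"):
        modes[path.parent.name] = int(path.read_text().strip())
    return modes


def effective_mode(modes, interface):
    """Source validation uses the larger of the 'all' and per-interface values."""
    return max(modes.get("all", 0), modes.get(interface, 0))


# Example use on a Linux box (tun0 is just an example interface name):
#   modes = rp_filter_modes()
#   print(effective_mode(modes, "tun0"))
```

A result of 2 for the VPN interface means loose mode is active there; 1 would be strict mode.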

Disclaimer and warning

A word on DNS

If you use a local DNS server (e.g. one pushed by your DHCP server), then DNS lookups from the confined user will fail, because there’s no appropriate route for the local DNS server. There are several solutions to this:

1. Use a public DNS server that’s accessible via the VPN as well.
2. Hard-code domains you care about to /etc/hosts.
3. Add an extra route for your local DNS server to the vpn table.

While #3 seems the most attractive, this is a privacy risk because DNS requests will leak your real IP! Only do this if you’re sure you know what you’re signing yourself up for.

Other sources of IP leaks

It’s possible that all your effort will be for naught and your client will find other ways of leaking your ‘real’ IP to the internet. Unless you have carefully audited and tested your specific program, do NOT take this guide as any sort of guarantee. WebRTC, torrent clients etc. have all found ways to inadvertently de-anonymize VPN users.

One website you can use for testing these sorts of things is ipleak.net, which includes support for testing torrent clients in particular. Handy if you just want to make sure your client isn’t egregiously advertising your real IP to trackers.

The Diablo III paragon system visualized

2017-02-18T00:00:00Z

by Niklas Haas on February 18, 2017

Tagged as: games.

Update 2017-02-18: The XP/level curve I was using in the first iteration of this post was based on a previous version of the game. The current XP curve paints a very different picture. I have updated the graphs.

Since it was bugging me, I decided to visualize some of the relationships between playtime, paragon, greater rifts and power levels.

Baseline assumptions

Since we have to use some reference point, I’ll offer my own gear. For ease of comparison, I’m going to ignore everything below paragon 800 and use that as my “starting point”.

I’m pretty decently equipped, and using only the points available to me at paragon 800 I have somewhere in the ballpark of 15,000 dex and 10,000 additional bonus armor from gear:

baseDex = 15000
baseArmor = 10000

At this gear level and no further paragon points, I can do something like 180 billion XP/hour on a good day, speed-farming GR75 or so. (solo)

baseXph = 180

In terms of progress, with this gear level, I can do something like GR 90.

baseRiftLevel = 90

Basic relationships

In order to establish some common relationships, a few basic definitions:

Main stat versus paragon level

This is pretty trivial. Each paragon level is 5 dex more.

dexPara(paraLevel) = baseDex + 5 * (paraLevel - 800)

Main stat versus damage output

Each point of dexterity increases my damage by 1%, stacking additively with itself. For simplicity, we’ll normalize it so that ‘1’ is my baseline damage.

damageDex(dex) = (1 + dex / 100) / (1 + baseDex / 100)

Main stat versus damage mitigation

Each point of dex also counts as one point of armor, and twice as much armor = half as much damage, so we can just calculate this in terms of the baseline. Again, ‘1’ means my baseline damage mitigation.

toughnessDex(dex) = (baseArmor + dex) / (baseArmor + baseDex)
toughnessPara(paraLevel) = toughnessDex(dexPara(paraLevel))

GR level versus mob HP

Each additional GR level increases mob HP by 17%. We can use my baseline as a reference point for how much damage you need to be dealing per GR level, and scale it from there.

riftLevelDamage(damage) = baseRiftLevel + logBase 1.17 damage

Mob damage versus GR level

For each level above GR70, mobs deal 2.34% more damage. So the increase in toughness required per GR level is as follows:

incomingDamage(riftLevel) = 1.0234 ** (riftLevel - baseRiftLevel)
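
For anyone who wants to play with these numbers, here is a direct Python transcription of the basic relationships so far. It is a sketch that mirrors the definitions above one-to-one; only the snake_case names are new.

```python
import math

BASE_DEX = 15000
BASE_ARMOR = 10000
BASE_RIFT_LEVEL = 90


def dex_para(para_level):
    """Dexterity at a given paragon level (5 dex per level above 800)."""
    return BASE_DEX + 5 * (para_level - 800)


def damage_dex(dex):
    """Damage relative to the baseline (1.0 at BASE_DEX)."""
    return (1 + dex / 100) / (1 + BASE_DEX / 100)


def toughness_dex(dex):
    """Damage mitigation relative to the baseline (1.0 at BASE_DEX)."""
    return (BASE_ARMOR + dex) / (BASE_ARMOR + BASE_DEX)


def rift_level_damage(damage):
    """GR level reachable with a given relative damage (17% more HP per level)."""
    return BASE_RIFT_LEVEL + math.log(damage, 1.17)


def incoming_damage(rift_level):
    """Mob damage relative to the baseline rift level (2.34% more per level)."""
    return 1.0234 ** (rift_level - BASE_RIFT_LEVEL)


# Sanity check: at the baseline, every relative quantity is 1 and the GR is 90.
print(damage_dex(dex_para(800)), toughness_dex(dex_para(800)), rift_level_damage(1.0))
```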

Derived functions

Now we’re ready to look at the first set of relationships between these curves:

Damage output versus paragon level

More paragon = more dex = more damage. Simple enough. Plug one into the other:

damagePara(paraLevel) = damageDex(dexPara(paraLevel))

GR level versus paragon level

Take the previous curve and plug it into the damage <-> GR level curve:

riftLevelPara(paraLevel) = riftLevelDamage(damagePara(paraLevel))

Incoming damage at this paragon level

Of course, at this higher GR level, we’ll also be receiving more incoming damage.

incomingDamageRaw(paraLevel) = incomingDamage(riftLevelPara(paraLevel))
incomingDamageEff(paraLevel) = incomingDamageRaw(paraLevel) / toughnessPara(paraLevel)

Even though the raw damage increase is going up, the actual effective damage (relative to how much armor we gain) is going down; meaning we actually have an easier time surviving than in the lower GR90.

Note: This means that, technically, we could swap out a 50% defensive modifier (e.g. crystal fist) for an offensive piece of gear at para 9000, and still survive. But we’ll ignore this effect for now, for the sake of moving on to more interesting things.

The time axis

All this is well and good, but my main interest lies in how all of these stats correlate with actual playtime. So first, we need to figure out how XP scaling works.

As of patch 2.4.2 (S8), the paragon curve above p800 is split into two segments: There’s a p800-2250 segment, in which the XP needed per level increases linearly, starting from 23 (billion) and ending at 200. After that, it increases quadratically, with the per-level increase itself growing by 102 thousand each level.

xpLevel(paraLevel)
  | paraLevel <= 2250 = lerp (800, 23) (2250, 200)
  | otherwise         = 200 + 0.229602 * bonusPara + 0.000051 * bonusPara^2

  where lerp (a,x) (b,y) = x + (paraLevel - a) / (b - a) * (y - x)
        bonusPara        = paraLevel - 2250
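
The same curve as a small Python sketch, with two quick checks: that the segments join up at paragon 2250, and where the 102 thousand figure comes from. The function name is new, the constants are the ones given above.

```python
def xp_level(para_level):
    """XP (in billions) needed for one paragon level, for para_level >= 800."""
    if para_level <= 2250:
        # Linear segment from (800, 23) to (2250, 200).
        return 23 + (para_level - 800) / (2250 - 800) * (200 - 23)
    bonus = para_level - 2250
    return 200 + 0.229602 * bonus + 0.000051 * bonus ** 2


# The two segments meet at paragon 2250 ...
print(xp_level(2250))  # 200.0

# ... and past that point the step between consecutive levels keeps growing
# by 2 * 0.000051 = 0.000102 billion, i.e. the 102 thousand mentioned above.
step_a = xp_level(3001) - xp_level(3000)
step_b = xp_level(3002) - xp_level(3001)
print(round((step_b - step_a) * 1e6))  # 102 (thousand XP)
```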

XP/hour versus paragon level

Obviously, we have to take into account the effects of higher paragon levels allowing you to farm more quickly. So first of all, we need to know how much XP/hour we would expect at each paragon level. To do this, let’s assume we continue farming on the same GR level, but clear the rift more quickly. (This is more or less equivalent to farming at a higher GR level but more slowly, close enough for our purposes)

xphPara(paraLevel) = baseXph * damagePara(paraLevel)

Time needed per paragon level

Here’s an interesting aside that will be useful: How many minutes does a single paragon level take?

hoursPerPara(paraLevel) = xpLevel(paraLevel) / xphPara(paraLevel)
minPerPara(paraLevel) = hoursPerPara(paraLevel) * 60

Paragon level per hour of playtime

To know, therefore, how many minutes/hours of farming time we need to reach a certain total paragon level, we can accumulate the previous curve over time:

paraHours = go 800 0 where
  go level hours
    | level > 10000 = []
    | otherwise     = (hours, level) : go (level+1) (hours + hoursPerPara(level))

To reach paragon 10,000, one has to play for about 30k hours ≈ 3-4 ingame years.

GR level per hour of playtime

Finally, since this is the result I was ultimately interested in, the GR level this translates to, as a function of the time spent grinding:

riftHours = [ (hours, riftLevelPara(paraLevel)) | (hours, paraLevel) <- paraHours ]
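
Putting the pieces together, the whole time-axis model fits in a short, self-contained Python sketch. It restates the definitions above with snake_case names (nothing else is new) and accumulates the hours in the same way as paraHours.

```python
import math

BASE_DEX, BASE_XPH, BASE_RIFT_LEVEL = 15000, 180, 90


def damage_para(para_level):
    dex = BASE_DEX + 5 * (para_level - 800)
    return (1 + dex / 100) / (1 + BASE_DEX / 100)


def rift_level_para(para_level):
    return BASE_RIFT_LEVEL + math.log(damage_para(para_level), 1.17)


def xp_level(para_level):
    if para_level <= 2250:
        return 23 + (para_level - 800) / (2250 - 800) * (200 - 23)
    bonus = para_level - 2250
    return 200 + 0.229602 * bonus + 0.000051 * bonus ** 2


def hours_per_para(para_level):
    # XP needed for this level divided by XP/hour at this level.
    return xp_level(para_level) / (BASE_XPH * damage_para(para_level))


def para_hours(start=800, end=10000):
    """List of (cumulative hours, paragon level), like paraHours above."""
    result, hours = [], 0.0
    for level in range(start, end + 1):
        result.append((hours, level))
        hours += hours_per_para(level)
    return result


def rift_hours():
    """List of (cumulative hours, GR level), like riftHours above."""
    return [(hours, rift_level_para(level)) for hours, level in para_hours()]


total_hours, final_level = para_hours()[-1]
print(f"paragon {final_level} after about {total_hours:,.0f} hours")
```

Running it should land close to the roughly 30k hours quoted above; tweaking BASE_XPH or the starting gear is an easy way to see how sensitive that figure is to the baseline assumptions.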

Summary

In summary, how much benefit you get out of the paragon system slows down over time, culminating in the point where you need to invest exponentially increasing amounts of gametime to reach the next GR level.

Might be slightly skewed towards the upper end due to the effects of decreasing incoming damage, but I don’t have a good model for that.

If there’s something I’m unsure about, it’s how your XP/hour increases as a function of your paragon level - it seems like 700b XP/hr might be over-estimating things at the high end. Nonetheless, based on figures I’m seeing from paragon ~4000 players, it seems to match the curve so far.

Falsehoods programmers believe about [video stuff]

2016-12-25T00:00:00Z

by Niklas Haas on December 25, 2016

Tagged as: mpv, video.

Inspired by numerous other such lists of falsehoods. Pretty much every video player in existence gets a good chunk if not the vast majority of these wrong. (Some of these also/mostly apply to users, though)

Falsehoods programmers believe about..

.. video decoding

decoding is bit-exact, so the decoder used does not affect the quality
since H.264 decoding is bit-exact, the decoder used does not affect the quality¹
hardware decoding means I don’t have to worry about performance
hardware decoding is always faster than software decoding
a H.264 hardware decoder can decode all H.264 files
a H.264 software decoder can decode all H.264 files
video decoding is easily parallelizable

.. video playback

the display’s refresh rate will be an integer multiple of the video file’s frame rate
the display’s clock will be in sync with the audio clock
I can accurately measure the display’s clock
I can accurately measure the audio clock
I can exclusively use the audio clock for timing
I can exclusively use the video clock for timing
my hardware contexts will survive the user’s coffee break
my hardware contexts will never disappear in the middle of playback
I can always request a new hardware context after my previous one disappeared
it’s okay to error and quit if I can’t request a hardware context
hardware decoding and video playback will happen on the same device
transferring frames from one device to another is easy
the user will not notice 3:2 pulldown
the user will not notice the odd dropped or duplicated frame
all video frames will be unique
all video frames will be decoded in order
all video sources can be seeked in
the user will never want to seek to non-keyframes
seeking to a position will produce the same output as decoding to a position
I can seek to a specific frame number
videos have a fixed frame rate
all frame timestamps are precise
all frame timestamps are precise in modern formats like .mkv
all frame timestamps are monotonically increasing
all frame timestamps are monotonically increasing as long as you don’t seek
all frame timestamps are unique
the duration of the final video frame is always known
users will not notice if I skip the final video frame
users will never want to play videos in reverse
users will not notice if I skip a video frame when pausing

.. video/image files

all video files have 8-bit per channel color
all video files have 8-bit or 10-bit per channel color
fine, but at least all channels are going to have the same number of bits
all samples are going to fit into a 32-bit integer
every pixel consists of three samples
every pixel consists of three or four samples
fine, every pixel consists of n samples
all image files are sRGB
all video files are BT.601 or BT.709
all image files are either sRGB or contain an ICC profile
4:2:0 is the only way to subsample images
all image files contain correct tags indicating their color space
interlaced video files no longer exist
I can detect whether a file is interlaced or not
the chroma location is the same for every YCbCr file
all HD videos are BT.709
video files will have the same refresh rate throughout the stream
video files will have the same resolution throughout the stream
video files will have the same color space throughout the stream
video files will have the same pixel format throughout the stream
fine, videos will have the same video codec throughout the stream
the video and audio tracks will start at the same time
the video and audio tracks will both be present throughout the stream
I can start playing an audio file at the first decoded sample, and stop playing it at the last
virtual timelines can be implemented on the demuxer level
adjacent frames will have similar durations
all multimedia formats have easily identifiable headers
a file will never be a legal JPEG and MP3 at the same time
applying heuristics to guess the right filetype is easy

.. image scaling

the GPU’s built-in bilinear scaling is sufficient for everybody
bicubic scaling is sufficient for everybody
the image can just be scaled in its native color space
I should linearize before scaling
I shouldn’t linearize before scaling
upscaling is the same as downscaling
the quality of scaling algorithms can be objectively measured
the slower a scaling algorithm is to compute, the better it will be
upscaling algorithms can invent information that doesn’t exist in the image
my scaling ratio is going to be the same in the x axis and the y axis
chroma upscaling isn’t as important as luma upscaling
chroma and luma can/should be scaled separately
I can ignore sub-pixel offsets when scaling and aligning planes
I should always take sub-pixel offsets into account when scaling
images contain no information above the Nyquist frequency
images contain no information outside the TV signal range

.. color spaces

all colors are specified in (R,G,B) triples
all colors are specified in RGB or CMYK
fine, all colors are specified in RGB, CMYK, HSV, HSL, YCbCr or XYZ
there is only one RGB color space
there is only one YCbCr color space for each RGB color space
fine, there is only one YCbCr color space for each RGB color space up to linear isomorphism
an RGB triple unambiguously specifies a color
an RGB triple + primaries unambiguously specifies a color
fine, a CIE XYZ triple unambiguously specifies a color
black is RGB (0,0,0), and white is RGB (255,255,255)
all color spaces have the same white point
color spaces are defined by the RGB primaries and white point
my users are not going to notice the difference between BT.601 and BT.709
there’s only one BT.601 color space
TV range YCbCr is the same thing as TV range RGB
full-range YCbCr doesn’t exist
standards bodies can agree on what full-range YCbCr means
b-bit full range means the interval [0, 2^b-1]
a full range 8-bit color value of 255 maps to the float 1.0
color spaces are two-dimensional
“linear light” means “linear light”
information outside of the interval [0,1] should always be discarded/clamped
all gamma curves are well defined outside of the interval [0,1]
HDR encoding is about making the image brighter
HDR encoding means darker blacks

.. color conversion

I don’t need to convert an image’s colors before displaying it on the screen
all color spaces are just linearly related
there’s only one way to convert between color spaces
I can just clip out-of-gamut colors after conversion
there’s only one way to pull 10-bit colors up to 16-bit precision
linearization happens after RGB conversion
I can freely convert between color spaces as long as I allow out-of-gamut colors
converting between color spaces is a mathematical process so it doesn’t depend on the display
converting from A to B is just the inverse of converting from B to A
the OOTF is conceptually part of the OETF
the OOTF is conceptually part of the EOTF
all OOTFs are reversible
all CMMs implement color conversion correctly
all professional CMMs implement color conversion correctly
I don’t need to dither after converting if the target colorspace is the same bit depth or higher
converting between bit depths is just a logical shift
converting between bit depths is just a multiplication
all ICC profiles contain tables for conversion in both directions
HDR tone-mapping is well-defined
HDR tone-mapping is well-defined if you know the source and target display capabilities
HDR metadata will always match the video stream
you can easily convert between PQ and HLG
you can easily convert between PQ and HLG if you know the mastering display’s metadata
converting from A to linear light to B gives you the same result as converting from A to B
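
As one possible illustration of the two bit-depth entries above (not necessarily the counterexample the list has in mind), this Python sketch compares a few ways of pulling 8-bit values up to 10 bits. The function names are made up; the point is only that the candidates disagree.

```python
def shift_up(v8):
    """8-bit -> 10-bit by a plain left shift."""
    return v8 << 2


def multiply_up(v8):
    """8-bit -> 10-bit by multiplying with 1024/256 (identical to the shift)."""
    return v8 * 4


def rescale_up(v8):
    """8-bit -> 10-bit so that full-range white (255) lands on 1023."""
    return round(v8 * 1023 / 255)


def replicate_up(v8):
    """Bit replication: refill the new low bits from the top bits."""
    return (v8 << 2) | (v8 >> 6)


for f in (shift_up, multiply_up, rescale_up, replicate_up):
    print(f.__name__, f(0), f(128), f(255))
```

For full-range content, the plain shift maps white (255) to 1020 rather than 1023, so the image ends up slightly too dark. For TV-range content, on the other hand, the shift matches how the 10-bit levels are defined (16 becomes 64 and 235 becomes 940), so which conversion is right depends on what the samples mean.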

.. video output

the graphics API will dither my output for me
there’s only one way to dither output
I need to dither to whatever my backbuffer precision is
dithering with random noise looks good
dithering artifacts are not visible at 6-bit precision
dithering artifacts are not visible at 7-bit precision
dithering artifacts are not visible at 8-bit precision
temporal dithering is better than static dithering
OpenGL is well-supported on all operating systems
OpenGL is well-supported on any operating system
waiting until the next vsync is easy in OpenGL
video drivers correctly implement the texture formats they advertise
I can accurately measure vsync timings
vsync timings are consistent for a fixed refresh rate
all displays with the same rate will vsync at the same time
I can control the window size and position

.. displays

all displays are 60 Hz
all refresh rates are integers
all displays have a fixed refresh rate
all displays are sRGB
all displays are approximately sRGB
displays have an infinite contrast
all displays have a contrast of around 1000:1
all displays have a white point of D65
all displays have square pixels
all displays use 8-bit per channel color
all displays are PC displays
my users will provide an ICC profile for their display
my users will only use a single display
my users will only use a single display for the duration of a video
all ICC profiles for displays will have the same rendering intent
all ICC profiles for displays will be black-scaled
all ICC profiles for displays won’t be black-scaled

.. subtitles

all subtitle files are UTF-8 encoded
all subtitles are stored/rendered as RGB
I can paint RGB subtitles on top of my RGB video files
I don’t need to worry about color management for subtitles
the subtitle color space will be the same as the video color space
rendering subtitles at the output resolution is always better than rendering them at the video resolution
there’s an ASS specification

¹ It seems a lot of people have misunderstood this one, so let me clarify what I mean: Of course, H.264 decoders (assuming no bugs) will output the same result, but the problem in practice is that you have no guarantee you’ll actually be able to access the decoder outputs unmodified, because APIs like DXVA/DXVA2, D3D11VA (through ANGLE), CrystalHD, VAAPI through GLX and VDPAU (unless you use a terrible interlaced-only hack) will further post-process the results before you can access them, either by converting to RGB, changing the subsampling or rounding 10-bit content down to 8-bit.

There are some APIs which are inherently safe, though, although it usually requires copying back to system RAM instead of exposing it as an on-GPU texture, so you incur extra round-trip bandwidth losses (bidirectional, instead of the one-directional cost you have to pay for swdec). The only exceptions I can think of right now are VAAPI EGL interop and CUDA.

FFmpeg HEVC decoding benchmarks

2016-11-08T00:00:00Z

by Niklas Haas on November 8, 2016

Tagged as: mpv, ffmpeg, benchmarks.

Since HEVC software decoding is still very much relevant (especially as hardware decoding chips are both scarce and limited), I decided to compile a few of the benchmark numbers I’ve gotten in the past into a set of graphs.

Performance boost from OpenHEVC intrinsics

These patches still made a very big difference on current git master. The test was done using ffmpeg version N-82299-g0a24587 and this patchset.

Interestingly enough, the intra pred SIMD basically made no difference at all, even making the result slightly slower, but the IDCT still helped a lot. Looking at the code, I can’t find an obvious explanation for this - the HEVC intra pred in FFmpeg is still very much C. Perhaps the compiler just does a good job of optimizing here, or perhaps the OpenHEVC intrinsics are just bad.¹

Either way, seems like it’s best to keep this patch off. I have adjusted my own FFmpeg patches accordingly.

Time to decode vs. number of threads

In case you’re crazy enough to buy a 16-core machine for video processing, you’re not going to get great results out of software decoding after the first few cores.

Tests were done using ffmpeg version N-82215-g3932ccc with both OpenHEVC intrinsics patches applied.
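
If you want to reproduce this kind of measurement yourself, a thread sweep could look roughly like the Python sketch below. This is not the script used for the graphs here, just one plausible way to do it: it times a full decode to ffmpeg’s null muxer for each thread count. The helper names and the sample file name are hypothetical.

```python
import subprocess
import time


def decode_command(path, threads):
    """ffmpeg invocation that decodes `path` and throws the frames away."""
    return ["ffmpeg", "-v", "error", "-threads", str(threads),
            "-i", path, "-f", "null", "-"]


def time_decode(path, threads):
    """Wall-clock seconds for one full decode with the given thread count."""
    start = time.perf_counter()
    subprocess.run(decode_command(path, threads), check=True)
    return time.perf_counter() - start


def sweep(path, thread_counts=(1, 2, 4, 8, 16)):
    """Decode time per thread count; results depend heavily on the sample."""
    return {n: time_decode(path, n) for n in thread_counts}


# Example use (sample_hevc.mkv is a placeholder for your own test file):
#   for threads, seconds in sweep("sample_hevc.mkv").items():
#       print(threads, round(seconds, 2))
```

Placing -threads before -i applies it to the decoder for that input, which is the part being measured here.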

¹ BBB from the #ffmpeg-devel IRC channel offers this explanation:

2016-12-31 16:17:29	@BBB	haasn: the reason intra pred simd doesn’t help is b/c most intra pred is dc, and dc is very trivial in c or simd
2016-12-31 16:17:39	@BBB	haasn: runtime of dc C vs. idct C is like 1:10 or so
2016-12-31 16:17:57	@BBB	haasn: directional intra pred is more complicated, but runs sparsely
2016-12-31 16:18:22	@BBB	(from my memory)
2016-12-31 16:52:10	haasn	BBB: I did a `perf` and the most time was spent in that hevc_cabac function
2016-12-31 16:52:11	haasn	or w/e
2016-12-31 16:55:49	@BBB	haasn: yeah that’s residual decoding, that is normal
2016-12-31 16:55:57	@BBB	haasn: unfortunately not really simd'able
2016-12-31 16:56:03	@BBB	I guess you can try simd’ing the bypass function
2016-12-31 16:56:06	@BBB	but that’s hard
2016-12-31 16:56:10	@BBB	and I Don’t care about hevc ;)


How to watch a live stream from the beginning in mpv

2016-11-04T00:00:00Z

by Niklas Haas on November 4, 2016

Tagged as: mpv, tips.

If you try watching a video recording on twitch etc. while it’s still ‘live’, mpv/youtube-dl will play the live video instead of starting from the beginning.

The fix is straightforward: By appending ?t=0m you can force it to start at the beginning:

mpv 'https://www.twitch.tv/example/v/12345?t=0m'

This is also useful for a second purpose: seeking. Normally, by trying to seek a live stream like this in mpv you will end up buffering and downloading forever. (I’m not exactly sure what’s going on since the mpv cache is so opaque, but I have to imagine it’s actually trying to download all the data you skipped past)

By changing it to e.g. ?t=20m you can seek to 20 minutes in the stream.

So apparently this is a blog now

2016-11-03T00:00:00Z

by Niklas Haas on November 3, 2016

Tagged as: meta, personal.

I just copied the template and code from this guy’s blog because it was the first thing in this list of Hakyll examples that looked reasonably nice (and I was too lazy to look at more).

Why?

Because I keep spending time looking up things I’ve already documented on IRC, usually in the form of “something broke, let me document how to fix it and hope I find it again in the future”.

Hopefully doing it here should make it somewhat easier for me to find my own fixes again.

Why Hakyll?

Why not?
