Improving libossaudio, and the future of OSS in NetBSD
There are two ways user applications can communicate with the kernel audio layer in NetBSD:
- audio(4) – the native API, based on the Sun API with a number of very useful NetBSD extensions
- ossaudio(3) – a translation layer providing approximate compatibility with OSSv4's ioctls, which are also supported in FreeBSD and Solaris, and were popular in the past on Linux.
Linux drifted away from OSS and towards ALSA due to licensing disagreements.
Because of this drift, we're seeing increasing problems with OSS adoption today, even though the licensing concerns are no longer relevant and other implementations of OSS have surpassed the original Linux OSSv3 implementation in feature set and usability.
So, in NetBSD, it's recommended to use the native API for new code and only rely on the OSS layer for compatibility with existing code.
I spent a while working on third-party software to improve support for native NetBSD audio. These included Firefox, SDL, PortAudio, ffmpeg (working with yhardy@), and more.
However, I've turned my attention to the OSS translation layer. Since a lot of older and less popular software still relies on it, I wanted to go over the OSSv4 specification and iron out surprising differences.
Audacity/PortAudio's OSS usage is strange
I should note that most of these fixes were to enable Audacity to work without patching. Audacity is interesting because it hits a lot of edge cases as far as OSS API usage is concerned. Once I fixed the most notable issues, I made sure Audacity also supported the native API. Writing the necessary PortAudio glue for Sun/NetBSD audio and implementing these fixes took approximately two days.
Incompatibility 1 – SNDCTL_DSP_SPEED
[Out of range sample rates are now handled properly by the OSS layer in NetBSD-current.]
The NetBSD 9 kernel supports sample rates up to 192kHz. Specify anything higher, and NetBSD's audio API returns an error code and keeps the sample rate at its original value, or the legacy default of 8000 Hz (not particularly useful with modern devices).
However, OSS applications expect setting the sample rate to always succeed. The specification states that the sample rate actually set may be an approximation and will not always be the exact requested value. So, if the requested value is out of range, NetBSD will now return as if the call succeeded, and set the sample rate to the currently configured hardware rate (usually some multiple of 48kHz).
During its startup process, Audacity requested an overly high sample rate of 384kHz. This is well above the maximum supported. I'm still not sure why it does this, because it later configures the audio device to standard CD rate, but it meant that Audacity couldn't properly start without our local patches.
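The fallback behaviour described above can be modelled in a few lines of C. This is only a sketch of the decision logic, not the actual libossaudio code: the function name `ex_resolve_rate`, the `EX_` constants, and the lower bound of 1000 Hz are assumptions made for illustration. Only the 192kHz upper limit comes from the text.

```c
/* Upper limit from the text: NetBSD 9 accepts sample rates up to 192 kHz. */
#define EX_MAX_RATE 192000u
/* Assumed lower bound, chosen for illustration only. */
#define EX_MIN_RATE 1000u

/*
 * Hypothetical helper: decide which rate an OSS-style "set speed"
 * request should end up with. An out-of-range request does not fail;
 * it resolves to the rate the hardware is currently configured for.
 */
unsigned int
ex_resolve_rate(unsigned int requested, unsigned int hw_rate)
{
	if (requested < EX_MIN_RATE || requested > EX_MAX_RATE)
		return hw_rate;	/* pretend success, keep the hardware rate */
	return requested;	/* in range: honour the request */
}
```

With a device running at 48kHz, a 384kHz request like Audacity's would resolve to 48000, while a later request for 44100 would pass through unchanged.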
Incompatibility 2 – SNDCTL_DSP_CHANNELS
[Out of range channel numbers are now handled properly by the OSS layer in NetBSD-current.]
This was a very simple fix, similar to that of SNDCTL_DSP_SPEED. The NetBSD
kernel supports between 1 and 12 channels for audio playback. Most commonly
1 is mono, 2 is stereo, and higher numbers are used with surround sound
systems. The limit of 12 comes from the USB audio device specification.
If an out of range number is specified, libossaudio will now set the channel count to the currently configured number in use at the hardware level.
However, we encounter a more difficult difference between OSS and NetBSD audio
when using the audio device in full duplex (recording from and playing back to
the same device simultaneously). If your mic is mono and your speakers aren't,
how do you set the channel counts to different numbers in OSS? You can't.
There is one ioctl for setting both the recording and playback channels.
In the native API, this is possible by setting info.record.channels and
info.play.channels separately. We should ensure that the number of recording
channels is always set to the same value as the number of playback channels.
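A minimal model of the two rules for channel counts (fall back to the hardware count when the request is out of range, and mirror the result onto both directions) might look like the following. The struct and function names are hypothetical and exist only for this sketch. The 1 to 12 range is the one given in the text, and the real translation layer works on the native audio info structure rather than on this toy type.

```c
/* Toy stand-in for the play/record channel fields of the native API. */
struct ex_channels {
	unsigned int play;
	unsigned int record;
};

/*
 * Hypothetical helper: resolve a single OSS-style channel request
 * into a playback and a recording channel count.
 */
struct ex_channels
ex_resolve_channels(unsigned int requested, unsigned int hw_channels)
{
	struct ex_channels out;
	unsigned int n = requested;

	/* Out of range (the text gives 1..12): use the hardware count. */
	if (n < 1 || n > 12)
		n = hw_channels;

	/* OSS has one ioctl for both directions, so mirror the value. */
	out.play = n;
	out.record = n;
	return out;
}
```

An application asking for 64 channels on stereo hardware would get 2 for both playback and recording under this model.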
Incompatibility 3 – SNDCTL_DSP_SETTRIGGER
[NetBSD-current now implements SNDCTL_DSP_SETTRIGGER.]
SNDCTL_DSP_SETTRIGGER is a somewhat more obscure part of the OSS API,
in that it's only really useful if you are using poll() or another
event notification mechanism on an audio device before performing
any I/O, or you're performing I/O via mmap(), neither being particularly
common in practice. It has the ability to force initialisation of
playback/recording for the device if this isn't already the case.
In terms of the native API, this means that playback/recording becomes unpaused.
Previously in NetBSD, this part of the OSS API wasn't implemented and simply did nothing. However, it became obviously needed due to an incompatible change in NetBSD 9, as discussed on tech-kern.
Basically, we needed recording to be properly triggered without
a read() so a few applications using poll() without prior I/O wouldn't
block indefinitely.
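To make the relationship between trigger bits and the native pause state concrete, here is a small model in C. It is a sketch under assumptions, not the libossaudio implementation: the `EX_` constants mirror the values the OSS headers conventionally use for PCM_ENABLE_INPUT and PCM_ENABLE_OUTPUT, the struct is a toy stand-in for the pause fields of the native API, and treating a cleared bit as "paused" is an assumption of this sketch.

```c
/* Assumed to match the conventional OSS trigger bit values. */
#define EX_PCM_ENABLE_INPUT	0x01
#define EX_PCM_ENABLE_OUTPUT	0x02

/* Toy stand-in for the native play/record pause flags. */
struct ex_pause_state {
	int play_pause;		/* 1 = playback paused */
	int record_pause;	/* 1 = recording paused */
};

/*
 * Hypothetical helper: translate OSS trigger bits into pause flags.
 * A set bit unpauses that direction; a cleared bit leaves it paused.
 */
struct ex_pause_state
ex_trigger_to_pause(int bits)
{
	struct ex_pause_state st;

	st.record_pause = (bits & EX_PCM_ENABLE_INPUT) ? 0 : 1;
	st.play_pause = (bits & EX_PCM_ENABLE_OUTPUT) ? 0 : 1;
	return st;
}
```

Under this model, an application that sets only the input bit before calling poll() gets recording unpaused, which is what lets poll() report readable data without a prior read().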
Incompatibility 4 – SNDCTL_DSP_SETPLAYVOL
OSSv4 has special bits to manipulate the volume of an individual stream in an application while doing all the maths for this inside the kernel.
We don't support this properly yet (but reasonably could), so code needs to be modified to do the volume manipulation in the application, or OSSv4 support needs to be disabled.
I've only found a couple of applications that try to use this feature (audacious, qmmp). Currently, they're configured to avoid using OSSv4 and layer the audio through SDL or qt-multimedia instead.
I've at least fixed SNDCTL_DSP_GETPLAYVOL to return conforming values.
NetBSD audio uses 0-255 for the gain of all channels. OSSv4 uses
a range of 0-100 and encodes two channels into an integer, which
is very odd in my opinion, and also limits surround sound support.
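The difference between the two volume representations can be sketched as a conversion. The function names below are hypothetical, the rounding choice is an assumption, and the packing (left channel in the low byte, right channel in the next byte) is the layout OSS conventionally uses for stereo levels. This is not the code used in libossaudio.

```c
/* Hypothetical helper: scale a native gain (0-255) to an OSS level (0-100). */
unsigned int
ex_gain_to_oss(unsigned int gain)
{
	if (gain > 255)
		gain = 255;			/* clamp to the native range */
	return (gain * 100 + 127) / 255;	/* round to nearest */
}

/* Hypothetical helper: pack left/right OSS levels into one integer. */
unsigned int
ex_pack_oss_volume(unsigned int left, unsigned int right)
{
	return (left & 0xff) | ((right & 0xff) << 8);
}

/* A single native gain applies to all channels, so both halves match. */
unsigned int
ex_native_to_oss_volume(unsigned int gain)
{
	unsigned int level = ex_gain_to_oss(gain);

	return ex_pack_oss_volume(level, level);
}
```

Because only two 8-bit fields fit this layout, there is no room for per-channel levels beyond left and right, which is the surround sound limitation mentioned above.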
The future of libossaudio in NetBSD?
Hopefully, after my changes, OSS compatibility is in much better shape when dealing with unusual parameters and uncommon API usage. The quality of the code has also improved – in the process of this work, maxv@ pointed me towards a related information leak in the Linux OSSv3 compatibility layer in the kernel, and I was able to deal with it properly after looking at the OSS specification and Linux headers. All the fixes should be pulled up to 9-stable.
However, I'd personally like to eventually reach a point where we no longer need libossaudio. I've been writing a lot of code towards this goal.
In many cases, the applications relying on it could be easily modified or told to use libao/SDL2/PortAudio/OpenAL/etc instead, which all have native NetBSD audio support.
OSS aside...
We probably need to start thinking about supporting 24-bit PCM in the kernel, since I've found a few audio players that can't handle making the samples 32-bit before writing them to the device. The Sun audio implementation in Solaris has supported this for a long time now.