penelope's hyperdiary
Stop listening to Taylor Swift and start listening to the yt-dl source code
Posted: 2020-11-03
Tags: youtube-dl, piracy
A couple of weeks ago, the RIAA forced GitHub to take down the code repository for youtube-dl, a Python program which downloads videos from YouTube (and many, many other sites) onto a user’s computer. Read all about it on Ars Technica.
What the RIAA wants to stop is people ripping their overpaid stars’ (Official) music videos and audio streams from YouTube. Admittedly, this act does rob the labels - ahem - musicians of YouTube advertising revenue. So their lawyers got together and hatched a genius plan… let’s stop anyone from downloading any youtube video ever again!! that’ll show those mean pirates!!!1
Uh, no? Fuck you?
This blunt act of corporate censorship is wrong for many reasons, not least because downloading a video off YouTube is not automatically a violation of anyone’s copyright. For one thing, there are lots of videos under a Creative Commons copyright licence permitting redistribution of the content, which implies that it’s okay to make an offline copy. The Massachusetts Institute of Technology, for example, has a project called MIT OpenCourseWare, in which lecture recordings from many of its classes are uploaded to YT for anyone to watch and learn from. I suspect this kind of commitment to openness and freedom just melts everyone’s brains at the RIAA. They live in a world where you must be losing if you are not wringing the maximum amount of money and control out of your creations, and their behaviour only looks more and more anachronistically bizarre as this world crumbles.
Code As Noise

Anyway, enough huffing and puffing.
Internet users have been taking action.
For example, you can now obtain the full source code for youtube-dl
from the images attached to this tweet by following a few commands.
This inspired me to do something similar with audio.
WARNING: TURN DOWN THE VOLUME BEFORE PLAYING THIS FILE. I’M SRS: youtube-dl.flac
If you download the file and install ffmpeg,
you can obtain the latest (2020.11.01) release of youtube-dl as follows,
assuming you have saved it somewhere as youtube-dl.flac.
First, run the command:
$ ffmpeg -i youtube-dl.flac -c pcm_s16le -f s16le youtube-dl-2020.11.01.1.tar.gz
This turns the FLAC-encoded audio file back into a compressed archive,
which is how the youtube-dl
project distributes the source code on
its website.
Now all you have to do is decompress the archive to get the contents:
$ tar -xf youtube-dl-2020.11.01.1.tar.gz
The code will now be in a folder called youtube-dl.
A pre-built executable (also called youtube-dl, wow) will be in that
folder, and you can run it in order to learn how to use it - for
morally acceptable purposes, of course:
$ cd youtube-dl
$ ./youtube-dl
Have fun.
A very basic form of steganography
Steganography refers to techniques which disguise some important
information in a context that does not suggest the presence of the
information.
The purpose of steganography is to smuggle the information past anyone who is
not expecting it to be there, but also to allow anyone who does expect something
to find it and extract it easily.
In this case, the FLAC file above appears to store sound.
You could rename it something like Analogue TV Static.flac to boost
this impression, and sever any obvious association with source code.
Nevertheless, the code lurks within, and is accessible to those
who know the commands.
How it works
Every file is a sequence of bytes.
The youtube-dl
source code, although spread across many files,
can be compressed into a single archive file.
Thus all the information making up the source code can be treated as
one stream of bytes.
Digital audio files are also sequences of bytes. Simplifying a bit, these bytes describe a succession of values, each within the range -32,768 to 32,767, which represent the movement of the speakers required to play the audio. There is a multitude of encodings and formats for storing these numbers; one of the simplest is to store the numbers one after another, with every successive 16-bit (two-byte) block representing one number. (In technical terms, I am talking about mono, 16-bit PCM.)
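To make that concrete, here is a minimal Python sketch (Python being the language youtube-dl itself is written in) that reads a handful of bytes as mono, 16-bit PCM samples. It assumes little-endian byte order, which is what the "le" in the s16le format name used by the ffmpeg commands in this post stands for. The function name decode_samples is made up for illustration.

```python
import struct


def decode_samples(raw: bytes) -> tuple:
    """Interpret raw bytes as signed 16-bit little-endian samples (mono PCM)."""
    if len(raw) % 2:
        raise ValueError("16-bit samples need an even number of bytes")
    count = len(raw) // 2
    # "<" means little-endian, "h" means one signed 16-bit integer.
    return struct.unpack("<%dh" % count, raw)


# Eight bytes become four samples, including both extremes of the range.
print(decode_samples(b"\x00\x00\xff\x7f\x00\x80\xff\xff"))
# (0, 32767, -32768, -1)
```

Every pair of bytes maps to exactly one number in the range -32,768 to 32,767, so any even-length byte string is a valid stretch of 16-bit audio, however unpleasant it sounds.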
Consequently, we can interpret the youtube-dl archive as 16-bit PCM,
as a series of stored 16-bit numbers,
et voilà! we have raw digital audio.
At this point, we haven’t changed the archive at all.
It is just being interpreted differently, as audio rather
than as compressed text and data.
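The "nothing has changed" claim can be checked with a small round trip. The sketch below is only an illustration under a few assumptions: it uses a tiny gzip-compressed payload as a stand-in for the real archive, and it pads odd-length data with one zero byte before treating it as samples. That padding step is my own addition to keep the sketch honest about 16-bit alignment; it is not something the ffmpeg commands in this post are known to do. The helper names are hypothetical.

```python
import gzip
import struct


def as_pcm(data: bytes) -> tuple:
    """View even-length bytes as signed 16-bit little-endian samples."""
    return struct.unpack("<%dh" % (len(data) // 2), data)


def as_bytes(samples) -> bytes:
    """Turn the samples back into the bytes they came from."""
    return struct.pack("<%dh" % len(samples), *samples)


# Stand-in for the real archive: some text, gzip-compressed in memory.
payload = b"print('hello from the archive')\n" * 50
archive = gzip.compress(payload)

# Pad to a whole number of samples, remembering the true length.
original_length = len(archive)
padded = archive + b"\x00" * (original_length % 2)

samples = as_pcm(padded)                          # "audio"
recovered = as_bytes(samples)[:original_length]   # back to an archive

print(recovered == archive)                  # byte-for-byte identical
print(gzip.decompress(recovered) == payload)  # and it still decompresses
```

Both checks print True: the samples are the same bytes under a different label.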
However, raw audio is not likely to be recognised as audio when
you try to open it in a browser or from a file explorer.
For a degree of extra steganographic security, it is a good idea
to re-encode the youtube-dl
archive in a lossless audio format,
such as FLAC.
The re-encoded version will not be identical to the original;
in fact, it will be a slightly bigger file.
Nevertheless, the contents of the original (i.e. the archive)
are exactly recoverable from any losslessly compressed
version.
That’s what lossless compression means: no information is lost.
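Python's standard library has no FLAC encoder, so the sketch below swaps in WAV, an uncompressed (and therefore also lossless) container, to show the same property: the payload goes in as audio frames and comes out byte-for-byte intact. This is a stand-in for the idea, not a reimplementation of what ffmpeg does with FLAC. The function names and the 44,100 Hz sample rate are assumptions chosen for illustration.

```python
import io
import wave


def wrap_as_wav(payload: bytes, sample_rate: int = 44100) -> bytes:
    """Store payload as the frames of a mono, 16-bit WAV file (in memory)."""
    if len(payload) % 2:
        raise ValueError("payload must be a whole number of 16-bit samples")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(payload)
    return buffer.getvalue()


def unwrap_wav(wav_bytes: bytes) -> bytes:
    """Read every frame back out of the WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.readframes(wav.getnframes())


secret = bytes(range(256)) * 4           # any even-length data will do
disguised = wrap_as_wav(secret)

print(disguised[:4])                     # the file now starts with a RIFF header
print(len(disguised) > len(secret))      # the container adds some overhead
print(unwrap_wav(disguised) == secret)   # but nothing was lost
```

As with the FLAC version, the disguised file is not identical to the original, yet the original is fully recoverable from it.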
The commands from earlier in this post recover an archive file from a FLAC file. If you want to disguise a file of your own as audio, it’s very simple:
$ ffmpeg -c pcm_s16le -f s16le -i super-secret.tar.gz -c flac nothing-to-see-here.flac
The input file does not have to be a .tar.gz archive, by the way,
but archives do allow you to wrap up as many files as you wish into
one bundle before they get the steganographic treatment.
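If you would rather build the bundle from a script than from the command line, Python's tarfile module can do the wrapping. This is a rough sketch with made-up file names and helper names (bundle, unbundle); it works entirely in memory, so writing the result to disk and feeding it to ffmpeg is left to you.

```python
import io
import tarfile


def bundle(files: dict) -> bytes:
    """Pack a {name: content} mapping into one gzip-compressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def unbundle(archive_bytes: bytes) -> dict:
    """Unpack the archive back into a {name: content} mapping."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


# Hypothetical files, purely for illustration.
files = {"notes.txt": b"meet at dawn\n", "data/blob.bin": bytes(range(256))}
packed = bundle(files)

print(packed[:2] == b"\x1f\x8b")   # gzip magic bytes: it is a real .tar.gz
print(unbundle(packed) == files)   # every file survives the round trip
```

The resulting bytes are one stream, ready to be treated as audio like any other file.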