Yt-dlp questions in relation to documentation and Whonix 18

Hello,

I am new to Whonix-Virtualbox 18 from Whonix-Virtualbox 17, and I am trying to learn the new system.

One of my main use cases for Whonix-Virtualbox is to watch and download youtube videos privately. The documentation at http://www.dds6qkxpwdeubwucdiaord2xgbbeyds25rbsgr73tbfpqpt4a6vjwsyd.onion/wiki/Yt-dlp recommends the following procedure.

sudo apt -t trixie-backports install yt-dlp

Instead of doing that, in Whonix-Virtualbox 17 I used python3-pip. This was because even the backport could take a few days to update, and YouTube could break things quickly due to their changes. Pip allowed me to get a more up-to-date version quicker. But in Whonix-Virtualbox 18, I am open to trying the trixie-backports method again.

But I am a little confused about how I should proceed from here, since the documentation does not address recent changes to yt-dlp. Now, in order to use this program with youtube, “you’ll need an external JS engine such as NodeJS for YouTube” (from the Debian page "Details of package yt-dlp in trixie-backports"). This brings up several questions. First, is it safe to do this, or will the javascript challenges fingerprint me somehow?

Second, yt-dlp recommends that I use deno for the challenges (EJS · yt-dlp/yt-dlp Wiki · GitHub). But deno is not available from the debian trixie repository. Should I try using nodejs instead, or should I follow Deno's own installation instructions to download deno?

https://docs.deno.com/runtime/getting_started/installation/

Third, I am looking at the Debian page "Details of package yt-dlp in trixie-backports", and I do not see any mention of yt-dlp-ejs being bundled in with the debian backport for yt-dlp. How do I see if it has been bundled in or not? By contrast, pip bundles yt-dlp-ejs in through the package yt-dlp[default].
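
One possible way to check, as a rough sketch only: list the files that the installed Debian package ships and look for anything EJS-related. The function names here are made up for illustration, and the assumption that bundled EJS files would have "ejs" somewhere in their path is a guess, not something the Debian package page confirms.

```shell
#!/bin/sh
# Sketch: decide whether a file listing (one path per line on stdin)
# contains anything that looks like the yt-dlp-ejs component.
# Assumption: bundled EJS files have "ejs" somewhere in their path.
has_ejs_files() {
    grep -qi 'ejs'
}

# Hypothetical wrapper: inspect the installed Debian package.
# "dpkg -s" checks that the package is installed,
# "dpkg -L" lists the files it installed.
check_debian_ytdlp_ejs() {
    if ! dpkg -s yt-dlp >/dev/null 2>&1; then
        echo "yt-dlp is not installed as a Debian package"
        return 1
    fi
    if dpkg -L yt-dlp | has_ejs_files; then
        echo "the yt-dlp package ships EJS-looking files"
    else
        echo "no EJS-looking files found in the yt-dlp package"
    fi
}

# Example (not run automatically):
#   check_debian_ytdlp_ejs
```

Running "apt-cache depends yt-dlp" may also help, since a separately packaged component would show up there as a dependency or recommendation rather than inside the file list.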

Fourth, if I use pip, against the advice of http://www.w5j6stm77zs6652pgsij4awcjeel3eco7kvipheu6mtr623eyyehj4yd.onion/wiki/Install_Software#Best_Practices, should I install and upgrade packages from the sysmaint session or the user session?

Fifth, I notice that systemcheck flags python3-pip as an unwanted package. But systemcheck does not flag pipx as an unwanted package. Is there a reason for this?


Valid questions. Something that started out simple and was easy to document and use has just turned into something seemingly very complex and time-consuming.

Therefore this can no longer be supported. Hence, unsupported.

Wiki page updated to add this notice just now.

Related search terms:

nodejs dependency risk

python pip dependency risk

pipx: I haven’t looked into it, but if it internally uses or is similar to pip, then it could have the same issues.

Would require research to answer that.

Since NodeJS runs locally, one of the worst risks coming to mind is:


Thank you for your help.

Yes, pipx is essentially pip, except it also automates the creation of python virtual environments.

The pipx project combines the functionality of both venv and pip. It may be necessary to install it first, either with a system package manager, or using pip, as detailed in the documentation.

From Installation - Streamlink 8.1.2 documentation

So it makes sense to list pipx as an undesired package.


yt-dlp recommends using deno instead of nodejs. According to EJS · yt-dlp/yt-dlp Wiki · GitHub

Code is run with restricted permissions (e.g, no file system or network access)

According to https://deno.com/

Deno is the open-source JavaScript runtime for the modern web.

A program run with Deno has no file, network, or environment access unless explicitly enabled.

So it is sandboxed, at least. I would guess that this would make it harder to fingerprint a VM.

However, debian does not package deno. Therefore, if you want to use it, you have to install it via Deno's own installation instructions instead.

curl -fsSL https://deno.land/install.sh | sh

FWIW, it’s not necessary to install Deno all the way into the system to use it. One can simply download the zipped binary, extract it, and point yt-dlp to it.
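
A minimal sketch of that approach, under a few assumptions: the zip name below is assumed to match the x86_64 Linux asset on Deno's GitHub releases page, the target directory is a made-up location, and the installed yt-dlp is assumed to be recent enough to accept the --js-runtimes option with a path.

```shell
#!/bin/sh
# Sketch: use a standalone Deno binary without installing it system-wide.
# DENO_DIR is a hypothetical location; DENO_ZIP_URL is an assumed asset URL.
DENO_DIR="${DENO_DIR:-$HOME/deno-standalone}"
DENO_ZIP_URL="https://github.com/denoland/deno/releases/latest/download/deno-x86_64-unknown-linux-gnu.zip"

# Download the zip and unpack it into DENO_DIR.
fetch_deno() {
    mkdir -p "$DENO_DIR" &&
    curl -fsSL -o "$DENO_DIR/deno.zip" "$DENO_ZIP_URL" &&
    unzip -o -q "$DENO_DIR/deno.zip" -d "$DENO_DIR"
}

# Print the value to hand to yt-dlp's --js-runtimes option.
deno_runtime_arg() {
    printf 'deno:%s\n' "$DENO_DIR/deno"
}

# Example (not run automatically):
#   fetch_deno
#   yt-dlp --js-runtimes "$(deno_runtime_arg)" "URL"
```

Nothing is written outside DENO_DIR, so removing that directory undoes the whole thing.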


I have been noticing that even when using deno, it is still significantly harder to download videos with yt-dlp. Usually, I will get one of two errors.

ERROR: [youtube]: Sign in to confirm you’re not a bot. Use --cookies-from-browser or --cookies for the authentication. See  https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp  for how to manually pass cookies. Also see  https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies  for tips on effectively exporting YouTube cookie
ERROR: unable to download video data: HTTP Error 403: Forbidden

This is despite me manually using “tor-ctrl signal NEWNYM” to change my circuit repeatedly until I can get an exit node. Using “tor-ctrl signal NEWNYM” has always been necessary, but now it takes much longer to get an exit node which Google allows downloads from.

Sign in to confirm you’re not a bot.

But signing into youtube would be bad for privacy. Even if the account was anonymously created, opening up the browser to get cookies to pass to yt-dlp would subject the user to browser fingerprinting via javascript. I do not even know if you can make a google account without giving a phone number nowadays.

Suggestions are welcome. As things are, people might be better served by taking their chances with a VPN.


A forked repository of Cobalt committed last week using ytdlp-nodejs instead of youtubei.js to circumvent YouTube’s various restrictions:

To avoid VM fingerprinting, you can either wait for this commit to potentially be pulled into the main instance, or you can externally self-host your own Cobalt instance (without a static web front-end) as a processing server using this forked repository, then refer to your custom processing server from the main instance’s settings.

I am confused about what exactly cobalt tools is. Is this like an invidious instance?

Regardless, it does not work for me with javascript disabled. And I disable javascript.

I do not know how self-hosting would help. Or how that would be any different than using yt-dlp within whonix.

Someone informed me about https://invidious.nerdvpn.de/. This particular instance sometimes lets me through their “Gandalf” challenge even though I am on Tor with javascript disabled. And they allow users to download videos as of the time of writing.

However, the only combined video+audio file that they offer is 360p. If you want higher quality, you need to download a video-only stream and an audio-only stream. Then you would have to combine them manually through the use of ffmpeg. I am not a developer, and I do not know how to do this yet. I would need to research it if I made this my primary way of getting videos.
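
For reference, combining the two downloads is usually a single ffmpeg call. This is a rough sketch with placeholder file names and a made-up function name; it copies the streams as they are rather than re-encoding them.

```shell
#!/bin/sh
# Sketch: merge a video-only file and an audio-only file without re-encoding.
# Usage: mux_streams VIDEO_FILE AUDIO_FILE OUTPUT_FILE
mux_streams() {
    video="$1"
    audio="$2"
    out="$3"
    # -map 0:v:0  takes the first video stream of the first input
    # -map 1:a:0  takes the first audio stream of the second input
    # -c copy     copies both streams as-is instead of re-encoding
    ffmpeg -i "$video" -i "$audio" -map 0:v:0 -map 1:a:0 -c copy "$out"
}

# Example (not run automatically):
#   mux_streams video.mp4 audio.m4a combined.mkv
```

If ffmpeg complains that a codec does not fit the output container, an output name ending in .mkv tends to be more forgiving than .mp4.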

With the help of an AI bullshit generator, I was able to put together a bash script to automatically cycle through “tor-ctrl signal NEWNYM” and yt-dlp until all videos have concluded downloading. This can take hours, but it is automated.
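
To illustrate the general idea (this is not the script described above, only a minimal sketch of the same kind of loop): run a command, and whenever it fails, request a new circuit and try again. MAX_TRIES, WAIT_SECS and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of a retry loop: run a download command and, on failure,
# request a new Tor circuit before trying again.
MAX_TRIES="${MAX_TRIES:-20}"    # hypothetical upper bound on attempts
WAIT_SECS="${WAIT_SECS:-15}"    # hypothetical pause after each NEWNYM

# Ask Tor for a new circuit; a failure here is deliberately ignored.
new_circuit() {
    tor-ctrl signal NEWNYM >/dev/null 2>&1 || true
}

# retry_with_newnym CMD [ARGS...]
# Returns 0 as soon as CMD succeeds, 1 after MAX_TRIES failed attempts.
retry_with_newnym() {
    try=1
    while [ "$try" -le "$MAX_TRIES" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $try failed, requesting a new circuit" >&2
        new_circuit
        sleep "$WAIT_SECS"
        try=$((try + 1))
    done
    return 1
}

# Example (not run automatically):
#   retry_with_newnym yt-dlp --batch-file urls.txt --download-archive done.txt
```

With --download-archive, yt-dlp records finished videos in a file and skips them on later runs, so each new attempt only deals with what is still missing.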

I had just gotten it working, and then youtube decided to change their system. They are now using AI to auto-dub videos and combining them into one video+audio file instead of two separate video and audio streams. When I tried to use --extractor-args "youtube:lang=en" in my script, this did not seem to have any effect. And that led me to download videos with the wrong language.

Also, the new videos would use the m3u8 extraction method instead of https. My experience with this is that downloading with m3u8 instead of https can lead to janky videos sometimes, at least if there are problems or interruptions with the downloads. And my script from the bullshit generator does not restart the process if I have an error with downloading an m3u8 fragment. If I do not pass --fragment-retries infinite, yt-dlp will skip a fragment and move on to the next one, which is obviously not ideal. So I make it retry an infinite number of times, but since it is not changing the tor circuit, it can take a very long time.
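
One possible middle ground, sketched here under assumptions: cap the per-fragment retries and tell yt-dlp to abort instead of skipping, so that a stuck fragment fails the whole run and an outer loop can switch circuits. FRAG_TRIES, WAIT_SECS, YTDLP and the function names are hypothetical, and whether already downloaded fragments are reused on the next run is not verified here.

```shell
#!/bin/sh
# Sketch: fail fast on a stuck fragment, then change circuit and rerun.
YTDLP="${YTDLP:-yt-dlp}"        # hypothetical override for the yt-dlp command
FRAG_TRIES="${FRAG_TRIES:-5}"   # hypothetical cap on retries per fragment
WAIT_SECS="${WAIT_SECS:-15}"    # hypothetical pause after each NEWNYM

# One download attempt with bounded fragment retries.
download_once() {
    # --fragment-retries N              retry each fragment at most N times
    # --abort-on-unavailable-fragments  stop with an error instead of skipping
    "$YTDLP" --fragment-retries "$FRAG_TRIES" \
        --abort-on-unavailable-fragments "$@"
}

# Ask Tor for a new circuit; a failure here is deliberately ignored.
new_circuit() {
    tor-ctrl signal NEWNYM >/dev/null 2>&1 || true
}

# Repeat until one attempt succeeds. Note: this loops forever if no
# circuit ever works, so it is only a starting point.
download_until_done() {
    until download_once "$@"; do
        new_circuit
        sleep "$WAIT_SECS"
    done
}

# Example (not run automatically):
#   download_until_done "URL"
```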

Now though, youtube seems to be offering the old video/audio separated formats again, in addition to the AI translated formats, at least for now.

If people are interested, I will modify the script for general use and post it here. Right now though it assumes that the user is using pip, against the advice of the project. And it assumes that the user is using deno. Maybe I should modify it first to remove those assumptions?

I also created versions of the script to use for odysee and peertube respectively. I may have to rely more on those platforms in the future, as youtube creeps more and more towards requiring an id check or something to watch a video. I cannot do it for rumble, because rumble is cloudflared, and rumble requires me to pass cloudflare cookies into yt-dlp to download a video. I do not consider cloudflare to be trustworthy, because of readme/en.md · master · Crimeflare / deCloudflare · GitLab.