As soon as I run it in AWS I get an error from YouTube that says "Sign in to confirm you’re not a bot". I'm guessing YouTube has blocked AWS's IPs and is forcing this, but I've seen a number of services that offer some kind of YouTube import (e.g. Opus Clip [1], Clay.ai [2], Veed.io [3]), so my question is: how are they getting around this restriction? The only solution I can think of is downloading the file locally and then uploading it, but that seems unlikely, especially for Clay.ai.
It seems like the suggestions online are to use a proxy server, but I don't trust random online proxy servers for a production application.
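For what it's worth, the proxy suggestion doesn't have to mean a random public proxy. A minimal sketch of sending requests through a proxy you run yourself, using only Python's standard library, might look like this. The proxy address and the `build_proxied_opener` helper are made up for illustration, and this only shows the wiring; whether YouTube would accept traffic from that proxy's IP is a separate question.

```python
import urllib.request


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical address of a proxy you control (an assumption, not a real host).
PROXY_URL = "http://proxy.internal.example:3128"

opener = build_proxied_opener(PROXY_URL)
# opener.open("https://www.youtube.com/") would now go via PROXY_URL.
# No request is made here, so the snippet runs without network access.
```

Building a dedicated opener instead of calling `urllib.request.install_opener` keeps the proxy scoped to the code that needs it.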
[1] https://www.opus.pro/ [2] https://www.clay.com/claybooks/get-youtube-transcripts-in-seconds [3] https://www.veed.io/
It's a great tactic for web crawling, but if you ever get caught they'll block those netblocks and nobody will get away with it again. (Try editing Wikipedia from Tor and see what happens.) I've been on both sides of that one.