HACKER Q&A
📣 sillysaurusx

How to avoid AI Dungeon-style bills? I’d like to share something similar


Hiya,

We’ve trained a GPT-2 1.5B model on chess PGN notation. Surprisingly, it’s not bad after only a day of training: https://lichess.org/UMyang4z

(Or rather, it’s not bad up until the midgame, at which point it usually blunders. We think that’s because it’s effectively “playing blindfolded”: it’s trained solely on PGN notation, rather than on an encoding of the full board state at each move.)
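For concreteness, here’s roughly how a move gets generated from PGN text alone. This is only a sketch: it assumes a Hugging Face-style GPT-2 loading path and a hypothetical local checkpoint directory, which may not match our actual training/serving stack.

    # Sketch: query a PGN-only GPT-2 for the next move.
    # Assumes the fine-tuned weights load via Hugging Face's GPT-2 classes
    # (hypothetical path below) and that training data was raw movetext
    # like "1. e4 e5 2. Nf3 Nc6 ...".
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")               # standard GPT-2 BPE vocab
    model = GPT2LMHeadModel.from_pretrained("./chess-gpt2-1.5b")    # hypothetical checkpoint dir

    def next_move(pgn_so_far, max_new_tokens=8):
        # The model only ever sees the move list, never a board state,
        # hence the "playing blindfolded" behavior deep into games.
        inputs = tokenizer(pgn_so_far, return_tensors="pt")
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_k=40,
            pad_token_id=tokenizer.eos_token_id,
        )
        continuation = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
        # Return the first token that isn't a move number like "12."
        for tok in continuation.split():
            if not tok.rstrip(".").isdigit():
                return tok
        return continuation.strip()

    print(next_move("1. e4 e5 2. Nf3 Nc6 3. Bb5"))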

We’d love to release a Colab demo similar to AI Dungeon’s. But as with AI Dungeon, our model is big: the checkpoint is 5.6GB. Downloading it from a GCS bucket would cost about $0.056 per click, if I understand the outgoing-bandwidth pricing correctly.
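For scale, here’s the back-of-the-envelope math, using the numbers above plus the ~500k-user figure mentioned below. The per-GB rate is my assumption (it’s what $0.056 / 5.6GB implies, roughly a within-Google-Cloud cross-region egress rate); the real rate depends on where each Colab VM happens to land.

    # Back-of-the-envelope cost estimate using the figures in this post.
    model_gb = 5.6
    per_gb_rate = 0.01                 # assumed $/GB; it's what $0.056 per download implies
    per_click = model_gb * per_gb_rate
    print(per_click)                   # 0.056 -> ~$0.056 per download

    expected_users = 500_000           # the ~500k figure below
    print(per_click * expected_users)  # 28000.0 -> ~$28k if everyone pulls the full model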

Our options seem to be:

1. Download the model via BitTorrent from within a Colab notebook

2. Set up a server to power the demo, rather than distributing the model to every client

3. Find a host with low bandwidth fees, and write the notebook to download from that host

All three have tradeoffs, but #3 seems simplest. Does anyone know of a way to distribute 5.6GB to ~500k people for less than a few hundred dollars? BitTorrent might be fine if it can deliver the entire model in under a couple of minutes (otherwise people will get bored and leave); a rough sketch of what that could look like in Colab is below.
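Here’s roughly what option #1 could look like inside the notebook. Assumptions: aria2 is installable from apt on the Colab image, and the magnet link points at a well-seeded torrent of the checkpoint (the URI below is a placeholder; no such torrent exists yet).

    # Sketch of option 1: fetch the checkpoint over BitTorrent inside Colab.
    import subprocess

    MAGNET_URI = "magnet:?xt=urn:btih:..."   # placeholder, not a real infohash
    SAVE_DIR = "/content/model"

    subprocess.run(["apt-get", "-y", "install", "aria2"], check=True)
    subprocess.run(
        [
            "aria2c",
            MAGNET_URI,
            "--dir=" + SAVE_DIR,
            "--seed-time=0",           # stop seeding once the download finishes
            "--summary-interval=30",   # keep notebook output manageable
        ],
        check=True,
    )

Whether that clears the couple-of-minutes bar depends almost entirely on how many seeds are online when a user clicks, not on Colab’s own bandwidth.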


  👤 nickwalton00 Accepted Answer ✓
AI Dungeon creator here. Another option: if you can detect which region the Colab notebook is running in and keep a bucket in each major region, you can download from the matching one, which may be quite cheap. Our costs were primarily from US GCS buckets serving downloads to Colab servers that were apparently running in Asia and Europe.
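A rough sketch of the detection step: ask the GCE metadata server which zone the Colab VM is in, then map that to a bucket. This assumes the metadata endpoint is reachable from Colab (if it isn’t, you’d need an IP-geolocation fallback), and the bucket names are hypothetical.

    # Pick the nearest bucket by asking the GCE metadata server for the VM's zone.
    import requests

    ZONE_URL = "http://metadata.google.internal/computeMetadata/v1/instance/zone"
    zone = requests.get(ZONE_URL, headers={"Metadata-Flavor": "Google"}, timeout=2).text
    # zone looks like "projects/123456789/zones/us-central1-a"
    region = zone.rsplit("/", 1)[-1].rsplit("-", 1)[0]    # e.g. "us-central1"
    continent = region.split("-")[0]                      # "us", "europe", "asia", ...

    BUCKETS = {                                           # hypothetical bucket paths
        "us": "gs://chess-gpt2-us/model.tar",
        "europe": "gs://chess-gpt2-eu/model.tar",
        "asia": "gs://chess-gpt2-asia/model.tar",
    }
    source = BUCKETS.get(continent, BUCKETS["us"])
    print("Downloading from", source)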

👤 p1esk
Why is the model so large? I mean, how did you get from 1.5B parameters to 5.6GB? Have you looked into compressing it (quantization, pruning, etc)?
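FWIW, 5.6GB is about what 1.5B fp32 parameters take (~4 bytes each), so just casting the weights to fp16 would roughly halve the download before any real quantization or pruning. A minimal sketch, assuming a PyTorch state_dict checkpoint (the real one may well be TensorFlow, where the same idea applies):

    # Halve the checkpoint size by storing weights as fp16.
    import torch

    state = torch.load("chess-gpt2-1.5b.pt", map_location="cpu")    # hypothetical filename
    state_fp16 = {
        k: v.half() if isinstance(v, torch.Tensor) and v.is_floating_point() else v
        for k, v in state.items()
    }
    torch.save(state_fp16, "chess-gpt2-1.5b-fp16.pt")               # ~2.8GB instead of ~5.6GB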

How many playing sessions can a server with a single 2080 Ti support? Is it compute-bound or memory-bound? I'd plot num_sessions vs latency (time to compute a move) and estimate the costs for your target scale/performance.
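Something like this would produce that plot's raw numbers; generate_move() here is a hypothetical stand-in for whatever inference call the demo server actually exposes:

    # Measure per-move latency as the number of concurrent sessions grows.
    import time
    from concurrent.futures import ThreadPoolExecutor
    from statistics import mean

    def generate_move(pgn):
        # Stand-in for the real model call (e.g. an HTTP request to the demo
        # server); the sleep just keeps this sketch runnable as-is.
        time.sleep(0.05)
        return "Nf6"

    def timed_move(pgn):
        start = time.perf_counter()
        generate_move(pgn)
        return time.perf_counter() - start

    SAMPLE_PGN = "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6"

    for num_sessions in (1, 2, 4, 8, 16, 32):
        with ThreadPoolExecutor(max_workers=num_sessions) as pool:
            latencies = list(pool.map(timed_move, [SAMPLE_PGN] * num_sessions))
        print(f"{num_sessions:3d} sessions: mean move latency {mean(latencies):.3f}s")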