And if you can share a Docker Compose based set-up, please do (I like Docker Compose for its simplicity).
Synonyms of "toy" include: nano, micro, games, something that can be played with on an off-the-shelf laptop.
Python or JavaScript preferred.
I use devpod.sh and a PyTorch dev container I can spin up locally, with the intention of also spinning it up in the cloud to scale experiments up (but I haven't done much of that). Still, I can recommend devpods for a reproducible environment I don't feel worried about trashing!
If people are interested I can throw the git repo up now, but I have been planning on finding some time to clean it up and write up a really short digest of what I learned.
Above anything I can write though, I highly recommend Andrej Karpathy's YouTube channel - https://www.youtube.com/@AndrejKarpathy You can follow along in a Google Colab, so all you really need is a web browser. My project started as following along there and then grew when I wanted to train it to mimic my friends and me on some data I had of us chatting in Slack, which meant some architecture improvements, figuring out how to pre-train on a large corpus, etc.
Also wrote about it in my blog [1].
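To give a flavor of the data-wrangling side (a rough illustration only, not my actual script; the export format and field names here are made up), flattening a chat export into one big training text file can be as simple as:

    import json, glob

    # Rough illustration: flatten hypothetical chat-export JSON files
    # (one message per record, with made-up "user" and "text" fields)
    # into a single plain-text corpus for character-level training.
    lines = []
    for path in sorted(glob.glob("export/*.json")):
        with open(path) as f:
            for msg in json.load(f):
                user = msg.get("user", "unknown")
                text = msg.get("text", "").strip()
                if text:
                    lines.append(f"<{user}> {text}")

    with open("chat_corpus.txt", "w") as f:
        f.write("\n".join(lines))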
If you're interested in doing something similar, I wrote an extensive guide of how to build it here: https://github.com/maxvfischer/DIY-ai-art.
The guide doesn't include the ML part; that you will have to learn on your own and integrate into the project :)
We then navigate through the latent space, exploring it by e.g. interpolating between the mean vector for the number 7 and its positive and negative standard deviations. Decoding the latent vectors reveals interesting relations like "the closer you get to one stdev in the positive direction, the more the 7 looks like a 9" or "7 is entangled with skew".
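A rough sketch of what that exploration looks like in code (illustrative only; it assumes you already have a trained VAE with encode/decode methods and a batch of MNIST 7s in hand):

    import torch

    # Illustrative sketch: walk the latent space around the class mean for digit 7.
    # Assumes `vae.encode(x)` returns (mu, logvar), `vae.decode(z)` returns images,
    # and `sevens` is a batch of MNIST images of the digit 7.
    with torch.no_grad():
        mu, logvar = vae.encode(sevens.view(sevens.size(0), -1))
        mean_z = mu.mean(dim=0)   # mean latent vector for "7"
        std_z = mu.std(dim=0)     # per-dimension spread across the 7s

        # Step from -1 to +1 standard deviations around the mean and decode each point.
        for alpha in torch.linspace(-1.0, 1.0, steps=9):
            z = mean_z + alpha * std_z
            img = vae.decode(z.unsqueeze(0)).view(28, 28)
            # ...plot or save img; toward +1 stdev the 7 starts to look like a 9.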
I made a prototype of a Dr Mario-inspired game with a new mechanic (the blank half-pill) in Racket. I hand-coded an AI, and then I wanted to see if I could train a NN to predict what that AI would play in any given game-state. Obviously this allows me to generate training data much faster than playing the game manually.
I did get it kind-of working, as you can see in the two most recent commits on this branch: https://github.com/default-kramer/fission-flare/commits/ML/
I learned that the most important factor for me was the size of the training data, e.g. training on 50k games is way better than 10k games. As I recall, I got it to correctly predict the hand-coded AI's move 77% of the time, and when it didn't get it right it usually had a plausible alternate move. I was pretty surprised that, with a relatively underpowered laptop and a severely undersized data set, it was able to get that accurate. (I suppose it is easier to predict a deterministic algorithm's moves than a human player's moves.)
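The training side is nothing exotic; stripped down to a sketch (shapes and sizes here are invented, not the actual repo code), it's just ordinary supervised classification on (game-state, move) pairs logged from the hand-coded AI:

    import torch
    import torch.nn as nn

    # Sketch only (dimensions invented): predict the scripted AI's move
    # from a flattened encoding of the board state.
    STATE_DIM, NUM_MOVES = 8 * 16 * 4, 40   # hypothetical board encoding / move count

    model = nn.Sequential(
        nn.Linear(STATE_DIM, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, NUM_MOVES),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def train_epoch(states, moves, batch_size=512):
        # states: (N, STATE_DIM) float tensor; moves: (N,) long tensor of move indices
        for i in range(0, len(states), batch_size):
            x, y = states[i:i + batch_size], moves[i:i + batch_size]
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()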
After doing the ML stuff I decided, "okay, enough prototyping, time to turn this into a semi-polished game using Godot." Well it turns out Godot, although it is amazing, is significantly less fun than Racket. So I got this far before I got sick of it and moved on to a different project: https://blockcipherz.com/
I didn't know anything about RL, so I racked my brains and came up with an approach using CNNs, with the goal of creating a model that could play at least as well as I could. The project was three separate scripts: collect training data, train a model, and then run that model using Selenium. The model would be trained to predict an action (jump or don't) from images of the game (the state), and the training data was generated by running a script while I played the game that recorded the screen and logged which keys I was pressing at the time. The CNN was simple: alternating 2D convolutional and 2D max pooling layers, followed by two dense classification layers.
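In rough PyTorch terms (not my original code; the layer sizes here are illustrative), the network looked something like:

    import torch.nn as nn

    # Rough sketch of the architecture described above (sizes are illustrative):
    # alternating conv/max-pool blocks followed by two dense layers that
    # classify each frame into "jump" or "don't jump".
    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(64 * 10 * 10, 128), nn.ReLU(),   # assumes an 80x80 grayscale input
        nn.Linear(128, 2),                         # jump / don't jump
    )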
First, after a couple of hours, I realized my poor GTX 1070 laptop GPU would struggle with even 640x480 captures. I read some docs, did some input preprocessing, and after a week and a lot of Googling got things running.
However, the accuracy was terrible. It took a while, but I realized my data included the frames where I had lost the game. I started manually deleting the last ~1 second of images from each session and it worked! What a feeling!
Since the game speeds up over time, the model hit a limit quickly. I used OpenCV to literally write numbers onto the saved images to give the model some information about how far into the game each frame was, and it worked again!
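In case anyone wants to try the same trick, stamping a counter onto each frame is basically a one-liner with OpenCV (sketch; variable names made up):

    import cv2

    # Sketch: burn the elapsed-time counter into the top-left of each saved frame
    # so the CNN can "see" roughly how long the run has been going.
    def stamp_elapsed(frame, elapsed_seconds):
        return cv2.putText(frame, str(int(elapsed_seconds)), (10, 30),
                           cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)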
Then I ran into a new problem- the model made it to a part of the game where black and white are inverted (the palette shifts from "day" into "night") and consistently failed. I hadn't made it that far very often so there was too little "nighttime" data. So I learned about data augmentation without knowing the term for it; with a quick script, I copied all my training data with black/white reversed, and the model ended up besting my top score by a solid margin. Never realized I could have just swapped the image colors when running the model.
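The augmentation itself was trivial once I saw it; something along these lines (sketch):

    import cv2, glob

    # Sketch: duplicate every grayscale training frame with the palette inverted,
    # so the model sees "night mode" examples too.
    for path in glob.glob("frames/*.png"):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        cv2.imwrite(path.replace(".png", "_inverted.png"), 255 - img)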
It was the most fun I'd ever had with programming to that point and kicked off my passion for ML and AI. Ugly manual steps, poorly written code, using the wrong kind of model - but it was true creative problem solving and I loved it.
Then I wrote a NN from scratch in Python, trained it on some simple vectors, and even got it to train on some MNIST digits https://github.com/tbensky/NeuralNetworks/blob/main/ANN/ann.... (but it was slow).
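The core of a from-scratch net like that fits in surprisingly few lines. Here's the general shape of it (a simplified sketch, not the code in the repo):

    import numpy as np

    # Simplified sketch of a two-layer network with sigmoid activations,
    # trained by plain gradient descent on a tiny vector-to-vector task.
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    X = rng.random((4, 3))             # 4 toy input vectors
    Y = rng.random((4, 2))             # 4 toy target vectors
    W1 = rng.standard_normal((3, 8)) * 0.5
    W2 = rng.standard_normal((8, 2)) * 0.5

    for step in range(5000):
        h = sigmoid(X @ W1)            # hidden layer
        out = sigmoid(h @ W2)          # output layer
        err = out - Y                  # gradient of squared error w.r.t. output

        # Backprop through the sigmoids and weight matrices.
        d_out = err * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= 0.5 * h.T @ d_out
        W1 -= 0.5 * X.T @ d_h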
With that basic refresher done, I got into PyTorch and worked on training a PINN (physics-informed neural network): https://github.com/tbensky/PiNN_Projectile (neat video of it training: https://youtu.be/0wlHa1-M7kw).
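For anyone unfamiliar with the idea, the gist of a PINN here is that the network's own derivatives become part of the loss: you sample time points, differentiate the network twice with autograd, and penalize how far y'' is from -g, plus the initial conditions. A minimal sketch (not the repo code; constants invented):

    import torch
    import torch.nn as nn

    # Minimal PINN sketch: learn y(t) for a projectile by penalizing the
    # residual of y'' = -g plus initial conditions, with no trajectory data.
    g, y0, v0 = 9.81, 0.0, 20.0                      # gravity, initial height/velocity
    net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                        nn.Linear(32, 32), nn.Tanh(),
                        nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(5000):
        t = torch.rand(64, 1, requires_grad=True) * 4.0   # collocation points in [0, 4] s
        y = net(t)
        dy = torch.autograd.grad(y, t, torch.ones_like(y), create_graph=True)[0]
        d2y = torch.autograd.grad(dy, t, torch.ones_like(dy), create_graph=True)[0]
        physics = ((d2y + g) ** 2).mean()                 # enforce y'' = -g

        t0 = torch.zeros(1, 1, requires_grad=True)
        y_t0 = net(t0)
        dy_t0 = torch.autograd.grad(y_t0, t0, torch.ones_like(y_t0), create_graph=True)[0]
        ic = ((y_t0 - y0) ** 2 + (dy_t0 - v0) ** 2).mean()  # initial conditions

        opt.zero_grad()
        (physics + ic).backward()
        opt.step()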
Now I'm working on understanding kernels in CNNs (this is my question; I'm making good progress on answering it): https://ai.stackexchange.com/questions/46180/kernels-on-a-tr....
Having loads of fun!
I wanted to build first-hand intuition about all of the choices: hyperparameters, activation functions, network architectures, etc. So I've been rigorously exploring them by training and testing models on the MNIST dataset.
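The setup is nothing exotic; the kind of loop I mean looks roughly like this (a sketch, with the training/evaluation step elided):

    import itertools
    import torch.nn as nn

    # Rough sketch of the experiment loop: build a small MLP for each
    # (hidden width, activation, learning rate) combination and compare
    # test accuracy on MNIST with everything else held fixed.
    def build_mlp(hidden, activation):
        return nn.Sequential(nn.Flatten(),
                             nn.Linear(28 * 28, hidden), activation(),
                             nn.Linear(hidden, 10))

    grid = itertools.product([64, 256, 1024],              # hidden width
                             [nn.ReLU, nn.Tanh, nn.GELU],  # activation
                             [1e-2, 1e-3, 1e-4])           # learning rate
    for hidden, act, lr in grid:
        model = build_mlp(hidden, act)
        # ...train for a fixed number of epochs, evaluate,
        #    and log (hidden, act, lr, accuracy) for comparison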
Coming up soon: vision transformers, depth-of-architecture on CNNs, batch size investigations, and more.
Let me know if any of you have any suggestions of things to investigate next!
The final result is here: https://github.com/kevmo314/image-orientation-detection
The repo itself is probably not that insightful, because it was the actual steps that taught me a lot. Finding a problem that felt independently solvable, and then solving it with a neural net, is what helped me go from an academic knowledge of neural networks to being able to confidently implement them.
I can highly recommend finding a similar problem (one that isn't just "someone on the internet suggested I do this as an exercise") if that resonates with you, since going from not knowing whether it could be done to having a final working model is what taught me the most.
Also, if you're curious, the use case was that I needed a way to detect whether video coming from a GoPro was upside down.
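One nice property of that kind of problem is that labels can come almost for free: take frames you trust to be right-side up, rotate copies of them 180 degrees, and you have a balanced binary dataset. A sketch of that idea (not necessarily how the repo does it):

    import glob
    import cv2
    import numpy as np

    # Sketch: build a labeled dataset for "is this frame upside down?"
    # by pairing each known-upright frame with a 180-degree-rotated copy.
    xs, ys = [], []
    for path in glob.glob("frames/*.jpg"):
        img = cv2.resize(cv2.imread(path), (128, 128))
        xs.append(img)
        ys.append(0)                                   # upright
        xs.append(cv2.rotate(img, cv2.ROTATE_180))
        ys.append(1)                                   # upside down
    X, y = np.stack(xs), np.array(ys)
    # ...feed X, y to any small CNN binary classifier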
After reading many papers and catching up with the ideas, I'm doing some simple little things to learn Python and the intricacies of Torch.
The first thing I made was a tiny upscaler to go from 16x16 RGB to 32x32 RGB. Next was an autoencoder to turn a 32x32 RGB image into a small number of bytes (128 in testing) and back.
Next up is a combination of both to autoencode correction data to correct an upscale.
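For the autoencoder piece, the skeleton is pretty small. Roughly (a sketch with invented layer sizes; squeezing the 128-dimensional code down to actual bytes would be a separate quantization step):

    import torch.nn as nn

    # Sketch (layer sizes invented): compress a 32x32 RGB image to a 128-d code
    # and reconstruct it. Train with an MSE reconstruction loss.
    encoder = nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),    # 32x32 -> 16x16
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),   # 16x16 -> 8x8
        nn.Flatten(),
        nn.Linear(64 * 8 * 8, 128),
    )
    decoder = nn.Sequential(
        nn.Linear(128, 64 * 8 * 8), nn.ReLU(),
        nn.Unflatten(1, (64, 8, 8)),
        nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),   # 8x8 -> 16x16
        nn.ConvTranspose2d(32, 3, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(), # 16x16 -> 32x32
    )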
It's been well worth it for learning the programming interface to the ideas. Wrangling training data has also been valuable experience for something that isn't terribly complex but is easier once you've done it a few times.
Not sure how easily you could train neural networks on it though.
https://github.com/pickles976/LearningRobotics/tree/main/IK/...
Coded everything from scratch, first in Elixir, then rewrote some parts in C.
Did this recently.
Whisper was the first, but I can't say I'm happy with it because of too much complicated C++ https://github.com/Const-me/Whisper
For Mistral I decided to rely on C# as much as possible, and I like the result much better https://github.com/Const-me/Cgml/
https://github.com/antoineMoPa/tfjs-text-experiment/blob/mai...
Spoiler alert: they absolutely do not work in their vanilla form (every article that says they do is wrong). But it was a good learning experience.