What are you doing where it serves you well?
Regular Expressions. I hate doing regexps. It is excellent at them.
Wiki articles for an encyclopedia I'm developing. It is great at this, but occasionally I catch it hallucinating articles out of whole cloth: it has zero information about the subject, so it just goes from the article title and imagines what the article must be about.
I know where to use it right, and not only has my output doubled since I started using it, but I'm also enjoying coding more than ever, because it has gotten rid of the worst drudgery that used to make me switch from my IDE to my browser and bring up HN to avoid working.
1. They don't know what they are saying until they have said it.
2. Your inputs and its outputs together become the context for the next message.
3. LLMs are not suited to information retrieval the way databases and search engines are.
LLMs excel at reasoning and predicting subsequent text based on given context. Their strength lies in their ability to generate relevant and cohesive responses.
To optimize results, outline clear rules, strategies, or ideas for the LLM to follow. This helps the model craft, revise, or build upon the established context.
Starting with a precise query and introducing rules or constraints incrementally can help steer the model's output in the desired direction.
Avoid zero-shot queries as these can lead to the model generating unexpected or unrelated responses.
Be cautious when seeking pre-calculated or non-derived answers. Some instruction-tuned models may output incorrect solutions, because they were trained to respond to such queries even without proper context or information.
Also, and this is my biggest gripe (no fault of ours, of course): don't ask for pre-calculated or non-derived answers. I've seen some of the demonstration data that people are using to train instruction-tuned models, and the models are being taught to respond by making up answers to problems they shouldn't try to compute. Here's one such training example (btw, the output is wrong):
{ "instruction": "What would be the output of the following JavaScript snippet?", "input": "let area = 6 * 5;\nlet radius = area / 3.14;", "output": "The output of the JavaScript snippet is the radius, which is 1.91." },
https://github.com/sahil280114/codealpaca/commit/0d265112c70...
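For the record, the snippet's actual arithmetic, checked in Python (note the snippet also prints nothing on its own, since there's no console.log):

    # The training example's arithmetic, reproduced in Python:
    area = 6 * 5          # 30
    radius = area / 3.14  # not the 1.91 the "output" claims
    print(radius)         # ~9.55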
These models are good for ideation, scaffolding, and prototypes. It's currently clumsy to fully build an app with an LLM, but they are quite useful for certain tasks.
I have designed and implemented eight operators with 90-100% of the code auto-generated.
I use it to generate Mermaid diagrams implementing the first three layers of the C4 model, sometimes with some editing or guidance to modify them, and then have it generate the code.
I generate diagrams with high temperature, and code with low.
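A minimal sketch of that temperature split, assuming the OpenAI API via the 0.x-era Python client; the prompts and model name here are invented for illustration:

    import openai

    openai.api_key = "sk-..."  # your key

    def ask(prompt, temperature):
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            temperature=temperature,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Higher temperature for exploratory diagram generation...
    diagram = ask("Draw a C4 container diagram in Mermaid for a URL shortener.", 1.0)
    # ...lower temperature for the code derived from it.
    code = ask("Implement the API described by this diagram:\n" + diagram, 0.2)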
That’s my experience. I have a coworker using ChatGPT Plus (GPT-4), and they fail to get anything working. It’s not zero effort, but the way I think seems to align well with the way the model works.
I love having a partner to talk to about thoughts and design ideas while on walks using the ChatGPT app w/ transcription.
I have never felt more productive.
Its general world knowledge is also vastly impressive, though.
It can draw a 2D representation of a keyboard key:
_____________
| |
| Keycaps |
|___________|
| |
| Key Switch|
|___________|
| |
| Circuit |
| Board |
|___________|
| |
| Base |
|___________|
It can explain what happens at each layer when I press the key and, more impressively, also what happens if I spill cola over it:
_____________
| |
| Keycaps | --> Coca-Cola makes the keys sticky
|___________|
| |
| Key Switch| --> The switch could become sticky, preventing it from registering key presses properly
|___________|
| |
| Circuit | --> The liquid could cause a short circuit, damaging the keyboard
| Board |
|___________|
| |
| Base | --> Coca-Cola pools at the bottom, potentially leaking into the circuit board from below
|___________|
Developing symbolic reasoning further would likely be a much better use of researchers’ time, even if it takes longer. But the incentives in the short term just aren’t there, sadly.
It's saving me time (sometimes 2x speed up on certain, well-specified, tasks), and I enjoy using it. I wrote a blog post with some details on how it has helped me code the database: https://www.philipotoole.com/what-did-gpt-4-find-wrong-with-...
That said, the most recent release of GPT-4 seems a little more buggy[2].
2. For more complicated tasks it requires good prompting. Example: "Tell me three ways to fix this error, then pick the best way and implement it."
3. It's pretty good at writing microbenchmarks for C++. They always compile, but require some editing. I use the same prompting approach as (2) for generating microbenchmarks.
4. It's pretty useful for explaining things to me that I then validate later via Google. Example (I had previously tried and failed to Google the answer): "The default IEEE rounding mode is called 'round to nearest, ties to even'. However, all large floating point numbers are even. So how is it decided whether 3,000,003 (which is not representable in fp32) becomes 3,000,002 or 3,000,004?" (See the quick check below.)
5. It can explain assembly code. I dump plain objdump -S output into it.
The main limitation seems to be UI. chat.openai.com is horrible for editing large prompts. I wrote some scripts myself to support file-based history, command substitution, etc.
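The ties-to-even behavior in (4) is easy to see with numpy. (One caveat: fp32 actually represents every integer up to 2^24 exactly, so 2^24 + 1 = 16,777,217 makes the cleanest demo of a true tie.)

    import numpy as np

    # 16,777,217 sits exactly halfway between the two nearest fp32 values,
    # 16,777,216 and 16,777,218; ties-to-even picks the even significand.
    print(np.float32(16_777_217))  # 16777216.0
    print(np.float32(16_777_219))  # 16777220.0 (the other neighbor is odd)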
I'm now using the optimized Go code in production.
While there is some GPT4 in there, it's mostly ChatGPT and a small handful of LLaMA solutions.
That project is a contrived scenario and not realistic, but I wanted to experiment with _exactly_ what you are talking about.
Very often I could have done things a lot faster myself, but there is one aspect that was actually helpful, and I did not foresee it: when inspiration gets a bit low and you're not in the "zone", throwing something into an LLM will very often give me a push to keep at it, even if what's coming up is mostly grunt work.
The other day I threw together a script to show the commits in reverse order and filter out (most of) the human commits (glue) over at https://llemmings.com/
I probably could have spent more time framing the query to get better results.
For example, it saved me two hours: I told it to infer a Zod schema and then generate a GraphQL resolver from it. It followed the code conventions from the other files and generated it beautifully.
Every time there's boilerplate, to ChatGPT it goes.
The resulting code worked. Took an interaction or two to add usage info and tweak things, but it’s a neat little utility. Due to the libraries used, it was also simpler than I expected and would have been easy to write if I knew about those libs, so I also learned something.
It’s just one or two steps away from just saying “Go to this YouTube URL and extract the video between 3:20 and 3:27 into an animated GIF named ‘CatAndRaccoon.gif’” and having it write, debug, and execute the code.
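Even today, the script it writes for that kind of prompt is short, assuming the usual yt-dlp + ffmpeg pairing (the URL below is a placeholder):

    import subprocess

    URL = "https://www.youtube.com/watch?v=..."  # placeholder

    # Download the video, then cut 3:20-3:27 into an animated GIF.
    subprocess.run(["yt-dlp", "-f", "mp4", "-o", "video.mp4", URL], check=True)
    subprocess.run(["ffmpeg", "-ss", "00:03:20", "-t", "7",
                    "-i", "video.mp4", "CatAndRaccoon.gif"], check=True)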
Easily the best rubber ducky though. Copy pasting big blocks of my own code and asking about it gives new perspectives to problems I am stuck on. Huge time saver in that regard.
Computing how to dilute a concentrate into a fruit punch, how to create and render a template in golang, how to parse an int in golang, what makes a chord progression a "lydian" chord progression... it goes on and on.
So I simply dumped the code examples into ChatGPT and said "Given these examples that display text and boxes on a screen, can we write a simpler interface and straightforward small functions for the display code?"
And it was done.
This wasn't code I wanted to mess with, I really just wanted to build my application rather than spend any time messing with the code to interact with this proprietary display. It was fantastic!
Also found it's great for regex. I'm pretty good at writing these, but recently came across a pretty complex one in our codebase that wasn't commented. Pasted it into GPT-4 and asked it to explain it; it broke down each bit in detail, and in the end even generated an example string that would match it.
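For flavor, here's the sort of exchange that works well (the pattern below is a stand-in I made up, not the one from our codebase):

    import re

    # A stand-in for the kind of uncommented pattern you might paste in;
    # this one matches a semantic version string like "1.2.3-beta.4".
    pattern = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.]+))?$")

    # GPT-4 will break a pattern like this down piece by piece, and can
    # also produce an example string that matches it:
    m = pattern.match("1.2.3-beta.4")
    print(m.groups())  # ('1', '2', '3', 'beta.4')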
Coincidentally, SO seems to be worse. But then again, I only use it 10% of the time, so I might be wrong.
I only have a little coding background, and ChatGPT didn't make me an expert. But the collaborative Human+AI process did allow me to complete a project end-to-end, including figuring out where to host it and how to do that.
I found that it helped me with 6 "superpowers":
1. Choosing between options (e.g., AWS vs. GCP vs. Zapier)
2. Walking me through it (e.g., how to set up a Firestore database)
3. Text-to-code (including simple nuisance calculations and code-to-code changes)
4. Help me out! (i.e., fixing broken code based on error messages)
5. Teaching me (e.g., learning the difference between let, const, var, etc.)
6. Checking my code (e.g., it caught errors before I even ran the code)
Check out the post for more details if you'd like!
There's another post on building a website from scratch where I also tried Replit's Ghostwriter. Yes, I faced a lot of frustration in the process, but going from "I can try to struggle through this on my own" to "I actually have some help here that's always available and usually right" is amazing, IMO.
For example, a utility that uses the Bitbucket API to dump all of the environment variables configured for a pipeline.
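A sketch of roughly what such a utility looks like, assuming Bitbucket Cloud's 2.0 REST API (the pipelines_config/variables endpoint) and an app password; all names below are placeholders:

    import requests

    # Placeholders - substitute your own workspace, repo, and credentials.
    WORKSPACE, REPO = "my-workspace", "my-repo"
    AUTH = ("username", "app-password")

    url = f"https://api.bitbucket.org/2.0/repositories/{WORKSPACE}/{REPO}/pipelines_config/variables"
    while url:
        page = requests.get(url, auth=AUTH).json()
        for var in page.get("values", []):
            # Secured variables come back with the value omitted.
            print(var["key"], "=", var.get("value", "<secured>"))
        url = page.get("next")  # follow pagination, if present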
I had a bunch of PDF files with coded file names, like ABC-123-abc-999-001.pdf, where each section of the file name was meaningful. Inside the PDF were several form fields. I needed to insert records in a database for each of the 97 files. Easy, but tedious.
I prompted it with a description of the file name breakdown, where the text to grab was in the PDF, and then asked for a Python program to find the files (in subdirectories), extract the PDF text, and write a text file with the SQL Insert statements for all the files.
It took two or three minor iterations, but less time than it would have taken me to write from scratch because my Python is rusty. Regular expressions, PDF processing libs, file system traversal, and SQL generation, and it all compiled and worked from the start (the iterations were to tweak a few things I didn’t specify).
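A sketch of roughly what the generated program would look like; the library choice (pypdf) and the form-field, table, and column names here are my guesses for illustration, not the actual ones:

    import re
    from pathlib import Path
    from pypdf import PdfReader

    # Filename pattern like ABC-123-abc-999-001.pdf; what each segment
    # means is hypothetical - the real coding scheme isn't spelled out.
    NAME_RE = re.compile(r"^([A-Z]+)-(\d+)-([a-z]+)-(\d+)-(\d+)\.pdf$")

    statements = []
    for path in Path("pdfs").rglob("*.pdf"):
        m = NAME_RE.match(path.name)
        if not m:
            continue
        fields = PdfReader(path).get_fields() or {}
        # "applicant" is an invented form-field name for illustration.
        applicant = fields["applicant"].value if "applicant" in fields else ""
        statements.append(
            "INSERT INTO documents (code, batch, seq, applicant) "
            f"VALUES ('{m.group(1)}', {m.group(2)}, {m.group(5)}, '{applicant}');"
        )

    Path("inserts.sql").write_text("\n".join(statements))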
This is the kind of thing that I think is perfect for these tools. With the new tools that let the LLM compile and execute code, it will be cool (and potentially dangerous).
It’s basically a fuzzy inverted index for public docs and code. “What’s the normal way of doing X”-type queries work best, with quality falling off quickly as complexity increases. Stuff like “what is a ‘git log’ command that only shows commits containing a particular snippet in the diff, limited to merge commits on master”, for example.
For more complex tasks, a trick I’ve seen work well is “give me an outline for how to do large task X” followed by “let’s go through each step in the above outline. For each one, I’d like a description of how it should be solved, including example code. Let’s start with the first step.” But that trick is not totally reliable and has its own complexity limit.
* If you can write it from memory, go ahead and do so. Do not consult GPT-4
* If you know what to do but need to look a few things up - put your best effort into GPT-4. It will flesh it out
* If you're using a library that is new, you can copy paste the library examples into GPT-4 and then describe what you want to do. It will give a great starting point
A big client asked us to fill in around 200 questions in an Excel sheet regarding our company's security.
Then he asked for a cybersecurity standards document (a big thing, around 50 pages).
I took the previous Excel sheet, removed noise, and anonymized it. Then I passed it to GPT-3.5 to summarize, for each security category (access management, source code security, ...), the bullet points describing how we implement it.
To finish, I passed the bullet points for each category to GPT-4 to write a nice bullshit document that sounds more professional than me.
I have a set of tools which build prompts describing the environment and the output of vulnerability scans. The tool then requests a shell script to disable/fix/update the vulnerability. The script is submitted as a PR which has actions that run integration tests. Human intervention is sometimes needed, but the focus is on better engineering of prompts (and by proxy tooling).
Describing the environment relies heavily on my CMDB (Combodo’s iTop), so this is not a one-size-fits-all approach, and it's functioning entirely in my personal lab of ~100 servers. That said, ChatGPT has given me the best results compared to locally run LLMs.
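A stripped-down sketch of the prompt-building side of such a tool; all field names and the sample data are invented, and the real version pulls host facts from iTop:

    def build_prompt(host, finding):
        """Assemble a remediation request from CMDB facts and a scan finding."""
        return (
            f"Host: {host['hostname']} ({host['os']} {host['os_version']})\n"
            f"Finding: {finding['title']} ({finding['cve']})\n"
            f"Evidence: {finding['evidence']}\n\n"
            "Write an idempotent shell script that remediates this finding "
            "on the host described above. Output only the script."
        )

    # Invented example data, purely for illustration:
    prompt = build_prompt(
        {"hostname": "web01", "os": "Debian", "os_version": "11"},
        {"title": "Outdated package", "cve": "CVE-0000-0000",
         "evidence": "example evidence line from the scanner"},
    )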
Maybe because Lisps have very little syntax, I find I can just copy function/macro definitions in as context, and GPT-3.5 can rewrite the code into something I can use... And by testing in a REPL (in :dev) I can instantly see which code has hallucinations and which works perfectly.
Tbh I find it hallucinates mainly when dealing with mutable state (e.g. atoms) or with pipelines of complex maps transformed by multimethods. Just using small bits of data and normal pure functions makes GPT-3.5 work perfectly, presumably because it doesn't need to take into account code outside the 2048-4096 tokens it's thinking about right now.
I then used it to generate a more robust Python script processing these large XML files instead.
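A typical shape for the "more robust" version, streaming with ElementTree's iterparse instead of loading everything into memory (the tag names are placeholders, not from my actual files):

    import xml.etree.ElementTree as ET

    # Stream a large XML file without loading it all into memory.
    for event, elem in ET.iterparse("big.xml", events=("end",)):
        if elem.tag == "record":          # "record" is a placeholder tag
            print(elem.findtext("id"))    # placeholder per-record handling
            elem.clear()                  # release the subtree as we go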
I like using it to generate scaffolding and for debugging but I haven't had to touch a legacy C++ code base for a few years.
I’ve used this enough that I wrapped some cli glue around it and wrote https://github.com/radoshi/llm-code
I’ve used this mostly to write Python and bash, with some Makefiles and Dockerfiles thrown in.
GPT-4 is better, albeit slower, than 3.5-turbo. HTH!
The prompt asks for specific aspects from a podcast - people, dates, locations - and has it score them by relevance and count the occurrences. Now, GPT can't count for a damn, but the count is a useful proxy. The relevance score is pretty good.
There are existing services that can do this, but with GPT and the API I don't need to read a manual and I define exactly what format I want back.
This is what is exciting - GPT will format its responses to _my_ requirements, not the other way around.
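For illustration, a self-defined format might look like this (the field names and scoring scale below are invented, not my actual prompt):

    import json

    # Invented response shape standing in for the real prompt's format:
    reply = """{"entities": [
      {"text": "Grace Hopper", "type": "person", "relevance": 9, "occurrences": 4},
      {"text": "1952", "type": "date", "relevance": 6, "occurrences": 1}
    ]}"""

    for e in json.loads(reply)["entities"]:
        print(f'{e["type"]:>8}: {e["text"]} (relevance {e["relevance"]}, ~{e["occurrences"]}x)')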
Fair warning: the queries were not always perfect on the first try, but it was a lot easier than parsing replies to somewhat similar questions on Stack Overflow. Now I mostly write my own queries, but it really helped me get started.
I've had so much success that I built my own command line utility (https://github.com/0xmmo/codemancer) to use in the VSCode terminal and my side projects are now ~70% written by LLM.
Sometimes I ask it to write complete classes for me. My longest prompt was a full page long, and it wrote a county-city autocomplete from a database: the back end in Go, the front end in plain vanilla JS with a lightweight non-jQuery library (as I requested).
I've converted from an Excel wiz to Python, but making charts was always the bane of my existence in Python, until GPT (a sketch of the kind of boilerplate it hands back follows below). I personally use 3.5 more than 4 because of the speed, but I use 4 when I need something critical or I know I need it to balance multiple thoughts.
One thing that helps is telling it it's wrong or missing a case or whatever. It'll type out a fix (assuming it understands, which it frequently does) faster than I can.
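The sort of matplotlib boilerplate it produces on request (the data here is invented, standing in for a spreadsheet export):

    import matplotlib.pyplot as plt
    import pandas as pd

    # Invented sample data standing in for a spreadsheet export.
    df = pd.DataFrame({"month": ["Jan", "Feb", "Mar", "Apr"],
                       "sales": [120, 135, 150, 142]})

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(df["month"], df["sales"], color="steelblue")
    ax.set_title("Monthly sales")
    ax.set_ylabel("Units")
    fig.tight_layout()
    plt.show()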
* NixOS development being pretty fast-paced (with certain developments, like flakes, possibly still post-dating the original GPT training cutoff).
* Documentation being slow to show up on the internet.
* Just generally lower popularity (than Windows or Ubuntu, etc.) leading to less data for ChatGPT to pick up on.
I just use Amazon CodeWhisperer as a nice autocomplete but that's it.
Me: I want to make an svg editor, give me some suggestions. I mainly want something with mobile support.
GPT4: gives me some options
I look over the options and choose fabricjs
Me: Start by loading an svg at a predefined url
GPT4:
Me: Ok now implement a save feature, send the json to this url ...
GPT4:
Me: The text is loaded as an image, I want the text to be editable
GPT4:
Me: I'd like to add some google fonts to the text editor
GPT4:
Me: The fonts aren't loading, I think we need to load the fonts first before initializing the canvas.
GPT4:
Me: Ok add an undo/redo feature
GPT4:
Me: Let's add some clickable buttons instead of hotkeys, here is the html..
GPT4:
I probably could have done this myself, but frankly it would have taken me a long time to figure out the fabricjs API. It probably saved me at least a week making this thing. Here's the live app: https://tinyurl.com/2dhh58cn and the code: https://tinyurl.com/2tu4xrtn. You can tell the GPT-generated sections by the (overly) verbose comments.
I just didn’t feel like looking up the language specific regex syntax I needed and poring over the verbose examples for an hour.
Worked perfectly.
The trick is you tell it you are going to import/export with XML files, and it works with them.
Essentially it is an advanced autocomplete.
Getting it to also serve HTTP, I ran into quite a few issues. Part of it was it not telling me I needed to enable a feature, and part of it was that its knowledge was quite a bit out of date.
I actually filmed that whole interaction here:
https://www.youtube.com/watch?v=TFsbMGSOeCY
I also was able to get it to make a working (though extremely basic/naive) SAT solver in J. J is pretty far out of the mainstream, so I had to go through MANY rounds of correcting it. (That was the only time I used up all my ChatGPT4 prompt quota for the 3-hour period.)
Since then, I've stumbled on the technique of presenting it with a rough plan or idea and then iteratively having it ask ME questions about what I posted, and summarizing everything we've agreed so far, rather than just immediately writing code. I find that it's actually pretty good at pointing out things I hadn't considered (security and scaling questions, for example), and asking for clarification.
Most recently, I've started using it to help me get past the learning curve in languages where I'm not fluent at all (making an animation in Mathematica, and discovering how to do some simple things in Smalltalk).
In general, I try to ask it for the most minimal/general example it can give me that shows what I actually want to know. For example, building on the Rust web server thing, I asked it to give me the structure for building a RESTful API with certain endpoints (which "we" worked out "together" using the iterative design discussion method) but leave the implementations blank, because they would be unnecessary detail for it, and I already knew how to do that part.
Aside from that, I've used ChatGPT 3 in a non-chatting context through GitHub Copilot, and that is a whole other ballgame: it's basically a plugin for Visual Studio Code that acts like a super-smart autocomplete.
It doesn't always guess what I'm about to type correctly, and the wrong suggestions are occasionally annoying when I'm pausing to think through how to word a comment... But very often now, when I start to write a function, several whole lines that were just a vague idea in my head suddenly appear on my screen exactly as I would have written them. (And I mean exactly, including my sometimes unusual code formatting choices...)
I'm still on the waitlist for copilot chat, which presumably is just ChatGPT but insta-trained on your codebase... I'm very much looking forward to trying it, though.