HACKER Q&A
📣 simonebrunozzi

Are you using a GPT to prompt-engineer another GPT?


I tried with combinations of GPT 3.5, 4 and Bard. Results are interesting.

It made me think that the obvious way to learn prompt engineering is… to not learn it, but to use another LLM to do that for you.

Any experience with this? Happy? Unhappy?


  👤 chalsprhebaodu Accepted Answer ✓
I’ve commented this before, and it’s surely something I’m doing wrong, but I cannot believe that system prompts, GPTs, or any amount of instruction actually works for people to get ChatGPT to respond in a certain fashion with any consistency.

I have spent hours and hours trying to get ChatGPT to be a little less apologetic and long-winded, to stop reiterating, and to not interpret questions about its responses as challenges (e.g. when I say “what does this line do?” ChatGPT responds “you’re right, there’s another way to do it…”).

Nothing and I mean NOTHING will get ChatGPT with GPT-4 to behave consistently. And it gets worse every day. It’s like a twisted version of a genie misinterpreting a wish. I don’t know if I’ve poisoned my ChatGPT or if I’m being A/B tested to death but every time I use ChatGPT I very seriously consider unsubscribing. The only reasons I don’t are 1) I had an insanely impressive experience with GPT-3, and 2) Google is similarly rapidly decreasing in usefulness.


👤 knrz
You should check out https://x.com/lateinteraction's DSPy — which is like an optimizer for prompts — https://github.com/stanfordnlp/dspy

👤 nl
Yes, I've had great success with this in a few cases.

There's the obvious use case of creating Stable Diffusion prompts with all the line noise of 'unreal engine 4K high quality award winning photorealistic'.

Less obvious is using it to refine system prompts for the "create your own GPTs" feature. I used this approach for my "Chat with Marcus Aurelius, Emperor of Rome and Stoic philosopher"[1] and "New Testament Bible chat"[2].

I'm particularly happy with how well the Marcus Aurelius one works, eg: https://chat.openai.com/share/27323fe8-56e2-4620-8e4a-3ebf69...

For both of these I started with a rough prompt and then asked GPT4 to refine it.

I found the key was to read the generated prompt very carefully to make sure it is actually asking for what you want.

More recently I've been using the same technique for some more complicated use-cases: creating a prompt for GPT-4 to rank answers and creating prompts for Mistral-7B. The same basic approach works well for both of these.

[1] https://chat.openai.com/g/g-qAICXF1nN-marcus-aurelius-empero...

[2] https://chat.openai.com/g/g-CBLrOOGjA-official-new-testament...
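The workflow nl describes — start with a rough system prompt, ask a stronger model to refine it, then read the result carefully — can be sketched generically. Here `call_llm` is a hypothetical stand-in for whatever chat-completion client you use, and the meta-prompt wording is an assumption, not nl's actual prompt:

```python
# Sketch of one refinement round: wrap a rough system prompt in a
# meta-prompt and ask a model to rewrite it. `call_llm(prompt) -> str`
# is a hypothetical LLM wrapper, not a real library call.

META_TEMPLATE = (
    "You are an expert prompt engineer. Rewrite the rough system prompt "
    "below so it is precise, concise, and unambiguous. Return only the "
    "rewritten prompt.\n\nRough prompt:\n{rough}"
)

def build_meta_prompt(rough_prompt: str) -> str:
    """Wrap a rough prompt in instructions asking a model to refine it."""
    return META_TEMPLATE.format(rough=rough_prompt)

def refine_prompt(rough_prompt: str, call_llm) -> str:
    """Send the meta-prompt, return the model's rewrite (stripped)."""
    return call_llm(build_meta_prompt(rough_prompt)).strip()
```

The output still needs the careful human read nl mentions before you ship it as a system prompt.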


👤 nbardy
Yes. I deploy prompts professionally for work, and I almost always iterate with ChatGPT.

It requires a bit of back and forth, but you can get great results. It lets you iterate at a higher level instead of word by word.

I also find that the resulting prompts work better. Prompt engineering is often about finding magic words and sentences that are dense with keywords from the training data, and another LLM is going to be good at finding those phrases because it knows them best.

Here’s an example dialogue I was using recently to iterate on a set of prompts for generating synthetic training data for LLM training. (Inspired by phi-2)

https://chat.openai.com/share/51dd634b-7743-4b5e-9c3f-3d57c6...


👤 CrypticShift
On a related note, with the (tens of) thousands of "custom GPTs" coming up in the next few years, it would be interesting if the chat automatically recommended one of them in response to a particular query. In a way, it would be directing you to a better-engineered, human-made (pre-)prompt.

👤 ReDeiPirati
We recently open-sourced an agent framework [1] for automating data processing and labeling, where the agent's prompt is refined through iterations with the environment by asking an LLM to revise the prompt according to its performance (i.e. automatic prompt tuning). We tested it on the math-reasoning dataset GSM8K and were able to improve the baseline (GPT-4) accuracy from 45% to 74% using 25 labeled examples (notebook and blog post linked below [2][3]). Results are definitely very interesting, if not surprising for some skills, and we see more and more of our open-source users and customers showing interest in the framework for automating labeling / using it as a copilot.

[1] https://github.com/HumanSignal/Adala

[2] https://github.com/HumanSignal/Adala/blob/master/examples/gs...

[3] https://labelstud.io/blog/mastering-math-reasoning-with-adal...
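The loop ReDeiPirati describes — score the current prompt on labeled examples, then ask an LLM to revise it based on its performance — can be sketched generically. This is not Adala's actual API; `ask_llm(prompt) -> str` and the Q/A formatting are hypothetical stand-ins:

```python
# Generic sketch of automatic prompt tuning against labeled examples.
# `ask_llm(prompt) -> str` is a hypothetical chat-completion wrapper.

def accuracy(prompt: str, examples: list[tuple[str, str]], ask_llm) -> float:
    """Fraction of labeled (question, answer) pairs the prompt gets right."""
    correct = sum(
        1 for question, label in examples
        if ask_llm(f"{prompt}\n\nQ: {question}") == label
    )
    return correct / len(examples)

def tune_prompt(prompt: str, examples, ask_llm, rounds: int = 5) -> str:
    """Iteratively ask the LLM for revisions, keeping the best-scoring one."""
    best, best_acc = prompt, accuracy(prompt, examples, ask_llm)
    for _ in range(rounds):
        revision = ask_llm(
            "Revise this prompt to fix its mistakes on the task. "
            f"Current accuracy: {best_acc:.0%}.\n\nPrompt:\n{best}"
        )
        acc = accuracy(revision, examples, ask_llm)
        if acc > best_acc:
            best, best_acc = revision, acc
    return best
```

Keeping only revisions that score better on the held-out labels is what guards against the LLM "revising" a prompt into something worse.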


👤 alfozan
Check out Magic Prompts: https://magicprompts.lyzr.ai/

👤 FinalDestiny
Yes, I just used GPT-4 to create a prompt for GPT-3.5-Turbo based on some loose rules that I laid out. It helped me fill in the gaps and write it in a concise format.

The prompt gave much, much better results than the one I wrote.


👤 calrueb
My PoV is that it's an open question whether this is a fruitful approach. If you search for "meta-prompting" you'll find some discussions/papers on the topic.

👤 andrewedstrom
You may be interested in a recent AI safety paper by Redwood Research.

In it, they have GPT-4 generate solutions to coding problems, but instruct it to insert backdoors into the solutions some fraction of the time. Then, they explore different ways to use a weaker model (GPT-3.5) to detect these backdoors. Pretty interesting.

[1] https://arxiv.org/abs/2312.06942


👤 free_bip
My experience (n=1) is that current LLMs are just not good at prompting either themselves or other LLMs, and that if you have enough information to write a meaningful meta-prompt, you also have enough information to write a regular ol' prompt. I just don't think it's something the designers of current LLMs are prioritizing, so they're not very good at it.

👤 Der_Einzige
I wrote a paper about using big "LLMs" as art directors for the little LLMs within Stable Diffusion: https://arxiv.org/pdf/2311.03716v1.pdf

👤 User23
I have a basically unsubstantiated intuition that there is some analog of the recursion theorem for LLMs, if the theorem isn't itself directly applicable. If so, it should be mathematically impossible to prevent prompt "hacking."

👤 minimaxir
I'm tempted to build a tool that uses DAGs to orchestrate sequential prompt engineering, but typing that out makes me feel dirty.
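The DAG orchestration minimaxir is tempted by amounts to running prompt stages in topological order, each stage consuming the outputs of its dependencies. A minimal sketch using the standard library's `graphlib`; the stage names and functions (which would normally be LLM calls) are hypothetical:

```python
from graphlib import TopologicalSorter

def run_prompt_dag(stages: dict, deps: dict) -> dict:
    """Run stage functions in dependency order.

    `stages` maps node name -> function taking a dict of dependency outputs;
    `deps` maps node name -> set of prerequisite node names.
    """
    results = {}
    for node in TopologicalSorter(deps).static_order():
        inputs = {d: results[d] for d in deps.get(node, ())}
        results[node] = stages[node](inputs)
    return results

# Example pipeline: draft -> critique -> final. Each lambda stands in
# for an LLM call that would take the upstream text as context.
stages = {
    "draft": lambda _: "rough prompt",
    "critique": lambda inp: f"critique of: {inp['draft']}",
    "final": lambda inp: f"revised using {inp['critique']}",
}
deps = {"draft": set(), "critique": {"draft"}, "final": {"critique"}}
```

`TopologicalSorter` also raises on cycles, which is the main correctness property you'd want from such an orchestrator.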

👤 danielmarkbruce
Yes. It's hard. Prompting is hard. Prompting to prompt is hard.

👤 galaxyofdoom
GPT-3.5 will happily write jokes about Jesus of Nazareth but will adamantly refuse to write jokes about the Prophet Mohammad. I can't see why people don't recognize this technology as a complete abomination that will gravely impact society for the worse. Total and complete political correctness that never wavers and never relents.

👤 huytersd
ChatGPT already does that to generate images with DALL·E 3.