But with LLMs, no one really knows their inner workings once they are trained. The other day, regarding the Bark model[0] for text-to-speech, the team itself had the following to say on the details:
> Below is a list of some known non-speech sounds, but we are finding more every day. Please let us know if you find patterns that work particularly well on Discord! [laughter], [laughs], [sighs]....
So that's the team themselves not knowing what their model is capable of. If that's the case, how is prompt engineering any kind of engineering at all?
(In many countries, the word "engineer" is regulated -- you can't call yourself an engineer without professional qualifications and oversight.)
https://en.m.wikipedia.org/wiki/Regulation_and_licensure_in_...
For example, Google defines the second meaning of "engineering" as:
2. the action of working _artfully_ to bring something about. "if not for his shrewd engineering, the election would have been lost"
(https://www.google.com/search?q=define%3AEngineering)
Merriam-Webster has:
3 : calculated manipulation or direction (as of behavior), giving the example of “social engineering”
(https://www.merriam-webster.com/dictionary/engineering)
Random House has:
3. skillful or artful contrivance; maneuvering
(https://www.collinsdictionary.com/dictionary/english/enginee...)
Webster's has:
The act of maneuvering or managing.
(https://www.yourdictionary.com/engineering)
This is a well-established, nontechnical meaning of “engineering”.
https://mitchellh.com/writing/prompt-engineering-vs-blind-pr...
Most of what we see on Twitter or YouTube is Blind Prompting. However, it is possible to apply an engineering mindset to prompting and that is what we should call prompt engineering. Check out the article for a much more detailed framing.
Dair AI also has some nice info and resources (with academic papers) about prompt engineering.
* Regulated by a profession and associated legislation
* Work first to ensure the safety and welfare of the public
* Perform only within their area of competence
* Act as faithful agents or trustees for their clients
* Avoid deception and represent matters in an objective and truthful manner
* Invest in continuous professional development
* Can be subject to penalties for malpractice: removal of licence, fines, or jail
It's hard to see any of these applying to LLM input entry people. It's even a stretch to consider them non-professional engineers.
We've seen this before. Back when Google was a search engine, you'd eventually get a feel for how things worked, and build a vocabulary of "spells" that gave the results you wanted, without using the common words with many overloaded meanings. (Example: "annotation" instead of "mark up", when trying to research the Marking Up of Hyperlinked Texts.)
Similar things proved to be true of Stable Diffusion and the image generators. It seems quite reasonable to conclude the same will be true of GPT4 and its kin.
The notion that engineering must be backed by solid mathematical understanding is a relatively recent, post-calculus idea. For the longest time, we built bridges a certain way because those were the bridges that stood. We were flying planes long before we understood flight.
At the end of the day, engineering is the cycle of "identify a pain point -> launch experiments -> observe the delta -> apply the solution". Prompt engineering meets all of those requirements and therefore, is engineering.
It is funny to hear this, because 20 years ago "is software engineering really engineering?" was a rather common question around STEM circles.
Yes, software can be engineered, but that's not what the vast majority of so-called software engineers are actually doing at their jobs. The title mostly exists to inflate the importance of a programmer with years of experience under their belt. In reality, most of them couldn't explain to you what engineering itself is, and their job primarily consists of duct-taping and building features expediently. It's like taking a carpenter whose job is to crank out barely adequate sheds for a shed company and calling them an architect.
Don't even get me started on "computer science."
Prompt engineering is a legitimate area of study, and is obviously a practice demanded by LLMs, but you gotta just ignore the "engineering" part. It's the same skill as being a good communicator. Take a room full of "software engineers", tell them "build me an app that will let me sell gadgets", and they'll do their best to build one, but chances are it won't do what you want unless you communicate with greater specificity. It's hilarious how many people think LLMs suck just because they don't do the right thing given a single shitty sentence.
I'm sure there will always be some people who are better at this than others, but if the interface is sufficiently chatty and 'smart', that could significantly reduce the gap between newbies and seasoned prompters.
What does matter is having stable, predictable abstractions that you can rely on, so in software engineering it would be a mistake to rely on system-dependent undefined behavior when using a language like C++. You also don't necessarily need to know what kind of low-level optimizations the compiler and linker are using to generate your executable binary, though in some cases it could be important.
With LLMs, however, it's not really clear if the same prompt always gives the same kind of output, and you can do 'regenerate response' with something like ChatGPT to see this in action.
In some sense, 'prompt engineering' can be more like 'interviewing an engineer', in that LLMs seem to provide better outputs if you start with a broad, general question and then narrow down to your specific interest over a series of questions. Helping the LLM out by defining context, asking it to expand on a specific output it generated, pointing out where it may be hallucinating, etc. all appear to improve the quality of the output, but I don't know if this kind of iterative process is really 'prompt engineering'; maybe 'prompt optimization' is a better word for it?
In other words, writing a series of prompts to an LLM that gets you the answer you need feels quite unlike writing a Python script to automate some task. You can plan the latter out from start to end, but the former is this back-and-forth process.
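To make that contrast concrete, here's a rough sketch of the back-and-forth loop, assuming the OpenAI chat completions interface (pre-1.0 `openai` package); the model name, temperature, and the specific prompts are placeholders, not a recipe:

```python
# Iterative "prompt optimization": start broad, then narrow over several turns.
# Assumes OPENAI_API_KEY is set in the environment.
import openai

prompts = [
    "Give me an overview of options for caching in a web app.",           # broad
    "Narrow that down to server-side caching for a Python/Flask app.",    # narrower
    "Expand on cache invalidation for the per-user data you mentioned.",  # specific
]

messages = [{"role": "system", "content": "You are a concise technical assistant."}]

for prompt in prompts:
    messages.append({"role": "user", "content": prompt})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0.7,  # > 0 means the same prompt can give different output on a rerun
    )
    reply = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})  # keep context for the next turn
    print(reply, "\n---")
```

Nothing here is planned out from start to end; each prompt is written only after reading the previous answer.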
I do think that it's not broad or deep enough (right now) to merit a completely separate area of expertise on its own, but it can be too big to fit within the minds and areas of expertise of people who already have their work cut out for them in their existing workdays. Think about most desktop computer users: they really have no clue how the computer works, and even asking them to change the resolution on a mirrored display might be too far removed from their knowledge base (and that is fine). The same goes for tokenisation and prompt inputs; it might take too much for someone to simply tack that on to what they already do and know.
This is then where we get job postings, and those need to be easily identified so people can search for them, and this is where we get nonsense terms used to distinguish the desired applicants from the pool of work-seeking people.
Just because we don't know all the workings of something doesn't mean we can't effectively use it, or measure how well that approach of use is valuable or repeatable for similar uses.
For example, while we have a good understanding of the basic principles of magnetism, there is still much that we do not fully understand about how magnets work, and ongoing research is focused on unraveling these mysteries and expanding our understanding of this fascinating natural phenomenon.
That's not to say we don't know how to use magnets to make sound, however.
Prompt engineering, in my opinion, is a process of optimized querying of a frozen model. The approach can be augmented by using "hot" data from vector databases or other types of text storage engines. The approach to picking the right content for prompt building is not "snake oil", but based on well known processes including autocomplete, synonyms and related terms, personalization, and query expansion.
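As a toy illustration of that prompt-building step, here's a sketch in which a crude bag-of-words similarity stands in for a real embedding model and vector database; the documents, query, and template are all made up:

```python
# Pick the most relevant "hot" document and splice it into the prompt sent to the frozen model.
# The bag-of-words "embedding" below is only a stand-in for a real vector database.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Refunds are processed within 5 business days of the return being received.",
    "Our warranty covers manufacturing defects for 24 months from purchase.",
    "Shipping to EU countries typically takes 3-7 business days.",
]

query = "how long do refunds take"
top_doc = max(documents, key=lambda d: cosine(embed(query), embed(d)))

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context: {top_doc}\n\n"
    f"Question: {query}\nAnswer:"
)
print(prompt)  # this is what actually gets sent to the model
```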
Is prompt engineering 'real' engineering? It seems easy enough to test whether knowledgeable, self-proclaimed prompt engineers can outperform a random person with only moderate experience requesting information from an AI in a reproducible way.
If they can't, then there is probably no engineering involved in prompt engineering, at least at present.
If they can, then it seems like they're probably not doing it with magic, so there is some set of reproducible techniques involved. At that point, would it be fair to call it engineering?
Back then, the text just continued onwards, and many papers found ridiculous gains in conforming to the intended output (and therefore its properties) by finding a suitable, widespread online format.
Example: I think all question/answer forums, and places like StackOverflow, used a certain format like 'Q:', and that worked much better for question content with high-quality answers. One paper, "Ask Me Anything", showed that formulating tasks as question-answer pairs resulted in much more effective prompts: https://arxiv.org/abs/2210.02441 - another, for reasoning, is the "Let's think things through step by step" example. It was a challenge to output JSON data with more than 3 correct keys unless you finetuned.
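For a concrete picture of that shape, here's a small sketch of a Q:/A: few-shot prompt for a base completion model (the questions are made up, and the trailing "Let's think step by step" is the classic reasoning nudge):

```python
# Framing a task the way base models saw it in forum/StackOverflow-style training data.
examples = [
    ("What does HTTP 404 mean?", "The server could not find the requested resource."),
    ("What does HTTP 301 mean?", "The resource has moved permanently to a new URL."),
]
question = "What does HTTP 503 mean?"

prompt = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
prompt += f"\n\nQ: {question}\nA: Let's think step by step."
print(prompt)
```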
However, the instruct models and the recent 3.5/conversational RLHF models are just so good at doing tasks that such "engineering" - which supported and improved a wide base of prompts across domains and functionalities - has broadly stopped.
Now "prompt engineering" sounds so cool, and was also used for specific use cases back then, that everybody uses the term for any prompt written to get a certain output. This, however, is not really the same level of engineering, and is only to some extent reminiscent of the old-school knowledge-system finesse that we saw before 2022.
Prompt engineering feels like a quite appropriate name because there is a lot of experimentation and refinement to find good ways to interact with these models (I.e. good prompts!).
It doesn't perfectly map to "engineering", but it maps pretty well. What else would you call an iterative process of experimentation and refinement?
A little tongue-in-cheek, but it did get the noodle working.
If you want repeatable results, you get GPT4 to write you code. Then use that code. If you run into cases it doesn't handle, or have new ones, then you can revise the code, perhaps with the help of the AI once again.
The problem is that the bullshit artists out there want repeatable results in areas for which it can't write you working code. For instance, a problem like taking a textual product description and producing a glib advertisement.
They want to be able to write a prompt where you can just substitute different product descriptions and have quality ads come out that nobody has to proofread (so they can be immediately served), just by "turning a crank".
This is likely a reflection of the data used to train the model - TTS models are trained on labeled audio, one form of which is subtitled audio streams. In subtitles / closed captioning, bracketed words are frequently used when there is something audible that is not speech.
Based on this insight, it should be possible to inspect the training data and extract a set of non-speech sounds the model is likely to generate well - but that doesn’t drive engagement of your users like asking them to experiment themselves does ;)
For example, closed captioning adds music notes (♪) when music plays or when people sing. According to the Bark docs, adding ♪ causes the model to output things as music.
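For illustration, a minimal sketch of exercising those cues with Bark, assuming the generate_audio/preload_models interface shown in the project's README (the prompt itself is just an example):

```python
# Prompt Bark with non-speech cues: bracketed sounds and the music-note character.
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # downloads/loads the model weights on first run

text_prompt = "Well, that went better than expected [laughs]. ♪ And now I feel like singing ♪"
audio = generate_audio(text_prompt)

write_wav("bark_demo.wav", SAMPLE_RATE, audio)  # listen for the laugh and the sung phrase
```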
You can certainly get better at it by learning the basics of LLMs, as well as by finding and applying certain methods that improve the results.
You can actually see quite a lot of difference between someone trying out some things and someone who has spent quite a bit of time trying to make it do non-standard things.
What you call "engineering" is of course up for debate, but it's not snake oil.
In other words, you also have to understand the text that you're getting back from your prompts, and then "engineer" the prompt to fish for better results. AI itself can do this to an extent, but over-engineering is also a thing, as is the fact that AI itself does not actually know what you want.
That's all there is to it. Refine, revise, engineer - all just buzzwords that lead to the same end result: a game of ping-pong between you and the LLM.
But that's a bit further down the line. Today, as in previous hype cycles, opportunistic types will cash out on the money train. And tomorrow you will do whiteboard interviews for them (just like in the last cycles...).
So the relevant question for the thoughtful geek isn’t “Is crafting prompts engineering?”. No. The Q is “should I sit out this hype cycle and then end up doing leet code monkey dance for the opportunistic types who made it big by riding the hype cycle, yet again?”
I'd say even in this example, there's more a sense of searching through a range of different options to achieve specific properties. This is the sense of "prompt engineering": you use skills to create prompts -- in part by searching through various alternatives -- that have specific properties.
[1] https://dictionary.cambridge.org/dictionary/english/engineer
I already consider myself a prompt engineer rather than a software engineer, because I've been coding for 15 years and now 80 percent of the code is generated for me, so I really don't think I'm ever going to want to go back to engineering without the aid of AI.
If you're a software engineer and you use chatGPT to help you code, you've switched careers from software to prompting.
Anyway, if you're trying things iteratively it becomes more of a science. Engineering is more like designing stuff and building it without trying lots of things first.
Probably makes it an apt comparison, all told. :)
For me, prompt engineering was about getting it to stop moralizing and adding trigger warnings.
Prompt engineering is programming, but with natural language. Even between us humans, being able to communicate clearly is an essential part of understanding each other. It is also a form of programming.
Prompt engineering, or whatever you want to call it, will be an important part of how we communicate with machines in the future.
Technical expertise. It is technical in nature, or only one level removed from having to look at code or internals. Unlike customer support--strictly speaking--you have access to additional tools or authorization to investigate the matter.
Domain expertise. You know that some phrases just don't help. You know about temperature and other jargon. You can look at someone's paragraph of prompt and immediately suggest something.
Stakeholder safety. You act in a manner that eliminates or reduces stakeholder harm. You care about your work.
If any of these are missing, it's not "engineering." Not to mention the importance of measuring things, data, etc--but I would hazard that being able to stare at graphs all day does not an engineer make.
At the end of the day, if your job title is "Prompt Engineer," who will object?
Versus professional (certified) engineers: https://news.ycombinator.com/item?id=35669226
Engineering as artful action: https://news.ycombinator.com/item?id=35670444
Tools of the AI Engineer: https://news.ycombinator.com/item?id=35669249
it's not snake oil, it's literally just a joke.
$$ and demand don't change the skills?
I mean, Anakin _WAS_ good with machines.
Works for the time being while there's low hanging fruit, but researchers will use those exploits to help build more robust systems.
Eventually there'll be no point in doing it anymore.
Don't dismiss it as "snake oil".
Approach it as any scientific topic - with curiosity and open mind.
Be rational with your skepticism.
> So that's the team themselves not knowing what their model is capable of. If that's the case, how is prompt engineering any kind of engineering at all?
This is why.
sure you can be good at googling, and it helps, but it isn’t a profession