From what I understand, in JavaScript at least, putting `await foo()` inside an async function splits the calling function in two, with the second half being converted to a callback. (Pretty sure this is full of errors, so please correct me where I'm wrong.)
Why can't non-async functions use await()?
I've also read that await() basically preempts the currently running function. How does this work?
Update: I'm re-reading http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y... and I think the answer lies somewhere in this paragraph, but I can't wrap my head around it yet:
> The fundamental problem is “How do you pick up where you left off when an operation completes”? You’ve built up some big callstack and then you call some IO operation. For performance, that operation uses the operating system’s underlying asynchronous API. You cannot wait for it to complete because it won’t. You have to return all the way back to your language’s event loop and give the OS some time to spin before it will be done. Once operation completes, you need to resume what you were doing. The usual way a language “remembers where it is” is the callstack. That tracks all of the functions that are currently being invoked and where the instruction pointer is in each one. But to do async IO, you have to unwind and discard the entire C callstack. Kind of a Catch-22. You can do super fast IO, you just can’t do anything with the result! Every language that has async IO in its core—or in the case of JS, the browser’s event loop—copes with this in some way. [...]
I don't get the "You cannot wait for it to complete because it won’t." or the "But to do async IO, you have to unwind and discard the entire C callstack" parts.
I'm also using these resources, they help but I'm not there yet:
- https://stackoverflow.com/questions/47227550/using-await-ins...
I think what OP is assuming is a multi-threaded or multi-process environment, where the calling function can just block whatever execution context it's running in and wait until the async function returns.
The problem is that many environments are effectively non-multi-threaded, especially the (usually single) thread/queue/process that draws the UI and responds to user input. So if you block the UI thread, your whole app (at least from the standpoint of the user) stops responding.
Still, this should work in principle, but threads are more costly in terms of memory and context switching time than continuations, so it makes sense to allow the thread to continue to handle other tasks while waiting for e.g. I/O to complete.
This is what the author of the classic https://journal.stuffwithstuff.com/2015/02/01/what-color-is-... settles on as the best way of handling async tasks, but I recall there being some pushback on that here on HN.
Making a function support being paused and resumed requires changes in how the function is run and the data that is maintained while it is executing. In addition to behaving differently than sync functions when they execute, the results are also handled differently. Async functions always return a promise - whether they `return` a literal value, return another promise or throw an exception.
Due to these differences, it makes sense that the special nature of async functions and generators must be declared up-front, with the `async` keyword or `*` for generators. It would be possible to design the language such that the function type was determined by looking at whether it contains `yield` or `await` keywords. Python does this with generator functions. However this makes an important aspect of behavior less explicit.
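To make the "always returns a promise" point concrete, here is a minimal sketch. The three function names are made up for illustration; each covers one of the cases mentioned above (plain value, another promise, exception).

```javascript
// Hypothetical functions covering the three cases: plain value, promise, exception.
async function returnsValue() { return 42; }
async function returnsPromise() { return Promise.resolve(42); }
async function throwsError() { throw new Error('boom'); }

const results = [returnsValue(), returnsPromise(), throwsError()];

// Every call hands back a Promise, even the one that throws.
const allArePromises = results.every((r) => r instanceof Promise);
console.log(allArePromises); // true

// The actual outcome only becomes visible once the promises settle.
const settled = Promise.allSettled(results).then((outcomes) =>
  outcomes.map((o) => o.status)
);
settled.then((statuses) => console.log(statuses));
// [ 'fulfilled', 'fulfilled', 'rejected' ]
```

Note that the `throw` never reaches the caller as an exception; it shows up as a rejected promise instead.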
In C#, you can't use the "await" keyword, but you can use the result of a Task
In the first case I am telling an executor what it has to do (it has to execute the thing then continue inside my function)
In the second case I am myself blocked into a state that will be unblocked when the thing returns.
That is in the first case my function must be of a type that can be stopped and continued and I need to save its state somewhere.
In the second case I don't need the ability to stop and continue my function.
That makes for two kinds of functions at some level. Low level languages will expose that. High level languages will hide it.
But there is a growing demand for functions that can be both. I think that Zig has them.
Well, I would say that it's the opposite of preemption. It's cooperative multitasking. Your async task is split into two tasks, with the await in the middle. When you call await, the first task finishes. The second task starts running once the await is done.
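A small sketch of that split, assuming nothing beyond plain JavaScript (the `log` array and the `task` name are just for illustration). The part before the `await` runs immediately, the caller carries on, and only then does the second half run:

```javascript
const log = [];

async function task() {
  log.push('task: before await'); // first half, runs synchronously
  await null;                     // suspension point: control goes back to the caller
  log.push('task: after await');  // second half, runs later as a continuation
}

const done = task();
log.push('caller: continues');

done.then(() => console.log(log.join('\n')));
// task: before await
// caller: continues
// task: after await
```

Nothing interrupts `task` from the outside; it gives up control itself at the `await`, which is why this is cooperative rather than preemptive.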
> Why can't non-async functions use await()?
In C#, they can. You can call `.RunSynchronously()` on a `Task`. C# supports lots of different TaskSchedulers that handle this differently.
C# is almost pathologically flexible here. Other languages typically assume that there is only one async task scheduler, and it's handled by the runtime. This task scheduler may be less flexible, and there are various design tradeoffs.
When you call await the function doesn't necessarily have to be preempted (maybe the async function you called already returned) but if the async function hasn't returned then it has to be.
In most systems like that there is a scheduler that keeps a list of things that are being awaited on and keeps track of which ones are ready to return. When you call await on one function the scheduler looks at that list and chooses an await that is ready to continue and executes it.
Normal functions can't do that. If they wait for something they block, they can't give control back to the run loop.
Can it be done? sure:
let bar = await foo();
return bar + 1;
can be converted to:
let resultPromise = foo();
while (!resultPromise.isComplete()){
sleep(10);
}
let bar = resultPromise.value();
return bar + 1;
The issue is that this is terribly inefficient, as you are asking every 10ms whether or not the result is ready. You have to spend time both checking (which is wasteful if the result isn't ready) and waiting. If you check more frequently, you're wasting CPU. If you spend more time waiting, you can waste up to that much time if the result is ready right after you go to sleep.
The alternative is to have the operating system tell you when it's ready. This is done through the event loop. So this would turn the code transformation into something like the following:
let resultPromise = foo().then(function(bar){
return bar + 1;
});
But what do you return? You have to return a promise (i.e. something the caller attaches a callback to). Therefore, all callers of this function have to be ready to handle this.
When you write (Python):
await sock.write("fooooooo")
`write` will, after several layers of abstractions and wrappers, look like this (on a reactor/readiness-oriented system):
def write(self, data):
    while data:
        try:
            written = self._socket.write(data)
        except BlockingIOError:  # raised for EWOULDBLOCK / EAGAIN
            yield 'writable', self._socket
            continue  # retry once the event loop says the socket is writable
        data = data[written:]
"yield" here means to pass a bit of data up to the event loop (née coroutine scheduler), and that event loop will watch (using select/epoll) for that socket to become writable and then hand control back.
Because this inversion of control doesn't happen unless you (usually explicitly) wrap a coroutine with an event loop (e.g. loop.run_until_complete in asyncio), you can't just call a coroutine. It doesn't know when to resume in these designs, and your synchronous function doesn't know how to, either.
1. For something to be `await`ed on, the calling function has to have the ability to be paused and continued later.
2. Non-async functions cannot be paused right in the middle and resumed later. They're regular functions that will execute from start to finish on the main thread.
3. In a hypothetical JavaScript implementation where a non-async function could `await`, the main thread would not be able to pause the calling function. Instead it would keep executing it, blocking the thread, because there would be no one to execute the function that we would be `await`ing on (since there's only 1 thread and it's blocked). So it's a deadlock.
So, the difference between async and non-async functions in JavaScript, is that the former can be paused in specific points (pointed to by `await`).
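The deadlock can be seen in miniature with a busy-wait. This is only a sketch: the loop is bounded to 50ms on purpose (an unbounded one would hang forever), and the variable names are made up.

```javascript
let resolved = false;
Promise.resolve().then(() => { resolved = true; });

// Synchronous "waiting": spin until the promise callback has run, or 50ms pass.
const deadline = Date.now() + 50;
while (!resolved && Date.now() < deadline) {
  // The single thread is busy here, so the .then callback cannot run.
}
const seenWhileSpinning = resolved;
console.log(seenWhileSpinning); // false

// Once the synchronous code has finished, the callback finally gets its turn.
setTimeout(() => console.log(resolved), 0); // true
```

The promise is already resolved before the loop starts, yet its callback still cannot run until the current synchronous code returns to the event loop.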
To await means to wait for a Promise to resolve asynchronously. That's why async functions return a Promise instance.
async function asyncFn() {
const result = await fetch(url)
const json = await result.json()
return json
}
asyncFn().then(json => console.log('Got', json))
console.log('Fetching result..')
Calling asyncFn() will immediately return a Promise instance. Then each await in the function waits for a Promise to resolve, before continuing to the next instruction. While that's happening, other code can keep running, like the last line in the example above.
In contrast, a non-async function is synchronous, which means it's expected to run all instructions one after the other, then return a value. No other code can run while that's happening.
If await was allowed in a non-async function, it would have to block execution of all other code until the Promise is resolved/rejected.
function syncFn() {
const result = await fetch(url) // Stop the world
const json = await result.json() // Stop the world
return json
}
console.log('Fetching result..')
console.log('Got', syncFn())
This is how a synchronous HTTP request works: https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequ...
To avoid stopping the world, a non-async function can use Promise callbacks instead.
function promiseFn() {
return fetch(url)
.then(result => result.json())
}
promiseFn().then(json => console.log('Got', json))
console.log('Fetching result..')
If that's true, does the "function coloring" problem apply to Promise-based code and callback-based code too? (Talking about JavaScript here)
Careful, it doesn't work the same way in every single language that has these keywords.
> Why can't non-async functions use await()?
Because in JavaScript at least, it only makes sense to await a promise, nothing more. A function marked as async will always return a promise. (You can technically await a non-promise value, but it is simply wrapped in an already-resolved promise.)
async function myFunction() {
a();
await b();
let d = c();
return d;
}
becomes something like this:
function myFunction() {
return new Promise((resolve, reject) => {
a();
b().then(() => {
try {
let d = c();
resolve(d);
}
catch (err) {
reject(err);
}
}).catch(reject);
});
}
If you didn't mark `myFunction()` as async, it wouldn't set up the `new Promise()` scaffolding, and there would be no `resolve()` or `reject()` to call.
Note that you can call an async function from a non-async function. Assuming b() is async and a() and c() are not, a manual approach looks like this:
function myFunction(callback) {
a();
b().then(() => {
try {
let d = c();
return callback(null, d);
}
catch (err) {
return callback(err);
}
}).catch(callback);
}
When I mark a function as `async`, I'm specifying that this function will be performing some form of asynchronous I/O. Any caller of that function (or its callers' callers) will therefore also be doing I/O. Such functions may take a while to "return" and are more likely to throw an exception.
Function coloring makes these properties explicit. By forcing all callers to also be `async`, it becomes easy to see which parts of the codebase are doing (somewhat risky) I/O work and which parts will be near-instant. No one ends up surprised when a call to `getFoo()` ends up getting `foo` from the network rather than the local object.
Starting from generators, the next step was to see how async/await can be implemented on top of them. This is how the 1.x version of koajs did it (https://github.com/koajs/koa/blob/master/docs/migration.md).
Another way was to understand how polyfill for an implementation works. eg. https://github.com/facebook/regenerator
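For the generator route, here is a rough sketch of the idea. It is not koa's or regenerator's actual code; `run` and `delayed` are names invented for this example. A small driver resumes the generator each time a yielded promise settles, so `yield` plays the role of `await`:

```javascript
// Drive a generator: every yielded promise resumes the generator with its result.
function run(genFn) {
  return new Promise((resolve, reject) => {
    const gen = genFn();
    function step(method, arg) {
      let result;
      try {
        result = gen[method](arg); // resume with gen.next(value) or gen.throw(err)
      } catch (err) {
        return reject(err);        // uncaught error inside the generator
      }
      if (result.done) return resolve(result.value);
      Promise.resolve(result.value).then(
        (value) => step('next', value),
        (err) => step('throw', err)
      );
    }
    step('next', undefined);
  });
}

// Hypothetical helper: a promise that resolves with `value` after `ms` milliseconds.
const delayed = (value, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

const total = run(function* () {
  const a = yield delayed(1, 10); // reads like: const a = await delayed(1, 10)
  const b = yield delayed(2, 10);
  return a + b;
});

total.then((sum) => console.log(sum)); // 3
```

The generator supplies the "pause and resume" ability, and the driver supplies the "come back when the promise is done" part.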
I am no longer familiar enough with C#, so I will take NodeJS as an example.
In NodeJS everything that you do happens on one thread, which is great because you will never, ever, need to worry about locks and threads.
The downside is that you can only serve one client at a time. To avoid this, whenever you start reading a file, send a network packet, or do something else that takes a long time while you are not doing anything, Node stops running your code and runs some other code for a while. Then it will come back to you when it has a result and start running your code again.
It is easy for us to understand what "come back" means, but how do you express that in a language? The first attempt was callbacks: Node would come back to whatever you put in the callback. The downside is that you get very nested code, you have to repeat code to handle errors, and it becomes a mess fast, especially if you have to deal with collections.
The next attempt was promises. These are a lot easier to deal with than callbacks, because you get a value back that you can treat like just another object (put in containers, etc.), but in practice your code still ends up having to stop and pass values through functions all the time, which adds visual noise and is a pain to deal with.
The next step then is async/await. Instead of coming back by executing a function, you use the keyword 'await' to tell the compiler where to come back to. As for why this can't be used inside a non-async function: await is a legal identifier in plain JavaScript, so await(someValue) has a very different meaning if your function is marked as async vs. if it isn't.
I use await a lot at work, and my mental model is just that it is the same as promises, except that instead of having to write an anonymous function, invent a new name for the result of the promise and shovel my code into the anonymous function, the compiler does so for me.
Thus I get to write:
let a = await fs.readFile....
let dirContent = await fs.readDir(...
return dirContent.contains(a)
Instead of:
return fs.readFile....then(fileContent => fs.readDir(...).then(dirContent => dirContent.contains(fileContent)))
Which is, to me, borderline unreadable.
In JavaScript, you only have one thread. You really don't want to block it for any length of time. IO tends to take time. So IO is always async. There's no way to turn something async into something synchronous without blocking your thread waiting.
Other languages do let you block a thread to wait for an async task to complete; JavaScript doesn't, mostly because there is only one thread.
hyperscript does away with the distinction between sync and async code by moving it all to the runtime:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
https://medium.com/front-end-weekly/callbacks-promises-and-a...
const a = async () => "hello world"
is the same as
const b = () => Promise.resolve("hello world")
> From what I understand, in JavaScript at least, putting `await foo()` inside an async function, splits the calling function in two, with the 2nd half being converted to a callback
I wouldn't say that's wrong, but it may be a slight over-simplification if your mental-model isn't complete. For example you can await from within a loop which would make function-splitting kind of a strange thing to think about. But yes, every `await` is a suspension point that will be resumed and the function carries on, so effectively the rest of the function becomes a callback, but let's call it a "continuation", which is the callback that continues from where we left off.
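To show why a loop makes the "split in two" picture awkward, here is a sketch of one possible hand-written equivalent. The function names are invented, and this is not what any particular engine generates. Each pass through the loop becomes its own continuation:

```javascript
// The version with await inside a loop.
async function sumAsync(items) {
  let total = 0;
  for (const item of items) {
    const value = await item; // one suspension point per iteration
    total += value;
  }
  return total;
}

// One possible manual equivalent: the loop turns into a chain of continuations.
function sumWithThen(items) {
  const step = (index, total) =>
    index === items.length
      ? Promise.resolve(total)
      : Promise.resolve(items[index]).then((value) => step(index + 1, total + value));
  return step(0, 0);
}

const inputs = [1, Promise.resolve(2), 3];
const both = Promise.all([sumAsync(inputs), sumWithThen(inputs)]);
both.then(([a, b]) => console.log(a, b)); // 6 6
```

So there is not one "second half" but as many continuations as there are iterations, each picking up where the previous one left off.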
Of course it's implementation specific, but at a high level these continuations are basically normal functions which may return either the "incomplete" pair (new_awaitable, next_continuation) for `await new_awaitable` statements, or the "complete" return-value for `return` statements. (Maybe other things like exceptions, but let's not go there)
It's the event-loop/task-scheduler which keeps track of these callback/continuations: when one returns with a complete-value, the scheduler will go find the continuation that was awaiting it, and call that with the new value. If it returns a new (awaitable, continuation) object, the loop will start the awaitable and repeat until it returns a value, then resume the continuation with the value.
So you can think of await as "return to the event loop with awaitable and keep calling the callbacks until a final value is returned".
This is where the function coloring comes from: async functions need access to that event loop to keep returning to and continuing from, they know they're being called in a special way and return these special values for await/return that the event loop knows how to deal with; normal functions don't have access and just return values in the "normal" way.
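Here is a toy version of that loop, heavily simplified. `complete`, `incomplete` and `runTask` are made-up names, everything runs synchronously, and only a single chain of tasks is handled; a real event loop interleaves many tasks and waits on I/O.

```javascript
// A step is either "complete" (a final value) or "incomplete"
// (something to await plus the continuation to call with its result).
const complete = (value) => ({ done: true, value });
const incomplete = (awaitable, continuation) => ({ done: false, awaitable, continuation });

function runTask(task) {
  const waiting = []; // continuations waiting for a value
  let step = task();
  for (;;) {
    if (step.done) {
      if (waiting.length === 0) return step.value; // nobody is waiting: final result
      step = waiting.pop()(step.value);            // resume the awaiter with the value
    } else {
      waiting.push(step.continuation);             // remember where to come back to
      step = step.awaitable();                     // start the awaited task
    }
  }
}

// Hand-written equivalent of: async function double(n) { return n * 2; }
const double = (n) => () => complete(n * 2);

// Hand-written equivalent of:
//   async function main() {
//     const a = await double(2);
//     const b = await double(a);
//     return b + 1;
//   }
const main = () =>
  incomplete(double(2), (a) =>
    incomplete(double(a), (b) => complete(b + 1)));

const answer = runTask(main);
console.log(answer); // 9
```

Only `runTask` knows what to do with these special return values, which is the coloring in miniature: a normal caller receiving an "incomplete" object would have no idea how to proceed.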
Of course, to reap any benefit from this you have to have multiple tasks running "at the same time". To just await one thing which awaits one thing is not much different than a normal function call/stack. But when you're listening on sockets and fetching HTTP resources and making database queries and waiting for user responses, each executing simultaneously but on one thread (so no thread contention), that's when async shines. But note: they all must be talking to the same event loop which manages all these coroutines, running one at a time.
But what happens if "normal" functions could await?
Put another way: effectively de-color functions by making them all "async" and thus able to await. (Unless I'm misinterpreting the initial question)
From what I said above, all we have to do is change the implementation such that when a function returns, internally it actually returns this special "complete" variable type (with the real returned value inside). Then, with an event loop you get normal behavior and forward to whatever function had called it. Then you just add the awaits which return "incomplete" variables and you get waiting behavior.
This would work, BUT it introduces an overhead of going to the event loop, checking if the return is "complete", finding the caller, and forwarding it for EVERY function call, rather than the usual stack popping/register storing of standard languages. Optimizable? Yeah, probably, but it's not zero-cost. That is especially true in languages like Python, which don't have an implicit event loop like JavaScript, so you're making huge changes to the implementation to begin with.
We could approach from the other side: returns stay the same, but the act of calling await creates an event loop or some object which does the "callback-the-callbacks" routine. I guess this would be the stop-the-world approach? This one function would keep looping through the callbacks until a final value is reached. On second thought, this isn't different from a standard function call, because any internal `awaits` would set up their own loops and you always end up with a single returned value (or an infinite loop). There's no way for multiple loops to be running the continuations.
If you really don't like the keyword "async" before your function definition you could implement implicit async functions: just make any function in your language which includes an `await` implicitly become "async". But this only kicks the can down the road, as if you use await, that means you're returning continuations, and any callers need to await your value, and therefore your function is now implicitly colored.
And how do you implement the equivalent of
async function() { return 42; }
maybe? function() { return 42; await; }
Implicit event loop.
Maybe this is the core of the question: for a language like JavaScript which already has an event loop, why can't we do
function foo() { let x = await whatever(); return x; }
console.log(foo());
The await in a non-async function could be implemented such that it goes out to the global event loop, "registers" itself as dependent on the Promise returned from "whatever()", then sits and waits for the event loop to return the value. This should sound familiar to you as the behavior of the standard async function / continuation business, but now we're trying to say that foo doesn't return a stream of continuations (it just takes a long time to run).
While this may be doable, a problem arises if foo was called from within an async function. Part of the "contract" of async programming is that your function runs as usual between calls to await; at those points you return to the event loop and let other continuations run, but until the next await, shared/global variables will remain untouched because nobody else is running. If you called foo(), which then starts running `whatever` on the event loop again, any guarantees about ordering of events or state of variables may change unexpectedly.
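The contract mentioned above can be seen in a small sketch (the `counter` and `increment` names are just for illustration). Shared state can only change underneath you at an `await`, never between two ordinary statements:

```javascript
let counter = 0;

async function increment() {
  const seen = counter; // read shared state
  await null;           // the only place where other tasks get to run
  counter = seen + 1;   // write based on a value that may be stale by now
}

const finished = Promise.all([increment(), increment()]);
finished.then(() => console.log(counter)); // 1, not 2
```

Both calls read 0 before either one resumes. With explicit `await`s you can at least see where that can happen; if any plain-looking call like foo() could secretly spin the event loop, every function call would become such a point.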
JavaScript makes (1) the default. For any kind of object access (dog.legs[1].paw), assignment (const q = 5), math (Math.floor(4.2)), or DOM access in a browser (document.write('yay')), the interpreter will run each statement in sequence, completing each one before running the next.
Historically, the way to ask the interpreter to "get back to me when you're done" was to use the "callback pattern," which was little more than passing in a function:
setTimeout(() => console.log('second'), 4000);
setTimeout(() => console.log('third'), 5000);
setTimeout(() => console.log('first'), 3000);
When it came time to construct sequential-looking programs that involved lots of these calls, things could quickly get messy:
http.get(catalogUrl, (err, catalog) => {
  if (err) {
    handleError(err);
    return;
  }
  http.get(catalog.products[0].url, (err2, product) => {
    if (err2) {
      handleError(err2);
      return;
    }
    document.write(product.name);
  });
});
Promises flipped this callback style into an object. Async/await morphed those promise objects and methods into language syntax that looks like one thing happening after another:
catalog = await fetch(catalogUrl).catch(handleError)
product = await fetch(catalog.products[0].url).catch(handleError)
document.write(product.name)
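To show the callback to promise to await progression in one place, here is a self-contained sketch. `getLater` stands in for a callback-style API and is entirely made up, as is the data it returns; no real network call is involved.

```javascript
// Hypothetical callback-style API: calls back with (err, value) after 10ms.
function getLater(value, callback) {
  setTimeout(() => callback(null, value), 10);
}

// "Flipping the callback into an object": wrap the API so it returns a promise.
function getLaterPromise(value) {
  return new Promise((resolve, reject) => {
    getLater(value, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// With the promise in hand, async/await makes the steps read top to bottom.
async function main() {
  const catalog = await getLaterPromise({ firstProductUrl: 'product-1' });
  const product = await getLaterPromise({ name: `name of ${catalog.firstProductUrl}` });
  return product.name;
}

const productName = main();
productName.then((name) => console.log(name)); // name of product-1
```

The nesting from the callback version is gone, but `main` still returns a promise, so whoever calls it has to await it or attach a `.then`.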
If the original question were "why are some APIs async and some not?" the answer would be: "JavaScript engine authors (web browsers, node) have decided that some things like network access are inherently too slow to pretend they happen instantly, and waiting on them would mean that the UI would have to become unresponsive, which would make users think something is broken. Other APIs like DOM access are fast enough to get away with freezing the UI and nobody notices."
But the question was, "why can't you call an async function from a synchronous (non-async) function?" The answer there is more like, "Because synchronous functions are expected to return a result without waiting, and if you could call an async function but pretend you didn't, there's not a good solution for what should happen in composition with other non-async functions. Would you proceed without having a result? Would you force a wait anyways? None of those are acceptable answers because they break the model of 1) some things are nearly instantaneous and 2) some things take longer. We still need a mechanism to work with things that take a long or unknown amount of time, without making the UI freeze."
As lolinder points out in a sibling comment, this decision to split things into 1) fast and 2) not fast was ultimately a semantic decision of JavaScript engine/API designers. The downside of this decision is that it takes programmers some time to be comfortable with the unintuitive model. The upside is that it enables programs to be written by (fallible, mortal) programmers and not freeze up all the time.
So async/await work by having the compiler do a mechanical transformation of the function. If you have code that looks like this:
async function foo() {
let x = await bar();
await baz(x);
}
The actual code that gets generated is something like this (this is off the top of my head, forgive any transcription errors):
function foo() {
let x = undefined;
return Promise.resolve(null)
.then(() => bar())
.then(function (val) {
x = val;
return baz(x);
});
}
In other words, the async and await keywords work together to tell the compiler how to properly transform the function into the necessary async API: the async keyword says that the function needs this mangling, and the await keyword tells the compiler where the possible re-entry point is.
So, with that in mind, let's answer your main question:
> Why can't non-async functions use await()?
The shortest answer is that the semantics of doing the necessary transformation to support await in a function which doesn't opt into that transformation are hard to design and even harder to use.
Fundamentally, adding async to a function changes the API of the function [1]. In JS, it causes a function to return a Promise.
In principle, it's possible to support such a scenario by making such an await just "run" the promise to completion. But here you run into issues with how the runtime handles asynchronous tasks. Is it possible to spin a nested event loop? Not necessarily; it depends on the runtime. If your synchronous call is already on a task (so there's basically an async function on the callstack), you would really want to inline all of the function calls so that the await causes the same transformation as if it had been called directly in that async function. But such inlining is impossible (due to dynamic function calls).
To really delve deep into the details here requires a fuller understanding of how asynchronous runtimes work in general, and of the particulars of the various implementations. Async/await is a wonderful innovation in that it allows programmers to write asynchronous code state machines as if they were generally writing synchronous code. But it doesn't do anything to change the deeper issues of how the task runtime system actually works, and in particular, the "correct" way to execute an asynchronous task in a synchronous context, which has multiple answers that are not wrong.
[1] The fact that async functions have a fundamentally different API and ABI makes the "what color is your function" argument somewhat specious: these functions have color because they're different, and you have to be aware of the difference. Only by basically forbidding async (or, in some languages, sync) functions entirely can you make the difference appear to go away, but you'll notice that that doesn't exactly "solve" the issue.