Nevertheless, I feel they are sometimes overused, and maybe we need to think about guidelines, so we can have a better balance between clarity and verbosity.
We learned from experience that clear code doesn't need comments. Functions and variables with good naming do not require comments. For example, is the following comment useful?
    # Adding a and b and putting results in c
    c = a + b
I'm not saying that comments aren't needed at all. In fact, they are essential for the subtle details that the code alone doesn't make clear. Or as John Ousterhout put it, “Comments augment the code by providing information at a different level of detail.”
Now with type hints, I want to argue the same: obvious stuff doesn't need to be annotated. Here are some examples.
- *args and **kwargs: We all know that these are lists and dictionaries respectively. However, I've seen code where they are still annotated.
- dictionaries: Like 99% of the time, the dictionary keys are strings, yet I've seen them annotated with Dict[str, float] or such. Sure, maybe you are dealing with a case where the dict keys are something else, but then I'd argue that the reader needs to know more about that dict and what it carries than just its types.
- When a function doesn't return anything, should we still annotate it with None, or are we kinda stating the obvious here?
- When the variable name is obvious. Do I still need to tell the type if the variable is called timestamp, days_since_creation, is_valid, num_item_sold, currency, or amount_paid?
- All data scientists use df to refer to a DataFrame. The name df should be enough to infer the type, and a type hint here would be just noise.
The point is, I want to know what people smarter than me think. Maybe I am wrong about it, and type hints still offer some value that I'm missing here.
Finally, one might argue that obvious type hints still need to be stated so that tools like mypy can enforce them. That's a different argument, and personally, I feel like type checking is a way to hack a dynamic language like Python to become Java. So, I'm only interested here in the documentation value of type hints.
I’ll figure out the bug eventually either way, but having the compiler give me a clearer warning earlier in the development process (sometimes even from pure static analysis) is a significantly more pleasurable development experience and a potential time saver.
And 50X that, especially the time saved, if I’m integrating with code you didn’t author.
You mention that you're not interested in typechecking, but to be honest, why bother with the restricted format of the stuff from `typing` if you don't care about automatic processing of it? You could just as well emulate math notation with ASCII, use LaTeX, or simply write descriptions. Though at that point, are they actually anything more than traditional documentation?
I'm the author of Robust Python, and the first quarter of my book is all about type hinting, so suffice to say I've thought about it a lot. What follows is strictly my opinion on things.
I absolutely agree that there needs to be a balance between verbosity and clarity. Obvious things such as some variable names probably don't need to be annotated, but I think we disagree on what an obvious name is. When I see a variable named timestamp, is that an int or a datetime? When I see days_since_creation, is that an int or a timedelta? When I see amount_paid, is that an int, float, decimal.Decimal or some custom type? The only way I can answer these questions is to look at how the variable is used. For local variables that may be clear, but for things like function parameters it's less clear. I have to look at the calling code to see what people can pass in, and I have no guarantee that all the calling code is even visible to me. In these cases, I will happily accept the verbosity of a type annotation so that I have a better chance of correctness as more people work on the code over the years. Now, if the project is small, maintained by a single developer or something like that, then I would say type hints can just be added noise. (But if at any time you need to rename a variable to make it clearer, such as "amount_paid" -> "amount_paid_decimal", I would encourage a type annotation instead.)
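To make the amount_paid question concrete, here is a minimal sketch. The function `add_tip` and its parameter names are hypothetical, invented only for illustration; the point is that the signature answers the "which numeric type?" question without anyone having to hunt down the call sites.

```python
from decimal import Decimal


def add_tip(amount_paid: Decimal, tip: Decimal) -> Decimal:
    # Hypothetical helper: the annotations tell callers that exact
    # decimal arithmetic is expected, not binary floating point.
    return amount_paid + tip


# With Decimal, ten cents plus twenty cents is exactly thirty cents.
decimal_total = add_tip(Decimal("0.10"), Decimal("0.20"))

# With float, the same sum picks up a binary rounding error.
float_total = 0.10 + 0.20
```

A name like amount_paid reads fine in both cases, yet the two totals behave differently when compared against 0.30, which is the kind of detail the annotation surfaces.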
I also think it's a mistake to treat type checking as a hack to feel more Java-like. I feel like the more appropriate counterpart is TypeScript (Guido has talked about it before: https://developers.slashdot.org/story/21/05/22/0348235/what-...). I believe type annotations (along with type checkers) were introduced to Python to solve a very real problem: keeping a codebase robust when you have multiple people working on it (across multiple years, even decades).
All in all, yes, I do not think you should be type annotating everything, but I am also wary of type annotating too little. All of this advice is context-dependent; there is no one right answer, since it depends on project size and value. But if you want to increase communication and reduce the number of mistakes someone can make as they modify your code (especially if they never get the chance to talk to you), I whole-heartedly endorse type checking and type annotations. After all, your code is going to be read much more often than written, and I'd rather optimize for the unfortunate souls who have to read my code later on. I'll happily pay the penalty now for writing out a few more characters.
> I want to argue the same, obvious stuff don't need to be annotated
The first misconception is that type hints are something optional: they are not. Every single Python variable, function argument and return value is annotated - it's only that, if there is no explicit annotation, it is assumed to be `Any`.
As we know, Python is strongly-typed: every object has a clearly defined type, even if it's the default `object`. However it is also dynamically-typed, meaning that variables do not constrain which types they can contain; so every variable can contain anything.
> Python type serve as extra documentation
The second misconception is that the annotations are there to document which types a variable is supposed to contain. I mean yes, that is definitely a welcome effect, but their primary purpose is to enable static analysis, i.e. to validate that there are no unexpected collisions between types when executing a program.
The main difference between Python and statically-typed languages like C++ or Java is that, in Python, the static analysis step is optional - if you don't run `mypy` or another type checker before deploying, your program will still happily execute as if there were no annotations at all.
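As a small sketch of what "optional" means here (the function `double` is a made-up example), the interpreter stores annotations as metadata but never checks them when the function is called:

```python
def double(n: int) -> int:
    # The annotation says int, but nothing enforces it at runtime.
    return n * 2


# Runs without error even though a str is passed in; a type checker
# such as mypy would report this call as an incompatible argument.
result = double("ab")

# The annotations are simply stored on the function object.
hints = double.__annotations__
```

The mismatch only becomes visible if a checker is run over the file before the code ships.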
> balance between clarity and verbosity
Another misconception is that type annotations inevitably make your code more verbose. In fact, you can have all the benefits of the typing annotations without having a single type hint in your code.
The Python typing specifications allow you to move all your type annotations into separate files, and keep the main code clean of them. For example, imagine a source file, let's call it `foo.py`, which looks like this:
    def foo(bar, baz):
        return f"result: {bar + baz}"

    class Bla:
        def lalala(self):
            return 42
Now you can have a separate file, named `foo.pyi`, which looks like this:
    def foo(bar: int, baz: int) -> str: ...

    class Bla:
        def lalala(self) -> int: ...
Now, when you run `mypy`, it will use that second file to validate the typing information. This is called a "stub file", and you can read more about them in PEP 484 [0].
A few more misconceptions:
> - args and kwargs: We all know that these are lists and dictionaries respectively. However, I've seen code where they are annotated still.
Annotating args and kwargs is not about annotating those variables themselves; it is about marking their contents. So if you annotate `kwargs` with `dict`, you're telling `mypy` that you expect the value of each keyword argument to be a dictionary, which is probably not what you wanted.
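A minimal sketch of that rule, using a hypothetical function `describe`: the annotation after `*args` is the type of each positional argument, and the annotation after `**kwargs` is the type of each keyword argument's value.

```python
def describe(*args: int, **kwargs: str) -> str:
    # A type checker sees args as tuple[int, ...] and kwargs as
    # dict[str, str]; the annotations describe the elements, not the
    # containers. Note that args is a tuple at runtime, not a list.
    total = sum(args)
    labels = ", ".join(f"{key}={value}" for key, value in kwargs.items())
    return f"{total} ({labels})"


summary = describe(1, 2, 3, unit="kg", source="scale")
```

Under this reading, a call like `describe(1, unit=5)` is what a checker would flag, because the keyword value is not a `str`.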
> When a function doesn't return anything, should we still annotate it with None, or we are kinda stating the obvious here.
Yes, because without an explicit annotation the default is `Any`, which is different from `None`. Specifically, as explained in the `mypy` documentation [1], an unannotated function is considered to be "dynamically typed", and some internal typing errors might go unchecked.
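A rough illustration, with two made-up functions. For a function that takes no parameters, `-> None` is the only annotation available, so it is also what marks the function as typed. The comments describe mypy's default behaviour as I understand it from its documentation; stricter settings check untyped bodies too.

```python
def bump_untyped():
    # No annotations anywhere, so mypy treats this function as
    # dynamically typed and, by default, does not check the body.
    counter = "0"
    return counter + 1  # raises TypeError if the function is ever called


def bump_typed() -> None:
    # The "-> None" opts this body into checking; the str + int mistake
    # above would be reported if it were written here.
    counter = 0
    counter += 1
```

The first function passes a default mypy run and only fails when it is actually executed.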
Again, for the purposes of static analysis nothing is "obvious".
> When the variable name is obvious. Do I still need to tell the type if the variable is called timestamp
Even if we look at annotations as simply documentation - does your variable contain a datetime object, a numeric timestamp (e.g. seconds since the Unix epoch), or maybe an ISO 8601 formatted string?
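For illustration, here are three values that could all reasonably be called timestamp (the variable names and the date are arbitrary); only the annotation distinguishes them at a glance.

```python
from datetime import datetime, timezone

# The same instant in three common representations.
as_datetime: datetime = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
as_epoch: float = as_datetime.timestamp()  # seconds since the Unix epoch
as_iso: str = as_datetime.isoformat()      # ISO 8601 string

# Code written for one representation breaks on the others:
# only the datetime has a .year attribute, for example.
year = as_datetime.year
```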
> I feel like type checking is a way to hack a dynamic language like Python to become Java
As described above, the only difference between Python and Java (when it comes to typing) is that the latter enforces static analysis at compile time. Python, being dynamically interpreted, would require this analysis to happen on every run, which would make most programs completely unusable, so this analysis is completely optional. Yes, at runtime a Python function can be passed a type it does not know how to handle; but this is also perfectly possible even in statically typed languages.
Static analysis is a powerful tool, and many static language proponents insist that it gives you enough correctness validation that you don't need unit tests. But it is only a tool, and it is a great benefit to have the option of using it when desired and ignoring it when practical; most languages don't have that luxury.
[0] https://www.python.org/dev/peps/pep-0484/#stub-files
[1] https://mypy.readthedocs.io/en/stable/getting_started.html?h...