HACKER Q&A
📣 wpietri

Let's compare Python dependency managers


A year or so ago I looked at Python dependency management options and did a small bake-off. I ended up picking pipenv, but as our project gets more complicated, I'm now questioning that choice. What are folks using successfully? What are its strengths and weaknesses for you?


  👤 the__alchemist Accepted Answer ✓
I use my own: Pyflow (https://github.com/David-OConnor/pyflow)

I use it because I'm not happy with the install-on-the-fly dependency resolution used by the others. It uses package-specified deps when available (most wheels), and falls back to a custom database otherwise.

It also manages Python versions and installations, and is a simpler process overall.


👤 fermigier
Been using Poetry for the last 2 years IIRC.

It works quite well for the use case covered by the alternatives.

I have great hopes for the upcoming versions, especially the "poetry bundle" command: https://github.com/python-poetry/poetry/issues/1992

See full roadmap here: https://github.com/python-poetry/poetry/issues/1856

Kudos to @sdispater (Sébastien Eustace) and the contributors!


👤 wpietri
Just to kick the discussion off, I'm building web stuff and ML-related back ends, with some CLI tooling. Docker's in the mix, too. I liked Pipenv's clear expression of dependencies and its separate lockfile, so that we could be clear about intent versus the specific dependency versions used.

However, lately we've had real struggles with pytorch, which is admittedly an odd participant in the Python ecosystem, in that it has a variety of installables depending on the CUDA version you want. And I've lost many hours lately trying to figure out how internal ML library A and the dependent CLI tools package B can express their dependencies in ways that let them be installed as regular pip packages from where they live in our private GitHub repos.
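
Concretely, the kind of thing we want to express looks roughly like this; the org, repo, package, and version names below are just placeholders:

  # requirements.txt entry pointing at a private GitHub repo
  ml-library-a @ git+ssh://git@github.com/your-org/ml-library-a.git@v1.2.0

  # or, inside package B's setup.py, as a direct reference
  install_requires=[
      "ml-library-a @ git+ssh://git@github.com/your-org/ml-library-a.git@v1.2.0",
  ]

  # one-off install straight from the repo
  pip install "git+ssh://git@github.com/your-org/ml-library-a.git@v1.2.0#egg=ml-library-a"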

The most recent serious comparison I've seen is this one from March: https://frostming.com/2021/03-26/pm-review-2021/

That includes pipenv, poetry, and PDM. One of our data science folks has suggested using conda. That gives me the willies, as it seems too heavy and too differently targeted, but that's not a real reason to say no.


👤 monkeybutton
> figure out how internal ML library A and the dependent CLI tools package B can express their dependencies in ways that let them be installed as regular pip packages from where they live in our private GitHub repos

Does this mean that you are keeping forks of the projects in your repo?

What I've been using (though it's probably not best practice):

- A private PyPI instance (Nexus) for caching external packages
- All our projects, services, everything live in a monorepo
- All our code is developed as packages, with a setup.py that lists the minimal required dependencies for each project
- "Builds" of services are Docker images that copy the necessary sources and do a `pip install -e` (sketched below)

The downside of this is that if requirements change, updating your personal development environment means either trashing and remaking the environment (relatively quick) or rerunning `pip install -e` for the given project.
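
For concreteness, the build step in that setup looks roughly like this; the package, path, and dependency names are made up for illustration:

  # libs/ml_library_a/setup.py -- minimal required deps only
  from setuptools import setup, find_packages
  setup(
      name="ml-library-a",
      version="0.1.0",
      packages=find_packages(),
      install_requires=["numpy>=1.20"],
  )

  # Dockerfile for one service, copying the needed sources from the monorepo
  COPY libs/ml_library_a /src/ml_library_a
  COPY services/api /src/api
  RUN pip install -e /src/ml_library_a -e /src/api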

I would only recommend using conda if you are developing on Windows. Even today there are still packages missing binary wheels, and compiling other people's code on Windows to build/install a package sucks.

At another company I've seen:

- Each service and internally developed package had its own git repository
- Projects used dependency links[1] to get the packages from those git repositories

I didn't like this approach because it split up tasks and created a lot of PRs. In the first setup with a monorepo you could see all the changes in one PR.

[1]https://stackoverflow.com/questions/36544700/how-to-pip-inst...


👤 postpawl
I’d recommend using pip-tools. It doesn’t require any special tooling to install requirements: you specify only your top-level dependencies in separate files, and pip-tools compiles those into requirements.txt files with all dependencies (including sub-dependencies) locked.

More details: https://nvie.com/posts/better-package-management/
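
The workflow is roughly this (requirements.in is the pip-tools convention; the packages listed are just examples):

  # requirements.in -- only the top-level deps you actually care about
  django
  requests>=2.25

  # compile it into a fully pinned requirements.txt (sub-dependencies included)
  pip-compile requirements.in

  # make an environment match the pinned file exactly
  pip-sync requirements.txt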

I’ve heard a lot of recommendations for poetry too.


👤 lazypenguin
We switched to poetry after pipenv stopped receiving updates for a while. Pipenv was so insanely slow at dependency resolution beyond a certain number of packages that we had to abandon it. Poetry gets the job done for us.

👤 PaulHoule
A bake-off between Python build tools would be like the US government having a bake-off between people who were trying to fly before the Wright Brothers.

The Wright Brothers didn't benefit from advances in engines and materials; rather, they were the first ones to realize what was actually required to fly an airplane without tumbling uncontrollably in the air. Most people either had no idea, or they fooled themselves into thinking they could succeed without addressing a particular issue. The Wrights looked at the problem squarely and didn't think they could get away without doing something essential; fooling yourself that you can is the main reason people fail at something new.

I think the problems that pipenv solves aren't the real problems. (Kinda the way you weren't in control of your environment, so you added Docker, and now you have two problems.)

Poetry's dependency solving approach is almost right, but the fact that you can't really know what an egg depends on without running setup.py keeps it solidly in the 'almost' category.

I use poetry a lot. My only real complaints are that it's undisciplined not to be able to put your source code under "src/", and that poetry corrupts itself periodically, especially when you try to update it.

Pip's dependency solving approach is outright wrong. You can't do

  pip install A
  pip install B
  ...
an unlimited number of times, because eventually the system will have installed package C4, which is incompatible with M3, and it would have to go back, work out the effects that has on what's already installed, and follow a much more complex path of installing, reinstalling, and uninstalling. I suffered with this for a long time before I finally realized exactly why it couldn't possibly work.
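
You can at least detect when you've hit that state: pip check reports installed packages whose declared requirements are no longer satisfied (A and B are stand-ins, and the message is paraphrased):

  pip install A
  pip install B
  pip check
  # e.g. "A 1.0 has requirement B<2.0, but you have B 2.3."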

Pyflow mentioned in this thread is getting close.

---

Something missing from this is that Python still has footguns that can defeat you and your tools.

For instance, the user-install directories are visible to all the Pythons you have installed, so if somebody feels entitled to do

   pip install --user A
that will trash all of your Pythons (virtualenvs, conda, whatever...) unless those Pythons have the toxic userdir feature disabled. (It wasn't toxic back in the day when you could live with one "python" and there was no "virtualenv".)
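
If you want to defend against that, the stock interpreter does have opt-outs; the script name here is just a placeholder:

  # ignore the per-user site-packages directory for a single run
  python -s myscript.py

  # or disable it for everything launched from this shell
  export PYTHONNOUSERSITE=1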

Also, the industry is finally realizing that it is toxic for the default charset of Python/Java/C#/... to be configurable (it should be utf-8, only utf-8, always utf-8, and never anything else), because if it is configurable it will inevitably be configured wrong sometimes.

Related to that, there is a flag that will crash your program if you ever "print" a character that doesn't exist in the configured charset. Too many people will say "easy, be careful what you print()", but if you

  pip install enough-things
some of those things will use print, and someday your program will crash, and you'll be wondering what you can do about it.
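
A minimal way to see that failure mode and the escape hatch (the bad encoding is forced explicitly here, the traceback is trimmed, and PYTHONUTF8 needs Python 3.7+):

  $ PYTHONIOENCODING=ascii python3 -c "print('café')"
  UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)

  $ PYTHONUTF8=1 python3 -c "print('café')"
  café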

The stock Python is getting better in these regards, but there was a time I was looking into building it myself so I could remove these featurebugs. (That's my beef with pipenv: unless I have a non-defective Python to install with it, it will install a defective Python.)