1 - Python shipped with OS X/Ubuntu
2 - brew/apt install python
3 - Anaconda
4 - Getting Python from https://www.python.org/downloads/
And that's before getting into how you get numpy et al. installed. What's the general consensus on which to use? It seems like the OS X default is compiled with Clang while brew's version is built with GCC. I've been working through this book [1] and found this thread [2]. I really want to make sure I'm using fast, optimized linear algebra libraries; is there an easy way to check? I use Python for learning data science/bioinformatics, learning MicroPython for embedded, and general automation stuff - is it possible to have one environment that performs well for all of these?
[1] https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793
[2] https://www.reddit.com/r/Python/comments/46r8u0/numpylinalgsolve_is_6x_faster_on_my_mac_than_on/
Simply using pipenv and pyenv is enough for me (brew install pipenv pyenv). You don't ever have to think about this, or worry you're doing it wrong.
Every project has an isolated environment in any Python version you want, installed on demand. You get a lockfile for reproducibility (but this can be skipped), and the scripts section of the Pipfile is very useful for repetitive commands. It's super simple to configure the environment to become active when you move into a project directory.
It's not all roses; pipenv has some downsides that I hope the new release will fix.
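For anyone new to it, the day-to-day flow looks roughly like this (the package and script names here are just examples):

    pipenv --python 3.7          # create the project venv on demand
    pipenv install requests      # record it in Pipfile and pin it in Pipfile.lock
    pipenv run python main.py    # run inside the venv without activating it
    pipenv shell                 # or drop into an activated shell

`pipenv run <name>` also executes entries from the Pipfile's [scripts] section, which is what makes it so handy for repetitive commands.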
When I need a temporary environment to mess around in, then I use virtualfish by running `vf tmp`: https://virtualfish.readthedocs.io/en/latest/
- brew python for my global environment.
- Create a virtual environment for each project (python -m venv).
- Use pip install after activating it (source venv/bin/activate)
If you need to work with different versions of python replace brew python with pyenv.
For data science/numerical computation, Anaconda has all the batteries included. It also has fast, optimized linear algebra (MKL) plus extras like dask (parallelization) and numba out of the box. No fuss no muss. No need to fiddle with anything.
Everything else is a "pip install" or "conda install" away. Virtual envs? Has it. Run different Python versions on the same machine via conda environments? Has it. Web dev with Django etc.? All there. Need to containerize? miniconda.
The only downside? It's quite big and takes a while to install. But it's a one time cost.
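If it helps, the bare-bones flow is something like this (the environment and package names are just examples); `numpy.show_config()` is a quick way to confirm you actually got the MKL-backed BLAS/LAPACK the OP is asking about:

    conda create -n ds python=3.7 numpy pandas
    conda activate ds
    python -c "import numpy; numpy.show_config()"   # output should mention MKL on Anaconda's default builds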
alias cvenv='python -m venv .venv && source .venv/bin/activate && pip install --upgrade pip setuptools > /dev/null'
Also, as others have mentioned, if you're not in a virtual environment, only ever install pip packages to your user location, i.e. `pip install --user --upgrade package_name`. But the first thing to do before any of that is to make sure you know which Python binaries you're running:
which python
which python3
which python3.7
etc...
You can use the following command to set your default Python:
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.7 9
(That sets python3.7 as the default when you just type `python`.)
https://docs.python.org/3/using/unix.html#building-python
Next, install pip for your user!
`curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`
`python get-pip.py --user`
Now you should have `python`, which is the system one, and `python3.x` for alternate versions. Installing modules system-wide with pip can also be harmful, so always use `pip install --user` for tools written in Python (like the AWS CLI). Those tools are installed in ~/.local/bin, so make sure to add that to your PATH.
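For example, a single line in your ~/.bashrc (or your shell's equivalent) takes care of that:

    export PATH="$HOME/.local/bin:$PATH"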
Next, you can use virtual environments to prevent your .local/bin from becoming cluttered if you are dealing with many dependencies from different projects. A nice virtual environment management tool helps a lot. Take your pick: virtualenv + wrapper, pyenv, pipenv, conda... whichever you choose, stick to virtual environments for development!
In a nutshell
1. never touch your system python!
2. install tools from PyPI using pip's --user flag
3. install different Python versions using make altinstall, if building from source (rough outline after this list)
4. virtual environments for development
This should be enough to keep your Linux installation healthy while messing around with python.
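For point 3, a rough outline of an optimized source build looks like this (the version is just an example; `make altinstall` deliberately skips the unversioned `python3` names so the system interpreter is left alone):

    ./configure --enable-optimizations
    make -j"$(nproc)"
    sudo make altinstall    # installs e.g. python3.7 next to, not over, the system python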
Also, Docker containers could simplify this whole process, although that is another piece of tech you are adding to your stack.
Don’t use the system Python, brew-installed Python, or the official Python installer if you can avoid it (you’ll want your Python to be up to date and self-contained in your home folder as much as possible).
I recently had to reinstall a windows dev environment. I didn’t even install standalone python, just conda.
For each new project I create a new conda env, activate it, and then install stuff with pip. I use pipreqs to create requirements files for the environment before checking into git, to help others who may want to check out and install.
I just prefer the CLI and syntax of conda over the other environment managers. I use pip to install packages over conda as it has more packages and I don’t want to remember the channel names that may have the package I want.
Linux: Typically using the package manager, or the AUR if I'm not using the latest version. At the moment, I'm using 3.6 as my "universal" version since that's what ships with current Ubuntu.
Windows: Good ol' fashioned executable installers, though there's really no reason I don't use chocolatey other than sheer force of habit.
macOS: I don't do enough work on macOS to have a strong opinion, though generally I just use the python.org installer. I don't think there's that much of a difference from the Homebrew version, but I could be wrong on that front. As an aside: IIRC, the "default" python shipped with macOS is still 2.X, and they're going to remove it outright in 10.15. So I wouldn't rely on that too heavily.
As for other tooling, IME pipenv and poetry make dealing with multiple versions of Python installed side-by-side much easier. I have a slight preference for poetry for a variety of reasons, but both projects are worth checking out.
Finally, at the end of the day, the suggestions in here to "just use Docker" aren't unreasonable. The performance difference between numpy on e.g. 3.6 vs 3.7, or Clang vs GCC, likely isn't that significant, but if you create a Docker environment that you can also use for deployment you can be sure you're using packages compiled/linked against the same things.
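For instance, even a minimal Dockerfile along these lines (the base tag and file names are just examples, and this isn't a hardened image) gives you one consistent set of compiled/linked packages for development and deployment:

    FROM python:3.7-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    CMD ["python", "main.py"]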
If all of this sounds like an unreasonable amount of effort for your purposes... probably just use Anaconda. It's got a good reputation for data science applications for a good reason, namely removing much of this insanity from the list of things you have to worry about.
This doesn't (seem to) distinguish between your system Python (i.e., the one that'll run in a default execution environment if you run "python" or "python3") and project-specific Pythons.
For the latter, I guess it's common to use something like virtualenv or a container/VM to isolate this python instance from your system Python.
Personally, I use Nix (the package manager and corresponding packaging language) for declaring both my python system and project environments.
Nix still takes some work to ease into, but I think your question suggests enough understanding/need/drive to push through this. Costrouc (https://discourse.nixos.org/u/costrouc) may be a good person to ask; I can recall him making a number of posts about data science on Nix. I also noticed this one, which seems related to what you're looking into: https://discourse.nixos.org/t/benchmarking-lapack-blas-libra...
(FWIW: there are tools to create Nix expressions from requirements files, so it isn’t necessarily all manual work.)
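For a quick taste without writing any Nix yourself, you can ask for a throwaway environment straight from nixpkgs (the package names are just examples); a checked-in shell.nix is the more declarative version of the same thing:

    nix-shell -p 'python3.withPackages (ps: with ps; [ numpy scipy ])'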
Every couple months I nuke all my environments and my install and start fresh out of general frustration. It’s still the least bad for a scientific workflow IMO (having different environments but all globally accessible rather than stuck in specific directories is nice).
I wouldn't use the Python shipped with OS X because it tends to be out of date rather quickly, and it doesn't ship with libreadline.
Usually the python shipped with Ubuntu is good if it has the version you want.
If you compile it yourself from source, you are in for a bad time unless you know how to install all the build dependencies. Make sure you turn on optimizations when running ./configure, or the resulting interpreter will be noticeably slower.
Once you have python installed, install pip. Try the following until one works (unless you use anaconda, then you won't use pip most of the time, and I think you can just `conda install pip` if it's not there by default):
$ sudo pip install -U pip (for python3: sudo pip3 install -U pip)
$ sudo easy_install pip (easy_install-3 or something like that for python3)
(If neither of those works, you need to get pip from somewhere else, most likely by installing python-pip via apt, or whatever the equivalent package is called in brew or your package manager of choice.)
When you want to install something globally, do NOT do `sudo pip install ...`; this can break things in a way that is hard to repair. Do `pip install --user ...` instead, and make sure the path that --user installs go into is on your $PATH. I think it's always ~/.local/bin/
If you are working on a project, always use a virtualenv; google for more details on that.
- If building a one-off system it really doesn't matter, so it's whatever happens to be easiest on the system I'm deploying it on.
- If I'm adding software to a production system, especially one that will be exposed to the outside world, then I use the distro libraries - no ifs or buts. The reason is straightforward: I _must_ get automatic security updates to all 3rd-party software I install. I do not have the time to go through the tens of thousands of CVEs released each year (that's 100 a day), then figure out what library each applies to, then track when they release an update.
- The most constrained case is when I am developing software I expect others I have never met will be using, e.g. stuff I release as open source. Then I target a range of libraries and interpreters, from those found on Debian stable (or even old-stable occasionally) to stuff installed by pip. The reason is I expect people who are deploying to production systems (like me) will have little control over their environment: they must use whatever came with their distro.
In addition, I use pyenv for development. I always install the newest Python version with e.g. `CONFIGURE_OPTS="--enable-optimizations" pyenv install 3.7.4`, which IME consistently produces a ~10% faster Python compared to the various package managers. This makes it really simple to include in each project a `.python-version` file that tox will happily use to facilitate testing on multiple versions. As above, each project gets its own venv from the pyenv-based Python (I usually do the development using the most recent stable, but if you wanted to use a different version for the venv and development and just test against the most recent stable with tox, that should also work).
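Concretely, that setup is something like this (the version numbers are examples); `pyenv local` with several versions writes them all into `.python-version`, so tox can find each interpreter through the shims:

    CONFIGURE_OPTS="--enable-optimizations" pyenv install 3.7.4
    CONFIGURE_OPTS="--enable-optimizations" pyenv install 3.6.9
    pyenv local 3.7.4 3.6.9    # the first entry is the default `python` in this directory
    tox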
However, my personal struggle with Python projects is when I work on projects that I also use myself on a daily basis. I have a tool that I wrote for automating some administrative stuff at work, and every so often I make some minor fixes and updates. So, I both use it and develop it on the same machine. I've never figured out how to properly do this with a virtualenv. I.e., I want the tool, say "mypytool", to be available and run the most recent version in any shell. At the same time, I would prefer not to install the requirements globally (or for that matter, at the user level). I would love to hear some suggestions on how to solve this use case.
- Install Python through Brew.
- Upgrade pip using pip. Install all further libraries using pip. Never install Python packages through brew.
- Use a plain virtualenv per project/repo.
A long time ago I wrote some bash that I keep in my `.bashrc` that overrides the `cd` command. When entering a directory, my little bash script iterates upwards through the directories and activates any virtualenv it finds. I also override the `virtualenv` command to automatically activate the env after creating it.
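A minimal sketch of the idea (this version assumes the env directory is literally named `venv`):

    cd() {
        builtin cd "$@" || return
        local dir="$PWD"
        # walk upwards and activate the first venv found
        while [ "$dir" != "/" ]; do
            if [ -f "$dir/venv/bin/activate" ]; then
                source "$dir/venv/bin/activate"
                return
            fi
            dir="$(dirname "$dir")"
        done
    }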
I am aware there are existing tools out there that do this. Pyenv does this, right? But I never got used to them, and this keeps things simple enough for me. I cannot forget to enter a virtualenv.
As I said, I am probably nuts. I also don't use any form of IDE. Just plain VIM.
For the specific use-case of installing executable Python scripts, pipx [0] is the way to go. It creates a virtual environment for each package, and lets you drop a symlink to the executable in your `$PATH`. I've installed 41 packages [1] this way.
[0]: https://pipxproject.github.io/pipx/
[1]: https://gitlab.com/Seirdy/dotfiles/blob/master/Executables/s...
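A quick sketch of the flow (the installed package is just an example):

    python -m pip install --user pipx
    pipx ensurepath      # makes sure ~/.local/bin is on your $PATH
    pipx install black   # gets its own venv; only the executable is exposed
    pipx list

For the "tool I both use and develop" case raised earlier in the thread, pipx can also install from a local path, and recent versions have an --editable flag, so changes to the source take effect without reinstalling.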
If you’re on a Mac, just use the brew installation (brew install). If you’re on some type of prod/containerized setup, use apt’s python (apt-get install).
I would not recommend building Python from source unless you _really_ know what you’re doing on that level, as you can unintentionally shoot yourself in the foot quite a bit. From there just using a virtualenv should be pretty straightforward.
In this way, you’re letting the package managers (written by much smarter people than you and I) do the heavy lifting.
What I've settled on is to use a Python version manager (pyenv is the least intrusive balanced with the most usable) and direnv to create project-specific Python environments. Adding `use python 3.7.3` to an `.envrc` in a directory means that cd-ing into it will create a virtualenv if it doesn't yet exist and use the pyenv-installed Python 3.7.3.
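In practice that's just the following (the `use python` helper is whatever pyenv-aware extension you've wired into direnv; `direnv allow` is needed the first time because direnv refuses to load an untrusted .envrc):

    echo 'use python 3.7.3' > .envrc
    direnv allow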
- pip + requirements.txt I find is more than acceptable
- wheels > source when it comes to installing / distributing packages
I was a huge fan of pipenv in the beginning but it seems to have stagnated as 2 years later it's still slow as hell. I now deal with pip & pip-compile to pin versions.
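For anyone who hasn't seen it, the pip-tools flow is roughly this (requirements.in is the usual name for the loose, hand-edited file):

    pip install pip-tools
    pip-compile requirements.in    # resolves and pins everything into requirements.txt
    pip-sync requirements.txt      # makes the active venv match the pins exactly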
I use it as pvenv project_dir_name
function pvenv() {
    mkdir "$1"
    cd "$1" || return
    python -m venv .
    source ./bin/activate
    pip install --upgrade pip setuptools
}
Separately I also use anaconda if I need all the scientific packages + Jupyter etc.
2. pipenv (with PIPENV_VENV_IN_PROJECT env var set) for managing project dependencies / venv
That's basically about it.
1. Pick Python binary / Official Docker Image
2. Use virtualenv for Developer Boxes
3. Well defined Pipfile / requirements.txt
4. Avoid most binary package building; prefer packages with ready-made wheels. But if something has to be built, I prefer to make static libraries, use them to build static ELFs / modules (as opposed to depending on /path/to/some.so), and bundle/ship them as wheels for both development and production (see the wheel sketch at the end).
Again, you can set up simple Dockerfiles / scripts to do this for you.
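A rough sketch of the wheel side of that (the wheelhouse/ directory name is just a convention):

    pip wheel -r requirements.txt -w wheelhouse/
    pip install --no-index --find-links wheelhouse/ -r requirements.txt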