My current list includes:
1. Numerical Optimization by Jorge Nocedal and Stephen J. Wright
2. Algorithms for Optimization - introduction to optimization with a focus on practical algorithms
3. Algorithms for Decision Making - a broad introduction to algorithms for decision making under uncertainty
[2] https://algorithmsbook.com/optimization/
[3] https://algorithmsbook.com/
I'm a Ph.D. student in operations research (OR). My suggestion would be to first build a strong foundation in linear programming. This will introduce you to the notion of duality, which is heavily emphasized in many mathematical programming courses. Here's a good open-source book on linear programming written by Jon Lee, the current editor of Mathematical Programming A: https://github.com/jon77lee/JLee_LinearOptimizationBook
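Duality is also easy to poke at numerically before the theory clicks. A minimal sketch (the LP data and the feasible points below are made up for illustration, not taken from Lee's book): for a primal min c^T x over Ax >= b, x >= 0, any dual-feasible y certifies the lower bound b^T y <= c^T x (weak duality):

```python
# Weak LP duality checked numerically (illustrative data, not from any book).
# Primal:  min c^T x   s.t.  A x >= b, x >= 0
# Dual:    max b^T y   s.t.  A^T y <= c, y >= 0

A = [[2.0, 1.0],
     [1.0, 3.0]]
b = [8.0, 9.0]
c = [3.0, 4.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x):
    return all(xi >= 0 for xi in x) and \
           all(dot(row, x) >= bi for row, bi in zip(A, b))

def dual_feasible(y):
    At = list(zip(*A))  # columns of A
    return all(yi >= 0 for yi in y) and \
           all(dot(col, y) <= cj for col, cj in zip(At, c))

x = [3.0, 2.0]   # a primal-feasible point
y = [1.0, 1.0]   # a dual-feasible point
assert primal_feasible(x) and dual_feasible(y)
# Weak duality: any dual objective bounds any primal objective from below.
assert dot(b, y) <= dot(c, x)
print(dot(b, y), dot(c, x))  # 17.0 17.0
```

Here the two objectives coincide (both 17), so weak duality has certified that both points are in fact optimal -- that's strong duality in action.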
Then I'd suggest studying more general methods for continuous and convex optimization. The book I see mentioned a lot is Convex Optimization by Boyd and Vandenberghe, although we didn't use this in our coursework. Instead, we used a lot of the material presented here: http://mitmgmtfaculty.mit.edu/rfreund/educationalactivities/
If you read the above (or any other two books on linear programming and convex optimization), you'll probably have a better idea of what you want to study next and how you want to go about it. The next natural step would be to study combinatorial (i.e., integer or mixed-integer) optimization. (Jon Lee has another book on this subject; I've also heard good things about the Schrijver book.)
"Convex Optimization" by Boyd & Vandenberghe.
However, beware that both are grad textbooks. They can be tough going at times. Unfortunately, I never found undergrad textbooks I liked much for theory.
If you're interested in discrete optimization too (the other half of math optimization), the classics are:
"Optimization Over Integers" by Bertsimas & Weismantel
"Integer and Combinatorial Optimization" by Nemhauser & Wolsey
I also read "Practical Optimization" by Gill, Murray, and Wright, which is interesting because it has a lot of conversational coverage of corner cases and other material that most textbooks won't cover.
That will cover the expected baseline of almost everything you'll encounter in the convex smooth continuous domain. I don't have great answers for moving past that.
This is the canonical introductory book recommended by Gurobi. I've found it to be great for those getting into the subject. It has math, of course, but the focus is on going over the basics of LP, integer, nonlinear, and mixed-integer programming, followed by lots of examples. I think it's the best book to start with to get a feel for the subject of OR before diving into the harder books.
- R. Schneider, Convex bodies: the Brunn-Minkowski theory. The first two chapters are an excellent introduction to convex geometry (plus a little bit extra!) if you have some undergrad-level analysis.
- Hiriart-Urruty and Lemarechal, Fundamentals of convex analysis. This book has been highly recommended to me but I've never used it. Might be an easier go than Schneider for convex geometry.
- Golub and van Loan, Matrix computations. Excellent book on numerical linear algebra.
- Bonnans, Gilbert, Lemarechal, Sagastizabal, Numerical optimization: theoretical and practical aspects. This book has a detailed description of bundle methods, which are important and in my opinion underutilised.
- I. Maros, Computational techniques of the simplex method. This is the only book I'm aware of that discusses how to build a working implementation of the simplex method for linear optimisation.
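Maros's point is easier to appreciate next to the textbook version. For contrast, here is a toy dense-tableau simplex (a sketch with made-up tolerances and example data); everything the book is actually about -- pricing strategies, sparse factorization updates, scaling, anti-degeneracy safeguards -- is exactly what this version leaves out:

```python
# Toy dense-tableau simplex for  max c^T x  s.t.  A x <= b, x >= 0,
# assuming b >= 0 so the all-slack basis is feasible (no phase 1).
# Tolerances (1e-9, 1e-12) are arbitrary; real codes are far more careful.

def simplex(A, b, c):
    m, n = len(A), len(c)
    # Tableau rows: [A | I | b]; objective row z: [-c | 0 | 0].
    T = [row[:] + [1.0 if i == j else 0.0 for j in range(m)] + [b[i]]
         for i, row in enumerate(A)]
    z = [-cj for cj in c] + [0.0] * (m + 1)
    basis = [n + i for i in range(m)]          # slacks start in the basis
    while True:
        # Dantzig pricing: pick the most negative reduced cost.
        col = min(range(n + m), key=lambda j: z[j])
        if z[col] >= -1e-9:
            break                              # optimal
        # Ratio test chooses the leaving row.
        ratios = [(T[i][-1] / T[i][col], i) for i in range(m) if T[i][col] > 1e-9]
        if not ratios:
            raise ValueError("problem is unbounded")
        _, row = min(ratios)
        # Pivot on T[row][col].
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        for i in range(m):
            if i != row and abs(T[i][col]) > 1e-12:
                f = T[i][col]
                T[i] = [vi - f * vr for vi, vr in zip(T[i], T[row])]
        f = z[col]
        z = [zi - f * vr for zi, vr in zip(z, T[row])]
        basis[row] = col
    x = [0.0] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return x, z[-1]

# Usage on a small example:  max 3x1 + 5x2
# s.t. x1 <= 4, 2x2 <= 12, 3x1 + 2x2 <= 18, x >= 0.
x_opt, z_opt = simplex([[1.0, 0.0], [0.0, 2.0], [3.0, 2.0]],
                       [4.0, 12.0, 18.0], [3.0, 5.0])
print(x_opt, z_opt)  # [2.0, 6.0] 36.0
```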
I'm not aware of any books that cover line search algorithms in detail. These are important in implementations but, beyond discussing the Goldstein and Wolfe conditions, generally glossed over in prose. Even in the absence of stalling and numerical difficulties, you can see an order of magnitude speedup from replacing a bad line search with a good one. One line search algorithm I've had success with is described in More and Thuente, Line search algorithms with guaranteed sufficient decrease.
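For a concrete baseline, here is the simplest of the lot: a backtracking line search that enforces only the sufficient-decrease (Armijo) condition, not the Wolfe curvature condition -- i.e., the "bad line search" end of the spectrum that More and Thuente improve on. The constants are conventional defaults, not values from that paper:

```python
# Backtracking line search enforcing only the Armijo condition:
#   f(x + a*p) <= f(x) + c1 * a * (g . p)
# c1 and the shrink factor are conventional defaults, not tuned values.

def armijo_backtrack(f, x, fx, g, p, c1=1e-4, shrink=0.5, a=1.0, max_iter=50):
    gp = sum(gi * pi for gi, pi in zip(g, p))   # directional derivative
    assert gp < 0, "p must be a descent direction"
    for _ in range(max_iter):
        x_new = [xi + a * pi for xi, pi in zip(x, p)]
        if f(x_new) <= fx + c1 * a * gp:
            return a, x_new
        a *= shrink                              # halve the step and retry
    raise RuntimeError("line search failed to find sufficient decrease")

# Usage: one steepest-descent step on the ill-conditioned f(x, y) = x^2 + 10 y^2.
f = lambda v: v[0]**2 + 10.0 * v[1]**2
x = [1.0, 1.0]
g = [2.0 * x[0], 20.0 * x[1]]   # gradient at x
p = [-gi for gi in g]           # steepest-descent direction
a, x_new = armijo_backtrack(f, x, f(x), g, p)
print(a, f(x_new))              # 0.0625 1.390625
```

Even on this tiny quadratic, the unit step overshoots badly and gets halved four times; a curvature condition (or the More-Thuente interpolation scheme) would accept a better step with fewer function evaluations.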
Lots of tacit engineering knowledge goes into building a fast and robust optimisation code. Some of that knowledge gets forgotten when code is rewritten or ported from one language to another.
Mercifully, a lot of that engineering knowledge has been encoded into freely-available optimisation code. Quite a bit of that code is pretty readable. Off the top of my head, I've learnt things from:
- Liu and Nocedal's Fortran L-BFGS implementation,
- The CUTEst problem collection,
- Chih-Jen Lin's LIBLINEAR and LIBSVM,
- Lin and More's TRON,
- Csaba Meszaros's BPMPD,
- Jacek Gondzio's HOPDM,
- The GNU Linear Programming Kit,
- and probably quite a few other sources!
It is the clearest, most in-depth "zero to hero" introduction to optimization I've seen. It's mostly from a math perspective, and useful for many things outside of ML too!
https://www.amazon.com/Foundations-Applied-Mathematics-Appro...
A couple of other suggestions:
Nesterov's Introductory Lectures on Convex Optimization. This one is pretty tough sledding, but I found the perspectives in the first chapter in particular to be enlightening. There seems to be a newer Springer book that is probably an expansion of this one.
Bertsekas's Nonlinear Programming. Bertsekas has written a lot of books, and there's a fair amount of overlap among them. This one seemed to have the most nuts and bolts about the basics of optimization.
EDIT: If you want more understanding of convexity beyond what's presented in these books, Rockafellar's Convex Analysis is helpful.
https://epubs.siam.org/doi/book/10.1137/1.9781611973655?mobi...
It was easily one of the top three courses I took, and it was heavily based on this textbook:
https://www.amazon.com/Spreadsheet-Modeling-Decision-Analysi...
(All generalizations are wrong to some extent and the delineations are not strict, but I have noticed the two camps have different cultures -- analogous to how Breiman observed that there were two cultures of statistical modeling [1].)
In the OR camp, the focus is primarily on linear programming and mixed-integer linear programming. The types of problems solved include transportation, assignment, and scheduling-type problems. OR folks tend to go very deep into the theory of linear programs (matroids, Benders decomposition, etc.), and the literature is absolutely rich with advances in linear optimization. OR folks tend to be linear optimization specialists and mathematicians. A good practice-oriented intro book is Winston's "Operations Research: Applications and Algorithms". Chvatal's Linear Programming is also good. But this is not my space, so I'll leave OR book recommendations to others.
In the Engineering camp, while folks do use linear programs, they also tend to go more in the direction of convex programs and general nonlinear/nonconvex programs (including mixed-integer nonconvex nonlinear programs). There are some strong theoretical results for convex programs, but the results (naturally) aren't as strong for nonconvex problems -- global optimality is never guaranteed. In the nonconvex camp, practitioners tend to concern themselves with heuristics and techniques like finding good initializations, understanding second-order sufficient conditions (SSOCs), etc. The types of problems solved range from ML problems to dynamic plant/machine optimization. Most folks in this camp tend to be engineers rather than optimization specialists (though some do become specialists eventually). Nocedal and Wright is a classic for general nonlinear programming, but also look into Biegler's Nonlinear Programming. Boyd and Vandenberghe is a classic for convex optimization. Gill, Murray, and Wright's Practical Optimization is a bit outdated (so don't rely on it for state-of-the-art knowledge), but it has tidbits of insight about optimization that aren't found in other books and that continue to be useful.
I agree with thxg, there are few undergraduate textbooks that I've liked.
Alexander Schrijver [1] has lecture notes on Combinatorial Optimization on his website [2]. He also has an affordable 1800-page, three-volume set of books, "Combinatorial Optimization - Polyhedra and Efficiency" [3], although I would say it is better suited as reference material because it is quite densely written.
There is also the classic book "Combinatorial Optimization - Algorithms and Complexity" [4] by Papadimitriou (who once co-authored a paper with Bill Gates) and Steiglitz, which is a nice introduction to the topic as well.
"In Pursuit of the Traveling Salesman - Mathematics at the Limits of Computation" by William J. Cook [5] is a more popular-science book on the history of the Traveling Salesman Problem that also explains how linear programming is used in state-of-the-art solvers, though it of course focuses on one very specific problem. There is also a book containing all the scientific and mathematical details, by Applegate, Bixby, Chvátal, and Cook [6], if that is preferred.
In recent years, there has been a trend of mathematical optimization researchers working more with ML. In particular, Dimitris Bertsimas has done some work at the intersection of those areas [7] and apparently has a book on the topic as well [8] (though I am not familiar with it).
[1] https://homepages.cwi.nl/~lex/
[2] https://homepages.cwi.nl/~lex/files/dict.pdf
[3] https://www.springer.com/us/book/9783540443896
[4] https://www.amazon.com/Combinatorial-Optimization-Algorithms...
[5] https://press.princeton.edu/books/paperback/9780691163529/in...
[6] http://www.math.uwaterloo.ca/tsp/book/index.html
[7] https://dbertsim.mit.edu/papers/
[8] https://www.dynamic-ideas.com/books/machine-learning-under-a...