HACKER Q&A
📣 bilekas

Tutorials written with heavy dependencies


I am quite stubborn in a lot of ways, but one in particular: when I'm guiding a team member through something, I like to explain the inner workings so they have a high-level understanding of how things actually work.

In the last couple of years, I've been dipping my toes into other areas in my 'hobby' time, wanting to know how the things I use and like actually work.

A great example is machine learning: an immediate Google search gets you as far as 'install these 10x libs', then write this.

When you dig into the open source of those libraries, it's overwhelming, and the documentation is never focused on the underlying functionality, which I personally am giddy to learn about.

I find myself having to resort to trial and error, which I hate because the wheel has already been invented. Maybe I'm missing a resource.

It feels like these tutorials are really just tutorials for libraries.

I know the source code IS the workings, but is there a resource other than the source code that I'm missing?


  👤 PeterisP Accepted Answer ✓
In many fields you're expected to work at some abstraction level above "how things really work", in which case all the tutorials will use the primitives of that abstraction level.

In this case, if you want to understand how stuff works, you should explicitly look for things that are not labeled "tutorials" - often textbooks will be a decent example, covering the principles and theory behind these abstraction layers which you'll then use in practice.

Like, in ML there are books which work through a basic implementation of all the algorithms using just the matrix multiplication primitives of MATLAB or numpy. That works well as a learning exercise, but in practice everyone would rather use a highly optimized (and therefore more complicated and less understandable) library maintained by others.
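
To make that concrete, here is a minimal sketch of my own (not taken from any particular book) of the kind of exercise those texts walk through: a tiny one-hidden-layer network learning XOR with nothing but numpy matrix operations and hand-derived gradients.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
    y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

    W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # hidden layer
    W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # output layer
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for step in range(5000):
        # forward pass: plain matrix multiplications
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # backward pass: gradients of the squared error, derived by hand
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # gradient-descent update
        lr = 0.5
        W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

    print(out.round(2))  # should approach [[0], [1], [1], [0]]

An optimized library hides all of this behind autograd and tuned kernels, which is exactly the trade-off: faster and more robust, but far less visible.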

Similarly, in cryptography there are textbooks that will walk you through an implementation of the core algorithms, but again, a tutorial teaching how to do things in practice would not (and definitely should not) cover writing your own implementations of cryptographic operations; it would instead describe how to use a properly vetted library.


👤 gcanyon
This is one of the reasons I still use LiveCode despite middling performance, near-zero library support, and a language spec that was reasonable in 2000: it has zero dependencies. You can:

    Go to the web site
    Download a single installer
    Run that to produce a single file executable
    Run the app
    Create a new project (you get a window for free)
    Drag a text box from the tool palette onto the window
    Switch to the arrow tool
    Click the text box
    Type "Hello world"
    Save the project to disk
    Select Standalone Application Settings on the File menu
    Check the boxes to build for macOS, Windows, and Linux
    Select Save As Standalone Application on the File menu
...and you get single-file executables for three platforms. That was maybe 13 steps from nothing to multiplatform Hello World. It stuns me that other environments/languages make it harder than that.

Many years back I used to do demos for LiveCode at trade shows where I would build a stopwatch timer while holding my breath.

These days it would be much better in so many ways to be working in Python. But the lack of an environment like LiveCode is a major pain point.

https://livecode.com in case anyone is curious.


👤 ChrisMarshallNY
I remember once attending a meetup that was supposed to be a tutorial on GraphQL. I was barely familiar with GraphQL when I went in, and I left almost exactly the same.

The tutorial was actually all about a couple of JS libraries that you could use as GraphQL abstractions. I am a Swift programmer, and was a lot more interested in the actual GraphQL interface, which was barely mentioned.

A lot of that stuff going around...


👤 blahblah1234567
OP, an analogy for your question/concern:

You want to add 2 numbers.

So, I hand you a calculator.

That calculator has about 1000 dependencies.

I didn't hand you petroleum oil and copper wire, and say "Oh, first, do all these prerequisite processes to manufacture the inputs you'll use to build a calculator. Next, build a calculator. Ok, now you can add two numbers together"

(We could even go a step further back: I hand you some steel components, a forge, and some people (labor and engineers), and give you a tutorial on how to build a foundry to create the parts for an oil rig and the components for bulldozers, which you then manufacture and use to mine copper ore, which you then process into wire... and so forth.)

Do you want to:

- build a foundry to build tools for mining

- mine minerals (iron for tools to make other tools with, oil for plastic, copper ore for wires, etc.)

- build a factory for making calculators

- manufacture a calculator

Or do you want to

- Take this calculator, and add 2 + 2?

Perhaps you'd rather build the tools. Or perhaps you'd rather use the tools to solve a business problem.

Personally, I'm the developer who prefers to solve the business problem, as business strategy & product management interests me more than hardcore science/engineering.


👤 steeps
Machine learning is an enormous field, and if you are after an explanation/exploration from the ground up, then Kevin Murphy's Probabilistic Machine Learning: An Introduction is very good, but a bit of a tome.

If instead, you want to focus on neural networks, I found Michael Nielsen's Neural Networks and Deep Learning an excellent resource for implementing them from first principles (available at http://neuralnetworksanddeeplearning.com/).


👤 pixel_tracing
This may not be a sexy idea, but one example I can think of is Tom Mitchell's Machine Learning book. It gives the bare bones of what machine learning is (albeit missing some of the newer architectures), but it offers a great overview of how to build these architectures from scratch.

👤 thiazikara
I can certainly relate. I used to follow ML tutorials near the end of high school, but I just could not shake the feeling that I didn't know a lot of what was going on.

I'm 2.5 years into college now, and yes, I have been lazy at times (college standards here are not particularly rigorous), but I am nowhere near capable of reasoning about numerical computation considerations, statistical methods, hardware-level architecture for scientific computing, etc. There's so much that goes into it, so many layers, components, and theories, that you get lost very quickly.

In the end, all you can really do is:

1. Grab a statistics book and get into the theory of ML.

2. Learn about numerical computation. I particularly enjoyed the Handbook of Floating-Point Arithmetic, though I never really finished it.

3. Libraries contain a lot of optimizations specific to the underlying architecture. In fact, I remember someone giving a numpy demonstration and hitting an error that came down to Windows itself :) You will see a lot of such special cases in the code too... I don't know what to recommend here really, just read more?

4. Documentation can be wrong at times, fail to mention assumptions, or simply not exist. Software never saw the massive adoption of rigorous frameworks that many other disciplines did, and it quickly got surrounded by business needs and customer complaints. But if you can somehow get insight into the philosophy and the values the authors put into their code, it's a huge help, in my opinion. Books provide that at times; for example, I was struggling with the SYCL specification, so I grabbed "Mastering DPC++", though it also assumes a bit of experience.


👤 landosaari
For ML not in Python: maybe these videos will help [0]; the datasets are here [1].

The author walks through the basics (linear regression, Naïve Bayes, etc.) using Julia. The parameters and output are better explained than in the Python equivalents I have found.

[0] https://www.youtube.com/playlist?list=PLhQ2JMBcfAsi76O13sJzk...

[1] https://github.com/fabfabi/julia4ta_tutorials/tree/master/Se...


👤 Joel_Mckay
In general, most documentation is infamously:

1. outdated, and filled with deprecated syntax or abandoned bugs

2. version dependent, and thus pointless to read or write

3. platform dependent, and thus also falls under point #1 or #2 with time

4. poorly written, as most rarely read/update the documentation

While tools like Doxygen attempt to fill the reference holes, in general a lot of effort is made to create unit-tests/examples of how software should be integrated.

The experience can be unpleasant if you are new to a large library. The major downside of open-source projects is usually the dismissive RTFM attitude, as many people are not there to provide "free" support but rather to solve and share their own use-cases. If you can show a unit test that highlights a specific issue, then there may be mutual interest from project members... but you need to prove you are not asking people to search Google for you.

In general, if you are at the "trial-and-error" stage, then looking at another active, well-documented project that uses the same key library in a similar use-case to your own will often be faster (example: search for deprecated kernel API calls in utilities for compatibility details).

Another point I will mention is thinking about the long-term sustainability choices for a project. In general, splitting up dependencies into small piped/IPC/RPC/ClMPI/AMQP utilities is wise. That way, when someone unwisely changes a library API, as they often do for various reasons (rarely good ones for a shared object), the affected area needing maintenance is minimized (i.e. the next person only needs to read 3 or 4 familiarly structured documents to safely refactor the module).
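
As a rough sketch of that style (the file names and fields here are hypothetical), each stage is a small filter that reads records on stdin and writes them on stdout, so one stage can be rewritten when its library changes without touching the rest:

    #!/usr/bin/env python3
    # One stage of a hypothetical pipeline, e.g.:
    #   cat events.jsonl | ./normalize.py | ./score.py
    import json
    import sys

    for line in sys.stdin:
        record = json.loads(line)
        # the single responsibility of this stage: normalise one field
        record["text"] = record.get("text", "").strip().lower()
        sys.stdout.write(json.dumps(record) + "\n")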

If you are more interested in the algorithmic side, then an optimized library is probably the wrong place to start. Rather, pull the published paper(s) for the algorithm and look at the published history (the datasets and ROC curves especially detail what to expect). Prototyping languages like Python/Julia/Octave/MATLAB are often the languages of choice in this area.

Best of luck =)


👤 tmtvl
There is a source you may be missing: the human factor. Try talking to people involved with the project and ask them for any insights they're willing to share.

👤 PartiallyTyped
> is there a resource other than source code I'm missing ?

First, you should understand the math you need in order to implement the algorithms; then you should learn the algorithms themselves. You may "get" the code, but you will never understand it if you don't understand the mathematical objects it represents.

A simple example: linear regression. We can compute the solution without the iterative updates that neural networks need. Why? How do we actually code that? What is the pseudo-inverse we compute doing there?

From there, what's the relationship between that and Newton's method for numerical optimisation? Why does alternating least squares even work?
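
For instance, a minimal numpy sketch (my own illustration, not production code) of that closed-form solution: ordinary least squares computed directly through the Moore-Penrose pseudo-inverse, with no iterative optimisation at all.

    import numpy as np

    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 3))                # design matrix
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=100)  # noisy targets

    # w = pinv(X) @ y minimises ||X w - y||^2 analytically
    w_hat = np.linalg.pinv(X) @ y
    print(w_hat)  # should be close to [2.0, -1.0, 0.5]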

The code will _never_ explain the underlying mathematics, it can only represent them.

I am not really sure what you are expecting for such a vast field. I recommend reading The Elements of Statistical Learning, Bishop's book, or Murphy's Probabilistic ML.


👤 simne
I think this is partly the flip side of open source as a semi-commercial activity.

What I mean is: nothing in this world is free, and when you make something, you still need some closed loop to make it good and to retain people (consumers and developers) around your project.

The next step in that logic is a strong tie between the tutorial and your library, so the tutorial in reality works as an ad for you.

I must admit it is possible to rise above this primitive scheme, but for that you have to be magnitudes better than the competitors in your niche.


👤 itsamy
Fast.ai courses are great resources for learning how modern ML/DL models work under the hood.

https://course.fast.ai/


👤 henning
What about Data Science from Scratch, Deep Learning from Scratch, and similar books, where they just use numpy or something similar and do almost everything from first principles?

👤 eachro
For ML, I feel like I do see quite a lot of tutorials in pure numpy. A fun thing I've been doing with ChatGPT is asking it to show me some starter code for a topic. Usually it shows me code that imports sklearn, and then I follow up asking it to rewrite everything without those dependencies. It's worked out pretty well so far.
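
As a hypothetical example of what that exercise tends to look like (the data and names here are made up), here is a 1-nearest-neighbour classifier, first as the scikit-learn call a tutorial would hand you, then as the handful of numpy lines it reduces to:

    import numpy as np

    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
    y_train = np.array([0, 0, 1, 1])
    X_test = np.array([[0.5, 0.2], [5.5, 5.1]])

    # library version (assumes scikit-learn is installed):
    # from sklearn.neighbors import KNeighborsClassifier
    # pred = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train).predict(X_test)

    # dependency-free version: same result from raw distance computations
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
    pred = y_train[np.argmin(dists, axis=1)]
    print(pred)  # [0 1]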

👤 eurticket
You could patch together some of those library tutorials, relating each back to the original tutorial that encompasses them all.

"Okay, so you've installed the 10 libraries and seen how powerful they can be together; now here are the individual things you can do with each library."


👤 TylerE
95% of programming is plumbing.

👤 forrestthewoods
> A great example is machine learning: an immediate Google search gets you as far as 'install these 10x libs', then write this.

This is because ML mostly uses Python. And Python is an absolute clusterfuck of dependency hell. https://xkcd.com/1987/