HACKER Q&A
📣 crtlaltdel

Why do so many startups claim machine learning is their long game?


I work with and speak to many startups. When I ask questions about product value, especially in the context of defensibility, they assert that their "long-term play is using machine learning on our data". This has been pretty consistent over the last few years, regardless of the nature of the product or the market it targets. It typically comes with assertions such as "data is the new oil" and "once we have our dataset and models, the Big Tech shops will have no choice but to acquire us".

To me this feels a lot like the claims made by startups I've encountered in past tech-hype-cycles, such as IoT and blockchain. In both of these areas there seemed to be a pervasive sense of "if we build it, they will acquire".

The question I have for HN is in two parts:

1. Why is it that a lot of startups seem to be betting the farm, so to speak, on "machine learning" as their core value?

2. Is it reasonable to be highly skeptical of startups that make these claims, seeing it as a sign they have no real vision for their product?

[EDIT]: add missing pt2 of my question


  👤 peterwoerner Accepted Answer ✓
Because there is a real moat in data ownership and pipelines. If you want to do any analysis, you quickly find that learning to properly use scikit-learn and TensorFlow (or the machine learning library of your choice) is at least an order of magnitude less work than getting the data. For instance, I wanted to build a machine learning model that took simple data from SEC-filed 10-Qs and 10-Ks, which are freely available online, and predicted whether stocks were likely to outperform the market average over the next 3 years.

Time to set up scikit-learn and TensorFlow models to make predictions: 4 hours. Time to set up Python scripts that could parse the Excel spreadsheets, figure out which row corresponded to gross profit margin, and a few other "standard" metrics: unknown, because I gave up after about 80 hours of trying to figure out rules to process all the different spreadsheets and how names were determined.
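
To give a sense of the imbalance, the modeling half really is about this much code. This is only a sketch with hypothetical column names, and it assumes the fundamentals have already been parsed into a clean table, which is exactly the part that ate the 80 hours:

```python
# Minimal sketch of the "4 hours" half, assuming the hard part (parsing the
# filings into a clean table) is already done. Column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("cleaned_fundamentals.csv")  # the table I never managed to build
X = df[["gross_profit_margin", "revenue_growth", "debt_to_equity"]]
y = df["outperformed_market_3y"]  # 1 if the stock beat the market over 3 years

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```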

I had a professor who was doing machine learning + chemistry. He was building up his own database for machine learning. He spent ~5 years with about 500 computers building the database so that he would be able to do the actual machine learning.


👤 tlb
There are surely some startups for which this is bullshit. But the good version of it is:

- take some valuable task that's never been successfully automated before

- do it manually (and expensively) for a while to acquire data

- build an automated system with some combination of regular software and ML models trained on the data

- now you can do a valuable task for free

- scale up and profit

The risk is that it's hard to guess how much data you'll need to train an accurate, automated model. Maybe it's very large, and you can't keep doing it manually long enough to get there. Maybe it's very small and lots of companies will automate the same task and you won't have any advantage.

I think there'll be some big successes with this model, and many failures. So be skeptical -- ML isn't a magic bullet. But if a team has a good handle on how they're going to automate something valuable, it can be a good bet.

As an investor, you may well face the situation down the line: "We've burned through $10M doing it manually, and we're sure that with another $10M we can finish the automation." Then you have to make a hard decision. With some applications, like self-driving cars, it might be $10B.


👤 smacktoward
You forgot the second part :-D

I think the answer is pretty simple: it sounds good, and it's hard to challenge. It's essentially a promise that they will invent a black box with magic inside. Since nobody can see inside the black box, it's hard to argue that there isn't actually magic in it.

The long-term problem, of course, is that there aren't that many actual magicians in the world. Most of the people who bill themselves as magicians are either people who just think they're magicians, or people who know they aren't but don't mind lying about it.


👤 tetha
Ugh. Maybe I'm in a different world by now, but I dislike such statements on multiple levels.

> This typically comes with assertions such as "data is the new oil" and "once we have our dataset and models the Big Tech shops will have no choice but to acquire us".

Maybe it's me, but I dislike the attitude of working just to be acquired. Interestingly, this is a rift I see quite a bit when I interview more development-oriented people versus more infrastructure-oriented people. Tell me I'm wrong, but development-oriented folks tend to be more fast-paced and care less about long-term impact. Infra-inclined folks tend to be slower-paced but longer-term oriented: build something to last and generate value for a long time.

> 2. Is it reasonable to be highly skeptical of startups that make these claims, seeing it as a sign they have no real vision for their product?

From my B2B experience over the last few years, working towards stable business relationships with large European enterprises: yes. My current workplace is moving into the position of becoming a cutting-edge provider in our part of the world. This is the point where machine learning and AI become interesting.

However, we didn't get here with fancy models and AI. We got here by providing good customer support, rock-solid SaaS operations, the right features, and strong professional services, and none of those features were AI. It's been good, reliable grunt work.

Different forms of AI are currently becoming relevant to our customers, because we have customers that handle 5k tickets per day with our systems and they have 3-4 people just classifying tickets non-stop. We have customers with 30k - 40k articles in their knowledge base, partially redundant, partially conflicting.

This is why we entered a relationship with a university researching natural language processing, among other things, and they will provide us with a big selling point in the future. They profit from this relationship as well, because they get large, real-world data sets they couldn't access otherwise, even after a good amount of pre-processing by the different product teams.

But, as I maintain, none of that is what brought us to where we are.


👤 thecolorblue
I agree with other points here, but I also think nobody wants to wake up 2 years from now and be the only company that was not investing in machine learning. It could turn into nothing, or it could return 100x the time and money put in now.

So, of the 4 outcomes:

(1) buy in now + worthless

(2) buy in now + 100x

(3) don't buy in + worthless

(4) don't buy in + 100x

(4) is a terrible position to be in, (2) is a good position to be in, (1) only costs a little, and (3) is even. So when you are deciding whether to buy in or not, you are deciding between (good + small loss) and (terrible + even).
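
To make the asymmetry concrete, here's a toy version with made-up numbers (buying in costs 1 unit, and the payoff if ML matters is 100 units):

```python
# Toy payoff table for the four outcomes above; all numbers are made up.
cost_of_buying_in = 1      # "only costs a little"
payoff_if_ml_matters = 100

outcomes = {
    "(1) buy in now + worthless":   -cost_of_buying_in,                       # small loss
    "(2) buy in now + 100x":         payoff_if_ml_matters - cost_of_buying_in,  # big win
    "(3) don't buy in + worthless":  0,                                        # even
    "(4) don't buy in + 100x":      -payoff_if_ml_matters,                     # competitors capture the upside
}
for outcome, value in outcomes.items():
    print(f"{outcome}: {value:+d}")
```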


👤 jacquesm
Because they believe it will increase their chances of getting funding. The way ML is dragged in by the hair in some pitches really is just painful. The funny thing is that the few start-ups I've seen that actually needed ML and used it properly did so quietly, because it gave them a huge edge over competitors who had not yet clued in to that fact.

👤 floatingatoll
Machine learning is a multiplier on reducing staffing costs, which are perhaps the highest cost for technology firms until they become successful.

Google and Facebook use algorithms to try and minimize dollars spent on moderating their sites, with limited success. If they can avoid paying human beings to make judgement calls, they save billions of dollars a year.

It's reasonable to be skeptical of startups that claim they'll use ML someday. Ask them how they're using human labor today to perform the tasks they'd like ML to do, and what their runway for that work is at their current burn rate. If they don't deliver a viable ML labor reduction by then, they will either collapse or worsen their service.
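
A back-of-envelope version of that question, with purely hypothetical numbers:

```python
# Hypothetical numbers: how long can they keep doing the task manually, and how
# much human labor gets burned before the promised ML has to be working?
cash_on_hand = 2_000_000      # assumed remaining funding
monthly_burn = 150_000        # assumed total burn rate
manual_task_cost = 40_000     # assumed monthly cost of the humans doing the task

runway_months = cash_on_hand / monthly_burn
print(f"Runway: {runway_months:.1f} months to ship a viable ML replacement")
print(f"Spent on manual labor by then: ${manual_task_cost * runway_months:,.0f}")
```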


👤 _5659
The real moneymaker is what's under the hood of many machine learning models: data laundering and labor externalization.

Consider this: a good number of open datasets are stolen. People had their faces taken without consent. The data was tagged and labelled for pennies per image on some server farm. Wrapping it all up in machine learning can effectively black-box what it is you're actually doing.


👤 joddystreet
Software products can be layered into three parts:

- data collection (frontend, APIs)

- data storage (database, backend)

- data visualization (dashboards, analytics, reports)

To bootstrap a startup, the primary people you need are a frontend person, a backend person, and a product person. In the beginning, you dream about the possibility of using ML, but you would not invest in hiring a data scientist at that stage. The second stage is to hand over the reins to operations people, let them optimize the internal and external processes, and get ready for the launch (growth). These phases can take up any amount of time, from 2 to 5 years. Finally, you hand the reins over to the salespeople and go back from product mode to services mode (especially true for enterprise software). During this stage, you would be hiring a data scientist and a team of data analysts. That's around 6-8 years of bootstrapping a company. Unless your product relies heavily on an ML algorithm, ML can and will always wait.

Figuring out where to use ML first is another challenge. Hiring a data scientist and asking them to tell you what to do is a futile effort.

ML is a useful technology, and you have to keep thinking about how to use it to improve your product and processes. It has to be a part of your long-term efforts.


👤 warrenronsiek
Most companies aren't interesting enough to invest in without some kind of secret sauce. Claiming that they are going to use 'AI and ML' is a way of saying that 'yes, we are a company with tons of competition and little competitive advantage, but we will have a secret sauce eventually. We don't know what that is yet, so we are using jargon as a placeholder.'

Call me a cynic, but I think the talk of building a data moat is mostly nonsense. Of the companies that make these claims, how many are actually hiring expensive data engineers to build the moat? If they don't have a team working on it, then it's a ruse.

Unless the startup is founded by people with deep experience in ML who have been using it extensively from day 0, it is unlikely that they will be able to deliver on this vision. They won't be collecting the data correctly, if they are collecting it at all. If they are collecting it correctly (they aren't), they won't be able to get it into a usable form. If they get that far (they won't), they then need to build out the ML Ops to deliver their models. Now, finally, they can `from tensorflow import *`. Engaging this process post hoc takes YEARS.


👤 yters
This reminds me of Altman's justification for his AI venture. Once he solves AGI, all the monies go to him and to whoever else was lucky enough to invest in his 'lightcone of the future'.

It seems AGI is the perpetual energy device or philosopher's stone of our era.

Enough so that I'd consider starting a hedge fund to short all the AGI companies :)


👤 OnlineCourage
Because rule-based software is now a commodity, and therefore has lower margins. ML is riskier to achieve, may not work on a given dataset, and requires scarce specialized thinking, and is therefore worth betting capital on. ML is the new stock, while software is the new bond.

👤 winrid
Because they are probably using Mechanical Turk workers temporarily. If they admit that, their valuations will be terrible, because that kind of company cannot scale like a SaaS company; it'll be valued like professional services, which is "not good".

So if you say you're going the ML route your perceived value is much higher.


👤 paulcole
The key is that they’re not betting the farm on ML. Instead they’re betting someone else’s farm on ML.

👤 solidasparagus
It is one of the two obvious economies of scale for pure software companies (along with network effects). I can't think of a company that should not have ML in their long-term plan.

Some people probably talk about it without truly understanding it (it can take years before you have enough data to build good ML models, particularly if there are seasonal trends), but I certainly wouldn't judge a company poorly for seeing ML as an important long-term moat. I would judge a young company that sees it as a short-term moat: if you can acquire valuable data quickly, so can your competitors, and then it's not really an economy of scale that gives you defensibility.


👤 kgiddens1
The reason this is the case is that one of the biggest gaps in the market is not the technology itself (most of it is open source or variations thereof) but proprietary data sets. These are valuable when transformed (as my company does at www.edgecase.ai) into annotated data. What we see is that companies that a) invest in acquiring data, b) transform that data, and c) build a model that is useful to customers (and acquires more of their data) are the way of the future.

"Ai with unique datasets is an amazing moat"

So in sum:

1) In some cases this is true (but most of the time there is no unique data).

2) If they truly have unique data, that is something to take notice of.


👤 LoSboccacc
Because VC money is hunting for AI startups. So everyone does what they did when it was data analysis, predictive intelligence, augmented reality, and whatnot: they attach the buzzwords to the pitch in the hope of getting a foot in the door.

👤 ebrewste
ML does two things, ideally:

- It makes meaning from data (this is customer value).

- It insulates the raw data from competitors. The customer gets their actionable insights from your algorithms, and your competitors can't run algorithms on your raw data to race you to the bottom.

This works in two scenarios: 1) lifestyle businesses where it isn't worth it for would-be competitors to generate their own data, and 2) big projects where the first mover gets an unfair advantage from huge data sets.

👤 grumpy8
My take on it is you can add so much more value to a product with ML, but to succeed you need to have a lot of data.

So you get into a loop of "we need to grow really fast to get more data than our competitors, so we can add ML and create more value in the product, so we can grow faster and get further ahead of our competitors".

I.e. ML is a competitive advantage which is often hard to come by for startups.


👤 gsich
From my experience, if someone says they are doing such things, it usually boils down to:

Machine learning == statistics

Deep learning == machine learning


👤 rubyfan
1. I believe the term is “hand waving”.

2. Yes, or maybe worse: they have no business model.

👤 sys_64738
It’s the buzz phrase that has to be part of every corporation’s mission. Even my trash company is into machine learning.

👤 buboard
It's like the gold rush, except the gold (data) is easy to make and very, very cheap. Hmm, better to sell shovels.

👤 RocketSyntax
Aggregate some kind of data during your normal business operations. Then produce insight from that data.

👤 sjg007
Collecting and selling the data is a strategy.

👤 1_over_n
because VCs believe it gives them defensibility

👤 copperfitting
Strongly agree!

👤 codesushi42
> Why is it that a lot of startups seem to be betting the farm, so to speak, on "machine learning" as their core value?

Because it is the current overhyped thing, and there are enough dumb investors to fund it.