HACKER Q&A
📣 kylebenzle

Am I too late for the “Data Science” wave?


I did my MS and PhD in agriculture but am looking to pivot into something that gets me out of the lab.

I'm now doing the NYC Data Science course (https://online.nycdatascience.com) but am getting concerned I may have missed the boat. What does HN think?


  👤 st1x7 Accepted Answer ✓
I work as a data scientist and have some perspective on this. There's no boat to miss; you'll probably be fine. Just keep a couple of things in mind:

- The fundamental skills that you need are mathematics and software engineering. Depending on your background it might take years of additional studying.

- There is a big oversupply of people for the junior-mid level data science jobs. There are more people who want to get in the field than there are jobs. If you drastically switch careers, you'll take yourself out of a field where your skillset is incredibly rare and your competition is limited and put yourself in a place where everyone else wants the same job.

- The fact that you have a PhD is going to help you. Personally, I don't think that a PhD in a field other than mathematics/computer science is that relevant but employers tend to favor applicants with PhDs mostly because there are too many candidates for any given job and asking for a PhD is just a strong initial filter. There are also research jobs within data science for which a PhD requirement (in a relevant discipline) makes more sense but these are a small proportion of all the data science jobs.

- If you're already employed with your agriculture PhD, there must be a number of opportunities for you to apply the techniques that you're currently learning without leaving the industry. That's probably the path that I would suggest - it would allow you to expand your skillset without taking big risks and you'll have more options in the future. Use the career capital that you already have and explore your options instead of making a sharp turn in your career direction that might leave you disappointed.


👤 tomhoward
Have you explored opportunities to apply data science to agriculture?

I'm interested in this space; I do some work with agricultural data acquisition hardware and software (e.g., soil moisture, environmental conditions, sap flow, plant/fruit growth monitoring, irrigation, fertiliser application) and I'm interested in ways this data could be used in predictive models, but I'm not at the stage of being able to focus on that aspect yet (still getting the core data logging/display tech working well, though we’re nearly there).

Feel free to get in touch (email in profile) if you'd like to connect and discuss.

To address your question, I think the world is still mostly at a very basic stage in its use of data analysis and statistics. Most of the talented people are employed on big salaries by a relatively small number of large companies with huge budgets for specific applications (e.g., ad targeting, risk assessment, algorithmic trading).

But outside of that, not much is happening, so I think there are big opportunities to apply data science in new fields and make the benefits more widely distributed.


👤 droaak
Definitely not. Let me put things in perspective. There are two types of companies, Company A - statisticians working as Data scientists, good engineers deploying models in production.

Company B - have no clue what ML or AI is and feeling the heat. They could be a multi million dollar company or a small SMB.

You will always find both these A & B at least until ML and AI are well democratised. They are not, not even close. We are still at the early stage of the curve, but moving forward there will be rapid growth in the next 5-8 years.

You have a few options: 1. Start with SQL. It’s not hard; join as an analyst and learn to code. Make sure the team or product you join deploys models. 2. Learn basic Python and some orchestration tools (Airflow, Spark or the AWS/Azure equivalents). Join as a data engineer along with basic SQL skills.


👤 serioussecurity
It's been a big deal for hundreds of years, most notably in economics and the military. Any reasonable company worth their salt was doing "data science" in the 1950s but calling it statistics or logistics or business intelligence to grab a phrase from yester decades.

What boat do you think you're catching besides a reliable middle class job?


👤 thom
Genuinely shocked by this question. There is no part of the data science stack that is in any way settled. Even more broadly I don’t think statistical best practice is at all established, or at least well distributed.

You haven’t even missed the boat in terms of being able to make money off raw buzzwords and zero skills.

At the very basic technical level, there’s infinite work to be done optimising machine learning systems. This includes not just the fashionable issues of faster more accurate (or even less accurate in terms of floating point!) deep learning, but also moving Bayesian approaches like MCMC to multiple cores and GPUs.

There’s infinite work to be done on finding the right topology for a machine learning system. This applies not just to neural network layers but also to traditional stats (i.e. multiverse analysis).

There’s infinite work to be done in understanding, cleaning and preparing datasets. Something as clunky as tidyverse can’t be close to the final form here. We’ve only just started talking about feature stores etc.

There’s infinite work to be done improving notebooks, integrating better software engineering practice into the workflow but also in terms of productionisation of models created therein.

All this is just platform stuff as well. It doesn’t even touch on the fact that businesses everywhere are terrible at formulating questions to be answered by stats, terrible at communicating those answers and terrible at even knowing this is a valuable endeavour in the first place.

I cannot imagine a boat harder to miss.


👤 qxmat
I've worked at productionising data science models for the past 4 years. I'm currently responsible for delivering technology platforms to ~180 data scientists.

I find the data scientist label misleading.

Roughly 70% of the data scientists I've encountered are actually Excel analysts with little experience outside of a Windows desktop bar Facebook on a Mac. They're unable to use basic software engineering tools such as git, vscode and python. Excel users and their managers are hostile to solutions that aren't Excel-like. They will fist-fight you if you restrict them from downloading and exploring data on their computer. Few understand their compliance/legal obligations.

Another 20% are familiar with a wide range of tools - such as Matlab, R, Jupyter notebooks and various ML/AI toolsets. As developers they're unaware of the tech stack, short of "I installed anaconda and it doesn't work" but are happy to work in the cloud and learn new tech. They understand PII requirements and memory/cpu limits but don't always demonstrate the latter in practice. Nonetheless they produce the bulk of your analysis, having studied classification, and are reasonably cost efficient if you pair them with a SWE.

The final 10% have mastered containers, venvs, wheels, cloud sdks and how to configure their software in an environment-independent way. They require help to achieve production quality but are great self-starters. Given enough time and support they're able to quickly replicate this effort and teach others. As relative superstars they're in high demand which makes capacity planning difficult. This pushes up their premium.

IMO the best data scientists are 1 in 10. Because we're desperate for quality almost anyone can assume the title, meaning the market is open to newcomers - you just need to be skilled in Excel (harder than it seems - most developers can learn a lot observing an analyst/consultant use Excel).

To answer your question: No - you're not too late. Just by posting here I expect you'll be in the top 30% - an asset in demand.


👤 pritovido
It's up to you. You should have much more knowledge in the agriculture field than most people.

I believe it makes no sense that you discard the knowledge you already have. You can apply data science methods with the knowledge that you have going to work or creating a new company that works in agriculture using new methods.

Do you want to spend your life doing surveillance and spying on people like everybody else? This is fashionable but people are starting to resist and develop antibodies for it as they understand it more. The TV or the phone that I bought spying on me is not acceptable.

Agriculture will grow enormously in the future with things like LED and other methods to give energy to plants, or plankton or whatever. Drones controlling pests or humidity or temperature. Using natural insect predators for bio farming. Growing materials like cotton directly from cell cultures.

The methods that are used today for growing marijuana indoors will be applied for more common things when prices go down.

Nobody better than you to identify the markets that will grow in the future. It is also a very good idea if you know (or associate with someone who knows) economics and marketing and selling.

Reasoning from first principles is extremely useful for identifying new waves that will carry you in the future with no effort. https://www.youtube.com/watch?v=NV3sBlRgzTI

All those things are hard problems. I have worked on those in the south of Spain and in Holland as an engineer and entrepreneur.

The real life is not Academia, your title means nothing if you can not apply it and give results, but means a lot if you can. So you will need some time to adapt to a different mentality.


👤 pugio
I work in ed-tech – currently building an intro to data science course – you have by no means missed the boat. The field will only continue to grow, rapidly, for quite some time.

On top of that, you have something incredibly valuable to a budding data scientist: domain expertise. Being able to manipulate data is great, but to most effectively solve real-world problems you have to understand (or communicate very closely with someone who understands) the main problems in a domain. I can't count the number of times I've heard scientists frustrated by their lack of data skills, and data scientists frustrated by some arcane domain fact that stymies their model production.

Far from being a liability (or just a sunk cost) your background in agriculture will make you extremely valuable as a data scientist.


👤 bvcvbuiy
If I may give you some advice: Python, R and deep learning are sexy but the most important skill to start in data science is SQL. It will help you get your first data role and will be your main tool to solve 97% of the problems you will ever face.

Bonus point: it is very easy to learn.
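To give a feel for the kind of SQL an analyst writes on day one, here is a minimal sketch using Python's built-in sqlite3 module. The `yields` table, its columns and the numbers are all made up for illustration; a real role would point the same kind of query at the company's own database.

```python
import sqlite3

# Hypothetical table and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yields (field TEXT, crop TEXT, tonnes REAL)")
conn.executemany(
    "INSERT INTO yields VALUES (?, ?, ?)",
    [
        ("north", "wheat", 4.0),
        ("south", "wheat", 6.0),
        ("north", "maize", 9.0),
        ("south", "maize", 11.0),
    ],
)

# Average yield per crop: filter, group and aggregate cover a lot of ground.
rows = conn.execute(
    "SELECT crop, AVG(tonnes) AS avg_tonnes "
    "FROM yields GROUP BY crop ORDER BY crop"
).fetchall()
print(rows)
conn.close()
```

The same SELECT / GROUP BY pattern carries over to most SQL databases, though dialect details vary.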


👤 jschulenklopper
About the best time to ‘invest’ in anything that’s close-to-certain of yielding results: “The best time to plant a tree was twenty years ago. The second best time is now.” – Chinese proverb

Advanced data analysis / machine learning isn’t dated or old-fashioned at all, and I guess will continue to stay (or: become even more) relevant for at least another decade. Not all ships have left the harbor.


👤 sf_rob
As with any role/skillset, the explosion of data science demand has led to an explosion of "data scientist" supply. When demand was greatly exceeding supply, you naturally saw the definition and requirements of data science roles loosening. When I contact companies about DS roles, I like to ask what they mean by "data science". Oftentimes you would find roles that were some mix of: SQL analyst (no DB admin skills required), Python hacker (no "real" software eng skills required), and basic stats (can you calculate a confidence interval?).

What I've seen in more recent years with growing supply and maturation of departments is the need for specialization. Can you do hardcore statistics? Are you an ML practitioner? Are you a data architect? Basically, the blended role DS hacker is more commonly (and correctly IMO) relegated to various analytical and strategy roles.

Honestly you haven't missed the boat and you don't need a formal education, but I would highly recommend having a depth of skills in one or more areas of data science generally with an example or two to back it up. Basic skills are just table stakes at this point.
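For reference, the "can you calculate a confidence interval?" bar mentioned above looks roughly like this. It is a minimal sketch using only the standard library and a normal approximation (z = 1.96 for about 95%); the sample values are invented, and for a sample this small a t-distribution would usually be the more appropriate choice.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate CI for the mean using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * sem, mean + z * sem


# Hypothetical measurements, invented for illustration.
sample = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8, 4.3, 4.1]
low, high = mean_confidence_interval(sample)
print(f"mean={statistics.mean(sample):.3f}, CI=({low:.3f}, {high:.3f})")
```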


👤 cambalache
1970- Hey Guys I want to start programming with the PDP-10. Did I miss the boat?

1980- Guys I want to learn about micro-controllers, is it too late?

1990- GUI programming

2000- Linux, Internet, you name it

2010- JavaScript

In 30 years' time (at the very least) there will still be Data Science. So if you are really up to it, it does not matter if you should have started 5 years ago or now. If you suck at it or really don't like it, it would make no difference either.


👤 artembugara
For those here who are just "making a DS course": you have definitely missed the boat (IMO)

For those with solid Math, Stat, Tech skills that require years to master: it is your time to shine


👤 jofer
As someone with a PhD in a different field who's made that switch, I actually have to disagree with most of the advice here.

This is a bit too bitter and jaded (I'm not quite this pessimistic), but I think it needs to be said to counter a lot of the rosier advice.

Having a PhD in a non-CS field is a _massive_ negative in the eyes of potential employers. Even if you're looking at moving into a role where your domain knowledge would be immensely relevant (lots of ag + remote sensing startups these days), you will be seen as underqualified compared to someone with only a BS. You'll be seen as underqualified compared to someone with no BS or a BS in an irrelevant field. You need to be strictly better at multiple roles than anyone they could hire to be considered, and even then you'll be expected to work for 1/3 to 1/2 of what they'd pay someone with only a BS and no experience. Folks usually despise domain experts because they see the role of their company as "disrupting" all of the prior knowledge. You represent what they want to replace and you're likely to disagree with them about key approaches.

You will be much more effective, but no one cares about how much you contribute to the company's bottom line. People only care about appearances.

The appearance companies want is a "self taught college dropout". That holds for data science and machine learning positions every bit as much as it does for developer positions, in my experience.

The upshot of this is that you likely know how to learn quickly. Pick up multiple additional skillsets.

You won't get hired because you're an expert in X field that the company needs + a data scientist. You'll get hired because you can throw together a crappy web app on short notice, or debug their crazy duct-tape-and-glue CI system, or save a lot on their AWS bill by switching some things around. You need those skills _on top of_ being a domain expert and a data scientist.

You have to be able to do more than anyone else they could possibly hire for the role to even be considered. Otherwise, you'll never overcome the fact that you have a PhD in their eyes.

Again, that's the bitter/jaded view. Take the above with a grain of salt.


👤 markus_zhang
If you already have a PhD, do you have a chance to do something with big data in the domain? I'm sure there are some big data projects you can do in agriculture. PhDs in DNA research tend to have much more relevant experience than those who just go through a few training camp courses, as they have to build models for PB-level data, which forces them to use the CLI, optimize their algos and use advanced tools such as Hadoop, Spark etc.

👤 rramadass
As many people have already recommended, you should absolutely look into Applying Data Science to problems in your Agricultural field. Your domain expertise combined with skill in computing technology would be a killer combo.

PS: You might find the book Cloud Computing for Science and Engineering useful - https://cloud4scieng.org/


👤 morelandjs
No, I don't think you are too late. The predictions that data science will be fully automated in the near future are, in my professional opinion, unrealistic.

I see the future of data science benefiting from better software design, e.g. transformative frameworks like pytorch and sklearn, which are powerful tools but hardly fully automated. We'll continue to need skilled workers who are current in the latest software stacks.

It also benefits from what I'll call the "lotto effect", where data scientists will occasionally multiply the bottom line by 10x or more. This is of course rare, but companies will continue to chase that fantasy and hire data scientists because it's too tempting to ignore.

My only advice would be to lean heavily into the software side of things. There are too many data scientists who are novice programmers.


👤 refactor_master
Most definitely not. It’s just that the focus is perhaps more on domain expertise and the production-side of things these days, rather than manually putting Tensorflow models together, which might give the impression that every problem has already been “solved”.

👤 Taylor_OD
No. Data Science is still a very immature field. Like DevOps 5ish years ago. Everyone wants it. Few people know what they could actually do with it. Lots of people hire for it and then let those they hire figure it out.

👤 jariel
'Data Science' is a secular shift in how companies work, not cyclical.

It's a 'new kind of job' that's going to be here for the foreseeable future.

While competition might increase for jobs, the number of jobs is likely only to grow over time.


👤 iwd
Agriculture PhD plus even modest data science skills is hugely in demand at the giant ag-biotech company where I work. Many of the data engineers I lead are former science PhDs. Literally all of our people strategy discussions in R&D are about how we need more people like you. There are only a handful of big science companies left in ag, but I bet the other ones have similar needs.

👤 YeGoblynQueenne
It's not too late, no, but can I ask- why are you interested in a data scientist role? Is that the only, or best way to "get out of the lab" given your background?

Because I must confess that I don't see the immediate connection between an MSc and PhD in agriculture and a (assumingly generic) data scientist job. You should be perfectly capable of performing data science tasks in an agriculture context, but it seems to me you are asking for something different, a "pivot".

At the risk of sounding rude (for which I apologise in advance) are you asking simply whether there's still space on the current bandwagon? If so, I must advise you against it, because employment bandwagons are awful things to get on. Crowded, badly paid, poorly understood, not that useful, scarcely productive- in short, short-term and not very fulfilling.

Is it just a matter of making lots and lots of money with the skills you clearly have and that you must have worked hard to acquire? Well then, there should be much, much better placements for you, outside the lab, in the sector you studied about.

My concern is that seeking to jump on the data science bandwagon right now will only flood industry with more and more semi-skilled, half-baked professionals, who don't really understand and don't really care to understand their subject matter, similar to what has happened with software development. The world is full of bad devs who are "passionate about javascript" or something like that and who are struggling to promote their personal brand because they have no other skills than the promotion of their personal brand. Don't allow yourself to fall that low.

Edit: in the interest of full disclosure, I'm a CS graduate with an MSc in data science and studying for a PhD in AI, but I'm not looking for data scientist jobs and am not interested in them, because I find them boring, unproductive and unfulfilling. I have actually worked as a (freelance) data scientist for a while.


👤 ralphc
A lot of people are saying to leverage your agriculture PhD, but I don't see people asking if you like agriculture. Is the interest in data science because you want to get far away from agriculture, or just concern that there isn't much future in it?

👤 enriquto
I do not understand the concept of "being too late". If you are too late, what about the future generations? You can only be too late if the subject is already dead and not used anymore.

👤 karimtr
There is huge potential around better data usage & analytics in agriculture & crop science. Your domain expertise would place you in a unique position compared to other data scientists with more of a generic background.

I've personally spoken to a few large enterprises in the agricultural sector who are just beginning to build out their data science department. It seems like the industry as a whole is just getting started in data science.

Also, there are many promising startups emerging in this space, e.g. vertical farming.


👤 tejtm
Climate change and carbon.

Ag, farmers, lawn owners are all errr... "ripe for disruption" as their genetic systems lag and only innovate at the paltry rate of once per season.

There are many ways to play the field (no pun intended) from predicting and capitalizing on misfortune to minimizing the same. I hope some will be most interested in using our talents to mitigate crop failures, maximize sustainable nutrition or find the optimal low risk carbon sink.


👤 giardini
As others have noted, many data scientists work in Excel. This book, which teaches data science, does just that:

"Data Smart: Using Data Science to Transform Information into Insight" by John W. Foreman

https://www.amazon.com/Data-Smart-Science-Transform-Informat...

This is likely the quickest way to start.


👤 tornadofart
Big companies started data science, AI projects, etc. 5 to 10 years ago. Medium and Smaller ones are starting just now. So I'd say you're fine.

👤 teleforce
Yeah it's too late now, people already moved to data engineering ;-)

Joking aside, this is a very good overview of data science and engineering for the year 2020:

https://github.com/datastacktv/data-engineer-roadmap


👤 j45
This might be shining a light on something surprising -- after taking a Data Science course at Microsoft last year...

Try to remember that the very beginning of Data Science is often cleaning up data in Excel, then learning to do it with Excel functions, and then learning to do the same in Python.
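As a rough illustration of that last step, here is a small sketch of the sort of cleanup that might be done with TRIM, PROPER and SUBSTITUTE in Excel, written in plain Python. The rows and the `clean_rows` helper are hypothetical, made up for this example, and the mapping to Excel functions is only approximate.

```python
def clean_rows(rows):
    """Tidy (name, value) pairs: trim, normalise case, drop incomplete rows."""
    cleaned = []
    for name, value in rows:
        name = name.strip().title()             # roughly TRIM + PROPER
        value = value.strip().replace(",", "")  # roughly TRIM + SUBSTITUTE
        if not name or not value:
            continue                            # skip rows with a blank cell
        cleaned.append((name, float(value)))
    return cleaned


# Hypothetical messy export, invented for illustration.
raw = [("  wheat ", "1,200"), ("MAIZE", " 950 "), ("", "300"), ("barley", "")]
cleaned = clean_rows(raw)
print(cleaned)
```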


👤 aborsy
I wonder with schools producing CS graduates at such high rates, how people want to distinguish themselves and find jobs?

The market seems to be flooded with students with CS skills. Actually, people from all fields — mechanical engineering, computational biology, EE, math etc — are entering data science. It is worrisome.


👤 cowanon22
My background: I am a tech architect/manager on IoT projects who designs and helps develop analytics systems. I do some basic data modeling in my role, but defer to experts to build a better model once we find some area of interest.

The actual skills of light programming (R, Python), data literacy/manipulation and some basic modeling (statistical/machine learning, Bayesian methods, time series, etc) are useful for many job roles, and I think will be considered basic skills for college graduates in the near future. This isn't new - operations research, particularly Six Sigma and quality control, have used statistics and some light programming to solve business problems for decades.

By itself, I don't see Data Science evolving the way promised by schools and boot camps. Most of the positions named "Data Scientist" (at non-tech companies) are really just senior business analysts; I work with a group of them at my company and 90% of their day to day is just extracting and analyzing various reports for other managers and directors. When an interesting (and potentially lucrative) business problem does come along, they usually outsource to a specialized analytics firm and the data scientist helps coordinate that project. (If you have a good data dictionary, a clear outcome in mind, and some basic knowledge of the field it's relatively easy to outsource the advanced work.)

st1x7 had the best advice below -- learn the basic methods and then apply them to your field. If you google "agriculture iot research papers" you'll find tons of examples of people using sensors for data collection and then analyzing the data to improve some process.

TL;DR I see Data Science melting into other roles, but the basic skills/data literacy are useful for almost everyone.


👤 elwell
> am looking to pivot into something that gets me out of the lab

Will data science get you out of the "lab"?

Maybe you can become a farmer, and iterate on patentable new agri-tech you will inevitably develop along the way; then sell that.


👤 sjg007
There are plenty of data science problems in agriculture. Big ag, small ag, automated ag. Look for problems people have in your domain that could be solved with data science.

👤 pkrotich
I remember feeling the same way about CS in '98 - how silly of me then! You'll be just fine.

👤 a-dub
Look at Google X. They have some workstreams going on right now that combine agriculture and ML.

👤 strawberrycheez
Definitely not

👤 coldtea
There will always be a next fad.

👤 baq
data science is just a buzzword for statistics where you have more data than you know what to do with instead of working hard on getting more quality data. more or less.

👤 dhiraj8899
No, not at all. There is a lot to come in this field, so this is the right time to enter it.

👤 blackrock
I think this is the crux of data science: extrapolating

https://xkcd.com/605/

Also known as Linear Regression.
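In the spirit of the joke, a minimal sketch of fitting a least-squares line to two points and extrapolating far beyond them. The data points and the `fit_line` helper are made up for illustration; the point is that the arithmetic works fine even when the conclusion is absurd.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: a count of 0 on day 0 and 1 on day 1.
days = [0, 1]
counts = [0, 1]
slope, intercept = fit_line(days, counts)

# Extrapolating 30 days out from two data points.
prediction = slope * 30 + intercept
print(prediction)
```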


👤 NoCanDo
No, never too late to learn.

👤 afrojack123
Yes, you're too late.

👤 glovink
Yes, you are too late. But if you retrain and become a data critic, you are definitely ahead of the curve ;) The robot armies of AI experts and big data scientists are doing incredible harm to society. There is a real task for you there.

👤 glovink
Yes, you are too late. Instead, focus on data critique. There is a big market for that as the robot armies of AI experts are doing incredible harm to society.