HACKER Q&A
📣 agnosticmantis

How to go from data science to software engineering?


I’m a data scientist working in Tech, and have decent programming skills for a data science position, but don’t have a software engineering background. I’ve found that I don’t quite enjoy the ad hoc nature of the profession, moving from one small project to another completely different one. I’d rather work on a project that builds something and the effort accumulates into something more tangible than an analysis report. What’s the optimal way to learn the necessary software engineering skills and move to such a role?


  👤 jstx1 Accepted Answer ✓
> I’d rather work on a project that builds something and the effort accumulates into something more tangible than an analysis report.

There are data science jobs in which your output is at most a report, and there are data science jobs in which you're responsible for ML experiments, prototyping, building and maintaining services, model deployment etc. Sometimes that's called ML engineer, but a lot of the time it's called data scientist too (which title is more common also seems to vary by country, from what I've seen).

You might like the second type of DS job more, or if you already have that kind of job then switching to SWE is easier because you're basically already a SWE and you're just switching domains instead of breaking into SWE.


👤 apohn
> I’d rather work on a project that builds something and the effort accumulates into something more tangible than an analysis report.

Before deciding to become a Software Engineer I'd recommend you think about what that "something" is that you want to build.

There's a really broad range of jobs that require you to build ML-based things that are deployed in a production environment, meaning your output is mostly code. In one type of job you are really more of an infrastructure-building person. From what I've seen at some well-known tech companies, a lot of ML Engineers are taking models built by other people and finding a way to run them at scale. So what they need more than modeling skills is the knowledge to build a scalable data and scoring pipeline that can meet a given SLA at an acceptable cost.

If you want to be more of an ML Infrastructure or ML Engineer (as defined by a company like Facebook), you'll probably want to focus on learning traditional software engineering, just with a focus on the parts of the stack relevant to data. You'll get asked algorithm and system design questions in interviews, so it's probably worth practicing those.

There's another type of role where you build things. I work as a Data Scientist at a large non-software company that builds complex pieces of machinery that are sold to other businesses. The machines collect a lot of data and the team I'm on builds models to detect issues with those machines. Our team also builds the infrastructure to run those models. Every single person on my team is expected to be able to go from a conversation with an SME (e.g. a mechanical engineer) to Python code that is running in our prod environment. My deliverable is code, but I certainly do a lot of analysis to get to the point where I'm ready to build the model and code.

I've done similar roles in medical devices, pharmaceutical manufacturing, oil & gas, and industrial manufacturing (e.g. car part factories). IMO what you need is broad exposure to ML models, the ability to write Python/R scripts that won't break when they run on a schedule (e.g. every hour), and the ability to demonstrate that you have the communication and thinking skills to take a vague problem statement from a business person and turn it into a production model. I've interviewed for a lot of jobs in this area and I've never gotten any system design or algorithm questions. Usually it's ML-based questions (e.g. what metric would you use to evaluate a clustering algorithm) and questions that try to see how you think about business problems.
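
To make "scripts that won't break when they run on a schedule" a bit more concrete, here is a rough sketch of the kind of defensive structure I mean. Everything in it is made up for illustration: `clean_readings`, `run_once`, the `fetch` callable and the mean-above-threshold check are hypothetical stand-ins, not anything from a real system. The idea is just that bad input gets logged and skipped, and a failed run returns "no result" instead of crashing the scheduler.

```python
import logging
import math
from typing import Callable, Iterable, List, Optional

logger = logging.getLogger("hourly_job")


def clean_readings(raw: Iterable[object]) -> List[float]:
    """Keep only finite numeric readings; log and skip anything malformed."""
    cleaned: List[float] = []
    for value in raw:
        try:
            number = float(value)
        except (TypeError, ValueError):
            logger.warning("skipping malformed reading: %r", value)
            continue
        if math.isfinite(number):
            cleaned.append(number)
        else:
            logger.warning("skipping non-finite reading: %r", value)
    return cleaned


def run_once(fetch: Callable[[], Iterable[object]], threshold: float) -> Optional[bool]:
    """One scheduled run (hypothetical).

    Returns True if the mean reading exceeds the threshold, False if not,
    and None when the run could not produce a result.
    """
    try:
        readings = clean_readings(fetch())
    except Exception:
        # A broken data source should not take the whole job down.
        logger.exception("fetch failed; skipping this run")
        return None
    if not readings:
        logger.warning("no usable readings; skipping this run")
        return None
    mean = sum(readings) / len(readings)
    return mean > threshold
```

The scheduler (cron or whatever you use) would call `run_once` each hour. Returning `None` rather than raising is a design choice so that one bad hour shows up in the logs without stopping later runs.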


👤 presheaf
I'd say learn about software architecture patterns and how to manage complexity. A good starting point is The Architecture of Open Source Applications (http://www.aosabook.org/en/index.html). Most backend engineering is about managing complexity, and that book has a lot of good information on how to do that.