HACKER Q&A
📣 kfk

AWS is telling us that using Python for ETL is an antipattern. True?


My team and I happily use Python + Dask for various things, one of which is ETL jobs. Most of what we do with Python is reading data via ODBC drivers and bulk uploading it into AWS S3 and Redshift. Now AWS is telling our IT department that this is an antipattern, and that Glue or Talend would be the best practice. I am confused: how is having version-controlled Python code for ETL an antipattern? I can read and manage code much better than UIs. I always thought of tools like Talend as nice-to-haves, not as necessities if you have a solid team that knows Python. What is your experience?
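
For concreteness, here is a minimal sketch of the kind of job described above: read rows over a DB-API connection, write them to a gzipped CSV, hand the bytes to an uploader, and build the Redshift COPY statement that would bulk load the file. Everything named here is an assumption for illustration: `sqlite3` stands in for an ODBC connection (pyodbc follows the same DB-API shape), `upload` is a hypothetical callable that in a real job might wrap an S3 client, and the table, bucket, key, and IAM role are placeholders.

```python
import csv
import gzip
import io
import sqlite3  # stand-in for an ODBC driver; both follow the Python DB-API


def extract_to_gzipped_csv(conn, query):
    """Run a query on a DB-API connection and return the rows as gzipped CSV bytes."""
    cursor = conn.cursor()
    cursor.execute(query)
    # Column names come from the DB-API cursor description.
    header = [col[0] for col in cursor.description]
    text = io.StringIO()
    writer = csv.writer(text)
    writer.writerow(header)
    writer.writerows(cursor.fetchall())
    return gzip.compress(text.getvalue().encode("utf-8"))


def build_copy_statement(table, bucket, key, iam_role):
    """Build a Redshift COPY statement for a gzipped CSV with a header row."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV GZIP IGNOREHEADER 1;"
    )


def run_job(conn, query, upload, table, bucket, key, iam_role):
    """Extract, upload via the injected callable, and return the COPY statement."""
    payload = extract_to_gzipped_csv(conn, query)
    # `upload` is hypothetical: any callable taking (bucket, key, data).
    upload(bucket, key, payload)
    return build_copy_statement(table, bucket, key, iam_role)


if __name__ == "__main__":
    # Demo with an in-memory database and a fake uploader (no AWS calls).
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    demo.execute("INSERT INTO orders VALUES (1, 9.5)")
    fake_upload = lambda bucket, key, data: print(f"{len(data)} bytes -> {bucket}/{key}")
    print(run_job(demo, "SELECT * FROM orders", fake_upload,
                  "orders", "example-bucket", "orders.csv.gz",
                  "arn:aws:iam::123456789012:role/example"))
```

Passing the uploader in as a parameter keeps the job testable without AWS credentials, and the whole thing lives in version control like any other code.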


  👤 based2 Accepted Answer ✓
I did not know about this ETL example, which produces GIFs using Prefect calls: https://examples.dask.org/applications/prefect-etl.html

-> https://www.prefect.io/about/company


👤 ldng
Glue as in AWS Glue? Of course they'll want you to buy into their solution. The guy is just trying to upsell you on their product.

BTW, Talend uses an embedded Python for scripting purposes.

UIs are good if you want to delegate the monitoring of the process to less qualified people. That is usually the case in big companies.


👤 brodouevencode
Are you sure it’s Python specifically they are talking about? Glue does simplify a lot of that data movement for you, and that is probably what they are recommending.

👤 nunez
That sounds... incorrect. From my last exposure to Talend, what it was being used for could absolutely be done with Python.

👤 gshdg
It’s quite common and not an antipattern at all. Wtf, Amazon?