Python Prefect vs Airflow
2/12/2024

Scaling out with Prefect - Chris Arderne

Over the last two weeks, I went deep figuring out the best way to orchestrate some relatively involved satellite imagery and deep learning tasks. The requirements included:

- Ability to trigger jobs via GUI or by HTTP.
- First-class support for running jobs with different parameters.
- Handle variable parallelism at multiple levels.
- Have a good git-centric version control system.
- Bonus: run on AWS too without starting from scratch.

I’ve played around with Luigi and Airflow in the past, and even wrote my own very light pipeline manager, using class-based DAGs and some hacky Redis stuff to keep track of things. The main reason I rolled my own that time, and am not using Luigi/Airflow now, is their lack of support for very heterogeneous job runs (points 5 and 6 above).

In classic ELT/ETL tasks, you want to run the same DAG, more or less the same way, every x minutes/hours/days. Our use case is loading/training/predicting for different areas, using different models and different imagery sources, and often with no schedule. Airflow (since quite recently) lets you specify parameters via the web interface, but it gives very little visibility into the fact that a new run with new parameters is not the same as the previous one.

I considered rolling our own system again, using a custom scheduling app and UI, and leaving the heavy lifting to Google’s PubSub, Tasks et al., probably using Terraform to manage the whims of our infrastructure needs. But I ultimately decided this would have required too much infrastructure-ing by our team, when we’d really rather just be writing Python. So I decided Prefect, the newer kid on the block, was the way to go for us.
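
To make the contrast between a fixed, scheduled DAG and heterogeneous runs more concrete, here is a minimal sketch in plain Python. It deliberately uses only the standard library and is not Prefect's or Airflow's API. Every name in it (`RunParams`, `load_tile`, `predict`, `run_pipeline`, the area/model/source strings) is a hypothetical stand-in. The idea it tries to show is that one pipeline definition is invoked with a different set of parameters each time, and that the amount of fan-out and the degree of parallelism are themselves chosen per run.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class RunParams:
    """Parameters for one run. All field names are hypothetical."""
    area: str
    model: str
    imagery_source: str
    num_tiles: int    # how much work this particular run fans out into
    max_workers: int  # parallelism chosen per run, not fixed by the pipeline


def load_tile(params: RunParams, tile: int) -> str:
    # Stand-in for loading one imagery tile for the given area and source.
    return f"{params.imagery_source}/{params.area}/tile-{tile}"


def predict(params: RunParams, tile_path: str) -> str:
    # Stand-in for running the chosen model on one loaded tile.
    return f"{params.model}:{tile_path}"


def run_pipeline(params: RunParams) -> list[str]:
    """Run the same two-step pipeline, shaped by this run's parameters."""
    with ThreadPoolExecutor(max_workers=params.max_workers) as pool:
        # Step 1: fan out over however many tiles this run needs.
        tiles = list(pool.map(lambda t: load_tile(params, t), range(params.num_tiles)))
        # Step 2: fan out again over the loaded tiles.
        return list(pool.map(lambda p: predict(params, p), tiles))


if __name__ == "__main__":
    # Two unscheduled runs of the same pipeline that differ in every parameter.
    runs = [
        RunParams("area-a", "model-v1", "source-x", num_tiles=3, max_workers=2),
        RunParams("area-b", "model-v2", "source-y", num_tiles=8, max_workers=4),
    ]
    for params in runs:
        results = run_pipeline(params)
        print(params.area, params.model, len(results))
```

An orchestrator adds what this sketch lacks: a record of which parameters each run used, retries, a UI, and a way to trigger runs over HTTP. Those are the parts the requirements above are really about.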