Why Slack Removed SSH from 700 Data Pipelines

TL;DR: Slack modernized its data platform by removing SSH access from over 700 data pipelines. The move to a new REST-based system improves security and reliability and makes the pipelines easier to monitor.
Key facts
- Category: Infrastructure
- Impact: High
- Published
- Source: InfoQ
Full summary
Slack overhauled its data platform, replacing direct SSH access for over 700 jobs to improve security, reliability, and system observability.
Slack has modernized a core part of its data infrastructure by eliminating SSH from its data processing pipelines. The company migrated over 700 automated jobs, managed by the workflow tool Airflow, from an SSH-based execution model to a new, custom-built system called Quarry. This new orchestration layer runs on Amazon EMR (Elastic MapReduce) and uses a REST API to manage tasks. Instead of scripts logging into servers directly to run commands, jobs are now submitted and controlled through HTTP requests. This change centralizes job management and lets the entire lifecycle of a data task be handled on the server side, which makes the process more robust and easier to reason about.
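The submit-then-poll pattern described above can be sketched roughly as follows. This is a minimal illustration, not Quarry's actual API, which the summary does not describe: the `/jobs` paths, the `id` and `state` fields, the state names, and the `run_job` and `http_transport` helpers are all assumptions made for the example.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical transport signature: (method, path, body) -> parsed JSON dict.
Transport = Callable[[str, str, Optional[dict]], dict]

# Assumed state names; a real service would define its own.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def http_transport(base_url: str) -> Transport:
    """Build a transport that sends JSON over HTTP using only the stdlib."""

    def send(method: str, path: str, body: Optional[dict]) -> dict:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return send


def run_job(
    send: Transport,
    spec: dict,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit a job over the API and poll until it reaches a terminal state."""
    # The client only describes the job; the server decides where and how it runs.
    job_id = send("POST", "/jobs", spec)["id"]
    for _ in range(max_polls):
        status = send("GET", f"/jobs/{job_id}", None)
        if status["state"] in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A caller such as an Airflow task would then invoke something like `run_job(http_transport("https://orchestrator.example.internal"), {"name": "daily_rollup"})`, where the host and the job spec are placeholders. The caller holds no shell session on the cluster, so a dropped connection interrupts only the polling, not the job itself.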
This architectural shift delivers significant benefits, particularly in security. By removing direct SSH access to its production data clusters, Slack has closed off an attack vector and no longer needs to manage SSH credentials for pipeline automation. The move also enhances reliability, as an API-driven system is less prone to the connection issues and script failures that can plague SSH-based automation. Furthermore, the new platform improves observability. With a centralized API, every job passes through one place where it can be logged, monitored, and traced, which lets engineers diagnose and resolve issues more quickly. This project serves as a valuable case study for other organizations looking to secure and modernize their own cloud-based data pipelines.
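To make the observability point concrete, here is a small sketch of what a centralized API makes possible: because every state change passes through one service, each one can be recorded as a structured event keyed by job ID. The `JobEventLog` class, its field names, and the state names are hypothetical and not taken from Slack's system.

```python
import json
import logging
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class JobEventLog:
    """In-memory record of job lifecycle events, also emitted as JSON log lines."""

    logger: logging.Logger = field(
        default_factory=lambda: logging.getLogger("job_events")
    )
    events: list = field(default_factory=list)
    clock: Callable[[], float] = time.time

    def record(self, job_id: str, state: str, **details) -> dict:
        """Store one state transition and write it to the logger as JSON."""
        event = {"ts": self.clock(), "job_id": job_id, "state": state, **details}
        self.events.append(event)
        self.logger.info(json.dumps(event, sort_keys=True))
        return event

    def trace(self, job_id: str) -> list:
        """Return every recorded event for one job, in the order it happened."""
        return [e for e in self.events if e["job_id"] == job_id]


log = JobEventLog()
log.record("job-1", "SUBMITTED", owner="analytics")
log.record("job-1", "RUNNING")
log.record("job-1", "FAILED", error="exit code 1")
print([e["state"] for e in log.trace("job-1")])
```

With SSH-based execution, the equivalent information tends to be scattered across shell output on whichever host ran the command; here a single query by job ID returns the full history.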
Slack's migration is part of a wider industry trend of moving away from direct, imperative server management towards declarative, API-driven orchestration. This approach aligns with modern DevOps principles like Infrastructure as Code, where systems are managed through auditable and version-controlled definitions rather than manual commands. For technology leaders, this highlights the importance of identifying and replacing legacy processes that rely on direct server access for automation. Investing in a centralized, API-first architecture not only strengthens security but also builds a more scalable and maintainable foundation for future data operations.
Primary source: InfoQ