DataOps: How to Develop and Scale Data Intensive Projects
As we build Tinybird, we work hand in hand with many data and engineering teams. In the process we are discovering new ways to develop, maintain and scale data intensive projects.
Anatomy of a modern Data Team
If you are into development, you have probably heard of the DevOps culture: a set of practices and tools that allow development teams to improve their productivity and collaboration when building high quality software products.
DevOps is also key for teams that need to iterate faster in their quest to find the right thing to build.
Practices like automated testing, continuous integration and deployment, monitoring, and configuration and change management enable development and operations teams to work as a single team, with end-to-end ownership of the product they are building.
When it comes to data teams, things are starting to change. There have typically been three groups in a data team:
- Data scientists, who most of the time work locally, running experiments and analyses or building machine learning models that may later need to be productised.
- Data engineers, who write and maintain data pipelines.
- Infrastructure engineers, who are in charge of the “big data” infrastructure.
They used to be siloed groups with long development cycles, and most of the time their outputs were cascaded to the next group. On top of that, their final product needed to be integrated by a separate team of developers who built the data product for the end users.
The technology and tools that support data intensive applications are only useful if they let several people in an organization collaborate around the same context (the data and the business), iterate on the problem, and continuously deliver high quality solutions.
DataOps: working with Data as if it were Code
A similar culture to DevOps can be applied to data teams: it’s known as DataOps.
DataOps is a set of practices and tools that allow data scientists, data engineers, infrastructure engineers and developers to collaborate with full autonomy, ownership and accountability for the data product.
The goal is to enable data teams to handle requirements and to develop, deploy and support the data product, with tools that let them measure performance and latencies and keep SLAs under control.
In the end, it is about making data teams work with data as if it were source code, so they can iterate faster towards high quality data products.
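To make "data as if it were code" a little more concrete, here is a minimal sketch of a data check written the way a unit test would be, so it could run in continuous integration alongside the rest of the project. The row shape, the field names (`event_id`, `amount`) and the function name `find_problems` are assumptions made up for this illustration; they are not part of Tinybird or of any particular tool.

```python
# Hypothetical sample of pipeline output; field names are assumptions.
SAMPLE_ROWS = [
    {"event_id": "a1", "amount": 12.5},
    {"event_id": "a2", "amount": 0},
    {"event_id": "a3", "amount": 7},
]


def find_problems(rows):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Every row needs a unique, non-null identifier.
        event_id = row.get("event_id")
        if event_id is None:
            problems.append(f"row {i}: missing event_id")
        elif event_id in seen_ids:
            problems.append(f"row {i}: duplicate event_id {event_id}")
        else:
            seen_ids.add(event_id)

        # Amounts must be numeric and non-negative.
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            problems.append(f"row {i}: invalid amount {amount!r}")
    return problems


def test_sample_is_clean():
    # Fails the build if the data breaks any of the rules above.
    assert find_problems(SAMPLE_ROWS) == []


if __name__ == "__main__":
    test_sample_is_clean()
    print("all checks passed")
```

Because the check is plain code, it can be versioned, reviewed and run automatically on every change, just like any other test.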
Continue reading to learn about the 10 DataOps principles we make available to data teams.
What are your main challenges when dealing with large quantities of data? Tell us about them and get started solving them with Tinybird right away.