
Selective data deletion: a new feature for data quality management

Selective data deletion lets you delete rows of a Tinybird datasource that match a specified delete condition. This is important for an API-first platform.
Jorge Sancha
Co-founder & CEO
Aug 27, 2020

Data deletion operations are common in the transactional databases where your operational data lives. Often, as a result of a data quality process in your operational database, you will also need to update or delete your analytical data in Tinybird.

We have already talked about how to update your analytical data selectively, but there are other times when you need to delete certain data entirely to keep providing reliable enterprise analytics.

Selective data deletion is common in data reconciliation processes, such as making your real-time analyses GDPR-compliant.

Whether the cause is a buggy application ingesting your operational data, a transient error while operating the production database, or a change in regulation, you might need to delete unwanted data that is skewing your analysis in Tinybird.

A new API endpoint for selective data deletion.

Selective data deletion allows you to delete the rows of a Tinybird datasource that match a specified delete condition. For an API-first platform like Tinybird, this operation translates into a secured API endpoint that developers can easily integrate into their real-time data quality management flows.

How to delete data selectively in Tinybird

In Tinybird, data is organized in datasources. Whether you have a CSV file stored locally or accessible remotely via HTTP(S), you can ingest it into a datasource and start analyzing it, building and publishing real-time API endpoints.

Data deletion works by sending a {% code-line %}POST{% code-line-end %} request to the delete API endpoint, providing the name of one of your datasources in Tinybird and a {% code-line %}delete_condition{% code-line-end %} parameter, which is an SQL expression used as a filter.

Let’s say we want to delete all the rows from a {% code-line %}transactions{% code-line-end %} datasource for the country {% code-line %}ES{% code-line-end %}. We’d send a {% code-line %}POST{% code-line-end %} request to the delete endpoint like this:
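As a minimal sketch, here is how that request could be built in Python with only the standard library. The host, the {% code-line %}/v0/datasources/<name>/delete{% code-line-end %} path, the Bearer authorization header and the {% code-line %}build_delete_request{% code-line-end %} helper are assumptions made for illustration; check the API docs for the exact endpoint before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: base URL of the API; adjust it to your own region or host.
API_HOST = "https://api.tinybird.co"


def build_delete_request(datasource: str, delete_condition: str, token: str) -> Request:
    """Build (but do not send) the POST request for a selective delete.

    `build_delete_request` is a hypothetical helper, not part of any SDK.
    """
    # Assumption: the delete endpoint lives under the datasource name.
    url = f"{API_HOST}/v0/datasources/{datasource}/delete"
    # The SQL filter travels as a form-encoded `delete_condition` parameter.
    body = urlencode({"delete_condition": delete_condition}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Delete every row of `transactions` whose country is ES.
request = build_delete_request("transactions", "country = 'ES'", "<TOKEN>")
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and the encoded condition before any data is deleted.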

The auth token used needs to have the {% code-line %}DATASOURCES:CREATE{% code-line-end %} scope. That way, your data is protected from applications and users that only have read access to it.

The POST request to the delete API endpoint is asynchronous. It returns a Job response, indicating an ID for the job, the status of the job, the {% code-line %}delete_condition{% code-line-end %} and some other metadata.

Although the request itself returns immediately (hence the job response), the job does not finish until all the mutations have been rewritten and the matching data has been deleted from every replica.

You can periodically poll the {% code-line %}job_url{% code-line-end %} with the given ID to check the status of the deletion process. When the status is {% code-line %}done{% code-line-end %}, the data matching the SQL expression filter has been removed, and all your pipes and API endpoints will continue running with the remaining data in the datasource.
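The polling loop could be sketched as follows. The {% code-line %}job_url{% code-line-end %} and {% code-line %}status{% code-line-end %} fields and the {% code-line %}done{% code-line-end %} value come from the job response described above; the {% code-line %}error{% code-line-end %} status, the need for a token when polling, and the helper names are assumptions for illustration.

```python
import json
import time
from urllib.request import Request, urlopen


def fetch_job(job_url: str, token: str) -> dict:
    """GET the job URL and decode its JSON body (hypothetical helper)."""
    # Assumption: polling uses the same Bearer token as the delete request.
    req = Request(job_url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(job_url, token, fetch=fetch_job, interval=2.0, max_attempts=30):
    """Poll the job until its status is `done`, then return the job dict."""
    for _ in range(max_attempts):
        job = fetch(job_url, token)
        status = job.get("status")
        if status == "done":
            return job
        if status == "error":  # Assumption: name of the failure status.
            raise RuntimeError(f"delete job failed: {job}")
        time.sleep(interval)
    raise TimeoutError(f"job not done after {max_attempts} attempts")
```

Passing {% code-line %}fetch{% code-line-end %} as a parameter lets you swap in a stub when testing the loop, so no real deletion job is needed to exercise it.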

Beyond data deletion in data quality management processes

While real-time analytical databases are optimized for SELECTs and INSERTs, we fully support the other operations needed in data quality management processes. We do that by hiding the complexity of data replication, partition management and mutation rewriting, so you only have to worry about your data engineering flows and not the internals of real-time analytical databases.

We recommend you check our API docs for more information on selective data deletion.

What are your main challenges when dealing with large quantities of data? Request access to Tinybird and get started with real-time analytics right away.