![]() In MySQL, all operations occur in a transaction. commit makes sure the delete statement is executed right there.autocommit is set to false to avoid log flushing which slows down the query.it’s important to split a large number of rows into smaller chunks before deleting - limitis used for this.The Code set innodb_lock_wait_timeout=1000 set autocommit=0 /* define procedure to run loop to delte 100000 rows at a time */ DELIMITER $$ CREATE PROCEDURE DeleteActions() BEGIN DECLARE counter INT DEFAULT 1 REPEAT delete from actions_action where action NOT IN ('like', 'subscribe') limit 100000 commit SET counter = counter + 1 SELECT SLEEP(2) UNTIL counter >= 200 END REPEAT END$$ DELIMITER /* call the procedure */ CALL DeleteActions() set autocommit=1 ![]() run a loop that deletes rows (using a delete query):.set autocommit to false (this basically releases the buffer bit by bit instead of waiting at the end of the command).Nothing fancy here, we keep it simple, silly! The steps we want to follow: “ Wash, rinse, repeat until zero rows affected”. It may make more sense to directly fetch and delete via the MySQL shell. The idea works in theory, fetch chunks of 10k (or more) at a time and delete rather than deleting all 20 million at once. The script leads to a MySQL timeout error. But for large data sets, it fails to execute. Using this strategy in a loop makes for a decent bulk delete solution. We fetch all primary keys (ids) of the actions to be deleted, 10,000 at a time and use these keys to fetch and delete from the table. One way to do it would be something like this: # fetch all ids of the first 10,000 actions ids = (action='page-load').values_list('id', flat=True) # delete all actions (id_in=list(ids)).delete() Python scripts based solutions, however, are not scalable for bulk data. It’s worth mentioning a python based solution wherein you would write scripts via the web framework (Django in this case) to delete application data. We had to delete around 20 million rows from the actions table, subject to a WHERE clause on a char field. The actual columns are not very relevant to this discussion. Our database had an actions table which had a few columns including char fields, integer fields and foreign keys. Since it’s a user-facing application, it makes sense for us to track all actions performed by users on items (videos and forum posts in this app). The app is essentially a video content streaming and forum application. The backend is written with the following stack: I was working on a backend for a live application (SparkTV), with over a million users. TLDR this article shows ways to delete millions of rows from MySQL, in a live application.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |