MySQL query optimization will always be one of life’s greatest mysteries to me. When there are three different ways to accomplish the same thing, each one will perform wildly differently from the others.
Example: Table “widgets” has a column “gadget_id” that is an unconstrained foreign key to the primary key on table “gadget.” We need to delete gadgets based on a selection of widgets. In this simple example, a join is the easiest and best way to accomplish this. But in a real-life more complicated scenario, writing the join can by itself be rather time-consuming. In those cases, I am tempted to shortcut development time by using a temporary table or selecting the list I need and then programmatically running through it. However, the extra processing time can be greater than the time I thought I was saving by taking a shortcut with the query.
Test code in Python:
import MySQLdb from timeit import default_timer as timer conn = MySQLdb.connect(host=HOST, port=3306, user=USER, passwd=PASS, db=DB) cursor = conn.cursor() print('Join:') for x in range(3): t1 = timer() cursor.execute( "DELETE gadget.* FROM gadget " "INNER JOIN widget ON widget.gadget_id = gadget.id" " AND gadget_added = 1 AND widget_group = 647" ) t2 = timer() print(t2 - t1) conn.rollback() print('Temporary table:') for x in range(3): t1 = timer() cursor.execute( "DELETE FROM gadget WHERE id IN " "(SELECT gadget_id FROM widget WHERE " "gadget_added = 1 AND widget_group = 647)" ) t2 = timer() print(t2 - t1) conn.rollback() print('Programmatically:') for x in range(3): t1 = timer() cursor.execute( "SELECT gadget_id FROM widget " "WHERE gadget_added = 1 AND widget_group = 647" ) rows = cursor.fetchall() for row in rows: cursor.execute( "DELETE FROM gadget WHERE id = %s", [row] ) t2 = timer() print(t2 - t1) conn.rollback()
Join: 1.3677521950000937 1.0999519849999615 1.0096592069999133 Temporary table: 22.180368934999933 17.282324567999922 16.706904645999884 Programmatically: 9.796613559999969 9.701168091 9.730797720000055
In this example, the join gives us the fastest results by far, followed by the programmatic loop, and then the temporary table. Your results, though, may vary depending on the table structure, indexes, resource usage, and other factors.
Somewhere out there are expert database administrators who understand all of this. I’ve simply learned to be ready to attack my problems from multiple angles to get the results I need.