    "<a href=\"https://colab.research.google.com/github/Animeshcoder/MySQL-Python/blob/main/Python_MySQL_P6.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "### **Introduction:**\n",
    "This project is a Python script that demonstrates how to identify and remove duplicate rows from a MySQL table. The script defines a list of SQL statements that perform this task in three steps. The first step uses a self-join to find pairs of rows with the same values in the `Name`, `Phone No`, and `Age` columns. The second step fills a NULL `Date of Birth` in the row that will be kept with the value from its duplicate. The third step deletes the duplicates, keeping only the row with the smallest `id` in each group. The project can serve as a starting point for more complex pipelines that clean and deduplicate data in MySQL.\n",
    "\n"
   ],
   "metadata": {
    "id": "cstqU4qOs9_7"
   }
  },
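The three SQL statements described above can be modelled in plain Python to make the intended result concrete. The sketch below is a hypothetical in-memory model and is not part of the original script. It assumes each row is a dict with `id`, `Name`, `Phone No`, `Age` and `Date of Birth` keys, and it uses `None` in place of SQL NULL. It also assumes the key columns themselves are never NULL. When a group contains several duplicates, the model takes the first non-null value in `id` order. The SQL version does not guarantee a particular order.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Columns that define a duplicate, matching the ON clause of the self-join
KEY_COLUMNS = ("Name", "Phone No", "Age")


def deduplicate(rows: List[Row]) -> List[Row]:
    """Hypothetical model of the three-step cleanup (None plays SQL NULL).

    1. Group rows that share Name, Phone No and Age (the self-join).
    2. Fill a missing 'Date of Birth' in the lowest-id row from a duplicate (the UPDATE).
    3. Keep only the lowest-id row of each group (the DELETE ... MIN(id)).
    """
    groups: Dict[Tuple[Any, ...], List[Row]] = {}
    for row in sorted(rows, key=lambda r: r["id"]):
        key = tuple(row[col] for col in KEY_COLUMNS)
        groups.setdefault(key, []).append(row)

    survivors: List[Row] = []
    for members in groups.values():
        keeper = dict(members[0])  # lowest id, because rows were sorted by id; copy so input is untouched
        if keeper["Date of Birth"] is None:
            # Take the first non-null value from a higher-id duplicate, if any
            for other in members[1:]:
                if other["Date of Birth"] is not None:
                    keeper["Date of Birth"] = other["Date of Birth"]
                    break
        survivors.append(keeper)
    return survivors
```

For example, rows with ids 1 and 2 that share the same key collapse to row 1. If row 1 has no `Date of Birth` of its own, it inherits the value from row 2.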
  {
   "cell_type": "markdown",
   "source": [
    "### **Steps Involved:**\n",
    "\n",
    "**Step 1: Connect to the database**\n",
    "\n",
    "The script first stores the MySQL connection details (`user`, `password`, `host`, and `database`) in a dictionary. It then passes them to PyMySQL's `connect()` function, unpacked as keyword arguments with `**db_config`, to create a connection object.\n",
    "\n",
    "**Step 2: Define SQL statements**\n",
    "\n",
    "Next, the script defines a list of SQL statements that identify and remove the duplicate rows. The first is a `SELECT` that uses a self-join to list pairs of rows with the same `Name`, `Phone No`, and `Age`. The condition `t1.id < t2.id` excludes self-matches and reports each pair only once. The second is an `UPDATE` that fills a NULL `Date of Birth` in the lower-`id` row of each pair with the value from the other row. The third is a `DELETE` that removes every row except the one with the smallest `id` in each group.\n",
    "\n",
    "**Step 3: Execute SQL statements**\n",
    "\n",
    "A cursor is created with the connection object's `cursor()` method. The script uses it to execute the first statement and fetch the whole result set with `fetchall()`. It then executes the remaining statements one by one in a loop.\n",
    "\n",
    "**Step 4: Commit changes and close connection**\n",
    "\n",
    "Once all statements have run, the changes are committed with the connection object's `commit()` method. Finally, the cursor and the connection are closed with their `close()` methods."
   ],
   "metadata": {
    "id": "w_hb5MRXSAWL"
   }
  },
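The `UPDATE` in Step 2 only fills `Date of Birth`. When more columns can be NULL, one option is to generate the statements instead of writing each one by hand. The helper below is a hypothetical sketch and is not part of the project. The names `quote_ident` and `build_dedup_statements`, their parameters, and the extra `Email` column in the usage example are all assumptions. The helper wraps every identifier in MySQL backticks, which also stops reserved words such as `table` from breaking the query.

```python
from typing import List, Sequence


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_dedup_statements(table: str, key_cols: Sequence[str],
                           fill_cols: Sequence[str], id_col: str = "id") -> List[str]:
    """Hypothetical helper returning [SELECT, UPDATE, DELETE] for the three steps.

    Names are inserted as quoted identifiers, not as bound values,
    so only pass trusted table and column names.
    """
    t = quote_ident(table)
    pk = quote_ident(id_col)
    # Self-join condition on every key column
    on = " AND ".join(f"t1.{quote_ident(c)} = t2.{quote_ident(c)}" for c in key_cols)
    # Fill each NULL column in the lower-id row from its duplicate
    sets = ", ".join(
        f"t1.{quote_ident(c)} = COALESCE(t1.{quote_ident(c)}, t2.{quote_ident(c)})"
        for c in fill_cols
    )
    group_by = ", ".join(quote_ident(c) for c in key_cols)

    select_sql = f"SELECT t1.*, t2.* FROM {t} t1 JOIN {t} t2 ON {on} WHERE t1.{pk} < t2.{pk}"
    update_sql = f"UPDATE {t} t1 JOIN {t} t2 ON {on} SET {sets} WHERE t1.{pk} < t2.{pk}"
    # Wrapping the MIN(id) query in a derived table lets MySQL read the table it deletes from
    delete_sql = (f"DELETE FROM {t} WHERE {pk} NOT IN "
                  f"(SELECT * FROM (SELECT MIN({pk}) FROM {t} GROUP BY {group_by}) AS x)")
    return [select_sql, update_sql, delete_sql]
```

For example, `build_dedup_statements("ntable", ["Name", "Phone No", "Age"], ["Date of Birth", "Email"])` produces one `SET` clause per fill column. The resulting list could then replace the hand-written `sql_statements`.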
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "34Oi4IcrslSU"
   },
   "outputs": [],
   "source": [
    "import pymysql\n",
    "\n",
    "# Replace these values with your MySQL connection details\n",
    "db_config = {\n",
    "    'user': 'youruser',\n",
    "    'password': 'yourpassword@123',\n",
    "    'host': 'yourhost',\n",
    "    'database': 'yourdatabasename'\n",
    "}\n",
    "\n",
    "# Connect to the MySQL database\n",
    "cnx = pymysql.connect(**db_config)\n",
    "cursor = cnx.cursor()\n",
    "\n",
    "# Define the SQL statements (replace `ntable` with the name of your table)\n",
    "sql_statements = [\n",
    "\"\"\"\n",
    "    -- Step 1: Identify matching rows using a self-join\n",
    "    SELECT t1.*, t2.*\n",
    "    FROM ntable t1\n",
    "    JOIN ntable t2\n",
    "      ON t1.`Name` = t2.`Name` AND t1.`Phone No` = t2.`Phone No` AND t1.`Age` = t2.`Age`\n",
    "    WHERE t1.id < t2.id;\n",
    "\"\"\",\n",
    "\"\"\"\n",
    "    -- Step 2: Fill a NULL Date of Birth in the lower-id row from its duplicate\n",
    "    UPDATE ntable t1\n",
    "    JOIN ntable t2\n",
    "      ON t1.`Name` = t2.`Name` AND t1.`Phone No` = t2.`Phone No` AND t1.`Age` = t2.`Age`\n",
    "    SET t1.`Date of Birth` = COALESCE(t1.`Date of Birth`, t2.`Date of Birth`)\n",
    "    WHERE t1.id < t2.id;\n",
    "\"\"\",\n",
    "\"\"\"\n",
    "    -- Step 3: Delete duplicates, keeping the row with the smallest id in each group\n",
    "    -- (the derived table x avoids MySQL error 1093 when a subquery reads the table being deleted from)\n",
    "    DELETE FROM ntable\n",
    "    WHERE id NOT IN (\n",
    "        SELECT * FROM (\n",
    "            SELECT MIN(id)\n",
    "            FROM ntable\n",
    "            GROUP BY `Name`, `Phone No`, `Age`\n",
    "        ) AS x\n",
    "    );\n",
    "\"\"\"\n",
    "]\n",
    "\n",
    "# Execute the first SQL statement and fetch all rows of the result set\n",
    "cursor.execute(sql_statements[0])\n",
    "rows = cursor.fetchall()\n",
    "\n",
    "# Execute the remaining SQL statements one by one\n",
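Steps 3 and 4 describe the rest of the flow: run the first statement, fetch its rows, run the remaining statements, commit, and close. The sketch below implements that flow and also rolls back if a statement fails. It relies only on the DB-API 2.0 methods (`cursor()`, `execute()`, `fetchall()`, `commit()`, `rollback()`, `close()`) that PyMySQL connections provide, so it can be tried with Python's built-in `sqlite3` module. The names `run_dedup`, `SQLITE_STATEMENTS` and `demo`, and the `people` table, are hypothetical. The SQLite statements are stand-ins for the MySQL ones. SQLite has no `UPDATE ... JOIN`, so the fill step uses a correlated subquery instead.

```python
import sqlite3
from contextlib import closing
from typing import Any, List, Sequence, Tuple


def run_dedup(conn: Any, statements: Sequence[str]) -> List[Any]:
    """Run statements[0] as a query, then the rest, then commit.

    Uses only DB-API 2.0 calls, so a pymysql or sqlite3 connection works.
    Rolls back if any statement fails and always closes the cursor;
    the caller owns (and closes) the connection.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(statements[0])
        duplicates = cursor.fetchall()  # pairs found by the self-join
        for sql in statements[1:]:
            cursor.execute(sql)  # fill NULLs, then delete duplicates
        conn.commit()
        return duplicates
    except Exception:
        conn.rollback()  # undo partial changes if any step fails
        raise
    finally:
        cursor.close()


# SQLite stand-ins for the three MySQL statements (assumption: SQLite lacks
# UPDATE ... JOIN, so the fill step uses a correlated subquery instead).
SQLITE_STATEMENTS = [
    """SELECT t1.id, t2.id FROM people t1 JOIN people t2
         ON t1.name = t2.name AND t1.phone = t2.phone AND t1.age = t2.age
       WHERE t1.id < t2.id""",
    """UPDATE people SET dob = (
         SELECT t2.dob FROM people t2
         WHERE t2.name = people.name AND t2.phone = people.phone
           AND t2.age = people.age AND t2.id > people.id AND t2.dob IS NOT NULL
         ORDER BY t2.id LIMIT 1)
       WHERE dob IS NULL""",
    """DELETE FROM people WHERE id NOT IN (
         SELECT MIN(id) FROM people GROUP BY name, phone, age)""",
]


def demo() -> Tuple[List[Any], List[Any]]:
    """Deduplicate a tiny hypothetical 'people' table in memory; return pairs and survivors."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT,"
                     " phone TEXT, age INTEGER, dob TEXT)")
        conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", [
            (1, "Ann", "555", 30, None),
            (2, "Ann", "555", 30, "1994-01-01"),
            (3, "Bob", "777", 40, "1984-05-05"),
        ])
        conn.commit()
        pairs = run_dedup(conn, SQLITE_STATEMENTS)
        remaining = conn.execute("SELECT id, name, dob FROM people ORDER BY id").fetchall()
        return pairs, remaining
```

Against MySQL, the same function could be called with the notebook's own objects. For example, `with closing(pymysql.connect(**db_config)) as cnx: rows = run_dedup(cnx, sql_statements)` closes the connection even if a statement raises.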