Lab: Database Interfaces and SQL Injection
Table of Contents
1 Administrivia
- I have restructured the second half of the course somewhat:
- Major change: project 2 will go out after midterm break
- Lab topics and other deadlines have been adjusted accordingly
- Midterm will be posted on Monday, due 9pm Friday, April 30
- No time limit
- Open note, open book, open anything linked from the course web page, open computer (writing code/queries, doing math)
- Closed internet (other than course web page links), closed any other person besides me
- A mix of short answer and multiple choice questions covering material from the first four weeks
- Will be on Gradescope, like HW2
- Gradescope has a strange interface: saving your answers "submits" the work you've done so far. You continue working by clicking on "Resubmit". You can submit/resubmit any number of times.
2 Command-Line Interface
- This is all we've worked with up to this point
- SQLite is a "serverless" database system, meaning I just need the
sqlite3
executable and a database file- No need for the database system to already be running and configured, we're not connecting to anything
$ sqlite3 musicbrains-cs344-s21.db SQLite version 3.28.0 2019-04-15 14:49:49 Enter ".help" for usage hints. sqlite>
.help
shows us all sorts of useful things:.tables
and.indexes
to list tables and indexes.dump
to generate SQL to create tables and insert data.timer on
to display timing information after each query.headers on
to display column headers on output.excel
to direct output to a CSV file and open the file upon query completion.output FILE
to send output to a file.expert
is a neat experimental feature that will suggest useful indexes for a querysqlite> .expert sqlite> select artist.name, count(distinct artist_alias.name) as count ...> from artist, artist_alias, area ...> where artist.id = artist_alias.artist and artist.area = area.id and area.name = 'United Kingdom' and artist.begin_date_year > 1950 ...> group by artist.id ...> order by count desc ...> limit 10; CREATE INDEX artist_alias_idx_00643fc7 ON artist_alias(artist); CREATE INDEX artist_idx_f8d047e7 ON artist(area, begin_date_year); CREATE INDEX area_idx_22db824a ON area(name, id); SEARCH TABLE area USING COVERING INDEX area_idx_22db824a (name=?) SEARCH TABLE artist USING INDEX artist_idx_f8d047e7 (area=? AND begin_date_year>?) SEARCH TABLE artist_alias USING INDEX artist_alias_idx_00643fc7 (artist=?) USE TEMP B-TREE FOR GROUP BY USE TEMP B-TREE FOR ORDER BY
- A covering index is an index that contains all the attributes involved in a query. This can improve performance because a query can retrieve all the information it needs from the index and never have to go to the original data file
- Let's see how much the expert advice speeds up this query:
sqlite> .timer on sqlite> select artist.name, count(distinct artist_alias.name) as count ...> from artist, artist_alias, area ...> where artist.id = artist_alias.artist and artist.area = area.id and area.name = 'United Kingdom' and artist.begin_date_year > 1950 ...> group by artist.id ...> order by count desc ...> limit 10; name|count The Beatles|35 Judas Priest|16 Deep Purple|14 Orchestral Manoeuvres in the Dark|14 The KLF|14 Led Zeppelin|13 T. Rex|13 National Philharmonic Orchestra|13 Chris Barber’s Jazz Band|12 Heller & Farley|12 Run Time: real 2.239 user 2.154869 sys 0.074757 sqlite> CREATE INDEX artist_alias_idx_00643fc7 ON artist_alias(artist); Run Time: real 0.184 user 0.144469 sys 0.037718 sqlite> CREATE INDEX artist_idx_f8d047e7 ON artist(area, begin_date_year); Run Time: real 1.424 user 1.169061 sys 0.229029 sqlite> CREATE INDEX area_idx_22db824a ON area(name, id); Run Time: real 0.082 user 0.066150 sys 0.013522 sqlite> select artist.name, count(distinct artist_alias.name) as count ...> from artist, artist_alias, area ...> where artist.id = artist_alias.artist and artist.area = area.id and area.name = 'United Kingdom' and artist.begin_date_year > 1950 ...> group by artist.id ...> order by count desc ...> limit 10; name|count The Beatles|35 Judas Priest|16 Deep Purple|14 Orchestral Manoeuvres in the Dark|14 The KLF|14 Led Zeppelin|13 T. Rex|13 National Philharmonic Orchestra|13 Chris Barber’s Jazz Band|12 Heller & Farley|12 Run Time: real 0.094 user 0.056785 sys 0.035941
- Ok, but what did this cost us in terms of storage space?
- SQLite provides a
dbstat
table that contains information about database internalssqlite> select sum(pgsize)/(1024*1024) from dbstat where name like '%idx%'; 23
- The pages for the indexes are taking up about 23 MB
- To verify, we check the size of the
.db
file-rw-r--r-- 1 awb awb 548M Apr 23 12:06 musicbrainz-cs334-s21.db
- Delete the indexes
sqlite> drop index area_idx_22db824a; sqlite> drop index artist_idx_f8d047e7; sqlite> drop index artist_alias_idx_00643fc7;
- Check file size again
-rw-r--r-- 1 awb awb 548M Apr 23 12:17 musicbrainz-cs334-s21.db
- Hmm, unchanged. What does
dbstat
say?sqlite> select sum(pgsize)/(1024*1024) from dbstat where name like '%idx%'; sqlite> select sum(pgsize)/(1024*1024) from dbstat; 524
dbstat
indicates no pages are being used for the indexes and that the total size is 524 MB. What's going on here is that SQLite doesn't automatically compact the database file. Right now the pages that were used for the indexes are still sitting around. We can use theVACUUM
statement to tell SQLite to compact the database file into the minimum size.sqlite> VACUUM;
-rw-r--r-- 1 awb awb 525M Apr 23 12:22 musicbrainz-cs334-s21.db
- SQLite provides a
3 GUI Interface: pgAdmin
- Database systems typically provide a GUI interface as well
- For Postgres, this is the pgAdmin application
- Since Postgres is a client-server database, it needs to already be running, and we use pgAdmin to connect to it
- Can use this for all kinds of administrative tasks, monitoring, writing queries, and analyzing performance
- Even manually editing data!
- For our last lab of the term we'll take a closer look at the configuration and administrative side
- If you'd like to play around with pgAdmin, they have an online demo version: https://www.pgadmin.org/try/
4 Programming Interface: Python + sqlite3
- Of course it's very common to need to write software that interacts with a database
- There are standard APIs that mainstream database systems support
- Open Database Connectivity (ODBC) for C
- Java Database Connectivity (JDBC) for Java, based on ODBC
- These are just interfaces, and a library would implement the API for a specific database system
- Python has a built-in
sqlite3
module (https://docs.python.org/3/library/sqlite3.html), and modules exists for Postgres, MySQL, etc.
- So let's take Python + SQLite for a spin
import sqlite3 con = sqlite3.connect("musicbrainz-cs334-s21.db") q = con.execute("select * from medium_format") print(q.fetchall())
- Just running a query is something we could do without Python, so let's make it more interesting by doing something that's much easier in Python than SQLite
- For example, SQLite doesn't have a funtion to reverse a string, so if we wanted to reverse all the format names, we could do
import sqlite3 con = sqlite3.connect("musicbrainz-cs334-s21.db") q = con.execute("SELECT * FROM medium_format") formats = q.fetchall() for i, name, description in formats: print(name) con.execute(f"UPDATE medium_format SET name='{name[::-1]}' WHERE id={i}") con.commit() con.close()
- We could even get the user in on the fun
import sqlite3 con = sqlite3.connect("musicbrainz-cs334-s21.db") artist = input("What artist would you like to search for? ") q = con.execute(f"""SELECT release.name FROM release, artist_credit WHERE release.artist_credit = artist_credit.id AND artist_credit.name='{artist}'""") print(f"{artist} has these releases in the database:") for name in q: print(name) con.commit() con.close()
- The way I have handled user input is a BAD IDEA™
$ python3 sqlite-fun.py What artist would you like to search for? ' UNION SELECT name from release_status; -- ' UNION SELECT name from release_status; -- has these releases in the database: ('Bootleg',) ('Official',) ('Promotion',) ('Pseudo-Release',)
- This is called a SQL injection attack (https://xkcd.com/327/)
- The user was able to gain access to unrelated information in the database
- Imagine if there was private customer data or credit card information or passwords stored in another table…
- We can rewrite the Python to protect against this
import sqlite3 con = sqlite3.connect("musicbrainz-cs334-s21.db") con.set_trace_callback(print) artist = input("What artist would you like to search for? ") q = con.execute(f"""SELECT release.name FROM release, artist_credit WHERE release.artist_credit = artist_credit.id AND artist_credit.name=?""", (artist,)) print(f"{artist} has these releases in the database:") for name in q: print(name) con.commit() con.close()
5 SQL Injection
A great site about SQL Injection with examples and explanations: http://databases.cs.virginia.edu/sqlinject/