Introduction
Despite being a great DBMS, PostgreSQL has a few wrinkles that can cause quite a bit of pain 🙂
One such wrinkle is the insertion of database rows with Bytea
arrays.
If you’re not dealing with PostgreSQL or don’t need to wrangle with Bytea
arrays feel free to skip this article. If you are, however, this article may save you a lot of time and frustration.
The code
Since the correct number of backslashes in the code below is important but unlikely to be displayed properly in your web browser please view it syntax highlighted here or in plain format here.
The code below consists of roughly two sections:
- the actual code of interest dealing with the backslash plague and the quoting orgy (
prepareByteaString()
, lines 49-82)
- a test harness (lines 84-118) merely facilitating the testing of the function of interest
The problem on hand is that the PostgreSQL
Array Value Input syntax requires that the Bytea
array literal be enclosed in single quotes. Psycopg2
however quotes individual byte strings using single quotes as well.
Please note: a byte string or array corresponds to one Bytea
variable. A Bytea
array is hence an array of byte arrays or strings.
The resulting Bytea
array literals are a mess and cause syntax errors when used in INSERT
statements etc. The approach chosen is to “re-quote” the Bytea
literals returned by the Psycopg2
Binary()
function from single to double quotes, For more detail please see the comments on lines 68-80.
36 # created: Thu Oct 25 21:35:50 2007
37 __version__ = "$Id:$"
38 # $HeadURL $
39
40 import psycopg2 as ps2
41
42 class PGT(object):
43 """Utility functions for using a PostgreSQL database with python"""
44
45 def __init__(self):
46 """initialiser"""
47 # super(NEW_CLASS, self).__init__()
48
49 @staticmethod
50 def prepareByteaString(byteaSeq):
51 """
52 Given a sequence of byte arrays this function prepares a properly
53 quoted string to be used for inserting database rows with Bytea
54 arrays.
55
56 Given e.g. a table like the following
57 Create table baex(byteaa Bytea ARRAY[16]);
58 the resulting string ('bas') can be used as follows:
59 cursor.execute("INSERT INTO baex(byteaa) values('{%s}')" % bas))
60
61 Parameters:
62 - byteaSeq: a sequence of byte arrays each corresponding to a Bytea
63 value in the database
64 Returns:
65 string: containing all the byte arrays from the 'byteaSeq' properly
66 quoted for utilisation in an INSERT statement
67 """
68 # in a first step
69 # 1. quote all the byte arrays (using the psycopg2 Binary()
70 # function)
71 # 2. strip away the single quotes on the left and right side
72 # 3. escape any double quote characters with a backslash
73 # The last step is necessary because we will use the double quote as
74 # the quote/delimiter for the byte arrays
75 baSeq = [str(ps2.Binary(ba))[1:-1].replace(r'"', r'\"') for ba in byteaSeq]
76 # join the prep'ed byte arrays into a single string
77 bas = "\"%s\"" % '","'.join(baSeq)
78 # double the number of backslashes (needed because we're inserting a
79 # Bytea array as opposed to a single Bytea)
80 bas = bas.replace('\\', '\\\\')
81 # done!
82 return(bas)
The code below is only executed when you invoke the Python
file directly but will not run if you import it.
84 if __name__ == '__main__':
85 ### TEST code ********************************************************
86 import os, sys
87 from random import random as rand
88
89 # connect to test database
90 db = ps2.connect("dbname='test' user='postgres'")
91 cursor = db.cursor()
92 # create test table
93 cursor.execute('Create Table public.baex(id Serial, byteaa Bytea ARRAY[16])')
For test purposes I am connecting to my test database (line 90) and creating a test table (baex
, line 93).
95 sys.stdout.flush()
96 print "\n******************** Bytea data generated: ********************"
Then I generate Bytea
data for three database rows and insert it into the database (loop on lines 98-109).
97 # generate 3 byte array sequences
98 for rowId in range(1, 4):
99 byteaSeq = []
Each row is populated with a Bytea
array holding of up to seven elements. I am using the random()
function to generate the bytes.
100 # generate 1-7 random byte arrays
101 for numOfSstrings in range(1, int(rand()*8)):
102 bytea = ''.join([chr(int(rand()*256)) for x in range(int(rand()*5))])
103 byteaSeq.append(bytea)
104 print byteaSeq
105 # get the INSERT string for the byte array sequence generated
106 bas = PGT.prepareByteaString(byteaSeq)
107 # insert the row into the table
108 cursor.execute("INSERT INTO public.baex(id, byteaa) VALUES(%s, '{%s}')" \
109 % (rowId, bas))
110 db.commit()
111 sys.stdout.flush()
Once the data is inserted into the database I run psql
to check whether everything worked properly..
113 print "\n******************** Bytea data inserted: ********************"
114 sys.stdout.flush()
115 # show the data inserted
116 os.system("psql -d test -U postgres -c 'SELECT * FROM public.baex'")
.. and finally drop the test table.
117 cursor.execute('DROP TABLE public.baex')
118 db.commit()
A few test runs
I have selected the two test runs below because they show some interesting (edge) cases.
The first one shows that single quotes are dealt with properly (last byte in last byte string of first row and first byte string of the third row)
******************** Bytea data generated: ********************
['R\xb7', '\xd7\xcd\xbb', 'u', "\xa3[\x97'"]
['\xfbT\x84g', '\xfa', '']
["'", '\x8d', '', '\xc5\n5', 'A', '\xa4P*']
******************** Bytea data inserted: ********************
id | byteaa
----+-----------------------------------------------
1 | {"R\\267","\\327\\315\\273",u,"\\243[\\227'"}
2 | {"\\373T\\204g","\\372",""}
3 | {',"\\215","","\\305\125",A,"\\244P*"}
(3 rows)
The second test run demonstrates that double quotes are handled correctly (fist byte of first byte string in first row)
******************** Bytea data generated: ********************
['"\xa5\x13', '\x9a`', '', '\xf4\x98']
[]
['u', '?7', '1\xff', '.\x16\xcf', '\xcbe}', '']
******************** Bytea data inserted: ********************
id | byteaa
----+--------------------------------------------
1 | {"\"\\245\23","\\232`","","\\364\\230"}
2 | {""}
3 | {u,?7,"1\\377",".\26\\317","\\313e}",""}
(3 rows)
Conclusion
Once you bite the bullet and invest the time to think about the problem, code the function and test it, it’s simple 🙂