Fun with PostgreSQL, Psycopg2 and Bytea arrays


Despite being a great DBMS, PostgreSQL has a few wrinkles that can cause quite a bit of pain šŸ™‚

One such wrinkle is the insertion of database rows with Bytea arrays.

If you’re not dealing with PostgreSQL or don’t need to wrangle with Bytea arrays feel free to skip this article. If you are, however, this article may save you a lot of time and frustration.

The code

Since the correct number of backslashes in the code below is important but unlikely to be displayed properly in your web browser please view it syntax highlighted here or in plain format here.

The code below consists of roughly two sections:

  • the actual code of interest dealing with the backslash plague and the quoting orgy (prepareByteaString(), lines 49-82)
  • a test harness (lines 84-118) merely facilitating the testing of the function of interest

The problem on hand is that the PostgreSQL Array Value Input syntax requires that the Bytea array literal be enclosed in single quotes. Psycopg2 however quotes individual byte strings using single quotes as well.

Please note: a byte string or array corresponds to one Bytea variable. A Bytea array is hence an array of byte arrays or strings.

The resulting Bytea array literals are a mess and cause syntax errors when used in INSERT statements etc. The approach chosen is to “re-quote” the Bytea literals returned by the Psycopg2 Binary() function from single to double quotes, For more detail please see the comments on lines 68-80.

  36 # created: Thu Oct 25 21:35:50 2007
  37 __version__ = "$Id:$"
  38 # $HeadURL $
  40 import psycopg2 as ps2
  42 class PGT(object):
  43     """Utility functions for using a PostgreSQL database with python"""
  45     def __init__(self):
  46         """initialiser"""
  47         # super(NEW_CLASS, self).__init__()
  49     @staticmethod
  50     def prepareByteaString(byteaSeq):
  51         """
  52         Given a sequence of byte arrays this function prepares a properly
  53         quoted string to be used for inserting database rows with Bytea
  54         arrays.
  56         Given e.g. a table like the following
  57             Create table baex(byteaa Bytea ARRAY[16]);
  58         the resulting string ('bas') can be used as follows:
  59             cursor.execute("INSERT INTO baex(byteaa) values('{%s}')" % bas))
  61         Parameters:
  62         - byteaSeq: a sequence of byte arrays each corresponding to a Bytea
  63                     value in the database
  64         Returns:
  65         string: containing all the byte arrays from the 'byteaSeq' properly
  66                 quoted for utilisation in an INSERT statement
  67         """
  68         # in a first step
  69         #   1. quote all the byte arrays (using the psycopg2 Binary()
  70         #      function)
  71         #   2. strip away the single quotes on the left and right side
  72         #   3. escape any double quote characters with a backslash
  73         # The last step is necessary because we will use the double quote as
  74         # the quote/delimiter for the byte arrays
  75         baSeq = [str(ps2.Binary(ba))[1:-1].replace(r'"', r'\"') for ba in byteaSeq]
  76         # join the prep'ed byte arrays into a single string
  77         bas = "\"%s\"" % '","'.join(baSeq)
  78         # double the number of backslashes (needed because we're inserting a
  79         # Bytea array as opposed to a single Bytea)
  80         bas = bas.replace('\\', '\\\\')
  81         # done!
  82         return(bas)

The code below is only executed when you invoke the Python file directly but will not run if you import it.

  84 if __name__ == '__main__':
  85     ### TEST code ********************************************************
  86     import os, sys
  87     from random import random as rand
  89     # connect to test database
  90     db = ps2.connect("dbname='test' user='postgres'")
  91     cursor = db.cursor()
  92     # create test table
  93     cursor.execute('Create Table public.baex(id Serial, byteaa Bytea ARRAY[16])')

For test purposes I am connecting to my test database (line 90) and creating a test table (baex, line 93).

  95     sys.stdout.flush()
  96     print "\n******************** Bytea data generated: ********************"

Then I generate Bytea data for three database rows and insert it into the database (loop on lines 98-109).

  97     # generate 3 byte array sequences
  98     for rowId in range(1, 4):
  99         byteaSeq = []

Each row is populated with a Bytea array holding of up to seven elements. I am using the random() function to generate the bytes.

 100         # generate 1-7 random byte arrays
 101         for numOfSstrings in range(1, int(rand()*8)):
 102             bytea = ''.join([chr(int(rand()*256)) for x in range(int(rand()*5))])
 103             byteaSeq.append(bytea)
 104         print byteaSeq
 105         # get the INSERT string for the byte array sequence generated
 106         bas = PGT.prepareByteaString(byteaSeq)
 107         # insert the row into the table
 108         cursor.execute("INSERT INTO public.baex(id, byteaa) VALUES(%s, '{%s}')" \
 109                        % (rowId, bas))
 110     db.commit()
 111     sys.stdout.flush()

Once the data is inserted into the database I run psql to check whether everything worked properly..

 113     print "\n******************** Bytea data inserted: ********************"
 114     sys.stdout.flush()
 115     # show the data inserted
 116     os.system("psql -d test -U postgres -c 'SELECT * FROM public.baex'")

.. and finally drop the test table.

 117     cursor.execute('DROP TABLE public.baex')
 118     db.commit()

A few test runs

I have selected the two test runs below because they show some interesting (edge) cases.

The first one shows that single quotes are dealt with properly (last byte in last byte string of first row and first byte string of the third row)

******************** Bytea data generated: ********************
['R\xb7', '\xd7\xcd\xbb', 'u', "\xa3[\x97'"]
['\xfbT\x84g', '\xfa', '']
["'", '\x8d', '', '\xc5\n5', 'A', '\xa4P*']

******************** Bytea data inserted: ********************
 id |                    byteaa
  1 | {"R\\267","\\327\\315\\273",u,"\\243[\\227'"}
  2 | {"\\373T\\204g","\\372",""}
  3 | {',"\\215","","\\305\125",A,"\\244P*"}
(3 rows)

The second test run demonstrates that double quotes are handled correctly (fist byte of first byte string in first row)

******************** Bytea data generated: ********************
['"\xa5\x13', '\x9a`', '', '\xf4\x98']
['u', '?7', '1\xff', '.\x16\xcf', '\xcbe}', '']

******************** Bytea data inserted: ********************
 id |                   byteaa
  1 | {"\"\\245\23","\\232`","","\\364\\230"}
  2 | {""}
  3 | {u,?7,"1\\377",".\26\\317","\\313e}",""}
(3 rows)


Once you bite the bullet and invest the time to think about the problem, code the function and test it, it’s simple šŸ™‚

7 thoughts on “Fun with PostgreSQL, Psycopg2 and Bytea arrays

  1. What a mess. I decided to never use this escaping hell on any postgresql interfaces and just use base64 encoding, which btw is also smaller for binary data.
    So i just use the postgres base64 encode and decode functions like this:

    CREATE TABLE btable (bvalue bytea);
    INSERT INTO btable (bvalue) values(decode(‘%s’,’base64′));
    SELECT encode(bvalue,’base64′) FROM btable;

    Going to arrays this way is a no brainer.


  2. The upcoming PostgreSQL 9.0 release uses a different, more sensible hexadecimal format. That makes the problem go away: the escaped version of the string contains only a two-character header followed by “sensible” characters.

  3. @Olaf H.

    Escaping is a pain, so one usually uses facilities like Binary in psycopg2 to handle them. Base64 is NOT smaller than using raw bytes because a byte has a base of 256.

  4. @muharem, @Randall Smith: I’m a bit confused by this. I, too, thought that psycopg2.Binary() was supposed to take care of this, but the author of this blog post seems to be calling Binary() and *then* escaping… Is the second step necessary, then? Can someone make this clear?

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s