Retrieving data with Perl DBI DBD::ODBC

Contents

Prerequisites

The prerequisite software for this tutorial is:

  1. Perl

    We used Perl 5.8, but you only need the minimum version required by the DBI and DBD::ODBC modules, which is currently 5.6. Use perl --version to find out what version of Perl you have installed.

  2. DBI module

    We used DBI 1.45 and DBI 1.607, but the samples in this tutorial should work with anything after 1.40. To find out if you have a recent enough version of DBI installed, run:

    perl -e 'use DBI 1.40;'

    If you get an error like "DBI version 1.40 required--this is only version 1.30 at -e line 1," you need to upgrade DBI.

    If you get an error saying "DBI cannot be found in @INC," you have probably not got DBI installed.

    Go to CPAN to get an up-to-date version of the DBI module.

  3. DBD::ODBC

    We used DBD::ODBC 1.11 and DBD::ODBC 1.17. You can use similar methods to the ones shown above to check whether DBD::ODBC is installed and to see what version you have.

    To check that you have the DBD::ODBC module installed:

    perl -e 'use DBD::ODBC;'

    If you have not got DBD::ODBC installed, refer to Enabling ODBC support in Perl with Perl DBI and DBD::ODBC for installation instructions.

    To show the DBD::ODBC version:

    perl -MDBD::ODBC -e 'print $DBD::ODBC::VERSION;'

    To show all drivers DBI knows about and their versions:

    perl -MDBI -e 'DBI->installed_versions;'

We also used the Easysoft ODBC-ODBC Bridge as the ODBC driver to access a remote Microsoft SQL Server database from UNIX.

Assumptions

  1. Operating system

    This tutorial was designed on UNIX and Linux, and we have assumed you are using UNIX or Linux too. However, all the Perl examples should work equally well on Windows as long as some minor alterations to the command line are made.

  2. ODBC Driver Manager

    We assume that you're using the unixODBC Driver Manager.

Perl DBI and DBD::ODBC architecture

Perl script using DBI methods
            |
            v
           API
            |
            v
           DBI
            |
            v
         DBD::ODBC
            |
            v
    ODBC Driver Manager (for example, unixODBC)
            |
            v
        ODBC driver

Simple methods of retrieving data

Specifying what you want

You specify the data you want to retrieve from the database by using SQL SELECT statements. The basic form of a SQL SELECT statement is:

SELECT column_list from table

where:

You can qualify the rows you want back by adding a WHERE clause like:

WHERE column = 'text'

Your SQL SELECT statement is passed to the DBI prepare method, which, in ODBC terms, ends up in a call to the SQLPrepare or SQLExecDirect ODBC APIs.

Be careful when creating SQL from Perl variables, as you can include characters that invalidate the SQL. For example:

my $name = "can't";
my $sql = qq/select * from table where column = '$name'/;

$sql is invalid because it's now:

select * from table where column = 'can't'

Most databases would require the second ' to be doubled up. You can use the DBI quote method to do this for you with:

my $name = $dbh->quote("can't");
my $sql = qq/select * from table where column = $name/;

The other way (recommended) is to use bound parameters.

Simple prepare and execute

This code obtains and displays all the rows in a table:

my $dbh = DBI->connect();               # connect
my $sql = qq/select * from table/;      # the query to execute
my $sth = $dbh->prepare($sql);          # prepare the query
$sth->execute();                        # execute the query
my @row;
while (@row = $sth->fetchrow_array) {  # retrieve one row
    print join(", ", @row), "\n";
}

Here, we prepare the SQL, execute it, and then use the fetchrow_array method to return an array of column values for each row. Repeatedly calling the fetchrow_array method until it returns undef, retrieves all rows in the result set.

Note NULL fields are returned as undef in the array returned by fetchrow_array (more about NULL values later).

Note Don't use fetchrow_array in a scalar context unless you really know what you are doing, as it can return the first or last column (dependent on the ODBC driver) which could be a NULL value and therefore undef. Also fetchrow_array can return undef if there are no more rows or if an error occurs.

There are other methods of obtaining the result set — refer to later sections in this tutorial.

Obtaining metadata from the result set

ODBC can describe the result set for you by listing the number of columns and the names of the columns. You can obtain this information with the statement attributes NUM_OF_FIELDS, NAME, NAME_uc, NAME_lc, NAME_hash, NAME_lc_hash, and NAME_uc_HASH.

Assuming you have the table fred with columns a (integer), b (char(10)) and c (float):

my $sth = $dbh->prepare(select * from fred);

$sth->{NUM_OF_FIELDS} returns 3.
$sth->{NAME}->[0] returns 'a' although it could return 'A'
$sth->{NAME_uc}->[0] returns 'A'.
$sth->{NAME_lc}->[0] returns 'a'.

Note The NAME attribute can return lower case or upper case column names depending on the database. Some database always uppercase column names in CREATE statements if they are unquoted. Some databases retain the case of column names if they are enclosed in the identifier quote character. For this reason, it's best to use NAME_uc or NAME_lc.

Note. Some ODBC drivers may be unable to return a column name at all if the column is an expression like 1 or a function. For example, SELECT 1 FROM table in SQL Server returns an empty string for $sth-{NAME}->[0]. You can get around this by using column aliases as in SELECT 1 as col1 FROM table.

NAME_hash, NAME_lc_hash, and NAME_uc_hash are like NAME, NAME_lc and NAME_uc except that the result is a hash instead of an array, with the keys being the column names, and the values being the column index starting at 0.

Similarly, the TYPE attribute returns an array reference of column types. For the fred table above:

print join(", ", @{$sth->{TYPE}}), "\n";

returns 4, 1, and 6. The column types are defined by international standards (refer to the DBI manual).

Using parameters

The main reasons for using parameters are:

  1. You can prepare the SQL once then execute many times with different parameters thus skipping the prepare parsing.
  2. With bound parameters, you don't need to bother about quoting issues.

Expanding the simple example in Simple prepare and execute, to include a WHERE clause, we have:

my $dbh = DBI->connect();              # connect
my $var = $dbh->quote("value to search for");
my $sql = qq/select * from table where column = $var/;
                                        # the query to execute
my $sth = $dbh->prepare($sql);         # prepare the query
$sth->execute();                       # execute the query
my @row;
while (@row = $sth->fetchrow_array) { # retrieve one row
    print join(", ", @row), "\n";
}

This is fine but what if we want to execute the query multiple times with different values for $var? We can repeat the $sql assignment with different $var values and re-run $dbh->prepare, but this is inefficient because it causes the ODBC driver and database to re-parse the SQL each time, which is unnecessary.

A better solution is to use the following:

my @row;                                # row data
my $dbh = DBI->connect();              # connect
my $var = "value to search for";
my $sql = qq/select * from table where column = ?/;
                                        # the query to execute with parameter
my $sth = $dbh->prepare($sql);         # prepare the query
$sth->execute(($var));                 # execute the query with parameter
while (@row = $sth->fetchrow_array) { # retrieve one row
    print join(", ", @row), "\n";
}
$var = "another value to search for";
$sth->execute(($var));                 # execute the query (no need to re-prepare)
while (@row = $sth->fetchrow_array) { # retrieve one row
    print join(", ", @row), "\n";
}

Here, the SQL contains a parameter marker (?) indicating to the ODBC driver that we will provide the values later. The SQL is passed to the prepare method and the database will parse the SQL and note a parameter is required. When the execute method is called, we pass the parameters required. You can use multiple parameter markers, for example:

$sql = q/select * from table where col1 = ? and col2 = ?/;
.
.
$sth->execute(($param1, $param2));

Note You must pass the array of parameters in the order that you want them to match the parameter markers in the SQL. In the above example, $param1 substitutes the first parameter marker in col1 = ? and $param2 substitutes the parameter marker in col2 = ?.

Note You can't use parameter markers in place of column names. For example, select ? from table or select * from table where ? = 'A'.

When passing the parameters to the execute method, a default binding type is used. DBD::ODBC attempts to find out what the parameter markers represent and bind the parameters as the same type. If your ODBC driver does not have the ODBC API SQLDescribeParam, a string binding is used (SQL_VARCHAR). You can also bind the parameters yourself before the execute call, and in this way, you can specify how the parameters should be bound:

# import SQL types for use in bind_param method:
use DBI qw(:sql_types);

$sql = q/select * from table where col1 = ?/;
$sth = $dbh->prepare($sql);

# uses default bind type:
$sth->bind_param(1, $param1);
# Specify the parameter should be bound as an SQL_INTEGER
$sth->bind_param(1, $param1, {TYPE => SQL_INTEGER});

Note Once the parameters are bound, you can't change the type they are bound as, although you can obviously change the bound values.

Using bound columns

In previous examples in this tutorial, we used the fetchrow_array method to retrieve the result set. DBI also provides a method to bind the returned column data to Perl variables using the bind_col and bind_columns methods:

Going back to the first example, we had:

my $dbh = DBI->connect();             # connect
my $sql = qq/select * from table/;     # the query to execute
my $sth = $dbh->prepare($sql);         # prepare the query
$sth->execute();                       # execute the query
my @row;
while (@row = $sth->fetchrow_array) { # retrieve one row
    print join(", ", @row), "\n";
}

which can be rewritten to use bound columns as follows:

my $dbh = DBI->connect();             # connect
my $sql = qq/select * from table/;    # the query to execute
my $sth = $dbh->prepare($sql);        # prepare the query
$sth->execute();                      # execute the query
my ($col1, $col2, $col3);
# Note columns start at 1 (not 0).
$sth->bind_col(1, \$col1);
$sth->bind_col(2, \$col2);
$sth->bind_col(3, \$col3);
while ($sth->fetch) { # retrieve one row
    print "$col1, $col2, $col3\n";
}

Binding columns and using the fetch method is usually faster than using methods like fetchrow_array. As with bind_param, you can specify the type the column is bound as.

$sth->bind_col(1, \$col1, {TYPE => SQL_INTEGER});
$sth->bind_col(2, \$col2, {TYPE => SQL_CHAR});$sth->bind_col(3, \$col3, {TYPE => SQL_FLOAT});

In ODBC, there's usually no need to specify the type. Instead of separate calls to bind_col, you can bind all the columns in the above example in one go with:

$sth->bind_columns(\$col1, \$col2, \$col3);

–Or–

$sth->bind_columns(\($col1, $col2, $col3));

Hash and reference methods of retrieving data

DBI supports a number of methods that return a result set (or part of it) in hash or array references. The following examples assume you have already prepared and executed a SELECT on a table containing two rows with two columns named a and b:

column a | Column b
---------+---------
       1 |      one
       2 |      two

fetchrow_arrayref

my $ref;
while($ref = $sth->fetchrow_arrayref) {
    print join (", ", @{$ref}), "\n";
}
1, one  
2, two

fetchrow_hashref

my $ref;
while($ref = $sth->fetchrow_hashref) {
        print join (", ", keys %$ref), "\n";
        print join (", ", values %$ref), "\n";
}

a, b
1, one
a, b
2, two

fetchall_arrayref

#
# You must set RaiseError or check $sth->err because
# fetchall_arrayref returns the data fetched so far.
#
$sth->{RaiseError} = 1;
my $ref;
$ref = $sth->fetchall_arrayref;
print "Number of rows returned is ", 0 + @{$ref}, "\n";
foreach $r (@{$ref})
{
    print join(", ", @{$r}), "\n";
}
1, one
2, two

You can add a slice to fetchall_arrayref to specify which columns to return:

  • To return all columns as above:
    $ref = $sth->fetchall_arrayref([]);
    1, one
    2, two
  • To return only the first column:
    $ref = $sth->fetchall_arrayref([0]);
    1
    2
  • To return only the last column:
    $ref = $sth->fetchall_arrayref([-1]);
    one
    two
  • To return each row as a hashref:
    $ref = $sth->fetchall_arrayref({});
    print "Number of rows returned is ", 0 + @{$ref}, "\n";
    print join (", ", keys %{$ref->[0]}), "\n";
    foreach $r (@{$ref})
    {
        print join(", ", (values %$r)), "\n";
    }
    print "Number of rows returned is ", 0 + @{$ref}, "\n";
    print join (", ", keys %{$ref->[0]}), "\n";
    foreach $r (@{$ref})
    {
        print join(", ", (values %$r)), "\n";
    }
    
    Number of rows returned is 2
    a
    1
    2

    Note When specifying the names of the columns to return in this way, the case of keys for the returned hashes always matches the case used in the parameter to fetchall_hashref, regardless of what FetchKeyHashName is set to.

You can add a max_rows argument to fetchall_arrayref to restrict the number of rows:

$ref = $sth->fetchall_arrayref([], 1);
1, one

This method is particularly useful if:

  • Your database engine does not support the top n SQL syntax.
  • You want the increased performance of fetchall_arrayref.
  • You want to keep memory down but also need to process all the rows: fetchall_arrayref can be called repeatedly on the same result set to get rows in chunks.

A rather elegant way of doing this, shown in the DBI documentation, is:

$sth->{RaiseError} = 1;
my $rows = []; # cache for batches of rows
while(my $row = (shift(@$rows) || # get row from cache, or reload cache:
                 shift(@{$rows=$sth->fetchall_arrayref(undef,1)||[]}))) {
    print join(", ", @{$row}), "\n";
}
1, one       
2, two

Special cases

SQL_CHAR types and trailing spaces

Databases store char(n) columns as exactly n characters. For example, you insert FRED into a char(10) column. When you retrieve the column value, you'll get 'FRED '. This often leads to confusion, especially in tests like $var eq 'FRED'. With the following table definition and INSERT statements:

create table tut2_8 (a int, b char(10))
insert into tut2_8 values (1, 'one')
insert into tut2_8 values (2, 'two')

the following code never prints out Found 'two'.

my ($col1, $col2);
$sth->bind_columns(\($col1, $col2));
while ($sth->fetch) { # retrieve one row
    print "$col1, $col2\n";
    print "Found 'two'" if ($col2 eq 'two');
}

The ChopBlanks attribute is provided to help you here. ChopBlanks may be set on the connection or statement handle and is inherited from connections. The default for ChopBlanks is false. If you insert $sth->{ChopBlanks} = 1; before the call to execute, the above test now works.

Note ChopBlanks only works on fixed-length CHAR columns.

Long columns

Suppose you create a table with the following code:

my $dbh = DBI->connect();             # connect
$dbh->do(q/create table tut2_9 (a int, b text)/);
my $sth = $dbh->prepare(q/insert into tut2_9 values (1, ?)/);
my $p = 'x' x 500;
$sth->execute($p);

The text column type here is a SQL Server long data type. Other databases have memo, blob, and so on.

The following data retrieval code:

$sth = $dbh->prepare(q/select * from tut2_9/);
$sth->execute;
my ($col1, $col2);
$sth->bind_columns(\($col1, $col2));
while ($sth->fetch) { # retrieve one row
    print "$col1, $col2\n";
}

fails with:

DBD::ODBC::st fetch failed: [unixODBC][][Microsoft]
[ODBC SQL Server Driver]
String data, right truncation (SQL-01004)
(DBD: st_fetch/SQLFetch 
(long truncated DBI attribute LongTruncOk not set 
and/or LongReadLen too small) err=-1) at ./tut2_9.pl line 19.

The attributes LongReadLen and LongTruncOk control how long columns are retrieved. By default, DBD::ODBC has LongReadLen set to 80 and LongTruncOk is false. You can find the current settings with:

print "LongReadLen is '", $h->{LongReadLen}, "'\n";
print "LongTruncOk is ", $h->{LongTruncOk}, "\n";

When LongTruncOk is false (as above), attempts to retrieve more than LongReadLen bytes from a column fail. If you're not bothered about the column data being truncated, set LongTrunkOk to 1. If you need more than 80 bytes from the column, you need to set LongReadLen. If you want long columns ignored and no data fetched from them (undef returned instead) then you need to set LongReadLen to 0 and LongTruncOk to true.

Handling NULL data

NULL in SQL

Columns that have the NULL value are special. NULL is not the same as an empty string. You need to be careful using comparisons on columns that are nullable. For instance:

Given the table CREATE TABLE fred (a int, b char(1) null):

Col1 | Col2
-----+-----
   1 |    Y
   2 |    N
   3 | NULL

SELECT * FROM fred WHERE col2 = 'Y' returns:

col1 | col2
-----+-----
   1 |    Y

and SELECT * FROM fred WHERE col2 > 'Y' returns:

col1 | col2
-----+-----
   1 |    N

Some people expect the result of the second query to include row 3 because they think col2 > 'Y' includes NULL; it doesn't. In relational operations, unlike binary operations, there are three states that an operation can result in: TRUE, FALSE, and UNKNOWN. The last state is the one that needs a closer look at.

The UNKNOWN state occurs for any logical operation where the result cannot be evaluated as either TRUE or FALSE, and is connected with the NULL concept.

A NULL in a SQL table, can be regarded as a placeholder for missing information, it's not a value as such, just a marker to indicate the lack of information. So in the query for all rows where col2 is not equal to Y, the rows containing a NULL are not returned. This is because the answer to the question "Does the col2 column contain values not equal to Y" returns FALSE, as the column contains the unknown NULL.

The normal way to select NULL columns in SQL syntax is using column is null (the opposite being column is not null). So to find all the rows in table Fred where col2 has a NULL value, use:

select * from fred where col2 is null

NULL in Perl

In Perl DBI, NULL column values are represented by undef (the undefined value). As a result, the following Perl run against the table above:

$sth = $dbh->prepare(q/select * from tut2_19/);
$sth->execute;

my @row;
while(@row = $sth->fetchrow_array) {
        print join(", ", @row), "\n";
}
1, Y
2, N
3,

results in "Use of uninitialised value in join or string at XXX.pl line NNN," if you have warnings turned on (and you should). Use either of the following examples to avoid this:

# print out row
print DBI::neat_list(\@row), "\n";

or

# change all NULLs to empty strings
foreach (@row) {$_ = '' unless defined};

If you're using a parameterised query, you may expect to do:

$sql = q/select * from table where a = ?/;
$stmt = $dbh->prepare($sql);
$stmt->execute(undef);  # pass NULL as parameter

but with some databases this does not work. In ODBC terms, the above code results in a call to the ODBC API SQLBindParameter with a data pointer of NULL and an indicator of SQL_NULL_DATA. Some ODBC drivers and databases do not select all the rows where the column is NULL when this method is used. As shown above, a standard way of selecting a NULL column is where column is null, but you can't parameterise this. This has been the subject of much discussion and one method put forward is:

my $test = undef;
$sql = q/select * from table where (col = ? or (col is null and ? = 1))/;
$stmt->execute($test,defined($test)?0:1);