Apache Spark ODBC driver

The Apache Spark ODBC driver lets you work with Spark data in applications such as Perl, PHP, Excel, and Oracle).

Configuring an ODBC data source

Before the Apache Spark ODBC driver can be used to connect an application to Apache Spark, it's necessary to configure an ODBC data source. An ODBC data source stores the connection details for the target database (in this case, Apache Spark) and the ODBC driver that's required to connect to it (in this case, the Apache Spark ODBC driver).

ODBC data sources are configured in ODBC Data Source Administrator, which is included with Windows.

In ODBC Data Source Administrator:

Choose the System DSN tab, and then choose Add.
In the Create New Data Source dialog box, choose Easysoft ODBC-Apache Spark Driver, and then choose Finish.
Complete the fields in the Easysoft ODBC-Apache Spark Driver DSN Setup dialog box.
In your application, connect to your newly configured data source and run a sample query. For example:
```
select * from MyTable
```

Work with Apache Spark data in Perl

Strawberry Perl is a Perl distribution for Windows that includes the necessary middleware layers (Perl DBI and Perl DBD::ODBC) to enable the Apache Spark ODBC driver to connect your Perl applications to Apache Spark.

Here's a Perl script that retrieves some Apache Spark data:

#!/usr/bin/perl -w
use strict;
use DBI;
my $dbh = DBI-> connect('dbi:ODBC:MyApacheSparkDataSource');

my $sql = "SELECT MyCol FROM MyTable LIMIT 10";

# Prepare the statement.
my $sth = $dbh->prepare($sql)
    or die "Can't prepare statement: $DBI::errstr";

# Execute the statement.
$sth->execute();

my($SparkCol);

# Fetch and display the result set value.
while(($SparkCol) = $sth->fetchrow()){
   print("$SparkCol\n");                   
}

$dbh->disconnect if ($dbh);

Work with Apache Spark data in PHP

Here's a PHP script that retrieves some Apache Spark data:

<?php
  $con = odbc_connect("MyApacheSparkDataSource", "", "");
  $err = odbc_errormsg();
  if (strlen($err) <> 0) {
    echo odbc_errormsg();
  } else {
    $rs2 = odbc_exec($con, "select MyCol from MyTable");
    odbc_result_all($rs2);
    odbc_close($con);
  }
?>

Connecting Excel to Apache Spark

Follow these steps to return data from Apache Spark to Microsoft Excel by using Microsoft Query:

On the Data tab, choose New Query > From Other Source > From ODBC.
Choose the Apache Spark ODBC data source when prompted.
Choose a table from the available data sets.
Choose the Load to return the Apache Spark data to the worksheet.
Note that for large result sets, you may have to have to filter the data using Excel before the data can be returned to the worksheet.

Connect to Apache Spark from Oracle

Create a DG4ODBC init file on your Oracle machine. To do this, change to the %ORACLE_HOME%\hs\admin directory. Create a copy of the file initdg4odbc.ora. Name the new file initspark.ora.
Note In these instructions, replace %ORACLE_HOME% with the location of your Oracle HOME directory. For example, C:\oraclexe\app\oracle\product\11.2.0\server.
Ensure these parameters and values are present in your init file:
```
HS_FDS_CONNECT_INFO = MyApacheSparkDataSource
```
Comment out the line that enables DG4ODBC tracing. For example:
```
#HS_FDS_TRACE_LEVEL = <trace_level>
```

Add an entry to %ORACLE_HOME%\network\admin\listener.ora that creates a SID_NAME for DG4ODBC. For example:

SID_LIST_LISTENER =
 (SID_LIST =
   (SID_DESC=
     (SID_NAME=spark)
     (ORACLE_HOME=%ORACLE_HOME%)
     (PROGRAM=dg4odbc)
   )
 )

Add a DG4ODBC entry to %ORACLE_HOME%\network\admin\tnsnames.ora that specifies the SID_NAME created in the previous step. For example:
```
SPARK =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle_host)(PORT = 1521))
    (CONNECT_DATA =
      (SID = spark)
    )
    (HS = OK)
  )
```
Replace oracle_host with the host name of your Oracle machine.

Start (or restart) the Oracle Listener:

cd %ORACLE_HOME%\bin
lsnrctl stop
lsnrctl start

Connect to your Oracle database in SQL*Plus.

In SQL*Plus, create a database link for the target Apache Spark instance. For example:

CREATE PUBLIC DATABASE LINK SPARKLINK CONNECT TO "mydummyuser" IDENTIFIED BY "mydummypassword" USING 'spark';

Try querying your Apache Spark data. For example:
```
SELECT * FROM "MyTable"@SPARKLINK;
```

Notes

If you have problems connecting to Apache Spark from Oracle, enable DG4ODBC tracing and check the trace files written to the %ORACLE_HOME%\hs\trace directory. To enable DG4ODBC tracing, add the line HS_FDS_TRACE_LEVEL = DEBUG to initspark.ora and then start or restart the Oracle listener. If the trace directory does not exist, create it.
If you enable ODBC Driver Manager tracing, but do not get a trace file or get an empty trace file (check for C:\SQL.log), change the trace file location to the Windows TEMP directory. For example, C:\Windows\Temp\SQL.log.

About Easysoft
Contact us
About us
Clients
Blog
Careers

Legal
Privacy statement

© 1993 - 2025 Easysoft Limited. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other trademarks and registered trademarks appearing on easysoft.com are the property of their respective owners.

Apache Spark ODBC driver

Configuring an ODBC data source

Work with Apache Spark data in Perl

Work with Apache Spark data in PHP

Connecting Excel to Apache Spark

Connect to Apache Spark from Oracle

Notes

About Easysoft

Products

Services

Licensing

Support

Search

Popular Resources

Legal