Using ODBC Data in RapidMiner
The RapidMiner distribution includes a generic connector that enables data sets to be constructed from any database for which an ODBC driver is available. The connector is called the JDBC-ODBC Bridge and is an integral part of Java 1.7 and earlier. The implications of that are:
- Version 1.8 and later versions of Java do not include the native JDBC-ODBC Bridge.
- The native bridge is included in a core Java Archive file, it cannot be separated from the Java distribution it came with.
If you need to connect RapidMiner to a 32-bit ODBC driver, use the Easysoft JDBC-ODBC Bridge instead. The Easysoft JDBC-ODBC Bridge has a Java component and a 32-bit native component. It is fully supported by Easysoft and is currently being deployed in production environments throughout the world.
You need to use the Professional version of RapidMiner to use third party drivers such as the Easysoft JDBC-ODBC Bridge with it. A trial version of RapidMiner Professional is available to users who create an account on the RapidMiner web site.
If you do not already have RapidMiner Professional, follow these steps:
- Create a RapidMiner account and then verify the email address you supplied when creating the account.
- Download RapidMiner Studio for your Windows platform.
- Start RapidMiner and then log in when prompted.
Your RapidMiner Professional trial period has now started.
To connect RapidMiner to some ODBC data:
- Configure a System ODBC data source for the database that you want to connect to in RapidMiner.
To do this, use the 32-bit version of ODBC Data Source Administrator on your RapidMiner machine. On some versions of Windows, this is located in Control Panel > Administrative Tools. On some version of Windows, you need to search for ODBC in the taskbar search box. The 32-bit version of ODBC Data Source Administrator should be clearly labelled. If in doubt, in the Windows Run dialog box, type:
%windir%\syswow64\odbcad32.exe
We wanted to try RapidMiner with some Microsoft Access data, and so we created an Access ODBC driver data source that pointed to the Northwind database.
- Download the Easysoft JDBC-ODBC Bridge. (Registration required.)
- Install and license the Easysoft JDBC-ODBC Bridge on the machine where the RapidMiner is installed.
For installation instructions, see the Easysoft JDBC-ODBC Bridge documentation.
- In RapidMiner, choose Connections > Manage Database Drivers.
The Manage Database Drivers dialog box is displayed.
- Choose Add.
- Complete the dialog box fields:
Field Value Name Easysoft JDBC-ODBC Bridge URL prefix jdbc:easysoft:// Port (Leave blank) Schema separator (Leave blank) Jar file easysoft_install_dir\Jars\EJOB.jar On 64-bit Windows, the default location for <easysoft_install_dir> is <drive>:\Program Files (x86)\Easysoft Limited\Easysoft JDBC-ODBC Bridge.
On 32-bit Windows, the default location for <easysoft_install_dir> is <drive>:\Program Files\Easysoft Limited\Easysoft JDBC-ODBC Bridge.
JDBC Class easysoft.sql.jobDriver - Choose Save.
- Choose Connections > Manage Database Connections.
The Manage Database Connections dialog box is displayed.
- Choose New.
- Complete the dialog box fields:
Field Value Name JDBC-ODBC Bridge Database system Easysoft JDBC-ODBC Bridge Host localhost/ Port (Leave blank) Dabase scheme my_odbc_data_source - Choose Advanced
The Advanced Connection Properties dialog box is displayed.
- Complete the dialog box fields:
Field Value logonuser my_windows_user logonpassword my_windows_password Select Override for both these fields.
- Use the Test button to verify that you have correctly entered the connection details.
To analyse the ODBC data made accessible via the JDBC-ODBC Bridge, create a new RapidMiner process:
- Choose File > New Process.
- Choose a blank process when prompted.
- Use the Repository pane to locate the ODBC data. We used the JDBC-ODBC Bridge to expose some Access data in the Northwind database: DB > JDBC-ODBC Bridge > Example Sets > Customers.
- In Design view, drag a table from the Repository pane to the Process pane. In our case, the table was Customers.
- Drag a Blending > Values > Set Data operator from the Operators pane to the Process pane.
- Drag the output of the retrieve operator to the Set Data operator.
- Use the Set Data parameters tab to alter a column value. For example, we altered a job title in our customer data from Sales Representative to Sales Manager:
Field Value Example index 1 Attribute name CustomerID Value ALFKI Additional values Edit List (0) ... > attribute name ContactTitle Edit List (0) ... > value Sales Manager
- Drag the output of the Set Data operator to the res button and the choose Play.
The modified data is displayed in the Results view.