Using JavaScript for ETL transformations

Scriptella provides a simple way to perform transformations in JavaScript (or in any other scripting language that has a corresponding driver).
Our example transformation consists of 3 steps:
1) Select rows from the source table.
2) Transform a column value from a number to text.
3) Insert the transformed value into the destination table.


<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <connection id="db" driver="auto" url="jdbc:hsqldb:mem:tst" user="sa" password="" classpath="../lib/hsqldb.jar"/>
    <connection id="js" driver="script"/> 
    <connection id="log" driver="text"/> <!-- For printing debug information on the console -->

    <script connection-id="db">
        CREATE TABLE Table_In (
            Error_Code INT
        );
        CREATE TABLE Table_Out (
            Error VARCHAR(10)
        );
        
        INSERT INTO Table_In VALUES (1);
        INSERT INTO Table_In VALUES (7);
    </script>

    <query connection-id="db">
        SELECT * FROM Table_In
        <script connection-id="log">
            Transforming $Error_Code
        </script>
        <!-- Transformation is described as an enclosing query
         which is executed before nested elements -->
        <query connection-id="js"> 
            <![CDATA[
               if (Error_Code < 5) {
                    Error_Code='WARNING'; //Set a transformed value
               } else {
                    Error_Code='ERROR'; //Set a transformed value
               }
               query.next(); //Don't forget to trigger nested scripts execution
            ]]>
            <script connection-id="db">
                <!-- Insert transformed value -->
                INSERT INTO Table_Out VALUES (?Error_Code); 
            </script>
            <script connection-id="log">
                Transformed to $Error_Code
            </script>
        </query>
    </query>
</etl>
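
For comparison, the mapping rule applied in the JavaScript query above can be written as a plain Java function, which makes it easy to test the rule outside of Scriptella. This is only a sketch: the class and method names (ErrorCodeMapper, toSeverity) are hypothetical and are not part of Scriptella.

```java
/** Hypothetical helper mirroring the rule from the JavaScript query above. */
final class ErrorCodeMapper {
    private ErrorCodeMapper() {}

    /** Codes below 5 map to "WARNING", everything else maps to "ERROR". */
    static String toSeverity(int errorCode) {
        return errorCode < 5 ? "WARNING" : "ERROR";
    }

    public static void main(String[] args) {
        // Same sample rows as the INSERT statements in the ETL file: 1 and 7.
        for (int code : new int[] {1, 7}) {
            System.out.println(code + " -> " + toSeverity(code));
        }
    }
}
```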

How to execute Scriptella ETL files

Scriptella ETL provides several ways to execute ETL files:

Invocation from Ant
<taskdef resource="antscriptella.properties" classpath="/path/to/scriptella.jar[;additional_drivers.jar]"/>
<etl file="path/to/etl/file"/> <!-- Execute ETL file from specified location -->

Command-Line Execution
Just type scriptella to run the file named etl.xml in the current directory. Alternatively, you can use the Java launcher:
java -jar scriptella.jar [arguments]

Executing ETL Files from Java
It is easy to run Scriptella ETL files from Java code. Just make sure scriptella.jar is on the classpath and use any of the following methods to execute an ETL file:
EtlExecutor.newExecutor(new File("etl.xml")).execute(); //Execute etl.xml file
EtlExecutor.newExecutor(getClass().getResource("etl.xml")).execute(); //Execute etl.xml file loaded from classpath
EtlExecutor.newExecutor(
    servletContext.getResource("/WEB-INF/etl.xml")).execute(); //Execute etl.xml file from web application WEB-INF dir

Integration with Spring Framework
<beans>
    <!-- Spring beans declarations -->

    <!-- Spring managed bean which executes etl.xml file -->
    <bean id="executor" class="scriptella.driver.spring.EtlExecutorBean">
        <property name="configLocation" value="etl.xml"/>
    </bean>
</beans>

Using the executor bean is straightforward:
EtlExecutor exec = (EtlExecutor) beanFactory.getBean("executor");
exec.execute();

See Spring Driver JavaDoc for additional details.

Splitting large Scriptella ETL files

The following example demonstrates how to split a large Scriptella ETL file into several parts. It relies on the traditional XML approach of external parsed entities:

<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd"
[
    <!-- Declaring the first external parsed entity to include -->
    <!ENTITY part1 SYSTEM "part1.xml">
    
    <!-- Declaring the second external parsed entity to include -->
    <!ENTITY part2 SYSTEM "part2.xml">
]>
<etl>
    <connection driver="text"/>

    <!-- Including file #1 -->
    &part1;

    <script>
        content of the script
    </script>
    
    <!-- Including file #2 -->
    &part2;

</etl>
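
How a parser expands such entity references can be observed with the JDK's built-in DOM parser. The sketch below rests on a few assumptions: it declares internal entities inline so that it runs without part1.xml and part2.xml on disk, and the class name EntityIncludeDemo is hypothetical. With SYSTEM entities, as in the example above, a parser that loads external entities would take the replacement text from the referenced files instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityIncludeDemo {
    /** Inline stand-in for a split ETL file (internal entities instead of files). */
    static final String SAMPLE =
        "<!DOCTYPE etl [\n"
        + "  <!ENTITY part1 \"<script>part one</script>\">\n"
        + "  <!ENTITY part2 \"<script>part two</script>\">\n"
        + "]>\n"
        + "<etl>&part1;<script>inline</script>&part2;</etl>";

    /** Parses the XML and counts the script elements present after entity expansion. */
    static int countScripts(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setExpandEntityReferences(true); // the default, stated for clarity
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("script").getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("script elements: " + countScripts(SAMPLE));
    }
}
```

The two entity references and the inline element all end up as ordinary script elements in the parsed tree, which is why the included parts behave as if they had been written directly in the main file.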

Convert Apache HTTP combined logs into SQL (and eventually import them into a MySQL database)

If you need to extract the data from your HTTP server log files and put it into a database so that you can query it with your usual SQL tools, this Perl script does just that.

It was hard to find, which is why I put it here.

#!/usr/bin/perl -w
# Written by Aaron Jenson.
# Original source: http://www.visualprose.com/software.php
# Updated to work under Perl 5.6.1 by Edward Rudd
# Updated 24 march 2007 by Slim Amamou <slim.amamou@alpha-studios.com>
#  - output SQL with the option '--sql'
#  - added SQL create table script to the HELP
#
#  NOTE : you need the TimeDate library (http://search.cpan.org/dist/TimeDate/)
#
use strict;
use Getopt::Long qw(:config bundling);
use DBI;
use Date::Parse;

my %options = ();
my $i = 0;
my $sql = '';
my $valuesSql = '';
my $line = '';
my $dbh = 0;
my $sth = 0;
my @parts = ();
my $part;
my $TIMESTAMP = 3;
my $REQUEST_LINE = 4;
my @cols = (
	'remote_host',			## 0
	'remote_logname',		## 1
	'remote_user',			## 2
	'request_time',			## 3 (string)
	'time_stamp',			## 4 (POSIX time)
	'request_line',			## 5
	'request_method',		## 6
	'request_uri',			## 7
	'request_args',			## 8
	'request_protocol',		## 9
	'status',				## 10
	'bytes_sent',			## 11
	'referer',				## 12
	'agent'					## 13
);
my $col = '';

GetOptions (\%options,
		"version" => sub { VERSION_MESSAGE(); exit 0; },
		"help|?" => sub { HELP_MESSAGE(); exit 0; },
		"host|h=s",
		"database|d=s",
		"table|t=s",
		"username|u=s",
		"password|p=s",
		"logfile|f=s",
		"sql");

$options{host} ||= 'localhost';
$options{database} ||= '';
$options{username} ||= '';
$options{password} ||= '';
$options{logfile} ||= '';
$options{sql} ||= '';

if( ! ($options{database} || $options{sql}))
{
	HELP_MESSAGE();
	print "Must supply a database to connect to.\n";
	exit 1;
}

if( ! $options{table} )
{
	HELP_MESSAGE();
	print "Must supply table name.\n";
	exit 1;
}

if( $options{logfile} )
{
	if( ! -e $options{logfile} )
	{
		print  "File '$options{logfile}' doesn't exist.\n";
		exit 1;
	}
	open(STDIN, "<$options{logfile}") || die "Can't open $options{logfile} for reading.";
}

if( $options{database} )
{
	$dbh = Connect();
	if (! $dbh) {
		exit 1;
	}
}

$sql = "INSERT INTO $options{table} (";
foreach $col (@cols)
{
	$sql .= "$col," if( $col );
}
chop($sql);
$sql .= ') VALUES (';
my ($linecount,$insertcount) = (0,0);
while($line = <STDIN>)
{
	$linecount++;
	@parts = SplitLogLine( $line );
	next if( $parts[$TIMESTAMP+1] == 0 );
	$valuesSql = '';
	for( $i = 0; $i < @cols; ++$i )
	{
		$parts[$i] =~ s/\\/\\\\/g;
		$parts[$i] =~ s/'/\\'/g;
		$valuesSql .= "'$parts[$i]'," if( $cols[$i] );
	}
	chop($valuesSql);

	if( $options{database} )
	{
		$sth  = $dbh->prepare("$sql$valuesSql)");
		if( ! $sth->execute() )
		{
			print "Unable to perform specified query.\n$sql$valuesSql\n" . $sth->errstr() . "\n";
		} else {
			$insertcount++;
		}
		$sth->finish();
	}
	if( $options{sql} )
	{
		print "$sql$valuesSql);\n";
	}
}
if( ! $options{sql} )
{
	print "Parsed $linecount Log lines\n";
	print "Inserted $insertcount records\n";
	print "to table '$options{table}' in database '$options{database}' on '$options{host}'\n";
}

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
# Connects to a MySQL database and returns the connection.
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
sub Connect
{
	my $dsn = "DBI:mysql:$options{database};hostname=$options{host}";
	return DBI->connect( $dsn, $options{username}, $options{password} );
}


# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
# Splits up a log line into its parts.
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
sub SplitLogLine
{
	my $line = shift;
	my $i = 0;
	my $inQuote = 0;
	my $char = '';
	my $part = '';
	my @parts = ();
	my $count = 0;
	chomp($line);
	for( $i = 0; $i < length($line); ++$i )
	{
		$char = substr($line, $i, 1);
		if( $char eq ' ' && ! $inQuote )
		{
			## print "Found part $part.\n";
			if( $count == $TIMESTAMP )
			{
				push(@parts, "[".$part."]");
				$part = str2time($part);
			}
			push(@parts, $part);
			if( $count == $REQUEST_LINE )
			{
				my @request = split(/[ ?]/, $part);
				push(@parts, $request[0]);
				push(@parts, $request[1]);
				if( $request[3] )
				{
					push(@parts, $request[2]);
					push(@parts, $request[3]);
				}
				else
				{
					push(@parts, '');
					push(@parts, $request[2]);
				}
				$count += 5;
			}
			else
			{
				++$count;
			}
			$part = '';
		}
		elsif( $char eq '"' || $char eq '[' || $char eq ']' )
		{
			$inQuote = !$inQuote;
		}
		else
		{
			$part .= $char;
		}
	}
	push(@parts,$part) if $part;

	return @parts;
}


# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
# Prints the usage/help message for this program.
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
sub HELP_MESSAGE
{
	print<<EOF;
Imports an Apache combined log into a MySQL database.
Usage: mysql_import_combined_log.pl -d <database name> -t <table name> [-h <hostname>] [-u <username>] [-p <password>] [-f <filename>]
 --host|-h <host name>         The host to connect to.  Default is localhost.
 --database|-d <database name> The database to use.  Required.
 --username|-u <username>      The user to connect as.
 --password|-p <password>      The user's password.
 --table|-t <table name>       The name of the table in which to insert data.
 --logfile|-f <file name>      The file to read from.  If not given, data is read from stdin.
 --sql                         Output SQL
 --help|-?                     Print out this help message.
 --version                     Print out the version of this software.

----------------------------------
-- SQL create statements for the table
--

create table <TABLE_NAME> (
    remote_host varchar(50) ,
    remote_logname varchar(50) ,
    remote_user varchar(50) ,
    request_time char(28),
    time_stamp varchar(10) ,
    request_line varchar(255),
    request_method varchar(10) ,
    request_uri varchar(255),
    request_args varchar(255),
    request_protocol varchar(10) ,
    status varchar(10) ,
    bytes_sent varchar(10) ,
    referer varchar(255) ,
    agent varchar(255)
);

EOF
}



# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
# Prints the version information for this program
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
sub VERSION_MESSAGE
{
	print "mysql_import_combined_log.pl version 1.2\n";
	print "Version 1.0 Written by Aaron Jenson.\n";
	print "Update to work with perl 5.6.1 by Edward Rudd\n";
}

1;
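
The field splitting done by SplitLogLine can also be approached with a single regular expression. The Java sketch below is not a port of the Perl script: it only splits one combined-format line into named fields, does not derive time_stamp or the request_method/request_uri/request_args/request_protocol columns, and does not handle escaped quotes inside quoted fields. The class name CombinedLogLine and the sample line are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CombinedLogLine {
    // Layout: host logname user [time] "request" status bytes "referer" "agent"
    private static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    // Names reuse the column names from the Perl script where they apply.
    private static final String[] FIELDS = {
        "remote_host", "remote_logname", "remote_user", "request_time",
        "request_line", "status", "bytes_sent", "referer", "agent"
    };

    /** Illustrative sample line in combined log format. */
    static final String SAMPLE =
        "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" "
        + "200 2326 \"http://www.example.com/start.html\" \"Mozilla/4.08 [en] (Win98; I ;Nav)\"";

    /** Returns the fields in column order, or an empty map if the line does not match. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = COMBINED.matcher(line.trim());
        if (m.matches()) {
            for (int i = 0; i < FIELDS.length; i++) {
                fields.put(FIELDS[i], m.group(i + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

When the parsed values are written to a database from Java, binding them through a PreparedStatement avoids the manual quote escaping that the Perl script performs.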

Importing XML into a database with Scriptella ETL

The following simple Scriptella ETL example imports an RSS file into a database table.
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <connection id="in" driver="xpath" url="http://snippets.dzone.com/rss"/>
    <connection id="db" driver="hsqldb" url="jdbc:hsqldb:db/rss" user="sa" classpath="hsqldb.jar"/>
    <query connection-id="in">
        /rss/channel/item
        <script connection-id="db">
            INSERT INTO Rss (ID, Title, Description, Link) 
            VALUES (?rownum, ?title, ?description, ?link);
        </script>
    </query>
</etl>


Here is the full version of the example described above. It creates an Rss table, downloads the RSS file, inserts the RSS records into a database, converts the feed to plain text, and saves it to rss.txt.
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <connection id="in" driver="xpath" url="http://snippets.dzone.com/rss"/>
    <connection id="out" driver="text" url="rss.txt"/>
    <connection id="db" driver="hsqldb" url="jdbc:hsqldb:db/rss" user="sa" classpath="hsqldb.jar"/>
    <script connection-id="db">
       CREATE TABLE Rss (
           ID Integer,
           Title VARCHAR(255),
           Description VARCHAR(255),   
           Link VARCHAR(255)

       )
    </script>
    <query connection-id="in">
        /rss/channel/item
        <script connection-id="out">
            Title: $title
            Description: [
            ${description.substring(0, 20)}...
            ]
            Link: $link
            ----------------------------------
        </script>
        <script connection-id="db">
            INSERT INTO Rss (ID, Title, Description, Link) 
            VALUES (?rownum, ?title, ?description, ?link);
        </script>
    </query>
</etl>
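
To see which nodes the /rss/channel/item query selects, the same XPath expression can be tried with the JDK's javax.xml.xpath API. This is a sketch under stated assumptions: it runs against a small inline feed rather than the live URL, the example.com links are made up, and the class name RssItemsSketch is hypothetical. It does not show how Scriptella's xpath driver is implemented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RssItemsSketch {
    /** Tiny made-up feed with two items. */
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel><title>Sample feed</title>"
        + "<item><title>First</title><description>First description</description>"
        + "<link>http://example.com/1</link></item>"
        + "<item><title>Second</title><description>Second description</description>"
        + "<link>http://example.com/2</link></item>"
        + "</channel></rss>";

    /** Returns one "title | link" line per /rss/channel/item element. */
    static List<String> summarize(String rssXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate(
                "/rss/channel/item",
                new InputSource(new StringReader(rssXml)),
                XPathConstants.NODESET);
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                // Relative expressions are evaluated against the current item node.
                lines.add(xpath.evaluate("title", item) + " | " + xpath.evaluate("link", item));
            }
            return lines;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        summarize(SAMPLE).forEach(System.out::println);
    }
}
```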

Script to insert BLOB from file into a database

Scriptella ETL allows inserting files into a database. This is achieved with a simple extension of the bind variable syntax: ?{file ...}.
The following sample initializes a table of music tracks. Each track has a DATA field containing a file loaded from an external location. The file song1.mp3 is stored in the same directory as etl.xml, and song2.mp3 is loaded from the web:
    <!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
    <etl>
        <connection driver="hsqldb" url="jdbc:hsqldb:file:tracks" user="sa" classpath="hsqldb.jar"/>
        <script>
            CREATE TABLE Track (
              ID INT,
              ALBUM_ID INT,
              NAME VARCHAR(100),
              DATA LONGVARBINARY
            );
            <!-- Inserts file with path relative to ETL script location -->
            INSERT INTO Track(id, album_id, name, data) VALUES
                   (1, 1, 'Song1.mp3', ?{file 'song1.mp3'});
            <!-- Inserts file from an external URL-->
            INSERT INTO Track(id, album_id, name, data) VALUES
                   (2, 2, 'Song2.mp3', ?{file 'http://musicstoresample.com/song2.mp3'});
        </script>
    </etl>

Copy table from one database to another

This Scriptella ETL script copies all rows from Src_Table to Dest_Table.
Src_Table contains the following columns: id, first_name, last_name.
Dest_Table contains the following columns: id, name.
The name column of Dest_Table is produced by concatenating first_name and last_name from Src_Table.
This example demonstrates an HSQLDB-to-Oracle copy, although the same approach works between virtually any pair of databases.
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
  <connection id="in" driver="hsqldb" url="jdbc:hsqldb:file:demo" 
              classpath="hsqldb.jar" user="sa"/>
  <connection id="out" driver="oracle" url="jdbc:oracle:thin:@localhost:1521:ORCL" 
              classpath="ojdbc14.jar" user="scott" password="tiger"/>
  <!-- Copy all table rows from one to another database -->
  <query connection-id="in">
      SELECT * FROM Src_Table --Selects all rows
      <!-- For each row executes insert -->  
      <script connection-id="out"> 
          INSERT INTO Dest_Table(ID, Name) 
          VALUES (?id,?{first_name+' '+last_name})
      </script>
  </query>
</etl>
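
For readers who want to compare this with hand-written code, the same copy can be sketched in plain JDBC. The sketch assumes the caller supplies two open connections (with the appropriate drivers on the classpath); the class and method names (TableCopySketch, copy, fullName) are hypothetical, and NULL handling simply follows Java string concatenation, which may differ from how the ETL expression treats NULLs.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class TableCopySketch {
    private TableCopySketch() {}

    /** Mirrors the first_name+' '+last_name expression from the ETL script. */
    static String fullName(String firstName, String lastName) {
        return firstName + " " + lastName;
    }

    /** Copies every row of Src_Table into Dest_Table and returns the number of inserted rows. */
    static int copy(Connection in, Connection out) throws SQLException {
        int copied = 0;
        try (Statement select = in.createStatement();
             ResultSet rows = select.executeQuery(
                 "SELECT id, first_name, last_name FROM Src_Table");
             PreparedStatement insert = out.prepareStatement(
                 "INSERT INTO Dest_Table(ID, Name) VALUES (?, ?)")) {
            while (rows.next()) {
                // getObject/setObject avoids assuming a particular type for the id column.
                insert.setObject(1, rows.getObject("id"));
                insert.setString(2,
                    fullName(rows.getString("first_name"), rows.getString("last_name")));
                copied += insert.executeUpdate();
            }
        }
        return copied;
    }
}
```

Compared with this version, the ETL file keeps the connection setup, the row loop and the parameter binding declarative, which is the main convenience the script above offers.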

Scriptella script to shut down an HSQLDB server

The following snippet shuts down an HSQLDB database.

<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <connection driver="hsqldb" url="jdbc:hsqldb:hsql://127.0.0.1/mydb" user="sa" classpath="hsqldb.jar"/>
    <script>
        SHUTDOWN;
    </script>
</etl>


Running Scriptella from Ant is simple:
<taskdef classpath="scriptella.jar" resource="antscriptella.properties" />
<etl file="file_path"/>


The command-line launcher is even simpler:
scriptella [file_path]

