Never been to DZone Snippets before?

Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world

« Newer Snippets
Older Snippets »
Showing 1-10 of 15 total  RSS 

Processing large CSV files

  # this method allows fast CSV processing, it reads file_name, generates a new
  # CSV file and inserts all data in a mysql table using LOAD DATA INFILE

    # you can use it like this:

    load_csv_data(PRELOAD_DIR+'socios.csv', 'users') do |csv, thing_id, row|
      csv << [thing_id,
        row['NOMBRE'].to_s.to_permalink+'.'+row['APELLIDOS'].to_s.to_permalink, #  login
        row['E_MAIL'], #  email
        row['SEXO'],  #  gender
        (row['NO_DATOS'] == 'S' || row['NO_DATOS'] == 'VERDADERO' ? 1 : 0 ), #  no_datos
      ]
    end


  def load_csv_data(file_name, table_name)
    STDERR.print("\nParsing '#{file_name}' to insert data in '#{table_name}'\n")

    n_lines = (%x[wc -l #{file_name}]).split(' ')[0].to_f
    time_start = Time.now
    begin
      require 'fastercsv'
      count = 0.0

      csv_string = FasterCSV.open("data.csv", "w", { :col_sep => ";", :force_quotes => true })  do |csv|
      FasterCSV.open("#{file_name}", { :col_sep => "\t", :headers => :first_row }).each do |row|
        thing_id = row[0].to_i
        if thing_id && thing_id > 0
          
          yield csv, thing_id, row

          percentage = ((count += 1) * 100.0)/n_lines
          STDERR.print "%.0f%%..." % percentage if (percentage.modulo(2.0) < 0.001)
        end
        end
      end
    rescue EOFError
      f.close
    end
    total_lines = (%x[wc -l data.csv]).split(' ')[0].to_i
    STDERR.print("\nInserting #{total_lines} lines in table '#{table_name}'\n")
    ActiveRecord::Base.connection.execute("LOAD DATA INFILE '#{File.join(File.dirname(__FILE__), '..', '..', 'data.csv')}' IGNORE INTO TABLE #{table_name} FIELDS TERMINATED BY ';' ENCLOSED BY '\"';")
    warn("Finished importing '#{file_name}'.\nLines: #{n_lines.to_i.to_s} (#{(n_lines.to_i - total_lines.to_i).to_s} lost)\nTime: %.2f minutes\n" % ((Time.now - time_start)/60))
  end


QIF to CSV conversion script in Ruby

// Converts QIF files to CSV files

#!/usr/bin/env ruby

require 'rubygems'
require 'fileutils'

if ARGV.size < 1
        puts "Usage: #{$0} file.qif"
        exit
end

input = File.new(ARGV[0])
output = [File.basename(ARGV[0]).split('.')[0..-2], 'csv'].join('.')
output = File.new(output, 'w+')
output.write("date,amount,description,transaction id, address\n")

entries = input.read.split("^\n")
entries.compact
for entry in entries
        e = entry.match(/D(.*)\nT-?(.*)\nP(.*)\nN(.*)\nA(.*)\n/).to_a[1..-1]
        e[1] = e[1].to_f rescue nil
        e[-1] = "\"#{e[-1]}\""
        output.write("#{e.join(',')}\n")
end

How to produce daily database table dumps in CSV format

The following code demonstrates how to produce CSV files with dynamic
file name pattern based on a current day. The produced files have the following naming format:
TABLE_NAME_MM_DD_YYYY.csv

<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <properties> <!-- Configure table name -->
        table_name=test
    </properties>
    <connection id="in" driver="auto" url="jdbc:oracle:thin:@localhost:1521:ORCL" 
      classpath="ojdbc14.jar" user="scott" password="tiger"/>
    <connection id="out" driver="csv" url="${table_name}_${etl.date.now('MM_dd_yyyy')}.csv" />
    <query connection-id="in"> <!-- Query table rows -->
        SELECT * FROM ${table_name}
        <script connection-id="out"> <!-- Export each row into a CSV -->
            $ID, $Name, $Surname <!-- Use column names from selected table -->
        </script>
    </query>
</etl>

Use Scriptella ETL to run the example.

See How to execute an ETL file from command line, Ant or directly from Java .

CSV parsing regex

The regular expression is taken from Raimond Brookman, Regex fun with CSV.
For a good general CSV overview see The Comma Separated Value (CSV) File Format.
A complete Ruby CSV parsing library is FasterCSV (sudo gem install fastercsv).



csv_data = <<-EOS

fname,lname,age,salary
nancy,davolio,33,$30000
erin,borakova,28,$25250
tony,raphael,35,$28700

"Date","Pupil","Grade"
"25 May","Bloggs, Fred","C"
"25 May","Doe, Jane","B"
"15 July","Bloggs, Fred","D"

123456789,"Carr, Lisa",100000.00
444556666,"Barr, Clark",87000.00
777227878,"Parr, Jack",123000.00
998877665,"Charr, Lee",123000.00

Conference room 1, "John,  
Please bring the M. Mathers file for review  
-J.L.
"
10/18/2002,...

John,Doe,120 jefferson st.,Riverside, NJ, 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
,Blankman,,SomeTown, SD, 00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123

XXXX,D,3-May-02,83.01,83.58,71.13,78.04,9645300
XXXX,D,2-May-02,82.47,85.76,82.05,83.84,7210000,
XXXX,D,1-May-02,86.80,90.83,81.74,85.50,14253300

"1997",car model,E350
1997,car model,E350,"  Super luxurious truck    "
1997,car model,E350,"Go get one now
they are going fast"
1997,car model,E350,"Super ""luxurious"" truck"
1997,car model,E350,"Super, luxurious truck"

1997,car model,E350,"ac, abs, moon",3000.00
1999, car model,"Venture ""Extended Edition""",,4900.00,
1996, car model,Old Car,"BEYOND REPAIR!
air, moon roof, loaded",4799.00

This,is,a test,CSV, file," from ""http://lorance.freeshell.org/csv/test.csv""."
It contains,"quoted text",and,numbers 1234,5678
It also has,"quoted text with an embedded quote""<- right there"
Then there are a few,,blank fields like these here ->,,,
A quoted blank field,"",<- there.
A quoted blank field with newline,"\n",<- there.
This next one causes an error if newline handling is turned off.
"There is a newline here ->
<- and it should be processed correctly."
ABCD
"And here,,, is an""Error - no"
"And here,,, is an"Error - yes
"And here,,, is an",Error - no

1,2,3
ab,"c,d","e""f", "g"",""","h
jk",kl

"aaa","bbb","ccc"
zzz,yyy,xxx
"aaa","b
bb","ccc"
zzz,yyy,xxx

"aaa","b""bb","ccc"

EOS



csv_data.split(/(,|\r\n|\n|\r)(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/m).each do |csv|
#csv_data.split(/[,\n\r]+(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/m).each do |csv|

   next if csv.empty?

   csv = csv.strip

   if csv =~ /\A(".*[^"]|[^"].*")\z/m then     # examples: csv => "ab\nc"def  or  abc"de\nf"
       puts
       puts "Error:" 
       p csv 
       puts csv[/\A./mu], csv[/.\z/mu] 
       #puts csv[0..0], csv[-1..-1] 
       puts
       next
   end

   if csv =~ /\A".*"\z/m then csv.gsub!(/\A"(.*)"\z/m, '\1') end  # remove double-quotes at string beginning & end
   if csv =~ /""/m then csv.gsub!(/""/m, '"') end                 # remove a double-quote from double double-quotes

   p csv

end



PHP : Exportar CSV a mySQL / Export CSV to mySQL

Cóodigo fuente / Source code :

function exportarCSV_a_mySQL($fileCSV)
{
	$registros=0;

	$ruta=$fileCSV['tmp_name'];

	if(!file_exists($ruta))
	{return false;}

	$tabla=quitar_extension($fileCSV['name']);
	
	$borra_tabla="DROP TABLE `".$tabla."`";
	mysql_query($borra_tabla);
	$f=fopen($ruta,"r");
	if($f)
	{
		echo "<b>Guardando CSV en la BDD :</b><br />";
		$contenido=fread($f,filesize($ruta));
		fclose($f);
		$contenido=ereg_replace("\r\n", "\n" , $contenido); // convertimos windows a unix
		$lineas=explode("\n",$contenido);
		$titulo=explode(";",$lineas[0]);
		$NUM_CAMPOS=count($titulo);
		$sql_generado_para_eliminar="";
		$crear_tabla_campos="";
		for($i=0;$i<$NUM_CAMPOS;$i++)
		{
		$titulo[$i]=ereg_replace("\"", "" , $titulo[$i]); // kitamos comillas
		$sql_generado_para_eliminar.=" AND `".$titulo[$i]."` =''";
		$crear_tabla_campos.="`".$titulo[$i]."` varchar(60) NOT NULL";
			if($i+1!=$NUM_CAMPOS)// si no es el ultimo , ponemos coma
			{
			$crear_tabla_campos.=",";
			}
		}
		$crear_tabla="CREATE TABLE `".$tabla."` (".$crear_tabla_campos.") ENGINE=MyISAM DEFAULT CHARSET=latin1;";
		mysql_query($crear_tabla);
		$linea=1;
		do
		{
			$insertar_titulos="";
			$insertar_campos="";
			$campo=explode(";",$lineas[$linea]);
			for($i=0;$i<$NUM_CAMPOS;$i++)
			{
			$campo[$i]=ereg_replace("\"", "" , $campo[$i]);
			$insertar_titulos.=" `".$titulo[$i]."` ";
			$insertar_campos.=" '".$campo[$i]."' ";
				if($i+1!=$NUM_CAMPOS)// si no es el ultimo , ponemos coma
				{
				$insertar_titulos.=",";
				$insertar_campos.=",";
				}
			}
			$sql="INSERT INTO `".$tabla."` ( ".$insertar_titulos." ) VALUES ( ".$insertar_campos." );";
			if(mysql_query($sql))
			{
			echo ". ";
			$registros++;
			}
			else
			{echo "X ";return false;}
		$linea++;
		}while(next($lineas));

	$sql="DELETE FROM `".$tabla."` WHERE 1".$sql_generado_para_eliminar;mysql_query($sql);
	echo "<br />";
	return $tabla;
	}
	else
	{
	return false;
	}
}

function quitar_extension($archivo)
{
	$extension = strrchr($archivo,".");
	$pos=strpos($archivo,$extension);
	return substr($archivo,0,$pos);
}


Código ejemplo de llamada / Code exmple call :

$tabla = exportarCSV_a_mySQL($_FILES['archivo_csv']);
if($tabla)
{
echo "Export OK in mysql table : ".$tabla;
}
else
{
echo "Error in export ...";
}

Turn CSV with headers into Array of Hashes (in 5 lines or less)

This assumes you have a CSV file whose first line are headings/labels for the individual columns.

require 'csv'

csv_data = CSV.read 'data.csv'
headers = csv_data.shift.map {|i| i.to_s }
string_data = csv_data.map {|row| row.map {|cell| cell.to_s } }
array_of_hashes = string_data.map {|row| Hash[*headers.zip(row).flatten] }

Ruby CSV to XML Converter

This code will take an input CSV file and output XML. IT was easy to write, but I haven't found anything out there to do this. The first line of the CSV file should contain the element names.

#!/usr/bin/ruby

require 'csv'

print "CSV file to read: "
input_file = gets.chomp

print "File to write XML to: "
output_file = gets.chomp

print "What to call each record: "
record_name = gets.chomp

csv = CSV::parse(File.open(input_file) {|f| f.read} )
fields = csv.shift

puts "Writing XML..."

File.open(output_file, 'w') do |f|
  f.puts '<?xml version="1.0"?>'
  f.puts '<records>'
  csv.each do |record|
    f.puts " <#{record_name}>"
    for i in 0..(fields.length - 1)
      f.puts "  <#{fields[i]}>#{record[i]}</#{fields[i]}>"
    end
    f.puts " </#{record_name}>"
  end
  f.puts '</records>'
end # End file block - close file

puts "Contents of #{input_file} written as XML to #{output_file}."

Load CSV data into a database (Scriptella ETL tool)

This example demonstrates usage of Scriptella ETL Tool to load CSV data into a database table.

Input CSV file data.csv:
id,priority,summary,status
1,Critical,NullPointerException in Main class,Open
5,Low,"Checkstyle, PMD, Findbugs issues",Reopened
7,Low,Maven integration,Open
10,High,SPI API,Closed

The CSV loading script has the following content:
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
  <connection id="in" driver="csv" url="data.csv"/>
  <connection id="out" driver="oracle" url="jdbc:oracle:thin:@localhost:1521:ORCL" 
      classpath="ojdbc14.jar" user="scott" password="tiger"/>
  <!-- Copy all CSV rows to a database table -->
  <query connection-id="in">
      <!-- Empty query means select all columns -->
      <script connection-id="out">
          INSERT INTO Table_Name VALUES (?id,?priority, ?summary, ?status)
      </script>
  </query>
</etl>


Use RegEx to filter CSV data:
<query connection-id="in">
    <!--Select bugs with status open or reopened.-->
    ,,,open|reopened
    <!--Inserts imported rows into a database-->
    <script connection-id="out">
       INSERT INTO Table_Name VALUES (?id, ?priority, ?summary, ?status);
    </script>
</query>

CSV Parser / Writer for PHP

CSV Parser / Writer

Example A:

//cell separator, row separator, value enclosure
$csv = new CSV(';', "\r\n", '"');

//parse the string content
$csv->setContent(file_get_contents('data.csv'));

//returns an array with the CSV data
print_r($csv->getArray());



Exemple B:

$csv = new CSV(';', "\r\n", '"');
//sets up the content through an array
$csv->setArray(
	array(
		array('col"una1', "colu\r\nna2"),
		array('col;una3', 'coluna4')
	)
);
//retorns string with the CSV representation
print $csv->getContent();



<?php
//+ Jonas Raoni Soares Silva
//@ http://jsfromhell.com
class CSV{
	var $cellDelimiter;
	var $valueEnclosure;
	var $rowDelimiter;

	function CSV($cellDelimiter, $rowDelimiter, $valueEnclosure){
		$this->cellDelimiter = $cellDelimiter;
		$this->valueEnclosure = $valueEnclosure;
		$this->rowDelimiter = $rowDelimiter;
		$this->o = array();
	}
	function getArray(){
		return $this->o;
	}
	function setArray($o){
		$this->o = $o;
	}
	function getContent(){
		if(!(($bl = strlen($b = $this->rowDelimiter)) && ($dl = strlen($d = $this->cellDelimiter)) && ($ql = strlen($q = $this->valueEnclosure))))
			return '';
		for($o = $this->o, $i = -1; ++$i < count($o);){
			for($e = 0, $j = -1; ++$j < count($o[$i]);)
				(($e = strpos($o[$i][$j], $q) !== false) || strpos($o[$i][$j], $b) !== false || strpos($o[$i][$j], $d) !== false)
				&& $o[$i][$j] = $q . ($e ? str_replace($q, $q . $q, $o[$i][$j]) : $o[$i][$j]) . $q;
			$o[$i] = implode($d, $o[$i]);
		}
		return implode($b, $o);
	}
	function setContent($s){
		$this->o = array();
		if(!strlen($s))
			return true;
		if(!(($bl = strlen($b = $this->rowDelimiter)) && ($dl = strlen($d = $this->cellDelimiter)) && ($ql = strlen($q = $this->valueEnclosure))))
			return false;
		for($o = array(array('')), $this->o = &$o, $e = $r = $c = 0, $i = -1, $l = strlen($s); ++$i < $l;){
			if(!$e && substr($s, $i, $bl) == $b){
				$o[++$r][$c = 0] = '';
				$i += $bl - 1;
			}
			elseif(substr($s, $i, $ql) == $q){
				$e ? (substr($s, $i + $ql, $ql) == $q ?
				$o[$r][$c] .= substr($s, $i += $ql, $ql) : $e = 0)
				: (strlen($o[$r][$c]) == 0 ? $e = 1 : $o[$r][$c] .= substr($s, $i, $ql));
				$i += $ql - 1;
			}
			elseif(!$e && substr($s, $i, $dl) == $d){
				$o[$r][++$c] = '';
				$i += $dl - 1;
			}
			else
				$o[$r][$c] .= $s[$i];
		}
		return true;
	}
}
?>

Snippet to grab historical data for stocks

This is a quick little snippet I'm whipping up to import some historical data in for a graphing app we're building for stock data. I figured I'd post this snippet before I maul it into something application-specific...

It downloads a csv file through yahoo's finance site and then parses it and prints out the date and adjusted close for each business day that has data.

require 'open-uri'
require 'csv'

def get_adjusted_close stock_symbol
  puts "-- #{stock_symbol} Adjusted Close - Historical --"
  url = "http://ichart.finance.yahoo.com/table.csv?s=#{stock_symbol}&d=7&e=1&f=2006&g=d&a=2&b=26&c=1990&ignore=.csv"
  puts "Connecting to #{url}\n"

  csv = CSV.parse(open(url).read)

  csv.each{|row|
    puts "#{row[0]} - #{row.last}"
  }
  puts "---------------------------------"
end

example_stocks = "CSCO GOOG"
print "Enter a series of stock symbols separated by spaces (example: #{example_stocks}) to retrieve the historical adjusted close.\n"
stock_symbols = gets
stock_symbols ||= example_stocks

stock_symbols.split.each{|symbol|
  get_adjusted_close(symbol)
}
« Newer Snippets
Older Snippets »
Showing 1-10 of 15 total  RSS