The regular expression is taken from Raimond Brookman's article
"Regex fun with CSV".
For a good general overview of CSV, see
"The Comma Separated Value (CSV) File Format".
A complete Ruby CSV parsing library is
FasterCSV (sudo gem install fastercsv), which became Ruby's standard csv library in Ruby 1.9.
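For comparison, here is a minimal sketch of the library route. It assumes a Ruby where `require 'csv'` loads the standard csv library (FasterCSV's successor) rather than the fastercsv gem itself, and it uses three records borrowed from the sample data below. The variable names `sample` and `rows` are only illustrative.

```ruby
require 'csv'  # assumption: the standard csv library is available

# Three records borrowed from the sample data: doubled quotes, empty
# fields, and a quoted field that spans two lines.
sample = <<~CSV
  "Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123
  ,Blankman,,SomeTown, SD, 00298
  1997,car model,E350,"Go get one now
  they are going fast"
CSV

# CSV.parse returns one array of fields per record.
rows = CSV.parse(sample)
rows.each { |row| p row }
```

With default options the library leaves surrounding whitespace in place and returns unquoted empty fields as nil, so the second record comes back as [nil, "Blankman", nil, "SomeTown", " SD", " 00298"].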
csv_data = <<-EOS
fname,lname,age,salary
nancy,davolio,33,$30000
erin,borakova,28,$25250
tony,raphael,35,$28700
"Date","Pupil","Grade"
"25 May","Bloggs, Fred","C"
"25 May","Doe, Jane","B"
"15 July","Bloggs, Fred","D"
123456789,"Carr, Lisa",100000.00
444556666,"Barr, Clark",87000.00
777227878,"Parr, Jack",123000.00
998877665,"Charr, Lee",123000.00
Conference room 1, "John,
Please bring the M. Mathers file for review
-J.L.
"
10/18/2002,...
John,Doe,120 jefferson st.,Riverside, NJ, 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
,Blankman,,SomeTown, SD, 00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123
XXXX,D,3-May-02,83.01,83.58,71.13,78.04,9645300
XXXX,D,2-May-02,82.47,85.76,82.05,83.84,7210000,
XXXX,D,1-May-02,86.80,90.83,81.74,85.50,14253300
"1997",car model,E350
1997,car model,E350," Super luxurious truck "
1997,car model,E350,"Go get one now
they are going fast"
1997,car model,E350,"Super ""luxurious"" truck"
1997,car model,E350,"Super, luxurious truck"
1997,car model,E350,"ac, abs, moon",3000.00
1999, car model,"Venture ""Extended Edition""",,4900.00,
1996, car model,Old Car,"BEYOND REPAIR!
air, moon roof, loaded",4799.00
This,is,a test,CSV, file," from ""http://lorance.freeshell.org/csv/test.csv""."
It contains,"quoted text",and,numbers 1234,5678
It also has,"quoted text with an embedded quote""<- right there"
Then there are a few,,blank fields like these here ->,,,
A quoted blank field,"",<- there.
A quoted blank field with newline,"\n",<- there.
This next one causes an error if newline handling is turned off.
"There is a newline here ->
<- and it should be processed correctly."
ABCD
"And here,,, is an""Error - no"
"And here,,, is an"Error - yes
"And here,,, is an",Error - no
1,2,3
ab,"c,d","e""f", "g"",""","h
jk",kl
"aaa","bbb","ccc"
zzz,yyy,xxx
"aaa","b
bb","ccc"
zzz,yyy,xxx
"aaa","b""bb","ccc"
EOS
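# Split on commas and line breaks that are followed by an even number of
# double quotes, i.e. separators that are not inside a quoted field.
# The separator is captured, so separators are yielded as items too.
csv_data.split(/(,|\r\n|\n|\r)(?=(?:[^"]*"[^"]*")*(?![^"]*"))/m).each do |csv|
  next if csv.empty?
  csv = csv.strip
  # A field quoted on only one side is reported as an error.
  if csv =~ /\A(".*[^"]|[^"].*")\z/m
    puts
    puts "Error:"
    p csv
    puts csv[/\A./mu], csv[/.\z/mu]
    puts
    next
  end
  # Remove the enclosing quotes, then unescape doubled quotes.
  csv.gsub!(/\A"(.*)"\z/m, '\1')
  csv.gsub!(/""/, '"')
  p csv
end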
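
The lookahead in the split pattern above is easier to follow on a single record. The sketch below is not part of the original recipe: `split_fields` is a hypothetical helper that applies the same idea to commas only, treating a comma as a separator when the rest of the line contains an even number of double quotes. It assumes well-formed quoting and does not handle records that span several lines.

```ruby
# Hypothetical helper: split one CSV record into fields, using the same
# lookahead idea as the split pattern above.
def split_fields(line)
  # A comma separates fields only if the rest of the line holds an even
  # number of double quotes (whole pairs, then no further quote).
  separator = /,(?=(?:[^"]*"[^"]*")*(?![^"]*"))/
  # The limit -1 keeps trailing empty fields.
  line.split(separator, -1).map do |field|
    if field =~ /\A"(.*)"\z/m
      $1.gsub('""', '"')   # drop enclosing quotes, unescape "" to "
    else
      field
    end
  end
end

p split_fields('ab,"c,d","e""f"')   # => ["ab", "c,d", "e\"f"]
p split_fields('a,,b,')             # => ["a", "", "b", ""]
```

Unlike the loop above, the separator here is not wrapped in a capturing group, so only the fields are returned.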