Split array or string into smaller arrays

Splits an array or string into smaller arrays of the given size. Inspired by the Ruby one ;).

import math
# v = value to split, l = size of each chunk
f = lambda v, l: [v[i*l:(i+1)*l] for i in range(int(math.ceil(len(v)/float(l))))]


Example

>>> f('000000111111222222333333444444567', 6)
['000000', '111111', '222222', '333333', '444444', '567']

>>> f(tuple('000000111111222222333333444444567'), 6)
[('0', '0', '0', '0', '0', '0'), ('1', '1', '1', '1', '1', '1'), ('2', '2', '2', '2', '2', '2'), ('3', '3', '3', '3', '3', '3'), ('4', '4', '4', '4', '4', '4'), ('5', '6', '7')]


Long live Python :). Of course, a Lisp/Haskell implementation probably blows this snippet out of the water.
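
For comparison, here is a minimal Python sketch of the same idea that steps the start index by the chunk size instead of computing the number of chunks with math.ceil. The function name chunks is an assumption for illustration and is not part of the original snippet.

```python
def chunks(value, size):
    """Split a sliceable sequence (str, list, tuple) into pieces of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least one")
    # range() with a step visits the start index of every chunk
    return [value[i:i + size] for i in range(0, len(value), size)]


if __name__ == "__main__":
    print(chunks('000000111111222222333333444444567', 6))
```

Because it relies only on slicing, the sketch keeps the type of the input: strings yield strings and tuples yield tuples.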

Very simple ruby code to chunk a string

I wanted to chunk a string and saw lots of 100-line examples. This is the trivial thing I came up with. The loop condition has to be a strict < comparison: with <=, a string whose length is an exact multiple of the len variable (as in this example!) picks up an extra empty string on the end.

It worked for what I needed.
  string = "some long test string or other"
  r = []
  len = 15
  start = 0
  while start < string.length do
    r << string[start...start+len]
    start += len
  end
  r # ["some long test ", "string or other"]

Split Apache logs according to GeoIP country

#!/usr/bin/perl

# $Id$

# Split Apache logs according to GeoIP country

use strict;
use warnings;

## no critic (ValuesAndExpressions::RequireInterpolationOfMetachars)
our ($VERSION) = '$Revision$' =~ m{ \$Revision: \s+ (\S+) }xms;
## use critic

use Geo::IP;

my $gi = Geo::IP->open('/usr/local/share/GeoIP/GeoIPCity.dat', GEOIP_STANDARD);

my @logs = @ARGV;

my %record_for;

foreach my $log (@logs) {
    die "Can't read $log\n" if !-r $log;
    
    my %fh_for;
    my $num_lines_parsed = 0;
    
    my $log_fh;
    if ($log =~ m/ \.gz \z /xms) {
        # list-form pipe open avoids passing the file name through the shell
        open $log_fh, '-|', 'gzip', '-cd', $log or die "Can't open gzip pipe\n";
    }
    else {
        open $log_fh, '<', $log or die "Can't open $log\n";
    }
    
    my $log_base = $log;
    $log_base =~ s/ \.gz \z //xms;
    
    while (my $line = <$log_fh>) {
        $num_lines_parsed++;
        if (!($num_lines_parsed % 1000)) {
            print STDERR "Parsed $num_lines_parsed lines of $log\n";
        }
        
        my ($host) = $line =~ m/ \A (\S+) \s /xms;
        
        if (!exists $record_for{$host}) {
            my $record = $gi->record_by_name($host);
            $record_for{$host} = $record || 0;
        }
        
        my $country = 'unknown';
        if (exists $record_for{$host} && $record_for{$host}) {
            $country = lc($record_for{$host}->country_name());
            $country =~ s/\W+/_/gxms;
        }
        
        if (!exists $fh_for{$country}) {
            open $fh_for{$country}, '>', "$log_base.$country.out"
                or die "Can't write to $log_base.$country.out\n";
        }
        
        print {$fh_for{$country}} $line;
    }
    
    foreach my $fh (values %fh_for) {
        close $fh;
    }
    
    close $log_fh;
}
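
The core of the script above (one cached lookup per host, one output file per country) can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not a port: key_for_host is a hypothetical callable standing in for the GeoIP lookup, split_log_by_key is an invented name, and gzip handling and progress output are left out.

```python
import re


def split_log_by_key(log_path, key_for_host):
    """Copy each line of log_path into '<log_path>.<label>.out'.

    key_for_host is a hypothetical callable that maps a host string to a
    label (for example a country name) or returns None when it is unknown.
    """
    cache = {}    # host -> label, so every host is looked up only once
    handles = {}  # label -> open output file
    try:
        with open(log_path) as log:
            for line in log:
                # the host is assumed to be the first whitespace-separated field
                host = line.split(None, 1)[0] if line.strip() else ""
                if host not in cache:
                    cache[host] = key_for_host(host) or "unknown"
                # lower-case the label and make it safe for use in a file name
                label = re.sub(r"\W+", "_", cache[host].lower())
                if label not in handles:
                    handles[label] = open("%s.%s.out" % (log_path, label), "w")
                handles[label].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Passing the lookup in as a callable keeps the sketch independent of any particular GeoIP library; a dictionary's get method is enough to try it out.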

Splitting large Scriptella ETL files

The following example demonstrates how to split a large Scriptella ETL file into several parts. It uses the traditional XML approach of external parsed entities:

<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd"
[
    <!-- Declaring the first external parsed entity to include -->
    <!ENTITY part1 SYSTEM "part1.xml">
    
    <!-- Declaring the second external parsed entity to include -->
    <!ENTITY part2 SYSTEM "part2.xml">
]>
<etl>
    <connection driver="text"/>

    <!-- Including file #1 -->
    &part1;

    <script>
        content of the script
    </script>
    
    <!-- Including file #2 -->
    &part2;

</etl>

Java - Split a string

	// Splits a string on a delimiter string
	// (requires: import java.util.ArrayList; import java.util.List;)
	private String[] splitString(String str, String delims)
	{
		if(str == null)
			return null;
		else if(str.equals("") || delims == null || delims.length() == 0)
			return new String[]{ str };
		
		List<String> parts = new ArrayList<String>();
		
		int pos = 0;
		int newpos = str.indexOf(delims, pos);

		// collect everything between consecutive occurrences of the delimiter
		while(newpos != -1)
		{
			parts.add(str.substring(pos, newpos));
			pos = newpos + delims.length();
			newpos = str.indexOf(delims, pos);
		}
		// the remainder after the last delimiter
		parts.add(str.substring(pos));
		
		return parts.toArray(new String[parts.size()]);
	}

fractionfiles.py

// Splits a file into smaller ones, and joins them together.

#!/usr/bin/env python

"""Splits and joins files. Helpful when media can't fit a file.
Be prepared for a lot of output files!"""

__author__="Andrew Pennebaker (andrew.pennebaker@gmail.com)"
__date__="6 Jan 2006 - 12 Feb 2006"
__copyright__="Copyright 2006 Andrew Pennebaker"
__license__="GPL"
__version__="0.3"
__URL__="http://snippets.dzone.com/posts/show/3541"

import sys, os
from getopt import getopt

SPLIT_MODE="SPLIT"
JOIN_MODE="JOIN"

def splitFile(name, length, number):
	if length==None:
		# derive the chunk length from the requested number of files
		size=os.path.getsize(name)

		length=size//number
		if number*length<size:
			length+=1

	else:
		if length<1:
			raise Exception

	infile=None
	try:
		infile=open(name, "rb")
	except Exception, e:
		raise e

	i=0
	j=0
	c=infile.read(1)
	while c!="":
		outfile=None
		try:
			outfile=open("%s.%d" % (name, j), "wb")
		except Exception, e:
			raise e

		while i<length and c!="":
			outfile.write(c)
			c=infile.read(1)
			i+=1

		outfile.close()
		i=0
		j+=1

	infile.close()

def joinFiles(filenames):
	if len(filenames)<1:
		raise Exception

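	# sort by numeric suffix so that ".10" follows ".9"; ...0 must be first
	filenames.sort(key=lambda f: int(f.rsplit(".", 1)[1]))

	origFilename=filenames[0].rsplit(".", 1)[0] # take ".0" off the first file name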
	origFile=None

	try:
		origFile=open(origFilename, "wb")
	except Exception, e:
		raise e

	c="&" # dummy

	for filename in filenames:
		smallFile=None
		try:
			smallFile=open(filename, "rb")
		except Exception, e:
			raise e

		c=smallFile.read(1)
		while c!="":
			origFile.write(c)
			c=smallFile.read(1)

		smallFile.close()

	origFile.close()

def usage():
	print "Usage: %s [options] [files]" % (sys.argv[0])
	print "\n--split <file1 file 2 file 3...>"
	print "--join <dir1 dir2 dir3 ...>"
	print "--maxlength <bytes>"
	print "--maxfiles <number>"
	print "--help (usage)"

	sys.exit()

def main():
	global SPLIT_MODE
	global JOIN_MODE

	mode=SPLIT_MODE
	filenames=[]
	maxlength=1024
	maxfiles=None

	systemArgs=sys.argv[1:] # ignore program name

	optlist=[]
	args=[]

	try:
		optlist, args=getopt(systemArgs, None, ["split", "join", "maxlength=", "maxfiles=", "help"])
	except Exception, e:
		usage()

	if len(optlist)<1 or len(args)<1:
		usage()

	for option, value in optlist:
		if option=="--help":
			usage()

		elif option=="--split":
			mode=SPLIT_MODE
		elif option=="--join":
			mode=JOIN_MODE
		elif option=="--maxlength":
			try:
				maxlength=int(value)
				if maxlength<1:
					raise ValueError
				maxfiles=None
			except ValueError:
				raise ValueError("Length must be at least one")
		elif option=="--maxfiles":
			try:
				maxfiles=int(value)
				if maxfiles<1:
					raise ValueError
				maxlength=None
			except ValueError:
				raise ValueError("Number must be at least one")

	filenames=args

	if mode==SPLIT_MODE:
		for filename in filenames:
			try:
				splitFile(filename, maxlength, maxfiles)
			except Exception, e:
				raise e

	elif mode==JOIN_MODE:
		for directory in filenames:
			files=["%s%s%s" % (directory, os.sep, file) for file in os.listdir(directory)]

			try:
				joinFiles(files)
			except Exception, e:
				raise e

if __name__=="__main__":
	main()
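
As a rough illustration of the same split/join idea on Python 3, the sketch below reads and writes whole blocks instead of single bytes. The names split_file and join_files and the return value are assumptions for this sketch; it is not a drop-in replacement for the script above and has no command-line handling.

```python
def split_file(path, chunk_size):
    """Write path.0, path.1, ... each holding at most chunk_size bytes."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least one")
    parts = []
    with open(path, "rb") as infile:
        while True:
            block = infile.read(chunk_size)  # one whole chunk is held in memory
            if not block:
                break
            part = "%s.%d" % (path, len(parts))
            with open(part, "wb") as outfile:
                outfile.write(block)
            parts.append(part)
    return parts


def join_files(parts, target):
    """Concatenate the part files into target, ordered by numeric suffix."""
    # a plain string sort would place ".10" before ".2"
    ordered = sorted(parts, key=lambda p: int(p.rsplit(".", 1)[1]))
    with open(target, "wb") as outfile:
        for part in ordered:
            with open(part, "rb") as infile:
                outfile.write(infile.read())
```

Each chunk is read into memory in one piece, so for very large chunk sizes a smaller internal buffer would be the safer choice.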

Split array into smaller arrays of equal size

Split an array of elements into a given number of smaller arrays of (nearly) equal size. Extra elements are preferentially assigned to earlier arrays. If a returned array has no elements, it will be [] (empty array).

# use as standalone function
def chunk_array(array, pieces=2)
  len = array.length
  mid = len/pieces
  chunks = []
  start = 0
  1.upto(pieces) do |i|
    last = start+mid
    last = last-1 unless len%pieces >= i
    # parentheses matter: << binds tighter than ||
    chunks << (array[start..last] || [])
    start = last+1
  end
  chunks
end

# use as array.chunk
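class Array
  # note: on Ruby versions that provide Enumerable#chunk,
  # this definition overrides it for arrays
  def chunk(pieces=2)
    len = self.length
    mid = len/pieces
    chunks = []
    start = 0
    1.upto(pieces) do |i|
      last = start+mid
      last = last-1 unless len%pieces >= i
      # parentheses matter: << binds tighter than ||
      chunks << (self[start..last] || [])
      start = last+1
    end
    chunks
  end
end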



Examples of use:

>> chunk_array [1,2,3,4,5,6], 2
=> [[1, 2, 3], [4, 5, 6]]

>> chunk_array [1,2,3,4,5,6], 3
=> [[1, 2], [3, 4], [5, 6]]

>> chunk_array [1,2,3,4,5,6], 4
=> [[1, 2], [3, 4], [5], [6]]

>> chunk_array [1,2,3,4,5,6,7,8,9,10], 4
=> [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]

>> chunk_array [1,2,3], 4
=> [[1], [2], [3], []]

>> chunk_array [], 2
=> [[], []]


If you prefer the second form (more Ruby-ish, but not always appropriate):

>> [1,2,3,4,5,6,7,8,9,10].chunk
=> [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

>> [1,2,3,4,5,6,7,8,9,10].chunk 3
=> [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]


This is handy when used with a splat because you can do things like:

left, right = *chunk_array(all,2)
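
The distribution rule (equal base size, with the remainder handed out one element at a time to the earliest chunks) can be sketched in Python with divmod. This is an illustrative sketch, not a translation of the Ruby code line by line; only the function name is borrowed from it.

```python
def chunk_array(items, pieces=2):
    """Split items into `pieces` lists whose lengths differ by at most one."""
    base, extra = divmod(len(items), pieces)
    chunks = []
    start = 0
    for i in range(pieces):
        # the first `extra` chunks receive one additional element
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    print(chunk_array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4))
```

Tuple unpacking plays the role of the splat here: left, right = chunk_array(all_items, 2), where all_items is any list.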

SPLIT-UNIQUE - split a block into unique and duplicate values

    split-unique: func [block [any-block!] /local uniq dupe dest] [
        uniq: copy []
        dupe: copy []
        foreach item block [
            dest: either find/only uniq item [dupe] [uniq]
            append/only dest item
        ]
        reduce [uniq dupe]
    ]
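
For readers who do not know REBOL, the logic above can be sketched in Python as follows: the first occurrence of each value goes to the unique list and every later occurrence goes to the duplicate list. The name split_unique mirrors the REBOL word; returning a tuple is a choice made for this sketch.

```python
def split_unique(items):
    """Return (uniq, dupe): first occurrences and all repeated occurrences."""
    uniq, dupe = [], []
    for item in items:
        # a value already seen goes to dupe, otherwise to uniq
        (dupe if item in uniq else uniq).append(item)
    return uniq, dupe


if __name__ == "__main__":
    print(split_unique([1, 2, 1, 3, 2, 1]))
```

The membership test on a list is linear, so this is only intended for small inputs; it does have the advantage of working for unhashable items such as nested lists.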

GROUP - group like elements in a block

    group: func [
        {Returns a block of sub-blocks with items partitioned by value.}
        block  [any-block!]
        /local result
    ][
        result: copy []
        ; First, build up a list of keys, with a place for values
        ; to go with each key.
        foreach item block [
            if not find/only/skip result item 2 [
                repend result [item copy []]
            ]
        ]
        ; Add items to the block associated with each key.
        foreach item block [append/only select result item item]
        result
    ]
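
A comparable grouping can be sketched in Python as below. One difference is an assumption of this sketch: the REBOL function returns a flat block of alternating keys and sub-blocks, whereas the Python version returns a list of [key, members] pairs in first-seen order.

```python
def group(items):
    """Partition items by value, keeping the order in which values first appear."""
    result = []  # list of [key, members] pairs
    for item in items:
        for key, members in result:
            if key == item:
                members.append(item)
                break
        else:
            # no existing key matched, so start a new group
            result.append([item, [item]])
    return result


if __name__ == "__main__":
    print(group([1, 2, 1, 3, 2]))
```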

Split String into roughly equal-sized chunks.

Split a string into an array of roughly equal-sized chunks based on a string or regular expression delimiter.
The delimiter is preserved in the output.

class String
  def chunk_string(average_segment_size = 40, slice_on = /\s+/)
    out = []
    slices_estimate = self.size.divmod(average_segment_size)
    slice_count = (slices_estimate[1] > 0 ? slices_estimate[0] + 1 : slices_estimate[0])
    slice_guess = self.size / slice_count
    previous_slice_location = 0
    (1..slice_count - 1).each do
      |i|
      slice_location = self.nearest_split(slice_guess * i, slice_on)
      out << self.slice(previous_slice_location..slice_location)
      previous_slice_location = slice_location + 1
    end
    out << self.slice(previous_slice_location..self.size)
    out
  end

  def nearest_split(slice_start, slice_on)
    left_scan_location  = (self.slice(0..slice_start).rindex(slice_on)).to_i
    right_scan_location = (self.slice((slice_start+1)..self.size).index(slice_on)).to_i + slice_start
    ((slice_start - left_scan_location) < (right_scan_location - slice_start) ? left_scan_location : right_scan_location)
  end
end
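
The general approach (estimate evenly spaced cut points, then move each one to the nearest delimiter) can be sketched in Python as follows. This is a simplified sketch under its own assumptions rather than a port: it always cuts just after the first character of the nearest delimiter match, it only accepts a regular expression pattern, and it returns an empty list for an empty string.

```python
import re


def nearest_split(text, target, pattern):
    """Return the start index of the delimiter match closest to target.

    Falls back to target itself when the text contains no delimiter.
    """
    positions = [m.start() for m in re.finditer(pattern, text)]
    if not positions:
        return target
    return min(positions, key=lambda p: abs(p - target))


def chunk_string(text, average_size=40, pattern=r"\s+"):
    if not text:
        return []
    count = -(-len(text) // average_size)  # ceiling division
    guess = len(text) // count             # ideal distance between cuts
    chunks, previous = [], 0
    for i in range(1, count):
        # cut just after the delimiter character so it stays in the chunk
        cut = nearest_split(text, guess * i, pattern) + 1
        if cut > previous:                 # skip cuts that would give empty chunks
            chunks.append(text[previous:cut])
            previous = cut
    if previous < len(text):
        chunks.append(text[previous:])
    return chunks
```

Joining the returned chunks always reproduces the input, which makes the sketch easy to sanity-check on real text.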