Making md5sum.exe Work with Paths in Python

On Windows systems, md5sum.exe is a nice little program for generating MD5 sums. However, whoever wrote it forgot to handle paths in the file name passed to it, so it only finds files in the current directory. For example, say you want the MD5 sum of a file called test.txt.

C:\test>dir
Directory of C:\test

06/13/2007 03:52 PM <DIR> .
06/13/2007 03:52 PM <DIR> ..
06/13/2007 03:52 PM 0 test.txt
1 File(s) 0 bytes

C:\test>md5sum test.txt
d41d8cd98f00b204e9800998ecf8427e *test.txt

Groovy, cool...

However, if you try the same thing from some other directory (where test.txt does not live) you get the following:

C:\>md5sum c:/test/test.txt
md5sum: test.txt: No such file or directory

Sucky, eh?
You could always find a different program to do md5sum. Or, if you don't want to do this, you can spawn a pipe to "cmd", navigate to the correct directory, and then run md5sum.

Here's the code in Python to do just that.

import re
import subprocess

# define your file name and path
file_name = "afile.txt"
file_path = "c:\\afolder"

# set up the md5sum command (assuming md5sum.exe is on the PATH)
md5_cmd = "md5sum \"" + file_path + "\\" + file_name + "\"\n"

# open the command shell (subprocess replaces the old popen2.popen4 call)
shell = subprocess.Popen("cmd", stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, universal_newlines=True)

# push the drive change, directory change and md5sum commands to the shell,
# then collect everything the shell session printed
out, _ = shell.communicate("c:\nchdir " + file_path + "\n" + md5_cmd + "exit\n")

# split the output on whitespace so that we can extract the md5sum
output = out.split()

# grab the md5sum: a token made of exactly 32 hex digits
md5sum_local = ""
for item in output:
    matchmd5local = re.match("([0-9a-fA-F]{32})$", item)
    if matchmd5local:
        md5sum_local = matchmd5local.group(1)

if md5sum_local:
    print("MD5: Local checksum = " + md5sum_local)
else:
    print("ERROR: could not obtain local md5sum for " + file_name)
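
If finding "a different program" is an option, Python itself can do the job. Here is a minimal sketch using the standard hashlib module, which takes a full path without any shell tricks. The helper name md5_of_file and the chunk size are illustrative choices, not part of the snippet above.

```python
import hashlib

def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of the file at path (hypothetical helper)."""
    hasher = hashlib.md5()
    # open in binary mode so Windows does not translate line endings
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

For an empty file this returns d41d8cd98f00b204e9800998ecf8427e, the same value md5sum printed for the empty test.txt in the session above.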

How to identify files which are identical in a directory

I use this script to find identical files with different names in a folder. It is very useful when you want to check your media collection for duplicate items. Basically it scans the directory and its subfolders, computes the MD5 sum of each file, and keeps the sums in a hash that maps each sum to the list of files sharing it, so duplicates end up under the same key. The buffer size can be made bigger or smaller, depending on how much memory you want to use. For large directories it can take a while to run, but it worked fine, at least for my 20+ GB picture collection.
#!/usr/local/bin/ruby
require 'digest/md5'
require 'digest/sha1'

$BUF_SIZE = 1024*1024*1024

class Folder_Md5
	
	def initialize(folder)
		@md5_to_files = Hash.new	
		@folder = folder
	end	
		
	def scan
		@md5_to_files.clear
		compute_md5(@folder)
	end		
	
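	def md5_for_file(file_path)
		# the hash is keyed by digest, so search the file lists for the path
		@md5_to_files.each do |md5_val, files|
			return md5_val if files.include?(file_path)
		end
		nil
	end	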
	
	def identical_count
		total = 0
		@md5_to_files.each_value do |value|
				if value.size >= 2
					total+= value.size		
				end 
			end
		return total	
	end	
	
	def list_identical
		total = 0
		identities = 0
		puts 'The List of identical files'
		@md5_to_files.each_value do |value|
				if value.size >= 2
					identities+=1
					total+= value.size
					puts 'Identical files:'
					value.each{|file_name| puts file_name}
				end 
			end
		puts "got #{identities} identities involving #{total} files"
	end
	
	private	
		def compute_md5(file_path)
			if File.directory?(file_path)
				crt_dir = Dir.new(file_path)
				crt_dir.each do |file_name|
					if file_name != '.' &&  file_name != '..'				
						compute_md5(File.join(crt_dir.path, file_name))		
					end
				end	
			else
				md5_val = md5(file_path)
				if @md5_to_files[md5_val] == nil
					@md5_to_files[md5_val]  = [file_path]	
				else					
					 @md5_to_files[md5_val] << file_path
				end		
			end
		end
		
		def md5(file_path)
			hasher = Digest::MD5.new
			File.open(file_path, "rb") do |io|
				counter = 0
				while (!io.eof)
					readBuf = io.readpartial($BUF_SIZE)
					putc '.' if ((counter+=1) % 3 == 0)
					hasher.update(readBuf)
				end
			end			
			return hasher.hexdigest
		end	
end

worker = Folder_Md5.new(ARGV[0])
worker.scan
worker.list_identical
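
For comparison, here is the same group-by-digest idea as a rough Python sketch. It is not a port of the class above: the function name find_identical and the 1 MB chunk size are assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict

def find_identical(folder, chunk_size=1024 * 1024):
    """Map each MD5 digest to the files under folder that share it."""
    digest_to_files = defaultdict(list)
    # os.walk visits the folder and all of its subfolders
    for dir_path, _dir_names, file_names in os.walk(folder):
        for file_name in file_names:
            full_path = os.path.join(dir_path, file_name)
            hasher = hashlib.md5()
            with open(full_path, "rb") as handle:
                # feed the file to the hasher one chunk at a time
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    hasher.update(chunk)
            digest_to_files[hasher.hexdigest()].append(full_path)
    # keep only the digests shared by two or more files
    return {d: sorted(p) for d, p in digest_to_files.items() if len(p) >= 2}
```

Each value in the returned dictionary is one group of files with identical content, which corresponds to one "identity" in the Ruby output.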

CRC16.py

#!/usr/bin/env python

__author__="Andrew Pennebaker (andrew.pennebaker@gmail.com)"
__date__="21 Dec 2005 - 17 Jul 2006"
__copyright__="Copyright 2006 Andrew Pennebaker"
__license__="GPL"
__version__="0.4"
__URL__="http://snippets.dzone.com/posts/show/3544"

import HashFunction

class CRC16(HashFunction.HashFunction):
	BLOCK_SIZE=1
	DIGEST_SIZE=2

	INIT=0x0000
	SUM_REQ="Sum >= 0"

	TEST_DATA="abc"
	TEST_HASH=0xa8b6

	TABLE=[
		0x0000, 0xc0c1, 0xc181, 0x0140, 0xc301, 0x03c0, 0x0280, 0xc241,
		0xc601, 0x06c0, 0x0780, 0xc741, 0x0500, 0xc5c1, 0xc481, 0x0440,
		0xcc01, 0x0cc0, 0x0d80, 0xcd41, 0x0f00, 0xcfc1, 0xce81, 0x0e40,
		0x0a00, 0xcac1, 0xcb81, 0x0b40, 0xc901, 0x09c0, 0x0880, 0xc841,
		0xd801, 0x18c0, 0x1980, 0xd941, 0x1b00, 0xdbc1, 0xda81, 0x1a40,
		0x1e00, 0xdec1, 0xdf81, 0x1f40, 0xdd01, 0x1dc0, 0x1c80, 0xdc41,
		0x1400, 0xd4c1, 0xd581, 0x1540, 0xd701, 0x17c0, 0x1680, 0xd641,
		0xd201, 0x12c0, 0x1380, 0xd341, 0x1100, 0xd1c1, 0xd081, 0x1040,
		0xf001, 0x30c0, 0x3180, 0xf141, 0x3300, 0xf3c1, 0xf281, 0x3240,
		0x3600, 0xf6c1, 0xf781, 0x3740, 0xf501, 0x35c0, 0x3480, 0xf441,
		0x3c00, 0xfcc1, 0xfd81, 0x3d40, 0xff01, 0x3fc0, 0x3e80, 0xfe41,
		0xfa01, 0x3ac0, 0x3b80, 0xfb41, 0x3900, 0xf9c1, 0xf881, 0x3840,
		0x2800, 0xe8c1, 0xe981, 0x2940, 0xeb01, 0x2bc0, 0x2a80, 0xea41,
		0xee01, 0x2ec0, 0x2f80, 0xef41, 0x2d00, 0xedc1, 0xec81, 0x2c40,
		0xe401, 0x24c0, 0x2580, 0xe541, 0x2700, 0xe7c1, 0xe681, 0x2640,
		0x2200, 0xe2c1, 0xe381, 0x2340, 0xe101, 0x21c0, 0x2080, 0xe041,
		0xa001, 0x60c0, 0x6180, 0xa141, 0x6300, 0xa3c1, 0xa281, 0x6240,
		0x6600, 0xa6c1, 0xa781, 0x6740, 0xa501, 0x65c0, 0x6480, 0xa441,
		0x6c00, 0xacc1, 0xad81, 0x6d40, 0xaf01, 0x6fc0, 0x6e80, 0xae41,
		0xaa01, 0x6ac0, 0x6b80, 0xab41, 0x6900, 0xa9c1, 0xa881, 0x6840,
		0x7800, 0xb8c1, 0xb981, 0x7940, 0xbb01, 0x7bc0, 0x7a80, 0xba41,
		0xbe01, 0x7ec0, 0x7f80, 0xbf41, 0x7d00, 0xbdc1, 0xbc81, 0x7c40,
		0xb401, 0x74c0, 0x7580, 0xb541, 0x7700, 0xb7c1, 0xb681, 0x7640,
		0x7200, 0xb2c1, 0xb381, 0x7340, 0xb101, 0x71c0, 0x7080, 0xb041,
		0x5000, 0x90c1, 0x9181, 0x5140, 0x9301, 0x53c0, 0x5280, 0x9241,
		0x9601, 0x56c0, 0x5780, 0x9741, 0x5500, 0x95c1, 0x9481, 0x5440,
		0x9c01, 0x5cc0, 0x5d80, 0x9d41, 0x5f00, 0x9fc1, 0x9e81, 0x5e40,
		0x5a00, 0x9ac1, 0x9b81, 0x5b40, 0x9901, 0x59c0, 0x5880, 0x9841,
		0x8801, 0x48c0, 0x4980, 0x8941, 0x4b00, 0x8bc1, 0x8a81, 0x4a40,
		0x4e00, 0x8ec1, 0x8f81, 0x4f40, 0x8d01, 0x4dc0, 0x4c80, 0x8c41,
		0x4400, 0x84c1, 0x8581, 0x4540, 0x8701, 0x47c0, 0x4680, 0x8641,
		0x8201, 0x42c0, 0x4380, 0x8341, 0x4100, 0x81c1, 0x8081, 0x4040
	]

	def __init__(self, sum=0x0000):
		self.sum=sum^0xffff

	def sumValid(self, sum):
		return sum>=0

	def _update(self, b):
		self.sum=(self.sum>>8)^self.TABLE[(self.sum^(b&0xff))&0xff]

	def digest(self):
		return self.sum^0xffff

	def format(self, data):
		return "%04x" % (data)

	def unformat(self, hash):
		return int(hash, 16)

if __name__=="__main__":
	HashFunction.main(CRC16)
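
The HashFunction base module is not shown here, so the class cannot be run on its own. As a standalone sketch, the table above appears to be the byte table for the reflected polynomial 0xA001 (that value is my inference from the table entries, not something the snippet states), and the update step can be reproduced like this. The function names are made up for the example.

```python
def make_crc16_table(poly=0xA001):
    """Build the 256-entry table for a reflected (LSB-first) CRC-16."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            # shift right; fold in the polynomial when a 1 bit falls off
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

CRC16_TABLE = make_crc16_table()

def crc16(data, table=CRC16_TABLE):
    """Same steps as the class: start at 0xffff, update per byte, invert."""
    crc = 0xFFFF
    for byte in bytearray(data):
        crc = (crc >> 8) ^ table[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFF

print("%04x" % crc16(b"abc"))  # a8b6, the TEST_HASH value above
```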

CRC8.py

#!/usr/bin/env python

__author__="Andrew Pennebaker (andrew.pennebaker@gmail.com)"
__date__="23 Dec 2005 - 17 Jul 2006"
__copyright__="Copyright 2006 Andrew Pennebaker"
__license__="GPL"
__version__="0.3"
__credits__="From the PyPy project"
__URL__="http://snippets.dzone.com/posts/show/3543"

import HashFunction

class CRC8(HashFunction.HashFunction):
	BLOCK_SIZE=1
	DIGEST_SIZE=1

	INIT=0x00
	SUM_REQ="Sum >= 0"

	TEST_DATA="abc"
	TEST_HASH=0x8b

	TABLE=[
		0x00, 0x07, 0x0e, 0x09, 0x1c, 0x1b, 0x12, 0x15,
		0x38, 0x3f, 0x36, 0x31, 0x24, 0x23, 0x2a, 0x2d,
		0x70, 0x77, 0x7e, 0x79, 0x6c, 0x6b, 0x62, 0x65,
		0x48, 0x4f, 0x46, 0x41, 0x54, 0x53, 0x5a, 0x5d,
		0xe0, 0xe7, 0xee, 0xe9, 0xfc, 0xfb, 0xf2, 0xf5,
		0xd8, 0xdf, 0xd6, 0xd1, 0xc4, 0xc3, 0xca, 0xcd,
		0x90, 0x97, 0x9e, 0x99, 0x8c, 0x8b, 0x82, 0x85,
		0xa8, 0xaf, 0xa6, 0xa1, 0xb4, 0xb3, 0xba, 0xbd,
		0xc7, 0xc0, 0xc9, 0xce, 0xdb, 0xdc, 0xd5, 0xd2,
		0xff, 0xf8, 0xf1, 0xf6, 0xe3, 0xe4, 0xed, 0xea,
		0xb7, 0xb0, 0xb9, 0xbe, 0xab, 0xac, 0xa5, 0xa2,
		0x8f, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9d, 0x9a,
		0x27, 0x20, 0x29, 0x2e, 0x3b, 0x3c, 0x35, 0x32,
		0x1f, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0d, 0x0a,
		0x57, 0x50, 0x59, 0x5e, 0x4b, 0x4c, 0x45, 0x42,
		0x6f, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7d, 0x7a,
		0x89, 0x8e, 0x87, 0x80, 0x95, 0x92, 0x9b, 0x9c,
		0xb1, 0xb6, 0xbf, 0xb8, 0xad, 0xaa, 0xa3, 0xa4,
		0xf9, 0xfe, 0xf7, 0xf0, 0xe5, 0xe2, 0xeb, 0xec,
		0xc1, 0xc6, 0xcf, 0xc8, 0xdd, 0xda, 0xd3, 0xd4,
		0x69, 0x6e, 0x67, 0x60, 0x75, 0x72, 0x7b, 0x7c,
		0x51, 0x56, 0x5f, 0x58, 0x4d, 0x4a, 0x43, 0x44,
		0x19, 0x1e, 0x17, 0x10, 0x05, 0x02, 0x0b, 0x0c,
		0x21, 0x26, 0x2f, 0x28, 0x3d, 0x3a, 0x33, 0x34,
		0x4e, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5c, 0x5b,
		0x76, 0x71, 0x78, 0x7f, 0x6a, 0x6d, 0x64, 0x63,
		0x3e, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2c, 0x2b,
		0x06, 0x01, 0x08, 0x0f, 0x1a, 0x1d, 0x14, 0x13,
		0xae, 0xa9, 0xa0, 0xa7, 0xb2, 0xb5, 0xbc, 0xbb,
		0x96, 0x91, 0x98, 0x9f, 0x8a, 0x8d, 0x84, 0x83,
		0xde, 0xd9, 0xd0, 0xd7, 0xc2, 0xc5, 0xcc, 0xcb,
		0xe6, 0xe1, 0xe8, 0xef, 0xfa, 0xfd, 0xf4, 0xf3
	]

	def __init__(self, sum=0x00):
		self.sum=sum^0xff

	def sumValid(self, sum):
		return sum>=0

	def _update(self, b):
		self.sum=self.TABLE[self.sum^b]

	def digest(self):
		return self.sum^0xff

	def format(self, data):
		return "%02x" % (data)

	def unformat(self, hash):
		return int(hash, 16)

if __name__=="__main__":
	HashFunction.main(CRC8)
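
Again without HashFunction, here is a small standalone sketch of the same computation. The table above looks like the MSB-first table for the polynomial 0x07; that is an inference from its entries, and the function names are hypothetical.

```python
def make_crc8_table(poly=0x07):
    """Build the 256-entry table for an MSB-first CRC-8."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            # shift left; fold in the polynomial when the top bit was set
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
        table.append(crc)
    return table

CRC8_TABLE = make_crc8_table()

def crc8(data, table=CRC8_TABLE):
    """Same steps as the class: start at 0xff, one lookup per byte, invert."""
    crc = 0xFF
    for byte in bytearray(data):
        crc = table[crc ^ byte]
    return crc ^ 0xFF

print("%02x" % crc8(b"abc"))  # 8b, the TEST_HASH value above
```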

BSD.py

// BSD Unix checksum (the 16-bit rotate-and-add sum of the BSD sum utility, not the TCP/IP checksum)

#!/usr/bin/env python

__author__="Andrew Pennebaker (andrew.pennebaker@gmail.com)"
__date__="21 Dec 2005 - 3 May 2006"
__copyright__="Copyright 2006 Andrew Pennebaker"
__license__="GPL"
__version__="0.3"
__URL__="http://snippets.dzone.com/posts/show/3542"

import HashFunction

class BSD(HashFunction.HashFunction):
	BLOCK_SIZE=1
	DIGEST_SIZE=2

	INIT=0x00
	SUM_REQ="Sum >= 0"

	TEST_DATA="abc"
	TEST_HASH=0x40ac

	def __init__(self, sum=0x00):
		self.sum=sum

	def sumValid(self, sum):
		return sum>=0

	def rotate(self, b):
		if (b&1)!=0:
			return (b>>1)+0x8000

		return b>>1

	def _update(self, b):
		self.sum=(self.rotate(self.sum)+b)&0xffff

	def digest(self):
		return self.sum

	def format(self, data):
		return "%05d" % (data)

	def unformat(self, hash):
		return int(hash)

if __name__=="__main__":
	HashFunction.main(BSD)
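
Since HashFunction is not included, here is the rotate-and-add step as a standalone sketch. The function name bsd_checksum is made up for the example.

```python
def bsd_checksum(data):
    """Rotate the 16-bit sum right by one bit, then add each byte."""
    checksum = 0
    for byte in bytearray(data):
        # the low bit wraps around to bit 15, like rotate() in the class
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum

print("%05d" % bsd_checksum(b"abc"))  # 16556, which is 0x40ac
```

Working it by hand for "abc": 0x0061 after 'a', 0x8092 after 'b', and 0x40ac after 'c', which matches TEST_HASH.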

CRC32.py

#!/usr/bin/env python

__author__="Andrew Pennebaker (andrew.pennebaker@gmail.com)"
__date__="10 Oct 2005 - 17 Jul 2006"
__copyright__="Copyright 2006 Andrew Pennebaker"
__license__="GPL"
__version__="0.4"
__credits__="From the PyPy project"
__URL__="http://snippets.dzone.com/posts/show/3540"

import HashFunction

class CRC32(HashFunction.HashFunction):
	BLOCK_SIZE=1
	DIGEST_SIZE=4

	INIT=0x00000000
	SUM_REQ="Sum >= 0"

	TEST_DATA="abc"
	TEST_HASH=0x352441c2

	TABLE=[
		0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
		0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
		0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
		0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
		0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de,
		0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
		0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec,
		0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,
		0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
		0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
		0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940,
		0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
		0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116,
		0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
		0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
		0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,
		0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a,
		0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
		0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818,
		0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01,
		0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
		0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,
		0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c,
		0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
		0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2,
		0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,
		0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
		0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,
		0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086,
		0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
		0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4,
		0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,
		0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
		0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683,
		0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8,
		0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
		0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe,
		0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,
		0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
		0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
		0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252,
		0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
		0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60,
		0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
		0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
		0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,
		0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04,
		0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
		0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a,
		0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713,
		0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
		0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21,
		0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e,
		0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
		0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c,
		0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,
		0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
		0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,
		0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0,
		0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
		0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6,
		0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,
		0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
		0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
	]

	def __init__(self, sum=0x00000000):
		self.sum=sum^0xffffffff

	def sumValid(self, sum):
		return sum>=0

	def _update(self, b):
		self.sum=self.TABLE[(self.sum^b)&0xff]^(self.sum>>8)

	def digest(self):
		return self.sum^0xffffffff

	def format(self, data):
		return "%08x" % (data)

	def unformat(self, hash):
		return int(hash, 16)

if __name__=="__main__":
	HashFunction.main(CRC32)
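
A quick way to sanity-check a table-driven CRC-32 like this one is to compare it with zlib.crc32 from the standard library. The sketch below does the same computation bit by bit instead of with a table; the polynomial 0xEDB88320 is the table's entry at index 128, and the function name is hypothetical.

```python
import zlib

def crc32_bitwise(data, poly=0xEDB88320):
    """Bit-by-bit reflected CRC-32: start at all ones, invert at the end."""
    crc = 0xFFFFFFFF
    for byte in bytearray(data):
        crc ^= byte
        for _ in range(8):
            # shift right; fold in the polynomial when a 1 bit falls off
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

# mask zlib's result so it is unsigned on every Python version
reference = zlib.crc32(b"abc") & 0xFFFFFFFF
print("%08x %08x" % (crc32_bitwise(b"abc"), reference))
```

Both numbers should come out as 352441c2, the TEST_HASH value in the class.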

HMAC-SHA1 - calc HMAC/SHA-1 checksum

hmac-sha1: func [val key] [checksum/method/key val 'sha1 key]
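
For readers without REBOL, a rough Python equivalent using the standard hmac and hashlib modules might look like this. The argument order mirrors the REBOL function (value first, then key); returning a hex string rather than raw bytes is a choice made for the example.

```python
import hashlib
import hmac

def hmac_sha1(val, key):
    """Return the HMAC-SHA1 of val under key as a hex string.

    Both arguments are bytes.
    """
    return hmac.new(key, val, hashlib.sha1).hexdigest()

print(hmac_sha1(b"some message", b"secret key"))
```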

Luhn Credit card checksum function

luhn: func [    ; tested out OK.
    card-num [string!]
    /local cksum flag val
] [
    cksum: 0
    flag: even? length? card-num
    foreach digit card-num [
        val: to integer! form digit
        if flag [
            val: val * 2
            if val > 9 [val: val - 9]
            ; - alt -
            ;val: pick [0 2 4 6 8 1 3 5 7 9] val + 1
        ]
        cksum: cksum + val
        flag: not flag
    ]
    print cksum
    0 = remainder cksum 10
]

;print luhn "4005550000000019"

Credit card checksum

From David Shaw's recipe
def cardLuhnChecksumIsValid(card_number):
    sum = 0
    num_digits = len(card_number)
    oddeven = num_digits & 1
    for count in range(0, num_digits):
        digit = int(card_number[count])
        if not (( count & 1 ) ^ oddeven ):
            digit = digit * 2
        if digit > 9:
            digit = digit - 9
        sum = sum + digit
    return ( (sum % 10) == 0 )
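
The commented-out "alt" line in the REBOL version replaces the double-and-subtract-9 step with a lookup table. Here is a sketch of that variant in Python, counting positions from the right so no odd/even length check is needed. The names DOUBLED and luhn_is_valid are made up for the example.

```python
# DOUBLED[d] is d * 2 with 9 subtracted when the result exceeds 9
DOUBLED = [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]

def luhn_is_valid(card_number):
    """Return True when the digit string passes the Luhn check."""
    total = 0
    # walk from the rightmost digit; every second digit gets doubled
    for position, char in enumerate(reversed(card_number)):
        digit = int(char)
        total += DOUBLED[digit] if position % 2 == 1 else digit
    return total % 10 == 0

print(luhn_is_valid("4005550000000019"))  # True (the digits sum to 30)
```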