About this user

Romulus Pasca

Organize photos and other files in a year/month/day folder structure

This script can organize your media collection in folders by year/month/day.
For JPEG files, it looks in the EXIF data for the creation date, if the file has that kind of metadata. For any other file, it uses the file's modification time.
The script takes three arguments on the command line: an option, which may be -m, -c or -f, a source folder and a destination folder.
Using -c copies the files from source to destination and -m moves them. -f also moves them, by first making a copy and then deleting the original. The -f option may be used when -m doesn't work, the most common situation being when you want to move the files from one file system to another (e.g. from your ext3 local hard drive to an external FAT32 USB hard drive).
After you organize the files, you can find duplicate items using my previous snippet.
This script was inspired by the one here
#!/usr/local/bin/ruby
require 'rubygems'
require 'fileutils'  # replaces 'ftools', which was removed in Ruby 1.9
require 'exifr/jpeg' # exifr 1.3 and later; older releases used require 'exifr'


# Files file_path under destination_dir/year/month/day, recursing into folders.
def process_file(file_path, destination_dir)
	if File.directory?(file_path)
		Dir.foreach(file_path) do |file_name|
			next if file_name == '.' || file_name == '..'
			process_file(File.join(file_path, file_name), destination_dir)
		end
	else
		file_date = nil
		# For JPEG files, prefer the date stored in the EXIF metadata
		if File.fnmatch('*.jpg', file_path, File::FNM_CASEFOLD) || File.fnmatch('*.jpeg', file_path, File::FNM_CASEFOLD)
			begin
				picture = EXIFR::JPEG.new(file_path)
				file_date = picture.date_time if picture.exif
			rescue StandardError
				# unreadable JPEG data: fall back to the modification time below
			end
		end
		# Everything else, and JPEGs without an EXIF date, use the modification time
		file_date = File.mtime(file_path) if file_date.nil?

		# File.join adds the separator even when destination_dir has no trailing slash
		day_dir = File.join(destination_dir, file_date.strftime("%Y/%m-%b/%d"))
		new_file_name = File.join(day_dir, File.basename(file_path))
		FileUtils.mkdir_p(day_dir) # creates the year and month folders as needed

		case ARGV[0]
		when '-m' # move the files
			File.rename(file_path, new_file_name)
		when '-c' # copy the files
			FileUtils.cp(file_path, new_file_name)
		when '-f' # copy then delete; acts like a move between different file systems
			FileUtils.cp(file_path, new_file_name)
			File.delete(file_path)
		else
			puts "Unknown option #{ARGV[0]}"
			exit
		end
	end
end


if ARGV.length != 3
	puts "Three arguments are required to run the script, -c|-m|-f <source_folder_or_file>  <destination_folder>"
	exit
end

if ARGV[0] !='-c'  && ARGV[0]!='-m' && ARGV[0]!='-f'
	puts "Unknown running option: #{ARGV[0]}"
	exit
end

if not File.exist?(ARGV[1])
	puts "Source file does not exist: #{ARGV[1]}"
	exit
end


if not File.directory?(ARGV[2])
	puts "Destination is not a directory: #{ARGV[2]}"
	exit
end

if ARGV[1]==ARGV[2]
	puts "Source and destination must be different"
	exit
end	
	
	
process_file(ARGV[1], ARGV[2])
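
The move-versus-copy distinction described above can also be sketched with FileUtils alone. This is only a minimal illustration, not the script's own method: dated_dir and file_by_date are hypothetical helper names, and the sketch relies on FileUtils.mv copying and then deleting when the source and destination are on different file systems, so a single call could stand in for both -m and -f.

```ruby
require 'fileutils'

# Hypothetical helper: builds the year/month/day folder for a timestamp,
# using the same "%Y/%m-%b/%d" layout as the script above.
def dated_dir(destination_dir, time)
  File.join(destination_dir, time.strftime('%Y/%m-%b/%d'))
end

# Hypothetical helper: files one file under its dated folder and
# returns the new path. The date defaults to the modification time.
def file_by_date(file_path, destination_dir, time = File.mtime(file_path), copy: false)
  target_dir = dated_dir(destination_dir, time)
  FileUtils.mkdir_p(target_dir) # creates year, month and day folders as needed
  target = File.join(target_dir, File.basename(file_path))
  if copy
    FileUtils.cp(file_path, target, preserve: true) # keep the original timestamps
  else
    # FileUtils.mv falls back to copy-and-delete across file systems
    FileUtils.mv(file_path, target)
  end
  target
end
```

For example, file_by_date('holiday.jpg', '/media/photos') would move the file into a folder named after its modification date; passing copy: true leaves the original in place.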

How to identify files which are identical in a directory

I use this script to find identical files with different names in a folder. It is very useful when you want to check your media collection for duplicate items. Basically, it scans the directory and its subfolders, computes the MD5 sum of each file, and keeps the sums in a hash so they can be compared. The buffer size can be made bigger or smaller, depending on the amount of memory you want to use. For large directories it can take a while to run, but it worked fine, at least for my 20+ GB picture collection.
#!/usr/local/bin/ruby
require 'digest/md5'

# Read buffer size in bytes (1 GiB here); lower it to use less memory
$BUF_SIZE = 1024*1024*1024

class Folder_Md5
	
	def initialize(folder)
		@md5_to_files = Hash.new	
		@folder = folder
	end	
		
	def scan
		@md5_to_files.clear
		compute_md5(@folder)
	end		
	
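	# Returns the MD5 sum recorded for file_path, or nil if it was not scanned.
	# The hash is keyed by MD5 sum, so look for the entry listing this path.
	def md5_for_file(file_path)
		entry = @md5_to_files.find { |_md5, files| files.include?(file_path) }
		entry && entry.first
	end	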
	
	def identical_count
		total = 0
		@md5_to_files.each_value do |value|
				if value.size >= 2
					total+= value.size		
				end 
			end
		return total	
	end	
	
	def list_identical
		total = 0
		identities = 0
		puts 'The List of identical files'
		@md5_to_files.each_value do |value|
				if value.size >= 2
					identities+=1
					total+= value.size
					puts 'Identical files:'
					value.each{|file_name| puts file_name}
				end 
			end
		puts "got #{identities} identities implying #{total} files"
	end
	
	private	
		def compute_md5(file_path)
			if File.directory?(file_path)
				crt_dir = Dir.new(file_path)
				crt_dir.each do |file_name|
					if file_name != '.' &&  file_name != '..'				
						# File.join adds the separator missing between folder and file name
						compute_md5(File.join(file_path, file_name))
					end
				end	
			else
				md5_val = md5(file_path)
				if @md5_to_files[md5_val] == nil
					@md5_to_files[md5_val]  = [file_path]	
				else					
					 @md5_to_files[md5_val] << file_path
				end		
			end
		end
		
		def md5(file_path)
			hasher = Digest::MD5.new
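			# File.open in binary mode; Kernel#open would treat a leading "|" as a command
			File.open(file_path, "rb") do |io|
				counter = 0
				while (!io.eof)
					read_buf = io.readpartial($BUF_SIZE)
					putc '.' if ((counter+=1) % 3 == 0) # progress indicator
					hasher.update(read_buf)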
				end
			end			
			return hasher.hexdigest
		end	
end

if ARGV.length != 1
	puts "One argument is required to run the script: <folder>"
	exit
end

worker = Folder_Md5.new(ARGV[0])
worker.scan
worker.list_identical
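
The scan-and-compare idea described above can be sketched more compactly. This is a variation under stated assumptions rather than the script's own method: duplicate_groups is a hypothetical name, and the sketch first groups files by size so that only files sharing a size are hashed, then lets Digest::MD5.file do the buffered reading.

```ruby
require 'digest/md5'
require 'find'

# Hypothetical helper: returns an array of groups, each group being the
# paths of files under folder that share the same MD5 sum.
def duplicate_groups(folder)
  # Group regular files by size first; a file with a unique size
  # cannot have a duplicate, so it is never hashed.
  by_size = Hash.new { |hash, size| hash[size] = [] }
  Find.find(folder) do |path|
    by_size[File.size(path)] << path if File.file?(path)
  end

  by_digest = Hash.new { |hash, digest| hash[digest] = [] }
  by_size.each_value do |paths|
    next if paths.size < 2
    # Digest::MD5.file reads the file in chunks rather than all at once
    paths.each { |path| by_digest[Digest::MD5.file(path).hexdigest] << path }
  end

  # Keep only the sums shared by two or more files
  by_digest.values.select { |paths| paths.size >= 2 }
end
```

Calling duplicate_groups(ARGV[0]).each { |group| puts group } would print every set of identical files. The size check is a design choice that trades a little bookkeeping for far fewer reads on collections where most files are unique.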