<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DZone Snippets: Nicferrier's Code Snippets</title>
    <link>http://snippets.dzone.com/posts</link>
    <pubDate>Sun, 27 Jul 2008 08:57:46 GMT</pubDate>
    <description>DZone Snippets: Nicferrier's Code Snippets</description>
    <item>
      <title>process email files like unix find</title>
      <link>http://snippets.dzone.com/posts/show/5248</link>
      <description>I call this program whitelist. It lets you run a command on a bunch of files depending on whether the file is an email and has a from address in a whitelist.&lt;br /&gt;&lt;br /&gt;It's useful for maintaining whitelisted mailboxes and analysing mailboxes. With a few more tests it might be a generically useful tool.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;#!/usr/bin/python&lt;br /&gt;# Copyright (C) 2008 by Tapsell-Ferrier Limited&lt;br /&gt;&lt;br /&gt;# This program is free software; you can redistribute it and/or modify&lt;br /&gt;# it under the terms of the GNU General Public License as published by&lt;br /&gt;# the Free Software Foundation; either version 2, or (at your option)&lt;br /&gt;# any later version.&lt;br /&gt;&lt;br /&gt;# This program is distributed in the hope that it will be useful,&lt;br /&gt;# but WITHOUT ANY WARRANTY; without even the implied warranty of&lt;br /&gt;# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the&lt;br /&gt;# GNU General Public License for more details.&lt;br /&gt;&lt;br /&gt;# You should have received a copy of the GNU General Public License&lt;br /&gt;# along with this program; see the file COPYING.  
If not, write to the&lt;br /&gt;# Free Software Foundation, Inc.,   51 Franklin Street, Fifth Floor,&lt;br /&gt;# Boston, MA  02110-1301  USA&lt;br /&gt;&lt;br /&gt;import commands&lt;br /&gt;import email.Parser&lt;br /&gt;import sys&lt;br /&gt;import re&lt;br /&gt;import getopt&lt;br /&gt;import os&lt;br /&gt;import os.path&lt;br /&gt;&lt;br /&gt;try:&lt;br /&gt;    from email.utils import parseaddr&lt;br /&gt;except ImportError:&lt;br /&gt;    from rfc822 import parseaddr&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;def help():&lt;br /&gt;    print """whitelist.py -h&lt;br /&gt;whitelist.py [-v] [-f whitelist filename] command ; filelist [-]&lt;br /&gt;&lt;br /&gt;Execute the specified command (which must be shell escaped if calling&lt;br /&gt;from shell) on all the files in the filelist or, if - is present in&lt;br /&gt;the filelist, read from stdin (like xargs) whenever the file is an&lt;br /&gt;email that contains a from address specified in the whitelist.&lt;br /&gt;&lt;br /&gt;Like xargs, or find, the command can include {} as a replacement token&lt;br /&gt;for the matched filename.&lt;br /&gt;&lt;br /&gt;The command can also be a header reference, for example:&lt;br /&gt;&lt;br /&gt;  $FROM&lt;br /&gt;&lt;br /&gt;will print the specified mail's From address.&lt;br /&gt;&lt;br /&gt;Options:&lt;br /&gt;&lt;br /&gt; -v   specifies that the test is to be negated, executing the action if&lt;br /&gt;      the file does NOT contain a from address in the whitelist.
&lt;br /&gt;&lt;br /&gt; -f   specifies a whitelist, the default is $HOME/.addresses&lt;br /&gt;&lt;br /&gt;For example:&lt;br /&gt;&lt;br /&gt; whitelist.py -f .wlist wc \{} \: maildir/cur/*&lt;br /&gt;&lt;br /&gt;runs wc on each file in maildir/cur with a FROM address matching&lt;br /&gt;something in the whitelist; or:&lt;br /&gt;&lt;br /&gt; find maildir/INBOX/cur -type f | whitelist.py -v mv \{} mailbox/TRASH/cur \; -&lt;br /&gt;&lt;br /&gt;mv's all files in the INBOX with FROMs not matching the whitelist into&lt;br /&gt;a TRASH folder.&lt;br /&gt;&lt;br /&gt;  find maildir/Greylist/new -type f | whitelist.py -v $TO \; -&lt;br /&gt;&lt;br /&gt;displays the TO address of all messages where the from didn't match&lt;br /&gt;the whitelist.&lt;br /&gt;"""&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;def read_whitelisted(filename):&lt;br /&gt;    fd = open(filename)&lt;br /&gt;    data = fd.read()&lt;br /&gt;    fd.close()&lt;br /&gt;    return data.split()&lt;br /&gt;&lt;br /&gt;def get_msg(filename):&lt;br /&gt;    fd = open(filename)&lt;br /&gt;    try:&lt;br /&gt;        msg = email.Parser.HeaderParser().parse(fd, True)&lt;br /&gt;        return msg&lt;br /&gt;    finally:&lt;br /&gt;        fd.close()&lt;br /&gt;&lt;br /&gt;action_re = re.compile("\{}")&lt;br /&gt;&lt;br /&gt;def handle(filenames_fn, action, whitelist, negate=False):&lt;br /&gt;    for filename in filenames_fn():&lt;br /&gt;        msg = get_msg(filename)&lt;br /&gt;        realname, addr = parseaddr(msg["from"])&lt;br /&gt;        result = addr in whitelist&lt;br /&gt;&lt;br /&gt;        if negate:&lt;br /&gt;            result = not result&lt;br /&gt;&lt;br /&gt;        if result:&lt;br /&gt;            try:&lt;br /&gt;                m = re.match("\$(.+)", action)&lt;br /&gt;                result = msg[m.group(1)]&lt;br /&gt;            except Exception:&lt;br /&gt;                cmd_str = action_re.sub(filename, action)&lt;br /&gt;                
os.system(cmd_str)&lt;br /&gt;            else:&lt;br /&gt;                print result&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;def main(args):&lt;br /&gt;    negate = False&lt;br /&gt;    whitelist_filename = os.path.join(os.environ["HOME"], ".addresses")&lt;br /&gt;    opts, args = getopt.getopt(args, "hvf:")&lt;br /&gt;    for o,a in opts:&lt;br /&gt;        if o == "-h":&lt;br /&gt;            help()&lt;br /&gt;            sys.exit(0)&lt;br /&gt;&lt;br /&gt;        elif o == "-v":&lt;br /&gt;            negate = True&lt;br /&gt;&lt;br /&gt;        elif o == "-f":&lt;br /&gt;            whitelist_filename = a&lt;br /&gt;&lt;br /&gt;    if not os.access(whitelist_filename, os.F_OK):&lt;br /&gt;        print &gt;&gt;sys.stderr, "whitelist.py   -  no whitelist filename\n"&lt;br /&gt;        help()&lt;br /&gt;        sys.exit(1)&lt;br /&gt;&lt;br /&gt;    cmdstr = " ".join(args)&lt;br /&gt;    m = re.match("(.*) ;([ ]*.*)", cmdstr)&lt;br /&gt;    if not m:&lt;br /&gt;        sys.exit(1)&lt;br /&gt;&lt;br /&gt;    cmd = m.group(1)&lt;br /&gt;    files = m.group(2).strip().split(" ")&lt;br /&gt;&lt;br /&gt;    def ffn():&lt;br /&gt;        for f in files:&lt;br /&gt;            if f == "-":&lt;br /&gt;                for innerf in sys.stdin:&lt;br /&gt;                    yield innerf.strip()&lt;br /&gt;            else:&lt;br /&gt;                yield f&lt;br /&gt;        return&lt;br /&gt;&lt;br /&gt;    whitelist = read_whitelisted(whitelist_filename)&lt;br /&gt;    handle(ffn, cmd, whitelist, negate)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;if __name__ == "__main__":&lt;br /&gt;    main(sys.argv[1:])&lt;br /&gt;&lt;br /&gt;# End&lt;br /&gt;&lt;/code&gt;</description>
      <pubDate>Tue, 18 Mar 2008 16:54:19 GMT</pubDate>
      <guid>http://snippets.dzone.com/posts/show/5248</guid>
      <author>nicferrier (Nic Ferrier)</author>
    </item>
  </channel>
</rss>
