Building Python Command Line Tools, Part 4: CSV Importing and Time Zones

written by Calvin Hendryx-Parker on Monday March 16, 2015

Over the past few months here at Six Feet Up, I have noticed an increasing need to demonstrate the usefulness and efficiency of Python command line tools. Command line tools can be very handy, and making your own using Python is easier than you might think! This article is part four of our series on building a command line app. It shows you how to import CSV data with Python's csv module and how to handle time zones during the import. Did you miss part three on bootstrapping the Pyramid app? Read it here.

Let's read in some CSV and do something useful with it

Up to now we have laid the groundwork for a command line utility for our Pyramid application. Now let's do some real work and leverage another great part of the standard library. Python makes this easy, since it comes with "batteries included" for handling CSV files. The PyMOTW page on the csv module is a great place to get familiar with it, but here we want to put it to use in our Pyramid app.
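As a quick illustration of what the standard library's csv module gives us, here is a minimal Python 3 sketch that parses one schedule-style row with the excel dialect. The column layout (title, room, start, end, speakers) mirrors the fields used later in this post, but the sample values and date format are made up for illustration. Note how the quoted speakers field keeps its embedded comma instead of being split into an extra column.

```python
import csv
import io

# Hypothetical sample data: one row with the same five columns used later
# in this post (title, room, start, end, speakers).
SAMPLE = ('"Intro to Pyramid",Room 1,2015-04-10 09:00,2015-04-10 10:00,'
          '"Alice Smith, Bob Jones"\r\n')


def parse_rows(text):
    """Return every row of `text` as a list of strings, using the excel dialect."""
    # io.StringIO lets csv.reader treat an in-memory string like a file.
    reader = csv.reader(io.StringIO(text, newline=''), dialect='excel')
    return list(reader)


rows = parse_rows(SAMPLE)
print(rows[0])
# → ['Intro to Pyramid', 'Room 1', '2015-04-10 09:00', '2015-04-10 10:00', 'Alice Smith, Bob Jones']
```

The quoting rules of the excel dialect are what make it safe to keep a comma-separated list of speakers inside a single column.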

Excel to Python - CSV formatting

Character encoding can be tricky for CSV files, which could contain almost any data. In our case, some of the speakers' names contain accented characters, so we made sure the file was encoded as UTF-8 and told the CSV reader to use the excel dialect. In the sample code below, the UnicodeReader class wraps a csv.reader object and takes care of the decoding: it feeds the reader UTF-8 encoded lines and hands back each field as a unicode string.

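import codecs
import csv

# Python 2 code: the Python 2 csv module only reads byte strings, so we
# decode the file ourselves, re-encode each line as UTF-8 for csv.reader,
# and decode each parsed field back to unicode.

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        # Wrap the raw byte stream in a decoder for the source encoding
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        # Each decoded line goes back out as UTF-8 bytes for csv.reader
        return self.reader.next().encode("utf-8")


class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        # Decode every field from UTF-8 bytes into a unicode object
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self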

Now we can use this to handle the file.
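The UnicodeReader wrapper is Python 2 code, needed because the Python 2 csv module only handles byte strings. If you are following along on Python 3, the csv module reads decoded text directly, so the wrapper is no longer necessary. Here is a minimal Python 3 sketch of the same idea; read_unicode_csv is a hypothetical helper name, not part of the original tool.

```python
import csv


def read_unicode_csv(path, encoding='utf-8', dialect='excel'):
    """Yield each row of the CSV file at `path` as a list of str.

    Decoding happens in open(), so csv.reader already receives text.
    newline='' is what the csv module documentation recommends, so that
    quoted fields containing line breaks are parsed correctly.
    """
    with open(path, encoding=encoding, newline='') as f:
        for row in csv.reader(f, dialect=dialect):
            yield row
```

Each row comes back as a list of str, so unpacking it into title, room, start, end and speakers works the same way as with UnicodeReader.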

Reading CSV into Python

With that in place, we can process our CSV file and save the data into the app's database:

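from datetime import datetime

import transaction

# csv_uri, tz, DATETIME_FMT, sessions, make_unique_name, Session and closer
# are defined elsewhere in the script.
try:
    # 'rb': the Python 2 csv module expects files opened in binary mode
    with open(csv_uri, 'rb') as f:
        reader = UnicodeReader(f)
        for row in reader:
            title, room, start, end, speakers = row
            # Attach the local time zone, then normalize to naive UTC
            startaware = tz.localize(datetime.strptime(start, DATETIME_FMT))
            start = universify_datetime(startaware)
            endaware = tz.localize(datetime.strptime(end, DATETIME_FMT))
            end = universify_datetime(endaware)
            # The speakers column holds a comma-separated list of names
            speakers = [speaker.strip() for speaker
                        in speakers.split(',')]
            # Derive a unique id from the title and store the new Session
            sessionid = make_unique_name(sessions, title.strip())
            session = Session(title.strip(), start, end, room.strip(),
                              speakers, sessionid)
            sessions[session.id] = session
            session.__parent__ = sessions
            print session.id, row
    # Persist all of the new sessions in one transaction
    transaction.commit()
finally:
    # Clean up the environment created when bootstrapping the Pyramid app
    closer()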

Cool Python Trick: Python 2.7/3 context manager

We are using a context manager to simplify the code a bit here: the with statement opens the file and closes it for us when the block exits, even if an exception is raised inside it.
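To see what the with statement buys us, here is a small Python 3 sketch (the temporary file and the simulated error are made up for illustration) showing that the file is closed even when an exception escapes the block. Without the with statement, a hand-written try/finally around f.close() would have to guarantee this.

```python
import os
import tempfile


def file_closed_after_error(path):
    """Open `path` in a with block, raise inside it, and report f.closed."""
    # Roughly equivalent hand-written version:
    #     f = open(path, 'rb')
    #     try:
    #         ...
    #     finally:
    #         f.close()
    f = None
    try:
        with open(path, 'rb') as f:
            # Simulate a failure partway through processing, e.g. a bad row
            raise ValueError('bad row')
    except ValueError:
        pass
    # The with statement already called f.close() while unwinding
    return f.closed


# Create a throwaway file to open, then clean it up
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    print(file_closed_after_error(path))  # → True
finally:
    os.remove(path)
```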

Mapping the data from the CSV to our Objects

We loop over each line of the CSV file and unpack it into the title, room, start, end and speakers fields. The next lines clean up the data and make sure the times are stored in a sane way, using a function called universify_datetime() that I explain below. Then we use our reference to the sessions object in the database to store a new Session instance under a unique id generated by make_unique_name().
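To make the cleanup steps concrete outside of the Pyramid app, here is a standalone Python 3 sketch of the per-row mapping. The DATETIME_FMT value and the fixed UTC-5 offset are assumptions for illustration only (the real script defines its own format and uses a pytz time zone via tz.localize()), and row_to_fields is a hypothetical helper, not part of the original tool. With a fixed-offset timezone from the standard library, attaching the zone with replace() is safe; with pytz zones you should use localize(), as the loop above does.

```python
from datetime import datetime, timedelta, timezone

# Assumed input format and local zone, for illustration only
DATETIME_FMT = '%Y-%m-%d %H:%M'
LOCAL_TZ = timezone(timedelta(hours=-5))  # fixed offset, no DST handling


def to_naive_utc(text):
    """Parse a local timestamp string and return a naive UTC datetime."""
    local = datetime.strptime(text, DATETIME_FMT).replace(tzinfo=LOCAL_TZ)
    return local.astimezone(timezone.utc).replace(tzinfo=None)


def row_to_fields(row):
    """Map one CSV row to cleaned-up session fields."""
    title, room, start, end, speakers = row
    return {
        'title': title.strip(),
        'room': room.strip(),
        'start': to_naive_utc(start),
        'end': to_naive_utc(end),
        # The speakers column holds a comma-separated list of names
        'speakers': [speaker.strip() for speaker in speakers.split(',')],
    }


fields = row_to_fields([' Intro to Pyramid ', 'Room 1', '2015-04-10 09:00',
                        '2015-04-10 10:00', 'Alice Smith, Bob Jones'])
print(fields['start'], fields['speakers'])
# → 2015-04-10 14:00:00 ['Alice Smith', 'Bob Jones']
```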

Timezone Snafus

This post isn't really about timezones, but since they are so important, I want to point out that we convert our data to UTC and store it with no timezone information attached. This is the only sane way to build an application that supports multiple timezones. See Armin Ronacher's blog post on the subject. Even if you don't read it, you still need to follow this one rule:

So here the rule of thumb which never shall be broken:

Always measure and store time in UTC. If you need to record where the time was taken, store that separately. Do not store the local time + timezone information!

Here's how to go about following this rule:

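import pytz

def universify_datetime(dt):
    """Convert a timezone-aware `datetime` to UTC and return it as a
    naive `datetime` (tzinfo stripped), ready for storage.
    """
    # dt must be timezone-aware (e.g. produced by tz.localize());
    # astimezone() raises ValueError on a naive datetime in Python 2.
    utc_dt = dt.astimezone(pytz.utc)
    # Drop the tzinfo: by convention, every stored datetime is UTC
    return utc_dt.replace(tzinfo=None)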
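The rule above also says to record where the time was taken separately, and only rebuild local time when it is needed. Here is a minimal Python 3 sketch of that round trip using only the standard library's datetime.timezone. Storing a fixed offset in minutes is an assumption made so the example runs without pytz; a real application would more likely store a zone name and convert with that zone. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def universify(dt):
    """Convert an aware datetime to UTC and drop tzinfo (stdlib version)."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # Naive input is ambiguous; refuse instead of guessing a zone
        raise ValueError('expected a timezone-aware datetime')
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


def localize_for_display(naive_utc, offset_minutes):
    """Rebuild an aware local datetime from stored UTC plus a stored offset."""
    local_tz = timezone(timedelta(minutes=offset_minutes))
    return naive_utc.replace(tzinfo=timezone.utc).astimezone(local_tz)


# Store the UTC value in one field and the local offset in another
local_start = datetime(2015, 3, 16, 9, 30, tzinfo=timezone(timedelta(hours=-4)))
stored_utc = universify(local_start)
stored_offset = -240

print(stored_utc)  # → 2015-03-16 13:30:00
print(localize_for_display(stored_utc, stored_offset))  # → 2015-03-16 09:30:00-04:00
```

Rejecting naive input in universify() keeps a local time from silently being stored as if it were UTC.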

Enjoy!

Was this series useful? Do you have a topic you'd like to see us write about? Let us know in the comments! Be sure to stay tuned for more Python posts and sign up for our Python How-To digests to receive more how-to guides as soon as they are published!
