Building Python Command Line Tools, Part 4: CSV Importing and Time Zones

March 13, 2015

Over the past few months here at Six Feet Up, I have noticed an increasing need to demonstrate the usefulness and efficiency of Python command line tools. Command line tools can be very handy, and making your own with Python is easier than you might think! This article is part four of our series on building a command line app. It shows how to read a CSV file with Python's csv module and how to handle time zones after the import. Did you miss part three on bootstrapping the Pyramid app? Read it here.

Let's read in some CSV and do something useful with it

Up to now, we have laid the groundwork for a command line utility for our Pyramid application. Now let's do some real work and leverage another great part of the standard library. Python makes this easy because it comes with "batteries included" for handling CSV files. The PyMOTW page on the csv module is a great place to get familiar with it, but we want to put it to use in our Pyramid app.

Excel to Python - CSV formatting

Character encoding can be tricky for CSV files, which can contain almost any data. In our case, some of the speakers' names have accents in them, so we made sure that the file was encoded as UTF-8 and that the CSV reader was told to use the excel dialect. In the sample code below (written for Python 2), the UnicodeReader class is a wrapper around csv.reader that decodes the file and returns every row as a list of unicode strings.

import codecs
import csv

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")


class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

With this wrapper in place, we can read the file.
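
The wrapper above targets Python 2, where csv.reader works on byte strings. On Python 3 the csv module reads text directly, so a rough equivalent only needs to open the file with the right encoding. This is a minimal sketch and not part of the original app; read_unicode_rows is a hypothetical name.

```python
import csv


def read_unicode_rows(path, encoding="utf-8"):
    """Yield each CSV row as a list of str (Python 3 only).

    `read_unicode_rows` is a hypothetical helper, not part of the original app.
    """
    # newline="" lets the csv module handle line endings itself, as the
    # csv documentation recommends for files passed to csv.reader.
    with open(path, encoding=encoding, newline="") as f:
        for row in csv.reader(f, dialect=csv.excel):
            yield row
```

Because the file is decoded when it is opened, there is no need for a separate recoder class in this version.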

Reading CSV into Python

We can now process our CSV file and save the data into the app's database:

try:
    with open(csv_uri, 'rb') as f:
        reader = UnicodeReader(f)
        for row in reader:
            title, room, start, end, speakers = row
            startaware = tz.localize(datetime.strptime(start, DATETIME_FMT))
            start = universify_datetime(startaware)
            endaware = tz.localize(datetime.strptime(end, DATETIME_FMT))
            end = universify_datetime(endaware)
            speakers = [speaker.strip() for speaker
                        in speakers.split(',')]
            sessionid = make_unique_name(sessions, title.strip())
            session = Session(title.strip(), start, end, room.strip(),
                              speakers, sessionid)
            sessions[session.id] = session
            session.__parent__ = sessions
            print session.id, row
    transaction.commit()
finally:
    closer()

Cool Python Trick: Python 2.7/3 context manager

We are using a context manager to simplify the code a bit here. The with statement, available by default since Python 2.6, opens the file and then closes it for us when the block exits, even if an exception is raised inside it.
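
To see the closing behavior for yourself, here is a small self-contained sketch. The closed_states function is a hypothetical name used only for this illustration; it checks the file's closed attribute inside and after the with block.

```python
import os
import tempfile


def closed_states():
    """Return (closed inside the block, closed after the block)."""
    # Create a throwaway file so the example does not depend on real data.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "rb") as f:
            inside = f.closed   # the file is still open here
        after = f.closed        # the with block closed it on exit
    finally:
        os.remove(path)
    return inside, after


print(closed_states())  # (False, True)
```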

Mapping the data from the CSV to our Objects

We loop over each line of the CSV file and unpack the title, room, start, end and speakers. The next lines clean up the data and make sure the times are stored in a consistent way using a function called universify_datetime(), which I explain below. Then we use our reference to the sessions object in the database to store a new instance of Session.
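
The unpacking and cleanup steps can be tried on their own, without the database. The sketch below is an illustration only: parse_row is a hypothetical helper, and the DATETIME_FMT value is an assumption, since the real format string is defined elsewhere in the app.

```python
from datetime import datetime

# Assumed format; the app's real DATETIME_FMT is not shown in this post.
DATETIME_FMT = "%Y-%m-%d %H:%M"


def parse_row(row):
    """Split one CSV row into cleaned fields (hypothetical helper)."""
    title, room, start, end, speakers = row
    return {
        "title": title.strip(),
        "room": room.strip(),
        # Naive datetimes; time zone handling is a separate step.
        "start": datetime.strptime(start, DATETIME_FMT),
        "end": datetime.strptime(end, DATETIME_FMT),
        # One cell holds all speakers, separated by commas.
        "speakers": [s.strip() for s in speakers.split(",")],
    }


sample = [" Keynote ", "Main Hall ", "2015-03-13 09:00",
          "2015-03-13 10:00", "José Núñez, Ann Lee"]
session = parse_row(sample)
print(session["speakers"])  # ['José Núñez', 'Ann Lee']
```

Unpacking into five names also acts as a cheap sanity check: a row with too few or too many columns raises a ValueError instead of being imported silently.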

Timezone Snafus

This post isn't really about time zones, but since they are so important, I want to note that we convert our data to UTC and store it without any time zone information attached. This is the only sane way to have an application that can support multiple time zones. See Armin's blog post on this subject. If you don't read it, you will still need to follow this one rule:

So here the rule of thumb which never shall be broken:

Always measure and store time in UTC. If you need to record where the time was taken, store that separately. Do not store the local time + timezone information!

Here's how to go about following this rule:

import pytz

def universify_datetime(dt):
    """Convert a tz-aware `datetime` to UTC and return it as a tz-naive object
    """
    utc = pytz.timezone('UTC')
    utc_dt = dt.astimezone(utc)
    utc_dt = utc_dt.replace(tzinfo=None)
    return utc_dt
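
If pytz is not available, the same conversion can be sketched with the standard library on Python 3. This is an assumption-laden illustration, not the app's code: to_naive_utc is a hypothetical name, and a fixed UTC-4 offset stands in for a real time zone.

```python
from datetime import datetime, timedelta, timezone


def to_naive_utc(dt):
    """Convert an aware datetime to UTC and drop the tzinfo."""
    if dt.tzinfo is None:
        # A naive value has no zone to convert from, so refuse to guess.
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


# A fixed UTC-4 offset used as a stand-in for a real local zone.
local_zone = timezone(timedelta(hours=-4))
local = datetime(2015, 3, 13, 9, 0, tzinfo=local_zone)
print(to_naive_utc(local))  # 2015-03-13 13:00:00
```

A fixed offset does not know about daylight saving changes, so real code should still use a proper time zone database such as the one pytz provides.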

RELATED POSTS:

  • Part 1: Console Scripts
    • The first post of this series focuses on how to get started with building a command line app.
  • Part 2: Bootstrapping Pyramid
    • In this second part of our series on Python Command Line Tools, we show you how to make the argparse script executable and ready to be installed as a console script when someone installs your Pyramid or Django app.
  • Part 3: Bootstrapping Pyramid
    • In the third post in this series, we put some meat into our script that will actually interact with our Pyramid application. We show bootstrapping the app so we have the full environment ready to use with the database.

Enjoy!

Was this series useful? Do you have a topic you'd like to see us write about? Let us know in the comments! Be sure to stay tuned for more Python posts and sign up for our Python How-To digests to receive more how-to guides as soon as they are published!
