11/26/2013

A little side project for big results

Python, Jekyll

I was recently given the task of creating 800 static HTML files based on data from a CSV file. Every page was based on the same template, but specific elements needed to be swapped out according to the data in the file. Producing all of those HTML files by hand would be incredibly time-consuming and boring, so I needed to create a way to automate the process.

Jekyll

Creating static HTML based on a template is a job for Jekyll: http://jekyllrb.com/

Jekyll is a Ruby gem designed to produce static HTML websites without requiring a database. It gives you the full power of a template system that’s normally only available in database-driven frameworks, but it can export everything to static HTML when you’re done.

This was my first time playing with Jekyll, and I was initially very excited that I had found a tool that could do exactly what I needed. But as I dug in, I found some limitations. First, I needed a plugin for creating custom data models instead of just blog pages. Jekyll Models (https://github.com/krazykylep/Jekyll-Models) was easy to add and lets you define custom models in Jekyll.

Once I was familiar with how the Jekyll Models test project works, I realized I still needed a way to get the data out of the CSV and into individual files. Jekyll Models expects a separate YAML file for each page, so I needed to turn each row of my CSV into its own file, with one line per column. The conversion is very simple: each line starts with the column heading, followed by a colon, followed by the cell data.
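To make the target format concrete, here is a small sketch of what a single row should turn into. The column headings and cell values are made up for illustration; they are not from the real data file.

```python
# Hypothetical headings and one hypothetical row, standing in for the real CSV.
headings = ["title", "city", "phone"]
row = ["Main Street Bakery", "Portland", "555-0100"]

# One "heading: value" line per column, each ending in a linebreak.
yaml_text = "".join(
    heading + ": " + cell + "\n" for heading, cell in zip(headings, row)
)

print(yaml_text)
```

Each row of the CSV would end up as one small file like this, with the headings as keys.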

Python

I recently joined a PyLadies meetup group where we have been working our way through Learn Python the Hard Way (http://learnpythonthehardway.org/). Converting CSV to YAML reminded me of one of our first lessons, which was all about reading and writing files, manipulating strings, and working with libraries. The Python script I wrote uses all of those concepts: it imports Python's csv library, reads the data file, opens a new file for each row, and reformats the text into the format I wanted. The comments explain it all:

# Takes a CSV file called "data.csv" and outputs each row as a numbered YAML file.
# Data in the first row of the CSV is assumed to be the column heading.

# Import the python library for parsing CSV files.
import csv

# Open our data file in read-mode.
csvfile = open('data.csv', 'r')

# Save a CSV Reader object.
datareader = csv.reader(csvfile, delimiter=',', quotechar='"')

# Empty array for data headings, which we will fill with the first row from our CSV.
data_headings = []

# Loop through each row…
for row_index, row in enumerate(datareader):

    # If this is the first row, populate our data_headings variable.
    if row_index == 0:
        data_headings = row

    # Otherwise, create a YAML file from the data in this row…
    else:
        # Open a new file with filename based on index number of our current row.
        filename = str(row_index) + '.yml'
        new_yaml = open(filename, 'w')

        # Empty string that we will fill with YAML formatted text based on data extracted from our CSV.
        yaml_text = ""

        # Loop through each cell in this row…
        for cell_index, cell in enumerate(row):

            # Compile a line of YAML text from our headings list and the text of the current cell, followed by a linebreak.
            # Heading text is converted to lowercase. Spaces are converted to underscores and hyphens are removed.
            # In the cell text, line endings are replaced with commas.
            cell_heading = data_headings[cell_index].lower().replace(" ", "_").replace("-", "")
            cell_text = cell_heading + ": " + cell.replace("\n", ", ") + "\n"

            # Add this line of text to the current YAML string.
            yaml_text += cell_text

        # Write our YAML string to the new text file and close it.
        new_yaml.write(yaml_text)
        new_yaml.close()

# We're done! Close the CSV file.
csvfile.close()
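One thing the script above does not handle is a cell containing characters that mean something in YAML, such as a colon followed by a space or a quotation mark. Here is a minimal sketch of one possible way around that, assuming it is acceptable to write every value as a double-quoted string. It leans on json.dumps, which produces a double-quoted, escaped string of the kind YAML parsers generally accept. The helper names and sample data are hypothetical, and this variant is untested against Jekyll Models.

```python
import json


def normalize_heading(heading):
    # Same rule as the script: lowercase, spaces to underscores, hyphens removed.
    return heading.lower().replace(" ", "_").replace("-", "")


def row_to_yaml(headings, row):
    # Hypothetical helper: build one "key: value" line per column.
    lines = []
    for heading, cell in zip(headings, row):
        # Line endings inside a cell become commas, as in the script.
        value = cell.replace("\n", ", ")
        # json.dumps wraps the value in double quotes and escapes
        # any quotes or backslashes inside it.
        quoted = json.dumps(value, ensure_ascii=False)
        lines.append(normalize_heading(heading) + ": " + quoted)
    return "\n".join(lines) + "\n"


# Made-up sample data for illustration.
headings = ["Store Name", "Opening-Hours"]
row = ['Joe\'s "Best" Bagels', "Mon: 9-5\nTue: 9-5"]
print(row_to_yaml(headings, row))
```

The quoted values cost a little readability in the YAML files, but a stray colon in the data should no longer be mistaken for a key separator.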

Find my CSV to YAML converter on GitHub: https://github.com/hfionte/csv_to_yaml
