With local councils in the UK being required to release local spending data, and advised to do so in a common way (common columns, CSV format etc) I thought it might be interesting to start trying to aggregate the data in a Google Fusion Table.

Uploading new CSV docs to the same table is just a matter of mapping like columns onto like columns.

Unfortunately, many of the councils don't include a column for the name of the council (in any single council's case, all the cells in this column would be the same), which means the CSV doc needs preparing before it can be uploaded to Google Fusion tables if we want to be able to identify items by council.

Does anyone know a simple way of adding a column to a csv/txt file with the same default cell entry on each row (e.g. the name of the council that created a particular CSV file)? A command line tool would be great (OS/X, Linux), though solutions for Windows platform might also be handy:-)

Examples problem cases are:

  • given a file councilData.csv, how do I add a leading column (column 1) where all the entries are "Isle of Wight Council"
  • given a file councilData.csv, how do I add a leading column (col 1) where the first (title) row entry in that column has the value "Council Name" and all the other entries are "Isle of Wight Council"
  • is there a way of introducing the column elsewhere in the data, eg as column 2 (where the first column is column 1), or as the final column?

asked 29 Jan '11, 12:57

psychemedia's gravatar image

psychemedia ♦♦
1.1k323858
accept rate: 11%

edited 29 Jan '11, 13:23


sed "s/^/council,/" test.csv where test.csv is your comma separated file, and council is your council name. This sticks at the front of each line. sed "s/$/,council/" test.csv will stick it at the end of each line. If you want it somewhere in between it can be done, but it may be better to do it with awk. This would also allow you to grab the council name from the filename if it is in there, or from another file if necessary.

Of course, those rely on having sed or awk. I tested those sed versions on my Win7 box, but I suspect I had installed the unix-like applications from somewhere a long time ago!

Quick update to allow for header line:

sed "2,$s/^/council,/;1s/^/rbc,/" test.csv

the 2,$s says make those changes for all lines after the first one

the 1s says make that change only on the first line

link

answered 29 Jan '11, 13:19

PatParslow's gravatar image

PatParslow
8614
accept rate: 33%

edited 29 Jan '11, 13:34

So the first example just adds: council, to each row? If I wanted to add: Isle of Wight Council, I'd presumably use: sed "s/^/Isle of Wight Council,/" test.csv but how about if I wanted to add: "Isle of WIght Council" (I.e. with the quotes?)

(29 Jan '11, 13:25) psychemedia ♦♦

sed "2,$s/^/council,/;1s/^/"Isle of wight",/" test.csv

The use of the " allows the " to be used ('escapes' the character)

(29 Jan '11, 13:33) PatParslow

If I had a directory containing multiple .csv files, e.g. in directory ./thisCouncil, or ./thatCouncil, how would I run the same command over all the csv files in one of those directories?

(29 Jan '11, 13:34) psychemedia ♦♦

A different way :-) Personally, I would use ruby. Could do it in awk (awkwardly) or you could use this sed version in combination with a shell script to control which file is being processed.

(29 Jan '11, 13:37) PatParslow
1

OK let me try that again, taking account of various ways this site might sanitize the comment :-)

sed "2,$s/^/council,/;1s/^/\"Isle of wight\",/" test*.csv

(29 Jan '11, 14:23) PatParslow

The multiple file version should be doable with something like this as a bash script:

for f in *.csv ; do sed "2,$s/^/council,/;1s/^/"Isle of wight",/" "$f" > "${f%}_edited.csv" ; done

Can't think, offhand, how to extract council name from parent directory name, though if all your council folders were subdirectories in the same place, a nested loop could do it - can't remember off hand how to get the value of the variable from the directory-iterating loop into the string used to tell sed what to do though (and don't have a unixy box to hand to try it out!)

(29 Jan '11, 14:42) PatParslow
showing 5 of 6 show all

maybe more windows friendly...if you have notepad++ installed you can do an extended search for things like line breaks (nr or the opposite, you can view all characters and do a search to figure out which), then just replace the line breaks. You'd still have to do line 1 though. I do this all the time and find it more convenient since I use notead++ as my primary editor...it can also do regex, but I spend more time on the syntax :)

Example: when viewing your file with all characters on it will look like this: col1,col2,col3CRLF

with extended search on do a find and replace Find: nr Replace with: nrIsle of Wight, Results in: col1,col2,col3CRLF Isle of Wight,col1,col2,col3CRLF

link

answered 29 Jan '11, 13:36

ntderosu's gravatar image

ntderosu
1
accept rate: 0%

obviously not as scalable if doing batch processing...but I generally do this 2-3x per week

(29 Jan '11, 13:37) ntderosu

Can do it easily in TextPad on windows too...

(29 Jan '11, 13:38) PatParslow

Just for reference, my quick and dirty, no-error-checking approach to doing this in Ruby would be something like:

 Dir.glob("*/*.csv").each {|f|
     data=File.open(f).readlines
     m=f.match /(.*)\\/(.*)\\.csv/
     council=m[1]
     ofile=m[1] + "/" + m[2] + "_edit.csv"
     g=File.open(ofile,"w")
     lineno=0
     data.each {|r|
      if lineno==0
       g.write("Council, #{r}")
       lineno += 1
      else
       g.write("\\"#{council}\\",#{r}")
      end
     }
     g.close
    }

Which could be done more succinctly, but I think it handles the multi-file thing quite nicely. Name the folders with council files with the name of the council. Obviously it would be better if it took command line arguments and allowed for the insertion in a different column etc etc, but it is a starting place (oh, and it used Unix style file separators, not Windows ones - I should use File.join or some function like that but I can never remember it!)

link

answered 29 Jan '11, 15:07

PatParslow's gravatar image

PatParslow
8614
accept rate: 33%

edited 29 Jan '11, 15:07

In python it's like this:


import csv
fileobj = open('path-to-file')
outobj = open('outfilepath', 'w')
for count,row in enumerate(csv.reader(fileobj)):
     # row is a list [col1, col2, ...]
     if count == 0: # first row is headings
          newval = 'Name of Council'
     else:
          newval = 'nameofcouncil'
     row.append(newval)
     writer.write_row(row)

link

answered 29 Jan '11, 17:14

rgrp's gravatar image

rgrp ♦♦
501122027
accept rate: 14%

In Google Refine, start a new project with your council csv file. Once open, do the following:

  1. Go to the 'undo/redo' tab
  2. Click the 'apply' button
  3. Paste the following code in there (and replace "isle of wight council" with whatever council's name you're dealing with, and "x" with whatever your current first column is called)

    [{ "op":"core/column-addition", "description":"Create column council at index 0 based on column data.gov.uk uri using expression grel:"isle of wight council"", "engineConfig":{ "facets":[], "mode":"record-based" }, "newColumnName":"council", "columnInsertIndex":0, "baseColumnName":"x", "expression":"grel:"isle of wight council"", "onError":"set-to-blank" } ]

  4. Click "perform operations"

  5. Done
link

answered 29 Jan '11, 17:44

Wilbert's gravatar image

Wilbert
463
accept rate: 0%

edited 29 Jan '11, 18:05

This appears to be code for the creation of a new column based on another one (in this case, "data.gov.uk uri". I don't want to base the column on an old one (and don't know what the side effects in Google Refine may be of creating a new column: a) based on another one; b) based on one that doesn't exist (e.g. my spreadsheet doesn't have a "data.gov.uk uri" column?)

Presumably the description field is a comment and to all intents an purposes can be ignored/filled with arbitrary text. But is the baseColumnName argument important? Or can it be ignored/set to '' (or some other null equivalent)?

(29 Jan '11, 18:04) psychemedia ♦♦

You're quite right: hard to do QA in 2 minutes.

Edited the above to warn that you'll have to manually change the first column name. Have wasted several more minutes in finding a way round it, to no avail. Will probably waste many more while I fool myself that I need a solution too ;-)

(29 Jan '11, 18:41) Wilbert

My original draft of the question said: just use Google Refine... then I tried it and couldn't see an obvious way, though wasn't convinced I hadn't missed something obvious!

(29 Jan '11, 19:49) psychemedia ♦♦

Open up the csv file in Jedit or any other text editor with regular expressions.

Select Find/replace, switch 'regular expressions' on

Find: ^

Replace: Isle of Wight Council,

Hit "replace all"

Done

link

answered 29 Jan '11, 22:20

Wilbert's gravatar image

Wilbert
463
accept rate: 0%

edited 29 Jan '11, 22:21

-1

also give Google Refine a look, great tool for data munging in many formats...

link

answered 29 Jan '11, 13:30

dartdog's gravatar image

dartdog
0
accept rate: 0%

Once you have done a transform with Google refine you can re-apply it to any number of files...

(29 Jan '11, 13:52) dartdog

So how do you actually use Google Refine to do this task?

(29 Jan '11, 15:31) psychemedia ♦♦
-1

The Nike Air Classic BW has been popping up Nike Requin en masse it seems and the latest Nike Tn colorway is this one that Nike TN Chaussures makes use of a healthy dose Nike Tn Pas Cher of team orange. The hard to Nike Chaussures deny colorway Tn Pas Cher features a white leather and mesh Nike Tn upper trimmed in black with a hint of grey Tn Requin around its rim while team orange highlights the entire affair. The leather takes these FiveFingers to Vibram 5 Fingers the next level by making them Five Finger Shoes both super durable, Vibram Five Fingers a bit warmer, and a bit classier.? Five Finger Shoes They look great with jeans, too.? 5 Five Shoes While I'm sure you could run or Vibram 5 Fingers hike in the Trek LS, I'm not sure you're going to want to Vibram 5 Fingers as these are just nice and meant more Vibram Shoes for everyday wear though the leather Vibram Five Fingers can assuredly take a beating. Honestly, can totally designer handbags evening show Coach Outlet Store Online your individual style Coach Factory Outlet and originality. If you are an attractive focal point in the game, Coach Outlet Online then you should better choose a Coach Outlet handbag latest collection of exquisite.

link

answered 21 hours ago

ylf0316's gravatar image

ylf0316
-101
accept rate: 0%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Tags:

×4
×1

Asked: 29 Jan '11, 12:57

Seen: 1,304 times

Last updated: 21 hours ago

powered by OSQA