|
With local councils in the UK being required to release local spending data, and advised to do so in a common way (common columns, CSV format etc) I thought it might be interesting to start trying to aggregate the data in a Google Fusion Table. Uploading new CSV docs to the same table is just a matter of mapping like columns onto like columns. Unfortunately, many of the councils don't include a column for the name of the council (in any single council's case, all the cells in this column would be the same), which means the CSV doc needs preparing before it can be uploaded to Google Fusion tables if we want to be able to identify items by council. Does anyone know a simple way of adding a column to a csv/txt file with the same default cell entry on each row (e.g. the name of the council that created a particular CSV file)? A command line tool would be great (OS/X, Linux), though solutions for Windows platform might also be handy:-) Examples problem cases are:
|
|
sed "s/^/council,/" test.csv where test.csv is your comma separated file, and council is your council name. This sticks at the front of each line. sed "s/$/,council/" test.csv will stick it at the end of each line. If you want it somewhere in between it can be done, but it may be better to do it with awk. This would also allow you to grab the council name from the filename if it is in there, or from another file if necessary. Of course, those rely on having sed or awk. I tested those sed versions on my Win7 box, but I suspect I had installed the unix-like applications from somewhere a long time ago! Quick update to allow for header line: sed "2,$s/^/council,/;1s/^/rbc,/" test.csv the 2,$s says make those changes for all lines after the first one the 1s says make that change only on the first line So the first example just adds: council, to each row? If I wanted to add: Isle of Wight Council, I'd presumably use: sed "s/^/Isle of Wight Council,/" test.csv but how about if I wanted to add: "Isle of WIght Council" (I.e. with the quotes?)
(29 Jan '11, 13:25)
psychemedia ♦♦
sed "2,$s/^/council,/;1s/^/"Isle of wight",/" test.csv The use of the " allows the " to be used ('escapes' the character)
(29 Jan '11, 13:33)
PatParslow
If I had a directory containing multiple .csv files, e.g. in directory ./thisCouncil, or ./thatCouncil, how would I run the same command over all the csv files in one of those directories?
(29 Jan '11, 13:34)
psychemedia ♦♦
A different way :-) Personally, I would use ruby. Could do it in awk (awkwardly) or you could use this sed version in combination with a shell script to control which file is being processed.
(29 Jan '11, 13:37)
PatParslow
1
OK let me try that again, taking account of various ways this site might sanitize the comment :-) sed "2,$s/^/council,/;1s/^/\"Isle of wight\",/" test*.csv
(29 Jan '11, 14:23)
PatParslow
The multiple file version should be doable with something like this as a bash script: for f in *.csv ; do sed "2,$s/^/council,/;1s/^/"Isle of wight",/" "$f" > "${f%}_edited.csv" ; done Can't think, offhand, how to extract council name from parent directory name, though if all your council folders were subdirectories in the same place, a nested loop could do it - can't remember off hand how to get the value of the variable from the directory-iterating loop into the string used to tell sed what to do though (and don't have a unixy box to hand to try it out!)
(29 Jan '11, 14:42)
PatParslow
showing 5 of 6
show all
|
|
maybe more windows friendly...if you have notepad++ installed you can do an extended search for things like line breaks (nr or the opposite, you can view all characters and do a search to figure out which), then just replace the line breaks. You'd still have to do line 1 though. I do this all the time and find it more convenient since I use notead++ as my primary editor...it can also do regex, but I spend more time on the syntax :) Example: when viewing your file with all characters on it will look like this: col1,col2,col3CRLF with extended search on do a find and replace Find: nr Replace with: nrIsle of Wight, Results in: col1,col2,col3CRLF Isle of Wight,col1,col2,col3CRLF obviously not as scalable if doing batch processing...but I generally do this 2-3x per week
(29 Jan '11, 13:37)
ntderosu
Can do it easily in TextPad on windows too...
(29 Jan '11, 13:38)
PatParslow
|
|
Just for reference, my quick and dirty, no-error-checking approach to doing this in Ruby would be something like:
Which could be done more succinctly, but I think it handles the multi-file thing quite nicely. Name the folders with council files with the name of the council. Obviously it would be better if it took command line arguments and allowed for the insertion in a different column etc etc, but it is a starting place (oh, and it used Unix style file separators, not Windows ones - I should use File.join or some function like that but I can never remember it!) |
|
In python it's like this:
|
|
In Google Refine, start a new project with your council csv file. Once open, do the following:
This appears to be code for the creation of a new column based on another one (in this case, "data.gov.uk uri". I don't want to base the column on an old one (and don't know what the side effects in Google Refine may be of creating a new column: a) based on another one; b) based on one that doesn't exist (e.g. my spreadsheet doesn't have a "data.gov.uk uri" column?) Presumably the description field is a comment and to all intents an purposes can be ignored/filled with arbitrary text. But is the baseColumnName argument important? Or can it be ignored/set to '' (or some other null equivalent)?
(29 Jan '11, 18:04)
psychemedia ♦♦
You're quite right: hard to do QA in 2 minutes. Edited the above to warn that you'll have to manually change the first column name. Have wasted several more minutes in finding a way round it, to no avail. Will probably waste many more while I fool myself that I need a solution too ;-)
(29 Jan '11, 18:41)
Wilbert
My original draft of the question said: just use Google Refine... then I tried it and couldn't see an obvious way, though wasn't convinced I hadn't missed something obvious!
(29 Jan '11, 19:49)
psychemedia ♦♦
|
|
Open up the csv file in Jedit or any other text editor with regular expressions. Select Find/replace, switch 'regular expressions' on Find: ^ Replace: Isle of Wight Council, Hit "replace all" Done |
|
also give Google Refine a look, great tool for data munging in many formats... Once you have done a transform with Google refine you can re-apply it to any number of files...
(29 Jan '11, 13:52)
dartdog
So how do you actually use Google Refine to do this task?
(29 Jan '11, 15:31)
psychemedia ♦♦
|
|
The Nike Air Classic BW has been popping up Nike Requin en masse it seems and the latest Nike Tn colorway is this one that Nike TN Chaussures makes use of a healthy dose Nike Tn Pas Cher of team orange. The hard to Nike Chaussures deny colorway Tn Pas Cher features a white leather and mesh Nike Tn upper trimmed in black with a hint of grey Tn Requin around its rim while team orange highlights the entire affair. The leather takes these FiveFingers to Vibram 5 Fingers the next level by making them Five Finger Shoes both super durable, Vibram Five Fingers a bit warmer, and a bit classier.? Five Finger Shoes They look great with jeans, too.? 5 Five Shoes While I'm sure you could run or Vibram 5 Fingers hike in the Trek LS, I'm not sure you're going to want to Vibram 5 Fingers as these are just nice and meant more Vibram Shoes for everyday wear though the leather Vibram Five Fingers can assuredly take a beating. Honestly, can totally designer handbags evening show Coach Outlet Store Online your individual style Coach Factory Outlet and originality. If you are an attractive focal point in the game, Coach Outlet Online then you should better choose a Coach Outlet handbag latest collection of exquisite. |
Get the Data