[Python] Noob code sanity check
#1 (permalink)
PC Gamer
I'm writing a program in Python for work that takes in a CSV file, reformats it (drops some columns and re-orders the rest), and then writes the result to a new variable/object (this will be written to a new CSV file once I've coded that part). The code is below. I'm fairly confident that I've not gone about it in the most efficient way - actually, I'm not sure it will even work, but it's my first ever Python program!
Anyway, your thoughts? Code:
import csv, sys

def arranger(a):
    b = []
    b.append(a[15])
    b.append(a[1])
    b.append(a[0])
    b.append(a[1])
    b.append(a[3])
    b.append(a[4])
    b.append(a[10])
    b.append(a[9])
    b.append(a[7])
    b.append(a[5])
    b.append(a[0])
    b.append(a[0])
    b.append('P')
    b.append(a[6])
    b.append(a[14])
    return b

filename = sys.argv
readerobj = csv.reader(open(filename))
for row in readerobj:
    templist = list(row)
    print templist
    writelist = arranger(templist)
    print writelist
    print
    readerobj.next()
#2 (permalink)
New to Overclock.net
Ideally I would want to do this with a list comprehension, but the 'P' in the list of numbers precludes that.
This is not beautiful, but it's a little shorter (probably doesn't perform a whole lot better, if at all). Code:
# "constant" for your field indices
ORDERX = [15, 1, 0, 1, 3, 4, 10, 9, 7,
          5, 0, 0, 'P', 6, 14]
# test array
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
     11, 12, 13, 14, 15, 16]
b = []
for num in ORDERX:
    try:
        # integer entries are indices into the row
        b.append(a[num])
    except TypeError:
        # non-integer entries (the 'P') are appended as literal values
        b.append(num)
Hope this helps some.
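For what it's worth, a list comprehension can still cope with the 'P' if each entry is checked before it is used as an index. The following is only a sketch under that assumption: it is written for Python 3, and the `arrange` function and `sample` row are made-up names for illustration, not part of the code posted in the thread.

```python
# Field order: integers are column indices, strings are literal values.
ORDERX = [15, 1, 0, 1, 3, 4, 10, 9, 7, 5, 0, 0, 'P', 6, 14]


def arrange(row, order=ORDERX):
    """Build a new row: look up integer entries, pass anything else through."""
    return [row[item] if isinstance(item, int) else item for item in order]


# Hypothetical test row with 17 string fields, "0" through "16".
sample = [str(n) for n in range(17)]
result = arrange(sample)
print(result)
```

The `isinstance` check does the same job as the try/except in the loop above, just before the lookup instead of after it fails.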
#3 (permalink)
PC Gamer
I did test it, and it threw an exception around the CSV reader object creation, as sys.argv was returning a list rather than a string. So I converted it to a string before passing it, and it threw a different exception saying it couldn't find the file, so I gave up for the day. I think it's a path issue; I'll have a go tomorrow.
Infixum - thanks for the code mate, rep+ and I'll try to work it through in my head and replace my code where appropriate. This is very much a learning exercise for me - it's my very first Python program outside of the silly 5 line code practice you do to learn new features from books! I'll post the latest revised code tomorrow (after lunch probably, as that's when I get the coding done). It's just occurred to me that I haven't got a single comment in my code - that's a real bad habit to get into, gotta be more disciplined than that!
#4 (permalink)
PC Gamer
Ok, I've got it to do something now! Although it's doing completely the wrong thing, at least it doesn't fall at the first hurdle! Latest code:
Code:
import csv, sys

def arranger(a):
    #b = []
    #b.append(a[15])
    #b.append(a[1])
    #b.append(a[0])
    #b.append(a[1])
    #b.append(a[3])
    #b.append(a[4])
    #b.append(a[10])
    #b.append(a[9])
    #b.append(a[7])
    #b.append(a[5])
    #b.append(a[0])
    #b.append(a[0])
    #b.append('P')
    #b.append(a[6])
    #b.append(a[14])
    #return b
    # "constant" for your field indices
    ORDERX = [15, 1, 0, 1, 3, 4, 10, 9, 7, 5, 0, 0, 'P', 6, 14]
    # test array
    a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
    b = []
    for num in ORDERX:
        try:
            b.append(a[num])
        except TypeError:
            b.append(num)
    return b

filename = open(sys.argv[0])
readerobj = csv.reader(filename)
for row in readerobj:
    templist = list(row)
    print templist
    writelist = arranger(templist)
    print writelist
    print
    readerobj.next()
Quote:
Scratch that, I've just realised why - sys.argv[0] is the script name, so it's iterating through the script itself. Although I don't know why the function just returns the values of ORDERX?
Anyway, my issue is still the same - the script can't find my CSV file through sys.argv, even though it's in the same folder! If I don't use the str() function, it throws a TypeError complaining that it's a list, not a string, and if I do use it, it throws an IOError saying that it can't find it - in this example, it says ['test.csv'].
That's it for today, back to work.
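The two errors described here are what you'd expect from treating the whole argument list as a filename. A small sketch of the difference, assuming Python 3; `fake_argv` is a stand-in for the real `sys.argv`, and `csv_path_from_args` is a hypothetical helper, not something from the thread:

```python
def csv_path_from_args(argv):
    """Return the first real argument; argv[0] is always the script name."""
    if len(argv) < 2:
        raise SystemExit("usage: python script.py input.csv")
    return argv[1]


# Stand-in for sys.argv after running: python testcsv.py test.csv
fake_argv = ["testcsv.py", "test.csv"]

# str() on a list gives the list's printed form, brackets and quotes
# included, which is not a usable filename.
as_text = str(fake_argv[1:])

# Indexing picks out the single string that open() needs.
path = csv_path_from_args(fake_argv)

print(as_text)
print(path)
```

In a real script the call would be `csv_path_from_args(sys.argv)` after `import sys`.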
#5 (permalink)
New to Overclock.net
OK,
I understand a little better what has gotten you stuck. What I gave you before trimmed down the code, but didn't solve your problem. Let's try this again.

This is a mock-up file I've made called testcsv.csv for purposes of testing the script:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16

Here is the modified code (it's my code, so you may want to do some things differently; although I've done a lot of parsing of csv files, I haven't used the csv module a lot, because I'm often working with Python 2.2, which doesn't have it):
Code:
import csv
import sys
import pprint

# "constant" for your field indices
ORDERX = [15, 1, 0, 1, 3, 4, 10, 9, 7, 5, 0, 0, 'P', 6, 14]

def arranger(linex):
    b = []
    for num in ORDERX:
        try:
            b.append(linex[num])
        except TypeError:
            b.append(num)
    return b

# sys.argv[1] is your filename
filex = open(sys.argv[1], 'r')
readerobj = csv.reader(filex)
# get records as lists
newlist = [list(rowx) for rowx in readerobj]
# close file object
filex.close()
# get the fields you actually want
newlist = [arranger(rowy) for rowy in newlist]
# have a look to see if it's what you want
pprint.pprint(newlist)
# if it is, now you're ready to write the values to another file
Code:
Z:\>python Z:\csvtest\testcsv.py Z:\csvtest\testcsv.csv

Here are some links that may help:

Official python.org csv documentation: http://docs.python.org/library/csv.html
Software carpentry homepage: http://software-carpentry.org/
If you're going to be doing this sort of thing more often, this website may help you out considerably. YMMV (your mileage may vary).

Good luck in your data crunching and Python adventures!
#6 (permalink)
New to Overclock.net
One last thing - someone recently described my code to me as "overwrought", and he was right.
Turns out you don't have to convert each record from the csv file reader object to a list - this eliminates a few lines of code: Code:
import csv
import sys
import pprint

# "constant" for your field indices
ORDERX = [15, 1, 0, 1, 3, 4, 10, 9, 7, 5, 0, 0, 'P', 6, 14]

def arranger(linex):
    b = []
    for num in ORDERX:
        try:
            b.append(linex[num])
        except TypeError:
            b.append(num)
    return b

# sys.argv[1] is your filename
filex = open(sys.argv[1], 'r')
readerobj = csv.reader(filex)
# get records as lists and get the fields you actually want
newlist = [arranger(rowx) for rowx in readerobj]
# close file object
filex.close()
# use pretty print to see if what you've got is what you want
pprint.pprint(newlist)
# if it is, now you're ready to write the values to another file
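The reason the `list()` conversion can go is that `csv.reader` already hands back each record as a list of strings. A quick check, assuming Python 3 and using an in-memory string in place of a real file (the sample data is made up):

```python
import csv
import io

# Two-line stand-in for a CSV file on disk.
sample = io.StringIO("a,b,c\n1,2,3\n")

rows = [row for row in csv.reader(sample)]

# Every row is already a list, so it can be indexed directly.
all_lists = all(isinstance(row, list) for row in rows)
print(rows)
print(all_lists)
```

Note that every field comes back as a string, so numeric columns would still need converting if you wanted to do arithmetic on them.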
#7 (permalink)
PC Gamer
Partial win!!!
I've added the 'r' flag into the sys.argv statement (is that the right word? OO is still new to me!), and now it will successfully import the CSV file! The function doesn't work as I'm still using the old code which just returns ORDERX - I'll work on this later, as I should be working right now.
#8 (permalink)
PC Gamer
Ok, I've got the code working now!! I still need to build in error handling, the new csv writer, and tidy up some bits (like the function!). It turns out the spec was wrong, and that 'P' assignment is no longer required. Anyway, the current code is below:
Code:
import csv, sys

def arranger(a):
    b = []
    b.append(a[15])
    b.append(a[1])
    b.append(a[0])
    b.append(a[1])
    b.append(a[3])
    b.append(a[4])
    b.append(a[10])
    b.append(a[9])
    b.append(a[7])
    b.append(a[5])
    b.append(a[0])
    b.append(a[0])
    b.append(a[6])
    b.append(a[14])
    return b

filename = open(sys.argv[1],'r')
readerobj = csv.reader(filename)
for row in readerobj:
    templist = list(row)
    print templist
    writelist = arranger(templist)
    print writelist
    print
    try:
        readerobj.next()
    except:
        sys.exit()
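One thing worth checking in a loop shaped like this: a `for` loop already advances the reader on every pass, so an extra `next()` call inside the body consumes a second row each time. The sketch below shows that behaviour with a plain iterator standing in for the csv reader. It assumes Python 3 (where the call is `next(rows)` instead of `rows.next()`), assumes the `next()` call sits inside the loop body, and uses made-up row data.

```python
# Five fake rows; an iterator behaves like a csv reader for this purpose.
rows = iter([["r1"], ["r2"], ["r3"], ["r4"], ["r5"]])

seen = []
for row in rows:
    seen.append(row[0])
    try:
        # Consumes the row the loop would otherwise have handled next.
        next(rows)
    except StopIteration:
        # Raised once the iterator runs out, here after the fifth row.
        break

print(seen)
```

Only every other row reaches the loop body, and the `StopIteration` at the end is what the try/except is catching when the row count is odd.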
#9 (permalink)
New to Overclock.net
chemicalfan,
Glad things are coming along. Way to stick to it.

Since you've taken the 'P' out of the list, things are much simpler. I've added a nested list comprehension - some people are OK with this for readability; some would prefer a nested loop. If you've got good input (the csv file is consistent, with no dropped digits or missing commas), my code below will work. If you have problems with getting clean data, you will need to put try/except blocks in.

A couple of important chunks of advice:

1) The arranger function is a write once/run once function. It's hard coded for the fields you need now. If you're doing this sort of parsing now, chances are you'll have to do it a million times as requirements change. Hard coding indices like that will make a lot of work for you. That's how I learned the trick of having a constant like ORDERX at the top - the hard way!

2) An except without a named exception (StopIteration, in this case) will eventually cause you problems. If the script fails for a reason other than the one you intended to catch, the bare except swallows that error too, and you may be surprised.

Again, learned through hard experience, take it FWIW.

Good luck on this project. I think you're well on your way.

Carl T.
Code:
import csv
import sys
import pprint

# "constant" for your field indices
ORDERX = [15, 1, 0, 1, 3, 4, 10, 9, 7, 5, 0, 0, 6, 14]

# sys.argv[1] is your filename
filex = open(sys.argv[1], 'r')
readerobj = csv.reader(filex)
# get fields you actually want
finallist = [[rowx[num] for num in ORDERX]
             for rowx in readerobj]
# close file object
filex.close()
# use pretty print to see if what you've got is what you want
pprint.pprint(finallist)
# if it is, now you're ready to write the values to another file
Last edited by infixum : 10-02-09 at 03:25 PM Reason: left out import statements at top of code; found them at the top of the post and removed them
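The advice about try/except blocks for messy data and about naming the exception can be combined. Here is one possible shape, assuming Python 3 and assuming that the typical problem is a row with too few fields; the `arrange_rows` function and the sample rows are hypothetical, not code from the thread.

```python
# Field order from the post above (no 'P' any more).
ORDERX = [15, 1, 0, 1, 3, 4, 10, 9, 7, 5, 0, 0, 6, 14]


def arrange_rows(rows, order=ORDERX):
    """Rearrange each row; collect the line numbers of rows that are too short."""
    good, bad = [], []
    for line_number, row in enumerate(rows, start=1):
        try:
            good.append([row[num] for num in order])
        except IndexError:
            # Only the expected failure is caught; any other error still surfaces.
            bad.append(line_number)
    return good, bad


# Made-up input: two complete rows and one truncated row.
full = [str(n) for n in range(17)]
short = ["0", "1", "2"]
good, bad = arrange_rows([full, short, full])
print(len(good), bad)
```

Catching `IndexError` by name means a typo or an unexpected data type will still produce a traceback instead of being silently skipped.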
#10 (permalink)
New to Overclock.net
One other thing - you mentioned the argv statement. sys.argv is just the list of words or items you typed in on the command line (argv = argument vector).
The 'r' applies to the open function that opens the csv file for reading; "read" is what the 'r' stands for. You don't actually want to change the original csv file, just use the information in it.
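To make the 'r' point concrete, here is a small sketch, assuming Python 3, that opens a file read-only and confirms the file is the same size afterwards. The `read_rows` helper and the temporary file are invented for the demonstration; a real script would pass `sys.argv[1]` instead.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Open the file for reading only and return its rows as lists."""
    with open(path, "r", newline="") as handle:
        return list(csv.reader(handle))


# Create a throwaway CSV file to stand in for the real input.
fd, tmp_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as handle:
    handle.write("0,1,2\n")

before = os.path.getsize(tmp_path)
rows = read_rows(tmp_path)
after = os.path.getsize(tmp_path)
os.remove(tmp_path)

print(rows, before, after)
```

The `with` block also closes the file automatically, which replaces the explicit `filex.close()` call used in the earlier posts.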