Overclock.net › Forums › Software, Programming and Coding › Coding and Programming › How do I extract strings from a CSV in Unix?
New Posts  All Forums:Forum Nav:

How do I extract strings from a CSV in Unix?

post #1 of 19
Thread Starter 
Okay, so I have a CSV. Let's say it has the following entries in it:

Jackie Chan,1954,M
Chuck Norris,1930,M
Bruce Lee,1940,M


How would I, for example, extract the gender out of a certain person, maybe based on the year of birth?

I'm new to unix but I want to use "awk -F," but I don't know what to do next.
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
post #2 of 19
Quote:
Originally Posted by K10 View Post
Okay, so I have a CSV. Let's say it has the following entries in it:

Jackie Chan,1954,M
Chuck Norris,1930,M
Bruce Lee,1940,M


How would I, for example, extract the gender out of a certain person, maybe based on the year of birth?

I'm new to unix but I want to use "awk -F," but I don't know what to do next.
awk is probably not the best way to do this in my opinion.

Try a command like grep <file 1954 | sed -e "s/.*1954,\\(**\\)/\\1/g" to get the genders of people from 1954

edit: if you're really set on awk I can help you that too...
post #3 of 19
Quote:
Originally Posted by K10 View Post
Okay, so I have a CSV. Let's say it has the following entries in it:

Jackie Chan,1954,M
Chuck Norris,1930,M
Bruce Lee,1940,M


How would I, for example, extract the gender out of a certain person, maybe based on the year of birth?

I'm new to unix but I want to use "awk -F," but I don't know what to do next.
You were on the right track with awk:

awk -F "," '{ print $1 $2 $3 }' /path/to/file.csv

Where the "," is your delimiter (comma, dur) and $1 is the name, $2 is DoB, $3 is the gender.

If you just want the gender, it's simple enough: awk -F "," '{ print $3 }' /path/to/file.csv

The rest is obvious.

[edit]

And here are some good examples for awk:

http://www.ibm.com/developerworks/library/l-awk1.html
Edited by Warhaven - 2/16/11 at 12:14pm
post #4 of 19
Thread Starter 
Quote:
Originally Posted by B-80 View Post
awk is probably not the best way to do this in my opinion.

Try a command like grep <file 1954 | sed -e "s/.*1954,\\(**\\)/\\1/g" to get the genders of people from 1954

edit: if you're really set on awk I can help you that too...
Quote:
Originally Posted by Warhaven View Post
You were on the right track with awk:

awk -F "," '{ print $1 $2 $3 }' /path/to/file.csv

Where the "," is your delimiter (comma, dur) and $1 is the name, $2 is DoB, $3 is the gender.

If you just want the gender, it's simple enough: awk -F "," '{ print $3 }' /path/to/file.csv

The rest is obvious.

[edit]

And here are some good examples for awk:

http://www.ibm.com/developerworks/library/l-awk1.html
Thanks both of you,

I did it like this -

Code:
awk -F, '{ if ($2 == 1940) print $3}' "$1" > gender.txt
Note: This is in a script
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
post #5 of 19
Code:
grep 1940 file | cut -f 3 -d ,
Underground
(14 items)
 
  
CPUMotherboardGraphicsRAM
Core i7 920 C0 ASUS P6T6 WS Revolution GTX 460 TR3X6G1600C8D 
Hard DriveOptical DriveCoolingOS
WD1001FALS SAMSUNG SH-S223F 22X DVD MULTI Corsair H50 Fedora 16 KDE x86_64 
MonitorKeyboardPowerCase
HP w19b Microsoft Comfort Curve Corsair CX600 Thermaltake Armor VA8003BWS 
MouseMouse Pad
Razer DeathAdder Black 
  hide details  
Reply
Underground
(14 items)
 
  
CPUMotherboardGraphicsRAM
Core i7 920 C0 ASUS P6T6 WS Revolution GTX 460 TR3X6G1600C8D 
Hard DriveOptical DriveCoolingOS
WD1001FALS SAMSUNG SH-S223F 22X DVD MULTI Corsair H50 Fedora 16 KDE x86_64 
MonitorKeyboardPowerCase
HP w19b Microsoft Comfort Curve Corsair CX600 Thermaltake Armor VA8003BWS 
MouseMouse Pad
Razer DeathAdder Black 
  hide details  
Reply
post #6 of 19
Thread Starter 
Quote:
Originally Posted by error10 View Post
Code:
grep 1940 file | cut -f 3 -d ,
can you explain to me what that does?

I'm very new to Unix and I don't understand most commands
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
post #7 of 19
Quote:
Originally Posted by K10 View Post
can you explain to me what that does?

I'm very new to Unix and I don't understand most commands
OK.

First command to use is "man". If you type man followed by the name of a command, it will display info about the command and the options that it uses.

Second is "grep". This searches a file for lines matching a particular pattern, in this case 1940, and outputs them.

Third is "|". This is the pipe. It pipes data from one command to the next.

Fourth is "cut". Cut removes columns from tabular data and returns those columns. In this case, we want the third column, and columns are separated with a comma.
Underground
(14 items)
 
  
CPUMotherboardGraphicsRAM
Core i7 920 C0 ASUS P6T6 WS Revolution GTX 460 TR3X6G1600C8D 
Hard DriveOptical DriveCoolingOS
WD1001FALS SAMSUNG SH-S223F 22X DVD MULTI Corsair H50 Fedora 16 KDE x86_64 
MonitorKeyboardPowerCase
HP w19b Microsoft Comfort Curve Corsair CX600 Thermaltake Armor VA8003BWS 
MouseMouse Pad
Razer DeathAdder Black 
  hide details  
Reply
Underground
(14 items)
 
  
CPUMotherboardGraphicsRAM
Core i7 920 C0 ASUS P6T6 WS Revolution GTX 460 TR3X6G1600C8D 
Hard DriveOptical DriveCoolingOS
WD1001FALS SAMSUNG SH-S223F 22X DVD MULTI Corsair H50 Fedora 16 KDE x86_64 
MonitorKeyboardPowerCase
HP w19b Microsoft Comfort Curve Corsair CX600 Thermaltake Armor VA8003BWS 
MouseMouse Pad
Razer DeathAdder Black 
  hide details  
Reply
post #8 of 19
Thread Starter 
Quote:
Originally Posted by error10 View Post
OK.

First command to use is "man". If you type man followed by the name of a command, it will display info about the command and the options that it uses.

Second is "grep". This searches a file for lines matching a particular pattern, in this case 1940, and outputs them.

Third is "|". This is the pipe. It pipes data from one command to the next.

Fourth is "cut". Cut removes columns from tabular data and returns those columns. In this case, we want the third column, and columns are separated with a comma.
What if, in some other example, I wanted multiple columns? I know how to do it with awk but how would I do it using this method?
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
post #9 of 19
Quote:
Originally Posted by K10 View Post
What if, in some other example, I wanted multiple columns? I know how to do it with awk but how would I do it using this method?
as error 10 said, you should learn to use the man pages, they are invaluable resources and will always answer question on usage.

but from the man pages

Quote:
NAME
cut - remove sections from each line of files

SYNOPSIS
cut OPTION... [FILE]...

DESCRIPTION
Print selected parts of lines from each FILE to standard output.

Mandatory arguments to long options are mandatory for short options too.

-b, --bytes=LIST
select only these bytes

-c, --characters=LIST
select only these characters

-d, --delimiter=DELIM
use DELIM instead of TAB for field delimiter

-f, --fields=LIST
select only these fields; also print any line that contains no delimiter
character, unless the -s option is specified

type man cut into your terminal to bring this up, or man grep for the page on grep, ect...

to get column 1 and 3 the command would be cut -f 1,3 -d ,
post #10 of 19
Thread Starter 
Thanks, +rep to those who helped.

I have one last question though.

How would I use what we discussed in this thread to compare strings from two files and output other fields(in the same line) from both files to one other?

Say I have
Code:
#names.csv
Jackie Chan,1954,O+
Chuck Norris,1930,A-
Bruce Lee,1940,O+
Where the 1st column is the name, 2nd is DOB, and 3rd is blood type.

And I had a second file
Code:
#o_pos_ssn.csv
Jackie Chan,O+,123-45-6789
Bruce Lee,O+,095-34-1647
Jet Li,O+,067-127-5791
Where the 1st column is name, 2nd is blood type, and 3rd is SSN.

How could I compare all entries of the two files based on blood type and output something like:
Code:
#result.csv
Jackie Chan,1954,O+,123-45-6789
Bruce Lee,1940,O+,095-34-1647
(in other words, if columns 1 and 3 in the first file are equal to columns 1 and 2, respectively, output all columns from both files on a single line).
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
New Posts  All Forums:Forum Nav:
  Return Home
  Back to Forum: Coding and Programming
Overclock.net › Forums › Software, Programming and Coding › Coding and Programming › How do I extract strings from a CSV in Unix?