Overclock.net › Forums › Software, Programming and Coding › Coding and Programming › How do I extract strings from a CSV in Unix?
New Posts  All Forums:Forum Nav:

How do I extract strings from a CSV in Unix? - Page 2

post #11 of 19
the "join" command might work, but it seems to depend on each record appearing in both files. i.e. if you only have chuck norris in one file, it work work.

Code:
echo -e "1\A\
2\B\
3\C\
4\D\
" > one
echo -e "1\P\
2\Q\
3\R\
4\S\
" > two
join -j1 one two
1 A P
2 B Q
3 C R
4 D S

but as soon as i take out the "4" record from either one it complains about order, however they _are sorted. If i add --nocheck-order, it runs but does an "inner" join, omitting the 4 A record in the other file. You probably are looking for an "outer" join where it makes up a null value if a field is not present. But using -e ? to sub in a ? for not present doesn't work either...

So, I would use awk for this, but not sure how to read two input files at the same time off of the top of my head. you'll probably have to resort to a weird < pipe and getline.. I need to go to bed...

EDIT
it looks like you can do an outer join on both ends if you add
--nocheck-order -a 1 -a 2

Code:
echo -e "1\A\
2\B\
3\C" > one
echo -e "1\P\
3\R\
4\S" > two
join -j1 -a1 -a2 --nocheck-order one two
that's still not quite right as it doesn't put the missing column in the right spot. bad data..

ok have to specify output format, and what fields to use the empty place holder for.. interesting. -v option is interesting too...
Code:
join -j1 -a1 -a2 -o0 1.2 2.2 -e "?" --nocheck-order one two
http://www.albany.edu/~ig4895/join.htm

BED immediately
Edited by lloyd mcclendon - 2/17/11 at 11:09pm
stable again
(25 items)
 
  
CPUCPUMotherboardGraphics
E5-2687W E5-2687W ASUS Z9PED8-WS EVGA GTX 570 (Linux host) 
GraphicsRAMHard DriveHard Drive
EVGA GTX 970 FTW (win7 guest) 64GB G.SKILL 2133 2x Crucial M4 256GB raid1 4x 3TB raid 10 
CoolingCoolingCoolingCooling
2x Apogee HD  2x RX 480 2x MCP 655 RP-452x2 rev2 (new) 
CoolingCoolingOSOS
16x Cougar Turbine CFT12SB4 (new) EK FC 580 Gentoo (host) Gentoo (x23 guests) 
OSMonitorMonitorPower
windows 7 (guest w/ vfio-pci) Viewsonic 23" 1080P Viewsonic 19" Antec HCP Platinum 1000 (new) 
CaseOtherOther
Case Labs TH10 (still the best ever) 2x Lamptron FC-5 IOGEAR 2 way DVI KVM Switch 
  hide details  
Reply
stable again
(25 items)
 
  
CPUCPUMotherboardGraphics
E5-2687W E5-2687W ASUS Z9PED8-WS EVGA GTX 570 (Linux host) 
GraphicsRAMHard DriveHard Drive
EVGA GTX 970 FTW (win7 guest) 64GB G.SKILL 2133 2x Crucial M4 256GB raid1 4x 3TB raid 10 
CoolingCoolingCoolingCooling
2x Apogee HD  2x RX 480 2x MCP 655 RP-452x2 rev2 (new) 
CoolingCoolingOSOS
16x Cougar Turbine CFT12SB4 (new) EK FC 580 Gentoo (host) Gentoo (x23 guests) 
OSMonitorMonitorPower
windows 7 (guest w/ vfio-pci) Viewsonic 23" 1080P Viewsonic 19" Antec HCP Platinum 1000 (new) 
CaseOtherOther
Case Labs TH10 (still the best ever) 2x Lamptron FC-5 IOGEAR 2 way DVI KVM Switch 
  hide details  
Reply
post #12 of 19
Thread Starter 
I can get the join command to work but it will only output one line. I would like it to go through every line.
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
post #13 of 19
it will .. check the arguments and your input

looks like it doesn't support , as a field separator however. so you'll have to resort to some dirty string replace of " " with "@#$@#$@#$" and "," with " ", join, and then " " with "," and finally "@#$@#$#@" with " " .. i hate that. i can't believe they dont support changing the field separator

i would probably use awk...

the line issue you are encountering is due to mis match key because " " is the field separator.

"jackie"
"chuck"
stable again
(25 items)
 
  
CPUCPUMotherboardGraphics
E5-2687W E5-2687W ASUS Z9PED8-WS EVGA GTX 570 (Linux host) 
GraphicsRAMHard DriveHard Drive
EVGA GTX 970 FTW (win7 guest) 64GB G.SKILL 2133 2x Crucial M4 256GB raid1 4x 3TB raid 10 
CoolingCoolingCoolingCooling
2x Apogee HD  2x RX 480 2x MCP 655 RP-452x2 rev2 (new) 
CoolingCoolingOSOS
16x Cougar Turbine CFT12SB4 (new) EK FC 580 Gentoo (host) Gentoo (x23 guests) 
OSMonitorMonitorPower
windows 7 (guest w/ vfio-pci) Viewsonic 23" 1080P Viewsonic 19" Antec HCP Platinum 1000 (new) 
CaseOtherOther
Case Labs TH10 (still the best ever) 2x Lamptron FC-5 IOGEAR 2 way DVI KVM Switch 
  hide details  
Reply
stable again
(25 items)
 
  
CPUCPUMotherboardGraphics
E5-2687W E5-2687W ASUS Z9PED8-WS EVGA GTX 570 (Linux host) 
GraphicsRAMHard DriveHard Drive
EVGA GTX 970 FTW (win7 guest) 64GB G.SKILL 2133 2x Crucial M4 256GB raid1 4x 3TB raid 10 
CoolingCoolingCoolingCooling
2x Apogee HD  2x RX 480 2x MCP 655 RP-452x2 rev2 (new) 
CoolingCoolingOSOS
16x Cougar Turbine CFT12SB4 (new) EK FC 580 Gentoo (host) Gentoo (x23 guests) 
OSMonitorMonitorPower
windows 7 (guest w/ vfio-pci) Viewsonic 23" 1080P Viewsonic 19" Antec HCP Platinum 1000 (new) 
CaseOtherOther
Case Labs TH10 (still the best ever) 2x Lamptron FC-5 IOGEAR 2 way DVI KVM Switch 
  hide details  
Reply
post #14 of 19
Thread Starter 
Say I have
Code:
#names.csv
Jackie Chan,1954,O+
Chuck Norris,1930,A-
Bruce Lee,1940,O+
Where the 1st column is the name, 2nd is DOB, and 3rd is blood type.

And I had a second file
Code:
#o_pos_ssn.csv
Jackie Chan,O+,123-45-6789
Bruce Lee,O+,095-34-1647
Jet Li,O+,067-127-5791
Where the 1st column is name, 2nd is blood type, and 3rd is SSN.

If I use this:
Code:
join -t, -13 -22 -o1.1,1.2,2.2,2.3 names.csv o_pos_ssn.csv > result.csv
The output will be:
Code:
#result.csv
Jackie Chan,1954,O+,123-45-6789
I want it to output ALL lines that fit the requirements and get the code below:
Code:
#result.csv
Jackie Chan,1954,O+,123-45-6789
Bruce Lee,1940,O+,095-34-1647
How do I make the join command go through all lines?
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
post #15 of 19
space is the field separator

2.3 does not even exist.

2.2 does....
stable again
(25 items)
 
  
CPUCPUMotherboardGraphics
E5-2687W E5-2687W ASUS Z9PED8-WS EVGA GTX 570 (Linux host) 
GraphicsRAMHard DriveHard Drive
EVGA GTX 970 FTW (win7 guest) 64GB G.SKILL 2133 2x Crucial M4 256GB raid1 4x 3TB raid 10 
CoolingCoolingCoolingCooling
2x Apogee HD  2x RX 480 2x MCP 655 RP-452x2 rev2 (new) 
CoolingCoolingOSOS
16x Cougar Turbine CFT12SB4 (new) EK FC 580 Gentoo (host) Gentoo (x23 guests) 
OSMonitorMonitorPower
windows 7 (guest w/ vfio-pci) Viewsonic 23" 1080P Viewsonic 19" Antec HCP Platinum 1000 (new) 
CaseOtherOther
Case Labs TH10 (still the best ever) 2x Lamptron FC-5 IOGEAR 2 way DVI KVM Switch 
  hide details  
Reply
stable again
(25 items)
 
  
CPUCPUMotherboardGraphics
E5-2687W E5-2687W ASUS Z9PED8-WS EVGA GTX 570 (Linux host) 
GraphicsRAMHard DriveHard Drive
EVGA GTX 970 FTW (win7 guest) 64GB G.SKILL 2133 2x Crucial M4 256GB raid1 4x 3TB raid 10 
CoolingCoolingCoolingCooling
2x Apogee HD  2x RX 480 2x MCP 655 RP-452x2 rev2 (new) 
CoolingCoolingOSOS
16x Cougar Turbine CFT12SB4 (new) EK FC 580 Gentoo (host) Gentoo (x23 guests) 
OSMonitorMonitorPower
windows 7 (guest w/ vfio-pci) Viewsonic 23" 1080P Viewsonic 19" Antec HCP Platinum 1000 (new) 
CaseOtherOther
Case Labs TH10 (still the best ever) 2x Lamptron FC-5 IOGEAR 2 way DVI KVM Switch 
  hide details  
Reply
post #16 of 19
Thread Starter 
Quote:
Originally Posted by lloyd mcclendon View Post
space is the field separator

2.3 does not even exist.

2.2 does....
2.3 does exist. It is the ssn field.
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
post #17 of 19
Thread Starter 
join -t, -11 -21 -o1.1,1.2,2.2,2.3 <(sort names.csv) <(sort o_pos_ssn.csv)>results.csv

this worked for my specific example but wouldn't work if chuck norris was in the 2nd file. I got what I wanted though. Just used this code and modified it a bit for my actual files. Thanks to everyone who helped.

/thread
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
post #18 of 19
Quote:
Originally Posted by K10 View Post
2.3 does exist. It is the ssn field.
NO, space is the field separator... join only uses space and it's not looking at your commas......

Code:
Jackie Chan,O+,123-45-6789
|----| |-------------------|
2.1           2.2

Bruce Lee,O+,095-34-1647
|---| |-------------------|
2.1           2.2

Jet Li,O+,067-127-5791
|-| |-----------------|
2.1           2.2
glad you at least got something working. For something quick and dirty i would lean towards awk. To me it looks like you need a sql database.
stable again
(25 items)
 
  
CPUCPUMotherboardGraphics
E5-2687W E5-2687W ASUS Z9PED8-WS EVGA GTX 570 (Linux host) 
GraphicsRAMHard DriveHard Drive
EVGA GTX 970 FTW (win7 guest) 64GB G.SKILL 2133 2x Crucial M4 256GB raid1 4x 3TB raid 10 
CoolingCoolingCoolingCooling
2x Apogee HD  2x RX 480 2x MCP 655 RP-452x2 rev2 (new) 
CoolingCoolingOSOS
16x Cougar Turbine CFT12SB4 (new) EK FC 580 Gentoo (host) Gentoo (x23 guests) 
OSMonitorMonitorPower
windows 7 (guest w/ vfio-pci) Viewsonic 23" 1080P Viewsonic 19" Antec HCP Platinum 1000 (new) 
CaseOtherOther
Case Labs TH10 (still the best ever) 2x Lamptron FC-5 IOGEAR 2 way DVI KVM Switch 
  hide details  
Reply
stable again
(25 items)
 
  
CPUCPUMotherboardGraphics
E5-2687W E5-2687W ASUS Z9PED8-WS EVGA GTX 570 (Linux host) 
GraphicsRAMHard DriveHard Drive
EVGA GTX 970 FTW (win7 guest) 64GB G.SKILL 2133 2x Crucial M4 256GB raid1 4x 3TB raid 10 
CoolingCoolingCoolingCooling
2x Apogee HD  2x RX 480 2x MCP 655 RP-452x2 rev2 (new) 
CoolingCoolingOSOS
16x Cougar Turbine CFT12SB4 (new) EK FC 580 Gentoo (host) Gentoo (x23 guests) 
OSMonitorMonitorPower
windows 7 (guest w/ vfio-pci) Viewsonic 23" 1080P Viewsonic 19" Antec HCP Platinum 1000 (new) 
CaseOtherOther
Case Labs TH10 (still the best ever) 2x Lamptron FC-5 IOGEAR 2 way DVI KVM Switch 
  hide details  
Reply
post #19 of 19
Thread Starter 
Quote:
Originally Posted by lloyd mcclendon View Post
NO, space is the field separator... join only uses space and it's not looking at your commas......
http://www.computerhope.com/unix/ujoin.htm

I used
Code:
join -t,
so mine was separated by commas.
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
Skylake
(10 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core-i7 6700K ASUS ROG Maximus VIII Hero EVGA GTX 970 4GB G.SKILL Ripjaws 4 32GB DDR4 2800 
Hard DriveHard DriveCoolingOS
SAMSUNG SM951 M.2 SAMSUNG 850 EVO Corsair H100i Windows 10 
PowerCase
SeaSonic G-750 Fractal Design Define R4 
  hide details  
Reply
New Posts  All Forums:Forum Nav:
  Return Home
  Back to Forum: Coding and Programming
Overclock.net › Forums › Software, Programming and Coding › Coding and Programming › How do I extract strings from a CSV in Unix?